path: root/kernel
Age         Commit message         Author
2008-11-11  fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock  (Oleg Nesterov)
Impact: fix hang/crash on ia64 under high load. This is ugly, but the simplest patch by far. Unlike other similar routines, account_group_exec_runtime() could be called "implicitly" from within the scheduler after exit_notify(). This means we can race with the parent doing release_task(), so we can't just check ->signal != NULL. Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock) before __cleanup_signal() to make sure ->signal can't be freed under task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care about the case where tsk changes cpu/rq under us; this should be OK. Thanks to Ingo, who NAKed my previous buggy patch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reported-by: Doug Chapman <doug.chapman@hp.com>
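A minimal sketch of the change this describes, with the helper's body inferred from the description above (the barrier comment and the exact call site are assumptions, not a quote of the upstream diff):

  /* sketch: wait until any rq->lock holder that might still
   * dereference ->signal has dropped the lock; never take it */
  void task_rq_unlock_wait(struct task_struct *p)
  {
      struct rq *rq = task_rq(p);

      smp_mb(); /* assumed: spin_unlock_wait() is not a full barrier */
      spin_unlock_wait(&rq->lock);
  }

  /* in __exit_signal(), before the signal struct can be freed: */
  task_rq_unlock_wait(tsk);
  __cleanup_signal(sig);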
2008-11-10  ring-buffer: prevent infinite looping on time stamping  (Steven Rostedt)
Impact: removal of unnecessary looping. The lockless part of the ring buffer allows for reentry into the code from interrupts. A timestamp is taken, a test is performed, and if it detects that an interrupt occurred that did tracing, it tries again. The problem arises if the timestamp code itself causes a trace. The detection will detect this and loop again. The difference between this and an interrupt doing tracing is that this will fail every time and cause an infinite loop. Currently, we test if the loop happens 1000 times, and if so, produce a warning and disable the ring buffer. The problem with this approach is that it makes it difficult to perform some types of tracing (tracing the timestamp code itself). Each trace entry has a delta timestamp from the previous entry. If a trace entry is reserved but an interrupt occurs and traces before the previous entry is committed, the delta timestamp for that entry will be zero. This actually makes sense in terms of tracing, because the interrupt entry happened before the preempted entry was committed, so one may consider the two as happening at the same time. The order is still preserved in the buffer. With this idea, instead of trying to get a new timestamp if an interrupt made it in between the timestamp and the test, the entry can simply make the delta zero and continue. This prevents interrupts or tracers in the timer code from causing the above loop. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
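A hedged illustration of the zero-delta policy; the function and its arguments are invented for clarity and are not the actual ring_buffer.c symbols:

  /* illustrative only: if another trace slipped in between reading the
   * clock and the commit, don't retry the timestamp -- use a zero delta.
   * The interrupting entry was committed first, so the two count as
   * simultaneous; the buffer order still reflects what happened. */
  static u64 entry_delta(u64 ts, u64 last_commit_ts, bool interrupted)
  {
      if (interrupted)
          return 0;
      return ts - last_commit_ts;
  }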
2008-11-10  ftrace: disable tracing on resize  (Steven Rostedt)
Impact: fix for bug on resize. This patch addresses the bug found here: http://bugzilla.kernel.org/show_bug.cgi?id=11996 When ftrace was converted to the new unified trace buffer, the resizing of the buffer was not protected as much as it was originally. If tracing is performed while the resize occurs, the buffer can be corrupted. This patch disables all ftrace buffer modifications before a resize takes place. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2008-11-10  nohz: disable tick_nohz_kick_tick() for now  (Thomas Gleixner)
Impact: nohz powersavings and wakeup regression. Commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208 (NOHZ: restart tick device from irq_enter()) causes a serious wakeup regression. While the patch is correct, it does not take into account that spurious wakeups happen on x86. A fix for this issue is available, but we just revert to the .27 behaviour and let long-running softirqs screw themselves. Disable it for now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-11-10  irq: call __irq_enter() before calling the tick_idle_check  (Thomas Gleixner)
Impact: avoid spurious ksoftirqd wakeups. The tick idle check which is called from irq_enter() was run before the call to __irq_enter(), which had not yet set the in_interrupt() bits in preempt_count. That way the raise of a softirq woke up ksoftirqd for nothing, as the softirq was handled on return from interrupt anyway. Call __irq_enter() before calling into the tick idle check code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
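A simplified sketch of the reordering (the real irq_enter() in kernel/softirq.c of that era has more to it; the exact shape below is an assumption based on the description):

  void irq_enter(void)
  {
      int cpu = smp_processor_id();

      if (idle_cpu(cpu) && !in_interrupt()) {
          __irq_enter();        /* mark hardirq context first... */
          tick_check_idle(cpu); /* ...so a softirq raised here is handled
                                 * on irq exit, not by waking ksoftirqd */
      } else {
          __irq_enter();
      }
  }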
2008-11-10  sched: clean up debug info  (Peter Zijlstra)
Impact: clean up and fix debug info printout. While looking over the sched_debug code I noticed that we printed the rq schedstats for every cfs_rq; amend this. Also change nr_spread_over into an int, and fix a little buglet in min_vruntime printing. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-09  irq: fix typo  (Ingo Molnar)
Impact: build fix. Fix a build failure on UP. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-09  genirq: fix the affinity setting in setup_irq  (Thomas Gleixner)
The affinity setting in setup_irq() is called before the NO_BALANCING flag is checked, and might therefore override affinity settings from the calling code with the default setting. Move the NO_BALANCING flag check before the call to the affinity setting. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
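An illustrative sketch of the ordering after the move (the flag names are the real genirq ones; the surrounding setup_irq() code is elided and this is not the literal hunk):

  /* exclude the IRQ from balancing if the caller asked for it... */
  if (new->flags & IRQF_NOBALANCING)
      desc->status |= IRQ_NO_BALANCING;

  /* ...so the default-affinity logic can now see the flag and leave
   * a caller-provided affinity alone */
  irq_select_affinity(irq);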
2008-11-09  genirq: keep affinities set from userspace across free/request_irq()  (Thomas Gleixner)
Impact: preserve user-modified affinities on interrupts. Kumar Gala noticed that commit 18404756765c713a0be4eb1082920c04822ce588 (genirq: Expose default irq affinity mask (take 3)) overrides an already set affinity across a free/request_irq(). This happens e.g. with ifdown/ifup of a network device. Change the logic to mark the affinities as set and keep them intact. This also fixes the unlocked access to irq_desc in irq_select_affinity() when called from irq_affinity_proc_write(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-09  Merge branch 'cpus4096' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'cpus4096' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpumask: introduce new API, without changing anything, v3
  cpumask: new API, v2
  cpumask: introduce new API, without changing anything
2008-11-08  ftrace: display start of CPU buffer in trace output  (Steven Rostedt)
Impact: change in trace output. Because the trace buffers are per-cpu ring buffers, the start of the trace can be confusing. If one CPU is very active at the end of the trace, its history will not go as far back as the other CPU traces. This means that output for a particular CPU may not appear for the first part of a trace. To help annotate what is happening, and to prevent any more confusion, this patch adds a line that annotates the start of a CPU buffer output. For example:

  automount-3495  [001]   184.596443: dnotify_parent <-vfs_write
  [...]
  automount-3495  [001]   184.596449: dput <-path_put
  automount-3496  [002]   184.596450: down_read_trylock <-do_page_fault
  [...]
       sshd-3497  [001]   184.597069: up_read <-do_page_fault
     <idle>-0     [000]   184.597074: __exit_idle <-exit_idle
  [...]
  automount-3496  [002]   184.597257: filemap_fault <-__do_fault
     <idle>-0     [003]   184.597261: exit_idle <-smp_apic_timer_interrupt

Note, parsers of trace output should always ignore any lines that start with a '#'. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08  ftrace: force pass of preemptoff selftest  (Steven Rostedt)
Impact: preemptoff not tested in selftest. Due to the BKL no longer being preemptable, the selftest of the preemptoff code cannot be run. It requires being called with preemption enabled, but since the BKL is held, that is no longer the case. This patch simply skips those tests if it detects that the context is not preemptable. The following will now show up in the tests:

  Testing tracer preemptoff: can not test ... force PASSED
  Testing tracer preemptirqsoff: can not test ... force PASSED

When the BKL is removed, or becomes preemptable once again, the tests will be performed. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
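The skip logic plausibly reduces to something like this (a sketch, not the literal trace_selftest.c hunk):

  if (preempt_count()) {
      /* BKL is held: we cannot run with preemption enabled, so don't
       * report a spurious failure -- force a pass instead */
      printk(KERN_CONT "can not test ... force ");
      return 0;
  }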
2008-11-08  ftrace: remove trace array ctrl  (Steven Rostedt)
Impact: remove obsolete variable in trace_array structure. With the new start/stop method of ftrace, the ctrl variable in the trace_array structure is now obsolete. Remove it. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08  ftrace: remove ctrl_update method  (Steven Rostedt)
Impact: remove the ctrl_update tracer method. With the new quick start/stop method of tracing, the ctrl_update method is out of date. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08  ftrace: enable trace_printk by default  (Steven Rostedt)
Impact: have ftrace_printk enabled on startup. It is confusing to have to "echo trace_printk > /debug/tracing/iter_ctrl" after adding an ftrace_printk to the kernel. Currently trace_printk is off by default. ftrace_printk should only appear in kernel code while it is being used for debugging, so it should be enabled by default. It may also be used to record data within a tracer, but those ftrace_printks should be inside wrappers that are either enabled by tracepoints or guarded by a variable that keeps the code path from being entered when the tracer is disabled. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08  ftrace: irqsoff tracer incorrect reset  (Steven Rostedt)
Impact: fix to irqsoff tracer output. In converting to the new start/stop ftrace handling, the irqsoff tracer's start method called the irqsoff reset function. The irqsoff tracer is not the same as the other tracers: it resets the buffers while searching for the longest latency, and that reset also disables function tracing. This means that, by starting the tracer, the function tracer was disabled incorrectly. This patch simply removes the call to reset, which keeps function tracing enabled; the reset is not needed there. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08  ftrace: fix sched_switch API  (Steven Rostedt)
Impact: fix for sched_switch that broke dynamic ftrace startup. The commit "tracing/fastboot: use sched switch tracer from boot tracer" broke the API of the sched_switch trace. The tracing_start/stop_cmdline_record functions are for recording only the cmdline, NOT for recording the schedule switches themselves. Seeing that the boot tracer broke the API to do something it wanted, this patch adds a new interface for that purpose while putting back the original behaviour of the old API. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08  ftrace: fix boot trace sched startup  (Steven Rostedt)
Impact: boot tracer startup modified. The boot tracer calls into some of the schedule tracing private functions that should not be exported. This patch cleans that up and makes way for further changes in the ftrace infrastructure. It adds an API to assign a trace array to the schedule context switch tracer. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08  ftrace: fix set_ftrace_filter  (Steven Rostedt)
Impact: fix output of set_ftrace_filter. Commit "ftrace: do not show freed records in available_filter_functions" removed a bit too much from the set_ftrace_filter code: we now see all functions in the set_ftrace_filter file even when a filter is set. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08  Merge branches 'tracing/ftrace', 'tracing/fastboot', 'tracing/nmisafe' and 'tracing/urgent' into tracing/core  (Ingo Molnar)
2008-11-07  sched: clean up SCHED_CPUMASK_ALLOC  (Li Zefan)
Impact: cleanup. The #if/#endif is ugly. Change SCHED_CPUMASK_ALLOC and SCHED_CPUMASK_FREE to static inline functions. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-07  Merge branch 'sched/urgent' into sched/core  (Ingo Molnar)
2008-11-07  sched: fix memory leak in a failure path  (Li Zefan)
Impact: fix rare memory leak in the sched-domains manual reconfiguration code. In the failure path, rd is not attached to a sched domain, so it causes a leak. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-07  sched: fix a bug in sched domain degenerate  (Li Zefan)
Impact: re-add incorrectly eliminated sched domain layers.

(1) on i386 with SCHED_SMT and SCHED_MC enabled

  # mount -t cgroup -o cpuset xxx /mnt
  # echo 0 > /mnt/cpuset.sched_load_balance
  # mkdir /mnt/0
  # echo 0 > /mnt/0/cpuset.cpus
  # dmesg
  CPU0 attaching sched-domain:
   domain 0: span 0 level CPU
    groups: 0

(2) on i386 with SCHED_MC enabled but SCHED_SMT disabled

  # same with (1)
  # dmesg
  CPU0 attaching NULL sched-domain.

The bug is that some sched domains may be skipped unintentionally when degenerating (optimizing) sched domains. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-07  Merge branch 'master' of git://oss.sgi.com:8090/xfs/linux-2.6  (Niv Sardi)
2008-11-06  Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block  (Linus Torvalds)
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Block: use round_jiffies_up()
  Add round_jiffies_up and related routines
  block: fix __blkdev_get() for removable devices
  generic-ipi: fix the smp_mb() placement
  blk: move blk_delete_timer call in end_that_request_last
  block: add timer on blkdev_dequeue_request() not elv_next_request()
  bio: define __BIOVEC_PHYS_MERGEABLE
  block: remove unused ll_new_mergeable()
2008-11-06  Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds)
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: re-tune balancing
  sched: fix buddies for group scheduling
  sched: backward looking buddy
  sched: fix fair preempt check
  sched: cleanup fair task selection
2008-11-06  cgroups: fix invalid cgrp->dentry before cgroup has been completely removed  (Li Zefan)
This fixes an oops when reading /proc/sched_debug. A cgroup won't be removed completely until cgroup_diput() finishes, so we shouldn't invalidate cgrp->dentry in cgroup_rmdir(). Otherwise, if a group is being removed while cgroup_path() gets called, we may trigger a NULL-dereference BUG. The bug can be reproduced:

  # cat test.sh
  #!/bin/sh
  mount -t cgroup -o cpu xxx /mnt
  for (( ; ; ))
  {
      mkdir /mnt/sub
      rmdir /mnt/sub
  }
  # ./test.sh &
  # cat /proc/sched_debug

  BUG: unable to handle kernel NULL pointer dereference at 00000038
  IP: [<c045a47f>] cgroup_path+0x39/0x90
  ...
  Call Trace:
   [<c0420344>] ? print_cfs_rq+0x6e/0x75d
   [<c0421160>] ? sched_debug_show+0x72d/0xc1e
  ...

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> [2.6.26.x, 2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06  sched, lockdep: inline double_unlock_balance()  (Sripathi Kodi)
We have a test case which measures the variation in the amount of time needed to perform a fixed amount of work on the preempt_rt kernel. We started seeing deterioration in its performance recently. The test should never take more than 10 microseconds, but we started seeing a 5-10% failure rate. Using the elimination method, we traced the problem to commit 1b12bbc747560ea68bcc132c3d05699e52271da0 (lockdep: re-annotate scheduler runqueues). When LOCKDEP is disabled, this patch only adds an additional function call to double_unlock_balance(). Hence I inlined double_unlock_balance() and the problem went away. Here is a patch to make this change. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06  cpumask: introduce new API, without changing anything  (Rusty Russell)
Impact: introduce new APIs. We want to deprecate cpumasks on the stack, as we are headed for ginormous numbers of CPUs. Eventually, we want to head towards an undefined 'struct cpumask' so they can never be declared on the stack.

  1) New cpumask functions which take pointers instead of copies (cpus_* -> cpumask_*).
  2) Several new helpers to reduce the need for temporary cpumasks (cpumask_first_and, cpumask_next_and, cpumask_any_and).
  3) Helpers for declaring cpumasks on or off the stack for large NR_CPUS (cpumask_var_t, alloc_cpumask_var and free_cpumask_var).
  4) 'struct cpumask' for explicitness and to mark new-style code.
  5) Make iterator functions stop at nr_cpu_ids (a runtime constant), not NR_CPUS, for time efficiency and for smaller dynamic allocations in future.
  6) cpumask_copy() so we can allocate less than a full cpumask eventually (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask' definition eventually.
  7) work_on_cpu() helper for doing a task on a CPU, rather than saving the old cpumask for the current thread and manipulating it.
  8) smp_call_function_many(), which is smp_call_function_mask() except taking a cpumask pointer.

Note that this patch simply introduces the new functions and leaves the obsolescent ones in place, to simplify the transition patches. A usage sketch of the new helpers follows. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
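A hedged usage sketch of the helpers listed above (the surrounding function and its arguments are hypothetical):

  #include <linux/cpumask.h>
  #include <linux/slab.h>

  static int first_common_cpu(const struct cpumask *a, const struct cpumask *b)
  {
      cpumask_var_t tmp;
      int cpu;

      /* off-stack allocation when NR_CPUS is huge, a no-op otherwise */
      if (!alloc_cpumask_var(&tmp, GFP_KERNEL))
          return -ENOMEM;

      cpumask_copy(tmp, a);            /* pointer-based, no struct copy */
      cpu = cpumask_first_and(tmp, b); /* no hand-rolled temporary AND loop */

      free_cpumask_var(tmp);
      return cpu;                      /* >= nr_cpu_ids means "none" */
  }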
2008-11-06  Add round_jiffies_up and related routines  (Alan Stern)
This patch (as1158b) adds round_jiffies_up() and friends. These routines work like the analogous round_jiffies() functions, except that they will never round down. The new routines will be useful for timeouts where we don't care exactly when the timer expires, provided it doesn't expire too soon. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
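A short usage sketch under assumed names (q and its fields are hypothetical): arm a timeout that may fire late but must not fire early.

  /* never rounds down, so the timeout cannot expire too soon */
  mod_timer(&q->timeout, round_jiffies_up(jiffies + q->rq_timeout));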
2008-11-06  generic-ipi: fix the smp_mb() placement  (Suresh Siddha)
smp_mb() is needed (to make the memory operations visible globally) before sending the IPI on the sender, and the receiver (on Alpha at least) needs smp_read_barrier_depends() in the handler before reading the call_single_queue list in a lock-free fashion. On x86, x2apic mode register accesses for sending IPIs don't have serializing semantics, so the need for smp_mb() before sending the IPI becomes more critical in x2apic mode. Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that smp_mb() doesn't mean anything on the sender when the IPI receiver is not doing anything special (like a memory fence) after clearing CSD_FLAG_WAIT. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
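A sketch of the placement being described (simplified; the field and function names follow kernel/smp.c of that era, but the hunk is illustrative, not a quote of the patch):

  /* sender: publish the payload before the IPI */
  data->func = func;
  data->info = info;
  smp_mb();  /* make the stores globally visible before the send */
  arch_send_call_function_single_ipi(cpu);

  /* receiver: Alpha needs this before walking call_single_queue
   * lock-free */
  smp_read_barrier_depends();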
2008-11-06  ring-buffer: convert to raw spinlocks  (Steven Rostedt)
Impact: no lockdep debugging of ring buffer. The problem with running lockdep on the ring buffer is that the ring buffer is the core infrastructure of ftrace. What happens is that the tracer will start tracing the lockdep code while lockdep is testing the ring buffer's locks. This can cause lockdep to fail due to testing cases that have not fully finished their locking transition. This patch converts the spin locks used by the ring buffer back into raw spin locks, which lockdep does not check. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
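The conversion pattern, sketched with 2008-era primitives (the lock and function names are hypothetical; raw locks don't disable interrupts themselves, hence the explicit irq save):

  static raw_spinlock_t buffer_lock = (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;

  static void buffer_op(void)
  {
      unsigned long flags;

      local_irq_save(flags);
      __raw_spin_lock(&buffer_lock);   /* no lockdep hooks fire here */
      /* ... touch the ring buffer ... */
      __raw_spin_unlock(&buffer_lock);
      local_irq_restore(flags);
  }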
2008-11-06  ftrace: restructure tracing start/stop infrastructure  (Steven Rostedt)
Impact: change where tracing is started up and stopped. Currently, when a new tracer is selected by echo'ing a tracer name into the current_tracer file, the startup is only done if tracing_enabled is set to one. If tracing_enabled is changed to zero (by echo'ing 0 into the tracing_enabled file), a full shutdown is performed. The full startup and shutdown of a tracer can be expensive, and the user can lose traces when echo'ing 0 into the tracing_enabled file because the process takes too long. There may also be places where the user would like to start and stop the tracer several times, and the full startup and shutdown might be too expensive. This patch performs the full startup and shutdown when a tracer is selected. It also adds a way to do a quick start or stop of a tracer. The quick version is just a flag that prevents the tracing from taking place, but the overhead of the code is still there. For example, the startup of a tracer may enable tracepoints or enable the function tracer; the quick stop and start just set a flag to have the tracer ignore the calls when the tracepoint or function trace is invoked. The overhead of the tracer may still be present when the tracer is stopped, but no tracing will occur. Setting the tracer to the 'nop' tracer (or any other tracer) will perform the shutdown of the tracer, which will disable the tracepoint or disable the function tracer. The tracing_enabled file will simply start or stop tracing. This change is all internal; the end result for the user should be the same as before. If tracing_enabled is not set, no trace will happen; if it is set, the trace will happen. The tracing_enabled variable is static between tracers: enabling it and going to another tracer keeps it enabled, and the same is true for disabling it. This patch now provides a fast start/stop method for users to enable or disable tracing. Note: there were two methods in struct tracer that were never used, start and stop. These were to be used as hooks for the reading of the trace output, but ended up not being necessary. These two methods are now used to enable the start and stop of each tracer, in case the tracer needs to do more than just not write into the buffer. For example, the irqsoff tracer must stop recording max latencies when tracing is stopped. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
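A sketch of how the previously unused hooks get repurposed (illustrative shape only; the handler names are assumptions):

  static struct tracer irqsoff_tracer __read_mostly = {
      .name  = "irqsoff",
      .init  = irqsoff_tracer_init,   /* full, expensive setup */
      .reset = irqsoff_tracer_reset,  /* full teardown on tracer switch */
      .start = irqsoff_tracer_start,  /* cheap: resume recording */
      .stop  = irqsoff_tracer_stop,   /* cheap: e.g. stop max-latency updates */
  };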
2008-11-06  ftrace: soft tracing stop and start  (Steven Rostedt)
Impact: add a way to quickly start and stop tracing from the kernel. This patch adds a soft stop and start to the trace. It simply disables function tracing via the ftrace_disabled flag and disables the trace buffers to prevent recording. The tracing code may still be executed, but the trace will not be recorded. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06  ftrace: add quick function trace stop  (Steven Rostedt)
Impact: quick start and stop of function tracer. This patch adds a way to disable the function tracer quickly, without the need to run kstop_machine. It adds a new variable called function_trace_stop which, when set, stops the calls to functions from mcount. This is just an on/off switch and does not handle recursion like preempt_disable(). Its main purpose is to help other tracers/debuggers start and stop tracing functions without the need to call kstop_machine. The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs that implement the test of function_trace_stop in the arch-dependent mcount code; otherwise, the test is done in the C code. x86 is the only arch that supports this at the moment. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
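For archs without HAVE_FUNCTION_TRACE_MCOUNT_TEST, the C-side test plausibly looks like this (a sketch; the wrapper name is an assumption based on the description):

  static void ftrace_test_stop_func(unsigned long ip, unsigned long parent_ip)
  {
      if (unlikely(function_trace_stop))
          return;  /* soft-off: no kstop_machine required */

      __ftrace_trace_function(ip, parent_ip);
  }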
2008-11-06  Merge branch 'tracing/fastboot' into tracing/ftrace  (Ingo Molnar)
2008-11-06  file capabilities: add no_file_caps switch (v4)  (Serge E. Hallyn)
Add a no_file_caps boot option when file capabilities are compiled into the kernel (CONFIG_SECURITY_FILE_CAPABILITIES=y). This allows distributions to ship a kernel with file capabilities compiled in, without forcing users to use (and understand and trust) them. When no_file_caps is specified at boot, then when a process executes a file, any file capabilities stored with that file will not be used in the calculation of the process' new capability sets. This means that booting with the no_file_caps boot option will not be the same as booting a kernel with file capabilities compiled out - in particular, a task with CAP_SETPCAP will not have any chance of passing capabilities to another task (which isn't "really" possible anyway, and which may soon be killed altogether by David Howells in any case), and it will instead be able to put new capabilities in its pI. However, since fI will always be empty and pI is masked with fI, it gains the task nothing. We also support the extra prctl options, setting securebits and dropping capabilities from the per-process bounding set. The other remaining difference is that killpriv, task_setscheduler, setioprio, and setnice will continue to be hooked. That will be noticeable in the case where a root task changed its uid while keeping some caps, and another task owned by the new uid tries to change settings for the more privileged task. Changelog: Nov 05 2008: (v4) trivial port on top of always-start-with-clear-caps patch. Sep 23 2008: nixed file_caps_enabled when file caps are not compiled in, as it isn't used; document no_file_caps in kernel-parameters.txt. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-05  sched: fix buddies for group scheduling  (Peter Zijlstra)
Impact: scheduling order fix for group scheduling. For each level in the hierarchy, set the buddy to point to the right entity. Therefore, when we do the hierarchical schedule, we have a fair chance of ending up where we meant to. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05  sched: backward looking buddy  (Peter Zijlstra)
Impact: improve/change/fix wakeup-buddy scheduling. Currently we only have a forward-looking buddy: we prefer to schedule the task we last woke up, under the presumption that it's going to consume the data we just produced and will therefore have cache-hot benefits. This allows co-waking producer/consumer task pairs to run ahead of the pack for a little while, keeping their cache warm. Without this, we would interleave all pairs, utterly thrashing the cache. This patch introduces a backward-looking buddy: suppose that in the above scenario the consumer preempts the producer before it can go to sleep. We then miss the wakeup from consumer to producer (it's already running, after all), breaking the cycle and reverting to the cache-thrashing interleaved schedule pattern. The backward buddy will try to schedule back to the task that woke us up in case the forward buddy is not available, under the assumption that the last task is the one with the most cache-hot data around, barring current. This basically allows a task to continue after it got preempted. In order to avoid starvation, we allow either buddy to get wakeup_gran ahead of the pack. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
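A hedged sketch of the resulting pick order (close in spirit to pick_next_entity() of that era, but not a verbatim quote):

  static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
  {
      struct sched_entity *se = __pick_next_entity(cfs_rq);

      /* forward buddy: the task we last woke up */
      if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, se) < 1)
          return cfs_rq->next;

      /* backward buddy: the task that woke us up */
      if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, se) < 1)
          return cfs_rq->last;

      return se;  /* neither buddy is within wakeup_gran of the pack */
  }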
2008-11-05  sched: fix fair preempt check  (Peter Zijlstra)
Impact: fix cross-class preemption. Inter-class wakeup preemptions should go in class order. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05  sched: cleanup fair task selection  (Peter Zijlstra)
Impact: cleanup. Clean up task selection. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05  ftrace: fix breakage in bin_fmt results  (Eric Anholt)
In 777e208d40d0953efc6fb4ab58590da3f7d8f02d we changed from outputting field->cpu (a char) to iter->cpu (an unsigned int), increasing the resulting structure size by 3 bytes. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04  tracing/ftrace: fix a bug when switching current tracer to sched tracer  (Frederic Weisbecker)
Impact: fix boot tracer + sched tracer coupling bug. Fix a bug that made the sched_switch tracer unable to run if set as the current_tracer after the boot tracer. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04  tracing/ftrace: types and naming corrections for sched tracer  (Frederic Weisbecker)
Impact: cleanup. This patch applies some corrections suggested by Steven Rostedt. Change the type of sched_ref to int: since it is now protected by a mutex in the sched_switch tracer, we no longer need it to be an atomic variable. Also rename the register mutex. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04  tracing/fastboot: use sched switch tracer from boot tracer  (Frederic Weisbecker)
Impact: enhance boot trace output with scheduling events. Use the sched_switch tracer from the boot tracer. We can also trace schedule events inside the initcalls. Sched tracing is disabled after an initcall has finished and then re-enabled before the next one is started. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04  tracing/ftrace: remove unused code in sched_switch tracer  (Frederic Weisbecker)
Impact: cleanup. When init_sched_switch_trace() is called, it has no reason to start the sched tracer if sched_ref is not zero.

  - If it is non-zero, the tracer is already in use, but we can still register it with the tracing engine. There is already a safeguard that prevents the tracer probes from being registered twice.
  - If it is zero, this block will never be used.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04  tracing/ftrace: fix a race condition in sched_switch tracer  (Frederic Weisbecker)
Impact: fix race condition in sched_switch tracer. This patch fixes a race condition in the sched_switch tracer. If several tasks (e.g. concurrent initcalls) are playing with tracing_start_cmdline_record() and tracing_stop_cmdline_record(), the following situation could happen:

  - Task A and task B use the same tracepoint probe. Task A holds it; task B is sleeping and doesn't hold it.
  - Task A frees the sched tracer; sched_ref is decremented to 0.
  - Task A is preempted before it has unregistered its tracepoint probe; then B runs.
  - B increments sched_ref, sees it is 1, and guesses it has to register its probe. But the probe has not yet been unregistered by task A.
  - A lot of bad things can happen after that...

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
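The fix amounts to serializing the refcount and the (un)registration under one mutex; a sketch consistent with the description (the probe-registration helpers are stand-ins):

  static DEFINE_MUTEX(sched_register_mutex);
  static int sched_ref;

  static void tracing_start_sched_switch(void)
  {
      mutex_lock(&sched_register_mutex);
      if (!sched_ref++)
          tracing_sched_register();   /* first user registers the probes */
      mutex_unlock(&sched_register_mutex);
  }

  static void tracing_stop_sched_switch(void)
  {
      mutex_lock(&sched_register_mutex);
      if (!--sched_ref)
          tracing_sched_unregister(); /* last user unregisters them */
      mutex_unlock(&sched_register_mutex);
  }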
2008-11-04  tracing/fastboot: Enable boot tracing only during initcalls  (Frederic Weisbecker)
Impact: modify boot tracer. We used to disable initcall tracing at a specified time (i.e. the end of builtin initcalls). We don't need that anymore: it will be stopped when initcalls are finished. However, we want two things: start this tracing only after pre-smp initcalls are finished, and, since we are planning to trace sched_switches at the same time, enable them only during initcall execution. For this purpose, this patch introduces two functions to enable and disable sched_switch tracing during boot. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
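The intended call pattern around each initcall, sketched with assumed helper names (the description says only that two enable/disable functions are introduced):

  enable_boot_trace();   /* also turns sched_switch recording on */
  ret = do_one_initcall(fn);
  disable_boot_trace();  /* off again between initcalls */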
2008-11-04  ftrace: sysctl typo  (Peter Zijlstra)
Impact: fix sysctl name typo. Steve must have needed more coffee ;-) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>