summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2012-02-21rcu: Streamline code produced by __rcu_read_unlock()Paul E. McKenney
This is a port of commit #be0e1e21 to TINY_PREEMPT_RCU. This uses noinline to prevent rcu_read_unlock_special() from being inlined into __rcu_read_unlock(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Protect __rcu_read_unlock() against scheduler-using irq handlersPaul E. McKenney
This commit ports commit #10f39bb1b2 (rcu: protect __rcu_read_unlock() against scheduler-using irq handlers) from TREE_PREEMPT_RCU to TINY_PREEMPT_RCU. The following is a corresponding port of that commit message. The addition of RCU read-side critical sections within runqueue and priority-inheritance critical sections introduced some deadlocks, for example, involving interrupts from __rcu_read_unlock() where the interrupt handlers call wake_up(). This situation can cause the instance of __rcu_read_unlock() invoked from interrupt to do some of the processing that would otherwise have been carried out by the task-level instance of __rcu_read_unlock(). When the interrupt-level instance of __rcu_read_unlock() is called with a scheduler lock held from interrupt-entry/exit situations where in_irq() returns false, deadlock can result. Of course, in a UP kernel, there are not really any deadlocks, but the upper-level critical section can still be be fatally confused by the lower-level critical section changing things out from under it. This commit resolves these deadlocks by using negative values of the per-task ->rcu_read_lock_nesting counter to indicate that an instance of __rcu_read_unlock() is in flight, which in turn prevents instances from interrupt handlers from doing any special processing. Note that nested rcu_read_lock()/rcu_read_unlock() pairs are still permitted, but they will never see ->rcu_read_lock_nesting go to zero, and will therefore never invoke rcu_read_unlock_special(), thus preventing them from seeing the RCU_READ_UNLOCK_BLOCKED bit should it be set in ->rcu_read_unlock_special. This patch also adds a check for ->rcu_read_unlock_special being negative in rcu_check_callbacks(), thus preventing the RCU_READ_UNLOCK_NEED_QS bit from being set should a scheduling-clock interrupt occur while __rcu_read_unlock() is exiting from an outermost RCU read-side critical section. Of course, __rcu_read_unlock() can be preempted during the time that ->rcu_read_lock_nesting is negative. This could result in the setting of the RCU_READ_UNLOCK_BLOCKED bit after __rcu_read_unlock() checks it, and would also result it this task being queued on the corresponding rcu_node structure's blkd_tasks list. Therefore, some later RCU read-side critical section would enter rcu_read_unlock_special() to clean up -- which could result in deadlock (OK, OK, fatal confusion) if that RCU read-side critical section happened to be in the scheduler where the runqueue or priority-inheritance locks were held. To prevent the possibility of fatal confusion that might result from preemption during the time that ->rcu_read_lock_nesting is negative, this commit also makes rcu_preempt_note_context_switch() check for negative ->rcu_read_lock_nesting, thus refraining from queuing the task (and from setting RCU_READ_UNLOCK_BLOCKED) if we are already exiting from the outermost RCU read-side critical section (in other words, we really are no longer actually in that RCU read-side critical section). In addition, rcu_preempt_note_context_switch() invokes rcu_read_unlock_special() to carry out the cleanup in this case, which clears out the ->rcu_read_unlock_special bits and dequeues the task (if necessary), in turn avoiding needless delay of the current RCU grace period and needless RCU priority boosting. It is still illegal to call rcu_read_unlock() while holding a scheduler lock if the prior RCU read-side critical section has ever had both preemption and irqs enabled. However, the common use case is legal, namely where then entire RCU read-side critical section executes with irqs disabled, for example, when the scheduler lock is held across the entire lifetime of the RCU read-side critical section. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Remove single-rcu_node optimization in rcu_start_gp()Paul E. McKenney
The grace-period initialization sequence in rcu_start_gp() has a special case for systems where the rcu_node tree is a single rcu_node structure. This made sense some years ago when systems were smaller and up to 64 CPUs could share a single rcu_node structure, but now that large systems are common and a given leaf rcu_node structure can support only 16 CPUs (due to lock contention on the rcu_node's ->lock field), this optimization is almost never taken. And even the small mobile platforms that might make use of it might rather have the kernel text reduction. Therefore, this commit removes the check for single-rcu_node trees. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-02-21rcu: Don't make callbacks go through second full grace periodPaul E. McKenney
RCU's current CPU-offline code path dumps all of the outgoing CPU's callbacks onto the RCU_NEXT_TAIL portion of the surviving CPU's callback list. This means that all the ready-to-invoke callbacks from the outgoing CPU must wait for another full RCU grace period. This was just fine when CPU-hotplug events were rare, but there is increasing evidence that users are planning to make increasing use of CPU hotplug. Therefore, this commit changes the callback-dumping procedure so that callbacks that are ready to invoke are moved to the RCU_DONE_TAIL portion of the surviving CPU's callback list. This avoids running these callbacks through a second unnecessary grace period. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Check for callback invocation from offline CPUsPaul E. McKenney
Because quiescent states are now reported from offline CPUs in CPU_DYING state, there is some possibility that such a CPU might note the end of a grace period and attempt to start invoking callbacks. This would be a very bad thing, and is supposed to be prevented by the fact that the CPU_DYING CPU gets rid of all its callbacks before reporting the quiescent state. However, there is other CPU-offline code in the kernel, and it is quite possible that someone will invoke RCU core processing from that code. Therefore, this commit adds a warning for this case. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Limit lazy-callback durationPaul E. McKenney
Currently, a given CPU is permitted to remain in dyntick-idle mode indefinitely if it has only lazy RCU callbacks queued. This is vulnerable to corner cases in NUMA systems, so limit the time to six seconds by default. (Currently controlled by a cpp macro.) Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Make rcutorture flag online/offline failuresPaul E. McKenney
Make rcutorture check for CPU-hotplug failures and complain if there were any. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Simplify offline processingPaul E. McKenney
Move ->qsmaskinit and blkd_tasks[] manipulation to the CPU_DYING notifier. This simplifies the code by eliminating a potential deadlock and by reducing the responsibilities of force_quiescent_state(). Also rename functions to make their connection to the CPU-hotplug stages explicit. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Avoid waking up CPUs having only kfree_rcu() callbacksPaul E. McKenney
When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to enter dyntick-idle mode even if it still has RCU callbacks queued. RCU avoids system hangs in this case by scheduling a timer for several jiffies in the future. However, if all of the callbacks on that CPU are from kfree_rcu(), there is no reason to wake the CPU up, as it is not a problem to defer freeing of memory. This commit therefore tracks the number of callbacks on a given CPU that are from kfree_rcu(), and avoids scheduling the timer if all of a given CPU's callbacks are from kfree_rcu(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Add diagnostic for misaligned rcu_head structuresPaul E. McKenney
The push for energy efficiency will require that RCU tag rcu_head structures to indicate whether or not their invocation is time critical. This tagging is best carried out in the bottom bits of the ->next pointers in the rcu_head structures. This tagging requires that the rcu_head structures be properly aligned, so this commit adds the required diagnostics. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Add lockdep-RCU checks for simple self-deadlockPaul E. McKenney
It is illegal to have a grace period within a same-flavor RCU read-side critical section, so this commit adds lockdep-RCU checks to splat when such abuse is encountered. This commit does not detect more elaborate RCU deadlock situations. These situations might be a job for lockdep enhancements. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21ftrace, perf: Add filter support for function trace eventJiri Olsa
Adding support to filter function trace event via perf interface. It is now possible to use filter interface in the perf tool like: perf record -e ftrace:function --filter="(ip == mm_*)" ls The filter syntax is restricted to the the 'ip' field only, and following operators are accepted '==' '!=' '||', ending up with the filter strings like: ip == f1[, ]f2 ... || ip != f3[, ]f4 ... with comma ',' or space ' ' as a function separator. If the space ' ' is used as a separator, the right side of the assignment needs to be enclosed in double quotes '"', e.g.: perf record -e ftrace:function --filter '(ip == do_execve,sys_*,ext*)' ls perf record -e ftrace:function --filter '(ip == "do_execve,sys_*,ext*")' ls perf record -e ftrace:function --filter '(ip == "do_execve sys_* ext*")' ls The '==' operator adds trace filter with same effect as would be added via set_ftrace_filter file. The '!=' operator adds trace filter with same effect as would be added via set_ftrace_notrace file. The right side of the '!=', '==' operators is list of functions or regexp. to be added to filter separated by space. The '||' operator is used for connecting multiple filter definitions together. It is possible to have more than one '==' and '!=' operators within one filter string. Link: http://lkml.kernel.org/r/1329317514-8131-8-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21ftrace: Allow to specify filter field type for ftrace eventsJiri Olsa
Adding FILTER_TRACE_FN event field type for function tracepoint event, so it can be properly recognized within filtering code. Currently all fields of ftrace subsystem events share the common field type FILTER_OTHER. Since the function trace fields need special care within the filtering code we need to recognize it properly, hence adding the FILTER_TRACE_FN event type. Adding filter parameter to the FTRACE_ENTRY macro, to specify the filter field type for the event. Link: http://lkml.kernel.org/r/1329317514-8131-7-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21ftrace, perf: Add support to use function tracepoint in perfJiri Olsa
Adding perf registration support for the ftrace function event, so it is now possible to register it via perf interface. The perf_event struct statically contains ftrace_ops as a handle for function tracer. The function tracer is registered/unregistered in open/close actions. To be efficient, we enable/disable ftrace_ops each time the traced process is scheduled in/out (via TRACE_REG_PERF_(ADD|DELL) handlers). This way tracing is enabled only when the process is running. Intentionally using this way instead of the event's hw state PERF_HES_STOPPED, which would not disable the ftrace_ops. It is now possible to use function trace within perf commands like: perf record -e ftrace:function ls perf stat -e ftrace:function ls Allowed only for root. Link: http://lkml.kernel.org/r/1329317514-8131-6-git-send-email-jolsa@redhat.com Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21ftrace: Add FTRACE_ENTRY_REG macro to allow event registrationJiri Olsa
Adding FTRACE_ENTRY_REG macro so particular ftrace entries could specify registration function and thus become accesible via perf. This will be used in upcomming patch for function trace. Link: http://lkml.kernel.org/r/1329317514-8131-5-git-send-email-jolsa@redhat.com Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21ftrace, perf: Add add/del tracepoint perf registration actionsJiri Olsa
Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle perf event schedule in/out actions. The add action is invoked for when the perf event is scheduled in, while the del action is invoked when the event is scheduled out. Link: http://lkml.kernel.org/r/1329317514-8131-4-git-send-email-jolsa@redhat.com Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21ftrace, perf: Add open/close tracepoint perf registration actionsJiri Olsa
Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate register/unregister from open/close actions. The register/unregister actions are invoked for the first/last tracepoint user when opening/closing the event. The open/close actions are invoked for each tracepoint user when opening/closing the event. Link: http://lkml.kernel.org/r/1329317514-8131-3-git-send-email-jolsa@redhat.com Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21ftrace: Add enable/disable ftrace_ops control interfaceJiri Olsa
Adding a way to temporarily enable/disable ftrace_ops. The change follows the same way as 'global' ftrace_ops are done. Introducing 2 global ftrace_ops - control_ops and ftrace_control_list which take over all ftrace_ops registered with FTRACE_OPS_FL_CONTROL flag. In addition new per cpu flag called 'disabled' is also added to ftrace_ops to provide the control information for each cpu. When ftrace_ops with FTRACE_OPS_FL_CONTROL is registered, it is set as disabled for all cpus. The ftrace_control_list contains all the registered 'control' ftrace_ops. The control_ops provides function which iterates ftrace_control_list and does the check for 'disabled' flag on current cpu. Adding 3 inline functions: ftrace_function_local_disable/ftrace_function_local_enable - enable/disable the ftrace_ops on current cpu ftrace_function_local_disabled - get disabled ftrace_ops::disabled value for current cpu Link: http://lkml.kernel.org/r/1329317514-8131-2-git-send-email-jolsa@redhat.com Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21tracing: Don't use p->len field to determine output in __print_*() functionsSteven Rostedt
If more than one __print_*() function is used in a tracepoint (__print_flags(), __print_symbols(), etc), then the temp seq buffer will not be zero on entry. Using the temp seq buffer's length to know if data has been printed or not in the current function is incorrect and may produce incorrect results. Currently, no in-tree tracepoint causes this bug, but new ones may be created. Cc: Andrew Vagin <avagin@openvz.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-20tracing: Don't print an extra separator of flagsAndrey Vagin
If __print_flags() is used after another __print_*() function, the temp seq_file buffer will not be empty on entry, and the delimiter will be printed even though there's just one field. We get something like: |S instead of just: S This is because the length of the temp seq buffer is used to determine if the delimiter is printed or not. But this algorithm fails when the seq buffer is not empty on entry, and the delimiter will be printed because it thinks that a previous field was already printed. Link: http://lkml.kernel.org/r/1329650167-480655-1-git-send-email-avagin@openvz.org Signed-off-by: Andrew Vagin <avagin@openvz.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Assorted fixes, sat in -next for a week or so... * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ocfs2: deal with wraparounds of i_nlink in ocfs2_rename() vfs: fix compat_sys_stat() handling of overflows in st_nlink quota: Fix deadlock with suspend and quotas vfs: Provide function to get superblock and wait for it to thaw vfs: fix panic in __d_lookup() with high dentry hashtable counts autofs4 - fix lockdep splat in autofs vfs: fix d_inode_lookup() dentry ref leak
2012-02-17Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
2012-02-15Merge branch 'fortglx/3.4/time' of git://git.linaro.org/people/jstultz/linux ↵Thomas Gleixner
into timers/core
2012-02-15timer: Fix bad idle check on irq entryFrederic Weisbecker
idle_cpu() is called on irq entry to guess if we need to call tick_check_idle(). This way we can catch up with jiffies if the tick was stopped, stop accounting idle time during the interrupt and maintain the sched clock if it is unstable. But if we are going to exit the idle loop to schedule a new task (ie: if we have a task in the runqueue or a remotely enqueued ttwu to perform), the idle_cpu() check will return 0 such that we miss the call to tick_check_idle() for all interrupts happening before we schedule the new task. As a result these interrupts and the softirqs coming along may deal with stale jiffies values, bad sched clock values, and won't substract their time from the idle time accounting. Fix this with using is_idle_task() instead that strictly checks that we are running the idle task, without caring about the fact we are going to schedule a task soon. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@elte.hu> Link: http://lkml.kernel.org/r/1327427984-23282-3-git-send-email-fweisbec@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15nohz: Remove ts->Einidle checks before restarting the tickFrederic Weisbecker
ts->inidle is set by tick_nohz_idle_enter() and unset by tick_nohz_idle_exit(). However these two calls are assumed to be always paired. This means that by the time we call tick_nohz_idle_exit(), ts->inidle is supposed to be always set to 1. Remove the checks for ts->inidle in tick_nohz_idle_exit(). This simplifies a bit the code and improves its debuggability (ie: ensure the call is paired with a tick_nohz_idle_enter() call). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@elte.hu> Link: http://lkml.kernel.org/r/1327427984-23282-2-git-send-email-fweisbec@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15nohz: Remove update_ts_time_stat from tick_nohz_start_idleMichal Hocko
There is no reason to call update_ts_time_stat from tick_nohz_start_idle anymore (after e0e37c20 sched: Eliminate the ts->idle_lastupdate field) when we updated idle_lastupdate unconditionally. We haven't set idle_active yet and do not provide last_update_time so the whole call end up being just 2 wasted branches. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/1322755222-6951-1-git-send-email-mhocko@suse.cz Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15clockevents: Leave the broadcast device in shutdown mode when not neededSuresh Siddha
Platforms with Always Running APIC Timer doesn't use the broadcast timer but the kernel is leaving the broadcast timer (HPET in this case) in oneshot mode. On these platforms, before the switch to oneshot mode, broadcast device is actually in shutdown mode. Code checks for empty tick_broadcast_mask and avoids going into the periodic mode. During switch to oneshot mode, add the same tick_broadcast_mask checks in the tick_broadcast_switch_to_oneshot() and avoid the broadcast device going into the oneshot mode. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: john stultz <johnstul@us.ibm.com> Cc: venki@google.com Link: http://lkml.kernel.org/r/1320452301.15071.16.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15futex: Simplify return logicThomas Gleixner
No need to assign ret in each case and break. Simply return the result of the handler function directly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@linux.intel.com>
2012-02-15futex: Cover all PI opcodes with cmpxchg enabled checkThomas Gleixner
Some of the newer futex PI opcodes do not check the cmpxchg enabled variable and call unconditionally into the handling functions. Cover all PI opcodes in a separate check. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@linux.intel.com>
2012-02-15genirq: Handle pending irqs in irq_startup()Thomas Gleixner
An interrupt might be pending when irq_startup() is called, but the startup code does not invoke the resend logic. In some cases this prevents the device from issuing another interrupt which renders the device non functional. Call the resend function in irq_startup() to keep things going. Reported-and-tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-15genirq: Unmask oneshot irqs when thread was not wokenThomas Gleixner
When the primary handler of an interrupt which is marked IRQ_ONESHOT returns IRQ_HANDLED or IRQ_NONE, then the interrupt thread is not woken and the unmask logic of the interrupt line is never invoked. This keeps the interrupt masked forever. This was not noticed as most IRQ_ONESHOT users wake the thread unconditionally (usually because they cannot access the underlying device from hard interrupt context). Though this behaviour was nowhere documented and not necessarily intentional. Some drivers can avoid the thread wakeup in certain cases and run into the situation where the interrupt line s kept masked. Handle it gracefully. Reported-and-tested-by: Lothar Wassmann <lw@karo-electronics.de> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-02-13vfs: fix panic in __d_lookup() with high dentry hashtable countsDimitri Sivanich
When the number of dentry cache hash table entries gets too high (2147483648 entries), as happens by default on a 16TB system, use of a signed integer in the dcache_init() initialization loop prevents the dentry_hashtable from getting initialized, causing a panic in __d_lookup(). Fix this in dcache_init() and similar areas. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-02-13Merge tag 'for-linus' of git://github.com/rustyrussell/linuxLinus Torvalds
* tag 'for-linus' of git://github.com/rustyrussell/linux: module: fix broken isapnp handling in file2alias module: make module param bint handle nul value
2012-02-14module: make module param bint handle nul valueDave Young
Allow bint param accept nul values, just do same as bool param. Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-02-13tracing/trivial: Use kcalloc instead of kzalloc to allocate arrayThomas Meyer
The advantage of kcalloc is, that will prevent integer overflows which could result from the multiplication of number of elements and size and it is also a bit nicer to read. The semantic patch that makes this change is available in https://lkml.org/lkml/2011/11/25/107 Link: http://lkml.kernel.org/r/1322600880.1534.347.camel@localhost.localdomain Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-13printk/tracing: Add console output tracingJohannes Berg
Add a printk.console trace point to record any printk messages into the trace, regardless of the current console loglevel. This can help correlate (existing) printk debugging with other tracing. Link: http://lkml.kernel.org/r/1322161388.5366.54.camel@jlt3.sipsolutions.net Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-13ftrace: sched_switch plugin is deprecatedGeunsik Lim
Actually, sched_switch function tracer is merged into wakeup/wakeup_rt Update 'mini-HOWTO' for ftrace(Kernel function tracer). If we want to trace "sched:sched_switch" to trace sched_switch func, We may utilize event option.(e.g: trace-cmd list -e | grep sched) This patch is based on Linux-3.3.rc2-SMP-PREEMPT Link: http://lkml.kernel.org/r/1328695537-15081-1-git-send-email-geunsik.lim@gmail.com Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-11Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Says Jens: "Time to push off some of the pending items. I really wanted to wait until we had the regression nailed, but alas it's not quite there yet. But I'm very confident that it's "just" a missing expire on exit, so fix from Tejun should be fairly trivial. I'm headed out for a week on the slopes. - Killing the barrier part of mtip32xx. It doesn't really support barriers, and it doesn't need them (writes are fully ordered). - A few fixes from Dan Carpenter, preventing overflows of integer multiplication. - A fixup for loop, fixing a previous commit that didn't quite solve the partial read problem from Dave Young. - A bio integer overflow fix from Kent Overstreet. - Improvement/fix of the door "keep locked" part of the cdrom shared code from Paolo Benzini. - A few cfq fixes from Shaohua Li. - A fix for bsg sysfs warning when removing a file it did not create from Stanislaw Gruszka. - Two fixes for floppy from Vivek, preventing a crash. - A few block core fixes from Tejun. One killing the over-optimized ioc exit path, cleaning that up nicely. Two others fixing an oops on elevator switch, due to calling into the scheduler merge check code without holding the queue lock." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix lockdep warning on io_context release put_io_context() relay: prevent integer overflow in relay_open() loop: zero fill bio instead of return -EIO for partial read bio: don't overflow in bio_get_nr_vecs() floppy: Fix a crash during rmmod floppy: Cleanup disk->queue before caling put_disk() if add_disk() was never called cdrom: move shared static to cdrom_device_info bsg: fix sysfs link remove warning block: don't call elevator callbacks for plug merges block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions mtip32xx: removed the irrelevant argument of mtip_hw_submit_io() and the unused member of struct driver_data block: strip out locking optimization in put_io_context() cdrom: use copy_to_user() without the underscores block: fix ioc locking warning block: fix NULL icq_cache reference block,cfq: change code order
2012-02-11watchdog: Fix code/comments mismatchesFernando Luis Vázquez Cao
Reflect the change in the soft and hard lockup thresholds and their relation to the frequency of the hrtimer and NMI events in the code comments. While at it, remove references to files that do not exist anymore. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/1328827342-6253-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-10Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix double start/stop in x86_pmu_start() perf evsel: Fix an issue where perf report fails to show the proper percentage perf tools: Fix prefix matching for kernel maps perf tools: Fix perf stack to non executable on x86_64 perf: Remove deprecated WARN_ON_ONCE()
2012-02-10relay: prevent integer overflow in relay_open()Dan Carpenter
"subbuf_size" and "n_subbufs" come from the user and they need to be capped to prevent an integer overflow. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-02-07perf: Fix double start/stop in x86_pmu_start()Stephane Eranian
The following patch fixes a bug introduced by the following commit: e050e3f0a71b ("perf: Fix broken interrupt rate throttling") The patch caused the following warning to pop up depending on the sampling frequency adjustments: ------------[ cut here ]------------ WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4() It was caused by the following call sequence: perf_adjust_freq_unthr_context.part() { stop() if (delta > 0) { perf_adjust_period() { if (period > 8*...) { stop() ... start() } } } start() } Which caused a double start and a double stop, thus triggering the assert in x86_pmu_start(). The patch fixes the problem by avoiding the double calls. We pass a new argument to perf_adjust_period() to indicate whether or not the event is already stopped. We can't just remove the start/stop from that function because it's called from __perf_event_overflow where the event needs to be reloaded via a stop/start back-toback call. The patch reintroduces the assertion in x86_pmu_start() which was removed by commit: 84f2b9b ("perf: Remove deprecated WARN_ON_ONCE()") In this second version, we've added calls to disable/enable PMU during unthrottling or frequency adjustment based on bug report of spurious NMI interrupts from Eric Dumazet. Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: markus@trippelsdorf.de Cc: paulus@samba.org Link: http://lkml.kernel.org/r/20120207133956.GA4932@quad [ Minor edits to the changelog and to the code ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-07block: strip out locking optimization in put_io_context()Tejun Heo
put_io_context() performed a complex trylock dancing to avoid deferring ioc release to workqueue. It was also broken on UP because trylock was always assumed to succeed which resulted in unbalanced preemption count. While there are ways to fix the UP breakage, even the most pathological microbench (forced ioc allocation and tight fork/exit loop) fails to show any appreciable performance benefit of the optimization. Strip it out. If there turns out to be workloads which are affected by this change, simpler optimization from the discussion thread can be applied later. Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <1328514611.21268.66.camel@sli10-conroe> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-02-06Merge branch 'perf/urgent' into perf/coreArnaldo Carvalho de Melo
So that we can get the perf bench exec stack fixes and then apply the remaining fix for the files added after what is in perf/urgent. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-02-04Merge tag 'pm-fixes-for-3.3-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Power management fixes for 3.3-rc3 Three power management regression fixes, one for a recent regression introcuded by the freezer changes during the 3.3 merge window and two for regressions in cpuidle (resulting from PM QoS changes) and in the hibernate user space interface, both introduced during the 3.2 development cycle. They include: * Two hibernate (s2disk) regression fixes from Srivatsa S. Bhat (for regressions introduced during the 3.3 merge window and during the 3.2 development cycle). * A cpuidle fix from Venki Pallipadi for a regression resulting from PM QoS changes during the 3.2 development cycle causing cpuidle to work incorrectly for CONFIG_PM unset. * tag 'pm-fixes-for-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / QoS: CPU C-state breakage with PM Qos change PM / Freezer: Thaw only kernel threads if freezing of kernel threads fails PM / Hibernate: Thaw kernel threads in SNAPSHOT_CREATE_IMAGE ioctl path
2012-02-04PM / Freezer: Thaw only kernel threads if freezing of kernel threads failsSrivatsa S. Bhat
If freezing of kernel threads fails, we are expected to automatically thaw tasks in the error recovery path. However, at times, we encounter situations in which we would like the automatic error recovery path to thaw only the kernel threads, because we want to be able to do some more cleanup before we thaw userspace. Something like: error = freeze_kernel_threads(); if (error) { /* Do some cleanup */ /* Only then thaw userspace tasks*/ thaw_processes(); } An example of such a situation is where we freeze/thaw filesystems during suspend/hibernation. There, if freezing of kernel threads fails, we would like to thaw the frozen filesystems before thawing the userspace tasks. So, modify freeze_kernel_threads() to thaw only kernel threads in case of freezing failure. And change suspend_freeze_processes() accordingly. (At the same time, let us also get rid of the rather cryptic usage of the conditional operator (:?) in that function.) [rjw: In fact, this patch fixes a regression introduced during the 3.3 merge window, because without it thaw_processes() may be called before swsusp_free() in some situations and that may lead to massive memory allocation failures.] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-02-03kprobes: fix a memory leak in function pre_handler_kretprobe()Jiang Liu
In function pre_handler_kretprobe(), the allocated kretprobe_instance object will get leaked if the entry_handler callback returns non-zero. This may cause all the preallocated kretprobe_instance objects exhausted. This issue can be reproduced by changing samples/kprobes/kretprobe_example.c to probe "mutex_unlock". And the fix is straightforward: just put the allocated kretprobe_instance object back onto the free_instances list. [akpm@linux-foundation.org: use raw_spin_lock/unlock] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-03tracing/softirq: Move __raise_softirq_irqoff() out of headerSteven Rostedt
The __raise_softirq_irqoff() contains a tracepoint. As tracepoints in headers can cause issues, and not to mention, bloats the kernel when they are in a static inline, it is best to move the function that contains the tracepoint out of the header and into softirq.c. Link: http://lkml.kernel.org/r/20120118120711.GB14863@elte.hu Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-03ftrace: Change filter/notrace set functions to return exit codeJiri Olsa
Currently the ftrace_set_filter and ftrace_set_notrace functions do not return any return code. So there's no way for ftrace_ops user to tell wether the filter was correctly applied. The set_ftrace_filter interface returns error in case the filter did not match: # echo krava > set_ftrace_filter bash: echo: write error: Invalid argument Changing both ftrace_set_filter and ftrace_set_notrace functions to return zero if the filter was applied correctly or -E* values in case of error. Link: http://lkml.kernel.org/r/1325495060-6402-2-git-send-email-jolsa@redhat.com Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-02Fix race in process_vm_rw_coreChristopher Yeoh
This fixes the race in process_vm_core found by Oleg (see http://article.gmane.org/gmane.linux.kernel/1235667/ for details). This has been updated since I last sent it as the creation of the new mm_access() function did almost exactly the same thing as parts of the previous version of this patch did. In order to use mm_access() even when /proc isn't enabled, we move it to kernel/fork.c where other related process mm access functions already are. Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>