summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2013-06-11tracing: Fix file mode of free_bufferWang YanQing
Commit 4f271a2a60c748599b30bb4dafff30d770439b96 (tracing: Add a proc file to stop tracing and free buffer) implement a method to free up ring buffer in kernel memory in the release code path of free_buffer's fd. Then we don't need read/write support for free_buffer, indeed we just have a dummy write fop, and don't implement read fop. So the 0200 is more reasonable file mode for free_buffer than the current file mode 0644. Link: http://lkml.kernel.org/r/20130526085201.GA3183@udknight Acked-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Acked-by: David Sharp <dhsharp@google.com> Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-11tracing/trivial: Consolidate error return conditionHarsh Prateek Bora
Consolidate the checks for !enabled and !param to return -EINVAL in event_enable_func(). Link: http://lkml.kernel.org/r/1369380137-12452-1-git-send-email-harsh@linux.vnet.ibm.com Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-11tracing: Add function probe to trigger a ftrace dump of current CPU traceSteven Rostedt (Red Hat)
Add the "cpudump" command to have the current CPU ftrace buffer dumped to console if a function is hit. This is useful when debugging a tripple fault, where you have an idea of a function that is called just before the tripple fault occurs, and can tell ftrace to dump its content out to the console before it continues. This differs from the "dump" command as it only dumps the content of the ring buffer for the currently executing CPU, and does not show the contents of the other CPUs. Format is: <function>:cpudump echo 'bad_address:cpudump' > /debug/tracing/set_ftrace_filter To remove this: echo '!bad_address:cpudump' > /debug/tracing/set_ftrace_filter Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-11tracing: Add function probe to trigger a ftrace dump to consoleSteven Rostedt (Red Hat)
Add the "dump" command to have the ftrace buffer dumped to console if a function is hit. This is useful when debugging a tripple fault, where you have an idea of a function that is called just before the tripple fault occurs, and can tell ftrace to dump its content out to the console before it continues. Format is: <function>:dump echo 'bad_address:dump' > /debug/tracing/set_ftrace_filter To remove this: echo '!bad_address:dump' > /debug/tracing/set_ftrace_filter Requested-by: Luis Claudio R. Goncalves <lclaudio@uudg.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-11PM / wakeup: Adjust messaging for wake events during suspendBernie Thompson
This adds in a new message to the wakeup code which adds an indication to the log that suspend was cancelled due to a wake event occouring during the suspend sequence. It also adjusts the message printed in suspend.c to reflect the potential that a suspend was aborted, as opposed to a device failing to suspend. Without these message adjustments one can end up with a kernel log that says that a device failed to suspend with no actual device suspend failures, which can be confusing to the log examiner. Signed-off-by: Bernie Thompson <bhthompson@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-11idle: Add the stack canary init to cpu_startup_entry()Thomas Gleixner
Moving x86 to the generic idle implementation (commit 7d1a9417 "x86: Use generic idle loop") wreckaged the stack protector. I stupidly missed that boot_init_stack_canary() must be inlined from a function which never returns, but I put that call into arch_cpu_idle_prepare() which of course returns. I pondered to play tricks with arch_cpu_idle_prepare() first, but then I noticed, that the other archs which have implemented the stackprotector (ARM and SH) do not initialize the canary for the non-boot cpus. So I decided to move the boot_init_stack_canary() call into cpu_startup_entry() ifdeffed with an CONFIG_X86 for now. This #ifdef is just a temporary measure as I don't want to inflict the boot_init_stack_canary() call on ARM and SH that late in the cycle. I'll queue a patch for 3.11 which removes the #ifdef if the ARM/SH maintainers have no objection. Reported-by: Wouter van Kesteren <woutershep@gmail.com> Cc: x86@kernel.org Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-11tracing: Fix outputting formats of x86-tsc and counter when use trace_clockYoshihiro YUNOMAE
Outputting formats of x86-tsc and counter should be a raw format, but after applying the patch(2b6080f28c7cc3efc8625ab71495aae89aeb63a0), the format was changed to nanosec. This is because the global variable trace_clock_id was used. When we use multiple buffers, clock_id of each sub-buffer should be used. Then, this patch uses tr->clock_id instead of the global variable trace_clock_id. [ Basically, this fixes a regression where the multibuffer code changed the trace_clock file to update tr->clock_id but the traces still use the old global trace_clock_id variable, negating the file's effect. The global trace_clock_id variable is obsolete and removed. - SR ] Link: http://lkml.kernel.org/r/20130423013239.22334.7394.stgit@yunodevel Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-11genirq: Set irq thread to RT priority on creationIvo Sieben
When a threaded irq handler is installed the irq thread is initially created on normal scheduling priority. Only after the irq thread is woken up it sets its priority to RT_FIFO MAX_USER_RT_PRIO/2 itself. This means that interrupts that occur directly after the irq handler is installed will be handled on a normal scheduling priority instead of the realtime priority that one would expect. Fix this by setting the RT priority on creation of the irq_thread. Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1370254322-17240-1-git-send-email-meltedpianoman@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-10Fix lockup related to stop_machine being stuck in __do_softirq.Ben Greear
The stop machine logic can lock up if all but one of the migration threads make it through the disable-irq step and the one remaining thread gets stuck in __do_softirq. The reason __do_softirq can hang is that it has a bail-out based on jiffies timeout, but in the lockup case, jiffies itself is not incremented. To work around this, re-add the max_restart counter in __do_irq and stop processing irqs after 10 restarts. Thanks to Tejun Heo and Rusty Russell and others for helping me track this down. This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce latencies"). It may be worth looking into ath9k to see if it has issues with its irq handler at a later date. The hang stack traces look something like this: ------------[ cut here ]------------ WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7() Watchdog detected hard LOCKUP on cpu 2 Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc] Pid: 23, comm: migration/2 Tainted: G C 3.9.4+ #11 Call Trace: <NMI> warn_slowpath_common+0x85/0x9f warn_slowpath_fmt+0x46/0x48 watchdog_overflow_callback+0x9c/0xa7 __perf_event_overflow+0x137/0x1cb perf_event_overflow+0x14/0x16 intel_pmu_handle_irq+0x2dc/0x359 perf_event_nmi_handler+0x19/0x1b nmi_handle+0x7f/0xc2 do_nmi+0xbc/0x304 end_repeat_nmi+0x1e/0x2e <<EOE>> cpu_stopper_thread+0xae/0x162 smpboot_thread_fn+0x258/0x260 kthread+0xc7/0xcf ret_from_fork+0x7c/0xb0 ---[ end trace 4947dfa9b0a4cec3 ]--- BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17] Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc] irq event stamp: 835637905 hardirqs last enabled at (835637904): __do_softirq+0x9f/0x257 hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80 softirqs last enabled at (5654720): __do_softirq+0x1ff/0x257 softirqs last disabled at (5654725): irq_exit+0x5f/0xbb CPU 1 Pid: 17, comm: migration/1 Tainted: G WC 3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M. RIP: tasklet_hi_action+0xf0/0xf0 Process migration/1 Call Trace: <IRQ> __do_softirq+0x117/0x257 irq_exit+0x5f/0xbb smp_apic_timer_interrupt+0x8a/0x98 apic_timer_interrupt+0x72/0x80 <EOI> printk+0x4d/0x4f stop_machine_cpu_stop+0x22c/0x274 cpu_stopper_thread+0xae/0x162 smpboot_thread_fn+0x258/0x260 kthread+0xc7/0xcf ret_from_fork+0x7c/0xb0 Signed-off-by: Ben Greear <greearb@candelatech.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Pekka Riikonen <priikone@iki.fi> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-10Merge branches 'cbnum.2013.06.10a', 'doc.2013.06.10a', 'fixes.2013.06.10a', ↵Paul E. McKenney
'srcu.2013.06.10a' and 'tiny.2013.06.10a' into HEAD cbnum.2013.06.10a: Apply simplifications stemming from the new callback numbering. doc.2013.06.10a: Documentation updates. fixes.2013.06.10a: Miscellaneous fixes. srcu.2013.06.10a: Updates to SRCU. tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.
2013-06-10rcu: Shrink TINY_RCU by reworking CPU-stall ifdefsPaul E. McKenney
TINY_RCU's reset_cpu_stall_ticks() and check_cpu_stalls() functions are defined unconditionally, and are empty functions if CONFIG_RCU_TRACE is disabled (which in turns disables detection of RCU CPU stalls). This commit saves a few lines of source code by defining these functions only if CONFIG_RCU_TRACE=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-06-10rcu: Shrink TINY_RCU by moving exit_rcu()Paul E. McKenney
Now that TINY_PREEMPT_RCU is no more, exit_rcu() is always an empty function. But if TINY_RCU is going to have an empty function, it should be in include/linux/rcutiny.h, where it does not bloat the kernel. This commit therefore moves exit_rcu() out of kernel/rcupdate.c to kernel/rcutree_plugin.h, and places a static inline empty function in include/linux/rcutiny.h in order to shrink TINY_RCU a bit. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Consolidate rcutiny_plugin.h ifdefsPaul E. McKenney
This commit rearranges code in order to allow ifdefs to be consolidated in kernel/rcutiny_plugin.h, simplifying the code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Remove check_cpu_stall_preempt()Paul E. McKenney
With the removal of CONFIG_TINY_PREEMPT_RCU, check_cpu_stall_preempt() is now an empty function. This commit therefore eliminates it by inlining it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Simplify RCU_TINY RCU callback invocationPaul E. McKenney
TINY_PREEMPT_RCU could use a kthread to handle RCU callback invocation, which required an API to abstract kthread vs. softirq invocation. Now that TINY_PREEMPT_RCU is no longer with us, this commit retires this API in favor of direct use of the relevant softirq primitives. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Remove rcu_preempt_process_callbacks()Paul E. McKenney
With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_process_callbacks() is now an empty function. This commit therefore eliminates it by inlining it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Remove rcu_preempt_remove_callbacks()Paul E. McKenney
With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_remove_callbacks() is now an empty function. This commit therefore eliminates it by inlining it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Remove rcu_preempt_check_callbacks()Paul E. McKenney
With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_check_callbacks() is now an empty function. This commit therefore eliminates it by inlining it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Remove show_tiny_preempt_stats()Paul E. McKenney
With the removal of CONFIG_TINY_PREEMPT_RCU, show_tiny_preempt_stats() is now an empty function. This commit therefore eliminates it by inlining it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Remove TINY_PREEMPT_RCUPaul E. McKenney
TINY_PREEMPT_RCU adds significant code and complexity, but does not offer commensurate benefits. People currently using TINY_PREEMPT_RCU can get much better memory footprint with TINY_RCU, or, if they really need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively minor degradation in memory footprint. Please note that this move has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545) and on LWN (http://lwn.net/Articles/541037/). This commit therefore removes TINY_PREEMPT_RCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ] Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().Paul E. McKenney
These interfaces never did get used, so this commit removes them, their rcutorture tests, and documentation referencing them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Merge adjacent identical ifdefsPaul E. McKenney
Two ifdefs in kernel/rcupdate.c now have identical conditions with nothing between them, so the commit merges them into a single ifdef. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Drive quiescent-state-forcing delay from HZPaul E. McKenney
Systems with HZ=100 can have slow bootup times due to the default three-jiffy delays between quiescent-state forcing attempts. This commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based on the value of HZ. However, this would break very large systems that require more time between quiescent-state forcing attempts. This commit therefore also ups the default delay by one jiffy for each 256 CPUs that might be on the system (based off of nr_cpu_ids at runtime, -not- NR_CPUS at build time). Updated to collapse #ifdefs for RCU_JIFFIES_TILL_FORCE_QS into a step-function definition as suggested by Josh Triplett. Reported-by: Paul Mackerras <paulus@au1.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-06-10rcu: Remove "Experimental" flagsPaul E. McKenney
After a release or two, features are no longer experimental. Therefore, this commit removes the "Experimental" tag from them. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Move redundant call to note_gp_changes() into called functionPaul E. McKenney
The __rcu_process_callbacks() invokes note_gp_changes() immediately before invoking rcu_check_quiescent_state(), which conditionally invokes that same function. This commit therefore eliminates the call to note_gp_changes() in __rcu_process_callbacks() in favor of making unconditional to call from rcu_check_quiescent_state() to note_gp_changes(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Inline trivial wrapper function rcu_start_gp_per_cpu()Paul E. McKenney
Given the changes that introduce note_gp_change(), rcu_start_gp_per_cpu() is now a trivial wrapper function with only one caller. This commit therefore inlines it into its sole call site. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Eliminate check_for_new_grace_period() wrapper functionPaul E. McKenney
One of the calls to check_for_new_grace_period() is now redundant due to an immediately preceding call to note_gp_changes(). Eliminating this redundant call leaves a single caller, which is simpler if inlined. This commit therefore eliminates the redundant call and inlines the body of check_for_new_grace_period() into the single remaining call site. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Merge __rcu_process_gp_end() into __note_gp_changes()Paul E. McKenney
This commit eliminates some duplicated code by merging __rcu_process_gp_end() into __note_gp_changes(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Switch callers from rcu_process_gp_end() to note_gp_changes()Paul E. McKenney
Because note_gp_changes() now incorporates rcu_process_gp_end() function, this commit switches to the former and eliminates the latter. In addition, this commit changes external calls from __rcu_process_gp_end() to __note_gp_changes(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Convert rcutree_plugin.h printk callsPaul E. McKenney
This commit converts printk() calls to the corresponding pr_*() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Rename note_new_gpnum() to note_gp_changes()Paul E. McKenney
Because note_new_gpnum() now also checks for the ends of old grace periods, this commit changes its name to note_gp_changes(). Later commits will merge rcu_process_gp_end() into note_gp_changes(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Make __note_new_gpnum() check for ends of prior grace periodsPaul E. McKenney
The current implementation can detect the beginning of a new grace period before noting the end of a previous grace period. Although the current implementation correctly handles this sort of nonsense, it would be good to reduce RCU's state space by making such nonsense unnecessary, which is now possible thanks to the fact that RCU's callback groups are now numbered. This commit therefore makes __note_new_gpnum() invoke __rcu_process_gp_end() in order to note the ends of prior grace periods before noting the beginnings of new grace periods. Of course, this now means that note_new_gpnum() notes both the beginnings and ends of grace periods, and could therefore be used in place of rcu_process_gp_end(). But that is a job for later commits. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Move code to apply callback-numbering simplificationsPaul E. McKenney
The addition of callback numbering allows combining the detection of the ends of old grace periods and the beginnings of new grace periods. This commit moves code to set the stage for this combining. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Convert rcutree.c printk callsPaul E. McKenney
This commit converts printk() calls to the corresponding pr_*() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migrationPaul E. McKenney
In Steven Rostedt's words: > I've been debugging the last couple of days why my tests have been > locking up. One of my tracing tests, runs all available tracers. The > lockup always happened with the mmiotrace, which is used to trace > interactions between priority drivers and the kernel. But to do this > easily, when the tracer gets registered, it disables all but the boot > CPUs. The lockup always happened after it got done disabling the CPUs. > > Then I decided to try this: > > while :; do > for i in 1 2 3; do > echo 0 > /sys/devices/system/cpu/cpu$i/online > done > for i in 1 2 3; do > echo 1 > /sys/devices/system/cpu/cpu$i/online > done > done > > Well, sure enough, that locked up too, with the same users. Doing a > sysrq-w (showing all blocked tasks): > > [ 2991.344562] task PC stack pid father > [ 2991.344562] rcu_preempt D ffff88007986fdf8 0 10 2 0x00000000 > [ 2991.344562] ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908 > [ 2991.344562] ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80 > [ 2991.344562] ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295 > [ 2991.344562] Call Trace: > [ 2991.344562] [<ffffffff815437ba>] schedule+0x64/0x66 > [ 2991.344562] [<ffffffff81541750>] schedule_timeout+0xbc/0xf9 > [ 2991.344562] [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f > [ 2991.344562] [<ffffffff81049513>] ? cascade+0xa8/0xa8 > [ 2991.344562] [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20 > [ 2991.344562] [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b > [ 2991.344562] [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50 > [ 2991.344562] [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64 > [ 2991.344562] [<ffffffff81061cdb>] kthread+0xb1/0xb9 > [ 2991.344562] [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] kworker/0:1 D ffffffff81a30680 0 47 2 0x00000000 > [ 2991.344562] Workqueue: events cpuset_hotplug_workfn > [ 2991.344562] ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8 > [ 2991.344562] ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80 > [ 2991.344562] ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000 > [ 2991.344562] Call Trace: > [ 2991.344562] [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff815437ba>] schedule+0x64/0x66 > [ 2991.344562] [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24 > [ 2991.344562] [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50 > [ 2991.344562] [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50 > [ 2991.344562] [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40 > [ 2991.344562] [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50 > [ 2991.344562] [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8 > [ 2991.344562] [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a > [ 2991.344562] [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3 > [ 2991.344562] [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3 > [ 2991.344562] [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1 > [ 2991.344562] [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1 > [ 2991.344562] [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5 > [ 2991.344562] [<ffffffff81059365>] ? rescuer_thread+0x332/0x332 > [ 2991.344562] [<ffffffff81061cdb>] kthread+0xb1/0xb9 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] bash D ffffffff81a4aa80 0 2618 2612 0x10000000 > [ 2991.344562] ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c > [ 2991.344562] ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80 > [ 2991.344562] ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000 > [ 2991.344562] Call Trace: > [ 2991.344562] [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff815437ba>] schedule+0x64/0x66 > [ 2991.344562] [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24 > [ 2991.344562] [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e > [ 2991.344562] [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e > [ 2991.344562] [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40 > [ 2991.344562] [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e > [ 2991.344562] [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53 > [ 2991.344562] [<ffffffff81548912>] notifier_call_chain+0x6b/0x98 > [ 2991.344562] [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10 > [ 2991.344562] [<ffffffff8103cf64>] __cpu_notify+0x20/0x32 > [ 2991.344562] [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36 > [ 2991.344562] [<ffffffff815225de>] _cpu_down+0x154/0x259 > [ 2991.344562] [<ffffffff81522710>] cpu_down+0x2d/0x3a > [ 2991.344562] [<ffffffff81526351>] store_online+0x4e/0xe7 > [ 2991.344562] [<ffffffff8134d764>] dev_attr_store+0x20/0x22 > [ 2991.344562] [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144 > [ 2991.344562] [<ffffffff8114c5ef>] vfs_write+0xfd/0x158 > [ 2991.344562] [<ffffffff8114c928>] SyS_write+0x5c/0x83 > [ 2991.344562] [<ffffffff8154c494>] tracesys+0xdd/0xe2 > > As well as held locks: > > [ 3034.728033] Showing all locks held in the system: > [ 3034.728033] 1 lock held by rcu_preempt/10: > [ 3034.728033] #0: (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b > [ 3034.728033] 4 locks held by kworker/0:1/47: > [ 3034.728033] #0: (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1 > [ 3034.728033] #1: (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1 > [ 3034.728033] #2: (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a > [ 3034.728033] #3: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50 > [ 3034.728033] 1 lock held by mingetty/2563: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2565: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2569: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2572: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2575: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 7 locks held by bash/2618: > [ 3034.728033] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c > [ 3034.728033] #1: (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144 > [ 3034.728033] #2: (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144 > [ 3034.728033] #3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19 > [ 3034.728033] #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19 > [ 3034.728033] #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d > [ 3034.728033] #6: (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e > [ 3034.728033] 1 lock held by bash/2980: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > > Things looked a little weird. Also, this is a deadlock that lockdep did > not catch. But what we have here does not look like a circular lock > issue: > > Bash is blocked in rcu_cpu_notify(): > > 1961 /* Exclude any attempts to start a new grace period. */ > 1962 mutex_lock(&rsp->onoff_mutex); > > > kworker is blocked in get_online_cpus(), which makes sense as we are > currently taking down a CPU. > > But rcu_preempt is not blocked on anything. It is simply sleeping in > rcu_gp_kthread (really rcu_gp_init) here: > > 1453 #ifdef CONFIG_PROVE_RCU_DELAY > 1454 if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 && > 1455 system_state == SYSTEM_RUNNING) > 1456 schedule_timeout_uninterruptible(2); > 1457 #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */ > > And it does this while holding the onoff_mutex that bash is waiting for. > > Doing a function trace, it showed me where it happened: > > [ 125.940066] rcu_pree-10 3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread > [...] > [ 125.940066] rcu_pree-10 3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120 > > The watchdog ran, and then: > > [ 125.940066] watchdog-38 3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118 > > Not sure what modprobe was doing, but shortly after that: > > [ 125.940066] modprobe-2848 3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0 > > Where the migration thread took down the CPU: > > [ 125.940066] migratio-40 3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120 > > which finally did: > > [ 125.940066] <idle>-0 3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry > [ 125.940066] <idle>-0 3...1 28389282548: native_play_dead <-arch_cpu_idle_dead > [ 125.940066] <idle>-0 3...1 28389282924: play_dead_common <-native_play_dead > [ 125.940066] <idle>-0 3...1 28389283468: idle_task_exit <-play_dead_common > [ 125.940066] <idle>-0 3...1 28389284644: amd_e400_remove_cpu <-play_dead_common > > > CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still > doing a schedule_timeout_uninterruptible() and it registered it's > timeout to the timer base for CPU 3. You would think that it would get > migrated right? The issue here is that the timer migration happens at > the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for > CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is > held by the thread that just put itself into a uninterruptible sleep, > that wont wake up until the CPU_DEAD notifier of the timer > infrastructure is called, which wont happen until the rcu notifier > finishes. Here's our deadlock! This commit breaks this deadlock cycle by substituting a shorter udelay() for the previous schedule_timeout_uninterruptible(), while at the same time increasing the probability of the delay. This maintains the intensity of the testing. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-10rcu: Don't call wakeup() with rcu_node structure ->lock heldSteven Rostedt
This commit fixes a lockdep-detected deadlock by moving a wake_up() call out from a rnp->lock critical section. Please see below for the long version of this story. On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote: > [12572.705832] ====================================================== > [12572.750317] [ INFO: possible circular locking dependency detected ] > [12572.796978] 3.10.0-rc3+ #39 Not tainted > [12572.833381] ------------------------------------------------------- > [12572.862233] trinity-child17/31341 is trying to acquire lock: > [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12572.878859] > but task is already holding lock: > [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0 > [12572.903381] > which lock already depends on the new lock. > > [12572.927541] > the existing dependency chain (in reverse order) is: > [12572.943736] > -> #4 (&ctx->lock){-.-...}: > [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0 > [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0 > [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0 > [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70 > [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e > [12573.018321] > -> #3 (&rq->lock){-.-.-.}: > [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260 > [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470 > [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30 > [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140 > [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe > [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c > [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf > [12573.109528] > -> #2 (&p->pi_lock){-.-.-.}: > [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90 > [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320 > [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20 > [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40 > [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90 > [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50 > [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50 > [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0 > [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930 > [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100 > [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0 > [12573.221379] > -> #1 (&rsp->gp_wq){..-.-.}: > [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90 > [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50 > [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50 > [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0 > [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930 > [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100 > [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0 Notice the above call chain. rcu_start_future_gp() is called with the rnp->lock held. Then it calls rcu_start_gp_advance, which does a wakeup. You can't do wakeups while holding the rnp->lock, as that would mean that you could not do a rcu_read_unlock() while holding the rq lock, or any lock that was taken while holding the rq lock. This is because... (See below). > [12573.295067] > -> #0 (rcu_node_0){..-.-.}: > [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0 > [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0 > [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0 > [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0 > [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0 > [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10 > [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2 Notice the above trace. perf took its own ctx->lock, which can be taken while holding the rq lock. While holding this lock, it did a rcu_read_unlock(). The perf_lock_task_context() basically looks like: rcu_read_lock(); raw_spin_lock(ctx->lock); rcu_read_unlock(); Now, what looks to have happened, is that we scheduled after taking that first rcu_read_lock() but before taking the spin lock. When we scheduled back in and took the ctx->lock, the following rcu_read_unlock() triggered the "special" code. The rcu_read_unlock_special() takes the rnp->lock, which gives us a possible deadlock scenario. CPU0 CPU1 CPU2 ---- ---- ---- rcu_nocb_kthread() lock(rq->lock); lock(ctx->lock); lock(rnp->lock); wake_up(); lock(rq->lock); rcu_read_unlock(); rcu_read_unlock_special(); lock(rnp->lock); lock(ctx->lock); **** DEADLOCK **** > [12573.382068] > other info that might help us debug this: > > [12573.403229] Chain exists of: > rcu_node_0 --> &rq->lock --> &ctx->lock > > [12573.424471] Possible unsafe locking scenario: > > [12573.438499] CPU0 CPU1 > [12573.445599] ---- ---- > [12573.452691] lock(&ctx->lock); > [12573.459799] lock(&rq->lock); > [12573.467010] lock(&ctx->lock); > [12573.474192] lock(rcu_node_0); > [12573.481262] > *** DEADLOCK *** > > [12573.501931] 1 lock held by trinity-child17/31341: > [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0 > [12573.516475] > stack backtrace: > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39 > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00 > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40 > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0 > [12573.567856] Call Trace: > [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b > [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f > [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0 > [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100 > [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0 > [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0 > [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0 > [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40 > [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0 > [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0 > [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0 > [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370 > [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0 > [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0 > [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50 > [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10 > [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2 This commit delays the wakeup via irq_work(), which is what perf and ftrace use to perform wakeups in critical sections. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-06-10irqdomain: Beef up debugfs outputGrant Likely
This patch increases the amount of output produced by the irq_domain_mapping debugfs file by first listing all of the registered irq domains at the beginning of the output, and then by including all mapped IRQs in the output, not just the active ones. It is very useful when debugging irqdomain issues to be able to see the entire list of mapped irqs, not just the ones that happen to be connected to devices. Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-10irqdomain: Clean up aftermath of irq_domain refactoringGrant Likely
After refactoring the irqdomain code, there are a number of API functions that are merely empty wrappers around core code. Drop those wrappers out of the C file and replace them with static inlines in the header. Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-10irqdomain: Eliminate revmap typeGrant Likely
The NOMAP irq_domain type is only used by a handful of interrupt controllers and it unnecessarily complicates the code by adding special cases on how to look up mappings and different revmap functions are used for each type which need to validate the correct type is passed to it before performing the reverse map. Eliminating the revmap_type and making a single reverse mapping function simplifies the code. It also shouldn't be any slower than having separate revmap functions because the type of the revmap needed to be checked anyway. The linear and tree revmap types were already merged in a previous patch. This patch rolls the NOMAP or direct mapping behaviour into the same domain code making is possible for an irq domain to do any mapping type; linear, tree or direct; and that the mapping will be transparent to the interrupt controller driver. With this change, direct mappings will get stored in the linear or tree mapping for consistency. Reverse mapping from the hwirq to virq will go through the normal lookup process. However, any controller using a direct mapping can take advantage of knowing that hwirq==virq for any mapped interrupts skip doing a revmap lookup when handling IRQs. Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-10irqdomain: merge linear and tree reverse mappings.Grant Likely
Keeping them separate makes irq_domain more complex and adds a lot of code (as proven by the diffstat). Merging them simplifies the whole scheme. This change makes it so both the tree and linear methods can be used by the same irq_domain instance. If the hwirq is less than the ->linear_size, then the linear map is used to reverse map the hwirq. Otherwise the radix tree is used. The test for which map to use is no more expensive that the existing code, so the performance of fast path is preserved. It also means that complex interrupt controllers can use both the linear map and a tree in the same domain. This may be useful for an interrupt controller with a base set of core irqs and a large number of GPIOs which might be used as irqs. The linear map could cover the core irqs, and the tree used for thas irqs. The linear map could cover the core irqs, and the tree used for the gpios. v2: Drop reorganization of revmap data Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rob Herring <rob.herring@calxeda.com>
2013-06-10irqdomain: Add a name fieldGrant Likely
This patch adds a name field to the irq_domain structure to help mere mortals understand the mappings between irq domains and virqs. It also converts a number of places that have open-coded some kind of fudging an irqdomain name to use the new field. This means a more consistent display of names in irq domain log messages and debugfs output. Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-10irqdomain: Replace LEGACY mapping with LINEARGrant Likely
The LEGACY mapping unnecessarily complicates the irqdomain code and can easily be implemented with a linear mapping. By ripping it out and replacing it with the LINEAR mapping the object size of irqdomain.c shrinks by about 330 bytes (ARMv7) which offsets the additional allocation required by the linear map. It also makes it possible for current LEGACY map users to pre-allocate irq_descs for a subset of the hwirqs and dynamically allocate the rest as needed. Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rob Herring <rob.herring@calxeda.com>
2013-06-10irqdomain: Relax failure path on setting up mappingsGrant Likely
Commit 98aa468e, "irqdomain: Support for static IRQ mapping and association" introduced an API for directly associating blocks of hwirqs to linux irqs. However, if any irq in that block failed to map (say if the mapping functions returns an error because the irq is already mapped) then the whole thing will fail and roll back. This is probably too aggressive since there are valid reasons why a mapping may fail. ie. Firmware may have a particular IRQ marked as unusable. This patch drops the error path out of irq_domain_associate(). If a mapping fails, then it is simply skipped. There is no reason to fail the entire allocation. v2: Still output an information message on failed mappings and make sure attempted mapping gets cleared out of the irq_data structure. Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-06-10Merge remote-tracking branch 'tip/irq/for-arm' into irqdomain/nextGrant Likely
2013-06-09cpuset: remove async hotplug propagation workLi Zefan
As we can drop rcu read lock while iterating cgroup hierarchy, we don't have to do propagation asynchronously via workqueue. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-09cpuset: let hotplug propagation work wait for task attachingLi Zefan
Instead of triggering propagation work in cpuset_attach(), we make hotplug propagation work wait until there's no task attaching in progress. IMO this is more robust. We won't see empty masks in cpuset_attach(). Also it's a preparation for removing propagation work. Without asynchronous propagation we can't call move_tasks_in_empty_cpuset() in cpuset_attach(), because otherwise we'll deadlock on cgroup_mutex. tj: typo fixes. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-08Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: - Trivial: unused variable removal - Posix-timers: Add the clock ID to the new proc interface to make it useful. The interface is new and should be functional when we reach the final 3.10 release. - Cure a false positive warning in the tick code introduced by the overhaul in 3.10 - Fix for a persistent clock detection regression introduced in this cycle * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Correct run-time detection of persistent_clock. ntp: Remove unused variable flags in __hardpps posix-timers: Show clock ID in proc file tick: Cure broadcast false positive pending bit warning
2013-06-08Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linuxLinus Torvalds
Pull irqdomain bug fixes from Grant Likely: "This branch contains a set of straight forward bug fixes to the irqdomain code and to a couple of drivers that make use of it." * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux: irqchip: Return -EPERM for reserved IRQs irqdomain: document the simple domain first_irq kernel/irq/irqdomain.c: before use 'irq_data', need check it whether valid. irqdomain: export irq_domain_add_simple
2013-06-08irqdomain: document the simple domain first_irqLinus Walleij
The first_irq needs to be zero to get a linear domain and that comes with special semantics. We want to simplify this going forward but some documentation never hurts. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-08kernel/irq/irqdomain.c: before use 'irq_data', need check it whether valid.Chen Gang
Since irq_data may be NULL, if so, we WARN_ON(), and continue, 'hwirq' which related with 'irq_data' has to initialize later, or it will cause issue. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Grant Likely <grant.likely@linaro.org>