summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-01-23rcu: Check cond_resched_rcu_qs() state less often to reduce GP overheadPaul E. McKenney
Commit 4a81e8328d37 ("rcu: Reduce overhead of cond_resched() checks for RCU") moved quiescent-state generation out of cond_resched() and commit bde6c3aa9930 ("rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops") introduced cond_resched_rcu_qs(), and commit 5cd37193ce85 ("rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors") introduced the per-CPU rcu_qs_ctr variable, which is frequently polled by the RCU core state machine. This frequent polling can increase grace-period rate, which in turn increases grace-period overhead, which is visible in some benchmarks (for example, the "open1" benchmark in Anton Blanchard's "will it scale" suite). This commit therefore reduces the rate at which rcu_qs_ctr is polled by moving that polling into the force-quiescent-state (FQS) machinery, and by further polling it only after the grace period has been in effect for at least jiffies_till_sched_qs jiffies. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Abstract extended quiescent state determinationPaul E. McKenney
This commit is the fourth step towards full abstraction of all accesses to the ->dynticks counter, implementing previously open-coded checks and comparisons in new rcu_dynticks_in_eqs() and rcu_dynticks_in_eqs_since() functions. This abstraction will ease changes to the ->dynticks counter operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Abstract dynticks extended quiescent state enter/exit operationsPaul E. McKenney
This commit is the third step towards full abstraction of all accesses to the ->dynticks counter, implementing the previously open-coded atomic add of 1 and entry checks in a new rcu_dynticks_eqs_enter() function, and the same but with exit checks in a new rcu_dynticks_eqs_exit() function. This abstraction will ease changes to the ->dynticks counter operation. Note that this commit gets rid of the smp_mb__before_atomic() and the smp_mb__after_atomic() calls that were previously present. The reason that this is OK from a memory-ordering perspective is that the atomic operation is now atomic_add_return(), which, as a value-returning atomic, guarantees full ordering. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fixed RCU_TRACE() statements added by this commit. ] Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Add lockdep checks to synchronous expedited primitivesPaul E. McKenney
The non-expedited synchronize_*rcu() primitives have lockdep checks, but their expedited counterparts lack these checks. This commit therefore adds these checks to the expedited synchronize_*rcu() primitives. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Eliminate unused expedited_normal counterPaul E. McKenney
Expedited grace periods no longer fall back to normal grace periods in response to lock contention, given that expedited grace periods now use the rcu_node tree so as to avoid contention. This commit therfore removes the expedited_normal counter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23llist: Clarify comments about when locking is neededJoel Fernandes
llist.h comments are confusing about when locking is needed versus when it isn't. Clarify these comments by being more descriptive about why locking is needed for llist_del_first. Cc: Ingo Molnar <mingo@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Huang Ying <ying.huang@intel.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Joel Fernandes <joelaf@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Fix comment in rcu_organize_nocb_kthreads()Paul E. McKenney
It used to be that the rcuo callback-offload kthreads were spawned in rcu_organize_nocb_kthreads(), and the comment before the "for" loop says as much. However, this spawning has long since moved to the CPU-hotplug code, so this commit fixes this comment. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Enable RCU tracepoints by default to aid in debuggingMatt Fleming
While debugging a performance issue I needed to understand why RCU sofitrqs were firing so frequently. Unfortunately, the RCU callback tracepoints are hidden behind CONFIG_RCU_TRACE which defaults to off in the upstream kernel and is likely to also be disabled in enterprise distribution configs. Enable it by default for CONFIG_TREE_RCU. However, we must keep it disabled for tiny RCU, because it would otherwise pull in a large amount of code that would make tiny RCU less than tiny. I ran some file system metadata intensive workloads (git checkout, FS-Mark) on a variety of machines with this patch and saw no detectable change in performance. Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Make rcu_cpu_starting() use its "cpu" argumentPaul E. McKenney
The rcu_cpu_starting() function uses this_cpu_ptr() to locate the incoming CPU's rcu_data structure. This works for the boot CPU and for all CPUs onlined after rcu_init() executes (during very early boot). Currently, this is the full set of CPUs, so all is well. But if anyone ever parallelizes boot before rcu_init() time, it will fail. This commit therefore substitutes the rcu_cpu_starting() function's this_cpu_pointer() for per_cpu_ptr(), future-proofing the code and (arguably) improving readability. This commit inadvertently fixes a latent bug: If there ever had been more than just the boot CPU online at rcu_init() time, the old code would not initialize the non-boot CPUs, but rather would repeatedly initialize the boot CPU. Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Add comment headers to expedited-grace-period counter functionsPaul E. McKenney
These functions (rcu_exp_gp_seq_start(), rcu_exp_gp_seq_end(), rcu_exp_gp_seq_snap(), and rcu_exp_gp_seq_done() seemed too obvious to comment when written, but not so much when being documented. This commit therefore adds header comments to each of them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Don't wake rcuc/X kthreads on NOCB CPUsPaul E. McKenney
Chris Friesen notice that rcuc/X kthreads were consuming CPU even on NOCB CPUs. This makes no sense because the only purpose or these kthreads is to invoke normal (non-offloaded) callbacks, of which there will never be any on NOCB CPUs. This problem was due to a bug in cpu_has_callbacks_ready_to_invoke(), which should have been checking ->nxttail[RCU_NEXT_TAIL] for NULL, but which was instead (incorrectly) checking ->nxttail[RCU_DONE_TAIL]. Because ->nxttail[RCU_DONE_TAIL] is never NULL, the only effect is to cause the rcuc/X kthread to execute when it should not do so. This commit therefore checks ->nxttail[RCU_NEXT_TAIL], which is NULL for NOCB CPUs. Reported-by: Chris Friesen <chris.friesen@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Re-enable TASKS_RCU for User Mode LinuxPaul E. McKenney
Now that User Mode Linux supports arch_irqs_disabled_flags(), this commit re-enables TASKS_RCU for User Mode Linux. Reported-by: Richard Weinberger <richard@nod.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Once again use NMI-based stack traces in stall warningsPaul E. McKenney
This commit is for all intents and purposes a revert of bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks"). The reason to suppose that this can now safely be reverted is the presence of 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI"), which is said to have made NMI-based stack dumps safe. However, this reversion keeps one nice property of bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks"), namely that only those CPUs blocking the grace period are dumped. The new trigger_single_cpu_backtrace() is used to make this happen, as suggested by Josh Poimboeuf. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Remove short-term CPU kickingPaul E. McKenney
Commit 4914950aaa12d ("rcu: Stop treating in-kernel CPU-bound workloads as errors") added a (relatively) short-timeout call to resched_cpu(). This was inspired by as issue that was fixed by b7e7ade34e61 ("sched/core: Fix remote wakeups"). But given that this issue was fixed, it is time for the current commit to remove this call to resched_cpu(). Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Add long-term CPU kickingPaul E. McKenney
This commit prepares for the removal of short-term CPU kicking (in a subsequent commit). It does so by starting to invoke resched_cpu() for each holdout at each force-quiescent-state interval that is more than halfway through the stall-warning interval. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Remove unused but set variableTobias Klauser
Since commit 7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU"), the variable mask in rcu_init_percpu_data is set but no longer used. Remove it to fix the following warning when building with 'W=1': kernel/rcu/tree.c: In function ‘rcu_init_percpu_data’: kernel/rcu/tree.c:3765:16: warning: variable ‘mask’ set but not used [-Wunused-but-set-variable] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Remove unneeded rcu_process_callbacks() declarationsPaul E. McKenney
The declarations of __rcu_process_callbacks() and rcu_process_callbacks() are not needed, as the definition of both of these functions appear before any uses. This commit therefore removes both declarations. Reported-by: "Ahmed, Iftekhar" <ahmedi@oregonstate.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Only dump stalled-tasks stacks if there was a real stallByungchul Park
The print_other_cpu_stall() function currently unconditionally invokes rcu_print_detail_task_stall(). This is OK because if there was a stall sufficient to cause print_other_cpu_stall() to be invoked, that stall is very likely to persist through the entire print_other_cpu_stall() execution. However, if the stall did not persist, the variable ndetected will be zero, and that variable is already tested in an "if" statement. Therefore, this commit moves the call to rcu_print_detail_task_stall() under that pre-existing "if" to improve readability, with a very rare reduction in overhead. Signed-off-by: Byungchul Park <byungchul.park@lge.com> [ paulmck: Reworked commit log. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23Fix: Disable sys_membarrier when nohz_full is enabledMathieu Desnoyers
Userspace applications should be allowed to expect the membarrier system call with MEMBARRIER_CMD_SHARED command to issue memory barriers on nohz_full CPUs, but synchronize_sched() does not take those into account. Given that we do not want unrelated processes to be able to affect real-time sensitive nohz_full CPUs, simply return ENOSYS when membarrier is invoked on a kernel with enabled nohz_full CPUs. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: <stable@vger.kernel.org> [3.10+] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23lockdep: Make RCU suspicious-access splats use pr_errPaul E. McKenney
This commit switches RCU suspicious-access splats use pr_err() instead of the current INFO printk()s. This change makes it easier to automatically classify splats. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-01-23xfs: fix COW writeback raceChristoph Hellwig
Due to the way how xfs_iomap_write_allocate tries to convert the whole found extents from delalloc to real space we can run into a race condition with multiple threads doing writes to this same extent. For the non-COW case that is harmless as the only thing that can happen is that we call xfs_bmapi_write on an extent that has already been converted to a real allocation. For COW writes where we move the extent from the COW to the data fork after I/O completion the race is, however, not quite as harmless. In the worst case we are now calling xfs_bmapi_write on a region that contains hole in the COW work, which will trip up an assert in debug builds or lead to file system corruption in non-debug builds. This seems to be reproducible with workloads of small O_DSYNC write, although so far I've not managed to come up with a with an isolated reproducer. The fix for the issue is relatively simple: tell xfs_bmapi_write that we are only asked to convert delayed allocations and skip holes in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-23xen-blkfront: correct maximum segment accountingJan Beulich
Making use of "max_indirect_segments=" has issues: - blkfront_setup_indirect() may end up with zero psegs when PAGE_SIZE is sufficiently much larger than XEN_PAGE_SIZE - the variable driven by the command line option (xen_blkif_max_segments) has a somewhat different purpose, and hence should namely never end up being zero - as long as the specified value is lower than the legacy default, we better don't use indirect segments at all (or we'd in fact lower throughput) Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2017-01-23xen-blkfront: feature flags handling adjustmentsJan Beulich
Don't truncate the "feature-persistent" value read from xenstore: Any non-zero value is supposed to enable the feature, just like is already being done for feature_secdiscard. Just like the other feature_* fields, feature_flush and feature_fua are boolean flags, and hence fit well into a single bit. Keep all bit fields together to limit gaps. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2017-01-23regulator: axp20x: AXP806: Fix dcdcb being set instead of dcdceRask Ingemann Lambertsen
A typo or copy-paste bug means that the register access intended for regulator dcdce goes to dcdcb instead. This patch corrects it. Fixes: 2ca342d391e3 (regulator: axp20x: Support AXP806 variant) Signed-off-by: Rask Ingemann Lambertsen <rask@formelder.dk> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
2017-01-23net: mpls: Fix multipath selection for LSR use caseDavid Ahern
MPLS multipath for LSR is broken -- always selecting the first nexthop in the one label case. For example: $ ip -f mpls ro ls 100 nexthop as to 200 via inet 172.16.2.2 dev virt12 nexthop as to 300 via inet 172.16.3.2 dev virt13 101 nexthop as to 201 via inet6 2000:2::2 dev virt12 nexthop as to 301 via inet6 2000:3::2 dev virt13 In this example incoming packets have a single MPLS labels which means BOS bit is set. The BOS bit is passed from mpls_forward down to mpls_multipath_hash which never processes the hash loop because BOS is 1. Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len tracks the total mpls header length on each pass (on pass N mpls_hdr_len is N * sizeof(mpls_shim_hdr)). When the label is found with the BOS set it verifies the skb has sufficient header for ipv4 or ipv6, and find the IPv4 and IPv6 header by using the last mpls_hdr pointer and adding 1 to advance past it. With these changes I have verified the code correctly sees the label, BOS, IPv4 and IPv6 addresses in the network header and icmp/tcp/udp traffic for ipv4 and ipv6 are distributed across the nexthops. Fixes: 1c78efa8319ca ("mpls: flow-based multipath selection") Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24userns: Make ucounts lock irq-safeNikolay Borisov
The ucounts_lock is being used to protect various ucounts lifecycle management functionalities. However, those services can also be invoked when a pidns is being freed in an RCU callback (e.g. softirq context). This can lead to deadlocks. There were already efforts trying to prevent similar deadlocks in add7c65ca426 ("pid: fix lockdep deadlock warning due to ucount_lock"), however they just moved the context from hardirq to softrq. Fix this issue once and for all by explictly making the lock disable irqs altogether. Dmitry Vyukov <dvyukov@google.com> reported: > I've got the following deadlock report while running syzkaller fuzzer > on eec0d3d065bfcdf9cd5f56dd2a36b94d12d32297 of linux-next (on odroid > device if it matters): > > ================================= > [ INFO: inconsistent lock state ] > 4.10.0-rc3-next-20170112-xc2-dirty #6 Not tainted > --------------------------------- > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. > swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes: > (ucounts_lock){+.?...}, at: [< inline >] spin_lock > ./include/linux/spinlock.h:302 > (ucounts_lock){+.?...}, at: [<ffff2000081678c8>] > put_ucounts+0x60/0x138 kernel/ucount.c:162 > {SOFTIRQ-ON-W} state was registered at: > [<ffff2000081c82d8>] mark_lock+0x220/0xb60 kernel/locking/lockdep.c:3054 > [< inline >] mark_irqflags kernel/locking/lockdep.c:2941 > [<ffff2000081c97a8>] __lock_acquire+0x388/0x3260 kernel/locking/lockdep.c:3295 > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753 > [< inline >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144 > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151 > [< inline >] spin_lock ./include/linux/spinlock.h:302 > [< inline >] get_ucounts kernel/ucount.c:131 > [<ffff200008167c28>] inc_ucount+0x80/0x6c8 kernel/ucount.c:189 > [< inline >] inc_mnt_namespaces fs/namespace.c:2818 > [<ffff200008481850>] alloc_mnt_ns+0x78/0x3a8 fs/namespace.c:2849 > [<ffff200008487298>] create_mnt_ns+0x28/0x200 fs/namespace.c:2959 > [< inline >] init_mount_tree fs/namespace.c:3199 > [<ffff200009bd6674>] mnt_init+0x258/0x384 fs/namespace.c:3251 > [<ffff200009bd60bc>] vfs_caches_init+0x6c/0x80 fs/dcache.c:3626 > [<ffff200009bb1114>] start_kernel+0x414/0x460 init/main.c:648 > [<ffff200009bb01e8>] __primary_switched+0x6c/0x70 arch/arm64/kernel/head.S:456 > irq event stamp: 2316924 > hardirqs last enabled at (2316924): [< inline >] rcu_do_batch > kernel/rcu/tree.c:2911 > hardirqs last enabled at (2316924): [< inline >] > invoke_rcu_callbacks kernel/rcu/tree.c:3182 > hardirqs last enabled at (2316924): [< inline >] > __rcu_process_callbacks kernel/rcu/tree.c:3149 > hardirqs last enabled at (2316924): [<ffff200008210414>] > rcu_process_callbacks+0x7a4/0xc28 kernel/rcu/tree.c:3166 > hardirqs last disabled at (2316923): [< inline >] rcu_do_batch > kernel/rcu/tree.c:2900 > hardirqs last disabled at (2316923): [< inline >] > invoke_rcu_callbacks kernel/rcu/tree.c:3182 > hardirqs last disabled at (2316923): [< inline >] > __rcu_process_callbacks kernel/rcu/tree.c:3149 > hardirqs last disabled at (2316923): [<ffff20000820fe80>] > rcu_process_callbacks+0x210/0xc28 kernel/rcu/tree.c:3166 > softirqs last enabled at (2316912): [<ffff20000811b4c4>] > _local_bh_enable+0x4c/0x80 kernel/softirq.c:155 > softirqs last disabled at (2316913): [< inline >] > do_softirq_own_stack ./include/linux/interrupt.h:488 > softirqs last disabled at (2316913): [< inline >] > invoke_softirq kernel/softirq.c:371 > softirqs last disabled at (2316913): [<ffff20000811c994>] > irq_exit+0x264/0x308 kernel/softirq.c:405 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(ucounts_lock); > <Interrupt> > lock(ucounts_lock); > > *** DEADLOCK *** > > 1 lock held by swapper/2/0: > #0: (rcu_callback){......}, at: [< inline >] __rcu_reclaim > kernel/rcu/rcu.h:108 > #0: (rcu_callback){......}, at: [< inline >] rcu_do_batch > kernel/rcu/tree.c:2919 > #0: (rcu_callback){......}, at: [< inline >] > invoke_rcu_callbacks kernel/rcu/tree.c:3182 > #0: (rcu_callback){......}, at: [< inline >] > __rcu_process_callbacks kernel/rcu/tree.c:3149 > #0: (rcu_callback){......}, at: [<ffff200008210390>] > rcu_process_callbacks+0x720/0xc28 kernel/rcu/tree.c:3166 > > stack backtrace: > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-rc3-next-20170112-xc2-dirty #6 > Hardware name: Hardkernel ODROID-C2 (DT) > Call trace: > [<ffff20000808fa60>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:500 > [<ffff20000808fec0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:225 > [<ffff2000088a99e0>] dump_stack+0x110/0x168 > [<ffff2000082fa2b4>] print_usage_bug.part.27+0x49c/0x4bc > kernel/locking/lockdep.c:2387 > [< inline >] print_usage_bug kernel/locking/lockdep.c:2357 > [< inline >] valid_state kernel/locking/lockdep.c:2400 > [< inline >] mark_lock_irq kernel/locking/lockdep.c:2617 > [<ffff2000081c89ec>] mark_lock+0x934/0xb60 kernel/locking/lockdep.c:3065 > [< inline >] mark_irqflags kernel/locking/lockdep.c:2923 > [<ffff2000081c9a60>] __lock_acquire+0x640/0x3260 kernel/locking/lockdep.c:3295 > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753 > [< inline >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144 > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151 > [< inline >] spin_lock ./include/linux/spinlock.h:302 > [<ffff2000081678c8>] put_ucounts+0x60/0x138 kernel/ucount.c:162 > [<ffff200008168364>] dec_ucount+0xf4/0x158 kernel/ucount.c:214 > [< inline >] dec_pid_namespaces kernel/pid_namespace.c:89 > [<ffff200008293dc8>] delayed_free_pidns+0x40/0xe0 kernel/pid_namespace.c:156 > [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118 > [< inline >] rcu_do_batch kernel/rcu/tree.c:2919 > [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3182 > [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3149 > [<ffff2000082103d8>] rcu_process_callbacks+0x768/0xc28 kernel/rcu/tree.c:3166 > [<ffff2000080821dc>] __do_softirq+0x324/0x6e0 kernel/softirq.c:284 > [< inline >] do_softirq_own_stack ./include/linux/interrupt.h:488 > [< inline >] invoke_softirq kernel/softirq.c:371 > [<ffff20000811c994>] irq_exit+0x264/0x308 kernel/softirq.c:405 > [<ffff2000081ecc28>] __handle_domain_irq+0xc0/0x150 kernel/irq/irqdesc.c:636 > [<ffff200008081c80>] gic_handle_irq+0x68/0xd8 > Exception stack(0xffff8000648e7dd0 to 0xffff8000648e7f00) > 7dc0: ffff8000648d4b3c 0000000000000007 > 7de0: 0000000000000000 1ffff0000c91a967 1ffff0000c91a967 1ffff0000c91a967 > 7e00: ffff20000a4b6b68 0000000000000001 0000000000000007 0000000000000001 > 7e20: 1fffe4000149ae90 ffff200009d35000 0000000000000000 0000000000000002 > 7e40: 0000000000000000 0000000000000000 0000000002624a1a 0000000000000000 > 7e60: 0000000000000000 ffff200009cbcd88 000060006d2ed000 0000000000000140 > 7e80: ffff200009cff000 ffff200009cb6000 ffff200009cc2020 ffff200009d2159d > 7ea0: 0000000000000000 ffff8000648d4380 0000000000000000 ffff8000648e7f00 > 7ec0: ffff20000820a478 ffff8000648e7f00 ffff20000820a47c 0000000010000145 > 7ee0: 0000000000000140 dfff200000000000 ffffffffffffffff ffff20000820a478 > [<ffff2000080837f8>] el1_irq+0xb8/0x130 arch/arm64/kernel/entry.S:486 > [< inline >] arch_local_irq_restore > ./arch/arm64/include/asm/irqflags.h:81 > [<ffff20000820a47c>] rcu_idle_exit+0x64/0xa8 kernel/rcu/tree.c:1030 > [< inline >] cpuidle_idle_call kernel/sched/idle.c:200 > [<ffff2000081bcbfc>] do_idle+0x1dc/0x2d0 kernel/sched/idle.c:243 > [<ffff2000081bd1cc>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:345 > [<ffff200008099f8c>] secondary_start_kernel+0x2cc/0x358 > arch/arm64/kernel/smp.c:276 > [<000000000279f1a4>] 0x279f1a4 Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Fixes: add7c65ca426 ("pid: fix lockdep deadlock warning due to ucount_lock") Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces") Cc: stable@vger.kernel.org Link: https://www.spinics.net/lists/kernel/msg2426637.html Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-01-23iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP anymoreEric Auger
IOMMU_CAP_INTR_REMAP has been advertised in arm-smmu(-v3) although on ARM this property is not attached to the IOMMU but rather is implemented in the MSI controller (GICv3 ITS). Now vfio_iommu_type1 checks MSI remapping capability at MSI controller level, let's correct this. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23vfio/type1: Check MSI remapping at irq domain levelEric Auger
In case the IOMMU translates MSI transactions (typical case on ARM), we check MSI remapping capability at IRQ domain level. Otherwise it is checked at IOMMU level. At this stage the arm-smmu-(v3) still advertise the IOMMU_CAP_INTR_REMAP capability at IOMMU level. This will be removed in subsequent patches. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23vfio/type1: Allow transparent MSI IOVA allocationEric Auger
When attaching a group to the container, check the group's reserved regions and test whether the IOMMU translates MSI transactions. If yes, we initialize an IOVA allocator through the iommu_get_msi_cookie API. This will allow the MSI IOVAs to be transparently allocated on MSI controller's compose(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23irqchip/gicv3-its: Sets IRQ_DOMAIN_FLAG_MSI_REMAPEric Auger
The GICv3 ITS is MSI remapping capable. Let's advertise this property so that VFIO passthrough can assess IRQ safety. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23irqdomain: irq_domain_check_msi_remapEric Auger
This new function checks whether all MSI irq domains implement IRQ remapping. This is useful to understand whether VFIO passthrough is safe with respect to interrupts. On ARM typically an MSI controller can sit downstream to the IOMMU without preventing VFIO passthrough. As such any assigned device can write into the MSI doorbell. In case the MSI controller implements IRQ remapping, assigned devices will not be able to trigger interrupts towards the host. On the contrary, the assignment must be emphasized as unsafe with respect to interrupts. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23genirq/msi: Set IRQ_DOMAIN_FLAG_MSI on MSI domain creationEric Auger
Now we have a flag value indicating an IRQ domain implements MSI, let's set it on msi_create_irq_domain(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23irqdomain: Add irq domain MSI and MSI_REMAP flagsEric Auger
We introduce two new enum values for the irq domain flag: - IRQ_DOMAIN_FLAG_MSI indicates the irq domain corresponds to an MSI domain - IRQ_DOMAIN_FLAG_MSI_REMAP indicates the irq domain has MSI remapping capabilities. Those values will be useful to check all MSI irq domains have MSI remapping support when assessing the safety of IRQ assignment to a guest. irq_domain_hierarchical_is_msi_remap() allows to check if an irq domain or any parent implements MSI remapping. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu/arm-smmu-v3: Implement reserved region get/put callbacksEric Auger
The get() populates the list with the MSI IOVA reserved window. At the moment an arbitray MSI IOVA window is set at 0x8000000 of size 1MB. This will allow to report those info in iommu-group sysfs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu/arm-smmu: Implement reserved region get/put callbacksEric Auger
The get() populates the list with the MSI IOVA reserved window. At the moment an arbitray MSI IOVA window is set at 0x8000000 of size 1MB. This will allow to report those info in iommu-group sysfs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modesArd Biesheuvel
Update the ARMv8 Crypto Extensions and the plain NEON AES implementations in CBC and CTR modes to return the next IV back to the skcipher API client. This is necessary for chaining to work correctly. Note that for CTR, this is only done if the request is a round multiple of the block size, since otherwise, chaining is impossible anyway. Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an algSalvatore Benedetto
Make sure CRYPTO_ALG_DEAD bit is cleared before proceeding with the algorithm registration. This fixes qat-dh registration when driver is restarted Cc: <stable@vger.kernel.org> Signed-off-by: Salvatore Benedetto <salvatore.benedetto@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23iommu/amd: Declare MSI and HT regions as reserved IOVA regionsEric Auger
This patch registers the MSI and HT regions as non mappable reserved regions. They will be exposed in the iommu-group sysfs. For direct-mapped regions let's also use iommu_alloc_resv_region(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu/vt-d: Implement reserved region get/put callbacksEric Auger
This patch registers the [FEE0_0000h - FEF0_000h] 1MB MSI range as a reserved region and RMRR regions as direct regions. This will allow to report those reserved regions in the iommu-group sysfs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Implement reserved_regions iommu-group sysfs fileEric Auger
A new iommu-group sysfs attribute file is introduced. It contains the list of reserved regions for the iommu-group. Each reserved region is described on a separate line: - first field is the start IOVA address, - second is the end IOVA address, - third is the type. Signed-off-by: Eric Auger <eric.auger@redhat.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: iommu_get_group_resv_regionsEric Auger
Introduce iommu_get_group_resv_regions whose role consists in enumerating all devices from the group and collecting their reserved regions. The list is sorted and overlaps between regions of the same type are handled by merging the regions. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Only map direct mapped regionsEric Auger
As we introduced new reserved region types which do not require mapping, let's make sure we only map direct mapped regions. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: iommu_alloc_resv_regionEric Auger
Introduce a new helper serving the purpose to allocate a reserved region. This will be used in iommu driver implementing reserved region callbacks. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Add a new type field in iommu_resv_regionEric Auger
We introduce a new field to differentiate the reserved region types and specialize the apply_resv_region implementation. Legacy direct mapped regions have IOMMU_RESV_DIRECT type. We introduce 2 new reserved memory types: - IOMMU_RESV_MSI will characterize MSI regions that are mapped - IOMMU_RESV_RESERVED characterize regions that cannot by mapped. Signed-off-by: Eric Auger <eric.auger@redhat.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Rename iommu_dm_regions into iommu_resv_regionsEric Auger
We want to extend the callbacks used for dm regions and use them for reserved regions. Reserved regions can be - directly mapped regions - regions that cannot be iommu mapped (PCI host bridge windows, ...) - MSI regions (because they belong to another address space or because they are not translated by the IOMMU and need special handling) So let's rename the struct and also the callbacks. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu/dma: Allow MSI-only cookiesRobin Murphy
IOMMU domain users such as VFIO face a similar problem to DMA API ops with regard to mapping MSI messages in systems where the MSI write is subject to IOMMU translation. With the relevant infrastructure now in place for managed DMA domains, it's actually really simple for other users to piggyback off that and reap the benefits without giving up their own IOVA management, and without having to reinvent their own wheel in the MSI layer. Allow such users to opt into automatic MSI remapping by dedicating a region of their IOVA space to a managed cookie, and extend the mapping routine to implement a trivial linear allocator in such cases, to avoid the needless overhead of a full-blown IOVA domain. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iwlwifi: mvm: avoid crash on restart w/o reserved queuesJohannes Berg
When the firmware restarts in a situation in which any station has no queue reserved anymore because that queue was used, the code will crash trying to access the queue_info array at the offset 255, which is far too big. Fix this by checking that a queue is actually reserved before writing its status. Fixes: 8d98ae6eb0d5 ("iwlwifi: mvm: re-assign old queues after hw restart in dqa mode") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-01-23iwlwifi: fix double hyphen in MODULE_FIRMWARE for 8000Jürg Billeter
Mistakenly, the driver is trying to load the 8000C firmware with an incorrect name (i.e. with two hyphens where there should be only one) and that fails. Fix that by removing the hyphen from the format macro. Fixes: e1ba684f762b ("iwlwifi: 8000: fix MODULE_FIRMWARE input") Signed-off-by: Jürg Billeter <j@bitron.ch> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-01-23EDAC, sb_edac: Get rid of ->show_interleave_mode()Nicolas Iooss
Function sbridge_register_mci() sets pvt->info.show_interleave_mode to knl_show_interleave_mode() on Knight's Landing and show_interleave_mode() anywhere else. Merge show_interleave_mode() and knl_show_interleave_mode() in a single implementation and use it without an indirect function pointer. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org [ Call it get_intlv_mode_str(). ] Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-23ARM: dts: imx6dl: fix GPIO4 rangeSébastien Szymanski
GPIO4_11 is on pin 152(MX6DL_PAD_KEY_ROW2) and not on pin 151(MX6DL_PAD_KEY_ROW1). I found the error while booting a mainline kernel on APF6S SoM and noticed the following message: [ 2.609337] imx6dl-pinctrl 20e0000.iomuxc: pin MX6DL_PAD_KEY_ROW1 already requested by 20a8000.gpio:105; cannot claim for 20a8000.gpio:107 [ 2.621884] imx6dl-pinctrl 20e0000.iomuxc: pin-151 (20a8000.gpio:107) status -22 [ 2.629303] spi_imx 2008000.ecspi: Can't get CS GPIO 107 With this patch, the message is gone and spi_imx driver probes correctly. Fixes: bb728d662bed ("ARM: dts: add gpio-ranges property to iMX GPIO controllers") Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>