summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-01-23rcu: Re-enable TASKS_RCU for User Mode LinuxPaul E. McKenney
Now that User Mode Linux supports arch_irqs_disabled_flags(), this commit re-enables TASKS_RCU for User Mode Linux. Reported-by: Richard Weinberger <richard@nod.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Once again use NMI-based stack traces in stall warningsPaul E. McKenney
This commit is for all intents and purposes a revert of bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks"). The reason to suppose that this can now safely be reverted is the presence of 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI"), which is said to have made NMI-based stack dumps safe. However, this reversion keeps one nice property of bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks"), namely that only those CPUs blocking the grace period are dumped. The new trigger_single_cpu_backtrace() is used to make this happen, as suggested by Josh Poimboeuf. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Remove short-term CPU kickingPaul E. McKenney
Commit 4914950aaa12d ("rcu: Stop treating in-kernel CPU-bound workloads as errors") added a (relatively) short-timeout call to resched_cpu(). This was inspired by as issue that was fixed by b7e7ade34e61 ("sched/core: Fix remote wakeups"). But given that this issue was fixed, it is time for the current commit to remove this call to resched_cpu(). Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Add long-term CPU kickingPaul E. McKenney
This commit prepares for the removal of short-term CPU kicking (in a subsequent commit). It does so by starting to invoke resched_cpu() for each holdout at each force-quiescent-state interval that is more than halfway through the stall-warning interval. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Remove unused but set variableTobias Klauser
Since commit 7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU"), the variable mask in rcu_init_percpu_data is set but no longer used. Remove it to fix the following warning when building with 'W=1': kernel/rcu/tree.c: In function ‘rcu_init_percpu_data’: kernel/rcu/tree.c:3765:16: warning: variable ‘mask’ set but not used [-Wunused-but-set-variable] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Remove unneeded rcu_process_callbacks() declarationsPaul E. McKenney
The declarations of __rcu_process_callbacks() and rcu_process_callbacks() are not needed, as the definition of both of these functions appear before any uses. This commit therefore removes both declarations. Reported-by: "Ahmed, Iftekhar" <ahmedi@oregonstate.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23rcu: Only dump stalled-tasks stacks if there was a real stallByungchul Park
The print_other_cpu_stall() function currently unconditionally invokes rcu_print_detail_task_stall(). This is OK because if there was a stall sufficient to cause print_other_cpu_stall() to be invoked, that stall is very likely to persist through the entire print_other_cpu_stall() execution. However, if the stall did not persist, the variable ndetected will be zero, and that variable is already tested in an "if" statement. Therefore, this commit moves the call to rcu_print_detail_task_stall() under that pre-existing "if" to improve readability, with a very rare reduction in overhead. Signed-off-by: Byungchul Park <byungchul.park@lge.com> [ paulmck: Reworked commit log. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23Fix: Disable sys_membarrier when nohz_full is enabledMathieu Desnoyers
Userspace applications should be allowed to expect the membarrier system call with MEMBARRIER_CMD_SHARED command to issue memory barriers on nohz_full CPUs, but synchronize_sched() does not take those into account. Given that we do not want unrelated processes to be able to affect real-time sensitive nohz_full CPUs, simply return ENOSYS when membarrier is invoked on a kernel with enabled nohz_full CPUs. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: <stable@vger.kernel.org> [3.10+] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23lockdep: Make RCU suspicious-access splats use pr_errPaul E. McKenney
This commit switches RCU suspicious-access splats use pr_err() instead of the current INFO printk()s. This change makes it easier to automatically classify splats. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-01-23xfs: fix COW writeback raceChristoph Hellwig
Due to the way how xfs_iomap_write_allocate tries to convert the whole found extents from delalloc to real space we can run into a race condition with multiple threads doing writes to this same extent. For the non-COW case that is harmless as the only thing that can happen is that we call xfs_bmapi_write on an extent that has already been converted to a real allocation. For COW writes where we move the extent from the COW to the data fork after I/O completion the race is, however, not quite as harmless. In the worst case we are now calling xfs_bmapi_write on a region that contains hole in the COW work, which will trip up an assert in debug builds or lead to file system corruption in non-debug builds. This seems to be reproducible with workloads of small O_DSYNC write, although so far I've not managed to come up with a with an isolated reproducer. The fix for the issue is relatively simple: tell xfs_bmapi_write that we are only asked to convert delayed allocations and skip holes in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-23xen-blkfront: correct maximum segment accountingJan Beulich
Making use of "max_indirect_segments=" has issues: - blkfront_setup_indirect() may end up with zero psegs when PAGE_SIZE is sufficiently much larger than XEN_PAGE_SIZE - the variable driven by the command line option (xen_blkif_max_segments) has a somewhat different purpose, and hence should namely never end up being zero - as long as the specified value is lower than the legacy default, we better don't use indirect segments at all (or we'd in fact lower throughput) Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2017-01-23xen-blkfront: feature flags handling adjustmentsJan Beulich
Don't truncate the "feature-persistent" value read from xenstore: Any non-zero value is supposed to enable the feature, just like is already being done for feature_secdiscard. Just like the other feature_* fields, feature_flush and feature_fua are boolean flags, and hence fit well into a single bit. Keep all bit fields together to limit gaps. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2017-01-23regulator: anatop: Add support for "anatop-enable-bit"Andrey Smirnov
Add code to support support for "anatop-enable-bit" device-tree property. This property translates to LINREG_ENABLE bit in real hardware and is present on 1p1, 2p5 and 3p0 regulators on i.MX6 and 1p0d regulator on i.MX7. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-01-23regulator: axp20x: AXP806: Fix dcdcb being set instead of dcdceRask Ingemann Lambertsen
A typo or copy-paste bug means that the register access intended for regulator dcdce goes to dcdcb instead. This patch corrects it. Fixes: 2ca342d391e3 (regulator: axp20x: Support AXP806 variant) Signed-off-by: Rask Ingemann Lambertsen <rask@formelder.dk> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
2017-01-23spi: pca2xx-pci: Allow MSIJan Kiszka
Now that the core is ready for edge-triggered interrupts, we can safely allow the PCI versions that provide this to enable the feature and, thus, have less shared interrupts. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-01-23spi: pxa2xx: Prepare for edge-triggered interruptsJan Kiszka
When using the a device with edge-triggered interrupts, such as MSIs, the interrupt handler has to ensure that there is a point in time during its execution where all interrupts sources are silent so that a new event can trigger a new interrupt again. This is achieved here by disabling all interrupt sources for a moment before processing them according to the status register. If a new interrupt should have arrived after we read the status, it will now re-trigger the interrupt, even in edge mode. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-01-23regulator: tps65217: Allow DCDC1 and DCDC3 up to 3.3VMåns Andersson
The data sheet statement that DCDC1 and DCDC3 only can be set in the range 0.9V - 1.5V refers to storage on its internal EEPROM and therefore cold boot configuration. After power-on the device can be reconfigured over i2c and DCDC1/3 set up to 3.3V. Signed-off-by: Måns Andersson <mans.andersson@nibe.se> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-01-23net: mpls: Fix multipath selection for LSR use caseDavid Ahern
MPLS multipath for LSR is broken -- always selecting the first nexthop in the one label case. For example: $ ip -f mpls ro ls 100 nexthop as to 200 via inet 172.16.2.2 dev virt12 nexthop as to 300 via inet 172.16.3.2 dev virt13 101 nexthop as to 201 via inet6 2000:2::2 dev virt12 nexthop as to 301 via inet6 2000:3::2 dev virt13 In this example incoming packets have a single MPLS labels which means BOS bit is set. The BOS bit is passed from mpls_forward down to mpls_multipath_hash which never processes the hash loop because BOS is 1. Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len tracks the total mpls header length on each pass (on pass N mpls_hdr_len is N * sizeof(mpls_shim_hdr)). When the label is found with the BOS set it verifies the skb has sufficient header for ipv4 or ipv6, and find the IPv4 and IPv6 header by using the last mpls_hdr pointer and adding 1 to advance past it. With these changes I have verified the code correctly sees the label, BOS, IPv4 and IPv6 addresses in the network header and icmp/tcp/udp traffic for ipv4 and ipv6 are distributed across the nexthops. Fixes: 1c78efa8319ca ("mpls: flow-based multipath selection") Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24userns: Make ucounts lock irq-safeNikolay Borisov
The ucounts_lock is being used to protect various ucounts lifecycle management functionalities. However, those services can also be invoked when a pidns is being freed in an RCU callback (e.g. softirq context). This can lead to deadlocks. There were already efforts trying to prevent similar deadlocks in add7c65ca426 ("pid: fix lockdep deadlock warning due to ucount_lock"), however they just moved the context from hardirq to softrq. Fix this issue once and for all by explictly making the lock disable irqs altogether. Dmitry Vyukov <dvyukov@google.com> reported: > I've got the following deadlock report while running syzkaller fuzzer > on eec0d3d065bfcdf9cd5f56dd2a36b94d12d32297 of linux-next (on odroid > device if it matters): > > ================================= > [ INFO: inconsistent lock state ] > 4.10.0-rc3-next-20170112-xc2-dirty #6 Not tainted > --------------------------------- > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. > swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes: > (ucounts_lock){+.?...}, at: [< inline >] spin_lock > ./include/linux/spinlock.h:302 > (ucounts_lock){+.?...}, at: [<ffff2000081678c8>] > put_ucounts+0x60/0x138 kernel/ucount.c:162 > {SOFTIRQ-ON-W} state was registered at: > [<ffff2000081c82d8>] mark_lock+0x220/0xb60 kernel/locking/lockdep.c:3054 > [< inline >] mark_irqflags kernel/locking/lockdep.c:2941 > [<ffff2000081c97a8>] __lock_acquire+0x388/0x3260 kernel/locking/lockdep.c:3295 > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753 > [< inline >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144 > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151 > [< inline >] spin_lock ./include/linux/spinlock.h:302 > [< inline >] get_ucounts kernel/ucount.c:131 > [<ffff200008167c28>] inc_ucount+0x80/0x6c8 kernel/ucount.c:189 > [< inline >] inc_mnt_namespaces fs/namespace.c:2818 > [<ffff200008481850>] alloc_mnt_ns+0x78/0x3a8 fs/namespace.c:2849 > [<ffff200008487298>] create_mnt_ns+0x28/0x200 fs/namespace.c:2959 > [< inline >] init_mount_tree fs/namespace.c:3199 > [<ffff200009bd6674>] mnt_init+0x258/0x384 fs/namespace.c:3251 > [<ffff200009bd60bc>] vfs_caches_init+0x6c/0x80 fs/dcache.c:3626 > [<ffff200009bb1114>] start_kernel+0x414/0x460 init/main.c:648 > [<ffff200009bb01e8>] __primary_switched+0x6c/0x70 arch/arm64/kernel/head.S:456 > irq event stamp: 2316924 > hardirqs last enabled at (2316924): [< inline >] rcu_do_batch > kernel/rcu/tree.c:2911 > hardirqs last enabled at (2316924): [< inline >] > invoke_rcu_callbacks kernel/rcu/tree.c:3182 > hardirqs last enabled at (2316924): [< inline >] > __rcu_process_callbacks kernel/rcu/tree.c:3149 > hardirqs last enabled at (2316924): [<ffff200008210414>] > rcu_process_callbacks+0x7a4/0xc28 kernel/rcu/tree.c:3166 > hardirqs last disabled at (2316923): [< inline >] rcu_do_batch > kernel/rcu/tree.c:2900 > hardirqs last disabled at (2316923): [< inline >] > invoke_rcu_callbacks kernel/rcu/tree.c:3182 > hardirqs last disabled at (2316923): [< inline >] > __rcu_process_callbacks kernel/rcu/tree.c:3149 > hardirqs last disabled at (2316923): [<ffff20000820fe80>] > rcu_process_callbacks+0x210/0xc28 kernel/rcu/tree.c:3166 > softirqs last enabled at (2316912): [<ffff20000811b4c4>] > _local_bh_enable+0x4c/0x80 kernel/softirq.c:155 > softirqs last disabled at (2316913): [< inline >] > do_softirq_own_stack ./include/linux/interrupt.h:488 > softirqs last disabled at (2316913): [< inline >] > invoke_softirq kernel/softirq.c:371 > softirqs last disabled at (2316913): [<ffff20000811c994>] > irq_exit+0x264/0x308 kernel/softirq.c:405 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(ucounts_lock); > <Interrupt> > lock(ucounts_lock); > > *** DEADLOCK *** > > 1 lock held by swapper/2/0: > #0: (rcu_callback){......}, at: [< inline >] __rcu_reclaim > kernel/rcu/rcu.h:108 > #0: (rcu_callback){......}, at: [< inline >] rcu_do_batch > kernel/rcu/tree.c:2919 > #0: (rcu_callback){......}, at: [< inline >] > invoke_rcu_callbacks kernel/rcu/tree.c:3182 > #0: (rcu_callback){......}, at: [< inline >] > __rcu_process_callbacks kernel/rcu/tree.c:3149 > #0: (rcu_callback){......}, at: [<ffff200008210390>] > rcu_process_callbacks+0x720/0xc28 kernel/rcu/tree.c:3166 > > stack backtrace: > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-rc3-next-20170112-xc2-dirty #6 > Hardware name: Hardkernel ODROID-C2 (DT) > Call trace: > [<ffff20000808fa60>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:500 > [<ffff20000808fec0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:225 > [<ffff2000088a99e0>] dump_stack+0x110/0x168 > [<ffff2000082fa2b4>] print_usage_bug.part.27+0x49c/0x4bc > kernel/locking/lockdep.c:2387 > [< inline >] print_usage_bug kernel/locking/lockdep.c:2357 > [< inline >] valid_state kernel/locking/lockdep.c:2400 > [< inline >] mark_lock_irq kernel/locking/lockdep.c:2617 > [<ffff2000081c89ec>] mark_lock+0x934/0xb60 kernel/locking/lockdep.c:3065 > [< inline >] mark_irqflags kernel/locking/lockdep.c:2923 > [<ffff2000081c9a60>] __lock_acquire+0x640/0x3260 kernel/locking/lockdep.c:3295 > [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753 > [< inline >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144 > [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151 > [< inline >] spin_lock ./include/linux/spinlock.h:302 > [<ffff2000081678c8>] put_ucounts+0x60/0x138 kernel/ucount.c:162 > [<ffff200008168364>] dec_ucount+0xf4/0x158 kernel/ucount.c:214 > [< inline >] dec_pid_namespaces kernel/pid_namespace.c:89 > [<ffff200008293dc8>] delayed_free_pidns+0x40/0xe0 kernel/pid_namespace.c:156 > [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118 > [< inline >] rcu_do_batch kernel/rcu/tree.c:2919 > [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3182 > [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3149 > [<ffff2000082103d8>] rcu_process_callbacks+0x768/0xc28 kernel/rcu/tree.c:3166 > [<ffff2000080821dc>] __do_softirq+0x324/0x6e0 kernel/softirq.c:284 > [< inline >] do_softirq_own_stack ./include/linux/interrupt.h:488 > [< inline >] invoke_softirq kernel/softirq.c:371 > [<ffff20000811c994>] irq_exit+0x264/0x308 kernel/softirq.c:405 > [<ffff2000081ecc28>] __handle_domain_irq+0xc0/0x150 kernel/irq/irqdesc.c:636 > [<ffff200008081c80>] gic_handle_irq+0x68/0xd8 > Exception stack(0xffff8000648e7dd0 to 0xffff8000648e7f00) > 7dc0: ffff8000648d4b3c 0000000000000007 > 7de0: 0000000000000000 1ffff0000c91a967 1ffff0000c91a967 1ffff0000c91a967 > 7e00: ffff20000a4b6b68 0000000000000001 0000000000000007 0000000000000001 > 7e20: 1fffe4000149ae90 ffff200009d35000 0000000000000000 0000000000000002 > 7e40: 0000000000000000 0000000000000000 0000000002624a1a 0000000000000000 > 7e60: 0000000000000000 ffff200009cbcd88 000060006d2ed000 0000000000000140 > 7e80: ffff200009cff000 ffff200009cb6000 ffff200009cc2020 ffff200009d2159d > 7ea0: 0000000000000000 ffff8000648d4380 0000000000000000 ffff8000648e7f00 > 7ec0: ffff20000820a478 ffff8000648e7f00 ffff20000820a47c 0000000010000145 > 7ee0: 0000000000000140 dfff200000000000 ffffffffffffffff ffff20000820a478 > [<ffff2000080837f8>] el1_irq+0xb8/0x130 arch/arm64/kernel/entry.S:486 > [< inline >] arch_local_irq_restore > ./arch/arm64/include/asm/irqflags.h:81 > [<ffff20000820a47c>] rcu_idle_exit+0x64/0xa8 kernel/rcu/tree.c:1030 > [< inline >] cpuidle_idle_call kernel/sched/idle.c:200 > [<ffff2000081bcbfc>] do_idle+0x1dc/0x2d0 kernel/sched/idle.c:243 > [<ffff2000081bd1cc>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:345 > [<ffff200008099f8c>] secondary_start_kernel+0x2cc/0x358 > arch/arm64/kernel/smp.c:276 > [<000000000279f1a4>] 0x279f1a4 Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Fixes: add7c65ca426 ("pid: fix lockdep deadlock warning due to ucount_lock") Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces") Cc: stable@vger.kernel.org Link: https://www.spinics.net/lists/kernel/msg2426637.html Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-01-23cfq-iosched: Adjust one function call together with a variable assignmentMarkus Elfring
The script "checkpatch.pl" pointed information out like the following. ERROR: do not use assignment in if condition Thus fix the affected source code place. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-23blk-throttle: Adjust two function calls together with a variable assignmentMarkus Elfring
The script "checkpatch.pl" pointed information out like the following. ERROR: do not use assignment in if condition Thus fix the affected source code places. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-23block: Initialize cfqq->ioprio_class in cfq_get_queue()Alexander Potapenko
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of uninitialized memory in cfq_init_cfqq(): ================================================================== BUG: KMSAN: use of unitialized memory ... Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51 [<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:? [<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:? [< inline >] cfq_init_cfqq block/cfq-iosched.c:3754 [<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857 ... origin: [<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67 [<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:? [<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:? [< inline >] allocate_slab mm/slub.c:1627 [<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641 [< inline >] new_slab_objects mm/slub.c:2407 [<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564 [< inline >] __slab_alloc mm/slub.c:2606 [< inline >] slab_alloc_node mm/slub.c:2669 [<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746 [<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850 ... ================================================================== (the line numbers are relative to 4.8-rc6, but the bug persists upstream) The uninitialized struct cfq_queue is created by kmem_cache_alloc_node() and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class before it's initialized. Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-23iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP anymoreEric Auger
IOMMU_CAP_INTR_REMAP has been advertised in arm-smmu(-v3) although on ARM this property is not attached to the IOMMU but rather is implemented in the MSI controller (GICv3 ITS). Now vfio_iommu_type1 checks MSI remapping capability at MSI controller level, let's correct this. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23vfio/type1: Check MSI remapping at irq domain levelEric Auger
In case the IOMMU translates MSI transactions (typical case on ARM), we check MSI remapping capability at IRQ domain level. Otherwise it is checked at IOMMU level. At this stage the arm-smmu-(v3) still advertise the IOMMU_CAP_INTR_REMAP capability at IOMMU level. This will be removed in subsequent patches. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23vfio/type1: Allow transparent MSI IOVA allocationEric Auger
When attaching a group to the container, check the group's reserved regions and test whether the IOMMU translates MSI transactions. If yes, we initialize an IOVA allocator through the iommu_get_msi_cookie API. This will allow the MSI IOVAs to be transparently allocated on MSI controller's compose(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23irqchip/gicv3-its: Sets IRQ_DOMAIN_FLAG_MSI_REMAPEric Auger
The GICv3 ITS is MSI remapping capable. Let's advertise this property so that VFIO passthrough can assess IRQ safety. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23irqdomain: irq_domain_check_msi_remapEric Auger
This new function checks whether all MSI irq domains implement IRQ remapping. This is useful to understand whether VFIO passthrough is safe with respect to interrupts. On ARM typically an MSI controller can sit downstream to the IOMMU without preventing VFIO passthrough. As such any assigned device can write into the MSI doorbell. In case the MSI controller implements IRQ remapping, assigned devices will not be able to trigger interrupts towards the host. On the contrary, the assignment must be emphasized as unsafe with respect to interrupts. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23genirq/msi: Set IRQ_DOMAIN_FLAG_MSI on MSI domain creationEric Auger
Now we have a flag value indicating an IRQ domain implements MSI, let's set it on msi_create_irq_domain(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23irqdomain: Add irq domain MSI and MSI_REMAP flagsEric Auger
We introduce two new enum values for the irq domain flag: - IRQ_DOMAIN_FLAG_MSI indicates the irq domain corresponds to an MSI domain - IRQ_DOMAIN_FLAG_MSI_REMAP indicates the irq domain has MSI remapping capabilities. Those values will be useful to check all MSI irq domains have MSI remapping support when assessing the safety of IRQ assignment to a guest. irq_domain_hierarchical_is_msi_remap() allows to check if an irq domain or any parent implements MSI remapping. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu/arm-smmu-v3: Implement reserved region get/put callbacksEric Auger
The get() populates the list with the MSI IOVA reserved window. At the moment an arbitray MSI IOVA window is set at 0x8000000 of size 1MB. This will allow to report those info in iommu-group sysfs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu/arm-smmu: Implement reserved region get/put callbacksEric Auger
The get() populates the list with the MSI IOVA reserved window. At the moment an arbitray MSI IOVA window is set at 0x8000000 of size 1MB. This will allow to report those info in iommu-group sysfs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modesArd Biesheuvel
Update the ARMv8 Crypto Extensions and the plain NEON AES implementations in CBC and CTR modes to return the next IV back to the skcipher API client. This is necessary for chaining to work correctly. Note that for CTR, this is only done if the request is a round multiple of the block size, since otherwise, chaining is impossible anyway. Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an algSalvatore Benedetto
Make sure CRYPTO_ALG_DEAD bit is cleared before proceeding with the algorithm registration. This fixes qat-dh registration when driver is restarted Cc: <stable@vger.kernel.org> Signed-off-by: Salvatore Benedetto <salvatore.benedetto@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23iommu/amd: Declare MSI and HT regions as reserved IOVA regionsEric Auger
This patch registers the MSI and HT regions as non mappable reserved regions. They will be exposed in the iommu-group sysfs. For direct-mapped regions let's also use iommu_alloc_resv_region(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu/vt-d: Implement reserved region get/put callbacksEric Auger
This patch registers the [FEE0_0000h - FEF0_000h] 1MB MSI range as a reserved region and RMRR regions as direct regions. This will allow to report those reserved regions in the iommu-group sysfs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Implement reserved_regions iommu-group sysfs fileEric Auger
A new iommu-group sysfs attribute file is introduced. It contains the list of reserved regions for the iommu-group. Each reserved region is described on a separate line: - first field is the start IOVA address, - second is the end IOVA address, - third is the type. Signed-off-by: Eric Auger <eric.auger@redhat.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: iommu_get_group_resv_regionsEric Auger
Introduce iommu_get_group_resv_regions whose role consists in enumerating all devices from the group and collecting their reserved regions. The list is sorted and overlaps between regions of the same type are handled by merging the regions. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Only map direct mapped regionsEric Auger
As we introduced new reserved region types which do not require mapping, let's make sure we only map direct mapped regions. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: iommu_alloc_resv_regionEric Auger
Introduce a new helper serving the purpose to allocate a reserved region. This will be used in iommu driver implementing reserved region callbacks. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Add a new type field in iommu_resv_regionEric Auger
We introduce a new field to differentiate the reserved region types and specialize the apply_resv_region implementation. Legacy direct mapped regions have IOMMU_RESV_DIRECT type. We introduce 2 new reserved memory types: - IOMMU_RESV_MSI will characterize MSI regions that are mapped - IOMMU_RESV_RESERVED characterize regions that cannot by mapped. Signed-off-by: Eric Auger <eric.auger@redhat.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu: Rename iommu_dm_regions into iommu_resv_regionsEric Auger
We want to extend the callbacks used for dm regions and use them for reserved regions. Reserved regions can be - directly mapped regions - regions that cannot be iommu mapped (PCI host bridge windows, ...) - MSI regions (because they belong to another address space or because they are not translated by the IOMMU and need special handling) So let's rename the struct and also the callbacks. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iommu/dma: Allow MSI-only cookiesRobin Murphy
IOMMU domain users such as VFIO face a similar problem to DMA API ops with regard to mapping MSI messages in systems where the MSI write is subject to IOMMU translation. With the relevant infrastructure now in place for managed DMA domains, it's actually really simple for other users to piggyback off that and reap the benefits without giving up their own IOVA management, and without having to reinvent their own wheel in the MSI layer. Allow such users to opt into automatic MSI remapping by dedicating a region of their IOVA space to a managed cookie, and extend the mapping routine to implement a trivial linear allocator in such cases, to avoid the needless overhead of a full-blown IOVA domain. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com> Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-01-23iwlwifi: mvm: avoid crash on restart w/o reserved queuesJohannes Berg
When the firmware restarts in a situation in which any station has no queue reserved anymore because that queue was used, the code will crash trying to access the queue_info array at the offset 255, which is far too big. Fix this by checking that a queue is actually reserved before writing its status. Fixes: 8d98ae6eb0d5 ("iwlwifi: mvm: re-assign old queues after hw restart in dqa mode") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-01-23iwlwifi: fix double hyphen in MODULE_FIRMWARE for 8000Jürg Billeter
Mistakenly, the driver is trying to load the 8000C firmware with an incorrect name (i.e. with two hyphens where there should be only one) and that fails. Fix that by removing the hyphen from the format macro. Fixes: e1ba684f762b ("iwlwifi: 8000: fix MODULE_FIRMWARE input") Signed-off-by: Jürg Billeter <j@bitron.ch> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2017-01-23EDAC, sb_edac: Get rid of ->show_interleave_mode()Nicolas Iooss
Function sbridge_register_mci() sets pvt->info.show_interleave_mode to knl_show_interleave_mode() on Knight's Landing and show_interleave_mode() anywhere else. Merge show_interleave_mode() and knl_show_interleave_mode() in a single implementation and use it without an indirect function pointer. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org [ Call it get_intlv_mode_str(). ] Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-23ARM: dts: imx6dl: fix GPIO4 rangeSébastien Szymanski
GPIO4_11 is on pin 152(MX6DL_PAD_KEY_ROW2) and not on pin 151(MX6DL_PAD_KEY_ROW1). I found the error while booting a mainline kernel on APF6S SoM and noticed the following message: [ 2.609337] imx6dl-pinctrl 20e0000.iomuxc: pin MX6DL_PAD_KEY_ROW1 already requested by 20a8000.gpio:105; cannot claim for 20a8000.gpio:107 [ 2.621884] imx6dl-pinctrl 20e0000.iomuxc: pin-151 (20a8000.gpio:107) status -22 [ 2.629303] spi_imx 2008000.ecspi: Can't get CS GPIO 107 With this patch, the message is gone and spi_imx driver probes correctly. Fixes: bb728d662bed ("ARM: dts: add gpio-ranges property to iMX GPIO controllers") Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2017-01-23HID: wacom: do not attempt to switch mode while in probeBenjamin Tissoires
The Intuos Pro seems to not like when we set the features right after being powered up. Instead of waiting during probe, we can schedule the switch mode and LED control in a deferred worker so that we don't have the 5 secs of delay from USB when the device is not accessible. The USB timeout delays were really a pain because if you happen to unplug the tablet while it is still waiting, you are just adding 5 second timeouts to the USB stack. Which means that a new plug of the same tablet will also gets delayed, and will also attempt to access the hardware while in .probe(). So the tablet doesn't appear in the dmesg, the user unplug/replug it to make it appearing... and so on so forth. Really, this is for the best :) Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-01-23HID: wacom: remove warning while disconnecting devicesBenjamin Tissoires
When the LED class gets removed, it actually tries to reset the LED. However, the device being disconnected, the set_report fails. Previously, the attempt to cut lose this last event was through unsetting the HID drvdata, but it was not working properly. Simply reset the LED groups to NULL makes a more efficient solution. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-01-23HID: wacom: release the resources before leaving despite devmBenjamin Tissoires
In the general case, the resources are properly released by devm without needing to do anything. However, when unplugging the wireless receiver, the kernel segfaults from time to time while calling devres_release_all(). I think in that case the resources attempt to access hid_get_drvdata(hdev) which has been set to null while leaving wacom_remove(). Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-01-23x86/fpu: Set the xcomp_bv when we fake up a XSAVES areaKevin Hao
I got the following calltrace on a Apollo Lake SoC with 32-bit kernel: WARNING: CPU: 2 PID: 261 at arch/x86/include/asm/fpu/internal.h:363 fpu__restore+0x1f5/0x260 [...] Hardware name: Intel Corp. Broxton P/NOTEBOOK, BIOS APLIRVPA.X64.0138.B35.1608091058 08/09/2016 Call Trace: dump_stack() __warn() ? fpu__restore() warn_slowpath_null() fpu__restore() __fpu__restore_sig() fpu__restore_sig() restore_sigcontext.isra.9() sys_sigreturn() do_int80_syscall_32() entry_INT80_32() The reason is that a #GP occurs when executing XRSTORS. The root cause is that we forget to set the xcomp_bv when we fake up the XSAVES area in the copyin_to_xsaves() function. Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1485075023-30161-1-git-send-email-haokexin@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>