git.armlinux.org.uk/linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2017-08-17	Merge branches 'intel_pstate-fix' and 'cpufreq-x86-fix'	Rafael J. Wysocki
	* intel_pstate-fix: cpufreq: intel_pstate: report correct CPU frequencies during trace * cpufreq-x86-fix: cpufreq: x86: Disable interrupts during MSRs reading
2017-08-17	ACPI: EC: Fix regression related to wrong ECDT initialization order	Lv Zheng
	Commit 2a5708409e4e (ACPI / EC: Fix a gap that ECDT EC cannot handle EC events) introduced acpi_ec_ecdt_start(), but that function is invoked before acpi_ec_query_init(), which is too early. This causes the kernel to crash if an EC event occurs after boot, when ec_query_wq is not valid: BUG: unable to handle kernel NULL pointer dereference at 0000000000000102 ... Workqueue: events acpi_ec_event_handler task: ffff9f539790dac0 task.stack: ffffb437c0e10000 RIP: 0010:__queue_work+0x32/0x430 Normally, the DSDT EC should always be valid, so acpi_ec_ecdt_start() is actually a no-op in the majority of cases. However, commit c712bb58d827 (ACPI / EC: Add support to skip boot stage DSDT probe) caused the probing of the DSDT EC as the "boot EC" to be skipped when the ECDT EC is valid and uncovered the bug. Fix this issue by invoking acpi_ec_ecdt_start() after acpi_ec_query_init() in acpi_ec_init(). Link: https://jira01.devtools.intel.com/browse/LCK-4348 Fixes: 2a5708409e4e (ACPI / EC: Fix a gap that ECDT EC cannot handle EC events) Fixes: c712bb58d827 (ACPI / EC: Add support to skip boot stage DSDT probe) Reported-by: Wang Wendy <wendy.wang@intel.com> Tested-by: Feng Chenzhou <chenzhoux.feng@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-17	Merge branch 'parisc-4.13-5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: - Fix PCI memory bar assignments with 64-bit kernels on machines with Dino/Cujo PCI chipsets. This makes PCI graphic cards work on such machines (from Thomas Bogendoerfer). - Fix documentation to be more clear about the difference between %pF and %pS printk format usage. There are still many places in the kernel which have it wrong (from Petr Mladek, Sergey Senozhatsky & me). * 'parisc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: printk-formats.txt: Better describe the difference between %pS and %pF parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo
2017-08-17	bpf: Update sysctl documentation to list all supported architectures	Michael Ellerman
	The sysctl documentation states that the JIT is only available on x86_64, which is no longer correct. Update the list, and break it out to indicate which architectures support the cBPF JIT (via HAVE_CBPF_JIT) or the eBPF JIT (HAVE_EBPF_JIT). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-17	nvme-fabrics: fix reporting of unrecognized options	Christoph Hellwig
	Only print the specified options that are not recognized, instead of the whole list of options. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
2017-08-17	Merge branch 'for_linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fix from Jan Kara: "A fix of a check for quota limit" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: correct space limit check
2017-08-17	pty: fix the cached path of the pty slave file descriptor in the master	Linus Torvalds
	Christian Brauner reported that if you use the TIOCGPTPEER ioctl() to get a slave pty file descriptor, the resulting file descriptor doesn't look right in /proc/<pid>/fd/<fd>. In particular, he wanted to use readlink() on /proc/self/fd/<fd> to get the pathname of the slave pty (basically implementing "ptsname{_r}()"). The reason for that was that we had generated the wrong 'struct path' when we create the pty in ptmx_open(). In particular, the dentry was correct, but the vfsmount pointed to the mount of the ptmx node. That _can_ be correct - in case you use "/dev/pts/ptmx" to open the master - but usually is not. The normal case is to use /dev/ptmx, which then looks up the pts/ directory, and then the vfsmount of the ptmx node is obviously the /dev directory, not the /dev/pts/ directory. We actually did have the right vfsmount available, but in the wrong place (it gets looked up in 'devpts_acquire()' when we get a reference to the pts filesystem), and so ptmx_open() used the wrong mnt pointer. The end result of this confusion was that the pty worked fine, but when if you did TIOCGPTPEER to get the slave side of the pty, end end result would also work, but have that dodgy 'struct path'. And then when doing "d_path()" on to get the pathname, the vfsmount would not match the root of the pts directory, and d_path() would return an empty pathname thinking that the entry had escaped a bind mount into another mount. This fixes the problem by making devpts_acquire() return the vfsmount for the pts filesystem, allowing ptmx_open() to trivially just use the right mount for the pts dentry, and create the proper 'struct path'. Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17	drm/tegra: Prevent BOs from being freed during job submission	Dmitry Osipenko
	Since DRM IOCTL's are lockless, there is a chance that BOs could be released while a job submission is in progress. To avoid that, keep the GEM reference until the job has been pinned, part of which will be to take another reference. v2: remove redundant check and avoid memory leak Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: gem: Implement mmap() for PRIME buffers	Thierry Reding
	The mapping of PRIME buffers can reuse much of the GEM mapping code, so extract the common bits into a new tegra_gem_mmap() helper. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: Support render node	Thierry Reding
	None of the driver-specific IOCTLs are privileged, so mark them as such and advertise that the driver supports render nodes. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: sor: Trace register accesses	Thierry Reding
	Add tracepoint events for SOR controller register accesses. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: dpaux: Trace register accesses	Thierry Reding
	Add tracepoint events for DPAUX controller register accesses. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: dsi: Trace register accesses	Thierry Reding
	Add tracepoint events for DSI controller register accesses. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: hdmi: Trace register accesses	Thierry Reding
	Add tracepoint events for HDMI controller register accesses. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: dc: Trace register accesses	Thierry Reding
	Add tracepoint events for display controller register accesses. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: sor: Use unsigned int for register offsets	Thierry Reding
	Register offsets are usually fairly small numbers, so an unsigned int is more than enough to represent them. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: hdmi: Use unsigned int for register offsets	Thierry Reding
	Register offsets are usually fairly small numbers, so an unsigned int is more than enough to represent them. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: dsi: Use unsigned int for register offsets	Thierry Reding
	Register offsets are usually fairly small numbers, so an unsigned int is more than enough to represent them. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: dpaux: Use unsigned int for register offsets	Thierry Reding
	Register offsets are usually fairly small numbers, so an unsigned int is more than enough to represent them. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: dc: Use unsigned int for register offsets	Thierry Reding
	Register offsets are usually fairly small numbers, so an unsigned int is more than enough to represent them. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: Fix NULL deref in debugfs/iova	Michał Mirosław
	When IOMMU is off, ->mm_lock is not initialized and ->mm is NULL. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: switch to drm__get(), drm__put() helpers	Cihangir Akturk
	Use drm__get() and drm__put() helpers instead of drm__reference() and drm__unreference() helpers. drm__reference() and drm__unreference() functions are just compatibility alias for drm__get() and drm__put() and should not be used by new code. So convert all users of compatibility functions to use the new APIs. Generated by: scripts/coccinelle/api/drm-get-put.cocci Signed-off-by: Cihangir Akturk <cakturk@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: Set MODULE_FIRMWARE for the VIC	Nicolas Chauvet
	The defines are set anyway to prevent an empty string. The test for the SoC is the same as for Nouveau for the Tegra GPU firmware (see drivers/gpu/drm/nouveau/nouveau_platform.c) v2: - Place the defines above each chip's vic_config struct - MODULE_FIRMWARE() at the end of the file Fixes: 0ae797a8ba05 ("drm/tegra: Add VIC support") Signed-off-by: Nicolas Chauvet <kwizart@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	drm/tegra: Add CONFIG_OF dependency	Arnd Bergmann
	Without CONFIG_OF, we can run into a build error: drivers/gpu/drm/tegra/dpaux.c:378:20: error: 'pinconf_generic_dt_node_to_map_group' undeclared here (not in a function); did you mean 'pinconf_generic_params'? .dt_node_to_map = pinconf_generic_dt_node_to_map_group, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pinconf_generic_params drivers/gpu/drm/tegra/dpaux.c:379:17: error: 'pinconf_generic_dt_free_map' undeclared here (not in a function); did you mean 'pinconf_generic_params'? This adds an explicit dependency. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	gpu: host1x: Support sub-devices recursively	Thierry Reding
	The display architecture in Tegra186 changes slightly compared to earlier Tegra generations, which requires that we recursively scan host1x sub-devices from device tree. Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	gpu: host1x: fix error return code in host1x_probe()	Gustavo A. R. Silva
	platform_get_irq() returns an error code, but the host1x driver ignores it and always returns -ENXIO. This is not correct and, prevents -EPROBE_DEFER from being propagated properly. Notice that platform_get_irq() no longer returns 0 on error: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e330b9a6bb35dc7097a4f02cb1ae7b6f96df92af Print and propagate the return value of platform_get_irq on failure. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	gpu: host1x: Fix bitshift/mask multipliers	Mikko Perttunen
	Some parts of Host1x uses BIT_WORD/BIT_MASK/BITS_PER_LONG to calculate register or field offsets. This worked fine on ARMv7, but now that BITS_PER_LONG is 64 but our registers are still 32-bit things are broken. Fix by replacing.. - BIT_WORD with (x / 32) - BIT_MASK with BIT(x % 32) - BITS_PER_LONG with 32 Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	gpu: host1x: Don't fail on NULL bo physical address	Mikko Perttunen
	Pinning a Host1x BO currently cannot fail and zero is a valid address for a BO when IOMMU is enabled. To avoid false errors remove checks for NULL BO physical addresses. Fixes: 404bfb78daf3 ("gpu: host1x: Add IOMMU support") Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-08-17	i2c: designware: Fix runtime PM for I2C slave mode	Jarkko Nikula
	I2C slave controller must be powered and active all the time when I2C slave backend is registered in order to let master address and communicate with us. Now if the controller is runtime PM capable it will be suspended after probe and cannot ever respond to the master or generate interrupts. Fix this by resuming the controller when I2C slave backend is registered and let it suspend after unregistering. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-08-17	i2c: designware: Remove needless pm_runtime_put_noidle() call	Jarkko Nikula
	I guess pm_runtime_put_noidle() call in i2c_dw_probe_slave() was copied by accident from similar master mode adapter registration code. It is unbalanced due missing pm_runtime_get_noresume() but harmless since it doesn't decrease dev->power.usage_count below zero. In theory we can hit similar needless runtime suspend/resume cycle during slave mode adapter registration that was happening when registering the master mode adapter. See commit cd998ded5c12 ("i2c: designware: Prevent runtime suspend during adapter registration"). However, since we are slave, we can consider it as a wrong configuration if we have other slaves attached under this adapter and can omit the pm_runtime_get_noresume()/pm_runtime_put_noidle() calls for simplicity. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-08-17	ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices	Takashi Iwai
	C-Media devices (at least some models) mute the playback stream when volumes are set to the minimum value. But this isn't informed via TLV and the user-space, typically PulseAudio, gets confused as if it's still played in a low volume. This patch adds the new flag, min_mute, to struct usb_mixer_elem_info for indicating that the mixer element is with the minimum-mute volume. This flag is set for known C-Media devices in snd_usb_mixer_fu_apply_quirk() in turn. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196669 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-08-17	Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', ↵	Paul E. McKenney
	'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD doc.2017.08.17a: Documentation updates. fixes.2017.08.17a: RCU fixes. hotplug.2017.07.25b: CPU-hotplug updates. misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts). spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait(). srcu.2017.07.27c: SRCU updates. torture.2017.07.24c: Torture-test updates.
2017-08-17	arch: Remove spin_unlock_wait() arch-specific definitions	Paul E. McKenney
	There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait() for all architectures providing them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <linux-arch@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Boqun Feng <boqun.feng@gmail.com>
2017-08-17	locking: Remove spin_unlock_wait() generic definitions	Paul E. McKenney
	There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes spin_unlock_wait() and related definitions from core code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17	drivers/ata: Replace spin_unlock_wait() with lock/unlock pair	Paul E. McKenney
	There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore eliminates the spin_unlock_wait() call and associated else-clause and hoists the then-clause's lock and unlock out of the "if" statement. This should be safe from a performance perspective because according to Tejun there should be few if any drivers that don't set their own error handler. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <linux-ide@vger.kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17	ipc: Replace spin_unlock_wait() with lock/unlock pair	Paul E. McKenney
	There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in exit_sem() with spin_lock() followed immediately by spin_unlock(). This should be safe from a performance perspective because exit_sem() is rarely invoked in production. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Manfred Spraul <manfred@colorfullife.com>
2017-08-17	exit: Replace spin_unlock_wait() with lock/unlock pair	Paul E. McKenney
	There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in do_exit() with spin_lock() followed immediately by spin_unlock(). This should be safe from a performance perspective because the lock is a per-task lock, and this is happening only at task-exit time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17	completion: Replace spin_unlock_wait() with lock/unlock pair	Paul E. McKenney
	There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in completion_done() with spin_lock() followed immediately by spin_unlock(). This should be safe from a performance perspective because the lock will be held only the wakeup happens really quickly. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-17	doc: Set down RCU's scheduling-clock-interrupt needs	Paul E. McKenney
	This commit documents the situations in which RCU needs the scheduling-clock interrupt to be enabled, along with the consequences of failing to meet RCU's needs in this area. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17	doc: No longer allowed to use rcu_dereference on non-pointers	Paul E. McKenney
	There are too many ways for the compiler to optimize (that is, break) dependencies carried via integer values, so it is now permissible to carry dependencies only via pointers. This commit catches up some of the documentation on this point. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17	doc: Add RCU files to docbook-generation files	Paul E. McKenney
	Suggested-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17	doc: Update memory-barriers.txt for read-to-write dependencies	Paul E. McKenney
	The memory-barriers.txt document contains an obsolete passage stating that smp_read_barrier_depends() is required to force ordering for read-to-write dependencies. We now know that this is not required, even for DEC Alpha. This commit therefore updates this passage to state that read-to-write dependencies are respected even without smp_read_barrier_depends(). Reported-by: Lance Roy <ldr709@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Jade Alglave <j.alglave@ucl.ac.uk> Cc: Luc Maranget <luc.maranget@inria.fr> [ paulmck: Reference control-dependencies sections and use WRITE_ONCE() per Will Deacon. Correctly place split-cache paragraph while there. ] Acked-by: Will Deacon <will.deacon@arm.com>
2017-08-17	doc: Update RCU documentation	Paul E. McKenney
	Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17	membarrier: Provide expedited private command	Mathieu Desnoyers
	Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs using cpumask built from all runqueues for which current thread's mm is the same as the thread calling sys_membarrier. It executes faster than the non-expedited variant (no blocking). It also works on NOHZ_FULL configurations. Scheduler-wise, it requires a memory barrier before and after context switching between processes (which have different mm). The memory barrier before context switch is already present. For the barrier after context switch: * Our TSO archs can do RELEASE without being a full barrier. Look at x86 spin_unlock() being a regular STORE for example. But for those archs, all atomics imply smp_mb and all of them have atomic ops in switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full barrier. * From all weakly ordered machines, only ARM64 and PPC can do RELEASE, the rest does indeed do smp_mb(), so there the spin_unlock() is a full barrier and we're good. * ARM64 has a very heavy barrier in switch_to(), which suffices. * PPC just removed its barrier from switch_to(), but appears to be talking about adding something to switch_mm(). So add a smp_mb__after_unlock_lock() for now, until this is settled on the PPC side. Changes since v3: - Properly document the memory barriers provided by each architecture. Changes since v2: - Address comments from Peter Zijlstra, - Add smp_mb__after_unlock_lock() after finish_lock_switch() in finish_task_switch() to add the memory barrier we need after storing to rq->curr. This is much simpler than the previous approach relying on atomic_dec_and_test() in mmdrop(), which actually added a memory barrier in the common case of switching between userspace processes. - Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full kernel, rather than having the whole membarrier system call returning -ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full. Adapt the CMD_QUERY mask accordingly. Changes since v1: - move membarrier code under kernel/sched/ because it uses the scheduler runqueue, - only add the barrier when we switch from a kernel thread. The case where we switch from a user-space thread is already handled by the atomic_dec_and_test() in mmdrop(). - add a comment to mmdrop() documenting the requirement on the implicit memory barrier. CC: Peter Zijlstra <peterz@infradead.org> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: Boqun Feng <boqun.feng@gmail.com> CC: Andrew Hunter <ahh@google.com> CC: Maged Michael <maged.michael@gmail.com> CC: gromer@google.com CC: Avi Kivity <avi@scylladb.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Dave Watson <davejwatson@fb.com>
2017-08-17	rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()	Paul E. McKenney
	The rcu_idle_exit() and rcu_idle_enter() functions are exported because they were originally used by RCU_NONIDLE(), which was intended to be usable from modules. However, RCU_NONIDLE() now instead uses rcu_irq_enter_irqson() and rcu_irq_exit_irqson(), which are not exported, and there have been no complaints. This commit therefore removes the exports from rcu_idle_exit() and rcu_idle_enter(). Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17	rcu: Add warning to rcu_idle_enter() for irqs enabled	Paul E. McKenney
	All current callers of rcu_idle_enter() have irqs disabled, and rcu_idle_enter() relies on this, but doesn't check. This commit therefore adds a RCU_LOCKDEP_WARN() to add some verification to the trust. While we are there, pass "true" rather than "1" to rcu_eqs_enter(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17	rcu: Make rcu_idle_enter() rely on callers disabling irqs	Peter Zijlstra (Intel)
	All callers to rcu_idle_enter() have irqs disabled, so there is no point in rcu_idle_enter disabling them again. This commit therefore replaces the irq disabling with a RCU_LOCKDEP_WARN(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17	rcu: Add assertions verifying blocked-tasks list	Paul E. McKenney
	This commit adds assertions verifying the consistency of the rcu_node structure's ->blkd_tasks list and its ->gp_tasks, ->exp_tasks, and ->boost_tasks pointers. In particular, the ->blkd_tasks lists must be empty except for leaf rcu_node structures. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17	rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()	Masami Hiramatsu
	Set disable_rcu_irq_enter on not only rcu_eqs_enter_common() but also rcu_eqs_exit(), since rcu_eqs_exit() suffers from the same issue as was fixed for rcu_eqs_enter_common() by commit 03ecd3f48e57 ("rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work"). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-08-17	rcu: Add TPS() protection for _rcu_barrier_trace strings	Paul E. McKenney
	The _rcu_barrier_trace() function is a wrapper for trace_rcu_barrier(), which needs TPS() protection for strings passed through the second argument. However, it has escaped prior TPS()-ification efforts because it _rcu_barrier_trace() does not start with "trace_". This commit therefore adds the needed TPS() protection Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>