linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-06-15	ipv6: introduce neighbour discovery ops	Alexander Aring
	This patch introduces neighbour discovery ops callback structure. The idea is to separate the handling for 6LoWPAN into the 6lowpan module. These callback offers 6lowpan different handling, such as 802.15.4 short address handling or RFC6775 (Neighbor Discovery Optimization for IPv6 over 6LoWPANs). Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	ndisc: add __ndisc_opt_addr_data function	Alexander Aring
	This patch adds __ndisc_opt_addr_data as low-level function for ndisc_opt_addr_data which doesn't depend on net_device parameter. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	ndisc: add __ndisc_opt_addr_space function	Alexander Aring
	This patch adds __ndisc_opt_addr_space as low-level function for ndisc_opt_addr_space which doesn't depend on net_device parameter. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	6lowpan: add 802.15.4 short addr slaac	Alexander Aring
	This patch adds the autoconfiguration if a valid 802.15.4 short address is available for 802.15.4 6LoWPAN interfaces. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	6lowpan: add private neighbour data	Alexander Aring
	This patch will introduce a 6lowpan neighbour private data. Like the interface private data we handle private data for generic 6lowpan and for link-layer specific 6lowpan. The current first use case if to save the short address for a 802.15.4 6lowpan neighbour. Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	block: introduce device_add_disk()	Dan Williams
	In preparation for removing the ->driverfs_dev member of a gendisk, add an api that takes the parent device as a parameter to add_disk(). For now this maintains the status quo of WARN()ing on failure, but not return a error code. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-15	Update my main e-mails at the Kernel tree	Mauro Carvalho Chehab
	For the third time in three years, I'm changing my e-mail at Samsung. That's bad, as it may stop communications with me for a while. So, this time, I'll also add the mchehab@kernel.org e-mail, as it remains stable since ever. Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-15	Merge branches 'doc.2016.06.15a', 'fixes.2016.06.15b' and ↵	Paul E. McKenney
	'torture.2016.06.14a' into HEAD doc.2016.06.15a: Documentation updates fixes.2016.06.15b: Documentation updates torture.2016.06.14a: Documentation updates
2016-06-15	rcu: sysctl: Panic on RCU Stall	Daniel Bristot de Oliveira
	It is not always easy to determine the cause of an RCU stall just by analysing the RCU stall messages, mainly when the problem is caused by the indirect starvation of rcu threads. For example, when preempt_rcu is not awakened due to the starvation of a timer softirq. We have been hard coding panic() in the RCU stall functions for some time while testing the kernel-rt. But this is not possible in some scenarios, like when supporting customers. This patch implements the sysctl kernel.panic_on_rcu_stall. If set to 1, the system will panic() when an RCU stall takes place, enabling the capture of a vmcore. The vmcore provides a way to analyze all kernel/tasks states, helping out to point to the culprit and the solution for the stall. The kernel.panic_on_rcu_stall sysctl is disabled by default. Changes from v1: - Fixed a typo in the git log - The if(sysctl_panic_on_rcu_stall) panic() is in a static function - Fixed the CONFIG_TINY_RCU compilation issue - The var sysctl_panic_on_rcu_stall is now __read_mostly Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org> Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15	rcu: Make call_rcu_tasks() tolerate first call with irqs disabled	Paul E. McKenney
	Currently, if the very first call to call_rcu_tasks() has irqs disabled, it will create the rcu_tasks_kthread with irqs disabled, which will result in a splat in the memory allocator, which kthread_run() invokes with the expectation that irqs are enabled. This commit fixes this problem by deferring kthread creation if called with irqs disabled. The first call to call_rcu_tasks() that has irqs enabled will create the kthread. This bug was detected by rcutorture changes that were motivated by Iftekhar Ahmed's mutation-testing efforts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15	rcu: No ordering for rcu_assign_pointer() of NULL	Paul E. McKenney
	This commit does a compile-time check for rcu_assign_pointer() of NULL, and uses WRITE_ONCE() rather than smp_store_release() in that case. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15	net_sched: add the ability to defer skb freeing	Eric Dumazet
	qdisc are changed under RTNL protection and often while blocking BH and root qdisc spinlock. When lots of skbs need to be dropped, we free them under these locks causing TX/RX freezes, and more generally latency spikes. This commit adds rtnl_kfree_skbs(), used to queue skbs for deferred freeing. Actual freeing happens right after RTNL is released, with appropriate scheduling points. rtnl_qdisc_drop() can also be used in place of disc_drop() when RTNL is held. qdisc_reset_queue() and __qdisc_reset_queue() get the new behavior, so standard qdiscs like pfifo, pfifo_fast... have their ->reset() method automatically handled. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	[media] media: fix media devnode ioctl/syscall and unregister race	Shuah Khan
	Media devnode open/ioctl could be in progress when media device unregister is initiated. System calls and ioctls check media device registered status at the beginning, however, there is a window where unregister could be in progress without changing the media devnode status to unregistered. process 1 process 2 fd = open(/dev/media0) media_devnode_is_registered() (returns true here) media_device_unregister() (unregister is in progress and devnode isn't unregistered yet) ... ioctl(fd, ...) __media_ioctl() media_devnode_is_registered() (returns true here) ... media_devnode_unregister() ... (driver releases the media device memory) media_device_ioctl() (By this point devnode->media_dev does not point to allocated memory. use-after free in in mutex_lock_nested) BUG: KASAN: use-after-free in mutex_lock_nested+0x79c/0x800 at addr ffff8801ebe914f0 Fix it by clearing register bit when unregister starts to avoid the race. process 1 process 2 fd = open(/dev/media0) media_devnode_is_registered() (could return true here) media_device_unregister() (clear the register bit, then start unregister.) ... ioctl(fd, ...) __media_ioctl() media_devnode_is_registered() (return false here, ioctl returns I/O error, and will not access media device memory) ... media_devnode_unregister() ... (driver releases the media device memory) Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Suggested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reported-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Tested-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-06-15	skb_array: resize support	Michael S. Tsirkin
	Update skb_array after ptr_ring API changes. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	ptr_ring: resize support	Michael S. Tsirkin
	This adds ring resize support. Seems to be necessary as users such as tun allow userspace control over queue size. If resize is used, this costs us ability to peek at queue without consumer lock - should not be a big deal as peek and consumer are usually run on the same CPU. If ring is made bigger, ring contents is preserved. If ring is made smaller, extra pointers are passed to an optional destructor callback. Cleanup function also gains destructor callback such that all pointers in queue can be cleaned up. This changes some APIs but we don't have any users yet, so it won't break bisect. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	skb_array: array based FIFO for skbs	Michael S. Tsirkin
	A simple array based FIFO of pointers. Intended for net stack so uses skbs for type safety. Implemented as a set of wrappers around ptr_ring. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	[media] media-device: dynamically allocate struct media_devnode	Mauro Carvalho Chehab
	struct media_devnode is currently embedded at struct media_device. While this works fine during normal usage, it leads to a race condition during devnode unregister. the problem is that drivers assume that, after calling media_device_unregister(), the struct that contains media_device can be freed. This is not true, as it can't be freed until userspace closes all opened /dev/media devnodes. In other words, if the media devnode is still open, and media_device gets freed, any call to an ioctl will make the core to try to access struct media_device, with will cause an use-after-free and even GPF. Fix this by dynamically allocating the struct media_devnode and only freeing it when it is safe. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-06-15	ptr_ring: array based FIFO for pointers	Michael S. Tsirkin
	A simple array based FIFO of pointers. Intended for net stack which commonly has a single consumer/producer. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	[media] media-devnode: fix namespace mess	Mauro Carvalho Chehab
	Along all media controller code, "mdev" is used to represent a pointer to struct media_device, and "devnode" for a pointer to struct media_devnode. However, inside media-devnode.[ch], "mdev" is used to represent a pointer to struct media_devnode. This is very confusing and may lead to development errors. So, let's change all occurrences at media-devnode.[ch] to also use "devnode" for such pointers. This patch doesn't make any functional changes. Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-06-15	net_sched: make tcf_hash_check() boolean	WANG Cong
	Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	net: vrf: Handle ipv6 multicast and link-local addresses	David Ahern
	IPv6 multicast and link-local addresses require special handling by the VRF driver: 1. Rather than using the VRF device index and full FIB lookups, packets to/from these addresses should use direct FIB lookups based on the VRF device table. 2. fail sends/receives on a VRF device to/from a multicast address (e.g, make ping6 ff02::1%<vrf> fail) 3. move the setting of the flow oif to the first dst lookup and revert the change in icmpv6_echo_reply made in ca254490c8dfd ("net: Add VRF support to IPv6 stack"). Linklocal/mcast addresses require use of the skb->dev. With this change connections into and out of a VRF enslaved device work for multicast and link-local addresses work (icmp, tcp, and udp) e.g., 1. packets into VM with VRF config: ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1 ping6 -c3 ff02::1%br1 ssh -6 fe80::e0:f9ff:fe1c:b974%br1 2. packets going out a VRF enslaved device: ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1 ping6 -c3 ff02::1%eth1 ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1 Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	net: l3mdev: Remove const from flowi6 arg to get_rt6_dst	David Ahern
	Allow drivers to pass flow arg to functions where the arg is not const and allow the driver to make updates as needed (eg., setting oif). Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15	rpc: share one xps between all backchannels	J. Bruce Fields
	The spec allows backchannels for multiple clients to share the same tcp connection. When that happens, we need to use the same xprt for all of them. Similarly, we need the same xps. This fixes list corruption introduced by the multipath code. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com>
2016-06-15	nfsd4/rpc: move backchannel create logic into rpc code	J. Bruce Fields
	Also simplify the logic a bit. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com>
2016-06-15	netfilter: nf_tables: reject loops from set element jump to chain	Pablo Neira Ayuso
	Liping Zhang says: "Users may add such a wrong nft rules successfully, which will cause an endless jump loop: # nft add rule filter test tcp dport vmap {1: jump test} This is because before we commit, the element in the current anonymous set is inactive, so osp->walk will skip this element and miss the validate check." To resolve this problem, this patch passes the generation mask to the walk function through the iter container structure depending on the code path: 1) If we're dumping the elements, then we have to check if the element is active in the current generation. Thus, we check for the current bit in the genmask. 2) If we're checking for loops, then we have to check if the element is active in the next generation, as we're in the middle of a transaction. Thus, we check for the next bit in the genmask. Based on original patch from Liping Zhang. Reported-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Liping Zhang <liping.zhang@spreadtrum.com>
2016-06-15	crypto: rsa - return raw integers for the ASN.1 parser	Tudor Ambarus
	Return the raw key with no other processing so that the caller can copy it or MPI parse it, etc. The scope is to have only one ANS.1 parser for all RSA implementations. Update the RSA software implementation so that it does the MPI conversion on top. Signed-off-by: Tudor Ambarus <tudor-dan.ambarus@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-15	crypto: drbg - use aligned buffers	Stephan Mueller
	Hardware cipher implementation may require aligned buffers. All buffers that potentially are processed with a cipher are now aligned. Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-15	crypto: drbg - use CTR AES instead of ECB AES	Stephan Mueller
	The CTR DRBG derives its random data from the CTR that is encrypted with AES. This patch now changes the CTR DRBG implementation such that the CTR AES mode is employed. This allows the use of steamlined CTR AES implementation such as ctr-aes-aesni. Unfortunately there are the following subtile changes we need to apply when using the CTR AES mode: - the CTR mode increments the counter after the cipher operation, but the CTR DRBG requires the increment before the cipher op. Hence, the crypto_inc is applied to the counter (drbg->V) once it is recalculated. - the CTR mode wants to encrypt data, but the CTR DRBG is interested in the encrypted counter only. The full CTR mode is the XOR of the encrypted counter with the plaintext data. To access the encrypted counter, the patch uses a NULL data vector as plaintext to be "encrypted". Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-15	gpio: userspace ABI for reading GPIO line events	Linus Walleij
	This adds an ABI for listening to events on GPIO lines. The mechanism returns an anonymous file handle to a request to listen to a specific offset on a specific gpiochip. To fetch the stream of events from the file handle, userspace simply reads an event. - Events can be requested with the same flags as ordinary handles, i.e. open drain or open source. An ioctl() call GPIO_GET_LINEEVENT_IOCTL is issued indicating the desired line. - Events can be requested for falling edge events, rising edge events, or both. - All events are timestamped using the kernel real time nanosecond timestamp (the same as is used by IIO). - The supplied consumer label will appear in "lsgpio" listings of the lines, and in /proc/interrupts as the mechanism will request an interrupt from the gpio chip. - Events are not supported on gpiochips that do not serve interrupts (no legal .to_irq() call). The event interrupt is threaded to avoid any realtime problems. - It is possible to also directly read the current value of the registered GPIO line by issuing the same GPIOHANDLE_GET_LINE_VALUES_IOCTL as used by the line handles. Setting the value is not supported: we do not listen to events on output lines. This ABI is strongly influenced by Industrial I/O and surpasses the old sysfs ABI by providing proper precision timestamps, making it possible to set flags like open drain, and put consumer names on the GPIO lines. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-06-15	gpio: userspace ABI for reading/writing GPIO lines	Linus Walleij
	This adds a userspace ABI for reading and writing GPIO lines. The mechanism returns an anonymous file handle to a request to read/write n offsets from a gpiochip. This file handle in turn accepts two ioctl()s: one that reads and one that writes values to the selected lines. - Handles can be requested as input/output, active low, open drain, open source, however when you issue a request for n lines with GPIO_GET_LINEHANDLE_IOCTL, they must all have the same flags, i.e. all inputs or all outputs, all open drain etc. If a granular control of the flags for each line is desired, they need to be requested individually, not in a batch. - The GPIOHANDLE_GET_LINE_VALUES_IOCTL read ioctl() can be issued also to output lines to verify that the hardware is in the expected state. - It reads and writes up to GPIOHANDLES_MAX lines at once, utilizing the .set_multiple() call in the driver if possible, making the call efficient if several lines can be written with a single register update. The limitation of GPIOHANDLES_MAX to 64 lines is done under the assumption that we may expect hardware that can issue a transaction updating 64 bits at an instant but unlikely anything larger than that. ChangeLog v2->v3: - Use gpiod_get_value_cansleep() so we support also slowpath GPIO drivers. - Fix up the UAPI docs kerneldoc. - Allocate the anonymous fd last, so that the release function don't get called until that point of something fails. After this point, skip the errorpath. ChangeLog v1->v2: - Handle ioctl_compat() properly based on a similar patch to the other ioctl() handling code. - Use _IOWR() as we pass pointers both in and out of the ioctl() - Use kmalloc() and kfree() for the linehandled, do not try to be fancy with devm_* it doesn't work the way I thought. - Fix const-correctness on the linehandle name field. Acked-by: Michael Welling <mwelling@ieee.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-06-15	PM / sleep: Make pm_prepare_console() return void	Borislav Petkov
	Nothing is using its return value so change it to return void. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-14	torture: Break online and offline functions out of torture_onoff()	Paul E. McKenney
	This commit breaks torture_online() and torture_offline() out of torture_onoff() in preparation for allowing waketorture finer-grained control of its CPU-hotplug workload. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14	rcu: Document RCU_NONIDLE() restrictions in comment header	Paul E. McKenney
	Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14	Update my main e-mails at the Kernel tree	Mauro Carvalho Chehab
	For the third time in three years, I'm changing my e-mail at Samsung. That's bad, as it may stop communications with me for a while. So, this time, I'll also the mchehab@kernel.org e-mail, as it remains stable since ever. Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-06-14	locking/spinlock, arch: Update and fix spin_unlock_wait() implementations	Peter Zijlstra
	This patch updates/fixes all spin_unlock_wait() implementations. The update is in semantics; where it previously was only a control dependency, we now upgrade to a full load-acquire to match the store-release from the spin_unlock() we waited on. This ensures that when spin_unlock_wait() returns, we're guaranteed to observe the full critical section we waited on. This fixes a number of spin_unlock_wait() users that (not unreasonably) rely on this. I also fixed a number of ticket lock versions to only wait on the current lock holder, instead of for a full unlock, as this is sufficient. Furthermore; again for ticket locks; I added an smp_rmb() in between the initial ticket load and the spin loop testing the current value because I could not convince myself the address dependency is sufficient, esp. if the loads are of different sizes. I'm more than happy to remove this smp_rmb() again if people are certain the address dependency does indeed work as expected. Note: PPC32 will be fixed independently Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: chris@zankel.net Cc: cmetcalf@mellanox.com Cc: davem@davemloft.net Cc: dhowells@redhat.com Cc: james.hogan@imgtec.com Cc: jejb@parisc-linux.org Cc: linux@armlinux.org.uk Cc: mpe@ellerman.id.au Cc: ralf@linux-mips.org Cc: realmz6@gmail.com Cc: rkuo@codeaurora.org Cc: rth@twiddle.net Cc: schwidefsky@de.ibm.com Cc: tony.luck@intel.com Cc: vgupta@synopsys.com Cc: ysato@users.sourceforge.jp Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14	locking/barriers: Move smp_cond_load_acquire() to asm-generic/barrier.h	Peter Zijlstra
	Since all asm/barrier.h should/must include asm-generic/barrier.h the latter is a good place for generic infrastructure like this. This also allows archs to override the new smp_acquire__after_ctrl_dep(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14	locking/barriers: Introduce smp_acquire__after_ctrl_dep()	Peter Zijlstra
	Introduce smp_acquire__after_ctrl_dep(), this construct is not uncommon, but the lack of this barrier is. Use it to better express smp_rmb() uses in WRITE_ONCE(), the IPC semaphore code and the qspinlock code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14	locking/barriers: Replace smp_cond_acquire() with smp_cond_load_acquire()	Peter Zijlstra
	This new form allows using hardware assisted waiting. Some hardware (ARM64 and x86) allow monitoring an address for changes, so by providing a pointer we can use this to replace the cpu_relax() with hardware optimized methods in the future. Requested-by: Will Deacon <will.deacon@arm.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14	Merge branch 'linus' into locking/core, to pick up fixes before merging new ↵	Ingo Molnar
	changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14	perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86	Andi Kleen
	The NMI watchdog uses either the fixed cycles or a generic cycles counter. This causes a lot of conflicts with users of the PMU who want to run a full group including the cycles fixed counter, for example the --topdown support recently added to perf stat. The code needs to fall back to not use groups, which can cause measurement inaccuracy due to multiplexing errors. This patch switches the NMI watchdog to use reference cycles on Intel systems. This is actually more accurate than cycles, because cycles can tick faster than the measured CPU Frequency due to Turbo mode. The ref cycles always tick at their frequency, or slower when the system is idling. That means the NMI watchdog can never expire too early, unlike with cycles. The reference cycles tick roughly at the frequency of the TSC, so the same period computation can be used. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: http://lkml.kernel.org/r/1465478079-19993-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14	Merge branch 'linus' into perf/core, to pick up fixes before merging new changes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14	Merge branch 'sched/urgent' into sched/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-13	drbd: correctly handle failed crypto_alloc_hash	Lars Ellenberg
	crypto_alloc_hash returns an ERR_PTR(), not NULL. Also reset peer_integrity_tfm to NULL, to not call crypto_free_hash() on an errno in the cleanup path. Reported-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13	drbd: code cleanups without semantic changes	Fabian Frederick
	This contains various cosmetic fixes ranging from simple typos to const-ifying, and using booleans properly. Original commit messages from Fabian's patch set: drbd: debugfs: constify drbd_version_fops drbd: use seq_put instead of seq_print where possible drbd: include linux/uaccess.h instead of asm/uaccess.h drbd: use const char * const for drbd strings drbd: kerneldoc warning fix in w_e_end_data_req() drbd: use unsigned for one bit fields drbd: use bool for peer is_ states drbd: fix typo drbd: use \| for bitmask combination drbd: use true/false for bool drbd: fix drbd_bm_init() comments drbd: introduce peer state union drbd: fix maybe_pull_ahead() locking comments drbd: use bool for growing drbd: remove redundant declarations drbd: replace if/BUG by BUG_ON Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13	drbd: allow larger max_discard_sectors	Lars Ellenberg
	Make sure we have at least 67 (> AL_UPDATES_PER_TRANSACTION) al-extents available, and allow up to half of that to be discarded in one bio. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13	drbd: when receiving P_TRIM, zero-out partial unaligned chunks	Lars Ellenberg
	We can avoid spurious data divergence caused by partially-ignored discards on certain backends with discard_zeroes_data=0, if we translate partial unaligned discard requests into explicit zero-out. The relevant use case is LVM/DM thin. If on different nodes, DRBD is backed by devices with differing discard characteristics, discards may lead to data divergence (old data or garbage left over on one backend, zeroes due to unmapped areas on the other backend). Online verify would now potentially report tons of spurious differences. While probably harmless for most use cases (fstrim on a file system), DRBD cannot have that, it would violate our promise to upper layers that our data instances on the nodes are identical. To be correct and play safe (make sure data is identical on both copies), we would have to disable discard support, if our local backend (on a Primary) does not support "discard_zeroes_data=true". We'd also have to translate discards to explicit zero-out on the receiving (typically: Secondary) side, unless the receiving side supports "discard_zeroes_data=true". Which both would allocate those blocks, instead of unmapping them, in contrast with expectations. LVM/DM thin does set discard_zeroes_data=0, because it silently ignores discards to partial chunks. We can work around this by checking the alignment first. For unaligned (wrt. alignment and granularity) or too small discards, we zero-out the initial (and/or) trailing unaligned partial chunks, but discard all the aligned full chunks. At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1". Arguably it should behave this way internally, by default, and we'll try to make that happen. But our workaround is still valid for already deployed setups, and for other devices that may behave this way. Setting discard-zeroes-if-aligned=yes will allow DRBD to use discards, and to announce discard_zeroes_data=true, even on backends that announce discard_zeroes_data=false. Setting discard-zeroes-if-aligned=no will cause DRBD to always fall-back to zero-out on the receiving side, and to not even announce discard capabilities on the Primary, if the respective backend announces discard_zeroes_data=false. We used to ignore the discard_zeroes_data setting completely. To not break established and expected behaviour, and suddenly cause fstrim on thin-provisioned LVs to run out-of-space, instead of freeing up space, the default value is "yes". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13	drbd: Introduce new disk config option rs-discard-granularity	Philipp Reisner
	As long as the value is 0 the feature is disabled. With setting it to a positive value, DRBD limits and aligns its resync requests to the rs-discard-granularity setting. If the sync source detects all zeros in such a block, the resync target discards the range on disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13	Merge back earlier cpufreq changes for v4.8.	Rafael J. Wysocki

2016-06-13	vmlinux.lds.h: allow arch specific handling of ro_after_init data section	Heiko Carstens
	commit c74ba8b3480d ("arch: Introduce post-init read-only memory") introduced the __ro_after_init attribute which allows to add variables to the ro_after_init data section. This new section was added to rodata, even though it contains writable data. This in turn causes problems on architectures which mark the page table entries read-only that point to rodata very early. This patch allows architectures to implement an own handling of the .data..ro_after_init section. Usually that would be: - mark the rodata section read-only very early - mark the ro_after_init section read-only within mark_rodata_ro Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13	bitmap: bitmap_equal memcmp optimization	Martin Schwidefsky
	The bitmap_equal function has optimized code for small bitmaps with less than BITS_PER_LONG bits. For larger bitmaps the out-of-line function __bitmap_equal is called. For a constant number of bits divisible by BITS_PER_LONG the memcmp function can be used. For s390 gcc knows how to optimize this function, memcmp calls with up to 256 bytes / 2048 bits are translated into a single instruction. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>