linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2019-04-10	Merge drm/drm-next into drm-misc-next	Sean Paul
	Finally have a reason for a backmerge other than "it's been a while"! Backmerging drm-next to -misc-next to facilitate Rob Herring's work on Panfrost. Signed-off-by: Sean Paul <seanpaul@chromium.org>
2019-04-10	Revert: "net: sched: put back q.qlen into a single location"	Paolo Abeni
	This revert commit 46b1c18f9deb ("net: sched: put back q.qlen into a single location"). After the previous patch, when a NOLOCK qdisc is enslaved to a locking qdisc it switches to global stats accounting. As a consequence, when a classful qdisc accesses directly a child qdisc's qlen, such qdisc is not doing per CPU accounting and qlen value is consistent. In the control path nobody uses directly qlen since commit e5f0e8f8e45 ("net: sched: introduce and use qdisc tree flush/purge helpers"), so we can remove the contented atomic ops from the datapath. v1 -> v2: - complete the qdisc_qstats_atomic_qlen_dec() -> qdisc_qstats_cpu_qlen_dec() replacement, fix build issue - more descriptive commit message Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10	net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too	Paolo Abeni
	Since stats updating is always consistent with TCQ_F_CPUSTATS flag, we can disable it at qdisc creation time flipping such bit. In my experiments, if the NOLOCK flag is cleared, per CPU stats accounting does not give any measurable performance gain, but it waste some memory. Let's clear TCQ_F_CPUSTATS together with NOLOCK, when enslaving a NOLOCK qdisc to 'lock' one. Use stats update helper inside pfifo_fast, to cope correctly with TCQ_F_CPUSTATS flag change. As a side effect, q.qlen value for any child qdiscs is always consistent for all lock classfull qdiscs. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10	net: sched: always do stats accounting according to TCQ_F_CPUSTATS	Paolo Abeni
	The core sched implementation checks independently for NOLOCK flag to acquire/release the root spin lock and for qdisc_is_percpu_stats() to account per CPU values in many places. This change update the last few places checking the TCQ_F_NOLOCK to do per CPU stats accounting according to qdisc_is_percpu_stats() value. The above allows to clean dev_requeue_skb() implementation a bit and makes stats update always consistent with a single flag. v1 -> v2: - do not move qdisc_is_empty definition, fix build issue Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10	net: sched: prefer qdisc_is_empty() over direct qlen access	Paolo Abeni
	When checking for root qdisc queue length, do not access directly q.qlen. In the following patches we will move back qlen accounting to per CPU values for NOLOCK qdiscs. Instead, prefer the qdisc_is_empty() helper usage. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10	drm: switch drm_fb_xrgb8888_to_rgb888_dstclip to accept __iomem dst	Gerd Hoffmann
	Not all archs have the __io_virt() macro, so cirrus can't simply convert pointers that way. The drm format helpers have to use memcpy_toio() instead. This patch makes drm_fb_xrgb8888_to_rgb888_dstclip() accept a __iomem dst pointer and use memcpy_toio() instead of memcpy(). The helper function (drm_fb_xrgb8888_to_rgb888_line) has been changed to process a single scanline. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Noralf Trønnes <noralf@tronnes.org> Link: http://patchwork.freedesktop.org/patch/msgid/20190410063815.17062-4-kraxel@redhat.com
2019-04-10	drm: switch drm_fb_xrgb8888_to_rgb565_dstclip to accept __iomem dst	Gerd Hoffmann
	Not all archs have the __io_virt() macro, so cirrus can't simply convert pointers that way. The drm format helpers have to use memcpy_toio() instead. This patch makes drm_fb_xrgb8888_to_rgb565_dstclip() accept a __iomem dst pointer and use memcpy_toio() instead of memcpy(). The helper function (drm_fb_xrgb8888_to_rgb565_line) has been changed to process a single scanline. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Noralf Trønnes <noralf@tronnes.org> Link: http://patchwork.freedesktop.org/patch/msgid/20190410063815.17062-3-kraxel@redhat.com
2019-04-10	drm: switch drm_fb_memcpy_dstclip to accept __iomem dst	Gerd Hoffmann
	Not all archs have the __io_virt() macro, so cirrus can't simply convert pointers that way. The drm format helpers have to use memcpy_toio() instead. This patch makes drm_fb_memcpy_dstclip() accept a __iomem dst pointer and use memcpy_toio() instead of memcpy(). With that separating out the memcpy loop into the drm_fb_memcpy_lines() helper isn't useful any more, so move the code back into the calling functins. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Noralf Trønnes <noralf@tronnes.org> Link: http://patchwork.freedesktop.org/patch/msgid/20190410063815.17062-2-kraxel@redhat.com
2019-04-10	Merge branch 'mlx5-next' into rdma.git for-next	Jason Gunthorpe
	From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies on the next series * branch 'mlx5-next': net/mlx5: E-Switch, add a new prio to be used by the RDMA side net/mlx5: E-Switch, don't use hardcoded values for FDB prios net/mlx5: Fix false compilation warning net/mlx5: Expose MPEIN (Management PCIE INfo) register layout net/mlx5: Add rate limit print macros net/mlx5: Add explicit bar address field net/mlx5: Replace dev_err/warn/info by mlx5_core_err/warn/info net/mlx5: Use dev->priv.name instead of dev_name net/mlx5: Make mlx5_core messages independent from mdev->pdev net/mlx5: Break load_one into three stages net/mlx5: Function setup/teardown procedures net/mlx5: Move health and page alloc init to mdev_init net/mlx5: Split mdev init and pci init net/mlx5: Remove redundant init functions parameter net/mlx5: Remove spinlock support from mlx5_write64 net/mlx5: Remove unused MLX5_*_DOORBELL_LOCK macros Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10	keys: safe concurrent user->{session,uid}_keyring access	Jann Horn
	The current code can perform concurrent updates and reads on user->session_keyring and user->uid_keyring. Add a comment to struct user_struct to document the nontrivial locking semantics, and use READ_ONCE() for unlocked readers and smp_store_release() for writers to prevent memory ordering issues. Fixes: 69664cf16af4 ("keys: don't generate user and user session keyrings unless they're accessed") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: James Morris <james.morris@microsoft.com>
2019-04-10	security: don't use RCU accessors for cred->session_keyring	Jann Horn
	sparse complains that a bunch of places in kernel/cred.c access cred->session_keyring without the RCU helpers required by the __rcu annotation. cred->session_keyring is written in the following places: - prepare_kernel_cred() [in a new cred struct] - keyctl_session_to_parent() [in a new cred struct] - prepare_creds [in a new cred struct, via memcpy] - install_session_keyring_to_cred() - from install_session_keyring() on new creds - from join_session_keyring() on new creds [twice] - from umh_keys_init() - from call_usermodehelper_exec_async() on new creds All of these writes are before the creds are committed; therefore, cred->session_keyring doesn't need RCU protection. Remove the __rcu annotation and fix up all existing users that use __rcu. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: James Morris <james.morris@microsoft.com>
2019-04-10	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull virtio fixes from Michael Tsirkin: "Several fixes, add more reviewers to the list" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio: Honour 'may_reduce_num' in vring_create_virtqueue MAiNTAINERS: add Paolo, Stefan for virtio blk/scsi virtio_pci: fix a NULL pointer reference in vp_del_vqs
2019-04-10	Merge branch 'omap-for-v5.2/am4-ddr3' into omap-for-v5.2/am4-pm-v2	Tony Lindgren

2019-04-10	blk-mq: introduce blk_mq_complete_request_sync()	Ming Lei
	In NVMe's error handler, follows the typical steps of tearing down hardware for recovering controller: 1) stop blk_mq hw queues 2) stop the real hw queues 3) cancel in-flight requests via blk_mq_tagset_busy_iter(tags, cancel_request, ...) cancel_request(): mark the request as abort blk_mq_complete_request(req); 4) destroy real hw queues However, there may be race between #3 and #4, because blk_mq_complete_request() may run q->mq_ops->complete(rq) remotelly and asynchronously, and ->complete(rq) may be run after #4. This patch introduces blk_mq_complete_request_sync() for fixing the above race. Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Bart Van Assche <bvanassche@acm.org> Cc: James Smart <james.smart@broadcom.com> Cc: linux-nvme@lists.infradead.org Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-10	clk: sunxi-ng: sun5i: Export the MBUS clock	Maxime Ripard
	The MBUS clock is used by the MBUS controller, so let's export it so that we can use it in our DT node. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2019-05-02	soundwire: remove multiple blank lines	Vinod Koul
	Multi-blank lines do not help readability so remove them Checkpatch complains: CHECK: Please don't use multiple blank lines Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02	soundwire: wrap macro argument in parenthesis	Vinod Koul
	macro argument should be inside a parenthesis to avoid precedence issues checkpatch complains: CHECK: Macro argument 'n' may be better as '(n)' to avoid precedence issues Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02	soundwire: intel: more alignment fixes	Vinod Koul
	Found few more issues reported checkpatch on code alignment so fix those as well in the intel module. Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02	soundwire: more alignment fixes	Vinod Koul
	Found few more issues reported checkpatch on code alignment so fix those as well in the soundwire core. Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02	soundwire: add argument to function definition	Vinod Koul
	Checkpatch warns that function definition of __sdw_register_driver misses argument, so add it WARNING: function definition argument 'struct module *' should also have an identifier name Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02	soundwire: fix SPDX license for header files	Vinod Koul
	Some more headers had C++ style SDPX line, fix that and change copyright so that it is consistent with rest of the code in subsystem Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02	soundwire: intel: fix SPDX license for header file	Vinod Koul
	Some more headers had C++ style SDPX line, fix that and change copyright so that it is consistent with rest of the code in subsystem Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27	Merge tag 'thunderbolt-for-v5.2' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into char-misc-next Mika writes: thunderbolt: Changes for v5.2 merge window This improves software connection manager on older Apple systems with Thunderbolt 1 and 2 controller to support full PCIe daisy chains, Display Port tunneling and P2P networking. There are also fixes for potential NULL pointer dereferences at various places in the driver. * tag 'thunderbolt-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (44 commits) thunderbolt: Make priority unsigned in struct tb_path thunderbolt: Start firmware on Titan Ridge Apple systems thunderbolt: Reword output of tb_dump_hop() thunderbolt: Make rest of the logging to happen at debug level thunderbolt: Make __TB_[SW\|PORT]_PRINT take const parameters thunderbolt: Add support for XDomain connections thunderbolt: Make tb_switch_alloc() return ERR_PTR() thunderbolt: Add support for DMA tunnels thunderbolt: Add XDomain UUID exchange support thunderbolt: Run tb_xdp_handle_request() in system workqueue thunderbolt: Do not tear down tunnels when driver is unloaded thunderbolt: Add support for Display Port tunnels thunderbolt: Rework NFC credits handling thunderbolt: Generalize port finding routines to support all port types thunderbolt: Scan only valid NULL adapter ports in hotplug thunderbolt: Add support for full PCIe daisy chains thunderbolt: Discover preboot PCIe paths the boot firmware established thunderbolt: Deactivate all paths before restarting them thunderbolt: Extend tunnel creation to more than 2 adjacent switches thunderbolt: Add helper function to iterate from one port to another ...
2019-04-25	coresight: Communicate perf event to sink buffer allocation functions	Mathieu Poirier
	Make struct perf_event available to sink buffer allocation functions in order to use the pid they carry to allocate and free buffer memory along with regimenting access to what source a sink can collect data for. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Tested-by: Robert Walker <robert.walker@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25	coresight: Adding return code to sink::disable() operation	Mathieu Poirier
	In preparation to handle device reference counting inside of the sink drivers, add a return code to the sink::disable() operation so that proper action can be taken if a sink has not been disabled. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Leo Yan <leo.yan@linaro.org> Tested-by: Robert Walker <robert.walker@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25	coresight: etm4x: Add kernel configuration for CONTEXTID	Mathieu Poirier
	Set the proper bit in the configuration register when contextID tracing has been requested by user space. That way PE_CONTEXT elements are generated by the tracers when a process is installed on a CPU. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Leo Yan <leo.yan@linaro.org> Tested-by: Robert Walker <robert.walker@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25	nvmem: core: add nvmem_cell_read_u16	Fabrice Gasnier
	Add nvmem_cell_read_u16() helper to ease read of an u16 value on consumer side. This is inspired by nvmem_cell_read_u32() function. This helper is useful on stm32 that has 16 bits data cells stored in non volatile memory. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25	drivers/misc: Add Aspeed P2A control driver	Patrick Venture
	The ASPEED AST2400, and AST2500 in some configurations include a PCI-to-AHB MMIO bridge. This bridge allows a server to read and write in the BMC's physical address space. This feature is especially useful when using this bridge to send large files to the BMC. The host may use this to send down a firmware image by staging data at a specific memory address, and in a coordinated effort with the BMC's software stack and kernel, transmit the bytes. This driver enables the BMC to unlock the PCI bridge on demand, and configure it via ioctl to allow the host to write bytes to an agreed upon location. In the primary use-case, the region to use is known apriori on the BMC, and the host requests this information. Once this request is received, the BMC's software stack will enable the bridge and the region and then using some software flow control (possibly via IPMI packets), copy the bytes down. Once the process is complete, the BMC will disable the bridge and unset any region involved. The default behavior of this bridge when present is: enabled and all regions marked read-write. This driver will fix the regions to be read-only and then disable the bridge entirely. The memory regions protected are: * BMC flash MMIO window * System flash MMIO windows * SOC IO (peripheral MMIO) * DRAM The DRAM region itself is all of DRAM and cannot be further specified. Once the PCI bridge is enabled, the host can read all of DRAM, and if the DRAM section is write-enabled, then it can write to all of it. Signed-off-by: Patrick Venture <venture@google.com> Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-21	Merge 5.1-rc6 into char-misc-next	Greg Kroah-Hartman
	We want the fixes, and this resolves a merge error in the fastrpc driver. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-18	thunderbolt: Add XDomain UUID exchange support	Mika Westerberg
	Currently ICM has been handling XDomain UUID exchange so there was no need to have it in the driver yet. However, since now we are going to add the same capabilities to the software connection manager it needs to be handled properly. For this reason modify the driver XDomain protocol handling so that if the remote domain UUID is not filled in the core will query it first and only then start the normal property exchange flow. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-04-10	locking/rwsem: Optimize rwsem structure for uncontended lock acquisition	Waiman Long
	For an uncontended rwsem, count and owner are the only fields a task needs to touch when acquiring the rwsem. So they are put next to each other to increase the chance that they will share the same cacheline. On a ThunderX2 99xx (arm64) system with 32K L1 cache and 256K L2 cache, a rwsem locking microbenchmark with one locking thread was run to write-lock and write-unlock an array of rwsems separated 2 cachelines apart in a 1M byte memory block. The locking rates (kops/s) of the microbenchmark when the rwsems are at various "long" (8-byte) offsets from beginning of the cacheline before and after the patch were as follows: Cacheline Offset Pre-patch Post-patch ---------------- --------- ---------- 0 17,449 16,588 1 17,450 16,465 2 17,450 16,460 3 17,453 16,462 4 14,867 16,471 5 14,867 16,470 6 14,853 16,464 7 14,867 13,172 Before the patch, the count and owner are 4 "long"s apart. After the patch, they are only 1 "long" apart. The rwsem data have to be loaded from the L3 cache for each access. It can be seen that the locking rates are more consistent after the patch than before. Note that for this particular system, the performance drop happens whenever the count and owner are at an odd multiples of "long"s apart. No performance drop was observed when only a single rwsem was used (hot cache). So the drop is likely just an idiosyncrasy of the cache architecture of this chip than an inherent problem with the patch. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20190404174320.22416-12-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-10	locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h	Waiman Long
	We don't need to expose rwsem internal functions which are not supposed to be called directly from other kernel code. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/20190404174320.22416-4-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-10	Merge branch 'linus' into locking/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-10	OPP: Introduce dev_pm_opp_find_freq_ceil_by_volt()	Andrew-sh.Cheng
	This patch introduces a new helper routine in the OPP core, which returns the OPP with the highest frequency which has voltage less than or equal to the target voltage passed to the helper. Signed-off-by: Andrew-sh.Cheng <andrew-sh.cheng@mediatek.com> [ Viresh: Massaged the commit log and renamed the helper with some cleanups. ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-04-10	net/mlx5: E-Switch, add a new prio to be used by the RDMA side	Mark Bloch
	Create a new prio in the FDB, it will be used when inserting steering rules into the FDB from the RDMA side. We create a new PRIO so rules from the net side and rules from the RDMA side won't be inserted to the same PRIO, each side has it's own sandbox to play in. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-04-10	net/mlx5: E-Switch, don't use hardcoded values for FDB prios	Mark Bloch
	When creating the FDB prios, use the enum values already defined and not the hardcoded values. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-04-09	dt-bindings: clock: sifive: add FU540-C000 PRCI clock constants	Paul Walmsley
	Add preprocessor macros for the important PRCI output clocks that are needed by both the FU540 PRCI driver and DT data. Details are available in the FU540 manual in Chapter 7 of https://static.dev.sifive.com/FU540-C000-v1.0.pdf Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2019-04-09	bpf: allow for key-less BTF in array map	Daniel Borkmann
	Given we'll be reusing BPF array maps for global data/bss/rodata sections, we need a way to associate BTF DataSec type as its map value type. In usual cases we have this ugly BPF_ANNOTATE_KV_PAIR() macro hack e.g. via 38d5d3b3d5db ("bpf: Introduce BPF_ANNOTATE_KV_PAIR") to get initial map to type association going. While more use cases for it are discouraged, this also won't work for global data since the use of array map is a BPF loader detail and therefore unknown at compilation time. For array maps with just a single entry we make an exception in terms of BTF in that key type is declared optional if value type is of DataSec type. The latter LLVM is guaranteed to emit and it also aligns with how we regard global data maps as just a plain buffer area reusing existing map facilities for allowing things like introspection with existing tools. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09	bpf: add specification for BTF Var and DataSec kinds	Daniel Borkmann
	This adds the BTF specification and UAPI bits for supporting BTF Var and DataSec kinds. This is following LLVM upstream commit ac4082b77e07 ("[BPF] Add BTF Var and DataSec Support") which has been merged recently. Var itself is for describing a global variable and DataSec to describe ELF sections e.g. data/bss/rodata sections that hold one or multiple global variables. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09	bpf: add syscall side map freeze support	Daniel Borkmann
	This patch adds a new BPF_MAP_FREEZE command which allows to "freeze" the map globally as read-only / immutable from syscall side. Map permission handling has been refactored into map_get_sys_perms() and drops FMODE_CAN_WRITE in case of locked map. Main use case is to allow for setting up .rodata sections from the BPF ELF which are loaded into the kernel, meaning BPF loader first allocates map, sets up map value by copying .rodata section into it and once complete, it calls BPF_MAP_FREEZE on the map fd to prevent further modifications. Right now BPF_MAP_FREEZE only takes map fd as argument while remaining bpf_attr members are required to be zero. I didn't add write-only locking here as counterpart since I don't have a concrete use-case for it on my side, and I think it makes probably more sense to wait once there is actually one. In that case bpf_attr can be extended as usual with a flag field and/or others where flag 0 means that we lock the map read-only hence this doesn't prevent to add further extensions to BPF_MAP_FREEZE upon need. A map creation flag like BPF_F_WRONCE was not considered for couple of reasons: i) in case of a generic implementation, a map can consist of more than just one element, thus there could be multiple map updates needed to set the map into a state where it can then be made immutable, ii) WRONCE indicates exact one-time write before it is then set immutable. A generic implementation would set a bit atomically on map update entry (if unset), indicating that every subsequent update from then onwards will need to bail out there. However, map updates can fail, so upon failure that flag would need to be unset again and the update attempt would need to be repeated for it to be eventually made immutable. While this can be made race-free, this approach feels less clean and in combination with reason i), it's not generic enough. A dedicated BPF_MAP_FREEZE command directly sets the flag and caller has the guarantee that map is immutable from syscall side upon successful return for any future syscall invocations that would alter the map state, which is also more intuitive from an API point of view. A command name such as BPF_MAP_LOCK has been avoided as it's too close with BPF map spin locks (which already has BPF_F_LOCK flag). BPF_MAP_FREEZE is so far only enabled for privileged users. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09	bpf: add program side {rd, wr}only support for maps	Daniel Borkmann
	This work adds two new map creation flags BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG in order to allow for read-only or write-only BPF maps from a BPF program side. Today we have BPF_F_RDONLY and BPF_F_WRONLY, but this only applies to system call side, meaning the BPF program has full read/write access to the map as usual while bpf(2) calls with map fd can either only read or write into the map depending on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allows for the exact opposite such that verifier is going to reject program loads if write into a read-only map or a read into a write-only map is detected. For read-only map case also some helpers are forbidden for programs that would alter the map state such as map deletion, update, etc. As opposed to the two BPF_F_RDONLY / BPF_F_WRONLY flags, BPF_F_RDONLY_PROG as well as BPF_F_WRONLY_PROG really do correspond to the map lifetime. We've enabled this generic map extension to various non-special maps holding normal user data: array, hash, lru, lpm, local storage, queue and stack. Further generic map types could be followed up in future depending on use-case. Main use case here is to forbid writes into .rodata map values from verifier side. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09	bpf: implement lookup-free direct value access for maps	Daniel Borkmann
	This generic extension to BPF maps allows for directly loading an address residing inside a BPF map value as a single BPF ldimm64 instruction! The idea is similar to what BPF_PSEUDO_MAP_FD does today, which is a special src_reg flag for ldimm64 instruction that indicates that inside the first part of the double insns's imm field is a file descriptor which the verifier then replaces as a full 64bit address of the map into both imm parts. For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following: the first part of the double insns's imm field is again a file descriptor corresponding to the map, and the second part of the imm field is an offset into the value. The verifier will then replace both imm parts with an address that points into the BPF map value at the given value offset for maps that support this operation. Currently supported is array map with single entry. It is possible to support more than just single map element by reusing both 16bit off fields of the insns as a map index, so full array map lookup could be expressed that way. It hasn't been implemented here due to lack of concrete use case, but could easily be done so in future in a compatible way, since both off fields right now have to be 0 and would correctly denote a map index 0. The BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise with BPF_PSEUDO_MAP_FD we could not differ offset 0 between load of map pointer versus load of map's value at offset 0, and changing BPF_PSEUDO_MAP_FD's encoding into off by one to differ between regular map pointer and map value pointer would add unnecessary complexity and increases barrier for debugability thus less suitable. Using the second part of the imm field as an offset into the value does /not/ come with limitations since maximum possible value size is in u32 universe anyway. This optimization allows for efficiently retrieving an address to a map value memory area without having to issue a helper call which needs to prepare registers according to calling convention, etc, without needing the extra NULL test, and without having to add the offset in an additional instruction to the value base pointer. The verifier then treats the destination register as PTR_TO_MAP_VALUE with constant reg->off from the user passed offset from the second imm field, and guarantees that this is within bounds of the map value. Any subsequent operations are normally treated as typical map value handling without anything extra needed from verification side. The two map operations for direct value access have been added to array map for now. In future other types could be supported as well depending on the use case. The main use case for this commit is to allow for BPF loader support for global variables that reside in .data/.rodata/.bss sections such that we can directly load the address of them with minimal additional infrastructure required. Loader support has been added in subsequent commits for libbpf library. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09	unexport d_alloc_pseudo()	Al Viro
	No modular uses since introducion of alloc_file_pseudo(), and the only non-modular user not in alloc_file_pseudo() had actually been wrong - should've been d_alloc_anon(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-04-09	dcache: sort the freeing-without-RCU-delay mess for good.	Al Viro
	For lockless accesses to dentries we don't have pinned we rely (among other things) upon having an RCU delay between dropping the last reference and actually freeing the memory. On the other hand, for things like pipes and sockets we neither do that kind of lockless access, nor want to deal with the overhead of an RCU delay every time a socket gets closed. So delay was made optional - setting DCACHE_RCUACCESS in ->d_flags made sure it would happen. We tried to avoid setting it unless we knew we need it. Unfortunately, that had led to recurring class of bugs, in which we missed the need to set it. We only really need it for dentries that are created by d_alloc_pseudo(), so let's not bother with trying to be smart - just make having an RCU delay the default. The ones that do not get it set the replacement flag (DCACHE_NORCU) and we'd better use that sparingly. d_alloc_pseudo() is the only such user right now. FWIW, the race that finally prompted that switch had been between __lock_parent() of immediate subdirectory of what's currently the root of a disconnected tree (e.g. from open-by-handle in progress) racing with d_splice_alias() elsewhere picking another alias for the same inode, either on outright corrupted fs image, or (in case of open-by-handle on NFS) that subdirectory having been just moved on server. It's not easy to hit, so the sky is not falling, but that's not the first race on similar missed cases and the logics for settinf DCACHE_RCUACCESS has gotten ridiculously convoluted. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-04-10	cpuidle: Export the next timer expiration for CPUs	Ulf Hansson
	To be able to predict the sleep duration for a CPU entering idle, it is essential to know the expiration time of the next timer. Both the teo and the menu cpuidle governors already use this information for CPU idle state selection. Moving forward, a similar prediction needs to be made for a group of idle CPUs rather than for a single one and the following changes implement a new genpd governor for that purpose. In order to support that feature, add a new function called tick_nohz_get_next_hrtimer() that will return the next hrtimer expiration time of a given CPU to be invoked after deciding whether or not to stop the scheduler tick on that CPU. Make the cpuidle core call tick_nohz_get_next_hrtimer() right before invoking the ->enter() callback provided by the cpuidle driver for the given state and store its return value in the per-CPU struct cpuidle_device, so as to make it available to code outside of cpuidle. Note that at the point when cpuidle calls tick_nohz_get_next_hrtimer(), the governor's ->select() callback has already returned and indicated whether or not the tick should be stopped, so in fact the value returned by tick_nohz_get_next_hrtimer() always is the next hrtimer expiration time for the given CPU, possibly including the tick (if it hasn't been stopped). Co-developed-by: Lina Iyer <lina.iyer@linaro.org> Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-10	PM / Domains: Add support for CPU devices to genpd	Ulf Hansson
	To enable a CPU device to be attached to a PM domain managed by genpd, make a few changes to it for convenience. To be able to quickly find out what CPUs are attached to a genpd, which typically becomes useful from a genpd governor as subsequent changes are about to show, add a cpumask to struct generic_pm_domain to be updated when a CPU device gets attached to the genpd containing that cpumask. Also, propagate the cpumask changes upwards in the domain hierarchy to the master PM domains. This way, the cpumask for a genpd hierarchically reflects all CPUs attached to the topology below it. Finally, make this an opt-in feature, to avoid having to manage CPUs and the cpumask for a genpd that don't need it. To that end, add a new genpd configuration bit, GENPD_FLAG_CPU_DOMAIN. Co-developed-by: Lina Iyer <lina.iyer@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-10	PM / Domains: Add generic data pointer to struct genpd_power_state	Ulf Hansson
	Add a data pointer to the genpd_power_state struct, to allow a genpd backend driver to store per-state specific data. To introduce the pointer, change the way genpd deals with freeing of the corresponding allocated data. More precisely, clarify the responsibility of whom that shall free the data, by adding a ->free_states() callback to the generic_pm_domain structure. The one allocating the data will be expected to set the callback, to allow genpd to invoke it from genpd_remove(). Co-developed-by: Lina Iyer <lina.iyer@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-09	docs: Add colon clearing sphinx warning	Tobin C. Harding
	Sphinx emits various warnings all caused by a missing colon before code block: WARNING: Block quote ends without a blank line; unexpected unindent. ERROR: Unexpected indentation. WARNING: Block quote ends without a blank line; unexpected unindent. Add the colon, clearing sphinx warnings. Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-04-09	Merge tag 'mac80211-for-davem-2019-04-09' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Various fixes: * iTXQ fixes from Felix * tracing fix - increase message length * fix SW_CRYPTO_CONTROL enforcement * WMM rule handling for regdomain intersection * max_interfaces in hwsim - reported by syzbot * clear private data in some more commands * a clang compiler warning fix I added a patch with two new (unused) macros for rate-limited printing to simplify getting the users into the tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-09	memory: ti-emif-sram: Add ti_emif_run_hw_leveling for DDR3 hardware leveling	Dave Gerlach
	In certain situations, such as when returning from low power modes, the EMIF must re-run hardware leveling to properly restore DDR3 access. This is accomplished by introducing a new ti-emif-sram-pm call, ti_emif_run_hw_leveling, to check if DDR3 is in use and if so, trigger the full write and read leveling processes. Suggested-by: Brad Griffis <bgriffis@ti.com> Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Acked-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Tony Lindgren <tony@atomide.com>