linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-12-13	iov_iter: pass void csum pointer to csum_and_copy_to_iter	Sagi Grimberg
	The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram and we are trying to unite its logic with skb_copy_datagram_iter by passing a callback to the copy function that we want to apply. Thus, we need to make the checksum pointer private to the function. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-12	Merge tag 'mlx5e-updates-2018-12-11' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-12-11 From Eli Britstein, Patches 1-10 adds remote mirroring support. Patches 1-4 refactor encap related code as pre-steps for using per destination encapsulation properties. Patches 5-7 use extended destination feature for single/multi destination scenarios that have a single encap destination. Patches 8-10 enable multiple encap destinations for a TC flow. From, Daniel Jurgens, Patch 11, Use CQE padding for Ethernet CQs, PPC showed up to a 24% improvement in small packet throughput From Eyal Davidovich, patches 12-14, FW monitor counter support FW monitor counters feature came to solve the delayed reporting of FW stats in the atomic get_stats64 ndo, since we can't access the FW at that stage, this feature will enable immediate FW stats updates in the driver via fw events on specific stats updates. Patch 12, cleanup to avoid querying a FW counter when it is not supported Patch 13, Monitor counters FW commands support Patch 14, Use monitor counters in ethernet netdevice to update FW stats reported in the atomic get_stats64 ndo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf	David S. Miller
	Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix warnings suspicious rcu usage when handling base chain statistics, from Taehee Yoo. 2) Refetch pointer to tcp header from nf_ct_sack_adjust() since skb_make_writable() may reallocate data area, reported by Google folks patch from Florian. 3) Incorrect netlink nest end after previous cancellation from error path in ipset, from Pan Bian. 4) Use dst_hold_safe() from nf_xfrm_me_harder(), from Florian. 5) Use rb_link_node_rcu() for rcu-protected rbtree node in nf_conncount, from Taehee Yoo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12	efi: Add an EFI signature blob parser	Dave Howells
	Add a function to parse an EFI signature blob looking for elements of interest. A list is made up of a series of sublists, where all the elements in a sublist are of the same type, but sublists can be of different types. For each sublist encountered, the function pointed to by the get_handler_for_guid argument is called with the type specifier GUID and returns either a pointer to a function to handle elements of that type or NULL if the type is not of interest. If the sublist is of interest, each element is passed to the handler function in turn. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2018-12-12	efi: Add EFI signature data types	Dave Howells
	Add the data types that are used for containing hashes, keys and certificates for cryptographic verification along with their corresponding type GUIDs. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Nayna Jain <nayna@linux.ibm.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.morris@microsoft.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2018-12-12	blkcg: handle dying request_queue when associating a blkg	Dennis Zhou
	Between v3 [1] and v4 [2] of the blkg association series, the association point moved from generic_make_request_checks(), which is called after the request enters the queue, to bio_set_dev(), which is when the bio is formed before submit_bio(). When the request_queue goes away, the blkgs supporting the request_queue are destroyed and then the q->root_blkg is set to %NULL. This patch adds a %NULL check to blkg_tryget_closest() to prevent the NPE caused by the above. It also adds a guard to see if the request_queue is dying when creating a blkg to prevent creating a blkg for a dead request_queue. [1] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/ [2] https://lore.kernel.org/lkml/20181126211946.77067-1-dennis@kernel.org/ Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") Reported-and-tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-12	net: ndo_bridge_setlink: Add extack	Petr Machata
	Drivers may not be able to implement a VLAN addition or reconfiguration. In those cases it's desirable to explain to the user that it was rejected (and why). To that end, add extack argument to ndo_bridge_setlink. Adapt all users to that change. Following patches will use the new argument in the bridge driver. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12	bpf: add bpffs pretty print for cgroup local storage maps	Roman Gushchin
	Implement bpffs pretty printing for cgroup local storage maps (both shared and per-cpu). Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c): Shared: $ cat /sys/fs/bpf/map_2 # WARNING!! The output is for debug purpose only # WARNING!! The output format will change {4294968594,1}: {9999,1039896} Per-cpu: $ cat /sys/fs/bpf/map_1 # WARNING!! The output is for debug purpose only # WARNING!! The output format will change {4294968594,1}: { cpu0: {0,0,0,0,0} cpu1: {0,0,0,0,0} cpu2: {1,104,0,0,0} cpu3: {0,0,0,0,0} } Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-12	bpf: pass struct btf pointer to the map_check_btf() callback	Roman Gushchin
	If key_type or value_type are of non-trivial data types (e.g. structure or typedef), it's not possible to check them without the additional information, which can't be obtained without a pointer to the btf structure. So, let's pass btf pointer to the map_check_btf() callbacks. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-13	power: supply: charger-manager: fix race-condition in sysfs registration	Sebastian Reichel
	This registers custom sysfs properties using the native functionality of the power-supply framework, which cleans up the code a bit and fixes a race-condition. Before this patch the sysfs attributes were not properly registered to udev. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2018-12-13	power: supply: core: add support for custom sysfs attributes	Sebastian Reichel
	Add functionality to setup device specific sysfs attributes in a race condition free manner Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2018-12-12	cpuidle: Add 'above' and 'below' idle state metrics	Rafael J. Wysocki
	Add two new metrics for CPU idle states, "above" and "below", to count the number of times the given state had been asked for (or entered from the kernel's perspective), but the observed idle duration turned out to be too short or too long for it (respectively). These metrics help to estimate the quality of the CPU idle governor in use. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-12	Merge tag 'imx-drivers-4.21' of ↵	Olof Johansson
	git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into next/drivers i.MX drivers change for 4.21: - A series from Aisheng that improves SCU power domain bindings by defining '#power-domain-cells' as 1, and adds i.MX8 SCU power domain driver support on top of it. - A series from Lucas that updates gpcv2 driver for scalability and adds i.MX8MQ support into the driver. - Increase gpc driver GPC_CLK_MAX definition to 7, as DISPLAY power domain on imx6sx has 7 clocks. * tag 'imx-drivers-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: soc: imx: gpc: Increase GPC_CLK_MAX to 7 soc: imx: gpcv2: add support for i.MX8MQ SoC soc: imx: gpcv2: move register access table to domain data soc: imx: gpcv2: prefix i.MX7 specific defines firmware: imx: add SCU power domain driver firmware: imx: add pm svc headfile dt-bindings: fsl: scu: update power domain binding firmware: imx: remove resource id enums dt-bindings: imx: add scu resource id headfile Signed-off-by: Olof Johansson <olof@lixom.net>
2018-12-12	Merge tag 'v4.20-next-soc' of ↵	Olof Johansson
	https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into next/drivers add helper functions to create and send commands to the global command engine (GCE) device using the command queue driver (cmdq). * tag 'v4.20-next-soc' of https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux: soc: mediatek: Add Mediatek CMDQ helper Signed-off-by: Olof Johansson <olof@lixom.net>
2018-12-12	Merge tag 'pxa-for-4.21' of https://github.com/rjarzmik/linux into next/drivers	Olof Johansson
	This pxa update brings only a single patch, finishing the dmaengine conversion. * tag 'pxa-for-4.21' of https://github.com/rjarzmik/linux: dmaengine: pxa: make the filter function internal Signed-off-by: Olof Johansson <olof@lixom.net>
2018-12-12	Merge branch 'for-next/perf' into aarch64/for-next/core	Will Deacon
	Merge in arm64 perf and PMU driver updates, including support for the system/uncore PMU in the ThunderX2 platform.
2018-12-12	PM / AVS: SmartReflex: Switch to SPDX Licence ID	Nishanth Menon
	Fix up licensing to be inline with Linux conventions. Signed-off-by: Nishanth Menon <nm@ti.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-12	Merge tag 'usb-for-v4.21' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next Felipe writes: USB changes for v4.21 So it looks like folks are interested in dwc3 again. Almost 64% of the changes are in dwc3 this time around with some other bits in gadget functions and dwc2. There are two important parts here: a. removal of the waitqueue from dwc3's dequeue implementation, which will guarantee that gadget functions can dequeue from any context and; b. better method for starting isochronous transfers to avoid, as much as possible, missed isoc frames. Apart from these, we have the usual set of non-critical fixes and new features all over the place. * tag 'usb-for-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb: (56 commits) usb: dwc2: Fix disable all EP's on disconnect usb: dwc3: gadget: Disable CSP for stream OUT ep usb: dwc2: disable power_down on Amlogic devices Revert "usb: dwc3: pci: Use devm functions to get the phy GPIOs" USB: gadget: udc: s3c2410_udc: convert to DEFINE_SHOW_ATTRIBUTE usb: mtu3: fix dbginfo in qmu_tx_zlp_error_handler usb: dwc3: trace: add missing break statement to make compiler happy usb: dwc3: gadget: Report isoc transfer frame number usb: gadget: Introduce frame_number to usb_request usb: renesas_usbhs: Use SIMPLE_DEV_PM_OPS macro usb: renesas_usbhs: Remove dummy runtime PM callbacks usb: dwc2: host: use hrtimer for NAK retries usb: mtu3: clear SOFTCONN when clear USB3_EN if work as HS mode usb: mtu3: enable SETUPENDISR interrupt usb: mtu3: fix the issue about SetFeature(U1/U2_Enable) usb: mtu3: enable hardware remote wakeup from L1 automatically usb: mtu3: remove QMU checksum usb/mtu3: power down device ip at setup usb: dwc2: Disable power down feature on Samsung SoCs usb: dwc3: Correct the logic for checking TRB full in __dwc3_prepare_one_trb() ...
2018-12-12	pwm: Drop legacy wrapper for changing polarity	Uwe Kleine-König
	The API to configure a PWM using pwm_enable(), pwm_disable(), pwm_config() and pwm_set_polarity() is superseeded by atomically setting the parameters using pwm_apply_state(). To get forward with deprecating the former set of functions use the opportunity that there is no current user of pwm_set_polarity() and remove it. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2018-12-12	Merge tag 'phy-for-4.21_v1' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into usb-next Kishon writes: phy: for 4.21 ) Change phy set_mode ops to take both mode and setmode as arguments ) Add phy_configure() and phy_validate() API's mostly used for MIPI D-PHY ) Add helpers to get default values of parameters define in MIPI D-PHY spec ) Add driver for TI's CPSW Port PHY Interface Mode selection ) Add driver for Cadence Sierra PHY used with USB and PCIe ) Add driver for Freescale i.MX8MQ USB3 PHY ) Fixes QMP PHY bindings to allow the clocks provided by the PHY to be pointed at in device tree ) Fix for using fully specified regions (in device tree) for configuring the second lane in dual lane PHYs in QMP PHY ) Add support for Allwinner H6 USB2 PHY in phy-sun4i-usb driver ) Update phy-rcar-gen3-usb driver to follow the hardware manual ) Add support for fine grained power management in mapphone-mdm6600 driver Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> tag 'phy-for-4.21_v1' of git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy: (30 commits) phy: qcom-qmp: Expose provided clocks to DT dt-bindings: phy-qcom-qmp: Move #clock-cells to child phy: qcom-qmp: Utilize fully-specified DT registers dt-bindings: phy-qcom-qmp: Fix register underspecification phy: ti: fix semicolon.cocci warnings phy: dphy: Add configuration helpers phy: Add MIPI D-PHY configuration options phy: Add configuration interface phy: Add MIPI D-PHY mode phy: add driver for Freescale i.MX8MQ USB3 PHY dt-bindings: phy: add binding for Freescale i.MX8MQ USB3 PHY phy: Use of_node_name_eq for node name comparisons net: ethernet: ti: cpsw: add support for port interface mode selection phy dt-bindings: net: ti: cpsw: switch to use phy-gmii-sel phy phy: ti: introduce phy-gmii-sel driver dt-bindings: phy: add cpsw port interface mode selection phy bindings phy: mvebu-cp110-comphy: fix spelling in structure name phy: mapphone-mdm6600: Improve phy related runtime PM calls phy: renesas: rcar-gen3-usb2: follow the hardware manual procedure phy: cadence: Add driver for Sierra PHY ...
2018-12-12	phy: dphy: Add configuration helpers	Maxime Ripard
	The MIPI D-PHY spec defines default values and boundaries for most of the parameters it defines. Introduce helpers to help drivers get meaningful values based on their current parameters, and validate the boundaries of these parameters if needed. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-12	phy: Add MIPI D-PHY configuration options	Maxime Ripard
	Now that we have some infrastructure for it, allow the MIPI D-PHY phy's to be configured through the generic functions through a custom structure added to the generic union. The parameters added here are the ones defined in the MIPI D-PHY spec, plus the number of lanes in use. The current set of parameters should cover all the potential users. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-12	phy: Add configuration interface	Maxime Ripard
	The phy framework is only allowing to configure the power state of the PHY using the init and power_on hooks, and their power_off and exit counterparts. While it works for most, simple, PHYs supported so far, some more advanced PHYs need some configuration depending on runtime parameters. These PHYs have been supported by a number of means already, often by using ad-hoc drivers in their consumer drivers. That doesn't work too well however, when a consumer device needs to deal with multiple PHYs, or when multiple consumers need to deal with the same PHY (a DSI driver and a CSI driver for example). So we'll add a new interface, through two funtions, phy_validate and phy_configure. The first one will allow to check that a current configuration, for a given mode, is applicable. It will also allow the PHY driver to tune the settings given as parameters as it sees fit. phy_configure will actually apply that configuration in the phy itself. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-12	phy: Add MIPI D-PHY mode	Maxime Ripard
	MIPI D-PHY is a MIPI standard meant mostly for display and cameras in embedded systems. Add a mode for it. Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-12	phy: core: clean up unused ethernet specific phy modes	Grygorii Strashko
	After recent changes PHY_MODE_SGMII, PHY_MODE_2500SGMII, PHY_MODE_QSGMII, PHY_MODE_10GKR are not used any more and can be removed. Hence - remove them. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-12	phy: core: add PHY_MODE_ETHERNET	Grygorii Strashko
	Add new PHY's mode to be used by Ethernet PHY interface drivers or multipurpose PHYs like serdes. It will be reused in further changes. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-12	phy: core: rework phy_set_mode to accept phy mode and submode	Grygorii Strashko
	Currently the attempt to add support for Ethernet interface mode PHY (MII/GMII/RGMII) will lead to the necessity of extending enum phy_mode and duplicate there values from phy_interface_t enum (or introduce more PHY callbacks) [1]. Both approaches are ineffective and would lead to fast bloating of enum phy_mode or struct phy_ops in the process of adding more PHYs for different subsystems which will make them unmaintainable. As discussed in [1] the solution could be to introduce dual level PHYs mode configuration - PHY mode and PHY submode. The PHY mode will define generic PHY type (subsystem - PCIE/ETHERNET/USB_) while the PHY submode - subsystem specific interface mode. The last is usually already defined in corresponding subsystem headers (phy_interface_t for Ethernet, enum usb_device_speed for USB). This patch is cumulative change which refactors PHY framework code to support dual level PHYs mode configuration - PHY mode and PHY submode. It extends .set_mode() callback to support additional parameter "int submode" and converts all corresponding PHY drivers to support new .set_mode() callback declaration. The new extended PHY API int phy_set_mode_ext(struct phy *phy, enum phy_mode mode, int submode) is introduced to support dual level PHYs mode configuration and existing phy_set_mode() API is converted to macros, so PHY framework consumers do not need to be changed (~21 matches). [1] http://lkml.kernel.org/r/d63588f6-9ab0-848a-5ad4-8073143bd95d@ti.com Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-11	bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K	Daniel Borkmann
	Michael and Sandipan report: Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF JIT allocations. At compile time it defaults to PAGE_SIZE * 40000, and is adjusted again at init time if MODULES_VADDR is defined. For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with the compile-time default at boot-time, which is 0x9c400000 when using 64K page size. This overflows the signed 32-bit bpf_jit_limit value: root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit -1673527296 and can cause various unexpected failures throughout the network stack. In one case `strace dhclient eth0` reported: setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524) and similar failures can be seen with tools like tcpdump. This doesn't always reproduce however, and I'm not sure why. The more consistent failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host would time out on systemd/netplan configuring a virtio-net NIC with no noticeable errors in the logs. Given this and also given that in near future some architectures like arm64 will have a custom area for BPF JIT image allocations we should get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For 4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec() so therefore add another overridable bpf_jit_alloc_exec_limit() helper function which returns the possible size of the memory area for deriving the default heuristic in bpf_jit_charge_init(). Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default JIT memory provider, and therefore in case archs implement their custom module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}. Additionally, for archs supporting large page sizes, we should change the sysctl to be handled as long to not run into sysctl restrictions in future. Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations") Reported-by: Sandipan Das <sandipan@linux.ibm.com> Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-11	seccomp: add a return code to trap to userspace	Tycho Andersen
	This patch introduces a means for syscalls matched in seccomp to notify some other task that a particular filter has been triggered. The motivation for this is primarily for use with containers. For example, if a container does an init_module(), we obviously don't want to load this untrusted code, which may be compiled for the wrong version of the kernel anyway. Instead, we could parse the module image, figure out which module the container is trying to load and load it on the host. As another example, containers cannot mount() in general since various filesystems assume a trusted image. However, if an orchestrator knows that e.g. a particular block device has not been exposed to a container for writing, it want to allow the container to mount that block device (that is, handle the mount for it). This patch adds functionality that is already possible via at least two other means that I know about, both of which involve ptrace(): first, one could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL. Unfortunately this is slow, so a faster version would be to install a filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP. Since ptrace allows only one tracer, if the container runtime is that tracer, users inside the container (or outside) trying to debug it will not be able to use ptrace, which is annoying. It also means that older distributions based on Upstart cannot boot inside containers using ptrace, since upstart itself uses ptrace to monitor services while starting. The actual implementation of this is fairly small, although getting the synchronization right was/is slightly complex. Finally, it's worth noting that the classic seccomp TOCTOU of reading memory data from the task still applies here, but can be avoided with careful design of the userspace handler: if the userspace handler reads all of the task memory that is necessary before applying its security policy, the tracee's subsequent memory edits will not be read by the tracer. Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Oleg Nesterov <oleg@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> CC: "Serge E. Hallyn" <serge@hallyn.com> Acked-by: Serge Hallyn <serge@hallyn.com> CC: Christian Brauner <christian@brauner.io> CC: Tyler Hicks <tyhicks@canonical.com> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-11	seccomp: switch system call argument type to void *	Tycho Andersen
	The const qualifier causes problems for any code that wants to write to the third argument of the seccomp syscall, as we will do in a future patch in this series. The third argument to the seccomp syscall is documented as void , so rather than just dropping the const, let's switch everything to use void as well. I believe this is safe because of 1. the documentation above, 2. there's no real type information exported about syscalls anywhere besides the man pages. Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Oleg Nesterov <oleg@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> CC: "Serge E. Hallyn" <serge@hallyn.com> Acked-by: Serge Hallyn <serge@hallyn.com> CC: Christian Brauner <christian@brauner.io> CC: Tyler Hicks <tyhicks@canonical.com> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-11	net/mlx5e: Avoid query PPCNT register if not supported by the device	Eyal Davidovich
	PPCNT is not supported if PCAM access reg is supported and ppcnt bit is clear. Signed-off-by: Eyal Davidovich <eyald@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11	net/mlx5e: Use CQE padding for Ethernet CQs	Daniel Jurgens
	Writing 64B CQEs to 128B cache lines results in a RMW operation. Padding the CQEs to 128B if possible improves performance on 128B cache line systems like PPC. Testing on PPC showed up to a 24% improvement in small packet throughput vs the default behavior, depending on the workload and system topology. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11	Merge tag 'v4.20-rc6' into rdma.git for-next	Jason Gunthorpe
	For dependencies in following patches.
2018-12-11	Merge branch 'for-linus' of ↵	Mark Brown
	https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into regulator-4.21
2018-12-11	IB/mlx5: Report CapabilityMask2 in ib_query_port	Michael Guralnik
	CapabilityMask2 exists when IB_PORT_CAP_MASK2_SUP is set in the original capability mask. In such cases, query its value and report it in query port. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11	lightnvm: disable interleaved metadata	Igor Konopko
	Currently pblk only check the size of I/O metadata and does not take into account if this metadata is in a separate buffer or interleaved in a single metadata buffer. In reality only the first scenario is supported, where second mode will break pblk functionality during any IO operation. This patch prevents pblk to be instantiated in case device only supports interleaved metadata. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-11	lightnvm: dynamic DMA pool entry size	Igor Konopko
	Currently lightnvm and pblk uses single DMA pool, for which the entry size always is equal to PAGE_SIZE. The contents of each entry allocated from the DMA pool consists of a PPA list (8bytes * 64), leaving 56bytes * 64 space for metadata. Since the metadata field can be bigger, such as 128 bytes, the static size does not cover this use-case. This patch adds support for I/O metadata above 56 bytes by changing DMA pool size based on device meta size and allows pblk to use OOB metadata >=16B. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-11	clk: Tag clk core files with SPDX	Stephen Boyd
	These are all GPL-2.0 files per the existing license text. Replace the boiler plate with the tag. Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-12-11	sched/topology: Make Energy Aware Scheduling depend on schedutil	Quentin Perret
	Energy Aware Scheduling (EAS) is designed with the assumption that frequencies of CPUs follow their utilization value. When using a CPUFreq governor other than schedutil, the chances of this assumption being true are small, if any. When schedutil is being used, EAS' predictions are at least consistent with the frequency requests. Although those requests have no guarantees to be honored by the hardware, they should at least guide DVFS in the right direction and provide some hope in regards to the EAS model being accurate. To make sure EAS is only used in a sane configuration, create a strong dependency on schedutil being used. Since having sugov compiled-in does not provide that guarantee, make CPUFreq call a scheduler function on governor changes hence letting it rebuild the scheduling domains, check the governors of the online CPUs, and enable/disable EAS accordingly. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-9-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11	PM: Introduce an Energy Model management framework	Quentin Perret
	Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ \| Thermal (IPA) \| \| Scheduler (EAS) \| \| Other \| +---------------+ +-----------------+ +-------------+ \| \| em_pd_energy() \| \| \| em_cpu_get() \| +-----------+ \| +--------+ \| \| \| v v v +---------------------+ \| \| \| Energy Model \| \| \| \| Framework \| \| \| +---------------------+ ^ ^ ^ \| \| \| em_register_perf_domain() +----------+ \| +---------+ \| \| \| +---------------+ +---------------+ +--------------+ \| cpufreq-dt \| \| arm_scmi \| \| Other \| +---------------+ +---------------+ +--------------+ ^ ^ ^ \| \| \| +--------------+ +---------------+ +--------------+ \| Device Tree \| \| Firmware \| \| ? \| +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11	sched/cpufreq: Prepare schedutil for Energy Aware Scheduling	Quentin Perret
	Schedutil requests frequency by aggregating utilization signals from the scheduler (CFS, RT, DL, IRQ) and applying a 25% margin on top of them. Since Energy Aware Scheduling (EAS) needs to be able to predict the frequency requests, it needs to forecast the decisions made by the governor. In order to prepare the introduction of EAS, introduce schedutil_freq_util() to centralize the aforementioned signal aggregation and make it available to both schedutil and EAS. Since frequency selection and energy estimation still need to deal with RT and DL signals slightly differently, schedutil_freq_util() is called with a different 'type' parameter in those two contexts, and returns an aggregated utilization signal accordingly. While at it, introduce the map_util_freq() function which is designed to make schedutil's 25% margin usable easily for both sugov and EAS. As EAS will be able to predict schedutil's frequency requests more accurately than any other governor by design, it'd be sensible to make sure EAS cannot be used without schedutil. This will be done later, once EAS has actually been introduced. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: rjw@rjwysocki.net Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-3-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11	sched/topology: Relocate arch_scale_cpu_capacity() to the internal header	Quentin Perret
	By default, arch_scale_cpu_capacity() is only visible from within the kernel/sched folder. Relocate it to include/linux/sched/topology.h to make it visible to other clients needing to know about the capacity of CPUs, such as the Energy Model framework. This also shrinks the <linux/sched/topology.h> public header. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: rjw@rjwysocki.net Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-2-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11	sched/topology: Remove the ::smt_gain field from 'struct sched_domain'	Vincent Guittot
	::smt_gain is used to compute the capacity of CPUs of a SMT core with the constraint 1 < ::smt_gain < 2 in order to be able to compute number of CPUs per core. The field has_free_capacity of struct numa_stat, which was the last user of this computation of number of CPUs per core, has been removed by: 2d4056fafa19 ("sched/numa: Remove numa_has_capacity()") We can now remove this constraint on core capacity and use the defautl value SCHED_CAPACITY_SCALE for SMT CPUs. With this remove, SCHED_CAPACITY_SCALE becomes the maximum compute capacity of CPUs on every systems. This should help to simplify some code and remove fields like rd->max_cpu_capacity Furthermore, arch_scale_cpu_capacity() is used with a NULL sd in several other places in the code when it wants the capacity of a CPUs to scale some metrics like in pelt, deadline or schedutil. In case on SMT, the value returned is not the capacity of SMT CPUs but default SCHED_CAPACITY_SCALE. So remove it. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1535548752-4434-4-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11	perf/core: Declare the __percpu attribute on non-deref types	Mukesh Ojha
	Sparse reports the current declaration of two perf percpu variables with this warning: warning: incorrect type in initializer (different address spaces) expected void const [noderef] <asn:3>__vpp_verify got struct perf_cpu_context <noident> While it's normally perfectly fine to place GCC attributes anywhere in the definition, this particular attribute is for a checking compiler's such as Sparse's benefit, which doesn't want __percpu on pointers. So reorder the attribute to come after the structure type, not after the pointer type. [ mingo: Rewrote the changelog. ] Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1543310012-7967-1-git-send-email-mojha@codeaurora.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11	locking/lockdep: Remove ::version from lock_class structure	Waiman Long
	It turns out the version field in the lock_class structure isn't used anywhere. Just remove it. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: iommu@lists.linux-foundation.org Cc: kasan-dev@googlegroups.com Link: https://lkml.kernel.org/r/1542653726-5655-2-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11	dma/debug: Remove dma_debug_resize_entries()	Robin Murphy
	With the only caller now gone, we can clean up this part of dma-debug's exposed internals and make way to tweak the allocation behaviour. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Qian Cai <cai@lca.pw> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-11	x86/ima: retry detecting secure boot mode	Mimi Zohar
	The secure boot mode may not be detected on boot for some reason (eg. buggy firmware). This patch attempts one more time to detect the secure boot mode. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2018-12-11	x86/ima: define arch_get_ima_policy() for x86	Eric Richter
	On x86, there are two methods of verifying a kexec'ed kernel image signature being loaded via the kexec_file_load syscall - an architecture specific implementaton or a IMA KEXEC_KERNEL_CHECK appraisal rule. Neither of these methods verify the kexec'ed kernel image signature being loaded via the kexec_load syscall. Secure boot enabled systems require kexec images to be signed. Therefore, this patch loads an IMA KEXEC_KERNEL_CHECK policy rule on secure boot enabled systems not configured with CONFIG_KEXEC_VERIFY_SIG enabled. When IMA_APPRAISE_BOOTPARAM is configured, different IMA appraise modes (eg. fix, log) can be specified on the boot command line, allowing unsigned or invalidly signed kernel images to be kexec'ed. This patch permits enabling IMA_APPRAISE_BOOTPARAM or IMA_ARCH_POLICY, but not both. Signed-off-by: Eric Richter <erichte@linux.ibm.com> Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Jones <pjones@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2018-12-11	ima: add support for arch specific policies	Nayna Jain
	Builtin IMA policies can be enabled on the boot command line, and replaced with a custom policy, normally during early boot in the initramfs. Build time IMA policy rules were recently added. These rules are automatically enabled on boot and persist after loading a custom policy. There is a need for yet another type of policy, an architecture specific policy, which is derived at runtime during kernel boot, based on the runtime secure boot flags. Like the build time policy rules, these rules persist after loading a custom policy. This patch adds support for loading an architecture specific IMA policy. Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Co-Developed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2018-12-11	Merge back staging AVS changes for v4.21.	Rafael J. Wysocki