summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-03-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-21 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add a BPF hook for sendmsg and sendfile by reusing the ULP infrastructure and sockmap. Three helpers are added along with this, bpf_msg_apply_bytes(), bpf_msg_cork_bytes(), and bpf_msg_pull_data(). The first is used to tell for how many bytes the verdict should be applied to, the second to tell that x bytes need to be queued first to retrigger the BPF program for a verdict, and the third helper is mainly for the sendfile case to pull in data for making it private for reading and/or writing, from John. 2) Improve address to symbol resolution of user stack traces in BPF stackmap. Currently, the latter stores the address for each entry in the call trace, however to map these addresses to user space files, it is necessary to maintain the mapping from these virtual addresses to symbols in the binary which is not practical for system-wide profiling. Instead, this option for the stackmap rather stores the ELF build id and offset for the call trace entries, from Song. 3) Add support that allows BPF programs attached to perf events to read the address values recorded with the perf events. They are requested through PERF_SAMPLE_ADDR via perf_event_open(). Main motivation behind it is to support building memory or lock access profiling and tracing tools with the help of BPF, from Teng. 4) Several improvements to the tools/bpf/ Makefiles. The 'make bpf' in the tools directory does not provide the standard quiet output except for bpftool and it also does not respect specifying a build output directory. 'make bpf_install' command neither respects specified destination nor prefix, all from Jiri. In addition, Jakub fixes several other minor issues in the Makefiles on top of that, e.g. fixing dependency paths, phony targets and more. 5) Various doc updates e.g. add a comment for BPF fs about reserved names to make the dentry lookup from there a bit more obvious, and a comment to the bpf_devel_QA file in order to explain the diff between native and bpf target clang usage with regards to pointer size, from Quentin and Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21audit: remove path param from link denied functionRichard Guy Briggs
In commit 45b578fe4c3cade6f4ca1fc934ce199afd857edc ("audit: link denied should not directly generate PATH record") the need for the struct path *link parameter was removed. Remove the now useless struct path argument. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-03-21Merge airlied/drm-next into drm-misc-nextSean Paul
Refresh -misc-next Signed-off-by: Sean Paul <seanpaul@chromium.org>
2018-03-21Merge tag 'extcon-next-for-4.17' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-next Chanwoo writes: Update extcon for 4.17 Detailed description for this pull request: 1. Add exported extcon function in order to support OF graph binding - The extcon consumer driver used the "extcon = <&extcon's phandle" property in device-tree in order to bind between extcon provider and consumer driver. But, OF graph method is better than 'extcon' property. So, extcon subsystem adds the following function to support OF graph binding. : extcon_find_edev_by_node(struct device_node *node) - Create the immutable branch ("ib-extcon-drm-dt-v4.17") for both drm-misc and device-tree subsystem. This immutable branch contains the use-case of OF graph binding for EXTCON_HDMI connector between MHL device driver and extcon provider driver. 2. Fix minor issues of extcon device drivers - Remove platform_data and covert to fully use GPIO descriptor for extcon-gpio.c - Remove workaround code for id pin direction from extcon-inte-int3496.c because GPIO ACPI library does support it with generic way. - Set direction and drv flags for V5 boost GPIO because of fixing the firmware bug.
2018-03-21ALSA: usb: initial USB Audio Device Class 3.0 supportRuslan Bilovol
Recently released USB Audio Class 3.0 specification introduces many significant changes comparing to previous versions, like - new Power Domains, support for LPM/L1 - new Cluster descriptor - changed layout of all class-specific descriptors - new High Capability descriptors - New class-specific String descriptors - new and removed units - additional sources for interrupts - removed Type II Audio Data Formats - ... and many other things (check spec) It also provides backward compatibility through multiple configurations, as well as requires mandatory support for BADD (Basic Audio Device Definition) on each ADC3.0 compliant device This patch adds initial support of UAC3 specification that is enough for Generic I/O Profile (BAOF, BAIF) device support from BADD document. Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-03-21mtd: nand: fsl_ifc: Read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0Jagdish Gediya
Due to missing information in Hardware manual, current implementation doesn't read ECCSTAT0 and ECCSTAT1 registers for IFC 2.0. Add support to read ECCSTAT0 and ECCSTAT1 registers during ecccheck for IFC 2.0. Fixes: 656441478ed5 ("mtd: nand: ifc: Fix location of eccstat registers for IFC V1.0") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Jagdish Gediya <jagdish.gediya@nxp.com> Reviewed-by: Prabhakar Kushwaha <prabhakar.kushwaha@nxp.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-03-21mtd: Stop updating erase_info->state and calling mtd_erase_callback()Boris Brezillon
MTD users are no longer checking erase_info->state to determine if the erase operation failed or succeeded. Moreover, mtd_erase_callback() is now a NOP. We can safely get rid of all mtd_erase_callback() calls and all erase_info->state assignments. While at it, get rid of the erase_info->state field, all MTD_ERASE_XXX definitions and the mtd_erase_callback() function. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Richard Weinberger <richard@nod.at> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Bert Kenward <bkenward@solarflare.com> --- Changes in v2: - Address a few coding style issues (reported by Miquel) - Remove comments that are no longer valid (reported by Miquel)
2018-03-21Merge branch 'ib-extcon-drm-dt-v4.17' into extcon-nextChanwoo Choi
2018-03-21extcon: gpio: Localize platform dataLinus Walleij
Nothing in the entire kernel #includes <linux/extcon/extcon-gpio.h> so move the platform data declaration inside of the driver. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2018-03-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Not much exciting here, almost entirely syzkaller fixes. This is going to be on ongoing theme for some time, I think. Both Google and Mellanox are now running syzkaller on different parts of the user API. Summary: - Many bug fixes related to syzkaller from Leon Romanovsky. These are still for the mlx driver and ucma interface. - Fix a situation with port reuse for iWarp, discovered during scale-up testing - Bug fixes for the profile and restrack patches accepted during this merge window - Compile warning cleanups from Arnd, this is apparently the last warning to make 32 bit builds quiet" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/ucma: Ensure that CM_ID exists prior to access it RDMA/verbs: Remove restrack entry from XRCD structure RDMA/ucma: Fix use-after-free access in ucma_close RDMA/ucma: Check AF family prior resolving address infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks infiniband: qplib_fp: fix pointer cast IB/mlx5: Fix cleanup order on unload RDMA/ucma: Don't allow join attempts for unsupported AF family RDMA/ucma: Fix access to non-initialized CM_ID object RDMA/core: Do not use invalid destination in determining port reuse RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory IB/mlx5: Fix integer overflows in mlx5_ib_create_srq IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
2018-03-20svcrdma: Consult max_qp_init_rd_atom when accepting connectionsChuck Lever
The target needs to return the lesser of the client's Inbound RDMA Read Queue Depth (IRD), provided in the connection parameters, and the local device's Outbound RDMA Read Queue Depth (ORD). The latter limit is max_qp_init_rd_atom, not max_qp_rd_atom. The svcrdma_ord value caps the ORD value for iWARP transports, which do not exchange ORD/IRD values at connection time. Since no other Linux kernel RDMA-enabled storage target sees fit to provide this cap, I'm removing it here too. initiator_depth is a u8, so ensure the computed ORD value does not overflow that field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-03-20clk: davinci: New driver for TI DA8XX CFGCHIP clocksDavid Lechner
This adds a new driver for the gate and multiplexer clocks in the CFGCHIPn syscon registers on TI DA8XX-type SoCs. Signed-off-by: David Lechner <david@lechnology.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-03-20clk: davinci: New driver for davinci PLL clocksDavid Lechner
This adds a new driver for mach-davinci PLL clocks. This is porting the code from arch/arm/mach-davinci/clock.c to the common clock framework. Additionally, it adds device tree support for these clocks. The ifeq ($(CONFIG_COMMON_CLK), y) in the Makefile is needed to prevent compile errors until the clock code in arch/arm/mach-davinci is removed. Note: although there are similar clocks for TI Keystone we are not able to share the code for a few reasons. The keystone clocks are device tree only and use legacy one-node-per-clock bindings. Also the register layouts are a bit different, which would add even more if/else mess to the keystone clocks. And the keystone PLL driver doesn't support setting clock rates. Signed-off-by: David Lechner <david@lechnology.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-03-20ARM: omap2+: control: add support for auxiliary control module instancesTero Kristo
Control module can have multiple instances in a system, each with separate address space and features. Add base support for these auxiliary instances, with support for syscon and clock mappings under them. Signed-off-by: Tero Kristo <t-kristo@ti.com> Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
2018-03-20mtd: rawnand: get rid of the ONFI parameter page in nand_chipMiquel Raynal
The NAND chip parameter page is statically allocated within the nand_chip structure, which reserves a lot of space. Even not ONFI nor JEDEC chips have it embedded. Also, only a few parameters are actually read from the parameter page after the detection. Now that there is a small nand_parameters structure that hold all needed ONFI parameters, remove the ONFI page from the nand_chip structure by just allocating it during the identification phase and removing it right after. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-03-20mtd: rawnand: get rid of the JEDEC parameter page in nand_chipMiquel Raynal
The NAND chip parameter page is statically allocated within the nand_chip structure, which reserves a lot of space. Even not ONFI nor JEDEC chips have it embedded. Also, only a few parameters are actually read from the parameter page after the detection. Now that there is a small nand_parameters structure that can held generic parameters, remove the JEDEC page from the nand_chip structure by just allocating it during the identification phase and removing it right after. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-03-20mtd: rawnand: allow vendors to declare (un)supported featuresMiquel Raynal
If SET/GET_FEATURES is available (from the parameter page), use a bitmap to declare what feature is actually supported. Initialize the bitmap in the core to support timing changes (only feature used by the core), also add support for Micron specific features used in Micron initialization code (in the init routine). Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-03-20mtd: rawnand: prepare the removal of the ONFI parameter pageMiquel Raynal
The NAND chip parameter page is statically allocated within the nand_chip structure, which reserves a lot of space. Even not ONFI nor JEDEC chips have it embedded. Also, only a few parameters are actually read from the parameter page after the detection. ONFI-related parameters that will be used outside from the identification function are stored in a separate onfi_parameters structure embedded in nand_parameters, this small structure that already hold generic parameters. For now, the onfi_parameters structure is allocated statically. However, after some deep rework in the NAND framework, it will be possible to do dynamic allocations from the NAND identification phase, and this strcuture will then be dynamically allocated when needed. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-03-20Merge 4.16-rc6 into tty-nextGreg Kroah-Hartman
We want the serial/tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-20Merge branch 'siginfo-next' of ↵Will Deacon
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace into aarch64/for-next/core Pull in pending siginfo changes from Eric Biederman as we depend on the definition of FPE_FLTUNK for cleaning up our floating-point exception signal delivery (which is currently broken and using FPE_FIXME).
2018-03-20mtd: rawnand: prepare the removal of ONFI/JEDEC parameter pagesMiquel Raynal
The NAND chip parameter page is statically allocated within the nand_chip structure, which reserves a lot of space. Even not ONFI nor JEDEC chips have it embedded. Also, only a few parameters are actually read from the parameter page after the detection. To prepare to the removal of such huge structure, a small NAND parameter structure is allocated statically and contains only very few members that are generic to all chips and actually used elsewhere in the code. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-03-20Merge tag 'phy-for-4.17' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into usb-next Kishon writes: phy: for 4.17 *) Add USB PHY driver for MDM6600 on Droid *) Add USB PHY driver for STM32 USB PHY Controller *) Add inno-usb2-phy driver for hi3798cv200 SoC *) Add combo phy driver (SATA/USB/PCIE) for HiSilicon STB SoCs *) Add USB3 PHY driver for Meson GXL and GXM *) Add support for R8A77965 Gen3 USB 2.0 PHY in phy-rcar-gen3-usb2 driver *) Add support for qualcomm QUSB2 V2 and QMP V3 USB3 PHY in phy-qcom-qusb2 and phy-qcom-qmp PHY driver respectively *) Add support for runtime PM in phy-qcom-qusb2 and phy-qcom-qmp PHY drivers *) Add support for Allwinner R40 USB PHY in sun4i-usb PHY driver *) Add support in rockchip-typec PHY driver to make extcon optional and fallback to working in host mode if extcon is missing *) Add support in rockchip-typec PHY driver to mux PHYs connected to DP *) Add support to configure slew rate parameters in phy-mtk-tphy PHY driver *) Add workaround for missing Vbus det interrupts on Allwinner A23/A33 *) Add USB speed related PHY modes in phy core *) Fix PHY 'structure' documentation *) Force rockchip-typec PHY to USB2 if DP-only mode is used *) Fix phy-qcom-qusb2 and phy-qcom-qmp PHY drivers to follow PHY reset and initialization sequence as per hardware programming manual *) Fix Marvell BG2CD SoC USB failure in phy-berlin-usb driver *) Minor fixes in lpc18xx-usb-otg, xusb-tegra210 and phy-rockchip-emmc PHY drivers Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-03-20dma/swiotlb: Remove swiotlb_{alloc,free}_coherent()Christoph Hellwig
Unused now that everyone uses swiotlb_{alloc,free}(). Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-15-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20dma/direct: Handle the memory encryption bit in common codeChristoph Hellwig
Give the basic phys_to_dma() and dma_to_phys() helpers a __-prefix and add the memory encryption mask to the non-prefixed versions. Use the __-prefixed versions directly instead of clearing the mask again in various places. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-13-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20set_memory.h: Provide set_memory_{en,de}crypted() stubsChristoph Hellwig
... to make these APIs more universally available. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-11-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20Merge branch 4.16-rc6 into usb-nextGreg Kroah-Hartman
We want the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-20mtd: rawnand: use wrappers to call onfi GET/SET_FEATURESMiquel Raynal
Prepare the fact that some features managed by GET/SET_FEATURES could be overloaded by vendor code. To handle this logic, use new wrappers instead of directly call the ->get/set_features() hooks. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-03-20mtd: rawnand: rename SET/GET FEATURES related functionsMiquel Raynal
SET/GET FEATURES are flagged ONFI-compliant because of their name. This is not accurate as non-ONFI NAND chips support it and use it. Rename the hooks and helpers to remove the "onfi" prefix. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-03-20Merge tag 'thunderbolt-for-v4.17' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into char-misc-next Mike writes: thunderbolt: Changes for v4.17 merge window New features: - Intel Titan Ridge Thunderbolt 3 controller support - Preboot ACL supported, allowing more secure way to boot from Thunderbolt devices - New "USB only" security level In addition there are a couple of fixes for increasing timeout when authenticating the ICM firmware and reading root switch config space. Preventing a crash on certain Lenovo systems where ICM firmware for some reason is not always properly starting up.
2018-03-20jump_label: Disable jump labels in __exit codeJosh Poimboeuf
With the following commit: 333522447063 ("jump_label: Explicitly disable jump labels in __init code") ... we explicitly disabled jump labels in __init code, so they could be detected and not warned about in the following commit: dc1dd184c2f0 ("jump_label: Warn on failed jump_label patching attempt") In-kernel __exit code has the same issue. It's never used, so it's freed along with the rest of initmem. But jump label entries in __exit code aren't explicitly disabled, so we get the following warning when enabling pr_debug() in __exit code: can't patch jump_label at dmi_sysfs_exit+0x0/0x2d WARNING: CPU: 0 PID: 22572 at kernel/jump_label.c:376 __jump_label_update+0x9d/0xb0 Fix the warning by disabling all jump labels in initmem (which includes both __init and __exit code). Reported-and-tested-by: Li Wang <liwang@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: dc1dd184c2f0 ("jump_label: Warn on failed jump_label patching attempt") Link: http://lkml.kernel.org/r/7121e6e595374f06616c505b6e690e275c0054d1.1521483452.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait: Remove the wait_on_atomic_t() APIPeter Zijlstra
There are no users left (everyone got converted to wait_var_event()), remove it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait, fs/fscache: Convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait: Introduce wait_var_event()Peter Zijlstra
As a replacement for the wait_on_atomic_t() API provide the wait_var_event() API. The wait_var_event() API is based on the very same hashed-waitqueue idea, but doesn't care about the type (atomic_t) or the specific condition (atomic_read() == 0). IOW. it's much more widely applicable/flexible. It shares all the benefits/disadvantages of a hashed-waitqueue approach with the existing wait_on_atomic_t/wait_on_bit() APIs. The API is modeled after the existing wait_event() API, but instead of taking a wait_queue_head, it takes an address. This addresses is hashed to obtain a wait_queue_head from the bit_wait_table. Similar to the wait_event() API, it takes a condition expression as second argument and will wait until this expression becomes true. The following are (mostly) identical replacements: wait_on_atomic_t(&my_atomic, atomic_t_wait, TASK_UNINTERRUPTIBLE); wake_up_atomic_t(&my_atomic); wait_var_event(&my_atomic, !atomic_read(&my_atomic)); wake_up_var(&my_atomic); The only difference is that wake_up_var() is an unconditional wakeup and doesn't check the previously hard-coded (atomic_read() == 0) condition here. This is of little concequence, since most callers are already conditional on atomic_dec_and_test() and the ones that are not, are trivial to make so. Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/fair: Add util_est on top of PELTPatrick Bellasi
The util_avg signal computed by PELT is too variable for some use-cases. For example, a big task waking up after a long sleep period will have its utilization almost completely decayed. This introduces some latency before schedutil will be able to pick the best frequency to run a task. The same issue can affect task placement. Indeed, since the task utilization is already decayed at wakeup, when the task is enqueued in a CPU, this can result in a CPU running a big task as being temporarily represented as being almost empty. This leads to a race condition where other tasks can be potentially allocated on a CPU which just started to run a big task which slept for a relatively long period. Moreover, the PELT utilization of a task can be updated every [ms], thus making it a continuously changing value for certain longer running tasks. This means that the instantaneous PELT utilization of a RUNNING task is not really meaningful to properly support scheduler decisions. For all these reasons, a more stable signal can do a better job of representing the expected/estimated utilization of a task/cfs_rq. Such a signal can be easily created on top of PELT by still using it as an estimator which produces values to be aggregated on meaningful events. This patch adds a simple implementation of util_est, a new signal built on top of PELT's util_avg where: util_est(task) = max(task::util_avg, f(task::util_avg@dequeue)) This allows to remember how big a task has been reported by PELT in its previous activations via f(task::util_avg@dequeue), which is the new _task_util_est(struct task_struct*) function added by this patch. If a task should change its behavior and it runs longer in a new activation, after a certain time its util_est will just track the original PELT signal (i.e. task::util_avg). The estimated utilization of cfs_rq is defined only for root ones. That's because the only sensible consumer of this signal are the scheduler and schedutil when looking for the overall CPU utilization due to FAIR tasks. For this reason, the estimated utilization of a root cfs_rq is simply defined as: util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est::enqueued) where: cfs_rq::util_est::enqueued = sum(_task_util_est(task)) for each RUNNABLE task on that root cfs_rq It's worth noting that the estimated utilization is tracked only for objects of interests, specifically: - Tasks: to better support tasks placement decisions - root cfs_rqs: to better support both tasks placement decisions as well as frequencies selection Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-2-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndromeMatt Brown
This patch uses the vpermxor instruction to optimise the raid6 Q syndrome. This instruction was made available with POWER8, ISA version 2.07. It allows for both vperm and vxor instructions to be done in a single instruction. This has been tested for correctness on a ppc64le vm with a basic RAID6 setup containing 5 drives. The performance benchmarks are from the raid6test in the /lib/raid6/test directory. These results are from an IBM Firestone machine with ppc64le architecture. The benchmark results show a 35% speed increase over the best existing algorithm for powerpc (altivec). The raid6test has also been run on a big-endian ppc64 vm to ensure it also works for big-endian architectures. Performance benchmarks: raid6: altivecx4 gen() 18773 MB/s raid6: altivecx8 gen() 19438 MB/s raid6: vpermxor4 gen() 25112 MB/s raid6: vpermxor8 gen() 26279 MB/s Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> [mpe: Add VPERMXOR macro so we can build with old binutils] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-19Merge branch 'for-4.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Two low-impact workqueue commits. One fixes workqueue creation error path and the other removes the unused cancel_work()" * 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove unused cancel_work() workqueue: use put_device() instead of kfree()
2018-03-19Merge branch 'for-4.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu fixes from Tejun Heo: "Late percpu pull request for v4.16-rc6. - percpu allocator pool replenishing no longer triggers OOM or warning messages. Also, the alloc interface now understands __GFP_NORETRY and __GFP_NOWARN. This is to allow avoiding OOMs from userland triggered actions like bpf map creation. Also added cond_resched() in alloc loop. - perpcu allocation now can be interrupted by kill sigs to avoid deadlocking OOM killer. - Added Dennis Zhou as a co-maintainer. He has rewritten the area map allocator, understands most of the code base and has been responsive for all bug reports" * 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu_ref: Update doc to dissuade users from depending on internal RCU grace periods mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() percpu: include linux/sched.h for cond_resched() percpu: add a schedule point in pcpu_balance_workfn() percpu: allow select gfp to be passed to underlying allocators percpu: add __GFP_NORETRY semantics to the percpu balancing path percpu: match chunk allocator declarations with definitions percpu: add Dennis Zhou as a percpu co-maintainer
2018-03-19bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX dataJohn Fastabend
This implements a BPF ULP layer to allow policy enforcement and monitoring at the socket layer. In order to support this a new program type BPF_PROG_TYPE_SK_MSG is used to run the policy at the sendmsg/sendpage hook. To attach the policy to sockets a sockmap is used with a new program attach type BPF_SK_MSG_VERDICT. Similar to previous sockmap usages when a sock is added to a sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT program type attached then the BPF ULP layer is created on the socket and the attached BPF_PROG_TYPE_SK_MSG program is run for every msg in sendmsg case and page/offset in sendpage case. BPF_PROG_TYPE_SK_MSG Semantics/API: BPF_PROG_TYPE_SK_MSG supports only two return codes SK_PASS and SK_DROP. Returning SK_DROP free's the copied data in the sendmsg case and in the sendpage case leaves the data untouched. Both cases return -EACESS to the user. Returning SK_PASS will allow the msg to be sent. In the sendmsg case data is copied into kernel space buffers before running the BPF program. The kernel space buffers are stored in a scatterlist object where each element is a kernel memory buffer. Some effort is made to coalesce data from the sendmsg call here. For example a sendmsg call with many one byte iov entries will likely be pushed into a single entry. The BPF program is run with data pointers (start/end) pointing to the first sg element. In the sendpage case data is not copied. We opt not to copy the data by default here, because the BPF infrastructure does not know what bytes will be needed nor when they will be needed. So copying all bytes may be wasteful. Because of this the initial start/end data pointers are (0,0). Meaning no data can be read or written. This avoids reading data that may be modified by the user. A new helper is added later in this series if reading and writing the data is needed. The helper call will do a copy by default so that the page is exclusively owned by the BPF call. The verdict from the BPF_PROG_TYPE_SK_MSG applies to the entire msg in the sendmsg() case and the entire page/offset in the sendpage case. This avoids ambiguity on how to handle mixed return codes in the sendmsg case. Again a helper is added later in the series if a verdict needs to apply to multiple system calls and/or only a subpart of the currently being processed message. The helper msg_redirect_map() can be used to select the socket to send the data on. This is used similar to existing redirect use cases. This allows policy to redirect msgs. Pseudo code simple example: The basic logic to attach a program to a socket is as follows, // load the programs bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG, &obj, &msg_prog); // lookup the sockmap bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map"); // get fd for sockmap map_fd_msg = bpf_map__fd(bpf_map_msg); // attach program to sockmap bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0); Adding sockets to the map is done in the normal way, // Add a socket 'fd' to sockmap at location 'i' bpf_map_update_elem(map_fd_msg, &i, fd, BPF_ANY); After the above any socket attached to "my_sock_map", in this case 'fd', will run the BPF msg verdict program (msg_prog) on every sendmsg and sendpage system call. For a complete example see BPF selftests or sockmap samples. Implementation notes: It seemed the simplest, to me at least, to use a refcnt to ensure psock is not lost across the sendmsg copy into the sg, the bpf program running on the data in sg_data, and the final pass to the TCP stack. Some performance testing may show a better method to do this and avoid the refcnt cost, but for now use the simpler method. Another item that will come after basic support is in place is supporting MSG_MORE flag. At the moment we call sendpages even if the MSG_MORE flag is set. An enhancement would be to collect the pages into a larger scatterlist and pass down the stack. Notice that bpf_tcp_sendmsg() could support this with some additional state saved across sendmsg calls. I built the code to support this without having to do refactoring work. Other features TBD include ZEROCOPY and the TCP_RECV_QUEUE/TCP_NO_QUEUE support. This will follow initial series shortly. Future work could improve size limits on the scatterlist rings used here. Currently, we use MAX_SKB_FRAGS simply because this was being used already in the TLS case. Future work could extend the kernel sk APIs to tune this depending on workload. This is a trade-off between memory usage and throughput performance. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAGJohn Fastabend
When calling do_tcp_sendpages() from in kernel and we know the data has no references from user side we can omit SKBTX_SHARED_FRAG flag. This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used to omit setting SKBTX_SHARED_FRAG. The flag is not exposed to userspace because the sendpage call from the splice logic masks out all bits except MSG_MORE. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19Merge tag 'v4.16-rc6' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-19PCI: Add Altera vendor IDJohannes Thumshirn
Add the Altera PCI Vendor id to pci_ids.h and remove the private definitions from xillybus_pcie.c and altera-cvp.c. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Eli Billauer <eli.billauer@gmail.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Anatolij Gustschin <agust@denx.de>
2018-03-19net/mlx5: Packet pacing enhancementBodong Wang
Add two new parameters: max_burst_sz and typical_pkt_size (both in bytes) to rate limit configurations. max_burst_sz: The device will schedule bursts of packets for an SQ connected to this rate, smaller than or equal to this value. Value 0x0 indicates packet bursts will be limited to the device defaults. This field should be used if bursts of packets must be strictly kept under a certain value. typical_pkt_size: When the rate limit is intended for a stream of similar packets, stating the typical packet size can improve the accuracy of the rate limiter. The expected packet size will be the same for all SQs associated with the same rate limit index. Ethernet driver is updated according to this change, but these two parameters will be kept as 0 due to lacking of proper way to get the configurations from user space which requires to change ndo_set_tx_maxrate interface. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19Merge tag 'kvm-arm-fixes-for-v4.16-2' into HEADMarc Zyngier
Resolve conflicts with current mainline
2018-03-19cgroup: Use rcu_work instead of explicit rcu and work itemTejun Heo
Workqueue now has rcu_work. Use it instead of open-coding rcu -> work item bouncing. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-19RCU, workqueue: Implement rcu_workTejun Heo
There are cases where RCU callback needs to be bounced to a sleepable context. This is currently done by the RCU callback queueing a work item, which can be cumbersome to write and confusing to read. This patch introduces rcu_work, a workqueue work variant which gets executed after a RCU grace period, and converts the open coded bouncing in fs/aio and kernel/cgroup. v3: Dropped queue_rcu_work_on(). Documented rcu grace period behavior after queue_rcu_work(). v2: Use rcu_barrier() instead of synchronize_rcu() to wait for completion of previously queued rcu callback as per Paul. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-19percpu_ref: Update doc to dissuade users from depending on internal RCU ↵Tejun Heo
grace periods percpu_ref internally uses sched-RCU to implement the percpu -> atomic mode switching and the documentation suggested that this could be depended upon. This doesn't seem like a good idea. * percpu_ref uses sched-RCU which has different grace periods regular RCU. Users may combine percpu_ref with regular RCU usage and incorrectly believe that regular RCU grace periods are performed by percpu_ref. This can lead to, for example, use-after-free due to premature freeing. * percpu_ref has a grace period when switching from percpu to atomic mode. It doesn't have one between the last put and release. This distinction is subtle and can lead to surprising bugs. * percpu_ref allows starting in and switching to atomic mode manually for debugging and other purposes. This means that there may not be any grace periods from kill to release. This patch makes it clear that the grace periods are percpu_ref's internal implementation detail and can't be depended upon by the users. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-19ahci: imx: add the imx6qp ahci sata supportRichard Zhu
- Regarding to imx6q ahci sata, imx6qp ahci sata has the reset mechanism. Add the imx6qp ahci sata support in this commit. - Use the specific reset callback for imx53 sata, and use the default ahci_ops.softreset for the others. Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-19y2038: Introduce struct __kernel_old_timevalArnd Bergmann
Dealing with 'struct timeval' users in the y2038 series is a bit tricky: We have two definitions of timeval that are visible to user space, one comes from glibc (or some other C library), the other comes from linux/time.h. The kernel copy is what we want to be used for a number of structures defined by the kernel itself, e.g. elf_prstatus (used it core dumps), sysinfo and rusage (used in system calls). These generally tend to be used for passing time intervals rather than absolute (epoch-based) times, so they do not suffer from the y2038 overflow. Some of them could be changed to use 64-bit timestamps by creating new system calls, others like the core files cannot easily be changed. An application using these interfaces likely also uses gettimeofday() or other interfaces that use absolute times, and pass 'struct timeval' pointers directly into kernel interfaces, so glibc must redefine their timeval based on a 64-bit time_t when they introduce their y2038-safe interfaces. The only reasonable way forward I see is to remove the 'timeval' definion from the kernel's uapi headers, and change the interfaces that we do not want to (or cannot) duplicate for 64-bit times to use a new __kernel_old_timeval definition instead. This type should be avoided for all new interfaces (those can use 64-bit nanoseconds, or the 64-bit version of timespec instead), and should be used with great care when converting existing interfaces from timeval, to be sure they don't suffer from the y2038 overflow, and only with consensus for the particular user that using __kernel_old_timeval is better than moving to a 64-bit based interface. The structure name is intentionally chosen to not conflict with user space types, and to be ugly enough to discourage its use. Note that ioctl based interfaces that pass a bare 'timeval' pointer cannot change to '__kernel_old_timeval' because the user space source code refers to 'timeval' instead, and we don't want to modify the user space sources if possible. However, any application that relies on a structure to contain an embedded 'timeval' (e.g. by passing a pointer to the member into a function call that expects a timeval pointer) is broken when that structure gets converted to __kernel_old_timeval. I don't see any way around that, and we have to rely on the compiler to produce a warning or compile failure that will alert users when they recompile their sources against a new libc. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20180315161739.576085-1-arnd@arndb.de
2018-03-19buffer.c: call thaw_super during emergency thawMateusz Guzik
There are 2 distinct freezing mechanisms - one operates on block devices and another one directly on super blocks. Both end up with the same result, but thaw of only one of these does not thaw the other. In particular fsfreeze --freeze uses the ioctl variant going to the super block. Since prior to this patch emergency thaw was not doing a relevant thaw, filesystems frozen with this method remained unaffected. The patch is a hack which adds blind unfreezing. In order to keep the super block write-locked the whole time the code is shuffled around and the newly introduced __iterate_supers is employed. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>