summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-07-27copy_regset_to_user(): do all copyout at once.Al Viro
Turn copy_regset_to_user() into regset_get_alloc() + copy_to_user(). Now all ->get() calls have a kernel buffer as destination. Note that we'd already eliminated the callers of copy_regset_to_user() with non-zero offset; now that argument is simply unused. Uninlined, while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27kill elf_fpxregs_tAl Viro
all uses are conditional upon ELF_CORE_COPY_XFPREGS, which has not been defined on any architecture since 2010 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27take fdpic-related parts of elf_prstatus outAl Viro
The only architecture where we might end up using both is arm, and there we definitely don't want fdpic-related fields in elf_prstatus - coredump layout of ELF binaries should not depend upon having the kernel built with the support of ELF_FDPIC ones. Just move the fdpic-modified variant into binfmt_elf_fdpic.c (and call it elf_prstatus_fdpic there) [name stolen from nico] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27unexport linux/elfcore.hAl Viro
It's unusable from userland - it uses elf_gregset_t, which is not provided by exported headers. glibc has it in sys/procfs.h, but the same file defines struct elf_prstatus, so linux/elfcore.h can't be included once sys/procfs.h has been pulled. Same goes for uclibc and dietlibc simply doesn't have elf_gregset_t defined anywhere. IOW, no userland source is including that thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27introduction of regset ->get() wrappers, switching ELF coredumps to thoseAl Viro
Two new helpers: given a process and regset, dump into a buffer. regset_get() takes a buffer and size, regset_get_alloc() takes size and allocates a buffer. Return value in both cases is the amount of data actually dumped in case of success or -E... on error. In both cases the size is capped by regset->n * regset->size, so ->get() is called with offset 0 and size no more than what regset expects. binfmt_elf.c callers of ->get() are switched to using those; the other caller (copy_regset_to_user()) will need some preparations to switch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27i2c: also convert placeholder function to return errnoWolfram Sang
All i2c_new_device-alike functions return ERR_PTR these days, but this fallback function was missed. Fixes: 2dea645ffc21 ("i2c: acpi: Return error pointers from i2c_acpi_new_device()") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [wsa: changed from 'ENOSYS' to 'ENODEV'] Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-07-27fsnotify: pass dir argument to handle_event() callbackAmir Goldstein
The 'inode' argument to handle_event(), sometimes referred to as 'to_tell' is somewhat obsolete. It is a remnant from the times when a group could only have an inode mark associated with an event. We now pass an iter_info array to the callback, with all marks associated with an event. Most backends ignore this argument, with two exceptions: 1. dnotify uses it for sanity check that event is on directory 2. fanotify uses it to report fid of directory on directory entry modification events Remove the 'inode' argument and add a 'dir' argument. The callback function signature is deliberately changed, because the meaning of the argument has changed and the arguments have been documented. The 'dir' argument is set to when 'file_name' is specified and it is referring to the directory that the 'file_name' entry belongs to. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27linux/kernel.h: Add PTR_ALIGN_DOWN macroKishon Vijay Abraham I
Add a macro for aligning down a pointer. This is useful to get an aligned register address when a device allows only word access and doesn't allow half word or byte access. Link: https://lore.kernel.org/r/20200722110317.4744-4-kishon@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Rob Herring <robh@kernel.org>
2020-07-27genirq/affinity: Make affinity setting if activated opt-inThomas Gleixner
John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
2020-07-27RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7Meir Lichtinger
Up to ConnectX-7 UMR is not used when user passes relaxed ordering access flag. ConnectX-7 supports setting relaxed ordering read/write mkey attribute by UMR, indicated by new HCA capabilities. With ConnectX-7 driver uses UMR when user set relaxed ordering access flag, in contrast to previous silicon models. Specifically it includes setting relvant flags of mkey context mask in UMR control segment, and relaxed ordering write and read flags in UMR mkey context segment. Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27RDMA/mlx5: Use MLX5_SET macro instead of local structureMeir Lichtinger
Use generic mlx5 structure defined in mlx5_ifc.h to represent ConnectX device data structures instead of using structure defined specifically for mlx5_ib module. Link: https://lore.kernel.org/r/20200716105248.1423452-3-leon@kernel.org Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27spi: correct kernel-doc inconsistencyColton Lewis
Silence documentation build warnings by correcting kernel-doc comment for spi_transfer struct. Signed-off-by: Colton Lewis <colton.w.lewis@protonmail.com> Link: https://lore.kernel.org/r/20200725050242.279548-1-colton.w.lewis@protonmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-07-27platform/chrome: cros_ec: Fix host command for regulator control.Pi-Hsun Shih
Since the host command number 0x012B conflicts with other EC host command, add one to all regulator control related host command. Also fix a wrong alignment on struct and sync the comment with the one in ChromeOS EC codebase. Fixes: dff08caf35ec ("platform/chrome: cros_ec: Add command for regulator control.") Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Link: https://lore.kernel.org/r/20200724080358.619245-1-pihsun@chromium.org Signed-off-by: Mark Brown <broonie@kernel.org>
2020-07-27Merge tag 'drivers_soc_for_5.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone into arm/drivers SOC: TI Keystone driver update for v5.9 - TI K3 Ring Accelerator updates - Few non critical warining fixes * tag 'drivers_soc_for_5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone: soc: TI knav_qmss: make symbol 'knav_acc_range_ops' static firmware: ti_sci: Replace HTTP links with HTTPS ones soc: ti/ti_sci_protocol.h: drop a duplicated word + clarify soc: ti: k3: fix semicolon.cocci warnings soc: ti: k3-ringacc: fix: warn: variable dereferenced before check 'ring' dmaengine: ti: k3-udma: Switch to k3_ringacc_request_rings_pair soc: ti: k3-ringacc: separate soc specific initialization soc: ti: k3-ringacc: add request pair of rings api. soc: ti: k3-ringacc: add ring's flags to dump soc: ti: k3-ringacc: Move state tracking variables under a struct dt-bindings: soc: ti: k3-ringacc: convert bindings to json-schema Link: https://lore.kernel.org/r/1595711814-7015-1-git-send-email-santosh.shilimkar@oracle.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-07-27powercap: Add Power Limit4 supportSumeet Pawnikar
Modern Intel Mobile platforms support power limit4 (PL4), which is the SoC package level maximum power limit (in Watts). It can be used to preemptively limits potential SoC power to prevent power spikes from tripping the power adapter and battery over-current protection. This patch enables this feature by exposing package level peak power capping control to userspace via RAPL sysfs interface. With this, application like DTPF can modify PL4 power limit, the similar way of other package power limit (PL1). As this feature is not tested on previous generations, here it is enabled only for the platform that has been verified to work, for safety concerns. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Co-developed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-27ACPI: Use valid link to the ACPI specificationTiezhu Yang
Currently, acpi.info is an invalid link to access ACPI specification, the new valid link is https://uefi.org/specifications. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-27PM: Make *_DEV_PM_OPS macros use __maybe_unusedPaul Cercueil
This way, when the dev_pm_ops instance is not referenced anywhere, it will simply be dropped by the compiler without a warning. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-27PM: core: introduce pm_ptr() macroPaul Cercueil
This macro is analogous to the infamous of_match_ptr(). If CONFIG_PM is enabled, this macro will resolve to its argument, otherwise to NULL. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-07-27Merge 5.8-rc7 into staging-nextGreg Kroah-Hartman
We need the staging fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-27Merge 5.8-rc7 into tty-nextGreg Kroah-Hartman
we need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-27Merge 5.8-rc7 into driver-core-nextGreg Kroah-Hartman
We want the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-27Merge back cpufreq material for v5.9.Rafael J. Wysocki
2020-07-27Revert "test_firmware: Test platform fw loading on non-EFI systems"Greg Kroah-Hartman
This reverts commit 2d38dbf89a06d0f689daec9842c5d3295c49777f as it broke the build in linux-next Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 2d38dbf89a06 ("test_firmware: Test platform fw loading on non-EFI systems") Cc: stable@vger.kernel.org Cc: Scott Branden <scott.branden@broadcom.com> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200727165539.0e8797ab@canb.auug.org.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-27Merge 5.8-rc7 into char-misc-nextGreg Kroah-Hartman
This should resolve the merge/build issues reported when trying to create linux-next. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-27dmaengine: dw: Don't include unneeded header to platform data headerAndy Shevchenko
Including device.h is too much for the dma-dw.h platform data header. Replace it with the headers of which dma-dw.h is direct user. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20200721130844.64162-1-andriy.shevchenko@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-07-27dmaengine: dw: Introduce max burst length hw configSerge Semin
IP core of the DW DMA controller may be synthesized with different max burst length of the transfers per each channel. According to Synopsis having the fixed maximum burst transactions length may provide some performance gain. At the same time setting up the source and destination multi size exceeding the max burst length limitation may cause a serious problems. In our case the DMA transaction just hangs up. In order to fix this lets introduce the max burst length platform config of the DW DMA controller device and don't let the DMA channels configuration code exceed the burst length hardware limitation. Note the maximum burst length parameter can be detected either in runtime from the DWC parameter registers or from the dedicated DT property. Depending on the IP core configuration the maximum value can vary from channel to channel so by overriding the channel slave max_burst capability we make sure a DMA consumer will get the channel-specific max burst length. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20200723005848.31907-10-Sergey.Semin@baikalelectronics.ru Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-07-27dmaengine: dw: Initialize min and max burst DMA device capabilitySerge Semin
According to the DW APB DMAC data book the minimum burst transaction length is 1 and it's true for any version of the controller since isn't parametrised in the coreAssembler so can't be changed at the IP-core synthesis stage. The maximum burst transaction can vary from channel to channel and from controller to controller depending on a IP-core parameter the system engineer activated during the IP-core synthesis. Let's initialise both min_burst and max_burst members of the DMA controller descriptor with extreme values so the DMA clients could use them to properly optimize the DMA requests. The channels and controller-specific max_burst length initialization will be introduced by the follow-up patches. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20200723005848.31907-9-Sergey.Semin@baikalelectronics.ru Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-07-27dmaengine: Introduce DMA-device device_caps callbackSerge Semin
There are DMA devices (like ours version of Synopsys DW DMAC) which have DMA capabilities non-uniformly redistributed between the device channels. In order to provide a way of exposing the channel-specific parameters to the DMA engine consumers, we introduce a new DMA-device callback. In case if provided it gets called from the dma_get_slave_caps() method and is able to override the generic DMA-device capabilities. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20200723005848.31907-6-Sergey.Semin@baikalelectronics.ru Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-07-27dmaengine: Introduce max SG burst capabilitySerge Semin
Some devices may lack the support of the hardware accelerated SG list entries automatic walking through and execution. In this case a burden of the SG list traversal and DMA engine re-initialization lies on the DMA engine driver (normally implemented by using a DMA transfer completion IRQ to recharge the DMA device with a next SG list entry). But such solution may not be suitable for some DMA consumers. In particular SPI devices need both Tx and Rx DMA channels work synchronously in order to avoid the Rx FIFO overflow. In case if Rx DMA channel is paused for some time while the Tx DMA channel works implicitly pulling data into the Rx FIFO, the later will be eventually overflown, which will cause the data loss. So if SG list entries aren't automatically fetched by the DMA engine, but are one-by-one manually selected for execution in the ISRs/deferred work/etc., such problem will eventually happen due to the non-deterministic latencies of the service execution. In order to let the DMA consumer know about the DMA device capabilities regarding the hardware accelerated SG list traversal we introduce the max_sg_burst capability. It is supposed to be initialized by the DMA engine driver with 0 if there is no limitation of the number of SG entries atomically executed and with non-zero value if there is such constraints, so the upper limit is determined by the number set to the property. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20200723005848.31907-5-Sergey.Semin@baikalelectronics.ru Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-07-27dmaengine: Introduce min burst length capabilitySerge Semin
Some hardware aside from default 0/1 may have greater minimum burst transactions length constraints. Here we introduce the DMA device and slave capability, which if required can be initialized by the DMA engine driver with the device-specific value. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20200723005848.31907-4-Sergey.Semin@baikalelectronics.ru Signed-off-by: Vinod Koul <vkoul@kernel.org>
2020-07-27printk: Make linux/printk.h self-containedHerbert Xu
As it stands if you include printk.h by itself it will fail to compile because it requires definitions from ratelimit.h. However, simply including ratelimit.h from printk.h does not work due to inclusion loops involving sched.h and kernel.h. This patch solves this by moving bits from ratelimit.h into a new header file which can then be included by printk.h without any worries about header loops. The build bot then revealed some intriguing failures arising out of this patch. On s390 there is an inclusion loop with asm/bug.h and linux/kernel.h that triggers a compile failure, because kernel.h will cause asm-generic/bug.h to be included before s390's own asm/bug.h has finished processing. This has been fixed by not including kernel.h in arch/s390/include/asm/bug.h. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Link: https://lore.kernel.org/r/20200721062248.GA18383@gondor.apana.org.au
2020-07-27irqchip: Fix IRQCHIP_PLATFORM_DRIVER_* compilation by including module.hMarc Zyngier
The newly introduced IRQCHIP_PLATFORM_DRIVER_* macros expand into module-related macros, but do so without including module.h. Depending on the driver and/or architecture, this happens to work, or not. Unconditionnaly include linux/module.h to sort it out. Fixes: f3b5e608ed6d ("irqchip: Add IRQCHIP_PLATFORM_DRIVER_BEGIN/END and IRQCHIP_MATCH helper macros") Reported-by: kernel test robot <lkp@intel.com> Cc: Saravana Kannan <saravanak@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-07-27irqchip: Add IRQCHIP_PLATFORM_DRIVER_BEGIN/END and IRQCHIP_MATCH helper macrosSaravana Kannan
Compiling an irqchip driver as a platform driver needs to bunch of things to be done right: - Making sure the parent domain is initialized first - Making sure the device can't be unbound from sysfs - Disallowing module unload if it's built as a module - Finding the parent node - Etc. Instead of trying to make sure all future irqchip platform drivers get this right, provide boilerplate macros that take care of all of this. An example use would look something like this. Where acme_foo_init and acme_bar_init are similar to what would be passed to IRQCHIP_DECLARE. IRQCHIP_PLATFORM_DRIVER_BEGIN(acme_irq) IRQCHIP_MATCH("acme,foo", acme_foo_init) IRQCHIP_MATCH("acme,bar", acme_bar_init) IRQCHIP_PLATFORM_DRIVER_END(acme_irq) Signed-off-by: Saravana Kannan <saravanak@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Link: https://lore.kernel.org/r/20200718000637.3632841-2-saravanak@google.com
2020-07-27irqchip: irq-bcm2836.h: drop a duplicated wordRandy Dunlap
Drop the repeated word "the" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200719002853.20419-1-rdunlap@infradead.org
2020-07-27irqchip/gic-v3: Remove unused register definitionZenghui Yu
[maz: The GICv3 spec has evolved quite a bit since the draft the Linux driver was written against, and some register definitions are simply gone] As per the GICv3 specification, GIC{D,R}_SEIR are not assigned and the locations (0x0068) are actually Reserved. GICR_MOV{LPI,ALL}R are two IMP DEF registers and might be defined by some specific micro-architecture. As they're not used anywhere in the kernel, just drop all of them. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> [maz: added context explaination] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200630134126.880-1-yuzenghui@huawei.com
2020-07-27Merge 5.8-rc7 into usb-nextGreg Kroah-Hartman
We want the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-27gpio: regmap: fix type clashMichael Walle
GPIO_REGMAP_ADDR_ZERO() cast to unsigned long but the actual config parameters are unsigned int. We use unsigned int here because that is the type which is used by the underlying regmap. Fixes: ebe363197e52 ("gpio: add a reusable generic gpio_chip using regmap") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Walle <michael@walle.cc> Link: https://lore.kernel.org/r/20200725232337.27581-1-michael@walle.cc Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-07-27power: fix duplicated words in bq2415x_charger.hRandy Dunlap
Drop the doubled word "for". Change "It it" to "If it". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-pm@vger.kernel.org Acked-by: Pali Rohár <pali@kernel.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2020-07-26Merge branch 'x86/urgent' into x86/cleanupsIngo Molnar
Refresh the branch for a dependent commit. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-26entry: Correct __secure_computing() stubThomas Gleixner
The original version of that used secure_computing() which has no arguments. Review requested to switch to __secure_computing() which has one. The function name was correct, but no argument added and of course compiling without SECCOMP was deemed overrated. Add the missing function argument. Fixes: 6823ecabf030 ("seccomp: Provide stub for __secure_computing()") Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-07-25bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commandsAndrii Nakryiko
Now that BPF program/link management is centralized in generic net_device code, kernel code never queries program id from drivers, so XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary. This patch removes all the implementations of those commands in kernel, along the xdp_attachment_query(). This patch was compile-tested on allyesconfig. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
2020-07-25bpf, xdp: Add bpf_link-based XDP attachment APIAndrii Nakryiko
Add bpf_link-based API (bpf_xdp_link) to attach BPF XDP program through BPF_LINK_CREATE command. bpf_xdp_link is mutually exclusive with direct BPF program attachment, previous BPF program should be detached prior to attempting to create a new bpf_xdp_link attachment (for a given XDP mode). Once BPF link is attached, it can't be replaced by other BPF program attachment or link attachment. It will be detached only when the last BPF link FD is closed. bpf_xdp_link will be auto-detached when net_device is shutdown, similarly to how other BPF links behave (cgroup, flow_dissector). At that point bpf_link will become defunct, but won't be destroyed until last FD is closed. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-5-andriin@fb.com
2020-07-25bpf, xdp: Maintain info on attached XDP BPF programs in net_deviceAndrii Nakryiko
Instead of delegating to drivers, maintain information about which BPF programs are attached in which XDP modes (generic/skb, driver, or hardware) locally in net_device. This effectively obsoletes XDP_QUERY_PROG command. Such re-organization simplifies existing code already. But it also allows to further add bpf_link-based XDP attachments without drivers having to know about any of this at all, which seems like a good setup. XDP_SETUP_PROG/XDP_SETUP_PROG_HW are just low-level commands to driver to install/uninstall active BPF program. All the higher-level concerns about prog/link interaction will be contained within generic driver-agnostic logic. All the XDP_QUERY_PROG calls to driver in dev_xdp_uninstall() were removed. It's not clear for me why dev_xdp_uninstall() were passing previous prog_flags when resetting installed programs. That seems unnecessary, plus most drivers don't populate prog_flags anyways. Having XDP_SETUP_PROG vs XDP_SETUP_PROG_HW should be enough of an indicator of what is required of driver to correctly reset active BPF program. dev_xdp_uninstall() is also generalized as an iteration over all three supported mode. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-3-andriin@fb.com
2020-07-25bpf: Make bpf_link API available indepently of CONFIG_BPF_SYSCALLAndrii Nakryiko
Similarly to bpf_prog, make bpf_link and related generic API available unconditionally to make it easier to have bpf_link support in various parts of the kernel. Stub out init/prime/settle/cleanup and inc/put APIs. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-2-andriin@fb.com
2020-07-25bpf: Make cgroup storages shared between programs on the same cgroupYiFei Zhu
This change comes in several parts: One, the restriction that the CGROUP_STORAGE map can only be used by one program is removed. This results in the removal of the field 'aux' in struct bpf_cgroup_storage_map, and removal of relevant code associated with the field, and removal of now-noop functions bpf_free_cgroup_storage and bpf_cgroup_storage_release. Second, we permit a key of type u64 as the key to the map. Providing such a key type indicates that the map should ignore attach type when comparing map keys. However, for simplicity newly linked storage will still have the attach type at link time in its key struct. cgroup_storage_check_btf is adapted to accept u64 as the type of the key. Third, because the storages are now shared, the storages cannot be unconditionally freed on program detach. There could be two ways to solve this issue: * A. Reference count the usage of the storages, and free when the last program is detached. * B. Free only when the storage is impossible to be referred to again, i.e. when either the cgroup_bpf it is attached to, or the map itself, is freed. Option A has the side effect that, when the user detach and reattach a program, whether the program gets a fresh storage depends on whether there is another program attached using that storage. This could trigger races if the user is multi-threaded, and since nondeterminism in data races is evil, go with option B. The both the map and the cgroup_bpf now tracks their associated storages, and the storage unlink and free are removed from cgroup_bpf_detach and added to cgroup_bpf_release and cgroup_storage_map_free. The latter also new holds the cgroup_mutex to prevent any races with the former. Fourth, on attach, we reuse the old storage if the key already exists in the map, via cgroup_storage_lookup. If the storage does not exist yet, we create a new one, and publish it at the last step in the attach process. This does not create a race condition because for the whole attach the cgroup_mutex is held. We keep track of an array of new storages that was allocated and if the process fails only the new storages would get freed. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/d5401c6106728a00890401190db40020a1f84ff1.1595565795.git.zhuyifei@google.com
2020-07-25bpf: Fail PERF_EVENT_IOC_SET_BPF when bpf_get_[stack|stackid] cannot workSong Liu
bpf_get_[stack|stackid] on perf_events with precise_ip uses callchain attached to perf_sample_data. If this callchain is not presented, do not allow attaching BPF program that calls bpf_get_[stack|stackid] to this event. In the error case, -EPROTO is returned so that libbpf can identify this error and print proper hint message. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723180648.1429892-3-songliubraving@fb.com
2020-07-25bpf: Separate bpf_get_[stack|stackid] for perf events BPFSong Liu
Calling get_perf_callchain() on perf_events from PEBS entries may cause unwinder errors. To fix this issue, the callchain is fetched early. Such perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY. Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may also cause unwinder errors. To fix this, add separate version of these two helpers, bpf_get_[stack|stackid]_pe. These two hepers use callchain in bpf_perf_event_data_kern->data->callchain. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723180648.1429892-2-songliubraving@fb.com
2020-07-25bpf: Implement bpf iterator for map elementsYonghong Song
The bpf iterator for map elements are implemented. The bpf program will receive four parameters: bpf_iter_meta *meta: the meta data bpf_map *map: the bpf_map whose elements are traversed void *key: the key of one element void *value: the value of the same element Here, meta and map pointers are always valid, and key has register type PTR_TO_RDONLY_BUF_OR_NULL and value has register type PTR_TO_RDWR_BUF_OR_NULL. The kernel will track the access range of key and value during verification time. Later, these values will be compared against the values in the actual map to ensure all accesses are within range. A new field iter_seq_info is added to bpf_map_ops which is used to add map type specific information, i.e., seq_ops, init/fini seq_file func and seq_file private data size. Subsequent patches will have actual implementation for bpf_map_ops->iter_seq_info. In user space, BPF_ITER_LINK_MAP_FD needs to be specified in prog attr->link_create.flags, which indicates that attr->link_create.target_fd is a map_fd. The reason for such an explicit flag is for possible future cases where one bpf iterator may allow more than one possible customization, e.g., pid and cgroup id for task_file. Current kernel internal implementation only allows the target to register at most one required bpf_iter_link_info. To support the above case, optional bpf_iter_link_info's are needed, the target can be extended to register such link infos, and user provided link_info needs to match one of target supported ones. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184112.590360-1-yhs@fb.com
2020-07-25bpf: Support readonly/readwrite buffers in verifierYonghong Song
Readonly and readwrite buffer register states are introduced. Totally four states, PTR_TO_RDONLY_BUF[_OR_NULL] and PTR_TO_RDWR_BUF[_OR_NULL] are supported. As suggested by their respective names, PTR_TO_RDONLY_BUF[_OR_NULL] are for readonly buffers and PTR_TO_RDWR_BUF[_OR_NULL] for read/write buffers. These new register states will be used by later bpf map element iterator. New register states share some similarity to PTR_TO_TP_BUFFER as it will calculate accessed buffer size during verification time. The accessed buffer size will be later compared to other metrics during later attach/link_create time. Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf iterator programs, PTR_TO_RDONLY_BUF_OR_NULL or PTR_TO_RDWR_BUF_OR_NULL reg_types can be set at prog->aux->bpf_ctx_arg_aux, and bpf verifier will retrieve the values during btf_ctx_access(). Later bpf map element iterator implementation will show how such information will be assigned during target registeration time. The verifier is also enhanced such that PTR_TO_RDONLY_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] helper argument, and PTR_TO_RDWR_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] or ARG_PTR_TO_UNINIT_MEM. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184111.590274-1-yhs@fb.com
2020-07-25bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_tYonghong Song
This patch refactored target bpf_iter_init_seq_priv_t callback function to accept additional information. This will be needed in later patches for map element targets since a particular map should be passed to traverse elements for that particular map. In the future, other information may be passed to target as well, e.g., pid, cgroup id, etc. to customize the iterator. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com