summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-03-05signal: add pidfd_send_signal() syscallChristian Brauner
The kill() syscall operates on process identifiers (pid). After a process has exited its pid can be reused by another process. If a caller sends a signal to a reused pid it will end up signaling the wrong process. This issue has often surfaced and there has been a push to address this problem [1]. This patch uses file descriptors (fd) from proc/<pid> as stable handles on struct pid. Even if a pid is recycled the handle will not change. The fd can be used to send signals to the process it refers to. Thus, the new syscall pidfd_send_signal() is introduced to solve this problem. Instead of pids it operates on process fds (pidfd). /* prototype and argument /* long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags); /* syscall number 424 */ The syscall number was chosen to be 424 to align with Arnd's rework in his y2038 to minimize merge conflicts (cf. [25]). In addition to the pidfd and signal argument it takes an additional siginfo_t and flags argument. If the siginfo_t argument is NULL then pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo(). The flags argument is added to allow for future extensions of this syscall. It currently needs to be passed as 0. Failing to do so will cause EINVAL. /* pidfd_send_signal() replaces multiple pid-based syscalls */ The pidfd_send_signal() syscall currently takes on the job of rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a positive pid is passed to kill(2). It will however be possible to also replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended. /* sending signals to threads (tid) and process groups (pgid) */ Specifically, the pidfd_send_signal() syscall does currently not operate on process groups or threads. This is left for future extensions. In order to extend the syscall to allow sending signal to threads and process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and PIDFD_TYPE_TID) should be added. This implies that the flags argument will determine what is signaled and not the file descriptor itself. Put in other words, grouping in this api is a property of the flags argument not a property of the file descriptor (cf. [13]). Clarification for this has been requested by Eric (cf. [19]). When appropriate extensions through the flags argument are added then pidfd_send_signal() can additionally replace the part of kill(2) which operates on process groups as well as the tgkill(2) and rt_tgsigqueueinfo(2) syscalls. How such an extension could be implemented has been very roughly sketched in [14], [15], and [16]. However, this should not be taken as a commitment to a particular implementation. There might be better ways to do it. Right now this is intentionally left out to keep this patchset as simple as possible (cf. [4]). /* naming */ The syscall had various names throughout iterations of this patchset: - procfd_signal() - procfd_send_signal() - taskfd_send_signal() In the last round of reviews it was pointed out that given that if the flags argument decides the scope of the signal instead of different types of fds it might make sense to either settle for "procfd_" or "pidfd_" as prefix. The community was willing to accept either (cf. [17] and [18]). Given that one developer expressed strong preference for the "pidfd_" prefix (cf. [13]) and with other developers less opinionated about the name we should settle for "pidfd_" to avoid further bikeshedding. The "_send_signal" suffix was chosen to reflect the fact that the syscall takes on the job of multiple syscalls. It is therefore intentional that the name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer because it might imply that pidfd_send_signal() is a replacement for kill(2), and not the latter because it is a hassle to remember the correct spelling - especially for non-native speakers - and because it is not descriptive enough of what the syscall actually does. The name "pidfd_send_signal" makes it very clear that its job is to send signals. /* zombies */ Zombies can be signaled just as any other process. No special error will be reported since a zombie state is an unreliable state (cf. [3]). However, this can be added as an extension through the @flags argument if the need ever arises. /* cross-namespace signals */ The patch currently enforces that the signaler and signalee either are in the same pid namespace or that the signaler's pid namespace is an ancestor of the signalee's pid namespace. This is done for the sake of simplicity and because it is unclear to what values certain members of struct siginfo_t would need to be set to (cf. [5], [6]). /* compat syscalls */ It became clear that we would like to avoid adding compat syscalls (cf. [7]). The compat syscall handling is now done in kernel/signal.c itself by adding __copy_siginfo_from_user_generic() which lets us avoid compat syscalls (cf. [8]). It should be noted that the addition of __copy_siginfo_from_user_any() is caused by a bug in the original implementation of rt_sigqueueinfo(2) (cf. 12). With upcoming rework for syscall handling things might improve significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain any additional callers. /* testing */ This patch was tested on x64 and x86. /* userspace usage */ An asciinema recording for the basic functionality can be found under [9]. With this patch a process can be killed via: #define _GNU_SOURCE #include <errno.h> #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/stat.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags) { #ifdef __NR_pidfd_send_signal return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags); #else return -ENOSYS; #endif } int main(int argc, char *argv[]) { int fd, ret, saved_errno, sig; if (argc < 3) exit(EXIT_FAILURE); fd = open(argv[1], O_DIRECTORY | O_CLOEXEC); if (fd < 0) { printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]); exit(EXIT_FAILURE); } sig = atoi(argv[2]); printf("Sending signal %d to process %s\n", sig, argv[1]); ret = do_pidfd_send_signal(fd, sig, NULL, 0); saved_errno = errno; close(fd); errno = saved_errno; if (ret < 0) { printf("%s - Failed to send signal %d to process %s\n", strerror(errno), sig, argv[1]); exit(EXIT_FAILURE); } exit(EXIT_SUCCESS); } /* Q&A * Given that it seems the same questions get asked again by people who are * late to the party it makes sense to add a Q&A section to the commit * message so it's hopefully easier to avoid duplicate threads. * * For the sake of progress please consider these arguments settled unless * there is a new point that desperately needs to be addressed. Please make * sure to check the links to the threads in this commit message whether * this has not already been covered. */ Q-01: (Florian Weimer [20], Andrew Morton [21]) What happens when the target process has exited? A-01: Sending the signal will fail with ESRCH (cf. [22]). Q-02: (Andrew Morton [21]) Is the task_struct pinned by the fd? A-02: No. A reference to struct pid is kept. struct pid - as far as I understand - was created exactly for the reason to not require to pin struct task_struct (cf. [22]). Q-03: (Andrew Morton [21]) Does the entire procfs directory remain visible? Just one entry within it? A-03: The same thing that happens right now when you hold a file descriptor to /proc/<pid> open (cf. [22]). Q-04: (Andrew Morton [21]) Does the pid remain reserved? A-04: No. This patchset guarantees a stable handle not that pids are not recycled (cf. [22]). Q-05: (Andrew Morton [21]) Do attempts to signal that fd return errors? A-05: See {Q,A}-01. Q-06: (Andrew Morton [22]) Is there a cleaner way of obtaining the fd? Another syscall perhaps. A-06: Userspace can already trivially retrieve file descriptors from procfs so this is something that we will need to support anyway. Hence, there's no immediate need to add another syscalls just to make pidfd_send_signal() not dependent on the presence of procfs. However, adding a syscalls to get such file descriptors is planned for a future patchset (cf. [22]). Q-07: (Andrew Morton [21] and others) This fd-for-a-process sounds like a handy thing and people may well think up other uses for it in the future, probably unrelated to signals. Are the code and the interface designed to permit such future applications? A-07: Yes (cf. [22]). Q-08: (Andrew Morton [21] and others) Now I think about it, why a new syscall? This thing is looking rather like an ioctl? A-08: This has been extensively discussed. It was agreed that a syscall is preferred for a variety or reasons. Here are just a few taken from prior threads. Syscalls are safer than ioctl()s especially when signaling to fds. Processes are a core kernel concept so a syscall seems more appropriate. The layout of the syscall with its four arguments would require the addition of a custom struct for the ioctl() thereby causing at least the same amount or even more complexity for userspace than a simple syscall. The new syscall will replace multiple other pid-based syscalls (see description above). The file-descriptors-for-processes concept introduced with this syscall will be extended with other syscalls in the future. See also [22], [23] and various other threads already linked in here. Q-09: (Florian Weimer [24]) What happens if you use the new interface with an O_PATH descriptor? A-09: pidfds opened as O_PATH fds cannot be used to send signals to a process (cf. [2]). Signaling processes through pidfds is the equivalent of writing to a file. Thus, this is not an operation that operates "purely at the file descriptor level" as required by the open(2) manpage. See also [4]. /* References */ [1]: https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/ [2]: https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/ [3]: https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/ [4]: https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/ [5]: https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/ [6]: https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/ [7]: https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/ [8]: https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/ [9]: https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy [11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/ [12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/ [13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/ [14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/ [15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/ [16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/ [17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/ [18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/ [19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/ [20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/ [21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/ [22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/ [23]: https://lwn.net/Articles/773459/ [24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/ [25]: https://lore.kernel.org/lkml/CAK8P3a0ej9NcJM8wXNPbcGUyOUZYX+VLoDFdbenW3s3114oQZw@mail.gmail.com/ Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jann Horn <jannh@google.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Tycho Andersen <tycho@tycho.ws> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Aleksa Sarai <cyphar@cyphar.com>
2019-03-04Merge tag 'leds-for-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED updates from Jacek Anaszewski: - finalize previously announced support for initialization of pattern triggers from Device Tree - fix for null deref on firmware load failure in leds-lp55xx-common.c * tag 'leds-for-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: leds: lp55xx: fix null deref on firmware load failure leds: trigger: timer: Add initialization from Device Tree leds: trigger: oneshot: Add initialization from Device Tree leds: trigger: pattern: Add pattern initialization from Device Tree leds: Add helper for getting default pattern from Device Tree dt-bindings: leds: Add pattern initialization from Device Tree
2019-03-04Merge tag 'spi-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "A fairly quiet release for SPI, the biggest thing is the conversion to use GPIO descriptors which is now 90% done but still needs some stragglers converting. Summary: - Support for inter-word delays - Conversion of the core and most drivers to use GPIO descriptors for GPIO controlled chip selects - New drivers for NXP FlexSPI and QuadSPI, SiFive and Spreadtrum" * tag 'spi-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (104 commits) spi: sh-msiof: Restrict bits per word to 8/16/24/32 on R-Car Gen2/3 spi: sifive: Remove redundant dev_err call in sifive_spi_probe() spi: sifive: Remove spi_master_put in sifive_spi_remove() spi: spi-gpio: fix SPI_CS_HIGH capability spi: pxa2xx: Setup maximum supported DMA transfer length spi: sifive: Add driver for the SiFive SPI controller spi: sifive: Add DT documentation for SiFive SPI controller spi: sprd: Add a prefix for SPI DMA channel macros spi: sprd: spi: sprd: Add DMA mode support dt-bindings: spi: Add the DMA properties for the SPI dma mode spi: sprd: Add the SPI irq function for the SPI DMA mode dt-bindings: spi: imx: Add an entry for the i.MX8QM compatible spi: use gpio[d]_set_value_cansleep for setting chipselect GPIO spi: gpio: Advertise support for SPI_CS_HIGH spi: sh-msiof: Replace spi_master by spi_controller spi: sh-hspi: Replace spi_master by spi_controller spi: rspi: Replace spi_master by spi_controller spi: atmel-quadspi: add support for sam9x60 qspi controller dt-bindings: spi: atmel-quadspi: QuadSPI driver for Microchip SAM9X60 spi: atmel-quadspi: add support for named peripheral clock ...
2019-03-04Merge tag 'regulator-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "The bulk of the standout changes in this release are cleanups, with the core work being a combination of factoring out common code into helpers and the completion of the conversion of the core to use GPIO descriptors. Summary: - Addition of helper functions for current limits and conversion of drivers to use them by Axel Lin. - Lots and lots of cleanups from Axel Lin. - Conversion of the core to use GPIO descriptors rather than numbers by Linus Walleij. - New drivers for Maxim MAX77650 and ROHM BD70528" * tag 'regulator-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (131 commits) regulator: mc13xxx: Constify regulator_ops variables regulator: palmas: Constify palmas_smps_ramp_delay array regulator: wm831x-dcdc: Convert to use regulator_set/get_current_limit_regmap regulator: pv88090: Convert to use regulator_set/get_current_limit_regmap regulator: pv88080: Convert to use regulator_set/get_current_limit_regmap regulator: pv88060: Convert to use regulator_set/get_current_limit_regmap regulator: max77650: Convert to use regulator_set/get_current_limit_regmap regulator: lp873x: Convert to use regulator_set/get_current_limit_regmap regulator: lp872x: Convert to use regulator_set/get_current_limit_regmap regulator: da9210: Convert to use regulator_set/get_current_limit_regmap regulator: da9055: Convert to use regulator_set/get_current_limit_regmap regulator: core: Add set/get_current_limit helpers for regmap users regulator: Fix comment for csel_reg and csel_mask regulator: stm32-vrefbuf: add power management support regulator: 88pm8607: Remove unused fields from struct pm8607_regulator_info regulator: 88pm8607: Simplify pm8607_list_voltage implementation regulator: cpcap: Constify omap4_regulators and xoom_regulators regulator: cpcap: Remove unused vsel_shift from struct cpcap_regulator dt-bindings: regulator: tps65218: rectify units of LS3 dt-bindings: regulator: add LS2 load switch documentation ...
2019-03-04Merge tag 'regmap-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "There are only two changes here: - fix for conflicting attributes on the rbtree node structure - implementation of main status register support in the interrupt code which supports chips that have a register to cut down on the number of per-interrupt status registers that need to be checked when handling interrupts" * tag 'regmap-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: Remove attribute packed from struct 'regcache_rbtree_node' regmap: regmap-irq: Add main status register support
2019-03-04Merge tag 'mmc-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds
Pull MMC updates from Ulf Hansson: "MMC core: - Fixup max_discard/trim calculations - Announce SD specs greater than 4.0 - Add discard support for SD cards - Don't do retries for CMD6 (SWITCH command) - Various cleanups and re-structuring MMC host: - cqhci: * Add maintainers for eMMC CQHCI driver - sdhci: * Consolidate WP GPIO code * Add ADMA3 DMA support for V4 enabled host * Fixup card detect support in pci-o2micro driver * Add support for CMDQ and SDMMC pads auto-calibration in tegra driver * Add DCMD support and CMDQ support, support for i.MX6ULL variant, fixup HS400 timing issue and add HS400_ES support for i.MX8QXP to esdhc-imx driver * Avoid CRC errors by adjusting settings to speed mode and fixup card initialization for high speed mode in renesas_sdhi * Fixup timeout settings for omap * Enable 8 bits bus-width support in atmel-mci * Convert some legacy code in jz4740 driver to use modern APIs * Send a CMD12 to clear DPSM at errors for STM32 sdmmc mmci driver" * tag 'mmc-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (69 commits) mmc:fix a bug when max_discard is 0 mmc: core: Add a debug print when the card may have been replaced mmc: core: Add sd discard timeout mmc: core: Add discard support to sd mmc: sdhci-esdhc-imx: clear the HALT bit when enable CQE mmc: core: do not retry CMD6 in __mmc_switch() mmc: core: Convert mmc_align_data_size() into an SDIO specific function mmc: core: Move mmc_of_parse_voltage() to host.c mmc: core: Convert mmc_regulator_get_ocrmask() to static mmc: core: Move regulator helpers to separate file mmc: of_mmc_spi: Convert to mmc_of_parse_voltage() mmc: core: Drop retries as in-parameter to mmc_wait_for_app_cmd() mmc: core: Convert mmc_wait_for_app_cmd() to static mmc: renesas_sdhi: Change HW adjustment register according to speed mode mmc: mmci: Send a CMD12 to clear the DPSM at errors mmc: sdhci-xenon: Fixup already marked switch fall-through mmc: sdhci-tegra: drop ->get_ro() implementation mmc: sdhci-omap: drop ->get_ro() implementation mmc: sdhci: use WP GPIO in sdhci_check_ro() mmc: wmt-sdmmc: Drop unused include ...
2019-03-04Merge tag 'mtd/for-5.1' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD updates from Boris Brezillon: "Core MTD changes: - Use struct_size() where appropriate - mtd_{read,write}() as wrappers around mtd_{read,write}_oob() - Fix misuse of PTR_ERR() in docg3 - Coding style improvements in mtdcore.c SPI NOR changes: Core changes: - Add support of octal mode I/O transfer - Add a bunch of SPI NOR entries to the flash_info table SPI NOR controller driver changes: - cadence-quadspi: * Add support for Octal SPI controller * write upto 8-bytes data in STIG mode - mtk-quadspi: * rename config to a common one * add SNOR_HWCAPS_READ to spi_nor_hwcaps mask - Add Tudor as SPI-NOR co-maintainer NAND changes: NAND core changes: - Fourth batch of fixes/cleanup to the raw NAND core impacting various controller drivers (Sunxi, Marvell, MTK, TMIO, OMAP2). - Check the return code of nand_reset() and nand_readid_op(). - Remove ->legacy.erase and single_erase(). - Simplify the locking. - Several implicit fall through annotations. Raw NAND controllers drivers changes: - Fix various possible object reference leaks (MTK, JZ4780, Atmel) - ST: * Add support for STM32 FMC2 NAND flash controller - Meson: * Add support for Amlogic NAND flash controller - Denali: * Several cleanup patches - Sunxi: * Several cleanup patches - FSMC: * Disable NAND on remove() * Reset NAND timings on resume() SPI-NAND drivers changes: - Toshiba: * Add support for all Toshiba products. - Macronix: * Fix ECC status read. - Gigadevice: * Add support for GD5F1GQ4UExxG" * tag 'mtd/for-5.1' of git://git.infradead.org/linux-mtd: (64 commits) mtd: spi-nor: Fix wrong abbreviation HWCPAS mtd: spi-nor: cadence-quadspi: fix spelling mistake: "Couldnt't" -> "Couldn't" mtd: spi-nor: Add support for en25qh64 mtd: spi-nor: Add support for MX25V8035F mtd: spi-nor: Add support for EN25Q80A mtd: spi-nor: cadence-quadspi: Add support for Octal SPI controller dt-bindings: cadence-quadspi: Add new compatible for AM654 SoC mtd: spi-nor: split s25fl128s into s25fl128s0 and s25fl128s1 mtd: spi-nor: cadence-quadspi: write upto 8-bytes data in STIG mode mtd: spi-nor: Add support for mx25u3235f mtd: rawnand: denali_dt: remove single anonymous clock support mtd: rawnand: mtk: fix possible object reference leak mtd: rawnand: jz4780: fix possible object reference leak mtd: rawnand: atmel: fix possible object reference leak mtd: rawnand: fsmc: Disable NAND on remove() mtd: rawnand: fsmc: Reset NAND timings on resume() mtd: spinand: Add support for GigaDevice GD5F1GQ4UExxG mtd: rawnand: denali: remove unused dma_addr field from denali_nand_info mtd: rawnand: denali: remove unused function argument 'raw' mtd: rawnand: denali: remove unneeded denali_reset_irq() call ...
2019-03-04Merge tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Switch mdev to generic UUID API (Andy Shevchenko) - Fixup platform reset include paths (Masahiro Yamada) - Fix usage of MINORMASK (Chengguang Xu) - Remove noise from duplicate spapr table unsets (Alexey Kardashevskiy) - Restore device state after PM reset (Alex Williamson) - Ensure memory translation enabled for PCI ROM access (Eric Auger) * tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfio: vfio_pci: Enable memory accesses before calling pci_map_rom vfio/pci: Restore device state on PM transition vfio/spapr_tce: Skip unsetting already unset table samples/vfio-mdev/mtty: expand minor range when registering chrdev region samples/vfio-mdev/mdpy: expand minor range when registering chrdev region samples/vfio-mdev/mbochs: expand minor range when registering chrdev region vfio: expand minor range when registering chrdev region vfio: platform: reset: fix up include directives to remove ccflags-y vfio-mdev: Switch to use new generic UUID API
2019-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2019-03-04aio: simplify - and fix - fget/fput for io_submit()Linus Torvalds
Al Viro root-caused a race where the IOCB_CMD_POLL handling of fget/fput() could cause us to access the file pointer after it had already been freed: "In more details - normally IOCB_CMD_POLL handling looks so: 1) io_submit(2) allocates aio_kiocb instance and passes it to aio_poll() 2) aio_poll() resolves the descriptor to struct file by req->file = fget(iocb->aio_fildes) 3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that aio_kiocb to 2 (bumps by 1, that is). 4) aio_poll() calls vfs_poll(). After sanity checks (basically, "poll_wait() had been called and only once") it locks the queue. That's what the extra reference to iocb had been for - we know we can safely access it. 5) With queue locked, we check if ->woken has already been set to true (by aio_poll_wake()) and, if it had been, we unlock the queue, drop a reference to aio_kiocb and bugger off - at that point it's a responsibility to aio_poll_wake() and the stuff called/scheduled by it. That code will drop the reference to file in req->file, along with the other reference to our aio_kiocb. 6) otherwise, we see whether we need to wait. If we do, we unlock the queue, drop one reference to aio_kiocb and go away - eventual wakeup (or cancel) will deal with the reference to file and with the other reference to aio_kiocb 7) otherwise we remove ourselves from waitqueue (still under the queue lock), so that wakeup won't get us. No async activity will be happening, so we can safely drop req->file and iocb ourselves. If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb won't get freed under us, so we can do all the checks and locking safely. And we don't touch ->file if we detect that case. However, vfs_poll() most certainly *does* touch the file it had been given. So wakeup coming while we are still in ->poll() might end up doing fput() on that file. That case is not too rare, and usually we are saved by the still present reference from descriptor table - that fput() is not the final one. But if another thread closes that descriptor right after our fget() and wakeup does happen before ->poll() returns, we are in trouble - final fput() done while we are in the middle of a method: Al also wrote a patch to take an extra reference to the file descriptor to fix this, but I instead suggested we just streamline the whole file pointer handling by submit_io() so that the generic aio submission code simply keeps the file pointer around until the aio has completed. Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Acked-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-03-04 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add AF_XDP support to libbpf. Rationale is to facilitate writing AF_XDP applications by offering higher-level APIs that hide many of the details of the AF_XDP uapi. Sample programs are converted over to this new interface as well, from Magnus. 2) Introduce a new cant_sleep() macro for annotation of functions that cannot sleep and use it in BPF_PROG_RUN() to assert that BPF programs run under preemption disabled context, from Peter. 3) Introduce per BPF prog stats in order to monitor the usage of BPF; this is controlled by kernel.bpf_stats_enabled sysctl knob where monitoring tools can make use of this to efficiently determine the average cost of programs, from Alexei. 4) Split up BPF selftest's test_progs similarly as we already did with test_verifier. This allows to further reduce merge conflicts in future and to get more structure into our quickly growing BPF selftest suite, from Stanislav. 5) Fix a bug in BTF's dedup algorithm which can cause an infinite loop in some circumstances; also various BPF doc fixes and improvements, from Andrii. 6) Various BPF sample cleanups and migration to libbpf in order to further isolate the old sample loader code (so we can get rid of it at some point), from Jakub. 7) Add a new BPF helper for BPF cgroup skb progs that allows to set ECN CE code point and a Host Bandwidth Manager (HBM) sample program for limiting the bandwidth used by v2 cgroups, from Lawrence. 8) Enable write access to skb->queue_mapping from tc BPF egress programs in order to let BPF pick TX queue, from Jesper. 9) Fix a bug in BPF spinlock handling for map-in-map which did not propagate spin_lock_off to the meta map, from Yonghong. 10) Fix a bug in the new per-CPU BPF prog counters to properly initialize stats for each CPU, from Eric. 11) Add various BPF helper prototypes to selftest's bpf_helpers.h, from Willem. 12) Fix various BPF samples bugs in XDP and tracing progs, from Toke, Daniel and Yonghong. 13) Silence preemption splat in test_bpf after BPF_PROG_RUN() enforces it now everywhere, from Anders. 14) Fix a signedness bug in libbpf's btf_dedup_ref_type() to get error handling working, from Dan. 15) Fix bpftool documentation and auto-completion with regards to stream_{verdict,parser} attach types, from Alban. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04Merge branch 'spi-5.1' into spi-nextMark Brown
2019-03-04Merge branch 'regulator-5.1' into regulator-nextMark Brown
2019-03-04printk: Remove no longer used LOG_PREFIX.Tetsuo Handa
When commit 5becfb1df5ac8e49 ("kmsg: merge continuation records while printing") introduced LOG_PREFIX, we used KERN_DEFAULT etc. as a flag for setting LOG_PREFIX in order to tell whether to call cont_add() (i.e. whether to append the message to "struct cont"). But since commit 4bcc595ccd80decb ("printk: reinstate KERN_CONT for printing continuation lines") inverted the behavior (i.e. don't append the message to "struct cont" unless KERN_CONT is specified) and commit 5aa068ea4082b39e ("printk: remove games with previous record flags") removed the last LOG_PREFIX check, setting LOG_PREFIX via KERN_DEFAULT etc. is no longer meaningful. Therefore, we can remove LOG_PREFIX and make KERN_DEFAULT empty string. Link: http://lkml.kernel.org/r/1550829580-9189-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp To: Steven Rostedt <rostedt@goodmis.org> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-03-04Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: (48 commits) cpufreq: kryo: Release OPP tables on module removal cpufreq: ap806: add missing of_node_put after of_device_is_available cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies cpufreq: Pass updated policy to driver ->setpolicy() callback cpufreq: Fix two debug messages in cpufreq_set_policy() cpufreq: Reorder and simplify cpufreq_update_policy() cpufreq: Add kerneldoc comments for two core functions cpufreq: intel_pstate: Rework iowait boosting to be less aggressive cpufreq: intel_pstate: Eliminate intel_pstate_get_base_pstate() cpufreq: intel_pstate: Avoid redundant initialization of local vars cpufreq / cppc: Work around for Hisilicon CPPC cpufreq ACPI / CPPC: Add a helper to get desired performance cpufreq: davinci: move configuration to include/linux/platform_data cpufreq: speedstep: convert BUG() to BUG_ON() cpufreq: powernv: fix missing check of return value in init_powernv_pstates() cpufreq: longhaul: remove unneeded semicolon cpufreq: pcc-cpufreq: remove unneeded semicolon cpufreq: Replace double NOT (!!) with single NOT (!) cpufreq: intel_pstate: Add reasons for failure and debug messages cpufreq: dt: Implement online/offline() callbacks ...
2019-03-04Merge branches 'pm-cpuidle' and 'powercap'Rafael J. Wysocki
* pm-cpuidle: ACPI / processor: Set P_LVL{2,3} idle state descriptions intel_idle: add support for Jacobsville cpuidle: dt: bail out if the idle-state DT node is not compatible cpuidle: use BIT() for idle state flags and remove CPUIDLE_DRIVER_FLAGS_MASK Documentation: driver-api: PM: Add cpuidle document cpuidle: New timer events oriented governor for tickless systems * powercap: powercap/intel_rapl: add Ice Lake mobile powercap: intel_rapl: add support for Jacobsville
2019-03-04Merge branches 'pm-core', 'pm-sleep', 'pm-qos', 'pm-domains' and 'pm-em'Rafael J. Wysocki
* pm-core: PM / core: Add support to skip power management in device/driver model PM / suspend: Print debug messages for device using direct-complete PM-runtime: update time accounting only when enabled PM-runtime: Switch accounting over to ktime_get_mono_fast_ns() PM-runtime: Optimize pm_runtime_autosuspend_expiration() PM-runtime: Replace jiffies-based accounting with ktime-based accounting PM-runtime: update accounting_timestamp on enable PM: clock_ops: fix missing clk_prepare() return value check drm/i915: Move on the new pm runtime interface PM-runtime: Add new interface to get accounted time * pm-sleep: PM / wakeup: fix kerneldoc comment for pm_wakeup_dev_event() * pm-qos: PM: QoS: no need to check return value of debugfs_create functions * pm-domains: PM / Domains: Mark "name" const in dev_pm_domain_attach_by_name() PM / Domains: Mark "name" const in genpd_dev_pm_attach_by_name() PM: domains: no need to check return value of debugfs_create functions * pm-em: PM / EM: Expose the Energy Model in debugfs
2019-03-04Merge branch 'acpi-apei'Rafael J. Wysocki
* acpi-apei: (29 commits) efi: cper: Fix possible out-of-bounds access ACPI: APEI: Fix possible out-of-bounds access to BERT region MAINTAINERS: Add James Morse to the list of APEI reviewers ACPI / APEI: Add support for the SDEI GHES Notification type firmware: arm_sdei: Add ACPI GHES registration helper ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry() ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length ACPI / APEI: Make GHES estatus header validation more user friendly ACPI / APEI: Pass ghes and estatus separately to avoid a later copy ACPI / APEI: Let the notification helper specify the fixmap slot ACPI / APEI: Move locking to the notification helper arm64: KVM/mm: Move SEA handling behind a single 'claim' interface KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI ACPI / APEI: Don't allow ghes_ack_error() to mask earlier errors ACPI / APEI: Generalise the estatus queue's notify code ACPI / APEI: Don't update struct ghes' flags in read/clear estatus ACPI / APEI: Remove spurious GHES_TO_CLEAR check ...
2019-03-04Merge branches 'acpi-tables', 'acpi-debug', 'acpi-ec' and 'acpi-dptf'Rafael J. Wysocki
* acpi-tables: ACPI/PPTT: Add acpi_pptt_warn_missing() to consolidate logs ACPI / tables: table override from built-in initrd * acpi-debug: ACPI: debug: Clean up acpi_aml_init() ACPI: no need to check return value of debugfs_create functions * acpi-ec: Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk" ACPI: EC: Simplify boot EC checks in acpi_ec_add() ACPI: EC: Eliminate acpi_config_boot_ec() ACPI: EC: Make acpi_ec_dsdt_probe() more straightforward ACPI: EC: Make acpi_ec_ecdt_probe() more straightforward ACPI: EC: Declare boot_ec as static ACPI: EC: Clean up probing for early EC * acpi-dptf: ACPI / DPTF: remove header search path to the parent directory
2019-03-03net: phy: remove gen10g_no_soft_resetHeiner Kallweit
genphy_no_soft_reset and gen10g_no_soft_reset are both the same no-ops, one is enough. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: phy: don't export gen10g_read_statusHeiner Kallweit
gen10g_read_status is deprecated, therefore stop exporting it. We don't want to encourage anybody to use it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: phy: remove gen10g_config_initHeiner Kallweit
ETHTOOL_LINK_MODE_10000baseT_Full_BIT is set anyway in the supported and advertising bitmap because it's part of PHY_10GBIT_FEATURES. And all users of gen10g_config_init use PHY_10GBIT_FEATURES. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: phy: remove gen10g_suspend and gen10g_resumeHeiner Kallweit
phy_suspend() and phy_resume() are no-ops anyway if no callback is defined. Therefore we don't need these stubs. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATEFrancesco Ruggeri
By default IPv6 socket with IPV6_ROUTER_ALERT socket option set will receive all IPv6 RA packets from all namespaces. IPV6_ROUTER_ALERT_ISOLATE socket option restricts packets received by the socket to be only from the socket's namespace. Signed-off-by: Maxim Martynov <maxim@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04Merge v5.0 into drm-nextDave Airlie
There is a really hairy resolution involving amdgpu fixes, that I'd rather confirm here. Also some misc fixes are landed by me, but the pr has them as well. Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-03-03regulator: core: Add set/get_current_limit helpers for regmap usersAxel Lin
By setting curr_table, n_current_limits, csel_reg and csel_mask, the regmap users can use regulator_set_current_limit_regmap and regulator_get_current_limit_regmap for set/get_current_limit callbacks. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-03-03regulator: Fix comment for csel_reg and csel_maskAxel Lin
The csel_reg and csel_mask fields in struct regulator_desc needs to be generic for drivers. Not just for TPS65218. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-03-03appletalk: Fix use-after-free in atalk_proc_exitYueHaibing
KASAN report this: BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71 Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806 CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xfa/0x1ce lib/dump_stack.c:113 print_address_description+0x65/0x270 mm/kasan/report.c:187 kasan_report+0x149/0x18d mm/kasan/report.c:317 pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71 remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667 atalk_proc_exit+0x18/0x820 [appletalk] atalk_exit+0xf/0x5a [appletalk] __do_sys_delete_module kernel/module.c:1018 [inline] __se_sys_delete_module kernel/module.c:961 [inline] __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0 RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff Allocated by task 2806: set_track mm/kasan/common.c:85 [inline] __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2739 [inline] slab_alloc mm/slub.c:2747 [inline] kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752 kmem_cache_zalloc include/linux/slab.h:730 [inline] __proc_create+0x30f/0xa20 fs/proc/generic.c:408 proc_mkdir_data+0x47/0x190 fs/proc/generic.c:469 0xffffffffc10c01bb 0xffffffffc10c0166 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 2806: set_track mm/kasan/common.c:85 [inline] __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458 slab_free_hook mm/slub.c:1409 [inline] slab_free_freelist_hook mm/slub.c:1436 [inline] slab_free mm/slub.c:2986 [inline] kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002 pde_put+0x6e/0x80 fs/proc/generic.c:647 remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684 0xffffffffc10c031c 0xffffffffc10c0166 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8881f41fe500 which belongs to the cache proc_dir_entry of size 256 The buggy address is located 176 bytes inside of 256-byte region [ffff8881f41fe500, ffff8881f41fe600) The buggy address belongs to the page: page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0 flags: 0x2fffc0000000200(slab) raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00 raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb It should check the return value of atalk_proc_init fails, otherwise atalk_exit will trgger use-after-free in pde_subdir_find while unload the module.This patch fix error cleanup path of atalk_init Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02Merge tag 'mlx5-updates-2019-03-01' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-03-01 This series adds multipath offload support and contains some small updates to mlx5 driver. Multipath offload support from Roi Dayan: We are going to track SW multipath route and related nexthops and reflect that as port affinity to the HW. 1) Some patches are preparation. 2) add the multipath mode and fib events handling. 3) add support to handle offload failure for net error, i.e. port down. 4) Small updates to match the behavior of multipath Two small updates from Eran Ben Elisha, 5) Make a function static 6) Update PCIe supported devices list. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Add .release_ops to properly unroll .select_ops, use it from nft_compat. After this change, we can remove list of extensions too to simplify this codebase. 2) Update amanda conntrack helper to support v3.4, from Florian Tham. 3) Get rid of the obsolete BUGPRINT macro in ebtables, from Florian Westphal. 4) Merge IPv4 and IPv6 masquerading infrastructure into one single module. From Florian Westphal. 5) Patchset to remove nf_nat_l3proto structure to get rid of indirections, from Florian Westphal. 6) Skip unnecessary conntrack timeout updates in case the value is still the same, also from Florian Westphal. 7) Remove unnecessary 'fall through' comments in empty switch cases, from Li RongQing. 8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys. 9) Incorrect logic to deactivate path of fixed size hashtable sets, element was being tested to self. 10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit keys. 11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi. 12) Enter close state in conntrack if RST matches exact sequence number, from Florian Westphal. 13) Initialize dst_cache in tunnel extension, from wenxu. 14) Pass protocol as u16 to xt_check_match and xt_check_target, from Li RongQing. 15) SCTP header is granted to be in a linear area from IPVS NAT handler, from Xin Long. 16) Don't steal packets coming from slave VRF device from the ip_sabotage_in() path, from David Ahern. 17) Fix unsafe update of basechain stats, from Li RongQing. 18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize modulo operation as bitwise AND, from Li RongQing. 19) Use device_attribute instead of internal definition in the IDLETIMER target, from Sami Tolvanen. 20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix refcount leak in act_ipt during replace, from Davide Caratti. 2) Set task state properly in tun during blocking reads, from Timur Celik. 3) Leaked reference in DSA, from Wen Yang. 4) NULL deref in act_tunnel_key, from Vlad Buslov. 5) cipso_v4_erro can reference the skb IPCB in inappropriate contexts thus referencing garbage, from Nazarov Sergey. 6) Don't accept RTA_VIA and RTA_GATEWAY in contexts where those attributes make no sense. 7) Fix hung sendto in tipc, from Tung Nguyen. 8) Out-of-bounds access in netlabel, from Paul Moore. 9) Grant reference leak in xen-netback, from Igor Druzhinin. 10) Fix tx stalls with lan743x, from Bryan Whitehead. 11) Fix interrupt storm with mv88e6xxx, from Hein Kallweit. 12) Memory leak in sit on device registry failure, from Mao Wenan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) net: sit: fix memory leak in sit_init_net() net: dsa: mv88e6xxx: Fix statistics on mv88e6161 geneve: correctly handle ipv6.disable module parameter net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode bpf: fix sanitation rewrite in case of non-pointers ipv4: Add ICMPv6 support when parse route ipproto MIPS: eBPF: Fix icache flush end address lan743x: Fix TX Stall Issue net: phy: phylink: fix uninitialized variable in phylink_get_mac_state net: aquantia: regression on cpus with high cores: set mode with 8 queues selftests: fixes for UDP GRO bpf: drop refcount if bpf_map_new_fd() fails in map_create() net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X net: dsa: mv88e6xxx: Fix u64 statistics xen-netback: don't populate the hash cache on XenBus disconnect xen-netback: fix occasional leak of grant ref mappings under memory pressure sctp: chunk.c: correct format string for size_t in printk net: netem: fix skb length BUG_ON in __skb_to_sgvec netlabel: fix out-of-bounds memory accesses ipv4: Pass original device to ip_rcv_finish_core ...
2019-03-02platform_data/mlxreg: additions for Mellanox watchdog driver.Michael Shych
There are two new fields added to mlxreg core structure: features - supported features of device and identity - device identity name. Add new defines for watchdog features. Signed-off-by: Michael Shych <michaelsh@mellanox.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2019-03-01NFSv4.2: Add client support for the generic 'layouterror' RPC callTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01NFSv4/flexfiles: Abort I/O early if the layout segment was invalidatedTrond Myklebust
If a layout segment gets invalidated while a pNFS I/O operation is queued for transmission, then we ideally want to abort immediately. This is particularly the case when there is a large number of I/O related RPCs queued in the RPC layer, and the layout segment gets invalidated due to an ENOSPC error, or an EACCES (because the client was fenced). We may end up forced to spam the MDS with a lot of otherwise unnecessary LAYOUTERRORs after that I/O fails. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-01net/mlx5: Emit port affinity event for multipath offloadsRoi Dayan
Under multipath offload scheme, as part of handling fib events, emit mlx5 port affinity event on the enabled ports which will be handled by the tc offloads code. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Add multipath modeRoi Dayan
In order to offload ecmp-on-host scheme where next-hop routes are used, we will make use of HW LAG. Add accessor function to let upper layers in the driver to realize if the lag acts in multi-path mode. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01wusb: Remove unnecessary static function ckhdid_printfJoe Perches
This static inline is unnecessary and can be removed by using the vsprintf %ph extension. This reduces overall object size by more than 2K. Reported-by: Louis Taylor <louis@kragniz.eu> Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Louis Taylor <louis@kragniz.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-01Merge tag 'hyperv-next-signed' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux into char-misc-next Sasha writes: 1. Exopsing counters for state changes of channel ring buffers; this is useful to investigate performance issues. By Kimberly Brown. 2. Switching to the new generic UUID API, by Andy Shevchenko. * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Expose counters for interrupts and full conditions vmbus: Switch to use new generic UUID API
2019-03-01media: include: fix several typosMauro Carvalho Chehab
Use codespell to fix lots of typos over frontends. Manually verified to avoid false-positives. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Reviewed-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-03-01Merge tag 'soc-fsl-next-v5.1-4' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux into arm/drivers NXP/FSL SoC driver updates for v5.1 take4 DPIO driver - Add support for cache stashing and enable it in dpaa2-eth driver GUTS driver - Make fsl_guts_get_svr() API internal in favor of more generic soc_device_match() * tag 'soc-fsl-next-v5.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux: dpaa2-eth: configure the cache stashing amount on a queue soc: fsl: dpio: configure cache stashing destination soc: fsl: dpio: enable frame data cache stashing per software portal soc: fsl: guts: make fsl_guts_get_svr() static Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-03-01Merge tag 'tee-misc-for-v5.1' of ↵Arnd Bergmann
https://git.linaro.org/people/jens.wiklander/linux-tee into arm/drivers OP-TEE driver - dual license for optee_msg.h and optee_smc.h Generic - add cancellation support to client interface * tag 'tee-misc-for-v5.1' of https://git.linaro.org/people/jens.wiklander/linux-tee: tee: optee: update optee_msg.h and optee_smc.h to dual license tee: add cancellation support to client interface Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-03-01netfilter: convert the proto argument from u8 to u16Li RongQing
The proto in struct xt_match and struct xt_target is u16, when calling xt_check_target/match, their proto argument is u8, and will cause truncation, it is harmless to ip packet, since ip proto is u8 if a etable's match/target has proto that is u16, will cause the check failure. and convert be16 to short in bridge/netfilter/ebtables.c Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01Merge branches 'iommu/fixes', 'arm/msm', 'arm/tegra', 'arm/mediatek', ↵Joerg Roedel
'x86/vt-d', 'x86/amd', 'hyper-v' and 'core' into next
2019-02-28block: optimize bvec iteration in bvec_iter_advanceChristoph Hellwig
There is no need to only iterate in chunks of PAGE_SIZE or less in bvec_iter_advance, given that the callers pass in the chunk length that they are operating on - either that already is less than PAGE_SIZE because they do classic page-based iteration, or it is larger because the caller operates on multi-page bvecs. This should help shaving off a few cycles of the I/O hot path. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28of: mark early_init_dt_alloc_reserved_memory_arch staticChristoph Hellwig
This function is only used in of_reserved_mem.c, and never overridden despite the __weak marker. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rob Herring <robh@kernel.org>
2019-02-28io_uring: add support for pre-mapped user IO buffersJens Axboe
If we have fixed user buffers, we can map them into the kernel when we setup the io_uring. That avoids the need to do get_user_pages() for each and every IO. To utilize this feature, the application must call io_uring_register() after having setup an io_uring instance, passing in IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer to an iovec array, and the nr_args should contain how many iovecs the application wishes to map. If successful, these buffers are now mapped into the kernel, eligible for IO. To use these fixed buffers, the application must use the IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then set sqe->index to the desired buffer index. sqe->addr..sqe->addr+seq->len must point to somewhere inside the indexed buffer. The application may register buffers throughout the lifetime of the io_uring instance. It can call io_uring_register() with IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of buffers, and then register a new set. The application need not unregister buffers explicitly before shutting down the io_uring instance. It's perfectly valid to setup a larger buffer, and then sometimes only use parts of it for an IO. As long as the range is within the originally mapped region, it will work just fine. For now, buffers must not be file backed. If file backed buffers are passed in, the registration will fail with -1/EOPNOTSUPP. This restriction may be relaxed in the future. RLIMIT_MEMLOCK is used to check how much memory we can pin. A somewhat arbitrary 1G per buffer size is also imposed. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28fs: add fget_many() and fput_many()Jens Axboe
Some uses cases repeatedly get and put references to the same file, but the only exposed interface is doing these one at the time. As each of these entail an atomic inc or dec on a shared structure, that cost can add up. Add fget_many(), which works just like fget(), except it takes an argument for how many references to get on the file. Ditto fput_many(), which can drop an arbitrary number of references to a file. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28Add io_uring IO interfaceJens Axboe
The submission queue (SQ) and completion queue (CQ) rings are shared between the application and the kernel. This eliminates the need to copy data back and forth to submit and complete IO. IO submissions use the io_uring_sqe data structure, and completions are generated in the form of io_uring_cqe data structures. The SQ ring is an index into the io_uring_sqe array, which makes it possible to submit a batch of IOs without them being contiguous in the ring. The CQ ring is always contiguous, as completion events are inherently unordered, and hence any io_uring_cqe entry can point back to an arbitrary submission. Two new system calls are added for this: io_uring_setup(entries, params) Sets up an io_uring instance for doing async IO. On success, returns a file descriptor that the application can mmap to gain access to the SQ ring, CQ ring, and io_uring_sqes. io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize) Initiates IO against the rings mapped to this fd, or waits for them to complete, or both. The behavior is controlled by the parameters passed in. If 'to_submit' is non-zero, then we'll try and submit new IO. If IORING_ENTER_GETEVENTS is set, the kernel will wait for 'min_complete' events, if they aren't already available. It's valid to set IORING_ENTER_GETEVENTS and 'min_complete' == 0 at the same time, this allows the kernel to return already completed events without waiting for them. This is useful only for polling, as for IRQ driven IO, the application can just check the CQ ring without entering the kernel. With this setup, it's possible to do async IO with a single system call. Future developments will enable polled IO with this interface, and polled submission as well. The latter will enable an application to do IO without doing ANY system calls at all. For IRQ driven IO, an application only needs to enter the kernel for completions if it wants to wait for them to occur. Each io_uring is backed by a workqueue, to support buffered async IO as well. We will only punt to an async context if the command would need to wait for IO on the device side. Any data that can be accessed directly in the page cache is done inline. This avoids the slowness issue of usual threadpools, since cached data is accessed as quickly as a sync interface. Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28block: introduce mp_bvec_for_each_page() for iterating over pageMing Lei
mp_bvec_for_each_segment() is a bit big for the iteration, so introduce a light-weight helper for iterating over pages, then 32bytes stack space can be saved. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>