summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-08-05Merge branch 'remotes/lorenzo/pci/cadence'Bjorn Helgaas
- Convert cadence to use standard "dma-ranges" DT property instead of its own "cdns,no-bar-match-nbits" (Kishon Vijay Abraham I) - Fix pm_runtime_put_sync() issues in cadence error paths (Kishon Vijay Abraham I) - Add PTR_ALIGN_DOWN macro (Kishon Vijay Abraham I) - Convert cadence r/w accessors to only 32-bit accesses (Kishon Vijay Abraham I) - Add cadence support to start Link and check Link status (Kishon Vijay Abraham I) - Allow custom PCI ops for cadence-based drivers (Kishon Vijay Abraham I) - Remove "mem" from cadence reg binding since it's not memory and it overlaps the PCIe config and memory region (Kishon Vijay Abraham I) - Add cadence ->cpu_addr_fixup() for platforms that require absolute addresses in the ATU, not just offsets (Kishon Vijay Abraham I) - Update cadence Vendor IDs using local management registers, not architected config space (Kishon Vijay Abraham I) - Add cadence endpoint driver MSI-X support (Kishon Vijay Abraham I) - Add bindings and driver for TI J721E SoC, supporting both host and endpoint mode (Kishon Vijay Abraham I) * remotes/lorenzo/pci/cadence: MAINTAINERS: Add Kishon Vijay Abraham I for TI J721E SoC PCIe misc: pci_endpoint_test: Add J721E in pci_device_id table PCI: j721e: Add TI J721E PCIe driver dt-bindings: PCI: Add EP mode dt-bindings for TI's J721E SoC dt-bindings: PCI: Add host mode dt-bindings for TI's J721E SoC PCI: cadence: Add MSI-X support to Endpoint driver PCI: cadence: Fix updating Vendor ID and Subsystem Vendor ID register PCI: cadence: Add new *ops* for CPU addr fixup dt-bindings: PCI: cadence: Remove "mem" from reg binding PCI: cadence: Allow pci_host_bridge to have custom pci_ops PCI: cadence: Add support to start link and verify link status PCI: cadence: Convert all r/w accessors to perform only 32-bit accesses linux/kernel.h: Add PTR_ALIGN_DOWN macro PCI: cadence: Fix cdns_pcie_{host|ep}_setup() error path PCI: cadence: Use "dma-ranges" instead of "cdns,no-bar-match-nbits" property
2020-08-05Merge branch 'pci/virtualization'Bjorn Helgaas
- Remove redundant variable init in xen (Colin Ian King) - Add pci_pri_supported() to check device or associated PF for PRI support (Ashok Raj) - Mark AMD Navi10 GPU rev 0x00 ATS as broken (Kai-Heng Feng) - Release IVRS table in AMD ACS quirk (Hanjun Guo) * pci/virtualization: PCI: Release IVRS table in AMD ACS quirk PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken PCI/ATS: Add pci_pri_supported() to check device or associated PF xen: Remove redundant initialization of irq
2020-08-05Merge branch 'pci/misc'Bjorn Helgaas
- Convert PCIe capability PCIBIOS errors to errno (Bolarinwa Olayemi Saheed) - Align PCIe capability and PCI accessor return values (Bolarinwa Olayemi Saheed) - Replace http:// links with https:// (Alexander A. Klimov) - Replace lkml.org, spinics, gmane with lore.kernel.org (Bjorn Helgaas) - Update panic message to mention kzalloc(), not kmalloc() (Liao Pingfang) - Move PCI_VENDOR_ID_REDHAT definition to pci_ids.h (Huacai Chen) - Remove unused pci_lost_interrupt() (Heiner Kallweit) * pci/misc: PCI: Remove unused pci_lost_interrupt() PCI: Move PCI_VENDOR_ID_REDHAT definition to pci_ids.h PCI: Fix error in panic message PCI: Replace lkml.org, spinics, gmane with lore.kernel.org PCI: Replace http:// links with https:// PCI: Align PCIe capability and PCI accessor return values PCI: Convert PCIe capability PCIBIOS errors to errno
2020-08-05Merge branch 'pci/error'Bjorn Helgaas
- Use pci_channel_state_t instead of enum pci_channel_state (Luc Van Oostenryck) - Simplify __aer_print_error() (Bjorn Helgaas) - Log AER correctable errors as warning, not error (Matt Jolly) - Rename pci_aer_clear_device_status() to pcie_clear_device_status() (Bjorn Helgaas) - Clear PCIe Device Status errors only if OS owns AER (Jonathan Cameron) * pci/error: PCI/ERR: Clear PCIe Device Status errors only if OS owns AER PCI/ERR: Rename pci_aer_clear_device_status() to pcie_clear_device_status() PCI/AER: Log correctable errors as warning, not error PCI/AER: Simplify __aer_print_error() PCI: Use 'pci_channel_state_t' instead of 'enum pci_channel_state'
2020-08-05vdpa: Modify get_vq_state() to return error codeEli Cohen
Modify get_vq_state() so it returns an error code. In case of hardware acceleration, the available index may be retrieved from the device, an operation that can possibly fail. Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Link: https://lore.kernel.org/r/20200804162048.22587-9-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-08-05net/vdpa: Use struct for set/get vq stateEli Cohen
For now VQ state involves 16 bit available index value encoded in u64 variable. In the future it will be extended to contain more fields. Use struct to contain the state, now containing only a single u16 for the available index. In the future we can add fields to this struct. Reviewed-by: Parav Pandit <parav@mellanox.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Link: https://lore.kernel.org/r/20200804162048.22587-8-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05vdpa: remove hard coded virtq numMax Gurtovoy
This will enable vdpa providers to add support for multi queue feature and publish it to upper layers (vhost and virtio). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200804162048.22587-7-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05Merge branch 'mlx5-next' of ↵Michael S. Tsirkin
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into HEAD Merge shared code that's necessary for mlx5 vdpa bits to build. The branch itself includes a small number of patches and is also independently merged by a couple of other trees. That shouldn't cause any conflicts by itself. Saeed Mahameed <saeedm@mellanox.com> says: --- mlx5-next is a very small branch based on a very early rc that includes mlx5 shared stuff between rdma and net-next, and now virtio as well. --- Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05Merge tag 'for-linus-hmm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "Ralph has been working on nouveau's use of hmm_range_fault() and migrate_vma() which resulted in this small series. It adds reporting of the page table order from hmm_range_fault() and some optimization of migrate_vma(): - Report the size of the page table mapping out of hmm_range_fault(). This makes it easier to establish a large/huge/etc mapping in the device's page table. - Allow devices to ignore the invalidations during migration in cases where the migration is not going to change pages. For instance migrating pages to a device does not require the device to invalidate pages already in the device. - Update nouveau and hmm_tests to use the above" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: mm/hmm/test: use the new migration invalidation nouveau/svm: use the new migration invalidation mm/notifier: add migration invalidation type mm/migrate: add a flags parameter to migrate_vma nouveau: fix storing invalid ptes nouveau/hmm: support mapping large sysmem pages nouveau: fix mapping 2MB sysmem pages nouveau/hmm: fault one page at a time mm/hmm: add tests for hmm_pfn_to_map_order() mm/hmm: provide the page mapping order in hmm_range_fault()
2020-08-05Merge tag 'mmc-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds
Pull MMC updates from Ulf Hansson: "MMC core: - Add a new host cap bit and a corresponding DT property, to support power cycling of the card by FW at system suspend/resume. - Fix clock rate setting for SDIO in SDR12/SDR25 speed-mode - Fix switch to 1/4-bit mode at system suspend/resume for SD-combo cards - Convert the mmc-pwrseq DT bindings to the json-schema - Always allow the card detect uevent to be consumed by userspace MMC host controllers: - Convert a few DT bindings to the json-schema - mtk-sd: - Add support for command queue through cqhci - Add support for the MT6779 variant - renesas_sdhi_internal_dmac: - Fix dma unmapping in the error path - sdhci_am654: - Add support for the AM65x PG2.0 variant - Extend support for phys/clocks - sdhci-cadence: - Drop incorrect HW tuning for SD mode - sdhci-msm: - Add support for interconnect bandwidth scaling - Enable internal voltage control - Enable low power state for pinctrls - sdhci-of-at91: - Ludovic Desroches handovers maintenance to Eugen Hristev - sdhci-pci-gli: - Improve clock handling for GL975x - sdhci-pci-o2micro: - Add HW tuning for SDR104 mode - Fix support for O2 host controller Seabird1" * tag 'mmc-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (66 commits) mmc: mediatek: make function msdc_cqe_disable() static MAINTAINERS: mmc: sdhci-of-at91: handover maintenance to Eugen Hristev dt-bindings: mmc: mediatek: Add document for mt6779 mmc: mediatek: command queue support mmc: mediatek: refine msdc timeout api mmc: mediatek: add MT6779 MMC driver support mmc: sdhci-pci-o2micro: Add HW tuning for SDR104 mode mmc: sdhci-pci-o2micro: Bug fix for O2 host controller Seabird1 mmc: via-sdmmc: use generic power management memstick: jmb38x_ms: use generic power management mmc: sdhci-cadence: do not use hardware tuning for SD mode mmc: sdhci-pci-gli: Set SDR104's clock to 205MHz and enable SSC for GL975x mmc: cqhci: Fix a print format for the task descriptor mmc: sdhci-of-arasan: fix timings allocation code mmc: sdhci: Fix a potential uninitialized variable dt-bindings: mmc: renesas,sdhi: convert to YAML dt-bindings: mmc: convert arasan sdhci bindings to yaml mmc: sdhci: Fix potential null pointer access while accessing vqmmc mmc: core: Add MMC_CAP2_FULL_PWR_CYCLE_IN_SUSPEND dt-bindings: mmc: Add full-pwr-cycle-in-suspend property ...
2020-08-05Merge tag 'hwmon-for-v5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: "Highlights: - New driver for Sparx5 SoC temperature sensot - New driver for Corsair Commander Pro - MAX20710 support added to max20730 driver Enhancements: - max6697: Allow max6581 to create tempX_offset attributes - gsc (Gateworks System Controller): add 16bit pre-scaled voltage mode - adm1275: Enable adm1278 ADM1278_TEMP1_EN - dell-smm: Add Latitude 5480 to fan control whitelist Fixes: - adc128d818: Fix advanced configuration register init - pmbus/core: Use s64 instead of long for calculations to fix overflow issues with 32-bit architectures Plus various cleanups in several drivers" * tag 'hwmon-for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (32 commits) hwmon: (adc128d818) Fix advanced configuration register init hwmon: (axi-fan-control) remove duplicate macros hwmon: (i5k_amb, vt8231) Drop uses of pci_read_config_*() return value hwmon: (sparx5) Make symbol 's5_temp_match' static hwmon: (corsair-cpro) add reading pwm values hwmon: sparx5: Add Sparx5 SoC temperature driver dt-bindings: hwmon: Add Sparx5 temperature sensor hwmon: (tmp401) Replace HTTP links with HTTPS ones hwmon: (lm95234) Replace HTTP links with HTTPS ones hwmon: (lm90) Replace HTTP links with HTTPS ones hwmon: (k8temp) Replace HTTP links with HTTPS ones hwmon: (jc42) Replace HTTP links with HTTPS ones hwmon: (ina2xx) Replace HTTP links with HTTPS ones hwmon: (ina209) Replace HTTP links with HTTPS ones hwmon: Replace HTTP links with HTTPS ones docs: hwmon: Replace HTTP links with HTTPS ones hwmon: (adm1025) Replace HTTP links with HTTPS ones hwmon: add Corsair Commander Pro driver hwmon: (max6697) Allow max6581 to create tempX_offset hwmon: (tmmp513) Replace HTTP links with HTTPS links ...
2020-08-05Merge tag 'devicetree-for-5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull Devicetree updates from Rob Herring: - Improve device links cycle detection and breaking. Add more bindings for device link dependencies. - Refactor parsing 'no-map' in __reserved_mem_alloc_size() - Improve DT unittest 'ranges' and 'dma-ranges' test case to check differing cell sizes - Various http to https link conversions - Add a schema check to prevent 'syscon' from being used by itself without a more specific compatible - A bunch more DT binding conversions to schema * tag 'devicetree-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (55 commits) of: reserved-memory: remove duplicated call to of_get_flat_dt_prop() for no-map node of: unittest: Use bigger address cells to catch parser regressions dt-bindings: memory-controllers: Convert mmdc to json-schema dt-bindings: mtd: Convert imx nand to json-schema dt-bindings: mtd: Convert gpmi nand to json-schema dt-bindings: iio: io-channel-mux: Fix compatible string in example code of: property: Add device link support for pinctrl-0 through pinctrl-8 of: property: Add device link support for multiple DT bindings dt-bindings: phy: ti: phy-gmii-sel: convert bindings to json-schema dt-bindings: mux: mux.h: drop a duplicated word dt-bindings: misc: Convert olpc,xo1.75-ec to json-schema dt-bindings: aspeed-lpc: Replace HTTP links with HTTPS ones dt-bindings: drm/bridge: Replace HTTP links with HTTPS ones drm/tilcdc: Replace HTTP links with HTTPS ones dt-bindings: iommu: renesas,ipmmu-vmsa: Add r8a774e1 support dt-bindings: fpga: Replace HTTP links with HTTPS ones dt-bindings: virtio: Replace HTTP links with HTTPS ones dt-bindings: media: imx274: Add optional input clock and supplies dt-bindings: i2c-gpio: Use 'deprecated' keyword on deprecated properties dt-bindings: interrupt-controller: Fix typos in loongson,liointc.yaml ...
2020-08-05Merge tag 'gpio-v5.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO updates from Linus Walleij: "This is the bulk of GPIO changes for the v5.9 kernel cycle. There is nothing too exciting in it, but a new macro that fixes a build failure on a minor ARM32 platform that appeared yesterday is part of it so we better merge it. Core changes: - Introduce the for_each_requested_gpio() macro to help in dependent code all over the place. Also patch a few locations to use it while we are at it. - Split out the sysfs code into its own file. - Split out the character device code into its own file, then make a set of refactorings and improvements to this code. We are setting the stage to revamp the userspace API a bit in the next cycle. - Fix a whole slew of kerneldoc that was wrong or missing. New drivers: - The PCA953x driver now supports the PCAL9535. Driver improvements: - A host of incremental modernizations and improvements to the PCA953x driver. - Incremental improvements to the Xilinx Zynq driver. - Some improvements to the GPIO aggregator driver. - I ran all over the place switching all threaded and other drivers requesting their own IRQ while using the core GPIO IRQ helpers to pass the GPIO irq chip as a template instead of calling the explicit set-up functions. Next merge window we may retire the old code altogether" * tag 'gpio-v5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (97 commits) gpio: wcove: Request IRQ after all initialisation done gpio: crystalcove: Free IRQ on error path gpio: pca953x: Request IRQ after all initialisation done gpio: don't use same lockdep class for all devm_gpiochip_add_data users gpio: max732x: Use irqchip template gpio: stmpe: Move chip registration gpio: rcar: Use irqchip template gpio: regmap: fix type clash gpio: Correct kernel-doc inconsistency gpio: pci-idio-16: Use irqchip template gpio: pcie-idio-24: Use irqchip template gpio: 104-idio-16: Use irqchip template gpio: 104-idi-48: Use irqchip template gpio: 104-dio-48e: Use irqchip template gpio: ws16c48: Use irqchip template gpio: omap: improve coding style for pin config flags gpio: dln2: Use irqchip template gpio: sch: Add a blank line between declaration and code gpio: sch: changed every 'unsigned' to 'unsigned int' gpio: ich: changed every 'unsigned' to 'unsigned int' ...
2020-08-05Merge tag 'usb-5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/Thunderbolt updates from Greg KH: "Here is the large set of USB and Thunderbolt patches for 5.9-rc1. Nothing really magic/major in here, just lots of little changes and updates: - clean up language usages in USB core and some drivers - Thunderbolt driver updates and additions - USB Gadget driver updates - dwc3 driver updates (like always...) - build with "W=1" warning fixups - mtu3 driver updates - usb-serial driver updates and device ids - typec additions and updates for new hardware - xhci debug code updates for future platforms - cdns3 driver updates - lots of other minor driver updates and fixes and cleanups All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (330 commits) usb: common: usb-conn-gpio: Register charger usb: mtu3: simplify mtu3_req_complete() usb: mtu3: clear dual mode of u3port when disable device usb: mtu3: use MTU3_EP_WEDGE flag usb: mtu3: remove useless member @busy in mtu3_ep struct usb: mtu3: remove repeated error log usb: mtu3: add ->udc_set_speed() usb: mtu3: introduce a funtion to check maximum speed usb: mtu3: clear interrupts status when disable interrupts usb: mtu3: reinitialize CSR registers usb: mtu3: fix macro for maximum number of packets usb: mtu3: remove unnecessary pointer checks usb: xhci: Fix ASMedia ASM1142 DMA addressing usb: xhci: define IDs for various ASMedia host controllers usb: musb: convert to devm_platform_ioremap_resource_byname usb: gadget: tegra-xudc: convert to devm_platform_ioremap_resource_byname usb: gadget: r8a66597: convert to devm_platform_ioremap_resource_byname usb: dwc3: convert to devm_platform_ioremap_resource_byname usb: cdns3: convert to devm_platform_ioremap_resource_byname usb: phy: am335x: convert to devm_platform_ioremap_resource_byname ...
2020-08-05Merge tag 'driver-core-5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of changes to the driver core, and some drivers using the changes, for 5.9-rc1. "Biggest" thing in here is the device link exposure in sysfs, to help to tame the madness that is SoC device tree representations and driver interactions with it. Other stuff in here that is interesting is: - device probe log helper so that drivers can report problems in a unified way easier. - devres functions added - DEVICE_ATTR_ADMIN_* macro added to make it harder to write incorrect sysfs file permissions - documentation cleanups - ability for debugfs to be present in the kernel, yet not exposed to userspace. Needed for systems that want it enabled, but do not trust users, so they can still use some kernel functions that were otherwise disabled. - other minor fixes and cleanups The patches outside of drivers/base/ all have acks from the respective subsystem maintainers to go through this tree instead of theirs. All of these have been in linux-next with no reported issues" * tag 'driver-core-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (39 commits) drm/bridge: lvds-codec: simplify error handling drm/bridge/sii8620: fix resource acquisition error handling driver core: add deferring probe reason to devices_deferred property driver core: add device probe log helper driver core: Avoid binding drivers to dead devices Revert "test_firmware: Test platform fw loading on non-EFI systems" firmware_loader: EFI firmware loader must handle pre-allocated buffer selftest/firmware: Add selftest timeout in settings test_firmware: Test platform fw loading on non-EFI systems driver core: Change delimiter in devlink device's name to "--" debugfs: Add access restriction option tracefs: Remove unnecessary debug_fs checks. driver core: Fix probe_count imbalance in really_probe() kobject: remove unused KOBJ_MAX action driver core: Fix sleeping in invalid context during device link deletion driver core: Add waiting_for_supplier sysfs file for devices driver core: Add state_synced sysfs file for devices that support it driver core: Expose device link details in sysfs driver core: Drop mention of obsolete bus rwsem from kernel-doc debugfs: file: Remove unnecessary cast in kfree() ...
2020-08-05Merge tag 'char-misc-5.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the large set of char and misc and other driver subsystem patches for 5.9-rc1. Lots of new driver submissions in here, and cleanups and features for existing drivers. Highlights are: - habanalabs driver updates - coresight driver updates - nvmem driver updates - huge number of "W=1" build warning cleanups from Lee Jones - dyndbg updates - virtbox driver fixes and updates - soundwire driver updates - mei driver updates - phy driver updates - fpga driver updates - lots of smaller individual misc/char driver cleanups and fixes Full details are in the shortlog. All of these have been in linux-next with no reported issues" * tag 'char-misc-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (322 commits) habanalabs: remove unused but set variable 'ctx_asid' nvmem: qcom-spmi-sdam: Enable multiple devices dt-bindings: nvmem: SID: add binding for A100's SID controller nvmem: update Kconfig description nvmem: qfprom: Add fuse blowing support dt-bindings: nvmem: Add properties needed for blowing fuses dt-bindings: nvmem: qfprom: Convert to yaml nvmem: qfprom: use NVMEM_DEVID_AUTO for multiple instances nvmem: core: add support to auto devid nvmem: core: Add nvmem_cell_read_u8() nvmem: core: Grammar fixes for help text nvmem: sc27xx: add sc2730 efuse support nvmem: Enforce nvmem stride in the sysfs interface MAINTAINERS: Add git tree for NVMEM FRAMEWORK nvmem: sprd: Fix return value of sprd_efuse_probe() drivers: android: Fix the SPDX comment style drivers: android: Fix a variable declaration coding style issue drivers: android: Remove braces for a single statement if-else block drivers: android: Remove the use of else after return drivers: android: Fix a variable declaration coding style issue ...
2020-08-05Merge tag 'for-5.9/block-merge-20200804' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block stacking updates from Jens Axboe: "The stacking related fixes depended on both the core block and drivers branches, so here's a topic branch with that change. Outside of that, a late fix from Johannes for zone revalidation" * tag 'for-5.9/block-merge-20200804' of git://git.kernel.dk/linux-block: block: don't do revalidate zones on invalid devices block: remove blk_queue_stack_limits block: remove bdev_stack_limits block: inherit the zoned characteristics in blk_stack_limits
2020-08-05Merge tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: - NVMe: - ZNS support (Aravind, Keith, Matias, Niklas) - Misc cleanups, optimizations, fixes (Baolin, Chaitanya, David, Dongli, Max, Sagi) - null_blk zone capacity support (Aravind) - MD: - raid5/6 fixes (ChangSyun) - Warning fixes (Damien) - raid5 stripe fixes (Guoqing, Song, Yufen) - sysfs deadlock fix (Junxiao) - raid10 deadlock fix (Vitaly) - struct_size conversions (Gustavo) - Set of bcache updates/fixes (Coly) * tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-block: (117 commits) md/raid5: Allow degraded raid6 to do rmw md/raid5: Fix Force reconstruct-write io stuck in degraded raid5 raid5: don't duplicate code for different paths in handle_stripe raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_show md: print errno in super_written md/raid5: remove the redundant setting of STRIPE_HANDLE md: register new md sysfs file 'uuid' read-only md: fix max sectors calculation for super 1.0 nvme-loop: remove extra variable in create ctrl nvme-loop: set ctrl state connecting after init nvme-multipath: do not fall back to __nvme_find_path() for non-optimized paths nvme-multipath: fix logic for non-optimized paths nvme-rdma: fix controller reset hang during traffic nvme-tcp: fix controller reset hang during traffic nvmet: introduce the passthru Kconfig option nvmet: introduce the passthru configfs interface nvmet: Add passthru enable/disable helpers nvmet: add passthru code to process commands nvme: export nvme_find_get_ns() and nvme_put_ns() nvme: introduce nvme_ctrl_get_by_path() ...
2020-08-05watchdog: add support for adjusting last known HW keepalive timeTero Kristo
Certain watchdogs require the watchdog only to be pinged within a specific time window, pinging too early or too late cause the watchdog to fire. In cases where this sort of watchdog has been started before kernel comes up, we must adjust the watchdog keepalive window to match the actually running timer, so add a new driver API for this purpose. Signed-off-by: Tero Kristo <t-kristo@ti.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20200717132958.14304-3-t-kristo@ti.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2020-08-05platform_data/mlxreg: support new watchdog type with longer timeout periodMichael Shych
Add new watchdog type 3 with longer timeout period. Extend size of health_cntr field that that can be used to init watchdog timeout period. Signed-off-by: Michael Shych <michaelsh@mellanox.com> Reviewed-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20200504141427.17685-2-michaelsh@mellanox.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2020-08-05vDPA: dont change vq irq after DRIVER_OKZhu Lingshan
IRQ of a vq is not expected to be changed in a DRIVER_OK ~ !DRIVER_OK period for irq offloading purposes. Place this comment at the side of bus ops get_vq_irq than in set_status in vhost_vdpa. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Link: https://lore.kernel.org/r/20200804102123.69978-1-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05vDPA: add get_vq_irq() in vdpa_config_opsZhu Lingshan
This commit adds a new function get_vq_irq() in struct vdpa_config_ops, which will return the irq number of a virtqueue. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Suggested-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200731065533.4144-4-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05virtio_config: drop LE option from config spaceMichael S. Tsirkin
All drivers now use virtio_cread/write_le for LE config space fields. Drop LE option from virtio_cread/write, only leaving the option to access transitional fields. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05virtio_config: add virtio_cread_le_featureMichael S. Tsirkin
Mirrors virtio_cread_feature but for LE fields. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05virtio_caif: correct tags for config space fieldsMichael S. Tsirkin
Tag config space fields as having virtio endian-ness. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05virtio_config: LE config space accessorsMichael S. Tsirkin
To be used by modern code, as well as to handle LE only fields such as balloon. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05virtio_config: disallow native type fields (again)Michael S. Tsirkin
_Generic version allowed __uXX types but that is no longer necessary: Transitional devices should all use __virtioXX types (and __leXX for fields not present in the legacy devices). Modern ones should use __leXX. _uXX type would be a bug. Let's prevent that. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05virtio_config: rewrite using _GenericMichael S. Tsirkin
Min compiler version has been raised, so that's ok now. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05virtio_config: cread/write cleanupMichael S. Tsirkin
Use vars of the correct type instead of casting. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05vdpa: make sure set_features is invoked for legacyMichael S. Tsirkin
Some legacy guests just assume features are 0 after reset. We detect that config space is accessed before features are set and set features to 0 automatically. Note: some legacy guests might not even access config space, if this is reported in the field we might need to catch a kick to handle these. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05virtio_config: disallow native type fieldsMichael S. Tsirkin
Transitional devices should all use __virtioXX types (and __leXX for fields not present in legacy devices). Modern ones should use __leXX. _uXX type would be a bug. Let's prevent that. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05virtio: allow __virtioXX, __leXX in config spaceMichael S. Tsirkin
Currently all config space fields are of the type __uXX. This confuses people and some drivers (notably vdpa) access them using CPU endian-ness - which only works well for legacy or LE platforms. Update virtio_cread/virtio_cwrite macros to allow __virtioXX and __leXX field types. Follow-up patches will convert config space to use these types. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com>
2020-08-05virtio_ring: sparse warning fixupMichael S. Tsirkin
virtio_store_mb was built with split ring in mind so it accepts __virtio16 arguments. Packed ring uses __le16 values, so sparse complains. It's just a store with some barriers so let's convert it to a macro, we don't loose too much type safety by doing that. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com>
2020-08-05i2c: core: add generic I2C GPIO recoveryCodrin Ciubotariu
Multiple I2C bus drivers use similar bindings to obtain information needed for I2C recovery. For example, for platforms using device-tree, the properties look something like this: &i2c { ... pinctrl-names = "default", "gpio"; pinctrl-0 = <&pinctrl_i2c_default>; pinctrl-1 = <&pinctrl_i2c_gpio>; sda-gpios = <&pio 0 GPIO_ACTIVE_HIGH>; scl-gpios = <&pio 1 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>; ... } For this reason, we can add this common initialization in the core. This way, other I2C bus drivers will be able to support GPIO recovery just by providing a pointer to platform's pinctrl and calling i2c_recover_bus() when SDA is stuck low. Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> [wsa: inverted one logic for better readability, minor update to kdoc] Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-08-05modules: inherit TAINT_PROPRIETARY_MODULEChristoph Hellwig
If a TAINT_PROPRIETARY_MODULE exports symbol, inherit the taint flag for all modules importing these symbols, and don't allow loading symbols from TAINT_PROPRIETARY_MODULE modules if the module previously imported gplonly symbols. Add a anti-circumvention devices so people don't accidentally get themselves into trouble this way. Comment from Greg: "Ah, the proven-to-be-illegal "GPL Condom" defense :)" [jeyu: pr_info -> pr_err and pr_warn as per discussion] Link: http://lore.kernel.org/r/20200730162957.GA22469@lst.de Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2020-08-04Merge tag 'docs-5.9' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include: - Some new Chinese translations - Progress on the battle against double words words and non-HTTPS URLs - Some block-mq documentation - More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:) - Lots of typo fixes, warning fixes, and more" * tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits) scripts/kernel-doc: optionally treat warnings as errors docs: ia64: correct typo mailmap: add entry for <alobakin@marvell.com> doc/zh_CN: add cpu-load Chinese version Documentation/admin-guide: tainted-kernels: fix spelling mistake MAINTAINERS: adjust kprobes.rst entry to new location devices.txt: document rfkill allocation PCI: correct flag name docs: filesystems: vfs: correct flag name docs: filesystems: vfs: correct sync_mode flag names docs: path-lookup: markup fixes for emphasis docs: path-lookup: more markup fixes docs: path-lookup: fix HTML entity mojibake CREDITS: Replace HTTP links with HTTPS ones docs: process: Add an example for creating a fixes tag doc/zh_CN: add Chinese translation prefer section doc/zh_CN: add clearing-warn-once Chinese version doc/zh_CN: add admin-guide index doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label futex: MAINTAINERS: Re-add selftests directory ...
2020-08-04Merge tag 'printk-for-5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Herbert Xu made printk header file self-contained. - Andy Shevchenko and Sergey Senozhatsky cleaned up console->setup() error handling. - Andy Shevchenko did some cleanups (e.g. sparse warning) in vsprintf code. - Minor documentation updates. * tag 'printk-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: lib/vsprintf: Force type of flags value for gfp_t lib/vsprintf: Replace custom spec to print decimals with generic one lib/vsprintf: Replace hidden BUILD_BUG_ON() with static_assert() printk: Make linux/printk.h self-contained doc:kmsg: explicitly state the return value in case of SEEK_CUR Replace HTTP links with HTTPS ones: vsprintf hvc: unify console setup naming console: Fix trivia typo 'change' -> 'chance' console: Propagate error code from console ->setup() tty: hvc: Return proper error code from console ->setup() hook serial: sunzilog: Return proper error code from console ->setup() hook serial: sunsab: Return proper error code from console ->setup() hook mips: Return proper error code from console ->setup() hook
2020-08-04Merge tag 'core-entry-2020-08-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull generic kernel entry/exit code from Thomas Gleixner: "Generic implementation of common syscall, interrupt and exception entry/exit functionality based on the recent X86 effort to ensure correctness of entry/exit vs RCU and instrumentation. As this functionality and the required entry/exit sequences are not architecture specific, sharing them allows other architectures to benefit instead of copying the same code over and over again. This branch was kept standalone to allow others to work on it. The conversion of x86 comes in a seperate pull request which obviously is based on this branch" * tag 'core-entry-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Correct __secure_computing() stub entry: Correct 'noinstr' attributes entry: Provide infrastructure for work before transitioning to guest mode entry: Provide generic interrupt entry/exit code entry: Provide generic syscall exit function entry: Provide generic syscall entry functionality seccomp: Provide stub for __secure_computing()
2020-08-04SUNRPC dont update timeout value on connection resetOlga Kornievskaia
Current behaviour: every time a v3 operation is re-sent to the server we update (double) the timeout. There is no distinction between whether or not the previous timer had expired before the re-sent happened. Here's the scenario: 1. Client sends a v3 operation 2. Server RST-s the connection (prior to the timeout) (eg., connection is immediately reset) 3. Client re-sends a v3 operation but the timeout is now 120sec. As a result, an application sees 2mins pause before a retry in case server again does not reply. Instead, this patch proposes to keep track off when the minor timeout should happen and if it didn't, then don't update the new timeout. Value is updated based on the previous value to make timeouts predictable. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-04remoteproc: Add remoteproc character device interfaceSiddharth Gupta
Add the character device interface into remoteproc framework. This interface can be used in order to boot/shutdown remote subsystems and provides a basic ioctl based interface to implement supplementary functionality. An ioctl call is implemented to enable the shutdown on release feature which will allow remote processors to be shutdown when the controlling userspace application crashes or hangs. Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Rishabh Bhatnagar <rishabhb@codeaurora.org> Signed-off-by: Siddharth Gupta <sidgup@codeaurora.org> Link: https://lore.kernel.org/r/1596044401-22083-2-git-send-email-sidgup@codeaurora.org [bjorn: s/int32_t/s32/ per checkpatch] Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-08-04Merge tag 'irq-core-2020-08-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The usual boring updates from the interrupt subsystem: - Infrastructure to allow building irqchip drivers as modules - Consolidation of irqchip ACPI probing - Removal of the EOI-preflow interrupt handler which was required for SPARC support and became obsolete after SPARC was converted to use sparse interrupts. - Cleanups, fixes and improvements all over the place" * tag 'irq-core-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) irqchip/loongson-pch-pic: Fix the misused irq flow handler irqchip/loongson-htvec: Support 8 groups of HT vectors irqchip/loongson-liointc: Fix misuse of gc->mask_cache dt-bindings: interrupt-controller: Update Loongson HTVEC description irqchip/imx-intmux: Fix irqdata regs save in imx_intmux_runtime_suspend() irqchip/imx-intmux: Implement intmux runtime power management irqchip/gic-v4.1: Use GFP_ATOMIC flag in allocate_vpe_l1_table() irqchip: Fix IRQCHIP_PLATFORM_DRIVER_* compilation by including module.h irqchip/stm32-exti: Map direct event to irq parent irqchip/mtk-cirq: Convert to a platform driver irqchip/mtk-sysirq: Convert to a platform driver irqchip/qcom-pdc: Switch to using IRQCHIP_PLATFORM_DRIVER helper macros irqchip: Add IRQCHIP_PLATFORM_DRIVER_BEGIN/END and IRQCHIP_MATCH helper macros irqchip: irq-bcm2836.h: drop a duplicated word irqchip/gic-v4.1: Ensure accessing the correct RD when writing INVALLR irqchip/irq-bcm7038-l1: Guard uses of cpu_logical_map irqchip/gic-v3: Remove unused register definition irqchip/qcom-pdc: Allow QCOM_PDC to be loadable as a permanent module genirq: Export irq_chip_retrigger_hierarchy and irq_chip_set_vcpu_affinity_parent irqdomain: Export irq_domain_update_bus_token ...
2020-08-04init: add an init_dup helperChristoph Hellwig
Add a simple helper to grab a reference to a file and install it at the next available fd, and switch the early init code over to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-04Merge tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping updates from Christoph Hellwig: - make support for dma_ops optional - move more code out of line - add generic support for a dma_ops bypass mode - misc cleanups * tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping: dma-contiguous: cleanup dma_alloc_contiguous dma-debug: use named initializers for dir2name powerpc: use the generic dma_ops_bypass mode dma-mapping: add a dma_ops_bypass flag to struct device dma-mapping: make support for dma ops optional dma-mapping: inline the fast path dma-direct calls dma-mapping: move the remaining DMA API calls out of line
2020-08-04Merge tag 'uuid-for-5.9' of git://git.infradead.org/users/hch/uuidLinus Torvalds
Pull uuid update from Christoph Hellwig: "Remove a now unused helper (Andy Shevchenko)" * tag 'uuid-for-5.9' of git://git.infradead.org/users/hch/uuid: uuid: remove unused uuid_le_to_bin() definition
2020-08-04Merge tag 'close-range-v5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull close_range() implementation from Christian Brauner: "This adds the close_range() syscall. It allows to efficiently close a range of file descriptors up to all file descriptors of a calling task. This is coordinated with the FreeBSD folks which have copied our version of this syscall and in the meantime have already merged it in April 2019: https://reviews.freebsd.org/D21627 https://svnweb.freebsd.org/base?view=revision&revision=359836 The syscall originally came up in a discussion around the new mount API and making new file descriptor types cloexec by default. During this discussion, Al suggested the close_range() syscall. First, it helps to close all file descriptors of an exec()ing task. This can be done safely via (quoting Al's example from [1] verbatim): /* that exec is sensitive */ unshare(CLONE_FILES); /* we don't want anything past stderr here */ close_range(3, ~0U); execve(....); The code snippet above is one way of working around the problem that file descriptors are not cloexec by default. This is aggravated by the fact that we can't just switch them over without massively regressing userspace. For a whole class of programs having an in-kernel method of closing all file descriptors is very helpful (e.g. demons, service managers, programming language standard libraries, container managers etc.). Second, it allows userspace to avoid implementing closing all file descriptors by parsing through /proc/<pid>/fd/* and calling close() on each file descriptor and other hacks. From looking at various large(ish) userspace code bases this or similar patterns are very common in service managers, container runtimes, and programming language runtimes/standard libraries such as Python or Rust. In addition, the syscall will also work for tasks that do not have procfs mounted and on kernels that do not have procfs support compiled in. In such situations the only way to make sure that all file descriptors are closed is to call close() on each file descriptor up to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery. Based on Linus' suggestion close_range() also comes with a new flag CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping right before exec. This would usually be expressed in the sequence: unshare(CLONE_FILES); close_range(3, ~0U); as pointed out by Linus it might be desirable to have this be a part of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which gets especially handy when we're closing all file descriptors above a certain threshold. Test-suite as always included" * tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: add CLOSE_RANGE_UNSHARE tests close_range: add CLOSE_RANGE_UNSHARE tests: add close_range() tests arch: wire-up close_range() open: add close_range()
2020-08-04Merge tag 'cap-checkpoint-restore-v5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull checkpoint-restore updates from Christian Brauner: "This enables unprivileged checkpoint/restore of processes. Given that this work has been going on for quite some time the first sentence in this summary is hopefully more exciting than the actual final code changes required. Unprivileged checkpoint/restore has seen a frequent increase in interest over the last two years and has thus been one of the main topics for the combined containers & checkpoint/restore microconference since at least 2018 (cf. [1]). Here are just the three most frequent use-cases that were brought forward: - The JVM developers are integrating checkpoint/restore into a Java VM to significantly decrease the startup time. - In high-performance computing environment a resource manager will typically be distributing jobs where users are always running as non-root. Long-running and "large" processes with significant startup times are supposed to be checkpointed and restored with CRIU. - Container migration as a non-root user. In all of these scenarios it is either desirable or required to run without CAP_SYS_ADMIN. The userspace implementation of checkpoint/restore CRIU already has the pull request for supporting unprivileged checkpoint/restore up (cf. [2]). To enable unprivileged checkpoint/restore a new dedicated capability CAP_CHECKPOINT_RESTORE is introduced. This solution has last been discussed in 2019 in a talk by Google at Linux Plumbers (cf. [1] "Update on Task Migration at Google Using CRIU") with Adrian and Nicolas providing the implementation now over the last months. In essence, this allows the CRIU binary to be installed with the CAP_CHECKPOINT_RESTORE vfs capability set thereby enabling unprivileged users to restore processes. To make this possible the following permissions are altered: - Selecting a specific PID via clone3() set_tid relaxed from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE. - Selecting a specific PID via /proc/sys/kernel/ns_last_pid relaxed from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE. - Accessing /proc/pid/map_files relaxed from init userns CAP_SYS_ADMIN to init userns CAP_CHECKPOINT_RESTORE. - Changing /proc/self/exe from userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE. Of these four changes the /proc/self/exe change deserves a few words because the reasoning behind even restricting /proc/self/exe changes in the first place is just full of historical quirks and tracking this down was a questionable version of fun that I'd like to spare others. In short, it is trivial to change /proc/self/exe as an unprivileged user, i.e. without userns CAP_SYS_ADMIN right now. Either via ptrace() or by simply intercepting the elf loader in userspace during exec. Nicolas was nice enough to even provide a POC for the latter (cf. [3]) to illustrate this fact. The original patchset which introduced PR_SET_MM_MAP had no permissions around changing the exe link. They too argued that it is trivial to spoof the exe link already which is true. The argument brought up against this was that the Tomoyo LSM uses the exe link in tomoyo_manager() to detect whether the calling process is a policy manager. This caused changing the exe links to be guarded by userns CAP_SYS_ADMIN. All in all this rather seems like a "better guard it with something rather than nothing" argument which imho doesn't qualify as a great security policy. Again, because spoofing the exe link is possible for the calling process so even if this were security relevant it was broken back then and would be broken today. So technically, dropping all permissions around changing the exe link would probably be possible and would send a clearer message to any userspace that relies on /proc/self/exe for security reasons that they should stop doing this but for now we're only relaxing the exe link permissions from userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE. There's a final uapi change in here. Changing the exe link used to accidently return EINVAL when the caller lacked the necessary permissions instead of the more correct EPERM. This pr contains a commit fixing this. I assume that userspace won't notice or care and if they do I will revert this commit. But since we are changing the permissions anyway it seems like a good opportunity to try this fix. With these changes merged unprivileged checkpoint/restore will be possible and has already been tested by various users" [1] LPC 2018 1. "Task Migration at Google Using CRIU" https://www.youtube.com/watch?v=yI_1cuhoDgA&t=12095 2. "Securely Migrating Untrusted Workloads with CRIU" https://www.youtube.com/watch?v=yI_1cuhoDgA&t=14400 LPC 2019 1. "CRIU and the PID dance" https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=2m48s 2. "Update on Task Migration at Google Using CRIU" https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=1h2m8s [2] https://github.com/checkpoint-restore/criu/pull/1155 [3] https://github.com/nviennot/run_as_exe * tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: selftests: add clone3() CAP_CHECKPOINT_RESTORE test prctl: exe link permission error changed from -EINVAL to -EPERM prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid pid: use checkpoint_restore_ns_capable() for set_tid capabilities: Introduce CAP_CHECKPOINT_RESTORE
2020-08-04Merge tag 'fork-v5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull fork cleanups from Christian Brauner: "This is cleanup series from when we reworked a chunk of the process creation paths in the kernel and switched to struct {kernel_}clone_args. High-level this does two main things: - Remove the double export of both do_fork() and _do_fork() where do_fork() used the incosistent legacy clone calling convention. Now we only export _do_fork() which is based on struct kernel_clone_args. - Remove the copy_thread_tls()/copy_thread() split making the architecture specific HAVE_COYP_THREAD_TLS config option obsolete. This switches all remaining architectures to select HAVE_COPY_THREAD_TLS and thus to the copy_thread_tls() calling convention. The current split makes the process creation codepaths more convoluted than they need to be. Each architecture has their own copy_thread() function unless it selects HAVE_COPY_THREAD_TLS then it has a copy_thread_tls() function. The split is not needed anymore nowadays, all architectures support CLONE_SETTLS but quite a few of them never bothered to select HAVE_COPY_THREAD_TLS and instead simply continued to use copy_thread() and use the old calling convention. Removing this split cleans up the process creation codepaths and paves the way for implementing clone3() on such architectures since it requires the copy_thread_tls() calling convention. After having made each architectures support copy_thread_tls() this series simply renames that function back to copy_thread(). It also switches all architectures that call do_fork() directly over to _do_fork() and the struct kernel_clone_args calling convention. This is a corollary of switching the architectures that did not yet support it over to copy_thread_tls() since do_fork() is conditional on not supporting copy_thread_tls() (Mostly because it lacks a separate argument for tls which is trivial to fix but there's no need for this function to exist.). The do_fork() removal is in itself already useful as it allows to to remove the export of both do_fork() and _do_fork() we currently have in favor of only _do_fork(). This has already been discussed back when we added clone3(). The legacy clone() calling convention is - as is probably well-known - somewhat odd: # # ABI hall of shame # config CLONE_BACKWARDS config CLONE_BACKWARDS2 config CLONE_BACKWARDS3 that is aggravated by the fact that some architectures such as sparc follow the CLONE_BACKWARDSx calling convention but don't really select the corresponding config option since they call do_fork() directly. So do_fork() enforces a somewhat arbitrary calling convention in the first place that doesn't really help the individual architectures that deviate from it. They can thus simply be switched to _do_fork() enforcing a single calling convention. (I really hope that any new architectures will __not__ try to implement their own calling conventions...) Most architectures already have made a similar switch (m68k comes to mind). Overall this removes more code than it adds even with a good portion of added comments. It simplifies a chunk of arch specific assembly either by moving the code into C or by simply rewriting the assembly. Architectures that have been touched in non-trivial ways have all been actually boot and stress tested: sparc and ia64 have been tested with Debian 9 images. They are the two architectures which have been touched the most. All non-trivial changes to architectures have seen acks from the relevant maintainers. nios2 with a custom built buildroot image. h8300 I couldn't get something bootable to test on but the changes have been fairly automatic and I'm sure we'll hear people yell if I broke something there. All other architectures that have been touched in trivial ways have been compile tested for each single patch of the series via git rebase -x "make ..." v5.8-rc2. arm{64} and x86{_64} have been boot tested even though they have just been trivially touched (removal of the HAVE_COPY_THREAD_TLS macro from their Kconfig) because well they are basically "core architectures" and since it is trivial to get your hands on a useable image" * tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: arch: rename copy_thread_tls() back to copy_thread() arch: remove HAVE_COPY_THREAD_TLS unicore: switch to copy_thread_tls() sh: switch to copy_thread_tls() nds32: switch to copy_thread_tls() microblaze: switch to copy_thread_tls() hexagon: switch to copy_thread_tls() c6x: switch to copy_thread_tls() alpha: switch to copy_thread_tls() fork: remove do_fork() h8300: select HAVE_COPY_THREAD_TLS, switch to kernel_clone_args nios2: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args ia64: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args sparc: unconditionally enable HAVE_COPY_THREAD_TLS sparc: share process creation helpers between sparc and sparc64 sparc64: enable HAVE_COPY_THREAD_TLS fork: fold legacy_clone_args_valid() into _do_fork()
2020-08-04Merge tag 'threads-v5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread updates from Christian Brauner: "This contains the changes to add the missing support for attaching to time namespaces via pidfds. Last cycle setns() was changed to support attaching to multiple namespaces atomically. This requires all namespaces to have a point of no return where they can't fail anymore. Specifically, <namespace-type>_install() is allowed to perform permission checks and install the namespace into the new struct nsset that it has been given but it is not allowed to make visible changes to the affected task. Once <namespace-type>_install() returns, anything that the given namespace type additionally requires to be setup needs to ideally be done in a function that can't fail or if it fails the failure must be non-fatal. For time namespaces the relevant functions that fell into this category were timens_set_vvar_page() and vdso_join_timens(). The latter could still fail although it didn't need to. This function is only implemented for vdso_join_timens() in current mainline. As discussed on-list (cf. [1]), in order to make setns() support time namespaces when attaching to multiple namespaces at once properly we changed vdso_join_timens() to always succeed. So vdso_join_timens() replaces the mmap_write_lock_killable() with mmap_read_lock(). Please note that arm is about to grow vdso support for time namespaces (possibly this merge window). We've synced on this change and arm64 also uses mmap_read_lock(), i.e. makes vdso_join_timens() a function that can't fail. Once the changes here and the arm64 changes have landed, vdso_join_timens() should be turned into a void function so it's obvious to callers and implementers on other architectures that the expectation is that it can't fail. We didn't do this right away because it would've introduced unnecessary merge conflicts between the two trees for no major gain. As always, tests included" [1]: https://lore.kernel.org/lkml/20200611110221.pgd3r5qkjrjmfqa2@wittgenstein * tag 'threads-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: tests: add CLONE_NEWTIME setns tests nsproxy: support CLONE_NEWTIME with setns() timens: add timens_commit() helper timens: make vdso_join_timens() always succeed
2020-08-04Merge branch 'exec-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve updates from Eric Biederman: "During the development of v5.7 I ran into bugs and quality of implementation issues related to exec that could not be easily fixed because of the way exec is implemented. So I have been diggin into exec and cleaning up what I can. This cycle I have been looking at different ideas and different implementations to see what is possible to improve exec, and cleaning the way exec interfaces with in kernel users. Only cleaning up the interfaces of exec with rest of the kernel has managed to stabalize and make it through review in time for v5.9-rc1 resulting in 2 sets of changes this cycle. - Implement kernel_execve - Make the user mode driver code a better citizen With kernel_execve the code size got a little larger as the copying of parameters from userspace and copying of parameters from userspace is now separate. The good news is kernel threads no longer need to play games with set_fs to use exec. Which when combined with the rest of Christophs set_fs changes should security bugs with set_fs much more difficult" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits) exec: Implement kernel_execve exec: Factor bprm_stack_limits out of prepare_arg_pages exec: Factor bprm_execve out of do_execve_common exec: Move bprm_mm_init into alloc_bprm exec: Move initialization of bprm->filename into alloc_bprm exec: Factor out alloc_bprm exec: Remove unnecessary spaces from binfmts.h umd: Stop using split_argv umd: Remove exit_umh bpfilter: Take advantage of the facilities of struct pid exit: Factor thread_group_exited out of pidfd_poll umd: Track user space drivers with struct pid bpfilter: Move bpfilter_umh back into init data exec: Remove do_execve_file umh: Stop calling do_execve_file umd: Transform fork_usermode_blob into fork_usermode_driver umd: Rename umd_info.cmdline umd_info.driver_name umd: For clarity rename umh_info umd_info umh: Separate the user mode driver and the user mode helper support umh: Remove call_usermodehelper_setup_file. ...
2020-08-04Merge tag 'audit-pr-20200803' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Aside from some smaller bug fixes, here are the highlights: - add a new backlog wait metric to the audit status message, this is intended to help admins determine how long processes have been waiting for the audit backlog queue to clear - generate audit records for nftables configuration changes - generate CWD audit records for for the relevant LSM audit records" * tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: report audit wait metric in audit status reply audit: purge audit_log_string from the intra-kernel audit API audit: issue CWD record to accompany LSM_AUDIT_DATA_* records audit: use the proper gfp flags in the audit_log_nfcfg() calls audit: remove unused !CONFIG_AUDITSYSCALL __audit_inode* stubs audit: add gfp parameter to audit_log_nfcfg audit: log nftables configuration change events audit: Use struct_size() helper in alloc_chunk