summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-07-19Merge branch 'pci/reset'Bjorn Helgaas
- Warn about doing a Secondary Bus Reset without holding the device lock (Dan Williams) - Lock bridge in addition to downstream hierarchy before doing a Secondary Bus Reset (Dan Williams) * pci/reset: PCI: Add missing bridge lock to pci_bus_lock() PCI: Warn on missing cfg_access_lock during secondary bus reset
2024-07-19Merge branch 'pci/hotplug'Bjorn Helgaas
- Detect if a device was removed or replaced during system sleep so we don't assume a new device is the one that used to be there. This uses Vendor/Device/Subsystem/Class/Revision and Device Serial Number (if implemented), so it's not fool-proof and drivers may know how to detect more cases (Lukas Wunner) - Add missing MODULE_DESCRIPTION() macro (Jeff Johnson) * pci/hotplug: PCI: acpiphp: Add missing MODULE_DESCRIPTION() macro PCI: pciehp: Detect device replacement during system sleep
2024-07-19Merge branch 'pci/err'Bjorn Helgaas
- Disable AER and DPC during suspend so that if they share an interrupt with PME and errors occur during suspend, the AER or DPC interrupt doesn't cause spurious wakeups (Kai-Heng Feng) * pci/err: PCI/DPC: Disable DPC service on suspend PCI/AER: Disable AER service on suspend
2024-07-19Merge branch 'pci/enumeration'Bjorn Helgaas
- Move the PRESERVE_BOOT_CONFIG ACPI _DSM evaluation from drivers/acpi to drivers/pci so we can unify with similar DT functionality (Vidya Sagar) - Add of_pci_preserve_config() to check for a DT "linux,pci-probe-only" property on a per-host bridge basis in addition to a global basis (Vidya Sagar) - Unify ACPI PRESERVE_BOOT_CONFIG _DSM and DT "linux,pci-probe-only" in a generic pci_preserve_config() path (Vidya Sagar) * pci/enumeration: PCI: Use preserve_config in place of pci_flags PCI: Unify ACPI and DT 'preserve config' support PCI: of: Add of_pci_preserve_config() for per-host bridge support PCI: Move PRESERVE_BOOT_CONFIG _DSM evaluation to pci_register_host_bridge()
2024-07-19Merge branch 'pci/dpc'Bjorn Helgaas
- If there's a device below a bridge, prevent a use-after-free by holding a reference to the device while waiting for the secondary bus to be ready in case the device is concurrently removed, e.g., by DPC (Lukas Wunner) * pci/dpc: PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
2024-07-19Merge branch 'pci/devres'Bjorn Helgaas
- Add pcim_add_mapping_to_legacy_table() and pcim_remove_mapping_from_legacy_table() helper functions to simplify devres iomap table (Philipp Stanner) - Reimplement devres that take a bit mask of BARs in a way that can be used to map partial BARs as well as entire BARs (Philipp Stanner) - Deprecate pcim_iomap_table() and pcim_iomap_regions_request_all() in favor of pcim_* request plus pcim_* mapping (Philipp Stanner) - Add pcim_request_region(), a managed interface to request a single BAR (Philipp Stanner) - Use the existing pci_is_enabled() interface to replace the struct devres.enabled bit (Philipp Stanner) - Move the struct pci_devres.pinned bit to struct pci_dev (Philipp Stanner) - Reimplement pcim_set_mwi() so it uses its own devres cleanup callback instead of a special-purpose bit in struct pci_devres (Philipp Stanner) - Add pcim_intx(), which is unambiguously managed, unlike pci_intx(), which is managed if pcim_enable_device() has been called but unmanaged otherwise (Philipp Stanner) - Remove pcim_release(), which is no longer needed after previous cleanups of pcim_set_mwi() and pci_intx() (Philipp Stanner) - Add pcim_iomap_range(), a managed interface to map part of a BAR (Philipp Stanner) - Fix vboxvideo leak by using the new pcim_iomap_range() instead of the unmanaged pci_iomap_range() (Philipp Stanner) * pci/devres: drm/vboxvideo: fix mapping leaks PCI: Add managed pcim_iomap_range() PCI: Remove legacy pcim_release() PCI: Add managed pcim_intx() PCI: Give pcim_set_mwi() its own devres cleanup callback PCI: Move struct pci_devres.pinned bit to struct pci_dev PCI: Remove struct pci_devres.enabled status bit PCI: Document hybrid devres hazards PCI: Add managed pcim_request_region() PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all() PCI: Add managed partial-BAR request and map infrastructure PCI: Add devres helpers for iomap table PCI: Add and use devres helper for bit masks
2024-07-19Merge branch 'pci/acs'Bjorn Helgaas
- Add ACS quirk for Broadcom BCM5760X NIC, which doesn't allow peer-to-peer transactions between functions, but doesn't advertise ACS support (Ajit Khaparde) - Add "pci=config_acs=" kernel command-line parameter to relax default ACS settings to enable peer-to-peer configurations. Requires expert knowledge of topology and ACS operation (Vidya Sagar) * pci/acs: PCI: Extend ACS configurability PCI: Add ACS quirk for Broadcom BCM5760X NIC
2024-07-12PCI: Extend ACS configurabilityVidya Sagar
PCIe ACS settings control the level of isolation and the possible P2P paths between devices. With greater isolation the kernel will create smaller iommu_groups and with less isolation there is more HW that can achieve P2P transfers. From a virtualization perspective all devices in the same iommu_group must be assigned to the same VM as they lack security isolation. There is no way for the kernel to automatically know the correct ACS settings for any given system and workload. Existing command line options (e.g., disable_acs_redir) allow only for large scale change, disabling all isolation, but this is not sufficient for more complex cases. Add a kernel command-line option 'config_acs' to directly control all the ACS bits for specific devices, which allows the operator to setup the right level of isolation to achieve the desired P2P configuration. The definition is future proof; when new ACS bits are added to the spec the open syntax can be extended. ACS needs to be setup early in the kernel boot as the ACS settings affect how iommu_groups are formed. iommu_group formation is a one time event during initial device discovery, so changing ACS bits after kernel boot can result in an inaccurate view of the iommu_groups compared to the current isolation configuration. ACS applies to PCIe Downstream Ports and multi-function devices. The default ACS settings are strict and deny any direct traffic between two functions. This results in the smallest iommu_group the HW can support. Frequently these values result in slow or non-working P2PDMA. ACS offers a range of security choices controlling how traffic is allowed to go directly between two devices. Some popular choices: - Full prevention - Translated requests can be direct, with various options - Asymmetric direct traffic, A can reach B but not the reverse - All traffic can be direct Along with some other less common ones for special topologies. The intention is that this option would be used with expert knowledge of the HW capability and workload to achieve the desired configuration. Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> [bhelgaas: add example, tidy printk formats] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-12PCI: Add missing bridge lock to pci_bus_lock()Dan Williams
One of the true positives that the cfg_access_lock lockdep effort identified is this sequence: WARNING: CPU: 14 PID: 1 at drivers/pci/pci.c:4886 pci_bridge_secondary_bus_reset+0x5d/0x70 RIP: 0010:pci_bridge_secondary_bus_reset+0x5d/0x70 Call Trace: <TASK> ? __warn+0x8c/0x190 ? pci_bridge_secondary_bus_reset+0x5d/0x70 ? report_bug+0x1f8/0x200 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? pci_bridge_secondary_bus_reset+0x5d/0x70 pci_reset_bus+0x1d8/0x270 vmd_probe+0x778/0xa10 pci_device_probe+0x95/0x120 Where pci_reset_bus() users are triggering unlocked secondary bus resets. Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses pci_bus_lock() before issuing the reset which locks everything *but* the bridge itself. For the same motivation as adding: bridge = pci_upstream_bridge(dev); if (bridge) pci_dev_lock(bridge); to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add pci_dev_lock() for @bus->self to pci_bus_lock(). Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com Reported-by: Imre Deak <imre.deak@intel.com> Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <kbusch@kernel.org> [bhelgaas: squash in recursive locking deadlock fix from Keith Busch: https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
2024-07-11drm/vboxvideo: fix mapping leaksPhilipp Stanner
When the PCI devres API was introduced to this driver, it was wrongly assumed that initializing the device with pcim_enable_device() instead of pci_enable_device() will make all PCI functions managed. This is wrong and was caused by the quite confusing PCI devres API in which some, but not all, functions become managed that way. The function pci_iomap_range() is never managed. Replace pci_iomap_range() with the managed function pcim_iomap_range(). Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions") Link: https://lore.kernel.org/r/20240613115032.29098-14-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2024-07-11PCI: Add managed pcim_iomap_range()Philipp Stanner
The only managed mapping function currently is pcim_iomap() which doesn't allow for mapping an area starting at a certain offset, which many drivers want. Add pcim_iomap_range() as an exported function. Link: https://lore.kernel.org/r/20240613115032.29098-13-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-11PCI: Remove legacy pcim_release()Philipp Stanner
Thanks to preceding cleanup steps, pcim_release() is now not needed anymore and can be replaced by pcim_disable_device(), which is the exact counterpart to pcim_enable_device(). This permits removing further parts of the old PCI devres implementation. Replace pcim_release() with pcim_disable_device(). Remove the now unused function get_pci_dr(). Remove the struct pci_devres from pci.h. Link: https://lore.kernel.org/r/20240613115032.29098-12-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-11PCI: Add managed pcim_intx()Philipp Stanner
pci_intx() is a "hybrid" function, i.e., it is managed if pcim_enable_device() has been called, but unmanaged otherwise. Add pcim_intx(), which is always managed, and implement pci_intx() using it. Remove the now-unused struct pci_devres.orig_intx and .restore_intx and find_pci_dr(). Link: https://lore.kernel.org/r/20240613115032.29098-11-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> [kwilczynski: squashed in https://lore.kernel.org/r/426645d40776198e0fcc942f4a6cac4433c7a9aa.camel@redhat.com to fix problem reported and tested by Ashish Kalra <Ashish.Kalra@amd.com>: https://lore.kernel.org/r/20240708214656.4721-1-Ashish.Kalra@amd.com https://lore.kernel.org/r/8c4634e9-4f02-4c54-9c89-d75e2f4bf026@amd.com/] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Give pcim_set_mwi() its own devres cleanup callbackPhilipp Stanner
Managing pci_set_mwi() with devres can easily be done with its own callback, without the necessity to store any state about it in a device-related struct. Remove the MWI state from struct pci_devres. Give pcim_set_mwi() a separate devres cleanup callback. Link: https://lore.kernel.org/r/20240613115032.29098-10-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Move struct pci_devres.pinned bit to struct pci_devPhilipp Stanner
The bit describing whether the PCI device is currently pinned is stored in struct pci_devres. To clean up and simplify the PCI devres API, it's better if this information is stored in struct pci_dev. This will later permit simplifying pcim_enable_device(). Move the 'pinned' boolean bit to struct pci_dev. Restructure bits in struct pci_dev so the pm / pme fields are next to each other. Link: https://lore.kernel.org/r/20240613115032.29098-9-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Remove struct pci_devres.enabled status bitPhilipp Stanner
The struct pci_devres has a separate boolean to track whether a device is enabled. That, however, can easily be tracked in an agnostic manner through the function pci_is_enabled(). Using it allows for simplifying the PCI devres implementation. Replace the separate 'enabled' status bit from struct pci_devres with calls to pci_is_enabled() at the appropriate places. Link: https://lore.kernel.org/r/20240613115032.29098-8-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Document hybrid devres hazardsPhilipp Stanner
These functions: pci_request_region() pci_request_regions() pci_request_regions_exclusive() pci_request_selected_regions() pci_request_selected_regions_exclusive() pci_intx() are "hybrid" functions that are managed if pcim_enable_device() has been called, but unmanaged otherwise. This is confusing and has already caused a bug (in 8558de401b5f ("drm/vboxvideo: use managed pci functions")) because users believe all PCI functions, such as pci_iomap_range(), can become managed that way, which is not the case. Add comments to the relevant functions' docstrings that warn users about this behavior. Link: https://lore.kernel.org/r/20240613115032.29098-7-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Add managed pcim_request_region()Philipp Stanner
These existing functions: pci_request_region() pci_request_selected_regions() pci_request_selected_regions_exclusive() are "hybrid" functions built on __pci_request_region() and are managed if pcim_enable_device() has been called, but unmanaged otherwise. Add these new functions: pcim_request_region() pcim_request_region_exclusive() These are *always* managed and use the new pcim_addr_devres tracking infrastructure instead of find_pci_dr() and struct pci_devres.region_mask. Implement the hybrid functions using the new "pure" functions and remove struct pci_devres.region_mask, which is no longer needed. Link: https://lore.kernel.org/r/20240613115032.29098-6-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()Philipp Stanner
Deprecate pcim_iomap_table(). It returns a pointer to a table of ioremapped BARs, or NULL if it fails. This makes uses like this: addr = pcim_iomap_table(pdev)[0]; problematic because it causes a NULL pointer dereference on failure. Callers should use pcim_iomap() instead. Deprecate pcim_iomap_regions_request_all() because it is built on __pci_request_region() and is managed if pcim_enable_device() has been called, but unmanaged otherwise, which is prone to errors. Callers should either use pcim_iomap_regions() to request and map BARs, or use pcim_request_region() followed by pcim_iomap(). Link: https://lore.kernel.org/r/20240613115032.29098-5-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log, sphinx markup] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Add managed partial-BAR request and map infrastructurePhilipp Stanner
The pcim_iomap_devres table tracks entire-BAR mappings, so we can't use it to build a managed version of pci_iomap_range(), which maps partial BARs. Add struct pcim_addr_devres, which can track request and mapping of both entire BARs and partial BARs. Add the following internal devres functions based on struct pcim_addr_devres: pcim_iomap_region() # request & map entire BAR pcim_iounmap_region() # unmap & release entire BAR pcim_request_region() # request entire BAR pcim_release_region() # release entire BAR pcim_request_all_regions() # request all entire BARs pcim_release_all_regions() # release all entire BARs Rework the following public interfaces using the new infrastructure listed above: pcim_iomap() # map partial BAR pcim_iounmap() # unmap partial BAR pcim_iomap_regions() # request & map specified BARs pcim_iomap_regions_request_all() # request all BARs, map specified BARs pcim_iounmap_regions() # unmap & release specified BARs Link: https://lore.kernel.org/r/20240613115032.29098-4-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Add devres helpers for iomap tablePhilipp Stanner
The pcim_iomap_devres.table administrated by pcim_iomap_table() has its entries set and unset at several places throughout devres.c using manual iterations which are effectively code duplications. Add pcim_add_mapping_to_legacy_table() and pcim_remove_mapping_from_legacy_table() helper functions and use them where possible. Link: https://lore.kernel.org/r/20240613115032.29098-3-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Add and use devres helper for bit masksPhilipp Stanner
The current devres implementation uses manual shift operations to check whether a bit in a mask is set. The code can be made more readable by writing a small helper function for that. Implement mask_contains_bar() and use it where applicable. Link: https://lore.kernel.org/r/20240613115032.29098-2-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-01PCI/DPC: Fix use-after-free on concurrent DPC and hot-removalLukas Wunner
Keith reports a use-after-free when a DPC event occurs concurrently to hot-removal of the same portion of the hierarchy: The dpc_handler() awaits readiness of the secondary bus below the Downstream Port where the DPC event occurred. To do so, it polls the config space of the first child device on the secondary bus. If that child device is concurrently removed, accesses to its struct pci_dev cause the kernel to oops. That's because pci_bridge_wait_for_secondary_bus() neglects to hold a reference on the child device. Before v6.3, the function was only called on resume from system sleep or on runtime resume. Holding a reference wasn't necessary back then because the pciehp IRQ thread could never run concurrently. (On resume from system sleep, IRQs are not enabled until after the resume_noirq phase. And runtime resume is always awaited before a PCI device is removed.) However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset"), which introduced that, failed to appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a reference on the child device because dpc_handler() and pciehp may indeed run concurrently. The commit was backported to v5.10+ stable kernels, so that's the oldest one affected. Add the missing reference acquisition. Abridged stack trace: BUG: unable to handle page fault for address: 00000000091400c0 CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0 RIP: pci_bus_read_config_dword+0x17/0x50 pci_dev_wait() pci_bridge_wait_for_secondary_bus() dpc_reset_link() pcie_do_recovery() dpc_handler() Fixes: 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset") Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/ Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.1718707743.git.lukas@wunner.de Reported-by: Keith Busch <kbusch@kernel.org> Tested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: stable@vger.kernel.org # v5.10+
2024-06-18PCI/DPC: Disable DPC service on suspendKai-Heng Feng
If the link is powered off during suspend, electrical noise may cause errors that trigger DPC. If the DPC interrupt is enabled and shares an IRQ with PME, that causes a spurious wakeup during suspend. Disable DPC triggering and the DPC interrupt during suspend to prevent this. Clear DPC interrupt status before re-enabling DPC interrupts during resume so we don't get an interrupt for errors that occurred during the suspend/resume process. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Link: https://lore.kernel.org/r/20240416043225.1462548-3-kai.heng.feng@canonical.com Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> [bhelgaas: clear status on resume, add comments, commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-06-18PCI/AER: Disable AER service on suspendKai-Heng Feng
If the link is powered off during suspend, electrical noise may cause errors that are logged via AER. If the AER interrupt is enabled and shares an IRQ with PME, that causes a spurious wakeup during suspend. Disable the AER interrupt during suspend to prevent this. Clear error status before re-enabling IRQ interrupts during resume so we don't get an interrupt for errors that occurred during the suspend/resume process. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Link: https://lore.kernel.org/r/20240416043225.1462548-2-kai.heng.feng@canonical.com Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> [bhelgaas: drop pci_ancestor_pr3_present() etc, commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-06-18PCI: acpiphp: Add missing MODULE_DESCRIPTION() macroJeff Johnson
With ARCH=arm64, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pci/hotplug/acpiphp_ampere_altra.o Add the missing MODULE_DESCRIPTION(). Link: https://lore.kernel.org/r/20240612-md-drivers-pci-hotplug-v1-1-2b30d14d783d@quicinc.com Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-06-04PCI: Warn on missing cfg_access_lock during secondary bus resetDan Williams
The recent adventure with adding lockdep tracking for cfg_access_lock, while it yielded many false positives [1], did catch a true positive in the pci_reset_bus() path [2]. So, while lockdep is difficult to deploy, open coding a check that cfg_access_lock is held during the reset is feasible. While this does not offer a full backtrace, it should be sufficient to implicate the caller of pci_bridge_secondary_bus_reset() as a path that needs investigation. Link: https://lore.kernel.org/r/171711746953.1628941.4692125082286867825.stgit@dwillia2-xfh.jf.intel.com Link: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html [1] Link: http://lore.kernel.org/r/cfb50601-5d2a-4676-a958-1bd3f1b06654@intel.com [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
2024-06-03PCI: Use preserve_config in place of pci_flagsVidya Sagar
Use preserve_config in place of checking for PCI_PROBE_ONLY flag to enable support for "linux,pci-probe-only" on a per host bridge basis. This also obviates the use of adding PCI_REASSIGN_ALL_BUS flag if !PCI_PROBE_ONLY, as pci_assign_unassigned_root_bus_resources() takes care of reassigning the resources that are not already claimed. Link: https://lore.kernel.org/r/20240508174138.3630283-5-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-06-03PCI: Unify ACPI and DT 'preserve config' supportVidya Sagar
Unify the 'preserve config' support across ACPI and device-tree boot flows. Link: https://lore.kernel.org/r/20240508174138.3630283-4-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-06-03PCI: of: Add of_pci_preserve_config() for per-host bridge supportVidya Sagar
Add of_pci_preserve_config() to look for the "linux,pci-probe-only" property under a specified node. If it's not found there, look under "of_chosen" in addition. If the caller didn't specify a node, look under "of_chosen". With a future patch, this will support "linux,pci-probe-only" on a per host bridge basis based on the presence of the property in the respective PCI host bridge DT node. Implement of_pci_check_probe_only() using of_pci_preserve_config(). Link: https://lore.kernel.org/r/20240508174138.3630283-3-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-06-03PCI: Move PRESERVE_BOOT_CONFIG _DSM evaluation to pci_register_host_bridge()Vidya Sagar
Move the PRESERVE_BOOT_CONFIG _DSM evaluation from acpi_pci_root_create() to pci_register_host_bridge(). This will help unify the ACPI _DSM path and the DT-based "linux,pci-probe-only" paths. This should be safe because it happens earlier than it used to: acpi_pci_root_create pci_create_root_bus pci_register_host_bridge + bridge->preserve_config = pci_preserve_config(bridge) pci_acpi_preserve_config + acpi_evaluate_dsm_typed(DSM_PCI_PRESERVE_BOOT_CONFIG) - acpi_evaluate_dsm_typed(DSM_PCI_PRESERVE_BOOT_CONFIG) No functional change intended. Link: https://lore.kernel.org/r/20240508174138.3630283-2-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-05-30PCI: pciehp: Detect device replacement during system sleepLukas Wunner
Ricky reports that replacing a device in a hotplug slot during ACPI sleep state S3 does not cause re-enumeration on resume, as one would expect. Instead, the new device is treated as if it was the old one. There is no bulletproof way to detect device replacement, but as a heuristic, check whether the device identity in config space matches cached data in struct pci_dev (Vendor ID, Device ID, Class Code, Revision ID, Subsystem Vendor ID, Subsystem ID). Additionally, cache and compare the Device Serial Number (PCIe r6.2 sec 7.9.3). If a mismatch is detected, mark the old device disconnected (to prevent its driver from accessing the new device) and synthesize a Presence Detect Changed event. The device identity in config space which is compared here is the same as the one included in the signed Subject Alternative Name per PCIe r6.1 sec 6.31.3. Thus, the present commit prevents attacks where a valid device is replaced with a malicious device during system sleep and the valid device's driver obliviously accesses the malicious device. This is about as much as can be done at the PCI layer. Drivers may have additional ways to identify devices (such as reading a WWID from some register) and may trigger re-enumeration when detecting an identity change on resume. Link: https://lore.kernel.org/r/a1afaa12f341d146ecbea27c1743661c71683833.1716992815.git.lukas@wunner.de Reported-by: Ricky Wu <ricky_wu@realtek.com> Closes: https://lore.kernel.org/r/a608b5930d0a48f092f717c0e137454b@realtek.com Tested-by: Ricky Wu <ricky_wu@realtek.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-05-28PCI: Add ACS quirk for Broadcom BCM5760X NICAjit Khaparde
The Broadcom BCM5760X NIC may be a multi-function device. While it does not advertise an ACS capability, peer-to-peer transactions are not possible between the individual functions. So it is ok to treat them as fully isolated. Add an ACS quirk for this device so the functions can be in independent IOMMU groups and attached individually to userspace applications using VFIO. [kwilczynski: commit log] Link: https://lore.kernel.org/linux-pci/20240510204228.73435-1-ajit.khaparde@broadcom.com Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
2024-05-26Linux 6.10-rc1v6.10-rc1Linus Torvalds
2024-05-26mm: percpu: Include smp.h in alloc_tag.hKent Overstreet
percpu.h depends on smp.h, but doesn't include it directly because of circular header dependency issues; percpu.h is needed in a bunch of low level headers. This fixes a randconfig build error on mips: include/linux/alloc_tag.h: In function '__alloc_tag_ref_set': include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration] Reported-by: kernel test robot <lkp@intel.com> Fixes: 24e44cc22aa3 ("mm: percpu: enable per-cpu allocation tagging") Closes: https://lore.kernel.org/oe-kbuild-all/202405210052.DIrMXJNz-lkp@intel.com/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-05-26Merge tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tool fix from Arnaldo Carvalho de Melo: "Revert a patch causing a regression. This made a simple 'perf record -e cycles:pp make -j199' stop working on the Ampere ARM64 system Linus uses to test ARM64 kernels". * tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"
2024-05-26Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"Arnaldo Carvalho de Melo
This reverts commit 617824a7f0f73e4de325cf8add58e55b28c12493. This made a simple 'perf record -e cycles:pp make -j199' stop working on the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed at length in the threads in the Link tags below. The fix provided by Ian wasn't acceptable and work to fix this will take time we don't have at this point, so lets revert this and work on it on the next devel cycle. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Cc: Ethan Adams <j.ethan.adams@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tycho Andersen <tycho@tycho.pizza> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-25Merge tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - two important netfs integration fixes - including for a data corruption and also fixes for multiple xfstests - reenable swap support over SMB3 * tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix missing set of remote_i_size cifs: Fix smb3_insert_range() to move the zero_point cifs: update internal version number smb3: reenable swapfiles over SMB3 mounts
2024-05-25Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes, 11 of which are cc:stable. A few nilfs2 fixes, the remainder are for MM: a couple of selftests fixes, various singletons fixing various issues in various parts" * tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/ksm: fix possible UAF of stable_node mm/memory-failure: fix handling of dissolved but not taken off from buddy pages mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again nilfs2: fix potential hang in nilfs_detach_log_writer() nilfs2: fix unexpected freezing of nilfs_segctor_sync() nilfs2: fix use-after-free of timer for log writer thread selftests/mm: fix build warnings on ppc64 arm64: patching: fix handling of execmem addresses selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages selftests/mm: compaction_test: fix bogus test success on Aarch64 mailmap: update email address for Satya Priya mm/huge_memory: don't unpoison huge_zero_folio kasan, fortify: properly rename memintrinsics lib: add version into /proc/allocinfo output mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
2024-05-25Merge tag 'irq-urgent-2024-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: - Fix x86 IRQ vector leak caused by a CPU offlining race - Fix build failure in the riscv-imsic irqchip driver caused by an API-change semantic conflict - Fix use-after-free in irq_find_at_or_after() * tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after() genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
2024-05-25Merge tag 'x86-urgent-2024-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix regressions of the new x86 CPU VFM (vendor/family/model) enumeration/matching code - Fix crash kernel detection on buggy firmware with non-compliant ACPI MADT tables - Address Kconfig warning * tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL crypto: x86/aes-xts - switch to new Intel CPU model defines x86/topology: Handle bogus ACPI tables correctly x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
2024-05-25Merge tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmiLinus Torvalds
Pull ipmi updates from Corey Minyard: "Mostly updates for deprecated interfaces, platform.remove and converting from a tasklet to a BH workqueue. Also use HAS_IOPORT for disabling inb()/outb()" * tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi: ipmi: kcs_bmc_npcm7xx: Convert to platform remove callback returning void ipmi: kcs_bmc_aspeed: Convert to platform remove callback returning void ipmi: ipmi_ssif: Convert to platform remove callback returning void ipmi: ipmi_si_platform: Convert to platform remove callback returning void ipmi: ipmi_powernv: Convert to platform remove callback returning void ipmi: bt-bmc: Convert to platform remove callback returning void char: ipmi: handle HAS_IOPORT dependencies ipmi: Convert from tasklet to BH workqueue
2024-05-25Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "A series from Xiubo that adds support for additional access checks based on MDS auth caps which were recently made available to clients. This is needed to prevent scenarios where the MDS quietly discards updates that a UID-restricted client previously (wrongfully) acked to the user. Other than that, just a documentation fixup" * tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client: doc: ceph: update userspace command to get CephFS metadata ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit ceph: check the cephx mds auth access for async dirop ceph: check the cephx mds auth access for open ceph: check the cephx mds auth access for setattr ceph: add ceph_mds_check_access() helper ceph: save cap_auths in MDS client when session is opened
2024-05-25Merge tag 'ntfs3_for_6.10' of ↵Linus Torvalds
https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: "Fixes: - reusing of the file index (could cause the file to be trimmed) - infinite dir enumeration - taking DOS names into account during link counting - le32_to_cpu conversion, 32 bit overflow, NULL check - some code was refactored Changes: - removed max link count info display during driver init Remove: - atomic_open has been removed for lack of use" * tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3: fs/ntfs3: Break dir enumeration if directory contents error fs/ntfs3: Fix case when index is reused during tree transformation fs/ntfs3: Mark volume as dirty if xattr is broken fs/ntfs3: Always make file nonresident on fallocate call fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode fs/ntfs3: Use variable length array instead of fixed size fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow fs/ntfs3: Check 'folio' pointer for NULL fs/ntfs3: Missed le32_to_cpu conversion fs/ntfs3: Remove max link count info display during driver init fs/ntfs3: Taking DOS names into account during link counting fs/ntfs3: remove atomic_open fs/ntfs3: use kcalloc() instead of kzalloc()
2024-05-25Merge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Two ksmbd server fixes, both for stable" * tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: ignore trailing slashes in share paths ksmbd: avoid to send duplicate oplock break notifications
2024-05-25Merge tag 'rtc-6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "There is one new driver and then most of the changes are the device tree bindings conversions to yaml. New driver: - Epson RX8111 Drivers: - Many Device Tree bindings conversions to dtschema - pcf8563: wakeup-source support" * tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: pcf8563: add wakeup-source support rtc: rx8111: handle VLOW flag rtc: rx8111: demote warnings to debug level rtc: rx6110: Constify struct regmap_config dt-bindings: rtc: convert trivial devices into dtschema dt-bindings: rtc: stmp3xxx-rtc: convert to dtschema dt-bindings: rtc: pxa-rtc: convert to dtschema rtc: Add driver for Epson RX8111 dt-bindings: rtc: Add Epson RX8111 rtc: mcp795: drop unneeded MODULE_ALIAS rtc: nuvoton: Modify part number value rtc: test: Split rtc unit test into slow and normal speed test dt-bindings: rtc: nxp,lpc1788-rtc: convert to dtschema dt-bindings: rtc: digicolor-rtc: move to trivial-rtc dt-bindings: rtc: alphascale,asm9260-rtc: convert to dtschema dt-bindings: rtc: armada-380-rtc: convert to dtschema rtc: cros-ec: provide ID table for avoiding fallback match
2024-05-25Merge tag 'i3c/for-6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c updates from Alexandre Belloni: "Runtime PM (power management) is improved and hot-join support has been added to the dw controller driver. Core: - Allow device driver to trigger controller runtime PM Drivers: - dw: hot-join support - svc: better IBI handling" * tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c: dw: Add hot-join support. i3c: master: Enable runtime PM for master controller i3c: master: svc: fix invalidate IBI type and miss call client IBI handler i3c: master: svc: change ENXIO to EAGAIN when IBI occurs during start frame i3c: Add comment for -EAGAIN in i3c_device_do_priv_xfers()
2024-05-25Merge tag 'jffs2-for-linus-6.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull jffs2 updates from Richard Weinberger: - Fix illegal memory access in jffs2_free_inode() - Kernel-doc fixes - print symbolic error names * tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: jffs2: Fix potential illegal address access in jffs2_free_inode jffs2: Simplify the allocation of slab caches jffs2: nodemgmt: fix kernel-doc comments jffs2: print symbolic error name instead of error code
2024-05-25Merge tag 'uml-for-linus-6.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Fixes for -Wmissing-prototypes warnings and further cleanup - Remove callback returning void from rtc and virtio drivers - Fix bash location * tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits) um: virtio_uml: Convert to platform remove callback returning void um: rtc: Convert to platform remove callback returning void um: Remove unused do_get_thread_area function um: Fix -Wmissing-prototypes warnings for __vdso_* um: Add an internal header shared among the user code um: Fix the declaration of kasan_map_memory um: Fix the -Wmissing-prototypes warning for get_thread_reg um: Fix the -Wmissing-prototypes warning for __switch_mm um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn um: Stop tracking host PID in cpu_tasks um: process: remove unused 'n' variable um: vector: remove unused len variable/calculation um: vector: fix bpfflash parameter evaluation um: slirp: remove set but unused variable 'pid' um: signal: move pid variable where needed um: Makefile: use bash from the environment um: Add winch to winch_handlers before registering winch IRQ um: Fix -Wmissing-prototypes warnings for __warp_* and foo um: Fix -Wmissing-prototypes warnings for text_poke* um: Move declarations to proper headers ...
2024-05-24Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Some fixes for the end of the merge window, mostly amdgpu and panthor, with one nouveau uAPI change that fixes a bad decision we made a few months back. nouveau: - fix bo metadata uAPI for vm bind panthor: - Fixes for panthor's heap logical block. - Reset on unrecoverable fault - Fix VM references. - Reset fix. xlnx: - xlnx compile and doc fixes. amdgpu: - Handle vbios table integrated info v2.3 amdkfd: - Handle duplicate BOs in reserve_bo_and_cond_vms - Handle memory limitations on small APUs dp/mst: - MST null deref fix. bridge: - Don't let next bridge create connector in adv7511 to make probe work" * tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel: drm/amdgpu/atomfirmware: add intergrated info v2.3 table drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2 drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms drm/bridge: adv7511: Attach next bridge without creating connector drm/buddy: Fix the warn on's during force merge drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations drm/panthor: Call panthor_sched_post_reset() even if the reset failed drm/panthor: Reset the FW VM to NULL on unplug drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level drm/panthor: Force an immediate reset on unrecoverable faults drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints drm/panthor: Fix an off-by-one in the heap context retrieval logic drm/panthor: Relax the constraints on the tiler chunk size drm/panthor: Make sure the tiler initial/max chunks are consistent drm/panthor: Fix tiler OOM handling to allow incremental rendering drm: xlnx: zynqmp_dpsub: Fix compilation error drm: xlnx: zynqmp_dpsub: Fix few function comments