path: root/drivers/pci/pci.h
Age  Commit message  (Author)
2025-06-04 Merge tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci (Linus Torvalds)
Pull pci updates from Bjorn Helgaas:

"Enumeration:
- Print the actual delay time in pci_bridge_wait_for_secondary_bus() instead of assuming it was 1000ms (Wilfred Mallawa)
- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices', which broke resume from system sleep on AMD platforms and has been fixed by other commits (Lukas Wunner)

Resource management:
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and unnecessary (Philipp Stanner)
- Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and related flags since all uses have been removed (Philipp Stanner)
- Rework devres 'request' functions so they are no longer 'hybrid', i.e., their behavior no longer depends on whether pcim_enable_device() or pci_enable_device() was used, and remove related code (Philipp Stanner)
- Warn (not BUG()) about failure to assign optional resources (Ilpo Järvinen)

Error handling:
- Log the DPC Error Source ID only when it's actually valid (when ERR_FATAL or ERR_NONFATAL was received from a downstream device) and decode into bus/device/function (Bjorn Helgaas)
- Determine AER log level once and save it so all related messages use the same level (Karolina Stolarek)
- Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable Errors (Karolina Stolarek)
- Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs controls on interval and burst count, to avoid flooding logs and RCU stall warnings (Jon Pan-Doh)

Power management:
- Increment PM usage counter when probing reset methods so we don't try to read config space of a powered-off device (Alex Williamson)
- Set all devices to D0 during enumeration to ensure ACPI opregion is connected via _REG (Mario Limonciello)

Power control:
- Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match the filename paths. Retain old deprecated symbols for compatibility, except for the pwrctrl slot driver (PCI_PWRCTRL_SLOT) (Johan Hovold)
- When unregistering pwrctrl, cancel outstanding rescan work before cleaning up data structures to avoid use-after-free issues (Brian Norris)

Bandwidth control:
- Simplify link bandwidth controller by replacing the count of Link Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag (Ilpo Järvinen)
- Update the Link Speed after retraining, since the Link Speed may have changed (Ilpo Järvinen)

PCIe native device hotplug:
- Ignore Presence Detect Changed caused by DPC. pciehp already ignores Link Down/Up events caused by DPC, but on slots using in-band presence detect, DPC causes a spurious Presence Detect Changed event (Lukas Wunner)
- Ignore Link Down/Up caused by Secondary Bus Reset. On hotplug ports using in-band presence detect, the reset causes a Presence Detect Changed event, which mistakenly caused teardown and re-enumeration of the device. Drivers may need to annotate code that resets their device (Lukas Wunner)

Virtualization:
- Add an ACS quirk for Loongson Root Ports that don't advertise ACS but don't allow peer-to-peer transactions between Root Ports; the quirk allows each Root Port to be in a separate IOMMU group (Huacai Chen)

Endpoint framework:
- For fixed-size BARs, retain both the actual size and the possibly larger size allocated to accommodate iATU alignment requirements (Jerome Brunet)
- Simplify ctrl/SPAD space allocation and avoid allocating more space than needed (Jerome Brunet)
- Correct MSI-X PBA offset calculations for DesignWare and Cadence endpoint controllers (Niklas Cassel)
- Align the return value (number of interrupts) encoding for pci_epc_get_msi()/pci_epc_ops::get_msi() and pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel)
- Align the nr_irqs parameter encoding for pci_epc_set_msi()/pci_epc_ops::set_msi() and pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel)

Common host controller library:
- Convert pci-host-common to a library so platforms that don't need native host controller drivers don't need to include these helper functions (Manivannan Sadhasivam)

Apple PCIe controller driver:
- Extract ECAM bridge creation helper from pci_host_common_probe() to separate driver-specific things like MSI from PCI things (Marc Zyngier)
- Dynamically allocate RID-to-SID bitmap to prepare for SoCs with varying capabilities (Marc Zyngier)
- Skip ports disabled in DT when setting up ports (Janne Grunau)
- Add t6020 compatible string (Alyssa Rosenzweig)
- Add T602x PCIe support (Hector Martin)
- Directly set/clear INTx mask bits because T602x dropped the accessors that could do this without locking (Marc Zyngier)
- Move port PHY registers to their own reg items to accommodate T602x, which moves them around; retain default offsets for existing DTs that lack phy%d entries with the reg offsets (Hector Martin)
- Stop polling for core refclk, which doesn't work on T602x and which the bootloader has already done anyway (Hector Martin)
- Use gpiod_set_value_cansleep() when asserting PERST# in probe because we're allowed to sleep there (Hector Martin)

Cadence PCIe controller driver:
- Drop a runtime PM 'put' to resolve a runtime atomic count underflow (Hans Zhang)
- Make the cadence core buildable as a module (Kishon Vijay Abraham I)
- Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by loadable drivers when they are removed (Siddharth Vadapalli)

Freescale i.MX6 PCIe controller driver:
- Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP (Richard Zhu)
- Remove redundant dw_pcie_wait_for_link() from imx_pcie_start_link(); since the DWC core does this, imx6 only needs it when retraining for a faster link speed (Richard Zhu)
- Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu)
- Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in some cases, the controller can't exit 'L23 Ready' through Beacon or PERST# deassertion (Richard Zhu)
- Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum: controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8 GT/s, causing timeouts in L1 (Richard Zhu)
- Wait for i.MX95 PLL lock before enabling controller (Richard Zhu)
- Save/restore i.MX95 LUT for suspend/resume (Richard Zhu)

Mobiveil PCIe controller driver:
- Return bool (not int) for link-up check in mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans Zhang)

NVIDIA Tegra194 PCIe controller driver:
- Create debugfs directory for 'aspm_state_cnt' only when CONFIG_PCIEASPM is enabled, since there are no other entries (Hans Zhang)

Qualcomm PCIe controller driver:
- Add OF support for parsing DT 'eq-presets-<N>gts' property for lane equalization presets (Krishna Chaitanya Chundru)
- Read Maximum Link Width from the Link Capabilities register if DT lacks 'num-lanes' property (Krishna Chaitanya Chundru)
- Add Physical Layer 64 GT/s Capability ID and register offsets for 8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya Chundru)
- Add generic dwc support for configuring lane equalization presets (Krishna Chaitanya Chundru)
- Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)

Renesas R-Car PCIe controller driver:
- Describe endpoint BAR 4 as being fixed size (Jerome Brunet)
- Document how to obtain R-Car V4H (r8a779g0) controller firmware (Yoshihiro Shimoda)

Rockchip PCIe controller driver:
- Reorder rockchip_pci_core_rsts because reset_control_bulk_deassert() deasserts in reverse order, to fix a link training regression (Jensen Huang)
- Mark RK3399 as being capable of raising INTx interrupts (Niklas Cassel)

Rockchip DesignWare PCIe controller driver:
- Check only PCIE_LINKUP, not LTSSM status, to determine whether the link is up (Shawn Lin)
- Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s for Root Complex and Endpoint modes (Shawn Lin)
- Hide the broken ATS Capability in rockchip_pcie_ep_init() instead of rockchip_pcie_ep_pre_init() so it stays hidden after PERST# resets non-sticky registers (Shawn Lin)
- Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit() (Diederik de Haas)

Synopsys DesignWare PCIe controller driver:
- Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training more robust; this will not affect the intended link width if all lanes are functional (Wenbin Yao)
- Return bool (not int) for link-up check in dw_pcie_ops.link_up() and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay, keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx, tegra194, uniphier, visconti (Hans Zhang)
- Add debugfs support for exposing DWC device-specific PTM context (Manivannan Sadhasivam)

TI J721E PCIe driver:
- Make j721e buildable as a loadable and removable module (Siddharth Vadapalli)
- Fix j721e host/endpoint dependencies that result in link failures in some configs (Arnd Bergmann)

Device tree bindings:
- Add qcom DT binding for 'global' interrupt (PCIe controller and link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p, sc7280, sc8180x, sdm845, sm8150, sm8250, sm8350 (Manivannan Sadhasivam)
- Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074, ipq8074-gen3, ipq6018 (Manivannan Sadhasivam)
- Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang)
- Correct indentation and style of examples in brcm,stb-pcie, cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie, microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm (Krzysztof Kozlowski)
- Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and armada8k from text to schema DT bindings (Rob Herring)
- Remove obsolete .txt DT bindings for content that has been moved to schemas (Rob Herring)
- Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074 and IPQ9574 (Varadarajan Narayanan)
- Convert v3,v360epc-pci from text to DT schema binding (Rob Herring)
- Change microchip,pcie-host DT binding to be 'dma-noncoherent' since PolarFire may be configured that way (Conor Dooley)

Miscellaneous:
- Drop 'pci' suffix from intel_mid_pci.c filename to match similar files (Andy Shevchenko)
- All platforms with PCI have an MMU, so add PCI Kconfig dependency on MMU to simplify build testing and avoid inadvertent build regressions (Arnd Bergmann)
- Update Krzysztof Wilczyński's email address in MAINTAINERS (Krzysztof Wilczyński)
- Update Manivannan Sadhasivam's email address in MAINTAINERS (Manivannan Sadhasivam)"

* tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits)
  MAINTAINERS: Update Manivannan Sadhasivam email address
  PCI: j721e: Fix host/endpoint dependencies
  PCI: j721e: Add support to build as a loadable module
  PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup
  PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup
  PCI: cadence: Add support to build pcie-cadence library as a kernel module
  MAINTAINERS: Update Krzysztof Wilczyński email address
  PCI: Remove unnecessary linesplit in __pci_setup_bridge()
  PCI: WARN (not BUG()) when we fail to assign optional resources
  PCI: Remove unused pci_printk()
  PCI: qcom: Replace PERST# sleep time with proper macro
  PCI: dw-rockchip: Replace PERST# sleep time with proper macro
  PCI: host-common: Convert to library for host controller drivers
  PCI/ERR: Remove misleading TODO regarding kernel panic
  PCI: cadence: Remove duplicate message code definitions
  PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding
  PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding
  PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding
  PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding
  PCI: cadence-ep: Correct PBA offset in .set_msix() callback
  ...
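The Error handling notes in the pull message above mention decoding the DPC Error Source ID into bus/device/function. A minimal userspace sketch of that decoding, assuming the ID uses the usual 16-bit Requester ID layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0); `struct bdf`, `decode_source_id()` and `format_bdf()` are hypothetical names for illustration, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for a decoded 16-bit Requester ID. */
struct bdf {
	uint8_t bus;	/* bits 15:8 */
	uint8_t dev;	/* bits 7:3 */
	uint8_t fn;	/* bits 2:0 */
};

/* Split a Requester ID into its bus, device and function fields. */
static struct bdf decode_source_id(uint16_t id)
{
	struct bdf b = {
		.bus = (uint8_t)(id >> 8),
		.dev = (uint8_t)((id >> 3) & 0x1f),
		.fn = (uint8_t)(id & 0x07),
	};

	return b;
}

/* Write "bb:dd.f" into buf; returns the length snprintf() reports. */
static int format_bdf(struct bdf b, char *buf, size_t len)
{
	assert(buf != NULL && len >= 8);
	return snprintf(buf, len, "%02x:%02x.%x",
			(unsigned int)b.bus, (unsigned int)b.dev,
			(unsigned int)b.fn);
}
```

With this layout, an ID of 0x01a3 decodes to bus 0x01, device 0x14, function 3.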
2025-06-04 Merge branch 'pci/controller/qcom' (Bjorn Helgaas)
- Add OF support for parsing DT 'eq-presets-<N>gts' property for lane equalization presets (Krishna Chaitanya Chundru)
- Read Maximum Link Width from the Link Capabilities register if DT lacks 'num-lanes' property (Krishna Chaitanya Chundru)
- Add Physical Layer 64 GT/s Capability ID and register offsets for 8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya Chundru)
- Add generic dwc support for configuring lane equalization presets (Krishna Chaitanya Chundru)
- Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar)

* pci/controller/qcom:
  PCI: qcom: Add support for IPQ5018
  dt-bindings: PCI: qcom: Add IPQ5018 SoC
  PCI: dwc: Add support for configuring lane equalization presets
  PCI: Add lane equalization register offsets
  PCI: dwc: Update pci->num_lanes to maximum supported link width
  PCI: of: Add of_pci_get_equalization_presets() API
2025-06-04 Merge branch 'pci/pm' (Bjorn Helgaas)
- Add pm_runtime_put() cleanup helper for use with __free() to automatically drop the device usage count when a pointer goes out of scope (Alex Williamson)
- Increment PM usage counter when probing reset methods so we don't try to read config space of a powered-off device (Alex Williamson)
- Set all devices to D0 during enumeration to ensure ACPI opregion is connected via _REG (Mario Limonciello)

* pci/pm:
  PCI: Explicitly put devices into D0 when initializing
  PCI: Increment PM usage counter when probing reset methods
  PM: runtime: Define pm_runtime_put cleanup helper
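The first item of this merge describes dropping a usage count automatically when a pointer goes out of scope. A userspace sketch of that scope-based pattern, assuming a GCC or Clang compiler (it relies on the `cleanup` variable attribute); every name here (`struct fake_dev`, `fake_pm_get()`, `fake_pm_put()`, `auto_put`, `probe_reset_methods()`) is a hypothetical stand-in, not the kernel helper itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with a runtime PM usage counter. */
struct fake_dev {
	int usage_count;
	int peak;	/* highest count observed while a reference was held */
};

static void fake_pm_get(struct fake_dev *dev) { dev->usage_count++; }
static void fake_pm_put(struct fake_dev *dev) { dev->usage_count--; }

/* Cleanup callback: the compiler passes a pointer to the annotated variable. */
static void put_on_scope_exit(struct fake_dev **devp)
{
	if (*devp)
		fake_pm_put(*devp);
}

/* Userspace analogue of annotating a pointer for automatic release. */
#define auto_put __attribute__((cleanup(put_on_scope_exit)))

/* Hold a reference for the duration of the function, on every return path. */
static int probe_reset_methods(struct fake_dev *dev, int fail_early)
{
	fake_pm_get(dev);
	struct fake_dev *ref auto_put = dev;

	dev->peak = ref->usage_count;	/* counter is elevated while we work */
	if (fail_early)
		return -1;	/* the put still runs on this path */

	return 0;		/* and on this one */
}
```

The point of the pattern is that early returns cannot leak the reference, which is easy to get wrong with manual get/put pairs.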
2025-06-04 Merge branch 'pci/hotplug' (Bjorn Helgaas)
- Ignore Presence Detect Changed caused by DPC. pciehp already ignores Link Down/Up events caused by DPC, but on slots using in-band presence detect, DPC causes a spurious Presence Detect Changed event (Lukas Wunner)
- Ignore Link Down/Up caused by Secondary Bus Reset. On hotplug ports using in-band presence detect, the reset causes a Presence Detect Changed event, which mistakenly caused teardown and re-enumeration of the device. Drivers may need to annotate code that resets their device (Lukas Wunner)

* pci/hotplug:
  PCI: hotplug: Drop superfluous #include directives
  PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
  PCI: pciehp: Ignore Presence Detect Changed caused by DPC

# Conflicts:
#	drivers/pci/pci.h
2025-06-04 Merge branch 'pci/enumeration' (Bjorn Helgaas)
- Remove pci_fixup_cardbus(), which has no users left (Heiner Kallweit)
- Print the actual delay time in pci_bridge_wait_for_secondary_bus() instead of assuming it was 1000ms (Wilfred Mallawa)
- Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices', which broke resume from system sleep on AMD platforms and has been fixed by other commits (Lukas Wunner)
- Restrict visibility of pci_dev.match_driver since it's no longer used outside the PCI core (Lukas Wunner)

* pci/enumeration:
  PCI: Limit visibility of match_driver flag to PCI core
  Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices"
  PCI: Print the actual delay time in pci_bridge_wait_for_secondary_bus()
  PCI: Use PCI_STD_NUM_BARS instead of 6
  PCI: Remove pci_fixup_cardbus()

# Conflicts:
#	drivers/pci/pci.h
2025-06-04 Merge branch 'pci/devres' (Bjorn Helgaas)
- Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and unnecessary (Philipp Stanner)
- Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and related flags since all uses have been removed (Philipp Stanner)
- Rework devres 'request' functions so they are no longer 'hybrid', i.e., their behavior no longer depends on whether pcim_enable_device() or pci_enable_device() was used, and remove related code (Philipp Stanner)

* pci/devres:
  PCI: Remove function pcim_intx() prototype from pci.h
  PCI: Remove hybrid-devres usage warnings from kernel-doc
  PCI: Remove redundant set of request functions
  PCI: Remove exclusive requests flags from _pcim_request_region()
  PCI: Remove pcim_request_region_exclusive()
  Documentation/driver-api: Update pcim_enable_device()
  PCI: Remove hybrid devres nature from request functions
  PCI: Remove pcim_iounmap_regions()
  mtip32xx: Remove unnecessary pcim_iounmap_regions() calls
2025-06-04 Merge branch 'pci/bwctrl' (Bjorn Helgaas)
- Simplify link bandwidth controller by replacing the count of Link Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag (Ilpo Järvinen)
- Update the Link Speed after retraining, since the Link Speed may have changed (Ilpo Järvinen)

* pci/bwctrl:
  PCI: Update Link Speed after retraining
  PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
2025-05-23 PCI/AER: Add sysfs attributes for log ratelimits (Jon Pan-Doh)
Allow userspace to read/write log ratelimits per device (including enable/disable). Create aer/ sysfs directory to store them and any future AER configs.

The new sysfs files are:

  /sys/bus/pci/devices/*/aer/correctable_ratelimit_burst
  /sys/bus/pci/devices/*/aer/correctable_ratelimit_interval_ms
  /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_burst
  /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_interval_ms

The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so if we try to emit more than 10 messages in a 5 second period, some are suppressed.

Update AER sysfs ABI filename to reflect the broader scope of AER sysfs attributes (e.g. stats and ratelimits).

  Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats -> sysfs-bus-pci-devices-aer

Tested using aer-inject[1]. Configured correctable log ratelimit to 5. Sent 6 AER errors. Observed 5 errors logged while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 6. Disabled ratelimiting and sent 6 more AER errors. Observed all 6 errors logged and accounted in AER stats (12 total errors).

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

[bhelgaas: note fatal errors are not ratelimited, "aer_report" -> "aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms]
Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-21-helgaas@kernel.org
2025-05-23 PCI/AER: Ratelimit correctable and non-fatal error logging (Jon Pan-Doh)
Spammy devices can flood kernel logs with AER errors and slow/stall execution. Add per-device ratelimits for AER correctable and non-fatal uncorrectable errors that use the kernel defaults (10 per 5s). Logging of fatal errors is not ratelimited.

There are two AER logging entry points:

  - aer_print_error() is used by DPC and native AER
  - pci_print_aer() is used by GHES and CXL

The native AER aer_print_error() case includes a loop that may log details from multiple devices, which are ratelimited individually. If we log details for any device, we also log the Error Source ID from the Root Port or RCEC.

If no such device details are found, we still log the Error Source from the ERR_* Message, ratelimited by the Root Port or RCEC that received it.

The DPC aer_print_error() case is not ratelimited, since this only happens for fatal errors.

The CXL pci_print_aer() case is ratelimited by the Error Source device.

The GHES pci_print_aer() case is via aer_recover_work_func(), which searches for the Error Source device. If the device is not found, there's no per-device ratelimit, so we use a system-wide ratelimit that covers all error types (correctable, non-fatal, and fatal).

Sargun at Meta reported internally that a flood of AER errors causes RCU CPU stall warnings and CSD-lock warnings.

Tested using aer-inject[1]. Sent 11 AER errors. Observed 10 errors logged while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) show true count of 11.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

[bhelgaas: commit log, factor out trace_aer_event() and aer_print_rp_info() changes to previous patches, enable Error Source logging if any downstream detail will be printed, don't ratelimit fatal errors, "aer_report" -> "aer_info", "cor_log_ratelimit" -> "correctable_ratelimit", "uncor_log_ratelimit" -> "nonfatal_ratelimit"]
Reported-by: Sargun Dhillon <sargun@meta.com>
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-19-helgaas@kernel.org
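The burst/interval behaviour described in this commit (10 messages per 5 seconds by default, 11 sent and 10 logged) can be modelled in a few lines. This is a simplified userspace sketch, not the kernel's ratelimit code: `struct ratelimit`, `ratelimit_allow()` and `count_logged()` are hypothetical names, time is passed in explicitly in milliseconds, and treating an interval of 0 as "ratelimiting disabled" is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a burst/interval ratelimit. */
struct ratelimit {
	uint64_t interval_ms;	/* window length; 0 disables limiting (assumed) */
	unsigned int burst;	/* messages allowed per window */
	uint64_t window_start;	/* start time of the current window */
	unsigned int printed;	/* messages emitted in this window */
	unsigned int missed;	/* messages suppressed in this window */
};

/* Return true if a message arriving at now_ms may be logged. */
static bool ratelimit_allow(struct ratelimit *rs, uint64_t now_ms)
{
	if (rs->interval_ms == 0)
		return true;

	/* A new window starts once the interval has elapsed. */
	if (now_ms - rs->window_start >= rs->interval_ms) {
		rs->window_start = now_ms;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}

/* Feed n events spaced step_ms apart and count how many are logged. */
static unsigned int count_logged(struct ratelimit *rs, unsigned int n,
				 uint64_t start_ms, uint64_t step_ms)
{
	unsigned int logged = 0;

	for (unsigned int i = 0; i < n; i++)
		if (ratelimit_allow(rs, start_ms + (uint64_t)i * step_ms))
			logged++;

	return logged;
}
```

In this model, suppressed events are only hidden from the log; a separate statistics counter, such as the AER stats mentioned above, would still see every event.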
2025-05-23 PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to index (Bjorn Helgaas)
Previously aer_get_device_error_info() and aer_print_error() took a pointer to struct aer_err_info and a pointer to a pci_dev. Typically the pci_dev was one of the elements of the aer_err_info.dev[] array (DPC was an exception, where the dev[] array was unused).

Convert aer_get_device_error_info() and aer_print_error() to take an index into the aer_err_info.dev[] array instead. A future patch will add per-device ratelimit information, so the index makes it convenient to find the ratelimit associated with the device.

To accommodate DPC, set info->dev[0] to the DPC port before using these interfaces.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-17-helgaas@kernel.org
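To illustrate why an index is convenient here, the sketch below keeps per-device state in an array parallel to dev[], so one index locates both the device and its state. It is a userspace model under stated assumptions: `struct err_info`, `print_allowed[]`, `MAX_ERR_DEVICES`, `add_error_device()` and `device_to_report()` are hypothetical and do not reproduce the layout of struct aer_err_info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERR_DEVICES 5	/* illustrative bound, not taken from the source */

struct fake_dev {
	const char *name;
};

struct err_info {
	struct fake_dev *dev[MAX_ERR_DEVICES];
	bool print_allowed[MAX_ERR_DEVICES];	/* same index as dev[] */
	int error_dev_num;
};

/* Record a device and its per-device decision; returns its index, or -1 if full. */
static int add_error_device(struct err_info *info, struct fake_dev *dev,
			    bool allowed)
{
	int i = info->error_dev_num;

	if (i >= MAX_ERR_DEVICES)
		return -1;

	info->dev[i] = dev;
	info->print_allowed[i] = allowed;
	info->error_dev_num++;
	return i;
}

/* Index-based interface: returns the device name to report, or NULL if suppressed. */
static const char *device_to_report(const struct err_info *info, int i)
{
	assert(i >= 0 && i < info->error_dev_num);
	return info->print_allowed[i] ? info->dev[i]->name : NULL;
}
```

A caller with only one device, like the DPC case described above, would store it at index 0 and then use the same index-based interface.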
2025-05-23 PCI/ERR: Add printk level to pcie_print_tlp_log() (Bjorn Helgaas)
aer_print_error() produces output at a printk level (KERN_ERR/KERN_WARNING/ etc) that depends on the kind of error, and it calls pcie_print_tlp_log(), which previously always produced output at KERN_ERR.

Add a "level" parameter so aer_print_error() can control the level of the pcie_print_tlp_log() output to match.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-14-helgaas@kernel.org
2025-05-23 PCI/AER: Check log level once and remember it (Karolina Stolarek)
When reporting an AER error, we check its type multiple times to determine the log level for each message. Do this check only in the top-level functions (aer_isr_one_error(), pci_print_aer()) and save the level in struct aer_err_info.

[bhelgaas: save log level in struct aer_err_info instead of passing it as a parameter]
Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20250522232339.1525671-13-helgaas@kernel.org
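The "decide once, reuse everywhere" idea can be sketched as follows. This is a userspace illustration, not the kernel code: `enum severity`, `struct err_info`, `init_err_info()` and `format_msg()` are hypothetical, and the strings "warning" and "err" stand in for the printk levels. The mapping of correctable errors to the warning level follows the v6.16 pull message; using the error level for everything else is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>

enum severity { SEV_CORRECTABLE, SEV_NONFATAL, SEV_FATAL };

struct err_info {
	enum severity severity;
	const char *level;	/* chosen once, reused by every message */
};

/* Top-level entry point: classify the error and remember the level. */
static void init_err_info(struct err_info *info, enum severity sev)
{
	info->severity = sev;
	info->level = (sev == SEV_CORRECTABLE) ? "warning" : "err";
}

/* Helpers take the saved level instead of re-deriving it from the error type. */
static int format_msg(const struct err_info *info, const char *text,
		      char *buf, size_t len)
{
	return snprintf(buf, len, "<%s> %s", info->level, text);
}
```

Because every helper reads the same saved field, all messages for one error come out at a consistent level, which is the property the commit is after.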
2025-05-22 PCI: Remove function pcim_intx() prototype from pci.h (Philipp Stanner)
The subsystem-internal header pci.h still contains the function prototype of pcim_intx(), which has since been made public in the global header.

Remove the redundant function prototype.

Signed-off-by: Philipp Stanner <phasta@kernel.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20250522084626.150148-2-phasta@kernel.org
2025-05-19 PCI: Remove pcim_request_region_exclusive() (Philipp Stanner)
pcim_request_region_exclusive() was only needed for redirecting the relatively exotic exclusive request functions in pci.c in case of them operating in managed mode.

The managed nature has been removed from those functions and no one else uses pcim_request_region_exclusive().

Remove pcim_request_region_exclusive().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/20250519112959.25487-5-phasta@kernel.org
2025-05-19 PCI: Remove hybrid devres nature from request functions (Philipp Stanner)
All functions based on __pci_request_region() and its release counterpart support "hybrid mode", where the functions become managed if the PCI device was enabled with pcim_enable_device().

Removing this undesirable feature requires removing all users that activate their device with that function and use one of the affected request functions.

These users were:

  ASoC
  alsa
  cardreader
  cirrus
  i2c
  mmc
  mtd
  mtd
  mxser
  net
  spi
  vdpa
  vmwgfx

all of which have been ported to always-managed pcim_ functions by now.

The hybrid nature can, thus, be removed from the aforementioned PCI functions.

Remove all function guards and documentation in pci.c related to the hybrid redirection. Adjust the visibility of pcim_release_region().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/20250519112959.25487-3-phasta@kernel.org
2025-05-15 PCI: Limit visibility of match_driver flag to PCI core (Lukas Wunner)
Since commit 58d9a38f6fac ("PCI: Skip attaching driver in device_add()"), PCI enumeration is split into two steps: In the first step, all devices are published in sysfs with device_add(). In the second step, drivers are bound to the devices with device_attach(). To delay driver binding until the second step, a "bool match_driver" in struct pci_dev is used.

Instead of a bool, use a bit in the "unsigned long priv_flags" to shrink struct pci_dev a little and prevent use of the bool outside the PCI core (as has happened with commit cbbc00be2ce3 ("iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices")).

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/d22a9e5b81d6bd8dd1837607d6156679b3b1199c.1745572340.git.lukas@wunner.de
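Replacing a bool member with one bit of a shared flags word might look roughly like the sketch below. It is a userspace model using C11 atomics in place of the kernel's bit helpers; `struct fake_pci_dev`, `FAKE_ALLOW_BINDING`, `set_flag()`, `test_flag()` and the two step functions are hypothetical names, and the bit number is arbitrary.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FAKE_ALLOW_BINDING 2	/* hypothetical bit number */

struct fake_pci_dev {
	atomic_ulong priv_flags;	/* many one-bit flags share this word */
};

/* Atomically set or clear a single bit without disturbing its neighbours. */
static void set_flag(struct fake_pci_dev *d, unsigned int bit, bool on)
{
	unsigned long mask = 1UL << bit;

	if (on)
		atomic_fetch_or(&d->priv_flags, mask);
	else
		atomic_fetch_and(&d->priv_flags, ~mask);
}

static bool test_flag(struct fake_pci_dev *d, unsigned int bit)
{
	return (atomic_load(&d->priv_flags) & (1UL << bit)) != 0;
}

/* Step one: the device is published, but binding stays disallowed. */
static void publish_device(struct fake_pci_dev *d)
{
	set_flag(d, FAKE_ALLOW_BINDING, false);
}

/* Step two: binding is allowed and a driver may now match. */
static bool attach_driver(struct fake_pci_dev *d)
{
	set_flag(d, FAKE_ALLOW_BINDING, true);
	return test_flag(d, FAKE_ALLOW_BINDING);
}
```

Keeping the flag in a private flags word, with accessors that live in a subsystem-internal header, is what makes it hard for code outside the core to depend on it.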
2025-05-15 PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag (Ilpo Järvinen)
PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). There was also a plan to expose the LBMS count through sysfs to allow better diagnosing of link-related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count.

The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem.

Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk.

The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through different branches).

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
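The check the quirk needs, "has LBMS been seen, either recorded earlier or still latched in the Link Status register", can be sketched like this. It is a userspace illustration only: `FAKE_LBMS_SEEN_BIT`, `note_lbms()` and `lbms_seen()` are hypothetical names, while the LBMS mask reflects bit 14 of the PCIe Link Status register.

```c
#include <stdbool.h>
#include <stdint.h>

#define LNKSTA_LBMS 0x4000	/* Link Bandwidth Management Status, bit 14 */
#define FAKE_LBMS_SEEN_BIT 3	/* hypothetical flag number */

/* Record that an LBMS assertion was observed; no count is kept. */
static unsigned long note_lbms(unsigned long priv_flags)
{
	return priv_flags | (1UL << FAKE_LBMS_SEEN_BIT);
}

/*
 * True if LBMS was recorded earlier or is set in the register right now.
 * The register fallback covers the case where the quirk runs before
 * anything has had a chance to record the flag.
 */
static bool lbms_seen(unsigned long priv_flags, uint16_t lnksta)
{
	if (priv_flags & (1UL << FAKE_LBMS_SEEN_BIT))
		return true;

	return (lnksta & LNKSTA_LBMS) != 0;
}
```

Since the caller only asks a yes/no question, a single bit carries all the information the former counter provided to it.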
2025-05-05 PCI: Explicitly put devices into D0 when initializing (Mario Limonciello)
The AMD BIOS team root-caused an issue where NVMe storage failed to come back from suspend to the lack of a call to _REG when the NVMe device was probed.

112a7f9c8edbf ("PCI/ACPI: Call _REG when transitioning D-states") added support for calling _REG when transitioning D-states, but this only works if the device actually "transitions" D-states.

967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices") added support for runtime PM on PCI devices, but never actually 'explicitly' sets the device to D0.

To make sure that devices are in D0 and that platform methods such as _REG are called, explicitly set all devices into D0 during initialization.

Fixes: 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Denis Benato <benato.denis96@gmail.com>
Tested-By: Yijun Shen <Yijun_Shen@Dell.com>
Tested-By: David Perry <david.perry@amd.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://patch.msgid.link/20250424043232.1848107-1-superm1@kernel.org
2025-04-19 PCI: of: Add of_pci_get_equalization_presets() API (Krishna Chaitanya Chundru)
PCIe equalization presets are predefined settings used to optimize signal integrity by compensating for signal loss and distortion in high-speed data transmission.

As per PCIe spec 6.0.1 revision section 8.3.3.3 & 4.2.4 for data rates of 8.0 GT/s, 16.0 GT/s, 32.0 GT/s, and 64.0 GT/s, there is a way to configure lane equalization presets for each lane to enhance the PCIe link reliability. Each preset value represents a different combination of pre-shoot and de-emphasis values. For each data rate, different registers are defined: for 8.0 GT/s, registers are defined in section 7.7.3.4; for 16.0 GT/s, in section 7.7.5.9, etc. The 8.0 GT/s rate has an extra receiver preset hint, requiring 16 bits per lane, while the remaining data rates use 8 bits per lane.

Based on the number of lanes and the supported data rate, of_pci_get_equalization_presets() reads the device tree property and stores it in the presets structure.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://patch.msgid.link/20250328-preset_v6-v9-2-22cfa0490518@oss.qualcomm.com
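The per-lane sizes quoted above (16 bits per lane at 8.0 GT/s, 8 bits per lane at the other rates) translate into a simple size calculation. The sketch below is illustrative only: `preset_bytes()` and `preset_property_name()` are hypothetical helpers, and forming the property name by substituting the rate for `<N>` in 'eq-presets-<N>gts' (giving, for example, "eq-presets-16gts") is an assumption based on the property pattern named in the merge notes.

```c
#include <stddef.h>
#include <stdio.h>

/* Bytes of preset data needed for a given rate (in GT/s) and lane count. */
static size_t preset_bytes(unsigned int gts, unsigned int lanes)
{
	/* 8.0 GT/s carries an extra receiver preset hint per lane. */
	size_t bits_per_lane = (gts == 8) ? 16 : 8;

	return (size_t)lanes * bits_per_lane / 8;
}

/* Build a property name following the 'eq-presets-<N>gts' pattern (assumed). */
static int preset_property_name(unsigned int gts, char *buf, size_t len)
{
	return snprintf(buf, len, "eq-presets-%ugts", gts);
}
```

For a four-lane link this gives 8 bytes of preset data at 8.0 GT/s and 4 bytes at each higher rate.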
2025-04-15 PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset (Lukas Wunner)
When a Secondary Bus Reset is issued at a hotplug port, it causes a Data Link Layer State Changed event as a side effect. On hotplug ports using in-band presence detect, it additionally causes a Presence Detect Changed event.

These spurious events should not result in teardown and re-enumeration of the device in the slot. Hence commit 2e35afaefe64 ("PCI: pciehp: Add reset_slot() method") masked the Presence Detect Changed Enable bit in the Slot Control register during a Secondary Bus Reset. Commit 06a8d89af551 ("PCI: pciehp: Disable link notification across slot reset") additionally masked the Data Link Layer State Changed Enable bit.

However masking those bits only disables interrupt generation (PCIe r6.2 sec 6.7.3.1). The events are still visible in the Slot Status register and picked up by the IRQ handler if it runs during a Secondary Bus Reset. This can happen if the interrupt is shared or if an unmasked hotplug event occurs, e.g. Attention Button Pressed or Power Fault Detected.

The likelihood of this happening used to be small, so it wasn't much of a problem in practice. That has changed with the recent introduction of bandwidth control in v6.13-rc1 with commit 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller"): Bandwidth control shares the interrupt with PCIe hotplug. A Secondary Bus Reset causes a Link Bandwidth Notification, so the hotplug IRQ handler runs, picks up the masked events and tears down the device in the slot.

As a result, Joel reports VFIO passthrough failure of a GPU, which Ilpo root-caused to the incorrect handling of masked hotplug events.

Clearly, a more reliable way is needed to ignore spurious hotplug events. For Downstream Port Containment, a new ignore mechanism was introduced by commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC"). It has been working reliably for the past four years. Adapt it for Secondary Bus Resets.

Introduce two helpers to annotate code sections which cause spurious link changes:

  pci_hp_ignore_link_change()
  pci_hp_unignore_link_change()

Use those helpers in lieu of masking interrupts in the Slot Control register.

Introduce a helper to check whether such a code section is executing concurrently and if so, await it:

  pci_hp_spurious_link_change()

Invoke the helper in the hotplug IRQ thread pciehp_ist(). Re-use the IRQ thread's existing code which ignores DPC-induced link changes unless the link is unexpectedly down after reset recovery or the device was replaced during the bus reset.

That code block in pciehp_ist() was previously only executed if a Data Link Layer State Changed event has occurred. Additionally execute it for Presence Detect Changed events. That's necessary for compatibility with PCIe r1.0 hotplug ports because Data Link Layer State Changed didn't exist before PCIe r1.1. DPC was added with PCIe r3.1 and thus DPC-capable hotplug ports always support Data Link Layer State Changed events. But the same cannot be assumed for Secondary Bus Reset, which already existed in PCIe r1.0.

Secondary Bus Reset is only one of many causes of spurious link changes. Others include runtime suspend to D3cold, firmware updates or FPGA reconfiguration. The new pci_hp_{,un}ignore_link_change() helpers may be used by all kinds of drivers to annotate such code sections, hence their declarations are publicly visible in <linux/pci.h>. A case in point is the Mellanox Ethernet driver which disables a firmware reset feature if the Ethernet card is attached to a hotplug port, see commit 3d7a3f2612d7 ("net/mlx5: Nack sync reset request when HotPlug is enabled"). Going forward, PCIe hotplug will be able to cope gracefully with all such use cases once the code sections are properly annotated.

The new helpers internally use two bits in struct pci_dev's priv_flags as well as a wait_queue. This mirrors what was done for DPC by commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC"). That may be insufficient if spurious link changes are caused by multiple sources simultaneously. An example might be a Secondary Bus Reset issued by AER during FPGA reconfiguration. If this turns out to happen in real life, support for it can easily be added by replacing the PCI_LINK_CHANGING flag with an atomic_t counter incremented by pci_hp_ignore_link_change() and decremented by pci_hp_unignore_link_change(). Instead of awaiting a zero PCI_LINK_CHANGING flag, the pci_hp_spurious_link_change() helper would then simply await a zero counter.

Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Joel Mathew Thomas <proxy0@tutamail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/d04deaf49d634a2edf42bf3c06ed81b4ca54d17b.1744298239.git.lukas@wunner.de
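The counter-based variant that the commit message describes as a possible follow-up could look roughly like the user-space sketch below. It is not the kernel implementation: struct demo_port and the demo_*() functions are hypothetical stand-ins, C11 atomics replace atomic_t and priv_flags, and a busy-wait replaces sleeping on the wait queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device state (not struct pci_dev). */
struct demo_port {
	atomic_int link_changing;	/* sections currently bouncing the link */
	atomic_bool link_changed;	/* a section finished since the last check */
};

/* Enter a code section that is expected to cause spurious link changes. */
static void demo_ignore_link_change(struct demo_port *p)
{
	atomic_fetch_add(&p->link_changing, 1);
}

/* Leave the section: record the expected change, then drop the count. */
static void demo_unignore_link_change(struct demo_port *p)
{
	int prev;

	atomic_store(&p->link_changed, true);
	prev = atomic_fetch_sub(&p->link_changing, 1);
	assert(prev > 0);	/* an unbalanced unignore would be a caller bug */
}

/*
 * Await a zero counter, then report (and clear) whether a link change was
 * expected. The busy-wait stands in for sleeping on a wait queue.
 */
static bool demo_spurious_link_change(struct demo_port *p)
{
	while (atomic_load(&p->link_changing) > 0)
		;
	return atomic_exchange(&p->link_changed, false);
}
```

With a counter, two overlapping sections (say, a bus reset issued during FPGA reconfiguration) keep the IRQ thread waiting until both have called the unignore helper, which a single flag cannot express.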
2025-04-09 PCI/MSI: Provide a sane mechanism for TPH (Thomas Gleixner)
The PCI/TPH driver fiddles with the MSI-X control word of an active interrupt completely unserialized against concurrent operations issued from the interrupt core. It also brings the PCI/MSI-X internal cached control word out of sync. Provide a function which has the required serialization and keeps the control word cache in sync. Unfortunately this requires looking up and locking the interrupt descriptor, which should only be done in the interrupt core code. But confining this particular oddity to the PCI/MSI core is the lesser of all evils. An interrupt core implementation would require a larger pile of infrastructure and indirections for dubious value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/all/20250319105506.683663807@linutronix.de
2025-03-27 Merge branch 'pci/devtree-create' (Bjorn Helgaas)
- Add device_add_of_node() to set dev->of_node and dev->fwnode only if they haven't been set already (Herve Codina)
- Allow of_pci_set_address() to set the DT address property for root bus nodes, where there is no PCI bridge to supply the PCI bus/device/function part of the property (Herve Codina)
- Create DT nodes for PCI host bridges to enable loading device tree overlays to create platform devices for PCI devices that have several features that require multiple drivers (Herve Codina)

* pci/devtree-create:
  PCI: of: Create device tree PCI host bridge node
  PCI: of_property: Constify parameter in of_pci_get_addr_flags()
  PCI: of_property: Add support for NULL pdev in of_pci_set_address()
  PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device
  driver core: Introduce device_{add,remove}_of_node()
2025-03-27 Merge branch 'pci/resource' (Bjorn Helgaas)
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo Järvinen)
- Fix typo that repeatedly distributed resources to a bridge instead of iterating over subordinate bridges, which resulted in too little space to assign some BARs (Kai-Heng Feng)
- Relax bridge window tail sizing for optional resources, e.g., IOV BARs, to avoid failures when removing and re-adding devices (Ilpo Järvinen)
- Fix a double counting error for I/O resources, as we previously did for memory resources (Ilpo Järvinen)
- Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen)
- Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen)
- Add pci_resource_num() to look up the BAR number from the resource pointer (Ilpo Järvinen)
- Add restore_dev_resource() to simplify code that restores saved device resources (Ilpo Järvinen)
- Allow drivers to enable devices even if we haven't assigned optional IOV resources to them (Ilpo Järvinen)
- Improve debug output during resource reallocation (Ilpo Järvinen)
- Rework handling of optional resources (IOV BARs, ROMs) to reduce failures if we can't allocate them (Ilpo Järvinen)
- Move declarations of pci_rescan_bus_bridge_resize(), pci_reassign_bridge_resources(), and CardBus-related sizes from include/linux/pci.h to drivers/pci/pci.h since they're not used outside the PCI core (Ilpo Järvinen)
- Make pci_setup_bridge() static (Ilpo Järvinen)
- Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory)
- Fix s390 mmio_read/write syscalls, which didn't cause page faults in some cases, which broke vfio-pci lazy mapping on first access (Niklas Schnelle)
- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was disabled only for s390 (Niklas Schnelle)
- Support mmap of PCI resources on s390 except for ISM devices (Niklas Schnelle)

* pci/resource:
  s390/pci: Support mmap() of PCI resources except for ISM devices
  s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
  s390/pci: Fix s390_mmio_read/write syscall page fault handling
  PCI: Fix NULL dereference in SR-IOV VF creation error path
  PCI: Move cardbus IO size declarations into pci/pci.h
  PCI: Make pci_setup_bridge() static
  PCI: Move resource reassignment func declarations into pci/pci.h
  PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
  PCI: Fix BAR resizing when VF BARs are assigned
  PCI: Do not claim to release resource falsely
  PCI: Increase Resizable BAR support from 512 GB to 128 TB
  PCI: Rework optional resource handling
  PCI: Perform reset_resource() and build fail list in sync
  PCI: Use res->parent to check if resource is assigned
  PCI: Add debug print when releasing resources before retry
  PCI: Indicate optional resource assignment failures
  PCI: Always have realloc_head in __assign_resources_sorted()
  PCI: Extend enable to check for any optional resource
  PCI: Add restore_dev_resource()
  PCI: Remove incorrect comment from pci_reassign_resource()
  PCI: Consolidate assignment loop next round preparation
  PCI: Rename retval to ret
  PCI: Use while loop and break instead of gotos
  PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
  PCI: Converge return paths in __assign_resources_sorted()
  PCI: Add dev & res local variables to resource assignment funcs
  PCI: Add pci_resource_num() helper
  PCI: Check resource_size() separately
  PCI: Add pci_resource_is_iov() to identify IOV resources
  PCI: Use resource_set_{range,size}() helpers
  PCI: Use SZ_* instead of literals in setup-bus.c
  PCI: Fix old_size lower bound in calculate_iosize() too
  PCI: Allow relaxed bridge window tail sizing for optional resources
  PCI: Simplify size1 assignment logic
  PCI: Use min_align, not unrelated add_align, for size0
  PCI: Remove add_align overwrite unrelated to size0
  PCI: Use downstream bridges for distributing resources
  PCI: Cleanup dev->resource + resno to use pci_resource_n()
2025-03-27 Merge branch 'pci/enumeration' (Bjorn Helgaas)
- Enable Configuration RRS SV early instead of during child bus scanning (Bjorn Helgaas)
- Cache offset of Resizable BAR capability to avoid redundant searches for it (Bjorn Helgaas)
- Fix reference leaks in pci_register_host_bridge() and pci_alloc_child_bus() (Ma Ke)
- Drop put_device() in pci_register_host_bridge() left over from converting device_register() to device_add() (Dan Carpenter)

* pci/enumeration:
  PCI: Remove stray put_device() in pci_register_host_bridge()
  PCI: Fix reference leak in pci_alloc_child_bus()
  PCI: Fix reference leak in pci_register_host_bridge()
  PCI: Cache offset of Resizable BAR capability
  PCI: Enable Configuration RRS SV early
2025-03-27 Merge branch 'pci/doe' (Bjorn Helgaas)
- Rename DOE 'protocol' to 'feature' to follow spec terminology (Alistair Francis)
- Expose supported DOE features via sysfs (Alistair Francis)
- Allow DOE support to be enabled even if CXL isn't enabled (Alistair Francis)

* pci/doe:
  PCI/DOE: Allow enabling DOE without CXL
  PCI/DOE: Expose DOE features via sysfs
  PCI/DOE: Rename Discovery Response Data Object Contents to type
  PCI/DOE: Rename DOE protocol to feature
2025-03-27 Merge branch 'pci/devres' (Bjorn Helgaas)
- Enlarge the devres table[] to accommodate bridge windows, ROM, IOV BARs, etc (Philipp Stanner)
- Validate BAR index in devres interfaces (Philipp Stanner)

* pci/devres:
  PCI: Check BAR index for validity
  PCI: Fix wrong length of devres array
2025-03-21 PCI/DOE: Expose DOE features via sysfs (Alistair Francis)
PCIe r6.0 added support for Data Object Exchange (DOE). When DOE is supported, the DOE Discovery Feature must be implemented per PCIe r6.1, sec 6.30.1.1. DOE allows a requester to obtain information about the other DOE features supported by the device.

The kernel already queries the DOE features supported and caches the values. Expose the values in sysfs to allow user space to determine which DOE features are supported by the PCIe device.

By exposing the information to userspace, tools like lspci can relay the information to users. By listing all of the supported features we can allow userspace to parse the list, which might include vendor specific features as well as yet to be supported features.

As the DOE Discovery feature must always be supported we treat it as a special named attribute case. This allows the usual PCI attribute_group handling to correctly create the doe_features directory when registering pci_doe_sysfs_group (otherwise it doesn't and sysfs_add_file_to_group() will seg fault).

After this patch is supported you can see something like this when attaching a DOE device:

  $ ls /sys/devices/pci0000:00/0000:00:02.0//doe*
  0001:01  0001:02  doe_discovery

Link: https://lore.kernel.org/r/20250306075211.1855177-3-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
[bhelgaas: drop pci_doe_sysfs_init() stub return, make DEVICE_ATTR_RO(doe_discovery) static]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-20 PCI: Move cardbus IO size declarations into pci/pci.h (Ilpo Järvinen)
For some reason, cardbus related io/mem size declarations are in linux/pci.h, whereas non-cardbus sizes are already in pci/pci.h. Move them all into one place in pci/pci.h. Link: https://lore.kernel.org/r/20250311174701.3586-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20 PCI: Move resource reassignment func declarations into pci/pci.h (Ilpo Järvinen)
Neither pci_reassign_bridge_resources() nor pci_reassign_resource() is used outside of the PCI subsystem. They seem to be naturally static functions but since resource fitting/assignment is split between setup-bus.c and setup-res.c, they fall into different sides of the divide and need to be declared. Move the declarations of pci_reassign_bridge_resources() and pci_reassign_resource() into pci/pci.h to keep them internal to PCI subsystem. Link: https://lore.kernel.org/r/20250311174701.3586-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20 PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h (Ilpo Järvinen)
pci_rescan_bus_bridge_resize() is only used by code inside the PCI subsystem. The comment also falsely advertises it to be for hotplug drivers, yet the only caller is a sysfs store function. Move the function declaration into pci/pci.h. Link: https://lore.kernel.org/r/20250311174701.3586-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-14 PCI: Check BAR index for validity (Philipp Stanner)
Many functions in PCI use accessor macros such as pci_resource_len(), which take a BAR index. That index, however, is never checked for validity, potentially resulting in undefined behavior by overflowing the array pci_dev.resource in the macro pci_resource_n(). Since many users of those macros directly assign the accessed value to an unsigned integer, the macros cannot be changed easily anymore to return -EINVAL for invalid indexes. Consequently, the problem has to be mitigated in higher layers. Add pci_bar_index_valid(). Use it where appropriate. Link: https://lore.kernel.org/r/20250312080634.13731-4-phasta@kernel.org Closes: https://lore.kernel.org/all/adb53b1f-29e1-3d14-0e61-351fd2d3ff0d@linux.intel.com/ Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> [kwilczynski: correct if-statement condition the pci_bar_index_is_valid() helper function uses, tidy up code comments] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: fix typo] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
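The kind of bounds check this commit introduces can be illustrated with a small user-space sketch. Everything here is a hypothetical model: DEMO_NUM_RESOURCES stands in for the size of the pci_dev.resource array, and demo_bar_len() is a simplified analogue of a higher-layer caller that validates the index before using an accessor such as pci_resource_len().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_RESOURCES 6	/* hypothetical array size for this sketch */

struct demo_resource {
	uint64_t start;
	uint64_t end;
};

struct demo_dev {
	struct demo_resource resource[DEMO_NUM_RESOURCES];
};

/* Reject indexes that would run off either end of the resource array. */
static bool demo_bar_index_is_valid(int bar)
{
	return bar >= 0 && bar < DEMO_NUM_RESOURCES;
}

/*
 * Higher-layer wrapper: validate first, then compute the length.
 * Returns 0 on success and -1 for an invalid index.
 */
static int demo_bar_len(const struct demo_dev *dev, int bar, uint64_t *len)
{
	const struct demo_resource *res;

	assert(len != NULL);
	if (!demo_bar_index_is_valid(bar))
		return -1;

	res = &dev->resource[bar];
	*len = res->end ? res->end - res->start + 1 : 0;
	return 0;
}
```

Because the accessor itself yields an unsigned value, the error has to be reported by the wrapper's return code rather than by the length, which is the reason the commit places the check in higher layers.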
2025-03-10 PCI: Cache offset of Resizable BAR capability (Bjorn Helgaas)
Previously most resizable BAR interfaces (pci_rebar_get_possible_sizes(), pci_rebar_set_size(), etc) as well as pci_restore_state() searched config space for a Resizable BAR capability. Most devices don't have such a capability, so this is wasted effort, especially for pci_restore_state(). Search for a Resizable BAR capability once at enumeration-time and cache the offset so we don't have to search every time we need it. No functional change intended. Link: https://lore.kernel.org/r/20250215000301.175097-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
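The search-once-and-cache pattern can be sketched in user space as follows. The types and names (struct demo_dev, demo_find_ext_cap(), the rebar_cap field, the searches counter) are assumptions made for illustration; the capability list is a plain array rather than real config space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EXT_CAP_ID_REBAR 0x15	/* capability ID used by this sketch */

struct demo_cap {
	uint16_t id;
	uint16_t offset;
};

struct demo_dev {
	const struct demo_cap *caps;	/* emulated capability list */
	size_t ncaps;
	uint16_t rebar_cap;		/* cached offset, 0 if absent */
	unsigned int searches;		/* instrumentation: list walks so far */
};

/* Walk the emulated capability list; this is the expensive operation. */
static uint16_t demo_find_ext_cap(struct demo_dev *dev, uint16_t id)
{
	size_t i;

	dev->searches++;
	for (i = 0; i < dev->ncaps; i++)
		if (dev->caps[i].id == id)
			return dev->caps[i].offset;
	return 0;
}

/* Called once at enumeration time. */
static void demo_rebar_init(struct demo_dev *dev)
{
	dev->rebar_cap = demo_find_ext_cap(dev, DEMO_EXT_CAP_ID_REBAR);
}

/* Later users consult the cache and never search again. */
static int demo_rebar_get_offset(const struct demo_dev *dev, uint16_t *pos)
{
	assert(pos != NULL);
	if (!dev->rebar_cap)
		return -1;
	*pos = dev->rebar_cap;
	return 0;
}
```

For the common case of a device without the capability, the cached zero lets every later call return immediately, which is where the saving for restore paths comes from.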
2025-02-28 PCI: of: Create device tree PCI host bridge node (Herve Codina)
Device tree nodes for PCI devices can already be created. This was introduced by commit 407d1a51921e ("PCI: Create device tree node for bridge"). In order to have device tree nodes related to PCI devices attached on their PCI root bus (the PCI bus handled by the PCI host bridge), a PCI root bus device tree node is needed. This root bus node will be used as the parent node of the first level devices scanned on the bus. On device tree based systems, this PCI root bus device tree node is set to the node of the related PCI host bridge. The PCI host bridge node is available in the device tree used to describe the hardware passed at boot. On non device tree based systems (such as ACPI), a device tree node for the PCI host bridge or for the root bus does not exist. Indeed, the PCI host bridge is not described in a device tree used at boot simply because no device tree is passed at boot. The device tree PCI host bridge node creation needs to be done at runtime. This is done in the same way as for the creation of the PCI device nodes, i.e. node and properties are created based on information computed by the PCI core. Also, as is done on device tree based systems, this PCI host bridge node is used for the PCI root bus. With this done, hardware available in a PCI device that doesn't follow the PCI model of one PCI function handled by one driver can be described by a device tree overlay loaded by the PCI device driver on non device tree based systems. Those PCI devices provide a single PCI function that includes several functionalities that require different drivers. The device tree overlay describes the internal devices and their relationships. It allows loading the drivers needed by those different devices so that their functionalities are handled. 
Link: https://lore.kernel.org/r/20250224141356.36325-6-herve.codina@bootlin.com Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
2025-02-21 PCI/ERR: Handle TLP Log in Flit mode (Ilpo Järvinen)
Flit mode introduced in PCIe r6.0 alters how the TLP Header Log is presented through AER and DPC Capability registers. The TLP Prefix Log Register is not present with Flit mode, and the register becomes an extension of the TLP Header Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13). Adapt pcie_read_tlp_log() and struct pcie_tlp_log to read and store the extended TLP Header Log when the Link is in Flit mode. As the Prefix Log and Extended TLP Header are not present at the same time, a C union can be used. Determining whether the error occurred while the Link was in Flit mode is a bit complicated. In case of AER, the Advanced Error Capabilities and Control Register directly tells whether the error was logged in Flit mode or not (PCIe r6.1 sec 7.8.4.7). The DPC Capability (PCIe r6.1 sec 7.9.14), unfortunately, does not contain the same information. Unlike AER, the DPC Capability does not provide a way to discern whether the error was logged in Flit mode (this is confirmed by PCI WG to be an oversight in the spec). DPC will bring the Link down immediately following an error, which makes it impossible to acquire the Flit Mode Status directly from the Link Status 2 register because Flit Mode Status is only set in certain Link states (PCIe r6.1 sec 7.5.3.20). As a workaround, use the flit_mode value stored into the struct pci_bus. Link: https://lore.kernel.org/r/20250207161836.2755-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
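The union idea described here can be sketched as below. This is a simplified user-space model, not the kernel's struct pcie_tlp_log: the DEMO_* sizes (4 header DWORDs, up to 4 prefix DWORDs) and the demo_store_tlp_log() helper are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_HEADERLOG 4	/* assumed header log size in DWORDs */
#define DEMO_MAX_PREFIXLOG 4	/* assumed prefix log size in DWORDs */
#define DEMO_MAX_DW (DEMO_NUM_HEADERLOG + DEMO_MAX_PREFIXLOG)

struct demo_tlp_log {
	union {
		/* Flit mode: one contiguous, extended header log */
		uint32_t dw[DEMO_MAX_DW];
		/* Non-Flit mode: header log followed by prefix log */
		struct {
			uint32_t header[DEMO_NUM_HEADERLOG];
			uint32_t prefix[DEMO_MAX_PREFIXLOG];
		};
	};
	uint8_t header_len;	/* valid header DWORDs */
	bool flit;		/* error was logged in Flit mode */
};

/* Copy n raw log DWORDs and record how they should be interpreted. */
static void demo_store_tlp_log(struct demo_tlp_log *log, const uint32_t *regs,
			       unsigned int n, bool flit)
{
	assert(n <= DEMO_MAX_DW);
	memset(log, 0, sizeof(*log));
	memcpy(log->dw, regs, n * sizeof(regs[0]));
	log->flit = flit;
	log->header_len = (uint8_t)(flit ? n : DEMO_NUM_HEADERLOG);
}
```

Since the prefix log and the extended header never exist at the same time, overlaying them costs no extra storage, and the flit flag tells the printing code which view of the same DWORDs to use.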
2025-02-21 PCI: Track Flit Mode Status & print it with link status (Ilpo Järvinen)
PCIe r6.0 added Flit mode, which mainly alters HW behavior, but there are some OS visible changes. The OS visible changes include differences in the layout of some capabilities and interpretation of the TLP headers (in diagnostics situations). To be able to determine which mode the PCIe Link is using, store the Flit Mode Status (PCIe r6.1 sec 7.5.3.20) information in addition to the Link speed into struct pci_bus in pcie_update_link_speed(). Link: https://lore.kernel.org/r/20250207161836.2755-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: use unsigned int:1 instead of bool, update flit_mode setting] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-18 PCI: Extend enable to check for any optional resource (Ilpo Järvinen)
pci_enable_resources() checks if device's io and mem resources are all assigned and disallows enable if any resource failed to assign (*) but makes an exception for the case of disabled extension ROM. There are other optional resources, however. Add pci_resource_is_optional() and use it instead of pci_resource_is_disabled_rom() to also cover IOV resources, which are optional as per pbus_size_mem(). As there will be more users of pci_resource_is_optional() inside setup-bus.c in changes coming up after this one, the function is placed there. (*) In practice, resource fitting code calls reset_resource() for any resource it fails to assign, which clears the resource's ->flags, causing pci_enable_resources() to never detect failed resource assignments. This seems an undesirable internal logic inconsistency: effectively, reset_resource() prevents pci_enable_resources() from functioning as intended. This is one step of many that will be needed towards removing reset_resource(). Link: https://lore.kernel.org/r/20241216175632.4175-20-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
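The enable-time rule described above (fail on any unassigned resource unless it is optional) might be modelled as in this user-space sketch. The resource indexes, the boolean fields and all demo_*() names are hypothetical; the real code works on struct resource flags rather than on these simplified booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resource layout for this sketch. */
enum {
	DEMO_ROM_RESOURCE = 6,
	DEMO_IOV_FIRST = 7,
	DEMO_IOV_LAST = 12,
	DEMO_NUM_RESOURCES = 13,
};

struct demo_resource {
	bool present;		/* device implements this resource */
	bool assigned;		/* resource fitting found space for it */
	bool rom_enabled;	/* only meaningful for the ROM resource */
};

struct demo_dev {
	struct demo_resource resource[DEMO_NUM_RESOURCES];
};

static bool demo_resource_is_iov(int resno)
{
	return resno >= DEMO_IOV_FIRST && resno <= DEMO_IOV_LAST;
}

/* Optional resources: IOV BARs and a disabled expansion ROM. */
static bool demo_resource_is_optional(const struct demo_dev *dev, int resno)
{
	assert(resno >= 0 && resno < DEMO_NUM_RESOURCES);
	if (demo_resource_is_iov(resno))
		return true;
	return resno == DEMO_ROM_RESOURCE && !dev->resource[resno].rom_enabled;
}

/* Returns 0 if the device may be enabled, -1 otherwise. */
static int demo_enable_resources(const struct demo_dev *dev)
{
	int i;

	for (i = 0; i < DEMO_NUM_RESOURCES; i++) {
		const struct demo_resource *res = &dev->resource[i];

		if (!res->present || demo_resource_is_optional(dev, i))
			continue;
		if (!res->assigned)
			return -1;
	}
	return 0;
}
```

In this model a device whose only unassigned resources are IOV BARs or a disabled ROM can still be enabled, while a missing mandatory BAR blocks it.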
2025-02-18 PCI: Add pci_resource_num() helper (Ilpo Järvinen)
A few places in PCI code, mainly in setup-bus.c, need to reverse lookup the index of a resource in pci_dev's resource array. Create pci_resource_num() helper to avoid repeating the pointer arithmetic trick used to calculate the index. Link: https://lore.kernel.org/r/20241216175632.4175-11-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
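The pointer arithmetic trick mentioned here is the subtraction of the array base from a pointer to one of its elements. A minimal user-space sketch, with hypothetical demo types standing in for struct pci_dev and struct resource:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_RESOURCES 6	/* hypothetical array size for this sketch */

struct demo_resource {
	uint64_t start;
	uint64_t end;
};

struct demo_dev {
	struct demo_resource resource[DEMO_NUM_RESOURCES];
};

/*
 * Reverse lookup: given a pointer into dev->resource[], recover its index.
 * Subtracting two pointers into the same array yields the element distance.
 */
static int demo_resource_num(const struct demo_dev *dev,
			     const struct demo_resource *res)
{
	ptrdiff_t resno = res - &dev->resource[0];

	/* A resource that does not belong to this device is a caller bug. */
	assert(resno >= 0 && resno < DEMO_NUM_RESOURCES);
	return (int)resno;
}
```

Wrapping the subtraction in one helper keeps the range check in a single place instead of repeating it at every call site.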
2025-02-18 PCI: Add pci_resource_is_iov() to identify IOV resources (Michał Winiarski)
There are multiple places where special handling is required for IOV resources. Extract the identification of IOV resources to pci_resource_is_iov() and drop a few ifdefs. Link: https://lore.kernel.org/r/20241216175632.4175-9-ilpo.jarvinen@linux.intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-01-23 Merge branch 'pci/pci-sysfs' (Bjorn Helgaas)
- Move reset related sysfs code from pci.c to pci-sysfs.c where other similar code lives (Ilpo Järvinen)
- Simplify reset_method_store() memory management by using __free() instead of explicit kfree() cleanup (Ilpo Järvinen)
- Drop unnecessary zero initializer (Ilpo Järvinen)

* pci/pci-sysfs:
  PCI/sysfs: Remove unnecessary zero in initializer
  PCI/sysfs: Use __free() in reset_method_store()
  PCI/sysfs: Move reset related sysfs code to correct file
2025-01-23 Merge branch 'pci/of' (Bjorn Helgaas)
- Unexport of_pci_parse_bus_range() since it's only used in of.c (Bjorn Helgaas)
- Drop 'No bus range found' message so we don't complain when DTs don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)
- Simplify devm_of_pci_get_host_bridge_resources() interface by dropping parameters that are always the same default values (Bjorn Helgaas)
- Update comment reference to of_pci_get_host_bridge_resources(), which no longer exists (Bjorn Helgaas)
- Rename the drivers/pci/of_property.c struct of_pci_range to of_pci_range_entry to avoid confusion with the global of_pci_range in include/linux/of_address.h (Bjorn Helgaas)

* pci/of:
  PCI: of_property: Rename struct of_pci_range to of_pci_range_entry
  sparc/PCI: Update reference to devm_of_pci_get_host_bridge_resources()
  PCI: of: Simplify devm_of_pci_get_host_bridge_resources() interface
  PCI: of: Drop 'No bus range found' message
  PCI: Unexport of_pci_parse_bus_range()
2025-01-23 Merge branch 'pci/err' (Bjorn Helgaas)
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging rather than building their own (Ilpo Järvinen)
- Move TLP Log handling to its own file (Ilpo Järvinen)
- Add #defines for TLP Header/Prefix log sizes (Ilpo Järvinen)
- Store number of supported End-End TLP Prefixes always so we can read the correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen)
- Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log() (Ilpo Järvinen)
- Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix Log (Ilpo Järvinen)

* pci/err:
  PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log
  PCI: Add TLP Prefix reading to pcie_read_tlp_log()
  PCI: Store number of supported End-End TLP Prefixes
  PCI: Use unsigned int i in pcie_read_tlp_log()
  PCI: Use same names in pcie_read_tlp_log() prototype and definition
  PCI: Add defines for TLP Header/Prefix log sizes
  PCI: Move TLP Log handling to its own file
  PCI: Don't expose pcie_read_tlp_log() outside PCI subsystem
2025-01-23 PCI: Batch BAR sizing operations (Alex Williamson)
Toggling memory enable is free on bare metal, but potentially expensive in virtualized environments as the device MMIO spaces are added and removed from the VM address space, including DMA mapping of those spaces through the IOMMU where peer-to-peer is supported. Currently memory decode is disabled around sizing each individual BAR, even for SR-IOV BARs while VF Enable is cleared. This can be better optimized for virtual environments by sizing a set of BARs at once, stashing the resulting mask into an array, while only toggling memory enable once. This also naturally improves the SR-IOV path as the caller becomes responsible for any necessary decode disables while sizing BARs, therefore SR-IOV BARs are sized relying only on the VF Enable rather than toggling the PF memory enable in the command register. Link: https://lore.kernel.org/r/20250120182202.1878581-1-alex.williamson@redhat.com Reported-by: Mitchell Augustin <mitchell.augustin@canonical.com> Link: https://lore.kernel.org/r/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@mail.gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Mitchell Augustin <mitchell.augustin@canonical.com> Reviewed-by: Mitchell Augustin <mitchell.augustin@canonical.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
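The batching idea can be illustrated with a small emulation. This is a user-space sketch under several simplifying assumptions: BARs are 32-bit memory BARs with no flag bits, the device is modelled by struct demo_dev, and decode_toggles merely counts how often memory decode changes state. None of these names come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NUM_BARS 3

struct demo_dev {
	uint32_t bar[DEMO_NUM_BARS];	/* BAR register contents */
	uint32_t size[DEMO_NUM_BARS];	/* emulated size (power of two), 0 = unimplemented */
	bool mem_enabled;		/* memory decode enable */
	unsigned int decode_toggles;	/* instrumentation: enable state changes */
};

static void demo_set_mem_enable(struct demo_dev *dev, bool on)
{
	if (dev->mem_enabled != on) {
		dev->mem_enabled = on;
		dev->decode_toggles++;
	}
}

/* Emulated BAR write: address bits below the BAR size are hardwired to 0. */
static void demo_write_bar(struct demo_dev *dev, int i, uint32_t val)
{
	dev->bar[i] = dev->size[i] ? val & ~(dev->size[i] - 1) : 0;
}

/* Size all BARs inside a single decode-disabled window. */
static void demo_size_bars(struct demo_dev *dev, uint32_t sizes[DEMO_NUM_BARS])
{
	uint32_t masks[DEMO_NUM_BARS];
	bool was_enabled = dev->mem_enabled;
	int i;

	assert(sizes != 0);
	demo_set_mem_enable(dev, false);
	for (i = 0; i < DEMO_NUM_BARS; i++) {
		uint32_t saved = dev->bar[i];

		demo_write_bar(dev, i, ~0u);	/* probe with all ones */
		masks[i] = dev->bar[i];		/* stash the resulting mask */
		demo_write_bar(dev, i, saved);	/* restore the original value */
	}
	demo_set_mem_enable(dev, was_enabled);

	/* Decode the stashed masks after decode has been re-enabled. */
	for (i = 0; i < DEMO_NUM_BARS; i++)
		sizes[i] = masks[i] ? ~masks[i] + 1 : 0;
}
```

Sizing each BAR in its own disable/enable window would change the decode state 2 * DEMO_NUM_BARS times in this model; the batched loop does it twice regardless of the number of BARs.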
2025-01-16 PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log (Ilpo Järvinen)
Add pcie_print_tlp_log() to print TLP Header and Prefix Log. Print End-End Prefixes only if they are non-zero. Consolidate the few places which currently print TLP using custom formatting. Link: https://lore.kernel.org/r/20250114170840.1633-9-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-16 PCI: Add TLP Prefix reading to pcie_read_tlp_log() (Ilpo Järvinen)
pcie_read_tlp_log() handles only 4 Header Log DWORDs but TLP Prefix Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present. Generalize pcie_read_tlp_log() and struct pcie_tlp_log to also handle TLP Prefix Log. The relevant registers are formatted identically in AER and DPC Capability, but have these variations: a) The offsets of TLP Prefix Log registers vary. b) DPC RP PIO TLP Prefix Log register can be < 4 DWORDs. c) AER TLP Prefix Log Present (PCIe r6.1 sec 7.8.4.7) can indicate Prefix Log is not present. Therefore callers must pass the offset of the TLP Prefix Log register and the entire length to pcie_read_tlp_log() to be able to read the correct number of TLP Prefix DWORDs from the correct offset. Link: https://lore.kernel.org/r/20250114170840.1633-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash ternary fix from https://lore.kernel.org/r/20250116172019.88116-1-colin.i.king@gmail.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-15 PCI: Unexport of_pci_parse_bus_range() (Bjorn Helgaas)
of_pci_parse_bus_range() is only used in drivers/pci/of.c, so make it static and unexport it. Link: https://lore.kernel.org/r/20250113231557.441289-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2025-01-15 PCI/sysfs: Move reset related sysfs code to correct file (Ilpo Järvinen)
Most PCI sysfs code and structs are in a dedicated file but a few reset related things remain in pci.c. Move them to pci-sysfs.c as well and drop pci_dev_reset_method_attr_is_visible() as it is a 100% duplicate of pci_dev_reset_attr_is_visible(). Link: https://lore.kernel.org/r/20241028174046.1736-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-01-14 PCI: Move TLP Log handling to its own file (Ilpo Järvinen)
TLP Log is a PCIe feature and is processed only by AER and DPC. Config-wise, DPC depends on AER being enabled. For lack of a better place, the TLP Log handling code was initially placed into pci.c but it can be easily placed in a separate file. Move TLP Log handling code to its own file under pcie/ subdirectory and include it only when AER is enabled. Link: https://lore.kernel.org/r/20250114170840.1633-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14 PCI: Don't expose pcie_read_tlp_log() outside PCI subsystem (Ilpo Järvinen)
pcie_read_tlp_log() was exposed by the commit 0a5a46a6a61b ("PCI/AER: Generalize TLP Header Log reading") with the intent that drivers could use it, but the PCI maintainer later decided that drivers should be encouraged to use PCI core diagnostic logging of generic AER registers rather than building their own. Drivers that currently implement their own diagnostic logging include ixgbe (ixgbe_io_error_detected()) and iwlwifi (iwl_trans_pcie_dump_regs()). Remove the unwanted EXPORT of pcie_read_tlp_log() and remove it from include/linux/aer.h. Link: https://lore.kernel.org/r/20250114170840.1633-2-ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/all/20240322193011.GA701027@bhelgaas/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2024-11-25 Merge branch 'pci/tph' (Bjorn Helgaas)
- Add and document TLP Processing Hints (TPH) support so drivers can enable and disable TPH and the kernel can save/restore TPH configuration (Wei Huang)
- Add TPH Steering Tag support so drivers can retrieve Steering Tag values associated with specific CPUs via an ACPI _DSM to direct DMA writes closer to their consumers (Wei Huang)

* pci/tph:
  PCI/TPH: Add TPH documentation
  PCI/TPH: Add Steering Tag support
  PCI: Add TLP Processing Hints (TPH) support
2024-11-25 Merge branch 'pci/reset' (Bjorn Helgaas)
- Add sysfs 'reset_subordinate' to reset hierarchy below bridge (Keith Busch)
- Warn if we reset a running device where driver didn't register pci_error_handlers notification callbacks (Keith Busch)

* pci/reset:
  PCI: Warn if a running device is unaware of reset
  PCI: Add 'reset_subordinate' to reset hierarchy below bridge