summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-23arm64: Kconfig: switch to HAVE_PWRCTRLJohan Hovold
The HAVE_PWRCTRL symbol has been renamed to reflect the pwrctrl framework name. Switch to the non-deprecated symbol. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://patch.msgid.link/20250402132634.18065-5-johan+linaro@kernel.org
2025-05-23wifi: ath12k: switch to PCI_PWRCTRL_PWRSEQJohan Hovold
The PCI_PWRCTRL_PWRSEQ and HAVE_PWRCTRL symbols have been renamed to reflect the pwrctrl framework name. Switch to the non-deprecated symbols. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jeff Johnson <jjohnson@kernel.org> # drivers/net/wireless/ath/... Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://patch.msgid.link/20250402132634.18065-4-johan+linaro@kernel.org
2025-05-23wifi: ath11k: switch to PCI_PWRCTRL_PWRSEQJohan Hovold
The PCI_PWRCTRL_PWRSEQ and HAVE_PWRCTRL symbols have been renamed to reflect the pwrctrl framework name. Switch to the non-deprecated symbols. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jeff Johnson <jjohnson@kernel.org> # drivers/net/wireless/ath/... Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://patch.msgid.link/20250402132634.18065-3-johan+linaro@kernel.org
2025-05-23PCI/pwrctrl: Rename pwrctrl Kconfig symbols and slot moduleJohan Hovold
Commits b88cbaaa6fa1 ("PCI/pwrctrl: Rename pwrctl files to pwrctrl") and 3f925cd62874 ("PCI/pwrctrl: Rename pwrctrl functions and structures") renamed the "pwrctl" framework to "pwrctrl" for consistency reasons. Rename also the Kconfig symbols so that they reflect the new name while adding entries for the deprecated ones. The old symbols can be removed once everything that depends on them has been updated. Note that no deprecated symbol is added for the new slot driver to avoid having to add a user visible option. Rename the new slot module to reflect the framework name and match the other pwrctrl modules. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://patch.msgid.link/20250402132634.18065-2-johan+linaro@kernel.org
2025-05-23PCI/pwrctrl: Cancel outstanding rescan work when unregisteringBrian Norris
It's possible to trigger use-after-free here by: (a) forcing rescan_work_func() to take a long time and (b) utilizing a pwrctrl driver that may be unloaded for some reason Cancel outstanding work to ensure it is finished before we allow our data structures to be cleaned up. [bhelgaas: tidy commit log] Fixes: 8f62819aaace ("PCI/pwrctl: Rescan bus on a separate thread") Signed-off-by: Brian Norris <briannorris@google.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: Konrad Dybcio <konradybcio@kernel.org> Link: https://patch.msgid.link/20250409115313.1.Ia319526ed4ef06bec3180378c9a008340cec9658@changeid
2025-05-23PCI/AER: Add sysfs attributes for log ratelimitsJon Pan-Doh
Allow userspace to read/write log ratelimits per device (including enable/disable). Create aer/ sysfs directory to store them and any future AER configs. The new sysfs files are: /sys/bus/pci/devices/*/aer/correctable_ratelimit_burst /sys/bus/pci/devices/*/aer/correctable_ratelimit_interval_ms /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_burst /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_interval_ms The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so if we try to emit more than 10 messages in a 5 second period, some are suppressed. Update AER sysfs ABI filename to reflect the broader scope of AER sysfs attributes (e.g. stats and ratelimits). Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats -> sysfs-bus-pci-devices-aer Tested using aer-inject[1]. Configured correctable log ratelimit to 5. Sent 6 AER errors. Observed 5 errors logged while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 6. Disabled ratelimiting and sent 6 more AER errors. Observed all 6 errors logged and accounted in AER stats (12 total errors). [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git [bhelgaas: note fatal errors are not ratelimited, "aer_report" -> "aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Jon Pan-Doh <pandoh@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-21-helgaas@kernel.org
2025-05-23PCI/AER: Add ratelimits to PCI AER DocumentationJon Pan-Doh
Add ratelimits section for rationale and defaults. [bhelgaas: note fatal errors are not ratelimited] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Jon Pan-Doh <pandoh@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://patch.msgid.link/20250522232339.1525671-20-helgaas@kernel.org
2025-05-23PCI/AER: Ratelimit correctable and non-fatal error loggingJon Pan-Doh
Spammy devices can flood kernel logs with AER errors and slow/stall execution. Add per-device ratelimits for AER correctable and non-fatal uncorrectable errors that use the kernel defaults (10 per 5s). Logging of fatal errors is not ratelimited. There are two AER logging entry points: - aer_print_error() is used by DPC and native AER - pci_print_aer() is used by GHES and CXL The native AER aer_print_error() case includes a loop that may log details from multiple devices, which are ratelimited individually. If we log details for any device, we also log the Error Source ID from the Root Port or RCEC. If no such device details are found, we still log the Error Source from the ERR_* Message, ratelimited by the Root Port or RCEC that received it. The DPC aer_print_error() case is not ratelimited, since this only happens for fatal errors. The CXL pci_print_aer() case is ratelimited by the Error Source device. The GHES pci_print_aer() case is via aer_recover_work_func(), which searches for the Error Source device. If the device is not found, there's no per-device ratelimit, so we use a system-wide ratelimit that covers all error types (correctable, non-fatal, and fatal). Sargun at Meta reported internally that a flood of AER errors causes RCU CPU stall warnings and CSD-lock warnings. Tested using aer-inject[1]. Sent 11 AER errors. Observed 10 errors logged while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) show true count of 11. [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git [bhelgaas: commit log, factor out trace_aer_event() and aer_print_rp_info() changes to previous patches, enable Error Source logging if any downstream detail will be printed, don't ratelimit fatal errors, "aer_report" -> "aer_info", "cor_log_ratelimit" -> "correctable_ratelimit", "uncor_log_ratelimit" -> "nonfatal_ratelimit"] Reported-by: Sargun Dhillon <sargun@meta.com> Signed-off-by: Jon Pan-Doh <pandoh@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-19-helgaas@kernel.org
2025-05-23PCI/AER: Simplify add_error_device()Bjorn Helgaas
Return -ENOSPC error early so the usual path through add_error_device() is the straightline code. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-18-helgaas@kernel.org
2025-05-23PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to indexBjorn Helgaas
Previously aer_get_device_error_info() and aer_print_error() took a pointer to struct aer_err_info and a pointer to a pci_dev. Typically the pci_dev was one of the elements of the aer_err_info.dev[] array (DPC was an exception, where the dev[] array was unused). Convert aer_get_device_error_info() and aer_print_error() to take an index into the aer_err_info.dev[] array instead. A future patch will add per-device ratelimit information, so the index makes it convenient to find the ratelimit associated with the device. To accommodate DPC, set info->dev[0] to the DPC port before using these interfaces. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-17-helgaas@kernel.org
2025-05-23PCI/AER: Rename struct aer_stats to aer_infoKarolina Stolarek
Update name to reflect the broader definition of structs/variables that are stored (e.g. ratelimits). This is a preparatory patch for adding rate limit support. [bhelgaas: "aer_report" -> "aer_info"] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-16-helgaas@kernel.org
2025-05-23PCI/AER: Reduce pci_print_aer() correctable error level to KERN_WARNINGKarolina Stolarek
Some existing logs in pci_print_aer() log with error severity by default. Convert them to use KERN_WARNING for correctable errors and KERN_ERR for uncorrectable errors. [bhelgaas: commit log] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-15-helgaas@kernel.org
2025-05-23PCI/ERR: Add printk level to pcie_print_tlp_log()Bjorn Helgaas
aer_print_error() produces output at a printk level (KERN_ERR/KERN_WARNING/ etc) that depends on the kind of error, and it calls pcie_print_tlp_log(), which previously always produced output at KERN_ERR. Add a "level" parameter so aer_print_error() can control the level of the pcie_print_tlp_log() output to match. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-14-helgaas@kernel.org
2025-05-23PCI/AER: Check log level once and remember itKarolina Stolarek
When reporting an AER error, we check its type multiple times to determine the log level for each message. Do this check only in the top-level functions (aer_isr_one_error(), pci_print_aer()) and save the level in struct aer_err_info. [bhelgaas: save log level in struct aer_err_info instead of passing it as a parameter] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-13-helgaas@kernel.org
2025-05-23PCI/AER: Trace error event before ratelimitingBjorn Helgaas
As with the AER statistics, we always want to emit trace events, even if the actual dmesg logging is rate limited. Call trace_aer_event() immediately after pci_dev_aer_stats_incr() so both happen before ratelimiting. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-12-helgaas@kernel.org
2025-05-23PCI/AER: Update statistics before ratelimitingBjorn Helgaas
There are two AER logging entry points: - aer_print_error() is used by DPC (dpc_process_error()) and native AER handling (aer_process_err_devices()). - pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL (cxl_handle_rdport_errors()) Both use __aer_print_error() to print the AER error bits. Previously __aer_print_error() also incremented the AER statistics via pci_dev_aer_stats_incr(). Call pci_dev_aer_stats_incr() early in the entry points instead of in __aer_print_error() so we update the statistics even if the actual printing of error bits is rate limited by a future change. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-11-helgaas@kernel.org
2025-05-23PCI/AER: Simplify pci_print_aer()Bjorn Helgaas
Simplify pci_print_aer() by initializing the struct aer_err_info "info" with a designated initializer list (it was previously initialized with memset()) and using pci_name(). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-10-helgaas@kernel.org
2025-05-23PCI/AER: Initialize aer_err_info before using itBjorn Helgaas
Previously the struct aer_err_info "e_info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "e_info" at declaration with a designated initializer list, which initializes the other members to zero. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-9-helgaas@kernel.org
2025-05-23PCI/AER: Move aer_print_source() earlier in fileBjorn Helgaas
Move aer_print_source() earlier in the file so a future change can use it from aer_print_error(), where it's easier to rate limit it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-8-helgaas@kernel.org
2025-05-23PCI/AER: Rename aer_print_port_info() to aer_print_source()Jon Pan-Doh
Rename aer_print_port_info() to aer_print_source() to be more descriptive. This logs the Error Source ID logged by a Root Port or Root Complex Event Collector when it receives an ERR_COR, ERR_NONFATAL, or ERR_FATAL Message. [bhelgaas: aer_print_rp_info() -> aer_print_source()] Signed-off-by: Jon Pan-Doh <pandoh@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-7-helgaas@kernel.org
2025-05-23PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etcBjorn Helgaas
Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number, device, and function number directly from the Error Source ID. There's no need to shift and mask it explicitly. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-6-helgaas@kernel.org
2025-05-23PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()Bjorn Helgaas
Previously we decoded the AER Error Source ID in aer_isr_one_error_type(), then again in find_source_device() if we didn't find any devices with errors logged in their AER Capabilities. Consolidate this so we only decode and log the Error Source ID once in aer_isr_one_error_type(). Add a "found" parameter so we can add a note when we didn't find any downstream devices with errors logged in their AER Capability. This changes the dmesg logging when we found no devices with errors logged: - pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0 - pci 0000:00:01.0: AER: found no error details for 0000:02:00.0 + pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0 (no details found) Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-5-helgaas@kernel.org
2025-05-23PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()Bjorn Helgaas
aer_isr_one_error() duplicates the Error Source ID logging and AER error processing for Correctable Errors and Uncorrectable Errors. Factor out the duplicated code to aer_isr_one_error_type(). aer_isr_one_error() doesn't need the struct aer_rpc pointer, so pass it the Root Port or RCEC pci_dev pointer instead. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-4-helgaas@kernel.org
2025-05-23PCI/DPC: Log Error Source ID only when validBjorn Helgaas
DPC Error Source ID is only valid when the DPC Trigger Reason indicates that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL Message (PCIe r6.0, sec 7.9.14.5). When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE) or ERR_FATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) from a downstream device, log the Error Source ID (decoded into domain/bus/device/function). Don't print the source otherwise, since it's not valid. For DPC trigger due to reception of ERR_NONFATAL or ERR_FATAL, the dmesg logging changes: - pci 0000:00:01.0: DPC: containment event, status:0x000d source:0x0200 - pci 0000:00:01.0: DPC: ERR_FATAL detected + pci 0000:00:01.0: DPC: containment event, status:0x000d, ERR_FATAL received from 0000:02:00.0 and when DPC triggered for other reasons, where DPC Error Source ID is undefined, e.g., unmasked uncorrectable error: - pci 0000:00:01.0: DPC: containment event, status:0x0009 source:0x0200 - pci 0000:00:01.0: DPC: unmasked uncorrectable error detected + pci 0000:00:01.0: DPC: containment event, status:0x0009: unmasked uncorrectable error detected Previously the "containment event" message was at KERN_INFO and the "%s detected" message was at KERN_WARNING. Now the single message is at KERN_WARNING. Fixes: 26e515713342 ("PCI: Add Downstream Port Containment driver") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-3-helgaas@kernel.org
2025-05-23PCI/DPC: Initialize aer_err_info before using itBjorn Helgaas
Previously the struct aer_err_info "info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "info" at declaration so it starts as all zeros. Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-2-helgaas@kernel.org
2025-05-22PCI/ACPI: Fix allocated memory release on error in pci_acpi_scan_root()Zhe Qiao
In the pci_acpi_scan_root() function, when creating a PCI bus fails, we need to free up the previously allocated memory, which can avoid invalid memory usage and save resources. Fixes: 789befdfa389 ("arm64: PCI: Migrate ACPI related functions to pci-acpi.c") Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250430060603.381504-1-qiaozhe@iscas.ac.cn
2025-05-22PCI: Remove function pcim_intx() prototype from pci.hPhilipp Stanner
The subsystem-internal header pci.h still contains the function prototype of pcim_intx(), which has since been made public in the global header. Remove the redundant function prototype. Signed-off-by: Philipp Stanner <phasta@kernel.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250522084626.150148-2-phasta@kernel.org
2025-05-21PCI: rcar-gen4: Document how to obtain platform firmwareYoshihiro Shimoda
Renesas R-Car V4H (r8a779g0) has PCIe controller, and it requires specific firmware downloading. So, add a document about the firmware how to get. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> [kwilczynski: commit log, refactor the document content and then add this new file to a correct index under the top-level PCI documentation] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250507100947.608875-1-yoshihiro.shimoda.uh@renesas.com
2025-05-20PCI: dwc: ep: Fix errno typoNiklas Cassel
Fix errno typo in kernel-doc comments. Fixes: 7cbebc86c72a ("PCI: dwc: ep: Add Kernel-doc comments for APIs") Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250506095138.482485-2-cassel@kernel.org
2025-05-20PCI: Remove hybrid-devres usage warnings from kernel-docPhilipp Stanner
pci/iomap.c still contains warnings about those functions not behaving in a managed manner if pcim_enable_device() was called. Since all hybrid behavior that users could know about has been removed by now, those explicit warnings are no longer necessary. Remove the hybrid-devres usage warnings from the kernel-doc. Signed-off-by: Philipp Stanner <phasta@kernel.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-8-phasta@kernel.org
2025-05-20PCI: Remove redundant set of request functionsPhilipp Stanner
When the demangling of the hybrid devres functions within PCI was implemented, it was necessary to implement several PCI functions a second time to avoid cyclic calls, since the hybrid functions in pci.c call the managed functions in devres.c, which in turn can be directly used outside of PCI and needed request infrastructure, too. Therefore, __pcim_request_region_range(), __pci_release_region_range() and wrappers around them were implemented. The hybrid nature has recently been removed from all functions in pci.c. Therefore, the functions in devres.c can now directly use their counterparts in pci.c without causing a call-cycle. Remove __pcim_request_region_range(), __pcim_request_region_range() and the wrappers. Use the corresponding request functions from pci.c in devres.c Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-7-phasta@kernel.org
2025-05-20PCI: Remove exclusive requests flags from _pcim_request_region()Philipp Stanner
pcim_request_region_exclusive(), the only user in PCI devres that needed exclusive region requests, has been removed. All features related to exclusive requests can, therefore, be removed, too. Remove them. Signed-off-by: Philipp Stanner <phasta@kernel.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-6-phasta@kernel.org
2025-05-19PCI: Remove pcim_request_region_exclusive()Philipp Stanner
pcim_request_region_exclusive() was only needed for redirecting the relatively exotic exclusive request functions in pci.c in case of them operating in managed mode. The managed nature has been removed from those functions and no one else uses pcim_request_region_exclusive(). Remove pcim_request_region_exclusive(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-5-phasta@kernel.org
2025-05-19Documentation/driver-api: Update pcim_enable_device()Philipp Stanner
pcim_enable_device() is not related anymore to switching the mode of operation of any functions. It merely sets up a devres callback for automatically disabling the PCI device on driver detach. Adjust the function's documentation. Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-4-phasta@kernel.org
2025-05-19PCI: Remove hybrid devres nature from request functionsPhilipp Stanner
All functions based on __pci_request_region() and its release counter part support "hybrid mode", where the functions become managed if the PCI device was enabled with pcim_enable_device(). Removing this undesirable feature requires to remove all users who activated their device with that function and use one of the affected request functions. These users were: ASoC alsa cardreader cirrus i2c mmc mtd mtd mxser net spi vdpa vmwgfx all of which have been ported to always-managed pcim_ functions by now. The hybrid nature can, thus, be removed from the aforementioned PCI functions. Remove all function guards and documentation in pci.c related to the hybrid redirection. Adjust the visibility of pcim_release_region(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/r/20250519112959.25487-3-phasta@kernel.org
2025-05-16dt-bindings: PCI: microchip,pcie-host: Fix DMA coherency propertyConor Dooley
PolarFire SoC may be configured in a way that requires non-coherent DMA handling. On RISC-V, buses are coherent by default & the dma-noncoherent property is required to denote buses or devices that are non-coherent. For some reason, instead of adding dma-noncoherent to the binding the pointless, NOP, property dma-coherent was. Swap dma-coherent for dma-noncoherent. Fixes: 04aa999eb96fd ("dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent") Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250516-datebook-senator-ff7a1c30cbd5@spud
2025-05-15PCI: Update Link Speed after retrainingIlpo Järvinen
PCIe Link Retraining can alter Link Speed. pcie_retrain_link() that performs the Link Training is called from bwctrl and ASPM driver. While bwctrl listens for Link Bandwidth Management Status (LBMS) to pick up changes in Link Speed, there is a race between pcie_reset_lbms() clearing LBMS after the Link Training and pcie_bwnotif_irq() reading the Link Status register. If LBMS is already cleared when the irq handler reads the register, the interrupt handler will return early with IRQ_NONE and won't update the Link Speed. When Link Speed update originates from bwctrl, pcie_bwctrl_change_speed() ensures Link Speed is updated after the retraining. ASPM driver, however, calls pcie_retrain_link() but does not update the Link Speed after retraining which can result in stale Link Speed. Also, it is possible to have ASPM support with CONFIG_PCIEPORTBUS=n in which case bwctrl will not be built in (and thus won't update the Link Speed at all). To ensure Link Speed is not left stale after Link Training, move the call to pcie_update_link_speed() from pcie_bwctrl_change_speed() into pcie_retrain_link(). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://lore.kernel.org/linux-pci/aBCjpfyYmlkJ12AZ@wunner.de Link: https://lore.kernel.org/r/20250514132821.15705-1-ilpo.jarvinen@linux.intel.com
2025-05-15PCI: Limit visibility of match_driver flag to PCI coreLukas Wunner
Since commit 58d9a38f6fac ("PCI: Skip attaching driver in device_add()"), PCI enumeration is split into two steps: In the first step, all devices are published in sysfs with device_add(). In the second step, drivers are bound to the devices with device_attach(). To delay driver binding until the second step, a "bool match_driver" in struct pci_dev is used. Instead of a bool, use a bit in the "unsigned long priv_flags" to shrink struct pci_dev a little and prevent use of the bool outside the PCI core (as has happened with commit cbbc00be2ce3 ("iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices")). Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://patch.msgid.link/d22a9e5b81d6bd8dd1837607d6156679b3b1199c.1745572340.git.lukas@wunner.de
2025-05-15Revert "iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices"Lukas Wunner
Commit 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") changed IRQ handling on PCI driver probing. It inadvertently broke resume from system sleep on AMD platforms: https://lore.kernel.org/r/20150926164651.GA3640@pd.tnic/ This was fixed by two independent commits: * 8affb487d4a4 ("x86/PCI: Don't alloc pcibios-irq when MSI is enabled") * cbbc00be2ce3 ("iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices") The breaking change and one of these two fixes were subsequently reverted: * fe25d078874f ("Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"") * 6c777e8799a9 ("Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"") This rendered the second fix unnecessary, so revert it as well. It used the match_driver flag in struct pci_dev, which is internal to the PCI core and not supposed to be touched by arbitrary drivers. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://patch.msgid.link/9a3ddff5cc49512044f963ba0904347bd404094d.1745572340.git.lukas@wunner.de
2025-05-15PCI: qcom-ep: Mask PTM_UPDATING interruptManivannan Sadhasivam
When PTM is enabled, PTM_UPDATING interrupt will be fired for each PTM context update, which will be once every 10ms in the case of auto context update. Since the interrupt is not strictly needed for making use of PTM, mask it to avoid the overhead of processing it. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://patch.msgid.link/20250505-pcie-ptm-v4-4-02d26d51400b@linaro.org
2025-05-15PCI: dwc: Add debugfs support for PTM contextManivannan Sadhasivam
Synopsys Designware PCIe IPs support PTM capability as defined in the PCIe spec r6.0, sec 6.21. The PTM context information is exposed through Vendor Specific Extended Capability (VSEC) registers on supported controller implementation. Hence, add support for exposing these context information to userspace through the debugfs interface for the DWC controllers (both RC and EP). Currently, only Qcom controllers are supported. For adding support for other DWC vendor controllers, dwc_pcie_ptm_vsec_ids[] needs to be extended. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://patch.msgid.link/20250505-pcie-ptm-v4-3-02d26d51400b@linaro.org
2025-05-15PCI: dwc: Pass DWC PCIe mode to dwc_pcie_debugfs_init()Manivannan Sadhasivam
Upcoming PTM debugfs interface relies on the DWC PCIe mode to expose the relevant debugfs attributes to userspace. So pass the mode to dwc_pcie_debugfs_init() API from host and ep drivers and save it in 'struct dw_pcie::mode'. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://patch.msgid.link/20250505-pcie-ptm-v4-2-02d26d51400b@linaro.org
2025-05-15PCI: Add debugfs support for exposing PTM contextManivannan Sadhasivam
Precision Time Management (PTM) mechanism defined in PCIe spec r6.0, sec 6.21 allows precise coordination of timing information across multiple components in a PCIe hierarchy with independent local time clocks. PCI core already supports enabling PTM in the root port and endpoint devices through PTM Extended Capability registers. But the PTM context supported by the PTM capable components such as Root Complex (RC) and Endpoint (EP) controllers were not exposed as of now. Part of the reason is that the spec doesn't define how the context information is exposed to the software and left it to the vendor implementation. So there is no standardized way to get access to the context information and each vendor have defined their own way. This commit adds debugfs support to expose the PTM context to userspace from both PCIe RC and EP controllers. Since the context information is exposed in a vendor specific way, the debugfs interface allows the controller drivers to implement callbacks for each attribute, to be called by the generic PTM driver. The Controller drivers are expected to call pcie_ptm_create_debugfs() to create the debugfs attributes for the PTM context and call pcie_ptm_destroy_debugfs() to destroy them. The drivers should also populate the relevant callbacks in the 'struct pcie_ptm_ops' structure based on the controller implementation. Below PTM context are exposed through debugfs: PCIe RC ======= 1. PTM Local clock 2. PTM T2 timestamp 3. PTM T3 timestamp 4. PTM Context valid PCIe EP ======= 1. PTM Local clock 2. PTM T1 timestamp 3. PTM T4 timestamp 4. PTM Master clock 5. PTM Context update Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [kwilczynski: fix overflow issue reported by Dan Carpenter from https://lore.kernel.org/linux-pci/b41c1754-c6b7-4805-9f14-7c643d6c5304@suswa.mountain] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://patch.msgid.link/20250505-pcie-ptm-v4-1-02d26d51400b@linaro.org
2025-05-15PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flagIlpo Järvinen
PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-05-13PCI: cadence: Simplify J721e link status checkHans Zhang
Replace explicit if-else condition with direct return statement in j721e_pcie_link_up(). This reduces code verbosity while maintaining the same logic for detecting PCIe link completion. Signed-off-by: Hans Zhang <18255117159@163.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20250510160710.392122-4-18255117159@163.com
2025-05-13PCI: mobiveil: Return bool from link up checkHans Zhang
PCIe link status check is supposed to return a boolean to indicate whether the link is up or not. So update ls_g4_pcie_link_up() to return bool and also simplify the LTSSM state check. Signed-off-by: Hans Zhang <18255117159@163.com> [mani: commit message rewording] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20250510160710.392122-3-18255117159@163.com
2025-05-13PCI: dwc: Return bool from link up checkHans Zhang
PCIe link status check is supposed to return a boolean to indicate whether the link is up or not. So, modify the link_up callbacks and dw_pcie_link_up() function to return bool instead of int. Signed-off-by: Hans Zhang <18255117159@163.com> [mani: commit message reword] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20250510160710.392122-2-18255117159@163.com
2025-05-10dt-bindings: PCI: Convert v3,v360epc-pci to DT schemaRob Herring (Arm)
Convert the v3,v360epc-pci binding to DT schema format. Add "clocks" which was not documented and is required. Drop "syscon" which was documented, but is not used. Drop the "v3,v360epc-pci" compatible by itself as this device is only used on the Arm Integrator/AP and not likely going to be used anywhere else at this point. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20250505220139.2202164-1-robh@kernel.org
2025-05-10PCI: dwc: ep: Use FIELD_GET() where applicableHans Zhang
Use FIELD_GET() to simplify the code extracting the register values. No functional change intended. Signed-off-by: Hans Zhang <18255117159@163.com> [mani: commit message fixup] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20250428124230.112648-1-18255117159@163.com
2025-05-05PCI: Explicitly put devices into D0 when initializingMario Limonciello
AMD BIOS team has root caused an issue that NVMe storage failed to come back from suspend to a lack of a call to _REG when NVMe device was probed. 112a7f9c8edbf ("PCI/ACPI: Call _REG when transitioning D-states") added support for calling _REG when transitioning D-states, but this only works if the device actually "transitions" D-states. 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices") added support for runtime PM on PCI devices, but never actually 'explicitly' sets the device to D0. To make sure that devices are in D0 and that platform methods such as _REG are called, explicitly set all devices into D0 during initialization. Fixes: 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Denis Benato <benato.denis96@gmail.com> Tested-By: Yijun Shen <Yijun_Shen@Dell.com> Tested-By: David Perry <david.perry@amd.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://patch.msgid.link/20250424043232.1848107-1-superm1@kernel.org