Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Batch sizing of multiple BARs while memory decoding is disabled
instead of disabling/enabling decoding for each BAR individually;
this optimizes virtualized environments where toggling decoding
enable is expensive (Alex Williamson)
- Add host bridge .enable_device() and .disable_device() hooks for
bridges that need to configure things like Requester ID to StreamID
mapping when enabling devices (Frank Li)
- Extend struct pci_ecam_ops with .enable_device() and
.disable_device() hooks so drivers that use pci_host_common_probe()
instead of their own .probe() have a way to set the
.enable_device() callbacks (Marc Zyngier)
- Drop 'No bus range found' message so we don't complain when DTs
don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)
- Rename the drivers/pci/of_property.c struct of_pci_range to
of_pci_range_entry to avoid confusion with the global of_pci_range
in include/linux/of_address.h (Bjorn Helgaas)
Driver binding:
- Update resource request API documentation to encourage callers to
supply a driver name when requesting resources (Philipp Stanner)
- Export pci_intx_unmanaged() and pcim_intx() (always managed) so
callers of pci_intx() (which is sometimes managed) can explicitly
choose the one they need (Philipp Stanner)
- Convert drivers from pci_intx() to always-managed pcim_intx() or
never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix,
pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna,
ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner)
- Remove pci_intx_unmanaged() since pci_intx() is now always
unmanaged and pcim_intx() is always managed (Philipp Stanner)
Error handling:
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core
logging rather than building their own (Ilpo Järvinen)
- Move TLP Log handling to its own file (Ilpo Järvinen)
- Store number of supported End-End TLP Prefixes always so we can
read the correct number of DWORDs from the TLP Prefix Log (Ilpo
Järvinen)
- Read TLP Prefixes in addition to the Header Log in
pcie_read_tlp_log() (Ilpo Järvinen)
- Add pcie_print_tlp_log() to consolidate printing of TLP Header and
Prefix Log (Ilpo Järvinen)
- Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor
BIOSes that don't configure it correctly (Takashi Iwai)
ASPM:
- Save parent L1 PM Substates config so when we restore it along with
an endpoint's config, the parent info isn't junk (Jian-Hong Pan)
Power management:
- Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because
the system can't wake up from suspend (Werner Sembach)
Endpoint framework:
- Destroy the EPC device in devm_pci_epc_destroy(), which previously
didn't call devres_release() (Zijun Hu)
- Finish virtual EP removal in pci_epf_remove_vepf(), which
previously caused a subsequent pci_epf_add_vepf() to fail with
-EBUSY (Zijun Hu)
- Write BAR_MASK before iATU registers in pci_epc_set_bar() so we
don't depend on the BAR_MASK reset value being larger than the
requested BAR size (Niklas Cassel)
- Prevent changing BAR size/flags in pci_epc_set_bar() to prevent
reads from bypassing the iATU if we reduced the BAR size (Niklas
Cassel)
- Verify address alignment when programming iATU so we don't attempt
to write bits that are read-only because of the BAR size, which
could lead to directing accesses to the wrong address (Niklas
Cassel)
- Implement artpec6 pci_epc_features so we can rely on all drivers
supporting it so we can use it in EPC core code (Niklas Cassel)
- Check for BARs of fixed size to prevent endpoint drivers from
trying to change their size (Niklas Cassel)
- Verify that requested BAR size is a power of two when endpoint
driver sets the BAR (Niklas Cassel)
Endpoint framework tests:
- Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
dma_chan_rx (Mohamed Khalfella)
- Correct the DMA MEMCPY test so it doesn't fail if the Endpoint
supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)
- Add pci-epf-test and pci_endpoint_test support for capabilities
(Niklas Cassel)
- Add Endpoint test for consecutive BARs (Niklas Cassel)
- Remove redundant comparison from Endpoint BAR test because a > 1MB
BAR can always be exactly covered by iterating with a 1MB buffer
(Hans Zhang)
- Move and convert PCI Endpoint tests from tools/pci to Kselftests
(Manivannan Sadhasivam)
Apple PCIe controller driver:
- Convert StreamID mapping configuration from a bus notifier to the
.enable_device() and .disable_device() callbacks (Marc Zyngier)
Freescale i.MX6 PCIe controller driver:
- Add Requester ID to StreamID mapping configuration when enabling
devices (Frank Li)
- Use DWC core suspend/resume functions for imx6 (Frank Li)
- Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard
Zhu)
- Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for
i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank
Li)
- Add DT binding for optional i.MX95 Refclk and driver support to
enable it if the platform hasn't enabled it (Richard Zhu)
- Configure PHY based on controller being in Root Complex or Endpoint
mode (Frank Li)
- Rely on dbi2 and iATU base addresses from DT via
dw_pcie_get_resources() instead of hardcoding them (Richard Zhu)
- Deassert apps_reset in imx_pcie_deassert_core_reset() since it is
asserted in imx_pcie_assert_core_reset() (Richard Zhu)
- Add missing reference clock enable or disable logic for IMX6SX,
IMX7D, IMX8MM (Richard Zhu)
- Remove redundant imx7d_pcie_init_phy() since
imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu)
Freescale Layerscape PCIe controller driver:
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead
of syscon_regmap_lookup_by_phandle() followed by
of_property_read_u32_array() (Krzysztof Kozlowski)
Marvell MVEBU PCIe controller driver:
- Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen)
MediaTek PCIe Gen3 controller driver:
- Use clk_bulk_prepare_enable() instead of separate
clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi)
- Rearrange reset assert/deassert so they're both done in the
*_power_up() callbacks (Lorenzo Bianconi)
- Document that Airoha EN7581 requires PHY init and power-on before
PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo
Bianconi)
- Move Airoha EN7581 post-reset delay from the en7581 clock .enable()
method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi)
- Sleep instead of delay during Airoha EN7581 power-up, since this is
a non-atomic context (Lorenzo Bianconi)
- Skip PERST# assertion on Airoha EN7581 during probe and
suspend/resume to avoid a hardware defect (Lorenzo Bianconi)
- Enable async probe to reduce system startup time (Douglas Anderson)
Microchip PolarFlare PCIe controller driver:
- Set up the inbound address translation based on whether the
platform allows coherent or non-coherent DMA (Daire McNamara)
- Update DT binding such that platforms are DMA-coherent by default
and must specify 'dma-noncoherent' if needed (Conor Dooley)
Mobiveil PCIe controller driver:
- Convert mobiveil-pcie.txt to YAML and update 'interrupt-names'
and 'reg-names' (Frank Li)
Qualcomm PCIe controller driver:
- Add DT SM8550 and SM8650 optional 'global' interrupt for link
events (Neil Armstrong)
- Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta
Mylavarapu)
- If 'global' IRQ is supported for detection of Link Up events, tell
DWC core not to wait for link up (Krishna chaitanya chundru)
Renesas R-Car PCIe controller driver:
- Avoid passing stack buffer as resource name (King Dix)
Rockchip PCIe controller driver:
- Simplify clock and reset handling by using bulk interfaces (Anand
Moon)
- Pass typed rockchip_pcie (not void) pointer to
rockchip_pcie_disable_clocks() (Anand Moon)
- Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails
(Dan Carpenter)
Rockchip DesignWare PCIe controller driver:
- Use dll_link_up IRQ to detect Link Up and enumerate devices so
users don't have to manually rescan (Niklas Cassel)
- Tell DWC core not to wait for link up since the 'sys' interrupt is
required and detects Link Up events (Niklas Cassel)
Synopsys DesignWare PCIe controller driver:
- Don't wait for link up in DWC core if driver can detect Link Up
event (Krishna chaitanya chundru)
- Update ICC and OPP votes after Link Up events (Krishna chaitanya
chundru)
- Always stop link in dw_pcie_suspend_noirq(), which is required at
least for i.MX8QM to re-establish link on resume (Richard Zhu)
- Drop racy and unnecessary LTSSM state check before sending
PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu)
- Add struct of_pci_range.parent_bus_addr for devices that need their
immediate parent bus address, not the CPU address, e.g., to program
an internal Address Translation Unit (iATU) (Frank Li)
TI DRA7xx PCIe controller driver:
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
syscon_regmap_lookup_by_phandle() followed by
of_parse_phandle_with_fixed_args() or of_property_read_u32_index()
(Krzysztof Kozlowski)
Xilinx Versal CPM PCIe controller driver:
- Add DT binding and driver support for Xilinx Versal CPM5
(Thippeswamy Havalige)
MicroSemi Switchtec management driver:
- Add Microchip PCI100X device IDs (Rakesh Babu Saladi)
Miscellaneous:
- Move reset related sysfs code from pci.c to pci-sysfs.c where other
similar code lives (Ilpo Järvinen)
- Simplify reset_method_store() memory management by using __free()
instead of explicit kfree() cleanup (Ilpo Järvinen)
- Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM
ACPI hotplug driver (Thomas Weißschuh)
- Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong
Zhang)
- Correct documentation of the 'config_acs=' kernel parameter
(Akihiko Odaki)"
* tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits)
PCI: Batch BAR sizing operations
dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent
PCI: microchip: Set inbound address translation for coherent or non-coherent mode
Documentation: Fix pci=config_acs= example
PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
PCI: Don't include 'pm_wakeup.h' directly
selftests: pci_endpoint: Migrate to Kselftest framework
selftests: Move PCI Endpoint tests from tools/pci to Kselftests
misc: pci_endpoint_test: Fix IOCTL return value
dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller
dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt
dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML
PCI: switchtec: Add Microchip PCI100X device IDs
misc: pci_endpoint_test: Remove redundant 'remainder' test
misc: pci_endpoint_test: Add consecutive BAR test
misc: pci_endpoint_test: Add support for capabilities
PCI: endpoint: pci-epf-test: Add support for capabilities
PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
PCI: dwc: Simplify config resource lookup
...
|
|
- Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI
hotplug driver (Thomas Weißschuh)
- Update PCI_EXP_LNKCAP_SLS comment (Lukas Wunner)
- Drop superfluous pm_wakeup.h include (Wolfram Sang)
- Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong Zhang)
- Correct documentation of the 'config_acs=' kernel parameter (Akihiko
Odaki)
* pci/misc:
Documentation: Fix pci=config_acs= example
PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
PCI: Don't include 'pm_wakeup.h' directly
PCI: Update code comment on PCI_EXP_LNKCAP_SLS for PCIe r3.0
PCI/ACPI: Constify 'struct bin_attribute'
PCI/P2PDMA: Constify 'struct bin_attribute'
PCI/VPD: Constify 'struct bin_attribute'
PCI/sysfs: Constify 'struct bin_attribute'
|
|
- Add host bridge .enable_device() and .disable_device() hooks for bridges
that need to configure things like Requester ID to StreamID mapping when
enabling devices (Frank Li)
- Add imx6 Requester ID to StreamID mapping configuration when enabling
devices (Frank Li)
- Extend struct pci_ecam_ops with .enable_device() and .disable_device()
hooks so drivers that use pci_host_common_probe() instead of their own
.probe() have a way to set the .enable_device() callbacks (Marc Zyngier)
- Convert pcie-apple StreamID mapping configuration from a bus notifier to
the .enable_device() and .disable_device() callbacks (Marc Zyngier)
* pci/controller/iommu-map:
PCI: apple: Convert to {en,dis}able_device() callbacks
PCI: host-generic: Allow {en,dis}able_device() to be provided via pci_ecam_ops
PCI: imx6: Add IOMMU and ITS MSI support for i.MX95
PCI: Add enable_device() and disable_device() callbacks for bridges
|
|
- Move reset related sysfs code from pci.c to pci-sysfs.c where other
similar code lives (Ilpo Järvinen)
- Simplify reset_method_store() memory management by using __free() instead
of explicit kfree() cleanup (Ilpo Järvinen)
- Drop unnecessary zero initializer (Ilpo Järvinen)
* pci/pci-sysfs:
PCI/sysfs: Remove unnecessary zero in initializer
PCI/sysfs: Use __free() in reset_method_store()
PCI/sysfs: Move reset related sysfs code to correct file
|
|
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging
rather than building their own (Ilpo Järvinen)
- Move TLP Log handling to its own file (Ilpo Järvinen)
- Add #defines for TLP Header/Prefix log sizes (Ilpo Järvinen)
- Store number of supported End-End TLP Prefixes always so we can read the
correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen)
- Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log()
(Ilpo Järvinen)
- Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix
Log (Ilpo Järvinen)
* pci/err:
PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log
PCI: Add TLP Prefix reading to pcie_read_tlp_log()
PCI: Store number of supported End-End TLP Prefixes
PCI: Use unsigned int i in pcie_read_tlp_log()
PCI: Use same names in pcie_read_tlp_log() prototype and definition
PCI: Add defines for TLP Header/Prefix log sizes
PCI: Move TLP Log handling to its own file
PCI: Don't expose pcie_read_tlp_log() outside PCI subsystem
|
|
The header clearly states that it does not want to be included directly,
only via 'device.h'. The 'platform_device.h' works equally well.
Thus, remove the direct inclusion.
Link: https://lore.kernel.org/r/20241118072917.3853-12-wsa+renesas@sang-engineering.com
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pci_intx() is a hybrid function which can sometimes be managed through
devres. This hybrid nature is undesirable.
Since all users of pci_intx() have by now been ported either to
always-managed pcim_intx() or never-managed pci_intx_unmanaged(), the
devres functionality can be removed from pci_intx().
Consequently, pci_intx_unmanaged() is now redundant, because pci_intx()
itself is now unmanaged.
Remove the devres functionality from pci_intx(). Have all users of
pci_intx_unmanaged() call pci_intx(). Remove pci_intx_unmanaged().
Link: https://lore.kernel.org/r/20241209130632.132074-13-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
|
|
pci_intx() is a hybrid function which sometimes performs devres operations,
depending on whether pcim_enable_device() has been used to enable the
pci_dev. This sometimes-managed nature of the function is problematic.
Notably, it causes the function to allocate under some circumstances which
makes it unusable from interrupt context.
Export pcim_intx() (which is always managed) and rename __pcim_intx()
(which is never managed) to pci_intx_unmanaged() and export it as well.
Then all callers of pci_intx() can be ported to the version they need,
depending whether they use pci_enable_device() or pcim_enable_device().
Link: https://lore.kernel.org/r/20241209130632.132074-3-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Some PCI host bridges require special handling when enabling or
disabling PCI devices. For example, the i.MX95 platform has a lookup
table to map Requester IDs to StreamIDs, which the SMMU and MSI
controller use to identify the source of DMA accesses.
Without this mapping, DMA accesses may target unintended memory, which
would corrupt memory or read the wrong data.
Add a host bridge enable_device() hook the imx6 driver can use to
configure the Requester ID to StreamID mapping. The hardware table isn't
big enough to map all possible Requester IDs, so this hook may fail if
no table space is available. In that case, return failure from
pci_enable_device().
It might make more sense to make pci_set_master() decline to enable bus
mastering and return failure, but it currently doesn't have a way to return
failure.
Link: https://lore.kernel.org/r/20250114-imx95_lut-v9-1-39f58dbed03a@nxp.com
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
|
|
Most PCI sysfs code and structs are in a dedicated file but a few reset
related things remain in pci.c. Move also them to pci-sysfs.c and drop
pci_dev_reset_method_attr_is_visible() as it is 100% duplicate of
pci_dev_reset_attr_is_visible().
Link: https://lore.kernel.org/r/20241028174046.1736-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
TLP Log is a PCIe feature and is processed only by AER and DPC.
Configwise, DPC depends AER being enabled. In lack of better place, the TLP
Log handling code was initially placed into pci.c but it can be easily
placed in a separate file.
Move TLP Log handling code to its own file under pcie/ subdirectory and
include it only when AER is enabled.
Link: https://lore.kernel.org/r/20250114170840.1633-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
|
|
pcie_read_tlp_log() was exposed by the commit 0a5a46a6a61b ("PCI/AER:
Generalize TLP Header Log reading") with the intent that drivers could use
it, but the PCI maintainer later decided that drivers should be encouraged
to use PCI core diagnostic logging of generic AER registers rather than
building their own.
Drivers that currently implement their own diagnostic logging include ixgbe
(ixgbe_io_error_detected()) and iwlwifi (iwl_trans_pcie_dump_regs()).
Remove the unwanted EXPORT of pcie_read_tlp_log() and remove it from
include/linux/aer.h.
Link: https://lore.kernel.org/r/20250114170840.1633-2-ilpo.jarvinen@linux.intel.com
Link: https://lore.kernel.org/all/20240322193011.GA701027@bhelgaas/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
|
|
The Supported Link Speeds Vector in the Link Capabilities 2 Register
indicates the *supported* link speeds. The Max Link Speed field in the
Link Capabilities Register indicates the *maximum* of those speeds.
pcie_get_supported_speeds() neglects to honor the Max Link Speed field and
will thus incorrectly deem higher speeds as supported. Fix it.
One user-visible issue addressed here is an incorrect value in the sysfs
attribute "max_link_speed".
But the main motivation is a boot hang reported by Niklas: Intel JHL7540
"Titan Ridge 2018" Thunderbolt controllers supports 2.5-8 GT/s speeds,
but indicate 2.5 GT/s as maximum. Ilpo recalls seeing this on more
devices. It can be explained by the controller's Downstream Ports
supporting 8 GT/s if an Endpoint is attached, but limiting to 2.5 GT/s
if the port interfaces to a PCIe Adapter, in accordance with USB4 v2
sec 11.2.1:
"This section defines the functionality of an Internal PCIe Port that
interfaces to a PCIe Adapter. [...]
The Logical sub-block shall update the PCIe configuration registers
with the following characteristics: [...]
Max Link Speed field in the Link Capabilities Register set to 0001b
(data rate of 2.5 GT/s only).
Note: These settings do not represent actual throughput. Throughput
is implementation specific and based on the USB4 Fabric performance."
The present commit is not sufficient on its own to fix Niklas' boot hang,
but it is a prerequisite: A subsequent commit will fix the boot hang by
enabling bandwidth control only if more than one speed is supported.
The GENMASK() macro used herein specifies 0 as lowest bit, even though
the Supported Link Speeds Vector ends at bit 1. This is done on purpose
to avoid a GENMASK(0, 1) macro if Max Link Speed is zero. That macro
would be invalid as the lowest bit is greater than the highest bit.
Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated
Endpoints in particular, so it does occur in practice. (The Link
Capabilities Register is optional on RCiEPs per PCIe r6.2 sec 7.5.3.)
Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org
Link: https://lore.kernel.org/r/fe03941e3e1cc42fb9bf4395e302bff53ee2198b.1734428762.git.lukas@wunner.de
Reported-by: Niklas Schnelle <niks@kernel.org>
Tested-by: Niklas Schnelle <niks@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
PCI region request functions have a @name parameter (sometimes called
"res_name"). It is used in a log message to inform drivers about request
collisions, e.g., when another driver has requested that region already.
This message is only useful when it contains the actual owner of the
region, i.e., which driver requested it. So far, a great many drivers
misuse the @name parameter and just pass pci_name(), which doesn't result
in useful debug information.
Rename "res_name" to "name".
Detail @name's purpose in the docstrings.
Link: https://lore.kernel.org/r/20241203100023.31152-2-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
[bhelgaas: tweak comment wording to include "driver"]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Make pci_stop_dev() and pci_destroy_dev() safe so concurrent
callers can't stop a device multiple times, even as we migrate from
the global pci_rescan_remove_lock to finer-grained locking (Keith
Busch)
- Improve pci_walk_bus() implementation by making it recursive and
moving locking up to avoid need for a 'locked' parameter (Keith
Busch)
- Unexport pci_walk_bus_locked(), which is only used internally by
the PCI core (Keith Busch)
- Detect some Thunderbolt chips that are built-in and hence
'trustworthy' by a heuristic since the 'ExternalFacingPort' and
'usb4-host-interface' ACPI properties are not quite enough (Esther
Shimanovich)
Resource management:
- Use PCI bus addresses (not CPU addresses) in 'ranges' properties
when building dynamic DT nodes so systems where PCI and CPU
addresses differ work correctly (Andrea della Porta)
- Tidy resource sizing and assignment with helpers to reduce
redundancy (Ilpo Järvinen)
- Improve pdev_sort_resources() 'bogus alignment' warning to be more
specific (Ilpo Järvinen)
Driver binding:
- Convert driver .remove_new() callbacks to .remove() again to finish
the conversion from returning 'int' to being 'void' (Sergio
Paracuellos)
- Export pcim_request_all_regions(), a managed interface to request
all BARs (Philipp Stanner)
- Replace pcim_iomap_regions_request_all() with
pcim_request_all_regions(), and pcim_iomap_table()[n] with
pcim_iomap(n), in the following drivers: ahci, crypto qat, crypto
octeontx2, intel_th, iwlwifi, ntb idt, serial rp2, ALSA korg1212
(Philipp Stanner)
- Remove the now unused pcim_iomap_regions_request_all() (Philipp
Stanner)
- Export pcim_iounmap_region(), a managed interface to unmap and
release a PCI BAR (Philipp Stanner)
- Replace pcim_iomap_regions(mask) with pcim_iomap_region(n), and
pcim_iounmap_regions(mask) with pcim_iounmap_region(n), in the
following drivers: fpga dfl-pci, block mtip32xx, gpio-merrifield,
cavium (Philipp Stanner)
Error handling:
- Add sysfs 'reset_subordinate' to reset the entire hierarchy below a
bridge; previously Secondary Bus Reset could only be used when
there was a single device below a bridge (Keith Busch)
- Warn if we reset a running device where the driver didn't register
pci_error_handlers notification callbacks (Keith Busch)
ASPM:
- Disable ASPM L1 before touching L1 PM Substates to follow the spec
closer and avoid a CPU load timeout on some platforms (Ajay
Agarwal)
- Set devices below Intel VMD to D0 before enabling ASPM L1 Substates
as required per spec for all L1 Substates changes (Jian-Hong Pan)
Power management:
- Enable starfive controller runtime PM before probing host bridge
(Mayank Rana)
- Enable runtime power management for host bridges (Krishna chaitanya
chundru)
Power control:
- Use of_platform_device_create() instead of of_platform_populate()
to create pwrctl platform devices so we can control it based on the
child nodes (Manivannan Sadhasivam)
- Create pwrctrl platform devices only if there's a relevant power
supply property (Manivannan Sadhasivam)
- Add device link from the pwrctl supplier to the PCI dev to ensure
pwrctl drivers are probed before the PCI dev driver; this avoids a
race where pwrctl could change device power state while the PCI
driver was active (Manivannan Sadhasivam)
- Find pwrctl device for removal with of_find_device_by_node()
instead of searching all children of the parent (Manivannan
Sadhasivam)
- Rename 'pwrctl' to 'pwrctrl' to match new bandwidth controller
('bwctrl') and hotplug files (Bjorn Helgaas)
Bandwidth control:
- Add read/modify/write locking for Link Control 2, which is used to
manage Link speed (Ilpo Järvinen)
- Extract Link Bandwidth Management Status check into
pcie_lbms_seen(), where it can be shared between the bandwidth
controller and quirks that use it to help retrain failed links
(Ilpo Järvinen)
- Re-add Link Bandwidth notification support with updates to address
the reasons it was previously reverted (Alexandru Gagniuc, Ilpo
Järvinen)
- Add pcie_set_target_speed() and related functionality so drivers
can manage PCIe Link speed based on thermal or other constraints
(Ilpo Järvinen)
- Add a thermal cooling driver to throttle PCIe Links via the
existing thermal management framework (Ilpo Järvinen)
- Add a userspace selftest for the PCIe bandwidth controller (Ilpo
Järvinen)
PCI device hotplug:
- Add hotplug controller driver for Marvell OCTEON multi-function
device where function 0 has a management console interface to
enable/disable and provision various personalities for the other
functions (Shijith Thotton)
- Retain a reference to the pci_bus for the lifetime of a pci_slot to
avoid a use-after-free when the thunderbolt driver resets USB4 host
routers on boot, causing hotplug remove/add of downstream docks or
other devices (Lukas Wunner)
- Remove unused cpcihp struct cpci_hp_controller_ops.hardware_test
(Guilherme Giacomo Simoes)
- Remove unused cpqphp struct ctrl_dbg.ctrl (Christophe JAILLET)
- Use pci_bus_read_dev_vendor_id() instead of hand-coded presence
detection in cpqphp (Ilpo Järvinen)
- Simplify cpqphp enumeration, which is already simple-minded and
doesn't handle devices below hot-added bridges (Ilpo Järvinen)
Virtualization:
- Add ACS quirk for Wangxun FF5xxx NICs, which don't advertise an ACS
capability but do isolate functions as though PCI_ACS_RR and
PCI_ACS_CR were set, so the functions can be in independent IOMMU
groups (Mengyuan Lou)
TLP Processing Hints (TPH):
- Add and document TLP Processing Hints (TPH) support so drivers can
enable and disable TPH and the kernel can save/restore TPH
configuration (Wei Huang)
- Add TPH Steering Tag support so drivers can retrieve Steering Tag
values associated with specific CPUs via an ACPI _DSM to improve
performance by directing DMA writes closer to their consumers (Wei
Huang)
Data Object Exchange (DOE):
- Wait up to 1 second for DOE Busy bit to clear before writing a
request to the mailbox to avoid failures if the mailbox is still
busy from a previous transfer (Gregory Price)
Endpoint framework:
- Skip attempts to allocate from endpoint controller memory window if
the requested size is larger than the window (Damien Le Moal)
- Add and document pci_epc_mem_map() and pci_epc_mem_unmap() to
handle controller-specific size and alignment constraints, and add
test cases to the endpoint test driver (Damien Le Moal)
- Implement dwc pci_epc_ops.align_addr() so pci_epc_mem_map() can
observe DWC-specific alignment requirements (Damien Le Moal)
- Synchronously cancel command handler work in endpoint test before
cleaning up DMA and BARs (Damien Le Moal)
- Respect endpoint page size in dw_pcie_ep_align_addr() (Niklas
Cassel)
- Use dw_pcie_ep_align_addr() in dw_pcie_ep_raise_msi_irq() and
dw_pcie_ep_raise_msix_irq() instead of open coding the equivalent
(Niklas Cassel)
- Avoid NULL dereference if Modem Host Interface Endpoint lacks
'mmio' DT property (Zhongqiu Han)
- Release PCI domain ID of Endpoint controller parent (not controller
itself) and before unregistering the controller, to avoid
use-after-free (Zijun Hu)
- Clear secondary (not primary) EPC in pci_epc_remove_epf() when
removing the secondary controller associated with an NTB (Zijun Hu)
Cadence PCIe controller driver:
- Lower severity of 'phy-names' message (Bartosz Wawrzyniak)
Freescale i.MX6 PCIe controller driver:
- Fix suspend/resume support on i.MX6QDL, which has a hardware
erratum that prevents use of L2 (Stefan Eichenberger)
Intel VMD host bridge driver:
- Add 0xb60b and 0xb06f Device IDs for client SKUs (Nirmal Patel)
MediaTek PCIe Gen3 controller driver:
- Update mediatek-gen3 DT binding to require the exact number of
clocks for each SoC (Fei Shao)
- Add support for DT 'max-link-speed' and 'num-lanes' properties to
restrict the link speed and width (AngeloGioacchino Del Regno)
Microchip PolarFlare PCIe controller driver:
- Add DT and driver support for using either of the two PolarFire
Root Ports (Conor Dooley)
NVIDIA Tegra194 PCIe controller driver:
- Move endpoint controller cleanups that depend on refclk from the
host to the notifier that tells us the host has deasserted PERST#,
when refclk should be valid (Manivannan Sadhasivam)
Qualcomm PCIe controller driver:
- Add qcom SAR2130P DT binding with an additional clock (Dmitry
Baryshkov)
- Enable MSI interrupts if 'global' IRQ is supported, since a
previous commit unintentionally masked them (Manivannan Sadhasivam)
- Move endpoint controller cleanups that depend on refclk from the
host to the notifier that tells us the host has deasserted PERST#,
when refclk should be valid (Manivannan Sadhasivam)
- Add DT binding and driver support for IPQ9574, with Synopsys IP
v5.80a and Qcom IP 1.27.0 (devi priya)
- Move the OPP "operating-points-v2" table from the
qcom,pcie-sm8450.yaml DT binding to qcom,pcie-common.yaml, where it
can be used by other Qcom platforms (Qiang Yu)
- Add 'global' SPI interrupt for events like link-up, link-down to
qcom,pcie-x1e80100 DT binding so we can start enumeration when the
link comes up (Qiang Yu)
- Disable ASPM L0s for qcom,pcie-x1e80100 since the PHY is not tuned
to support this (Qiang Yu)
- Add ops_1_21_0 for SC8280X family SoC, which doesn't use the
'iommu-map' DT property and doesn't need BDF-to-SID translation
(Qiang Yu)
Rockchip PCIe controller driver:
- Define ROCKCHIP_PCIE_AT_SIZE_ALIGN to replace magic 256 endpoint
.align value (Damien Le Moal)
- When unmapping an endpoint window, compute the region index instead
of searching for it, and verify that the address was mapped (Damien
Le Moal)
- When mapping an endpoint window, verify that the address hasn't
been mapped already (Damien Le Moal)
- Implement pci_epc_ops.align_addr() for rockchip-ep (Damien Le Moal)
- Fix MSI IRQ data mapping to observe the alignment constraint, which
fixes intermittent page faults in memcpy_toio() and memcpy_fromio()
(Damien Le Moal)
- Rename rockchip_pcie_parse_ep_dt() to
rockchip_pcie_ep_get_resources() for consistency with similar DT
interfaces (Damien Le Moal)
- Skip the unnecessary link train in rockchip_pcie_ep_probe() and do
it only in the endpoint start operation (Damien Le Moal)
- Implement pci_epc_ops.stop_link() to disable link training and
controller configuration (Damien Le Moal)
- Attempt link training at 5 GT/s when both partners support it
(Damien Le Moal)
- Add a handler for PERST# signal so we can detect host-initiated
resets and start link training after PERST# is deasserted (Damien
Le Moal)
Synopsys DesignWare PCIe controller driver:
- Clear outbound address on unmap so dw_pcie_find_index() won't match
an ATU index that was already unmapped (Damien Le Moal)
- Use of_property_present() instead of of_property_read_bool() when
testing for presence of non-boolean DT properties (Rob Herring)
- Advertise 1MB size if endpoint supports Resizable BARs, which was
inadvertently lost in v6.11 (Niklas Cassel)
TI J721E PCIe driver:
- Add PCIe support for J722S SoC (Siddharth Vadapalli)
- Delay PCIE_T_PVPERL_MS (100 ms), not just PCIE_T_PERST_CLK_US (100
us), before deasserting PERST# to ensure power and refclk are
stable (Siddharth Vadapalli)
TI Keystone PCIe controller driver:
- Set the 'ti,keystone-pcie' mode so v3.65a devices work in Root
Complex mode (Kishon Vijay Abraham I)
- Try to avoid unrecoverable SError for attempts to issue config
transactions when the link is down; this is racy but the best we
can do (Kishon Vijay Abraham I)
Miscellaneous:
- Reorganize kerneldoc parameter names to match order in function
signature (Julia Lawall)
- Fix sysfs reset_method_store() memory leak (Todd Kjos)
- Simplify pci_create_slot() (Ilpo Järvinen)
- Fix incorrect printf format specifiers in pcitest (Luo Yifan)"
* tag 'pci-v6.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (127 commits)
PCI: rockchip-ep: Handle PERST# signal in EP mode
PCI: rockchip-ep: Improve link training
PCI: rockship-ep: Implement the pci_epc_ops::stop_link() operation
PCI: rockchip-ep: Refactor endpoint link training enable
PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() MSI-X hiding
PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations
PCI: rockchip-ep: Rename rockchip_pcie_parse_ep_dt()
PCI: rockchip-ep: Fix MSI IRQ data mapping
PCI: rockchip-ep: Implement the pci_epc_ops::align_addr() operation
PCI: rockchip-ep: Improve rockchip_pcie_ep_map_addr()
PCI: rockchip-ep: Improve rockchip_pcie_ep_unmap_addr()
PCI: rockchip-ep: Use a macro to define EP controller .align feature
PCI: rockchip-ep: Fix address translation unit programming
PCI/pwrctrl: Rename pwrctrl functions and structures
PCI/pwrctrl: Rename pwrctl files to pwrctrl
PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent
PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers
PCI/pwrctl: Create pwrctl device only if at least one power supply is present
PCI/pwrctl: Use of_platform_device_create() to create pwrctl devices
tools: PCI: Fix incorrect printf format specifiers
...
|
|
- Reorganize kerneldoc parameter names to match order in function signature
(Julia Lawall)
- Remove kerneldoc return value descriptions from hotplug registration
interfaces that don't return anything (Ilpo Järvinen)
- Fix sysfs reset_method_store() memory leak (Todd Kjos)
- Simplify pci_create_slot() (Ilpo Järvinen)
- Fix incorrect printf format specifiers in pcitest (Luo Yifan)
* pci/misc:
tools: PCI: Fix incorrect printf format specifiers
PCI: Simplify pci_create_slot() logic
PCI: Fix reset_method_store() memory leak
PCI: hotplug: Remove "Returns" kerneldoc from void functions
PCI: hotplug: Reorganize kerneldoc parameter names
|
|
- Add and document TLP Processing Hints (TPH) support so drivers can enable
and disable TPH and the kernel can save/restore TPH configuration (Wei
Huang)
- Add TPH Steering Tag support so drivers can retrieve Steering Tag values
associated with specific CPUs via an ACPI _DSM to direct DMA writes
closer to their consumers (Wei Huang)
* pci/tph:
PCI/TPH: Add TPH documentation
PCI/TPH: Add Steering Tag support
PCI: Add TLP Processing Hints (TPH) support
|
|
- Add resource_set_size() to set resource size when start has already been
set (Ilpo Järvinen)
- Add resource_set_range() helper to set both resource start and size (Ilpo
Järvinen)
- Use IS_ALIGNED() and resource_size() in quirk_s3_64M() instead of
open-coding them (Ilpo Järvinen)
- Add ALIGN_DOWN_IF_NONZERO() to avoid code duplication when distributing
resources across devices (Ilpo Järvinen)
- Improve pdev_sort_resources() warning message to be more specific (Ilpo
Järvinen)
* pci/resource:
PCI: Improve pdev_sort_resources() warning message
PCI: Add ALIGN_DOWN_IF_NONZERO() helper
PCI: Use align and resource helpers, and SZ_* in quirk_s3_64M()
PCI: Use resource_set_{range,size}() helpers
resource: Add resource set range and size helpers
|
|
- Add sysfs 'reset_subordinate' to reset hierarchy below bridge (Keith
Busch)
- Warn if we reset a running device where driver didn't register
pci_error_handlers notification callbacks (Keith Busch)
* pci/reset:
PCI: Warn if a running device is unaware of reset
PCI: Add 'reset_subordinate' to reset hierarchy below bridge
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"Bindings:
- Enable dtc "interrupt_provider" warnings for binding examples. Fix
the warnings in fsl,mu-msi and ti,sci-inta due to this.
- Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton, and
altr,fpga-passive-serial to DT schema format
- Add some documentation on the different forms of YAML text blocks
which are a constant source of review comments
- Fix some schema errors in constraints for arrays
- Add compatibles for qcom,sar2130p-pdc and onnn,adt7462
DT core:
- Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
- Add some warnings on deprecated address handling
- Rework early_init_dt_scan() so the arch can pass in the phys
address of the DTB as __pa() is not always valid to use. This fixes
a warning for arm64 with kexec.
- Add and use some new DT graph iterators for iterating over ports
and endpoints
- Rework reserved-memory handling to be sized dynamically for fixed
regions
- Optimize of_modalias() to avoid a strlen() call
- Constify struct device_node and property pointers where ever
possible"
* tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (36 commits)
of: Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
dt-bindings: interrupt-controller: qcom,pdc: Add SAR2130P compatible
of/address: Rework bus matching to avoid warnings
of: WARN on deprecated #address-cells/#size-cells handling
of/fdt: Don't use default address cell sizes for address translation
dt-bindings: Enable dtc "interrupt_provider" warnings
of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
dt-bindings: cache: qcom,llcc: Fix X1E80100 reg entries
dt-bindings: watchdog: convert zii,rave-sp-wdt.txt to yaml format
dt-bindings: input: convert zii,rave-sp-pwrbutton.txt to yaml
media: xilinx-tpg: use new of_graph functions
fbdev: omapfb: use new of_graph functions
gpu: drm: omapdrm: use new of_graph functions
ASoC: audio-graph-card2: use new of_graph functions
ASoC: audio-graph-card: use new of_graph functions
ASoC: test-component: use new of_graph functions
of: property: use new of_graph functions
of: property: add of_graph_get_next_port_endpoint()
of: property: add of_graph_get_next_port()
of: module: remove strlen() call in of_modalias()
...
|
|
This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove bandwidth
notification"). An upcoming commit extends this driver building PCIe
bandwidth controller on top of it.
PCIe bandwidth notifications were first added in the commit e8303bb7a75c
("PCI/LINK: Report degraded links via link bandwidth notification") but
later had to be removed. The significant changes compared with the old
bandwidth notification driver include:
1) Don't print the notifications into kernel log, just keep the Link
Speed cached in struct pci_bus updated. While somewhat unfortunate,
the log spam was the source of complaints that eventually lead to
the removal of the bandwidth notifications driver (see the links
below for further information).
2) Besides the Link Bandwidth Management Interrupt, also enable Link
Autonomous Bandwidth Interrupt to cover the other source of bandwidth
changes.
3) Handle Link Speed updates robustly. Refresh the cached Link Speed
when enabling Bandwidth Notification Interrupts, and solve the race
between Link Speed read and LBMS/LABS update in
pcie_bwnotif_irq_thread().
4) Use concurrency safe LNKCTL RMW operations.
5) The driver is now called PCIe bwctrl (bandwidth controller) instead
of just bandwidth notifications because of increased scope and
functionality within the driver.
6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain().
Provide LBMS counting API for it.
7) Tweaks to variable/functions names for consistency and length reasons.
Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to
keep track PCIe Link Speed changes.
[bhelgaas: This is based on previous work by Alexandru Gagniuc
<mr.nuke.me@gmail.com>; see e8303bb7a75c ("PCI/LINK: Report degraded links
via link bandwidth notification")]
Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com
Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/
Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/
Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/
Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler:
https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
If a reset is issued to a running device with a driver that didn't register
the notification callbacks, the driver may be unaware of this event and
have an inconsistent view of the device's state. Log a warning of this
event because there's nothing else indicating the event occured, which
could be confusing when debugging such situations.
Link: https://lore.kernel.org/r/20241025222755.3756162-2-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Amey Narkhede <ameynarkhede03@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
The "bus" and "cxl_bus" reset methods reset a device by asserting Secondary
Bus Reset on the bridge leading to the device. These only work if the
device is the only device below the bridge.
Add a sysfs 'reset_subordinate' attribute on bridges that can assert
Secondary Bus Reset regardless of how many devices are below the bridge.
This resets all the devices below a bridge in a single command, including
the locking and config space save/restore that reset methods normally do.
This may be the only way to reset devices that don't support other reset
methods (ACPI, FLR, PM reset, etc).
Link: https://lore.kernel.org/r/20241025222755.3756162-1-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: commit log, add capable(CAP_SYS_ADMIN) check]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Amey Narkhede <ameynarkhede03@gmail.com>
|
|
The PCIe bandwidth controller added by a subsequent commit will require
selecting PCIe Link Speeds that are lower than the Maximum Link Speed.
The struct pci_bus only stores max_bus_speed. Even if PCIe r6.1 sec 8.2.1
currently disallows gaps in supported Link Speeds, the Implementation Note
in PCIe r6.1 sec 7.5.3.18, recommends determining supported Link Speeds
using the Supported Link Speeds Vector in the Link Capabilities 2 Register
(when available) to "avoid software being confused if a future
specification defines Links that do not require support for all slower
speeds."
Reuse code in pcie_get_speed_cap() to add pcie_get_supported_speeds() to
query the Supported Link Speeds Vector of a PCIe device. The value is taken
directly from the Supported Link Speeds Vector or synthesized from the Max
Link Speed in the Link Capabilities Register when the Link Capabilities 2
Register is not available.
The Supported Link Speeds Vector in the Link Capabilities Register 2
corresponds to the bus below on Root Ports and Downstream Ports, whereas it
corresponds to the bus above on Upstream Ports and Endpoints (PCIe r6.1 sec
7.5.3.18):
Supported Link Speeds Vector - This field indicates the supported Link
speed(s) of the associated Port.
Add supported_speeds into the struct pci_dev that caches the
Supported Link Speeds Vector.
supported_speeds contains a set of Link Speeds only in the case where PCIe
Link Speed can be determined. Root Complex Integrated Endpoints do not have
a well-defined Link Speed because they do not implement either of the Link
Capabilities Registers, which is allowed by PCIe r6.1 sec 7.5.3 (the same
limitation applies to determining cur_bus_speed and max_bus_speed that are
PCI_SPEED_UNKNOWN in such case). This is of no concern from PCIe bandwidth
controller point of view because such devices are not attached into a PCIe
Root Port that could be controlled.
The supported_speeds field keeps the extra reserved zero at the least
significant bit to match the Link Capabilities 2 Register layout.
An attempt was made to store supported_speeds field into the struct pci_bus
as an intersection of both ends of the Link, however, the subordinate
struct pci_bus is not available early enough. The Target Speed quirk (in
pcie_failed_link_retrain()) can run either during initial scan or later,
requiring it to use the API provided by the PCIe bandwidth controller to
set the Target Link Speed in order to co-exist with the bandwidth
controller. When the Target Speed quirk is calling the bandwidth controller
during initial scan, the struct pci_bus is not yet initialized. As such,
storing supported_speeds into the struct pci_bus is not viable.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/20241018144755.7875-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: move pcie_get_supported_speeds() decl to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
There are ACS quirks that hijack the normal ACS processing and deliver to
to special quirk code. The enable path needs to call
pci_dev_specific_enable_acs() and then pci_dev_specific_acs_enabled() will
report the hidden ACS state controlled by the quirk.
The recent rework got this out of order and we should try to call
pci_dev_specific_enable_acs() regardless of any actual ACS support in the
device.
As before command line parameters that effect standard PCI ACS don't
interact with the quirk versions, including the new config_acs= option.
Link: https://lore.kernel.org/r/0-v1-f96b686c625b+124-pci_acs_quirk_fix_jgg@nvidia.com
Fixes: 47c8846a49ba ("PCI: Extend ACS configurability")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/e89107da-ac99-4d3a-9527-a4df9986e120@kernel.org
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229019
Tested-by: Steffen Dirkwinkel <me@steffen.cc>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pci_register_io_range() does not modify the passed in fwnode_handle, so
make it const.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20241010-dt-const-v1-1-87a51f558425@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
Convert open-coded resource size calculations to use
resource_set_{range,size}() helpers.
While at it, use SZ_* for size parameter where appropriate which makes the
intent of code more obvious.
Also, cast sizes to resource_size_t, not u64.
Link: https://lore.kernel.org/r/20240614100606.15830-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
In reset_method_store(), a string is allocated via kstrndup() and assigned
to the local "options". options is then used in with strsep() to find
spaces:
while ((name = strsep(&options, " ")) != NULL) {
If there are no remaining spaces, then options is set to NULL by strsep(),
so the subsequent kfree(options) doesn't free the memory allocated via
kstrndup().
Fix by using a separate tmp_options to iterate with strsep() so options is
preserved.
Link: https://lore.kernel.org/r/20241001231147.3583649-1-tkjos@google.com
Fixes: d88f521da3ef ("PCI: Allow userspace to query and set device reset mechanism")
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Add support for PCIe TLP Processing Hints (TPH) support (see PCIe r6.2,
sec 6.17).
Add TPH register definitions in pci_regs.h, including the TPH Requester
capability register, TPH Requester control register, TPH Completer
capability, and the ST fields of MSI-X entry.
Introduce pcie_enable_tph() and pcie_disable_tph(), enabling drivers to
toggle TPH support and configure specific ST mode as needed. Also add a new
kernel parameter, "pci=notph", allowing users to disable TPH support across
the entire system.
Link: https://lore.kernel.org/r/20241002165954.128085-2-wei.huang2@amd.com
Co-developed-by: Jing Liu <jing2.liu@intel.com>
Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com>
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Wait for device readiness after reset by polling Vendor ID and
looking for Configuration RRS instead of polling the Command
register and looking for non-error completions, to avoid hardware
retries done for RRS on non-Vendor ID reads (Bjorn Helgaas)
- Rename CRS Completion Status to RRS ('Request Retry Status') to
match PCIe r6.0 spec usage (Bjorn Helgaas)
- Clear LBMS bit after a manual link retrain so we don't try to
retrain a link when there's no downstream device anymore (Maciej W.
Rozycki)
- Revert to the original link speed after retraining fails instead of
leaving it restricted to 2.5GT/s, so a future device has a chance
to use higher speeds (Maciej W. Rozycki)
- Wait for each level of downstream bus, not just the first, to
become accessible before restoring devices on that bus (Ilpo
Järvinen)
- Add ARCH_PCI_DEV_GROUPS so s390 can add its own attribute_groups
without having to stomp on the core's pdev->dev.groups (Lukas
Wunner)
Driver binding:
- Export pcim_request_region(), a managed counterpart of
pci_request_region(), for use by drivers (Philipp Stanner)
- Export pcim_iomap_region() and deprecate pcim_iomap_regions()
(Philipp Stanner)
- Request the PCI BAR used by xboxvideo (Philipp Stanner)
- Request and map drm/ast BARs with pcim_iomap_region() (Philipp
Stanner)
MSI:
- Add MSI_FLAG_NO_AFFINITY flag for devices that mux MSIs onto a
single IRQ line and cannot set the affinity of each MSI to a
specific CPU core (Marek Vasut)
- Use MSI_FLAG_NO_AFFINITY and remove unnecessary .irq_set_affinity()
implementations in aardvark, altera, brcmstb, dwc, mediatek-gen3,
mediatek, mobiveil, plda, rcar, tegra, vmd, xilinx-nwl,
xilinx-xdma, and xilinx drivers to avoid 'IRQ: set affinity failed'
warnings (Marek Vasut)
Power management:
- Add pwrctl support for ATH11K inside the WCN6855 package (Konrad
Dybcio)
PCI device hotplug:
- Remove unnecessary hpc_ops struct from shpchp (ngn)
- Check for PCI_POSSIBLE_ERROR(), not 0xffffffff, in cpqphp
(weiyufeng)
Virtualization:
- Mark Creative Labs EMU20k2 INTx masking as broken (Alex Williamson)
- Add an ACS quirk for Qualcomm SA8775P, which doesn't advertise ACS
but does provide ACS-like features (Subramanian Ananthanarayanan)
IOMMU:
- Add function 0 DMA alias quirk for Glenfly Arise audio function,
which uses the function 0 Requester ID (WangYuli)
NPEM:
- Add Native PCIe Enclosure Management (NPEM) support for sysfs
control of NVMe RAID storage indicators (ok/fail/locate/
rebuild/etc) (Mariusz Tkaczyk)
- Add support for the ACPI _DSM PCIe SSD status LED management, which
is functionally similar to NPEM but mediated by platform firmware
(Mariusz Tkaczyk)
Device trees:
- Drop minItems and maxItems from ranges in PCI generic host binding
since host bridges may have several MMIO and I/O port apertures
(Frank Li)
- Add kirin, rcar-gen2, uniphier DT binding top-level constraints for
clocks (Krzysztof Kozlowski)
Altera PCIe controller driver:
- Convert altera DT bindings from text to YAML (Matthew Gerlach)
- Replace TLP_REQ_ID() with macro PCI_DEVID(), which does the same
thing and is what other drivers use (Jinjie Ruan)
Broadcom STB PCIe controller driver:
- Add DT binding maxItems for reset controllers (Jim Quinlan)
- Use the 'bridge' reset method if described in the DT (Jim Quinlan)
- Use the 'swinit' reset method if described in the DT (Jim Quinlan)
- Add 'has_phy' so the existence of a 'rescal' reset controller
doesn't imply software control of it (Jim Quinlan)
- Add support for many inbound DMA windows (Jim Quinlan)
- Rename SoC 'type' to 'soc_base' express the fact that SoCs come in
families of multiple similar devices (Jim Quinlan)
- Add Broadcom 7712 DT description and driver support (Jim Quinlan)
- Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings for
maintainability (Bjorn Helgaas)
Freescale i.MX6 PCIe controller driver:
- Add imx6q-pcie 'dbi2' and 'atu' reg-names for i.MX8M Endpoints
(Richard Zhu)
- Fix a code restructuring error that caused i.MX8MM and i.MX8MP
Endpoints to fail to establish link (Richard Zhu)
- Fix i.MX8MP Endpoint occasional failure to trigger MSI by enforcing
outbound alignment requirement (Richard Zhu)
- Call phy_power_off() in the .probe() error path (Frank Li)
- Rename internal names from imx6_* to imx_* since i.MX7/8/9 are also
supported (Frank Li)
- Manage Refclk by using SoC-specific callbacks instead of switch
statements (Frank Li)
- Manage core reset by using SoC-specific callbacks instead of switch
statements (Frank Li)
- Expand comments for erratum ERR010728 workaround (Frank Li)
- Use generic PHY APIs to configure mode, speed, and submode, which
is harmless for devices that implement their own internal PHY
management and don't set the generic imx_pcie->phy (Frank Li)
- Add i.MX8Q (i.MX8QM, i.MX8QXP, and i.MX8DXL) DT binding and driver
Root Complex support (Richard Zhu)
Freescale Layerscape PCIe controller driver:
- Replace layerscape-pcie DT binding compatible fsl,lx2160a-pcie with
fsl,lx2160ar2-pcie (Frank Li)
- Add layerscape-pcie DT binding deprecated 'num-viewport' property
to address a DT checker warning (Frank Li)
- Change layerscape-pcie DT binding 'fsl,pcie-scfg' to phandle-array
(Frank Li)
Loongson PCIe controller driver:
- Increase max PCI hosts to 8 for Loongson-3C6000 and newer chipsets
(Huacai Chen)
Marvell Aardvark PCIe controller driver:
- Fix issue with emulating Configuration RRS for two-byte reads of
Vendor ID; previously it only worked for four-byte reads (Bjorn
Helgaas)
MediaTek PCIe Gen3 controller driver:
- Add per-SoC struct mtk_gen3_pcie_pdata to support multiple SoC
types (Lorenzo Bianconi)
- Use reset_bulk APIs to manage PHY reset lines (Lorenzo Bianconi)
- Add DT and driver support for Airoha EN7581 PCIe controller
(Lorenzo Bianconi)
Qualcomm PCIe controller driver:
- Update qcom,pcie-sc7280 DT binding with eight interrupts (Rayyan
Ansari)
- Add back DT 'vddpe-3v3-supply', which was incorrectly removed
earlier (Johan Hovold)
- Drop endpoint redundant masking of global IRQ events (Manivannan
Sadhasivam)
- Clarify unknown global IRQ message and only log it once to avoid a
flood (Manivannan Sadhasivam)
- Add 'linux,pci-domain' property to endpoint DT binding (Manivannan
Sadhasivam)
- Assign PCI domain number for endpoint controllers (Manivannan
Sadhasivam)
- Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for
endpoint controller (Manivannan Sadhasivam)
- Add global SPI interrupt for PCIe link events to DT binding
(Manivannan Sadhasivam)
- Add global RC interrupt handler to handle 'Link up' events and
automatically enumerate hot-added devices (Manivannan Sadhasivam)
- Avoid mirroring of DBI and iATU register space so it doesn't
overlap BAR MMIO space (Prudhvi Yarlagadda)
- Enable controller resources like PHY only after PERST# is
deasserted to partially avoid the problem that the endpoint SoC
crashes when accessing things when Refclk is absent (Manivannan
Sadhasivam)
- Add 16.0 GT/s equalization and RX lane margining settings (Shashank
Babu Chinta Venkata)
- Pass domain number to pci_bus_release_domain_nr() explicitly to
avoid a NULL pointer dereference (Manivannan Sadhasivam)
Renesas R-Car PCIe controller driver:
- Make the read-only const array 'check_addr' static (Colin Ian King)
- Add R-Car V4M (R8A779H0) PCIe host and endpoint to DT binding
(Yoshihiro Shimoda)
TI DRA7xx PCIe controller driver:
- Request IRQF_ONESHOT for 'dra7xx-pcie-main' IRQ since the primary
handler is NULL (Siddharth Vadapalli)
- Handle IRQ request errors during root port and endpoint probe
(Siddharth Vadapalli)
TI J721E PCIe driver:
- Add DT 'ti,syscon-acspcie-proxy-ctrl' and driver support to enable
the ACSPCIE module to drive Refclk for the Endpoint (Siddharth
Vadapalli)
- Extract the cadence link setup from cdns_pcie_host_setup() so link
setup can be done separately during resume (Thomas Richard)
- Add T_PERST_CLK_US definition for the mandatory delay between
Refclk becoming stable and PERST# being deasserted (Thomas Richard)
- Add j721e suspend and resume support (Théo Lebrun)
TI Keystone PCIe controller driver:
- Fix NULL pointer checking when applying MRRS limitation quirk for
AM65x SR 1.0 Errata #i2037 (Dan Carpenter)
Xilinx NWL PCIe controller driver:
- Fix off-by-one error in INTx IRQ handler that caused INTx
interrupts to be lost or delivered as the wrong interrupt (Sean
Anderson)
- Rate-limit misc interrupt messages (Sean Anderson)
- Turn off the clock on probe failure and device removal (Sean
Anderson)
- Add DT binding and driver support for enabling/disabling PHYs (Sean
Anderson)
- Add PCIe phy bindings for the ZCU102 (Sean Anderson)
Xilinx XDMA PCIe controller driver:
- Add support for Xilinx QDMA Soft IP PCIe Root Port Bridge to DT
binding and xilinx-dma-pl driver (Thippeswamy Havalige)
Miscellaneous:
- Fix buffer overflow in kirin_pcie_parse_port() (Alexandra Diupina)
- Fix minor kerneldoc issues and typos (Bjorn Helgaas)
- Use PCI_DEVID() macro in aer_inject() instead of open-coding it
(Jinjie Ruan)
- Check pcie_find_root_port() return in x86 fixups to avoid NULL
pointer dereferences (Samasth Norway Ananda)
- Make pci_bus_type constant (Kunwu Chan)
- Remove unused declarations of __pci_pme_wakeup() and
pci_vpd_release() (Yue Haibing)
- Remove any leftover .*.cmd files with make clean (zhang jiao)
- Remove unused BILLION macro (zhang jiao)"
* tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (132 commits)
PCI: Fix typos
dt-bindings: PCI: qcom: Allow 'vddpe-3v3-supply' again
tools: PCI: Remove unused BILLION macro
tools: PCI: Remove .*.cmd files with make clean
PCI: Pass domain number to pci_bus_release_domain_nr() explicitly
PCI: dra7xx: Fix error handling when IRQ request fails in probe
PCI: dra7xx: Fix threaded IRQ request for "dra7xx-pcie-main" IRQ
PCI: qcom: Add RX lane margining settings for 16.0 GT/s
PCI: qcom: Add equalization settings for 16.0 GT/s
PCI: dwc: Always cache the maximum link speed value in dw_pcie::max_link_speed
PCI: dwc: Rename 'dw_pcie::link_gen' to 'dw_pcie::max_link_speed'
PCI: qcom-ep: Enable controller resources like PHY only after refclk is available
PCI: Mark Creative Labs EMU20k2 INTx masking as broken
dt-bindings: PCI: imx6q-pcie: Add reg-name "dbi2" and "atu" for i.MX8M PCIe Endpoint
dt-bindings: PCI: altera: msi: Convert to YAML
PCI: imx6: Add i.MX8Q PCIe Root Complex (RC) support
PCI: Rename CRS Completion Status to RRS
PCI: aardvark: Correct Configuration RRS checking
PCI: Wait for device readiness with Configuration RRS
PCI: brcmstb: Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings
...
|
|
- Drop endpoint redundant masking of global IRQ events (Manivannan
Sadhasivam)
- Clarify unknown global IRQ message and only log it once to avoid a flood
(Manivannan Sadhasivam)
- Add Manivannan Sadhasivam as maintainer of qcom endpoint driver
(Manivannan Sadhasivam)
- Add 'linux,pci-domain' property to endpoint DT binding (Manivannan
Sadhasivam)
- Assign PCI domain number for endpoint controllers (Manivannan Sadhasivam)
- Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for endpoint
controller (Manivannan Sadhasivam)
- Add global SPI interrupt for PCIe link events to DT binding (Manivannan
Sadhasivam)
- Add global RC interrupt handler to handle 'Link up' events and
automatically enumerate hot-added devices (Manivannan Sadhasivam)
- Avoid mirroring of DBI and iATU register space so it doesn't overlap BAR
MMIO space (Prudhvi Yarlagadda)
- Enable controller resources like PHY only after PERST# is deasserted to
partially avoid the problem that the endpoint SoC crashes when accessing
things when Refclk is absent (Manivannan Sadhasivam)
- Rename dw_pcie.link_gen to max_link_speed to avoid ambiguity (Manivannan
Sadhasivam)
- Cache maximum link speed value in dw_pcie.max_link_speed for use by
vendor drivers (Manivannan Sadhasivam)
- Add 16.0 GT/s equalization and RX lane margining settings (Shashank Babu
Chinta Venkata)
- Pass domain number to pci_bus_release_domain_nr() explicitly to avoid a
NULL pointer dereference (Manivannan Sadhasivam)
* pci/controller/qcom:
PCI: Pass domain number to pci_bus_release_domain_nr() explicitly
PCI: qcom: Add RX lane margining settings for 16.0 GT/s
PCI: qcom: Add equalization settings for 16.0 GT/s
PCI: dwc: Always cache the maximum link speed value in dw_pcie::max_link_speed
PCI: dwc: Rename 'dw_pcie::link_gen' to 'dw_pcie::max_link_speed'
PCI: qcom-ep: Enable controller resources like PHY only after refclk is available
PCI: qcom: Disable mirroring of DBI and iATU register space in BAR region
PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt
dt-bindings: PCI: qcom,pcie-sm8450: Add 'global' interrupt
PCI: qcom-ep: Modify 'global_irq' and 'perst_irq' IRQ device names
PCI: endpoint: Assign PCI domain number for endpoint controllers
dt-bindings: PCI: pci-ep: Document 'linux,pci-domain' property
dt-bindings: PCI: pci-ep: Update Maintainers
PCI: qcom-ep: Reword the error message for receiving unknown global IRQ event
PCI: qcom-ep: Drop the redundant masking of global IRQ events
|
|
- Wait for each level of downstream bus, not just the first, to become
accessible before restoring devices on that bus (Ilpo Järvinen)
* pci/reset:
PCI: Wait for Link before restoring Downstream Buses
|
|
- Clear LBMS bit after a manual link retrain so we don't try to retrain a
link when there's no downstream device anymore (Maciej W. Rozycki)
- Revert to the original link speed after retraining fails instead of
leaving it restricted to 2.5GT/s, so a future device has a chance to use
higher speeds (Maciej W. Rozycki)
- Correct interpretation of pcie_retrain_link() return status and update it
to return 0/errno instead of true/false (Maciej W. Rozycki)
* pci/enumeration:
PCI: Use an error code with PCIe failed link retraining
PCI: Correct error reporting with PCIe failed link retraining
PCI: Revert to the original speed after PCIe failed link retraining
PCI: Clear the LBMS bit after a link retrain
|
|
The pci_bus_release_domain_nr() API is supposed to free the domain
number allocated by pci_bus_find_domain_nr(). Most of the callers of
pci_bus_find_domain_nr(), store the domain number in pci_bus::domain_nr.
As such, the pci_bus_release_domain_nr() implicitly frees the domain
number by dereferencing 'struct pci_bus'. However, one of the callers
of this API, the PCI endpoint subsystem, doesn't have 'struct pci_bus',
so it only passes NULL. Due to this, the API will end up dereferencing
the NULL pointer.
To fix this issue, pass the domain number to this API explicitly. Since
'struct pci_bus' is not used for anything else other than extracting the
domain number, it makes sense to pass the domain number directly.
Fixes: 0328947c5032 ("PCI: endpoint: Assign PCI domain number for endpoint controllers")
Closes: https://lore.kernel.org/linux-pci/c0c40ddb-bf64-4b22-9dd1-8dbb18aa2813@stanley.mountain
Link: https://lore.kernel.org/linux-pci/20240912053025.25314-1-manivannan.sadhasivam@linaro.org
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
PCIe r6.0 changed the abbreviation for "Configuration Request Retry Status"
Completion Status from "CRS" to "RRS" and uses the terminology of
"Configuration RRS Software Visibility" instead of "CRS Software
Visibility".
Align the Linux usage with the r6.0 spec language. No functional change
intended.
It's confusing to make this change, but I think "RRS" *is* a better
abbreviation because it was easy to interpret "CRS" as "Completion Retry
Status", which really didn't make any sense.
Link: https://lore.kernel.org/r/20240827234848.4429-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
After a device reset, delays are required before the device can
successfully complete config accesses. PCIe r6.0, sec 6.6, specifies some
delays required before software can perform config accesses. Devices that
require more time after those delays may respond to config accesses with
Configuration Request Retry Status (RRS) completions.
Callers of pci_dev_wait() are responsible for delays until the device can
respond to config accesses. pci_dev_wait() waits any additional time until
the device can successfully complete config accesses.
Reading config space of devices that are not present or not ready typically
returns ~0 (PCI_ERROR_RESPONSE). Previously we polled the Command register
until we got a value other than ~0. This is sometimes a problem because
Root Complex handling of RRS completions may include several retries and
implementation-specific behavior that is invisible to software (see sec
2.3.2), so the exponential backoff in pci_dev_wait() may not work as
intended.
Linux enables Configuration RRS Software Visibility on all Root Ports that
support it. If it is enabled, read the Vendor ID instead of the Command
register. RRS completions cause immediate return of the 0x0001 reserved
Vendor ID value, so the pci_dev_wait() backoff works correctly.
When a read of Vendor ID eventually completes successfully by returning a
non-0x0001 value (the Vendor ID or 0xffff for VFs), the device should be
initialized and ready to respond to config requests.
For conventional PCI devices or devices below Root Ports that don't support
Configuration RRS Software Visibility, poll the Command register as before.
This was developed independently, but is very similar to Stanislav
Spassov's previous work at
https://lore.kernel.org/linux-pci/20200223122057.6504-1-stanspas@amazon.com
Link: https://lore.kernel.org/r/20240827234848.4429-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Duc Dang <ducdang@google.com>
|
|
Given how the call place in pcie_wait_for_link_delay() got structured now,
and that pcie_retrain_link() returns a potentially useful error code,
convert pcie_failed_link_retrain() to return an error code rather than a
boolean status, fixing handling at the call site mentioned. Update the
other call site accordingly.
Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091156530.61955@angie.orcam.me.uk
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.5+
|
|
The LBMS bit, where implemented, is set by hardware either in response
to the completion of retraining caused by writing 1 to the Retrain Link
bit or whenever hardware has changed the link speed or width in attempt
to correct unreliable link operation. It is never cleared by hardware
other than by software writing 1 to the bit position in the Link Status
register and we never do such a write.
We currently have two places, namely apply_bad_link_workaround() and
pcie_failed_link_retrain() in drivers/pci/controller/dwc/pcie-tegra194.c
and drivers/pci/quirks.c respectively where we check the state of the LBMS
bit and neither is interested in the state of the bit resulting from the
completion of retraining, both check for a link fault.
And in particular pcie_failed_link_retrain() causes issues consequently, by
trying to retrain a link where there's no downstream device anymore and the
state of 1 in the LBMS bit has been retained from when there was a device
downstream that has since been removed.
Clear the LBMS bit then at the conclusion of pcie_retrain_link(), so that
we have a single place that controls it and that our code can track link
speed or width changes resulting from unreliable link operation.
Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091133140.61955@angie.orcam.me.uk
Reported-by: Matthew W Carlis <mattc@purestorage.com>
Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Cc: <stable@vger.kernel.org> # v6.5+
|
|
__pci_reset_bus() calls pci_bridge_secondary_bus_reset() to perform the
reset and also waits for the Secondary Bus to become again accessible.
__pci_reset_bus() then calls pci_bus_restore_locked() that restores the PCI
devices connected to the bus, and if necessary, recursively restores also
the subordinate buses and their devices.
The logic in pci_bus_restore_locked() does not take into account that after
restoring a device on one level, there might be another Link Downstream
that can only start to come up after restore has been performed for its
Downstream Port device. That is, the Link may require additional wait until
it becomes accessible.
Similarly, pci_slot_restore_locked() lacks wait.
Amend pci_bus_restore_locked() and pci_slot_restore_locked() to wait for
the Secondary Bus before recursively performing the restore of that bus.
Fixes: 090a3c5322e9 ("PCI: Add pci_reset_slot() and pci_reset_bus()")
Link: https://lore.kernel.org/r/20240808121708.2523-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pci_intx() becomes managed if pcim_enable_device() has been called in
advance. Commit 25216afc9db5 ("PCI: Add managed pcim_intx()") changed this
behavior so that pci_intx() always leads to creation of a separate device
resource for itself, whereas earlier, a shared resource was used for all
PCI devres operations.
Unfortunately, pci_intx() seems to be used in some drivers' remove() paths;
in the managed case this causes a device resource to be created on driver
detach, which causes .probe() to fail if the driver is reloaded:
pci 0000:00:1f.2: Resources present before probing
Fix the regression by only redirecting pci_intx() to its managed twin
pcim_intx() if the pci_command changes.
Link: https://lore.kernel.org/r/20240725120729.59788-2-pstanner@redhat.com
Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()")
Reported-by: Damien Le Moal <dlemoal@kernel.org>
Closes: https://lore.kernel.org/all/b8f4ba97-84fc-4b7e-ba1a-99de2d9f0118@kernel.org/
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
[bhelgaas: add error message to commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Define PCIE_RESET_CONFIG_DEVICE_WAIT_MS for the generic 100ms
required after reset before config access (Kevin Xie)
- Define PCIE_T_RRS_READY_MS for the generic 100ms required after
reset before config access (probably should be unified with
PCIE_RESET_CONFIG_DEVICE_WAIT_MS) (Damien Le Moal)
Resource management:
- Rename find_resource() to find_resource_space() to be more
descriptive (Ilpo Järvinen)
- Export find_resource_space() for use by PCI core, which needs to
learn whether there is available space for a bridge window (Ilpo
Järvinen)
- Prevent double counting of resources so window size doesn't grow on
each remove/rescan cycle (Ilpo Järvinen)
- Relax bridge window sizing algorithm so a device doesn't break
simply because it was removed and rescanned (Ilpo Järvinen)
- Evaluate the ACPI PRESERVE_BOOT_CONFIG _DSM in
pci_register_host_bridge() (not acpi_pci_root_create()) so we can
unify it with similar DT functionality (Vidya Sagar)
- Extend use of DT "linux,pci-probe-only" property so it works
per-host bridge as well as globally (Vidya Sagar)
- Unify support for ACPI PRESERVE_BOOT_CONFIG _DSM and the DT
"linux,pci-probe-only" property in pci_preserve_config() (Vidya
Sagar)
Driver binding:
- Add devres infrastructure for managed request and map of partial
BAR resources (Philipp Stanner)
- Deprecate pcim_iomap_table() because uses like
"pcim_iomap_table()[0]" have no good way to return errors (Philipp
Stanner)
- Add an always-managed pcim_request_region() for use instead of
pci_request_region() and similar, which are sometimes managed
depending on whether pcim_enable_device() has been called
previously (Philipp Stanner)
- Reimplement pcim_set_mwi() so it doesn't need to keep store MWI
state (Philipp Stanner)
- Add pcim_intx() for use instead of pci_intx(), which is sometimes
managed depending on whether pcim_enable_device() has been called
previously (Philipp Stanner)
- Add managed pcim_iomap_range() to allow mapping of a partial BAR
(Philipp Stanner)
- Fix a devres mapping leak in drm/vboxvideo (Philipp Stanner)
Error handling:
- Add missing bridge locking in device reset path and add a warning
for other possible lock issues (Dan Williams)
- Fix use-after-free on concurrent DPC and hot-removal (Lukas Wunner)
Power management:
- Disable AER and DPC during suspend to avoid spurious wakeups if
they share an interrupt with PME (Kai-Heng Feng)
PCIe native device hotplug:
- Detect if a device was removed or replaced during system sleep so
we don't assume a new device is the one that used to be there
(Lukas Wunner)
Virtualization:
- Add an ACS quirk for Broadcom BCM5760X multi-function NIC; it
prevents transactions between functions even though it doesn't
advertise ACS, so the functions can be attached individually via
VFIO (Ajit Khaparde)
Peer-to-peer DMA:
- Add a "pci=config_acs=" kernel command-line parameter to relax
default ACS settings to enable additional peer-to-peer
configurations. Requires expert knowledge of topology and ACS
operation (Vidya Sagar)
Endpoint framework:
- Remove unused struct pci_epf_group.type_group (Christophe JAILLET)
- Fix error handling in vpci_scan_bus() and epf_ntb_epc_cleanup()
(Dan Carpenter)
- Make struct pci_epc_class constant (Greg Kroah-Hartman)
- Remove unused pci_endpoint_test_bar_{readl,writel} functions
(Jiapeng Chong)
- Rename "BME" to "Bus Master Enable" (Manivannan Sadhasivam)
- Rename struct pci_epc_event_ops.core_init() callback to epc_init()
(Manivannan Sadhasivam)
- Move DMA init to MHI .epc_init() callback for uniformity
(Manivannan Sadhasivam)
- Cancel EPF test delayed work when link goes down (Manivannan
Sadhasivam)
- Add struct pci_epc_event_ops.epc_deinit() callback for cleanup
needed on fundamental reset (Manivannan Sadhasivam)
- Add 64KB alignment to endpoint test to support Rockchip rk3588
(Niklas Cassel)
- Optimize endpoint test by using memcpy() instead of readl() (Niklas
Cassel)
Device tree bindings:
- Add generic "ats-supported" property to advertise that a PCIe Root
Complex supports ATS (Jean-Philippe Brucker)
Amazon Annapurna Labs PCIe controller driver:
- Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
(Aleksandr Mishin)
Axis ARTPEC-6 PCIe controller driver:
- Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
address, not a CPU address (Niklas Cassel)
Freescale i.MX6 PCIe controller driver:
- Convert to agnostic GPIO API (Andy Shevchenko)
Freescale Layerscape PCIe controller driver:
- Make struct mobiveil_rp_ops constant (Christophe JAILLET)
- Use new generic dw_pcie_ep_linkdown() to handle link-down events
(Manivannan Sadhasivam)
HiSilicon Kirin PCIe controller driver:
- Convert to agnostic GPIO API (Andy Shevchenko)
- Use _scoped() iterator for OF children to ensure refcounts are
decremented at loop exit (Javier Carrasco)
Intel VMD host bridge driver:
- Create sysfs "domain" symlink before downstream devices are exposed
to userspace by pci_bus_add_devices() (Jiwei Sun)
Loongson PCIe controller driver:
- Enable MSI when LS7A is used with new CPUs that have integrated
PCIe Root Complex, e.g., Loongson-3C6000, so downstream devices can
use MSI (Huacai Chen)
Microchip AXI PolarFlare PCIe controller driver:
- Move pcie-microchip-host.c to a new PLDA directory (Minda Chen)
- Factor PLDA generic items out to a common
plda,xpressrich3-axi-common.yaml binding (Minda Chen)
- Factor PLDA generic data structures and code out to shared
pcie-plda.h, pcie-plda-host.c (Minda Chen)
- Add PLDA generic interrupt handling with a .request_event_irq()
callback for vendor-specific events (Minda Chen)
- Add PLDA generic host init/deinit and map bus functions for use by
vendor-specific drivers (Minda Chen)
- Rework to use PLDA core (Minda Chen)
Microsoft Hyper-V host bridge driver:
- Return zero, not garbage, when reading PCI_INTERRUPT_PIN (Wei Liu)
NVIDIA Tegra194 PCIe controller driver:
- Remove unused struct tegra_pcie_soc (Dr. David Alan Gilbert)
- Set 64KB inbound ATU alignment restriction (Jon Hunter)
Qualcomm PCIe controller driver:
- Make the MHI reg region mandatory for X1E80100, since all PCIe
controllers have it (Abel Vesa)
- Prevent use of uninitialized data and possible error pointer
dereference (Dan Carpenter)
- Return error, not success, if dev_pm_opp_find_freq_floor() fails
(Dan Carpenter)
- Add Operating Performance Points (OPP) support to scale performance
state based on aggregate link bandwidth to improve SoC power
efficiency (Krishna chaitanya chundru)
- Vote for the CPU-PCIe ICC (interconnect) path to ensure it stays
active even if other drivers don't vote for it (Krishna chaitanya
chundru)
- Use devm_clk_bulk_get_all() to get all the clocks from DT to avoid
writing out all the clock names (Manivannan Sadhasivam)
- Add DT binding and driver support for the SA8775P SoC (Mrinmay
Sarkar)
- Add HDMA support for the SA8775P SoC (Mrinmay Sarkar)
- Override the SA8775P NO_SNOOP default to avoid possible memory
corruption (Mrinmay Sarkar)
- Make sure resources are disabled during PERST# assertion, even if
the link is already disabled (Manivannan Sadhasivam)
- Use new generic dw_pcie_ep_linkdown() to handle link-down events
(Manivannan Sadhasivam)
- Add DT and endpoint driver support for the SA8775P SoC (Mrinmay
Sarkar)
- Add Hyper DMA (HDMA) support for the SA8775P SoC and enable it in
the EPF MHI driver (Mrinmay Sarkar)
- Set PCIE_PARF_NO_SNOOP_OVERIDE to override the default NO_SNOOP
attribute on the SA8775P SoC (both Root Complex and Endpoint mode)
to avoid possible memory corruption (Mrinmay Sarkar)
Renesas R-Car PCIe controller driver:
- Demote WARN() to dev_warn_ratelimited() in rcar_pcie_wakeup() to
avoid unnecessary backtrace (Marek Vasut)
- Add DT and driver support for R-Car V4H (R8A779G0) host and
endpoint. This requires separate proprietary firmware (Yoshihiro
Shimoda)
Rockchip PCIe controller driver:
- Assert PERST# for 100ms after power is stable (Damien Le Moal)
- Wait PCIE_T_RRS_READY_MS (100ms) after reset before starting
configuration (Damien Le Moal)
- Use GPIOD_OUT_LOW flag while requesting ep_gpio to fix a firmware
crash on Qcom-based modems with Rockpro64 board (Manivannan
Sadhasivam)
Rockchip DesignWare PCIe controller driver:
- Factor common parts of rockchip-dw-pcie DT binding to be shared by
Root Complex and Endpoint mode (Niklas Cassel)
- Add missing INTx signals to common DT binding (Niklas Cassel)
- Add eDMA items to DT binding for Endpoint controller (Niklas
Cassel)
- Fix initial dw-rockchip PERST# GPIO value to prevent unnecessary
short assert/deassert that causes issues with some WLAN controllers
(Niklas Cassel)
- Refactor dw-rockchip and add support for Endpoint mode (Niklas
Cassel)
- Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
wrapper (Niklas Cassel)
- Add error messages in .probe() error paths to improve user
experience (Uwe Kleine-König)
Samsung Exynos PCIe controller driver:
- Use bulk clock APIs to simplify clock setup (Shradha Todi)
StarFive PCIe controller driver:
- Add DT binding and driver support for the StarFive JH7110
PLDA-based PCIe controller (Minda Chen)
Synopsys DesignWare PCIe controller driver:
- Add generic support for sending PME_Turn_Off when system suspends
(Frank Li)
- Fix incorrect interpretation of iATU slot 0 after PERST#
assert/deassert (Frank Li)
- Use msleep() instead of usleep_range() while waiting for link
(Konrad Dybcio)
- Refactor dw_pcie_edma_find_chip() to enable adding support for
Hyper DMA (HDMA) (Manivannan Sadhasivam)
- Enable drivers to supply the eDMA channel count since some can't
auto detect this (Manivannan Sadhasivam)
- Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
wrapper (Manivannan Sadhasivam)
- Pass the eDMA mapping format directly from drivers instead of
maintaining a capability for it (Manivannan Sadhasivam)
- Add generic dw_pcie_ep_linkdown() to notify EPF drivers about
link-down events and restore non-sticky DWC registers lost on link
down (Manivannan Sadhasivam)
- Add vendor-specific "apb" reg name, interrupt names, INTx names to
generic binding (Niklas Cassel)
- Enforce DWC restriction that 64-bit BARs must start with an
even-numbered BAR (Niklas Cassel)
- Consolidate args of dw_pcie_prog_outbound_atu() into a structure
(Yoshihiro Shimoda)
- Add support for endpoints to send Message TLPs, e.g., for INTx
emulation (Yoshihiro Shimoda)
TI DRA7xx PCIe controller driver:
- Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
address, not a CPU address (Niklas Cassel)
TI Keystone PCIe controller driver:
- Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
(Aleksandr Mishin)
- Work around AM65x/DRA80xM Errata #i2037 that corrupts TLPs and
causes processor hangs by limiting Max_Read_Request_Size (MRRS) and
Max_Payload_Size (MPS) (Kishon Vijay Abraham I)
- Leave BAR 0 disabled for AM654x to fix a regression caused by
6ab15b5e7057 ("PCI: dwc: keystone: Convert .scan_bus() callback to
use add_bus"), which caused a 45-second boot delay (Siddharth
Vadapalli)
Xilinx Versal CPM PCIe controller driver:
- Fix overlapping bridge registers and 32-bit BAR addresses in DT
binding (Thippeswamy Havalige)
MicroSemi Switchtec management driver:
- Make struct switchtec_class constant (Greg Kroah-Hartman)
Miscellaneous:
- Remove unused struct acpi_handle_node (Dr. David Alan Gilbert)
- Add missing MODULE_DESCRIPTION() macros (Jeff Johnson)"
* tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (154 commits)
PCI: loongson: Enable MSI in LS7A Root Complex
PCI: Extend ACS configurability
PCI: Add missing bridge lock to pci_bus_lock()
drm/vboxvideo: fix mapping leaks
PCI: Add managed pcim_iomap_range()
PCI: Remove legacy pcim_release()
PCI: Add managed pcim_intx()
PCI: vmd: Create domain symlink before pci_bus_add_devices()
PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq()
PCI: qcom: Prevent potential error pointer dereference
PCI: qcom: Fix missing error code in qcom_pcie_probe()
PCI: Give pcim_set_mwi() its own devres cleanup callback
PCI: Move struct pci_devres.pinned bit to struct pci_dev
PCI: Remove struct pci_devres.enabled status bit
PCI: Document hybrid devres hazards
PCI: Add managed pcim_request_region()
PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()
PCI: Add managed partial-BAR request and map infrastructure
PCI: Add devres helpers for iomap table
PCI: Add and use devres helper for bit masks
...
|
|
- Use devm_clk_bulk_get_all() to get all the clocks from DT to avoid
writing out all the clock names (Manivannan Sadhasivam)
- Add DT binding and driver support for the SA8775P SoC (Mrinmay Sarkar)
- Refactor dw_pcie_edma_find_chip() to enable adding support for Hyper DMA
(HDMA) (Manivannan Sadhasivam)
- Enable drivers to supply the eDMA channel count since some can't auto
detect this (Manivannan Sadhasivam)
- Add HDMA support for the SA8775P SoC (Mrinmay Sarkar)
- Override the SA8775P NO_SNOOP default to avoid possible memory corruption
(Mrinmay Sarkar)
- Make sure resources are disabled during PERST# assertion, even if the
link is already disabled (Manivannan Sadhasivam)
- Vote for the CPU-PCIe ICC (interconnect) path to ensure it stays active
even if other drivers don't vote for it (Krishna chaitanya chundru)
- Add Operating Performance Points (OPP) to scale performance state based
on aggregate link bandwidth to improve SoC power efficiency (Krishna
chaitanya chundru)
- Return failure instead of success if dev_pm_opp_find_freq_floor() fails
(Dan Carpenter)
- Avoid an error pointer dereference if dev_pm_opp_find_freq_exact() fails
(Dan Carpenter)
- Prevent use of uninitialized data in qcom_pcie_suspend_noirq() (Dan
Carpenter)
* pci/controller/qcom:
PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq()
PCI: qcom: Prevent potential error pointer dereference
PCI: qcom: Fix missing error code in qcom_pcie_probe()
PCI: qcom: Add OPP support to scale performance
PCI: Bring the PCIe speed to MBps logic to new pcie_dev_speed_mbps()
PCI: qcom: Add ICC bandwidth vote for CPU to PCIe path
PCI: qcom-ep: Disable resources unconditionally during PERST# assert
PCI: qcom-ep: Override NO_SNOOP attribute for SA8775P EP
PCI: qcom: Override NO_SNOOP attribute for SA8775P RC
PCI: epf-mhi: Enable HDMA for SA8775P SoC
PCI: qcom-ep: Add HDMA support for SA8775P SoC
PCI: dwc: Pass the eDMA mapping format flag directly from glue drivers
PCI: dwc: Skip finding eDMA channels count for HDMA platforms
PCI: dwc: Refactor dw_pcie_edma_find_chip() API
PCI: qcom-ep: Add support for SA8775P SOC
dt-bindings: PCI: qcom-ep: Add support for SA8775P SoC
PCI: qcom: Use devm_clk_bulk_get_all() API
|
|
- Warn about doing a Secondary Bus Reset without holding the device lock
(Dan Williams)
- Lock bridge in addition to downstream hierarchy before doing a Secondary
Bus Reset (Dan Williams)
* pci/reset:
PCI: Add missing bridge lock to pci_bus_lock()
PCI: Warn on missing cfg_access_lock during secondary bus reset
|
|
- If there's a device below a bridge, prevent a use-after-free by holding a
reference to the device while waiting for the secondary bus to be ready
in case the device is concurrently removed, e.g., by DPC (Lukas Wunner)
* pci/dpc:
PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
|
|
- Add pcim_add_mapping_to_legacy_table() and
pcim_remove_mapping_from_legacy_table() helper functions to simplify
devres iomap table (Philipp Stanner)
- Reimplement devres that take a bit mask of BARs in a way that can be used
to map partial BARs as well as entire BARs (Philipp Stanner)
- Deprecate pcim_iomap_table() and pcim_iomap_regions_request_all() in
favor of pcim_* request plus pcim_* mapping (Philipp Stanner)
- Add pcim_request_region(), a managed interface to request a single BAR
(Philipp Stanner)
- Use the existing pci_is_enabled() interface to replace the struct
devres.enabled bit (Philipp Stanner)
- Move the struct pci_devres.pinned bit to struct pci_dev (Philipp Stanner)
- Reimplement pcim_set_mwi() so it uses its own devres cleanup callback
instead of a special-purpose bit in struct pci_devres (Philipp Stanner)
- Add pcim_intx(), which is unambiguously managed, unlike pci_intx(), which
is managed if pcim_enable_device() has been called but unmanaged
otherwise (Philipp Stanner)
- Remove pcim_release(), which is no longer needed after previous cleanups
of pcim_set_mwi() and pci_intx() (Philipp Stanner)
- Add pcim_iomap_range(), a managed interface to map part of a BAR (Philipp
Stanner)
- Fix vboxvideo leak by using the new pcim_iomap_range() instead of the
unmanaged pci_iomap_range() (Philipp Stanner)
* pci/devres:
drm/vboxvideo: fix mapping leaks
PCI: Add managed pcim_iomap_range()
PCI: Remove legacy pcim_release()
PCI: Add managed pcim_intx()
PCI: Give pcim_set_mwi() its own devres cleanup callback
PCI: Move struct pci_devres.pinned bit to struct pci_dev
PCI: Remove struct pci_devres.enabled status bit
PCI: Document hybrid devres hazards
PCI: Add managed pcim_request_region()
PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()
PCI: Add managed partial-BAR request and map infrastructure
PCI: Add devres helpers for iomap table
PCI: Add and use devres helper for bit masks
|
|
PCIe ACS settings control the level of isolation and the possible P2P paths
between devices. With greater isolation the kernel will create smaller
iommu_groups and with less isolation there is more HW that can achieve P2P
transfers. From a virtualization perspective all devices in the same
iommu_group must be assigned to the same VM as they lack security
isolation.
There is no way for the kernel to automatically know the correct ACS
settings for any given system and workload. Existing command line options
(e.g., disable_acs_redir) allow only for large scale change, disabling all
isolation, but this is not sufficient for more complex cases.
Add a kernel command-line option 'config_acs' to directly control all the
ACS bits for specific devices, which allows the operator to setup the right
level of isolation to achieve the desired P2P configuration. The
definition is future proof; when new ACS bits are added to the spec the
open syntax can be extended.
ACS needs to be setup early in the kernel boot as the ACS settings affect
how iommu_groups are formed. iommu_group formation is a one time event
during initial device discovery, so changing ACS bits after kernel boot can
result in an inaccurate view of the iommu_groups compared to the current
isolation configuration.
ACS applies to PCIe Downstream Ports and multi-function devices. The
default ACS settings are strict and deny any direct traffic between two
functions. This results in the smallest iommu_group the HW can support.
Frequently these values result in slow or non-working P2PDMA.
ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:
- Full prevention
- Translated requests can be direct, with various options
- Asymmetric direct traffic, A can reach B but not the reverse
- All traffic can be direct
Along with some other less common ones for special topologies.
The intention is that this option would be used with expert knowledge of
the HW capability and workload to achieve the desired configuration.
Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
[bhelgaas: add example, tidy printk formats]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
One of the true positives that the cfg_access_lock lockdep effort
identified is this sequence:
WARNING: CPU: 14 PID: 1 at drivers/pci/pci.c:4886 pci_bridge_secondary_bus_reset+0x5d/0x70
RIP: 0010:pci_bridge_secondary_bus_reset+0x5d/0x70
Call Trace:
<TASK>
? __warn+0x8c/0x190
? pci_bridge_secondary_bus_reset+0x5d/0x70
? report_bug+0x1f8/0x200
? handle_bug+0x3c/0x70
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? pci_bridge_secondary_bus_reset+0x5d/0x70
pci_reset_bus+0x1d8/0x270
vmd_probe+0x778/0xa10
pci_device_probe+0x95/0x120
Where pci_reset_bus() users are triggering unlocked secondary bus resets.
Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses
pci_bus_lock() before issuing the reset which locks everything *but* the
bridge itself.
For the same motivation as adding:
bridge = pci_upstream_bridge(dev);
if (bridge)
pci_dev_lock(bridge);
to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add
pci_dev_lock() for @bus->self to pci_bus_lock().
Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com
Reported-by: Imre Deak <imre.deak@intel.com>
Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: squash in recursive locking deadlock fix from Keith Busch:
https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Kalle Valo <kvalo@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
|
|
pci_intx() is a "hybrid" function, i.e., it is managed if
pcim_enable_device() has been called, but unmanaged otherwise.
Add pcim_intx(), which is always managed, and implement pci_intx() using
it.
Remove the now-unused struct pci_devres.orig_intx and .restore_intx and
find_pci_dr().
Link: https://lore.kernel.org/r/20240613115032.29098-11-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
[kwilczynski: squashed in
https://lore.kernel.org/r/426645d40776198e0fcc942f4a6cac4433c7a9aa.camel@redhat.com
to fix problem reported and tested by Ashish Kalra <Ashish.Kalra@amd.com>:
https://lore.kernel.org/r/20240708214656.4721-1-Ashish.Kalra@amd.com
https://lore.kernel.org/r/8c4634e9-4f02-4c54-9c89-d75e2f4bf026@amd.com/]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The struct pci_devres has a separate boolean to track whether a device is
enabled. That, however, can easily be tracked in an agnostic manner through
the function pci_is_enabled().
Using it allows for simplifying the PCI devres implementation.
Replace the separate 'enabled' status bit from struct pci_devres with
calls to pci_is_enabled() at the appropriate places.
Link: https://lore.kernel.org/r/20240613115032.29098-8-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
These functions:
pci_request_region()
pci_request_regions()
pci_request_regions_exclusive()
pci_request_selected_regions()
pci_request_selected_regions_exclusive()
pci_intx()
are "hybrid" functions that are managed if pcim_enable_device() has been
called, but unmanaged otherwise.
This is confusing and has already caused a bug (in 8558de401b5f
("drm/vboxvideo: use managed pci functions")) because users believe all PCI
functions, such as pci_iomap_range(), can become managed that way, which is
not the case.
Add comments to the relevant functions' docstrings that warn users about
this behavior.
Link: https://lore.kernel.org/r/20240613115032.29098-7-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|