diff options
Diffstat (limited to 'Documentation/PCI')
-rw-r--r-- | Documentation/PCI/endpoint/pci-vntb-howto.rst | 9 | ||||
-rw-r--r-- | Documentation/PCI/pci-error-recovery.rst | 43 | ||||
-rw-r--r-- | Documentation/PCI/pcieaer-howto.rst | 85 |
3 files changed, 81 insertions, 56 deletions
diff --git a/Documentation/PCI/endpoint/pci-vntb-howto.rst b/Documentation/PCI/endpoint/pci-vntb-howto.rst index 70d3bc90893f..9a7a2f0a6849 100644 --- a/Documentation/PCI/endpoint/pci-vntb-howto.rst +++ b/Documentation/PCI/endpoint/pci-vntb-howto.rst @@ -90,8 +90,9 @@ of the function device and is populated with the following NTB specific attributes that can be configured by the user:: # ls functions/pci_epf_vntb/func1/pci_epf_vntb.0/ - db_count mw1 mw2 mw3 mw4 num_mws - spad_count + ctrl_bar db_count mw1_bar mw2_bar mw3_bar mw4_bar spad_count + db_bar mw1 mw2 mw3 mw4 num_mws vbus_number + vntb_vid vntb_pid A sample configuration for NTB function is given below:: @@ -100,6 +101,10 @@ A sample configuration for NTB function is given below:: # echo 1 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws # echo 0x100000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1 +By default, each construct is assigned a BAR, as needed and in order. +Should a specific BAR setup be required by the platform, BAR may be assigned +to each construct using the related ``XYZ_bar`` entry. + A sample configuration for virtual NTB driver for virtual PCI bus:: # echo 0x1957 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst index 42e1e78353f3..5df481ac6193 100644 --- a/Documentation/PCI/pci-error-recovery.rst +++ b/Documentation/PCI/pci-error-recovery.rst @@ -13,7 +13,7 @@ PCI Error Recovery Many PCI bus controllers are able to detect a variety of hardware PCI errors on the bus, such as parity errors on the data and address buses, as well as SERR and PERR errors. Some of the more advanced -chipsets are able to deal with these errors; these include PCI-E chipsets, +chipsets are able to deal with these errors; these include PCIe chipsets, and the PCI-host bridges found on IBM Power4, Power5 and Power6-based pSeries boxes. A typical action taken is to disconnect the affected device, halting all I/O to it. The goal of a disconnection is to avoid system @@ -108,8 +108,8 @@ A driver does not have to implement all of these callbacks; however, if it implements any, it must implement error_detected(). If a callback is not implemented, the corresponding feature is considered unsupported. For example, if mmio_enabled() and resume() aren't there, then it -is assumed that the driver is not doing any direct recovery and requires -a slot reset. Typically a driver will want to know about +is assumed that the driver does not need these callbacks +for recovery. Typically a driver will want to know about a slot_reset(). The actual steps taken by a platform to recover from a PCI error @@ -122,6 +122,10 @@ A PCI bus error is detected by the PCI hardware. On powerpc, the slot is isolated, in that all I/O is blocked: all reads return 0xffffffff, all writes are ignored. +Similarly, on platforms supporting Downstream Port Containment +(PCIe r7.0 sec 6.2.11), the link to the sub-hierarchy with the +faulting device is disabled. Any device in the sub-hierarchy +becomes inaccessible. STEP 1: Notification -------------------- @@ -141,6 +145,9 @@ shouldn't do any new IOs. Called in task context. This is sort of a All drivers participating in this system must implement this call. The driver must return one of the following result codes: + - PCI_ERS_RESULT_RECOVERED + Driver returns this if it thinks the device is usable despite + the error and does not need further intervention. - PCI_ERS_RESULT_CAN_RECOVER Driver returns this if it thinks it might be able to recover the HW by just banging IOs or if it wants to be given @@ -199,7 +206,25 @@ reset or some such, but not restart operations. This callback is made if all drivers on a segment agree that they can try to recover and if no automatic link reset was performed by the HW. If the platform can't just re-enable IOs without a slot reset or a link reset, it will not call this callback, and -instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset) +instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset). + +.. note:: + + On platforms supporting Advanced Error Reporting (PCIe r7.0 sec 6.2), + the faulting device may already be accessible in STEP 1 (Notification). + Drivers should nevertheless defer accesses to STEP 2 (MMIO Enabled) + to be compatible with EEH on powerpc and with s390 (where devices are + inaccessible until STEP 2). + + On platforms supporting Downstream Port Containment, the link to the + sub-hierarchy with the faulting device is re-enabled in STEP 3 (Link + Reset). Hence devices in the sub-hierarchy are inaccessible until + STEP 4 (Slot Reset). + + For errors such as Surprise Down (PCIe r7.0 sec 6.2.7), the device + may not even be accessible in STEP 4 (Slot Reset). Drivers can detect + accessibility by checking whether reads from the device return all 1's + (PCI_POSSIBLE_ERROR()). .. note:: @@ -234,14 +259,14 @@ The driver should return one of the following result codes: The next step taken depends on the results returned by the drivers. If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform -proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations). +proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations). If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform proceeds to STEP 4 (Slot Reset) STEP 3: Link Reset ------------------ -The platform resets the link. This is a PCI-Express specific step +The platform resets the link. This is a PCIe specific step and is done whenever a fatal error has been detected that can be "solved" by resetting the link. @@ -263,13 +288,13 @@ that is equivalent to what it would be after a fresh system power-on followed by power-on BIOS/system firmware initialization. Soft reset is also known as hot-reset. -Powerpc fundamental reset is supported by PCI Express cards only +Powerpc fundamental reset is supported by PCIe cards only and results in device's state machines, hardware logic, port states and configuration registers to initialize to their default conditions. For most PCI devices, a soft reset will be sufficient for recovery. Optional fundamental reset is provided to support a limited number -of PCI Express devices for which a soft reset is not sufficient +of PCIe devices for which a soft reset is not sufficient for recovery. If the platform supports PCI hotplug, then the reset might be @@ -313,7 +338,7 @@ Result codes: - PCI_ERS_RESULT_DISCONNECT Same as above. -Drivers for PCI Express cards that require a fundamental reset must +Drivers for PCIe cards that require a fundamental reset must set the needs_freset bit in the pci_dev structure in their probe function. For example, the QLogic qla2xxx driver sets the needs_freset bit for certain PCI card types:: diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst index 4b71e2f43ca7..3210c4792978 100644 --- a/Documentation/PCI/pcieaer-howto.rst +++ b/Documentation/PCI/pcieaer-howto.rst @@ -70,16 +70,16 @@ AER error output ---------------- When a PCIe AER error is captured, an error message will be output to -console. If it's a correctable error, it is output as an info message. +console. If it's a correctable error, it is output as a warning message. Otherwise, it is printed as an error. So users could choose different log level to filter out correctable error messages. Below shows an example:: - 0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID) + 0000:50:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Requester ID) 0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000 - 0000:50:00.0: [20] Unsupported Request (First) - 0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100 + 0000:50:00.0: [20] UnsupReq (First) + 0000:50:00.0: TLP Header: 0x04000001 0x00200a03 0x05010000 0x00050100 In the example, 'Requester ID' means the ID of the device that sent the error message to the Root Port. Please refer to PCIe specs for other @@ -138,7 +138,7 @@ error message to the Root Port above it when it captures an error. The Root Port, upon receiving an error reporting message, internally processes and logs the error message in its AER Capability structure. Error information being logged includes storing -the error reporting agent's requestor ID into the Error Source +the error reporting agent's Requester ID into the Error Source Identification Registers and setting the error bits of the Root Error Status Register accordingly. If AER error reporting is enabled in the Root Error Command Register, the Root Port generates an interrupt when an @@ -152,18 +152,6 @@ the device driver. Provide callbacks ----------------- -callback reset_link to reset PCIe link -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -This callback is used to reset the PCIe physical link when a -fatal error happens. The Root Port AER service driver provides a -default reset_link function, but different Upstream Ports might -have different specifications to reset the PCIe link, so -Upstream Port drivers may provide their own reset_link functions. - -Section 3.2.2.2 provides more detailed info on when to call -reset_link. - PCI error-recovery callbacks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -174,8 +162,8 @@ when performing error recovery actions. Data struct pci_driver has a pointer, err_handler, to point to pci_error_handlers who consists of a couple of callback function pointers. The AER driver follows the rules defined in -pci-error-recovery.rst except PCIe-specific parts (e.g. -reset_link). Please refer to pci-error-recovery.rst for detailed +pci-error-recovery.rst except PCIe-specific parts (see +below). Please refer to pci-error-recovery.rst for detailed definitions of the callbacks. The sections below specify when to call the error callback functions. @@ -189,10 +177,21 @@ software intervention or any loss of data. These errors do not require any recovery actions. The AER driver clears the device's correctable error status register accordingly and logs these errors. -Non-correctable (non-fatal and fatal) errors -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Uncorrectable (non-fatal and fatal) errors +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -If an error message indicates a non-fatal error, performing link reset +The AER driver performs a Secondary Bus Reset to recover from +uncorrectable errors. The reset is applied at the port above +the originating device: If the originating device is an Endpoint, +only the Endpoint is reset. If on the other hand the originating +device has subordinate devices, those are all affected by the +reset as well. + +If the originating device is a Root Complex Integrated Endpoint, +there's no port above where a Secondary Bus Reset could be applied. +In this case, the AER driver instead applies a Function Level Reset. + +If an error message indicates a non-fatal error, performing a reset at upstream is not required. The AER driver calls error_detected(dev, pci_channel_io_normal) to all drivers associated within a hierarchy in question. For example:: @@ -204,38 +203,34 @@ Downstream Port B and Endpoint. A driver may return PCI_ERS_RESULT_CAN_RECOVER, PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on -whether it can recover or the AER driver calls mmio_enabled as next. +whether it can recover without a reset, considers the device unrecoverable +or needs a reset for recovery. If all affected drivers agree that they can +recover without a reset, it is skipped. Should one driver request a reset, +it overrides all other drivers. If an error message indicates a fatal error, kernel will broadcast error_detected(dev, pci_channel_io_frozen) to all drivers within -a hierarchy in question. Then, performing link reset at upstream is -necessary. As different kinds of devices might use different approaches -to reset link, AER port service driver is required to provide the -function to reset link via callback parameter of pcie_do_recovery() -function. If reset_link is not NULL, recovery function will use it -to reset the link. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER -and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes -to mmio_enabled. - -Frequent Asked Questions ------------------------- +a hierarchy in question. Then, performing a reset at upstream is +necessary. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER +to indicate that recovery without a reset is possible, the error +handling goes to mmio_enabled, but afterwards a reset is still +performed. -Q: - What happens if a PCIe device driver does not provide an - error recovery handler (pci_driver->err_handler is equal to NULL)? +In other words, for non-fatal errors, drivers may opt in to a reset. +But for fatal errors, they cannot opt out of a reset, based on the +assumption that the link is unreliable. -A: - The devices attached with the driver won't be recovered. If the - error is fatal, kernel will print out warning messages. Please refer - to section 3 for more information. +Frequently Asked Questions +-------------------------- Q: - What happens if an upstream port service driver does not provide - callback reset_link? + What happens if a PCIe device driver does not provide an + error recovery handler (pci_driver->err_handler is equal to NULL)? A: - Fatal error recovery will fail if the errors are reported by the - upstream ports who are attached by the service driver. + The devices attached with the driver won't be recovered. + The kernel will print out informational messages to identify + unrecoverable devices. Software error injection |