summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel/eeh.c
AgeCommit message (Collapse)Author
2017-09-26powerpc/powernv: Rework EEH initialization on powernvBenjamin Herrenschmidt
Remove the post_init callback which is only used by powernv, we can just call it explicitly from the powernv code. This partially kills the ability to "disable" eeh at runtime via debugfs as this was calling that same callback again, but this is both unused and broken in several ways. If we want to revive it, we need to create a dedicated enable/disable callback on the backend that does the right thing. Let the bulk of eeh initialize normally at core_initcall() like it does on pseries by removing the hack in eeh_init() that delays it. Instead we make sure our eeh->probe cleanly bails out of the PEs haven't been created yet and we force a re-probe where we used to call eeh_init() again. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-21powerpc/eeh: Create PHB PEs after EEH is initializedBenjamin Herrenschmidt
Otherwise we end up not yet having computed the right diag data size on powernv where EEH initialization is delayed, thus causing memory corruption later on when calling OPAL. Fixes: 5cb1f8fdddb7 ("powerpc/powernv/pci: Dynamically allocate PHB diag data") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/eeh: Reduce use of pci_dn::nodeAlexey Kardashevskiy
The pci_dn struct caches a OF device node pointer in order to access the "ibm,loc-code" property when EEH is recovering. However, when this happens in eeh_dev_check_failure(), we also have a pci_dev pointer which should have a valid pointer to the device node when pci_dn has one (both pointers are not NULL for physical functions and are NULL for virtual functions). This changes pci_remove_device_node_info() to look for a parent of the node being removed, just like pci_add_device_node_info() does when it references the parent node. This is the first step to get rid of pci_dn::node. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/eeh: Remove unnecessary pointer to phb from eeh_devAlexey Kardashevskiy
The eeh_dev struct already holds a pointer to pci_dn which it does not exist without and pci_dn itself holds the very same pointer so just use it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17powerpc/mm: Rename find_linux_pte_or_hugepte()Aneesh Kumar K.V
Add newer helpers to make the function usage simpler. It is always recommended to use find_current_mm_pte() for walking the page table. If we cannot use find_current_mm_pte(), it should be documented why the said usage of __find_linux_pte() is safe against a parallel THP split. For now we have KVM code using __find_linux_pte(). This is because kvm code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but with PACA soft_enabled = 1. We may want to fix that later and make sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that we can switch kvm to use find_linux_pte(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc: Create asm/debugfs.h and move powerpc_debugfs_root thereMichael Ellerman
powerpc_debugfs_root is the dentry representing the root of the "powerpc" directory tree in debugfs. Currently it sits in asm/debug.h, a long with some other things that have "debug" in the name, but are otherwise unrelated. Pull it out into a separate header, which also includes linux/debugfs.h, and convert all the users to include debugfs.h instead of debug.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-18powerpc/eeh: Enable IO path on permanent errorGavin Shan
We give up recovery on permanent error, simply shutdown the affected devices and remove them. If the devices can't be put into quiet state, they spew more traffic that is likely to cause another unexpected EEH error. This was observed on "p8dtu2u" machine: 0002:00:00.0 PCI bridge: IBM Device 03dc 0002:01:00.0 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.1 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.2 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.3 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) On P8 PowerNV platform, the IO path is frozen when shutdowning the devices, meaning the memory registers are inaccessible. It is why the devices can't be put into quiet state before removing them. This fixes the issue by enabling IO path prior to putting the devices into quiet state. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-22powerpc/eeh: Refactor EEH PE reset functionsRussell Currey
eeh_pe_reset and eeh_reset_pe are two different functions in the same file which do mostly the same thing. Not only is this confusing, but potentially causes disrepancies in functionality, notably eeh_reset_pe as it does not check return values for failure. Refactor this into the following: - eeh_pe_reset(): stays as is, performs a single operation, exported - eeh_pe_reset_full(): new, full reset process that calls eeh_pe_reset() - eeh_reset_pe(): removed and replaced by eeh_pe_reset_full() - eeh_reset_pe_once(): removed Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-22powerpc/pci: Always print PHB and PE numbers as hexadecimalRussell Currey
PHB, PE (and by association MVE) numbers are printed as a mix of decimal and hexadecimal throughout the kernel. This can be misleading, so make them all hexadecimal. Standardising on hex instead of dec because: - PHB numbers are presented in hex in sysfs/debugfs (and lspci, etc) - PE numbers are presented as hex in sysfs and parsed in hex in debugfs The only place I think this could cause confusing are the messages during boot, i.e. pci 000a:01 : [PE# 000] Secondary bus 1 associated with PE#0 which can be a quick way to check PE numbers. pe_level_printk() will only print two characters instead of three, so the above would be pci 000a:01 : [PE# 00] Secondary bus 1 associated with PE#0 which gives a hint it's in hex. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-10-04powerpc/eeh: Quieten EEH message when no adapters are foundAnton Blanchard
No real need for this to be pr_warn(), reduce it to pr_info(). Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-29powerpc/eeh: Export confirm_error_lockGavin Shan
This exports @confirm_error_lock so that eeh_serialize_{lock, unlock}() can be used to freeze the affected PE in PCI surprise hot remove path. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-29powerpc/eeh: Allow to freeze PE in eeh_pe_set_option()Gavin Shan
Function eeh_pe_set_option() is used to apply the requested options (enable, disable, unfreeze) in EEH virtualization path. The semantics of this function isn't complete until freezing is supported. This allows to freeze the indicated PE. The new semantics is going to be used in PCI surprise hot remove path, to freeze removed PCI devices (PE) to avoid unexpected EEH error reporting. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09powerpc/eeh: Switch to conventional PCI address output in EEH logGuilherme G. Piccoli
This is a very minor/trivial fix for the output of PCI address on EEH logs. The PCI address on "OF node" field currently is using ":" as a separator for the function, but the usual separator is ".". This patch changes the separator to dot, so the PCI address is printed as usual. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-12Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"Guilherme G. Piccoli
This reverts commit 89a51df5ab1d38b257300b8ac940bbac3bb0eb9b. The function eeh_add_device_early() is used to perform EEH initialization in devices added later on the system, like in hotplug/DLPAR scenarios. Since the commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell") a new check was introduced in this function - Cell has no EEH capabilities which led to kernel oops if hotplug was performed, so checking for eeh_enabled() was introduced to avoid the issue. However, in architectures that EEH is present like pSeries or PowerNV, we might reach a case in which no PCI devices are present on boot time and so EEH is not initialized. Then, if a device is added via DLPAR for example, eeh_add_device_early() fails because eeh_enabled() is false, and EEH end up not being enabled at all. This reverts the aforementioned patch since a new verification was introduced by the commit d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug") and so the original Cell issue does not happen anymore. Cc: stable@vger.kernel.org # v4.1+ Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-12powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner()Gavin Shan
The label "reset" in eeh_pe_change_owner() is used only for once. No need to keep it and just drop it. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-11powerpc/eeh: rename EEH from "extended" to "enhanced" error handlingRussell Currey
IBM online documentation for EEH uses "extended error handling" and "enhanced error handling" to refer to the same thing, in different places. The only place mentioning it as "enhanced error handling" in the kernel is the MAINTAINERS file, and it's "extended" in some documentation. IBM originally defined EEH as "enhanced error handling", so standardise all mentions of EEH to use that term. Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: eeh_pci_enable(): fix checking of post-request stateAndrew Donnellan
In eeh_pci_enable(), after making the request to set the new options, we call eeh_ops->wait_state() to check that the request finished successfully. At the moment, if eeh_ops->wait_state() returns 0, we return 0 without checking that it reflects the expected outcome. This can lead to callers further up the chain incorrectly assuming the slot has been successfully unfrozen and continuing to attempt recovery. On powernv, this will occur if pnv_eeh_get_pe_state() or pnv_eeh_get_phb_state() return 0, which in turn occurs if the relevant OPAL call returns OPAL_EEH_STOPPED_MMIO_DMA_FREEZE or OPAL_EEH_PHB_ERROR respectively. On pseries, this will occur if pseries_eeh_get_state() returns 0, which in turn occurs if RTAS reports that the PE is in the MMIO Stopped and DMA Stopped states. Obviously, none of these cases represent a successful completion of a request to thaw MMIO or DMA. Fix the check so that a wait_state() return value of 0 won't be considered successful for the EEH_OPT_THAW_MMIO or EEH_OPT_THAW_DMA cases. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Remove duplicated check in eeh_dump_pe_log()Gavin Shan
When eeh_dump_pe_log() is only called by eeh_slot_error_detail(), we already have the check that the PE isn't in PCI config blocked state in eeh_slot_error_detail(). So we needn't the duplicated check in eeh_dump_pe_log(). This removes the duplicated check in eeh_dump_pe_log(). No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Synchronize recovery in host/guestGavin Shan
When passing through SRIOV VFs to guest, we possibly encounter EEH error on PF. In this case, the VF PEs are put into frozen state. The error could be reported to guest before it's captured by the host. That means the guest could attempt to recover errors on VFs before host gets chance to recover errors on PFs. The VFs won't be recovered successfully. This enforces the recovery order for above case: the recovery on child PE in guest is hold until the recovery on parent PE in host is completed. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: powerpc/eeh: Support error recovery for VF PEWei Yang
PFs are enumerated on PCI bus, while VFs are created by PF's driver. In EEH recovery, it has two cases: 1. Device and driver is EEH aware, error handlers are called. 2. Device and driver is not EEH aware, un-plug the device and plug it again by enumerating it. The special thing happens on the second case. For a PF, we could use the original pci core to enumerate the bus, while for VF we need to record the VFs which aer un-plugged then plug it again. Also The patch caches the VF index in pci_dn, which can be used to calculate VF's bus, device and function number. Those information helps to locate the VF's PCI device instance when doing hotplug during EEH recovery if necessary. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/powernv: Support EEH reset for VF PEWei Yang
PEs for VFs don't have primary bus. So they have to have their own reset backend, which is used during EEH recovery. The patch implements the reset backend for VF's PE by issuing FLR or AF FLR to the VFs, which are contained in the PE. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-08powerpc/eeh: fix incorrect function name in commentAndrew Donnellan
The comment block above pcibios_set_pcie_reset_state() incorrectly refers to pcibios_set_pcie_slot_reset(). Fix the comment accordingly. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21powerpc/eeh: More relaxed condition for enabled IO pathGavin Shan
When one or both of the below two flags are marked in the PE state, the PE's IO path is regarded as enabled: EEH_STATE_MMIO_ACTIVE or EEH_STATE_MMIO_ENABLED. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15powerpc/eeh: atomic_dec_if_positive() to update passthru countGavin Shan
No need to have two atomic opertions (update and fetch/check) when decreasing PE's number of passed devices as one atomic operation is enough. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-12powerpc/mm: Differentiate between hugetlb and THP during page walkAneesh Kumar K.V
We need to properly identify whether a hugepage is an explicit or a transparent hugepage in follow_huge_addr(). We used to depend on hugepage shift argument to do that. But in some case that can result in wrong results. For ex: On finding a transparent hugepage we set hugepage shift to PMD_SHIFT. But we can end up clearing the thp pte, via pmdp_huge_get_and_clear. We do prevent reusing the pfn page via the usage of kick_all_cpus_sync(). But that happens after we updated the pte to 0. Hence in follow_huge_addr() we can find hugepage shift set, but transparent huge page check fail for a thp pte. NOTE: We fixed a variant of this race against thp split in commit 691e95fd7396905a38d98919e9c150dbc3ea21a3 ("powerpc/mm/thp: Make page table walk safe against thp split/collapse") Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in follow_page_mask occasionally. In the long term, we may want to switch ppc64 64k page size config to enable CONFIG_ARCH_WANT_GENERAL_HUGETLB Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-28powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail()Gavin Shan
The config space of some PCI devices can't be accessed when their PEs are in frozen state. Otherwise, fenced PHB might be seen. Those PEs are identified with flag EEH_PE_CFG_RESTRICTED, meaing EEH_PE_CFG_BLOCKED is set automatically when the PE is put to frozen state (EEH_PE_ISOLATED). eeh_slot_error_detail() restores PCI device BARs with eeh_pe_restore_bars(), which then calls eeh_ops->restore_config() to reinitialize the PCI device in (OPAL) firmware. eeh_ops->restore_config() produces PCI config access that causes fenced PHB. The problem was reported on below adapter: 0001:01:00.0 0200: 14e4:168e (rev 10) 0001:01:00.0 Ethernet controller: Broadcom Corporation \ NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10) This fixes the issue by skipping eeh_pe_restore_bars() in eeh_slot_error_detail() when EEH_PE_CFG_BLOCKED is set for the PE. Fixes: b6541db1 ("powerpc/eeh: Block PCI config access upon frozen PE") Cc: stable@vger.kernel.org # v4.0+ Reported-by: Manvanthara B. Puttashankar <mputtash@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-18powerpc/eeh: Disable automatically blocked PCI configGavin Shan
pcibios_set_pcie_reset_state() could be called to complete reset request when passing through PCI device, flag EEH_PE_ISOLATED is set before saving the PCI config sapce. On some Broadcom adapters, EEH_PE_CFG_BLOCKED is automatically set when the flag EEH_PE_ISOLATED is marked. It caused bogus data saved from the PCI config space, which will be restored to the PCI adapter after the reset. Eventually, the hardware can't work with corrupted data in PCI config space. The patch fixes the issue with eeh_pe_state_mark_no_cfg(), which doesn't set EEH_PE_CFG_BLOCKED when seeing EEH_PE_ISOLATED on the PE, in order to avoid the bogus data saved and restored to the PCI config space. Reported-by: Rajanikanth H. Adaveeshaiah <rajanikanth.ha@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-14powerpc/eeh: Probe after unbalanced kref checkDaniel Axtens
In the complete hotplug case, EEH PEs are supposed to be released and set to NULL. Normally, this is done by eeh_remove_device(), which is called from pcibios_release_device(). However, if something is holding a kref to the device, it will not be released, and the PE will remain. eeh_add_device_late() has a check for this which will explictly destroy the PE in this case. This check in eeh_add_device_late() occurs after a call to eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(), which will exit without probing if there is an existing PE. This means that on PowerNV, devices with outstanding krefs will not be rediscovered by EEH correctly after a complete hotplug. This is affecting CXL (CAPI) devices in the field. Put the probe after the kref check so that the PE is destroyed and affected devices are correctly rediscovered by EEH. Fixes: d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug") Cc: stable@vger.kernel.org Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11powerpc/eeh/ioda2: Use device::iommu_group to check IOMMU groupAlexey Kardashevskiy
This relies on the fact that a PCI device always has an IOMMU table which may not be the case when we get dynamic DMA windows so let's use more reliable check for IOMMU group here. As we do not rely on the table presence here, remove the workaround from pnv_pci_ioda2_set_bypass(); also remove the @add_to_iommu_group parameter from pnv_ioda_setup_bus_dma(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-07powerpc/eeh: Fix trivial error in eeh_restore_dev_state()David Gibson
Commit 28158cd "powerpc/eeh: Enhance pcibios_set_pcie_reset_state()" introduced a fix for a problem where certain configurations could lead to pci_reset_function() destroying the state of PCI devices other than the one specified. Unfortunately, the fix has a trivial bug - it calls pci_save_state() again, when it should be calling pci_restore_state(). This corrects the problem. Cc: Gavin Shan <gwshan@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13powerpc/eeh: remove unused macro IS_BRIDGEWei Yang
Currently, the macro IS_BRIDGE is not used any where. This patch just removes it. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-12powerpc/eeh: Introduce eeh_pe_inject_err()Gavin Shan
The patch defines PCI error types and functions in uapi/asm/eeh.h and exports function eeh_pe_inject_err(), which will be called by VFIO driver to inject the specified PCI error to the indicated PE for testing purpose. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01powerpc/eeh: Delay probing EEH device during hotplugGavin Shan
Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH devices in early stage, which is reasonable to pSeries platform. However, it's wrong for PowerNV platform because the PE# isn't determined until the resources (IO and MMIO) are assigned to PE in hotplug case. So we have to delay probing EEH devices for PowerNV platform until the PE# is assigned. Fixes: ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()Gavin Shan
When asserting reset in pcibios_set_pcie_reset_state(), the PE is enforced to (hardware) frozen state in order to drop unexpected PCI transactions (except PCI config read/write) automatically by hardware during reset, which would cause recursive EEH error. However, the (software) frozen state EEH_PE_ISOLATED is missed. When users get 0xFF from PCI config or MMIO read, EEH_PE_ISOLATED is set in PE state retrival backend. Unfortunately, nobody (the reset handler or the EEH recovery functinality in host) will clear EEH_PE_ISOLATED when the PE has been passed through to guest. The patch sets and clears EEH_PE_ISOLATED properly during reset in function pcibios_set_pcie_reset_state() to fix the issue. Fixes: 28158cd ("Enhance pcibios_set_pcie_reset_state()") Reported-by: Carol L. Soto <clsoto@us.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Carol L. Soto <clsoto@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-17powerpc/mm/thp: Make page table walk safe against thp split/collapseAneesh Kumar K.V
We can disable a THP split or a hugepage collapse by disabling irq. We do send IPI to all the cpus in the early part of split/collapse, and disabling local irq ensure we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use returned pte_t pointer outside the irq disabled region. W.r.t to THP split, the pfn remains the same, but then a hugepage collapse will result in a pfn change. There are few steps we can take to avoid a hugepage collapse.One way is to take page reference inside the irq disable region. Other option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking pmd_lock. Another method used by kvm subsystem is to check whether we had a mmu_notifer update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-14powerpc/eeh: Fix crash in eeh_add_device_early() on CellMichael Ellerman
The recent change to the EEH probing causes a crash on Cell because eeh_ops is NULL. Check if EEH is enabled and if not bail out. Fixes: ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-24powerpc/eeh: Remove device_node dependencyGavin Shan
The patch removes struct eeh_dev::dn and the corresponding helper functions: eeh_dev_to_of_node() and of_node_to_eeh_dev(). Instead, eeh_dev_to_pdn() and pdn_to_eeh_dev() should be used to get the pdn, which might contain device_node on PowerNV platform. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24powerpc/eeh: Replace device_node with pci_dn in eeh_opsGavin Shan
There are 3 EEH operations whose arguments contain device_node: read_config(), write_config() and restore_config(). The patch replaces device_node with pci_dn. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24powerpc/eeh: Do probe on pci_dnGavin Shan
Originally, EEH core probes on device_node or pci_dev to populate EEH devices and PEs, which conflicts with the fact: SRIOV VFs are usually enabled and created by PF's driver and they don't have the corresponding device_nodes. Instead, SRIOV VFs have dynamically created pci_dn, which can be used for EEH probe. The patch reworks EEH probe for PowerNV and pSeries platforms to do probing based on pci_dn, instead of pci_dev or device_node any more. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17powerpc/eeh: Enhance pcibios_set_pcie_reset_state()Gavin Shan
Function pcibios_set_pcie_reset_state() is possibly called by pci_reset_function(), on which VFIO infrastructure depends to issue reset. pcibios_set_pcie_reset_state() is issuing reset on the parent PE of the indicated PCI device. The reset causes state lost on all PCI devices except the indicated one as the argument to pcibios_set_pcie_reset_state(). Also, sideband MMIO access from guest when issuing reset would cause unexpected EEH error. For above two issues, the patch applies following enhancements to pcibios_set_pcie_reset_state(): * For all PCI devices except the indicated one, save their state prior to reset and restore state after that. * Explicitly freeze PE prior to reset and unfreeze it after that, in order to avoid unexpected EEH error. Tested-by: Priya M. A <priyama2@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-01-23powerpc/eeh: Allow to set maximal frozen timesGavin Shan
When PE's frozen count hits maximal allowed frozen times, which is 5 currently, it will be forced to be offline permanently. Once the PE is removed permanently, rebooting machine is required to bring the PE back. It's not convienent when testing EEH functionality. The patch exports the maximal allowed frozen times through debugfs entry (/sys/kernel/debug/powerpc/eeh_max_freezes). Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-02powerpc: Drop useless warning in eeh_init()Greg Kurz
This is what we get in dmesg when booting a pseries guest and the hypervisor doesn't provide EEH support. [ 0.166655] EEH functionality not supported [ 0.166778] eeh_init: Failed to call platform init function (-22) Since both powernv_eeh_init() and pseries_eeh_init() already complain when hitting an error, it is not needed to print more (especially such an uninformative message). Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/eeh: Dump PHB diag-data earlyGavin Shan
On PowerNV platform, PHB diag-data is dumped after stopping device drivers. In case of recursive EEH errors, the kernel is usually crashed before dumping PHB diag-data for the second EEH error. It's hard to locate the root cause of the second EEH error without PHB diag-data. The patch adds one more EEH option "eeh=early_log", which helps dumping PHB diag-data immediately once frozen PE is detected, in order to get the PHB diag-data for the second EEH error. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/eeh: Recover EEH error on ownership change for BCM5719Gavin Shan
In PCI passthrou scenario, we need simulate EEH recovery for Emulex adapters when their ownership changes, as we did in commit 5cfb20b96 ("powerpc/eeh: Emulate EEH recovery for VFIO devices"). Broadcom BCM5719 adpaters are facing same problem and needs same cure. Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/eeh: Set EEH_PE_RESET on PE resetGavin Shan
The patch introduces additional flag EEH_PE_RESET to indicate the corresponding PE is under reset. In turn, the PE retrieval bakcend on PowerNV platform can return unfrozen state for the EEH core to moving forward. Flag EEH_PE_CFG_BLOCKED isn't the correct one for the purpose. In PCI passthrou case, the problem is more worse: Guest doesn't recover 6th EEH error. The PE is left in isolated (frozen) and config blocked state on Broadcom adapters. We can't retrieve the PE's state correctly any more, even from the host side via sysfs /sys/bus/pci/devices/xxx/eeh_pe_state. Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/eeh: Refactor eeh_reset_pe()Gavin Shan
The patch refactors eeh_reset_pe() in order for: * Varied return values for different failure cases. * Replace pr_err() with pr_warn() and print function name. * Coding style cleanup. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-10-15powerpc/eeh: Don't collect logs on PE with blocked config spaceGavin Shan
When the PE's config space is marked as blocked, PCI config read requests always return 0xFF's. It's pointless to collect logs in this case. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-15powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKEDGavin Shan
The flag EEH_PE_RESET indicates blocking config space of the PE during reset time. We potentially need block PE's config space other than reset time. So it's reasonable to replace it with EEH_PE_CFG_BLOCKED to indicate its usage. There are no substantial code or logic changes in this patch. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Dump PCI config space for all child devicesGavin Shan
The PEs can be organized as nested. Current implementation doesn't dump PCI config space for subordinate devices of child PEs. However, the frozen PE could be caused by those subordinate devices of its child PEs. The patch dumps PCI config space for all subordinate devices of the problematic PE. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Emulate EEH recovery for VFIO devicesGavin Shan
When enabling EEH functionality on passed through devices (PE) with VFIO, the devices in the PE would be removed permanently from guest side. In that case, the PE remains frozen state. When returning PE to host, or restarting the guest again, we had mechanism unfreezing the PE by clearing PESTA/B frozen bits. However, that's not enough for some adapters, which are indicated as following "lspci" shows. Those adapters require hot reset on the parent bus to bring their firmware back to workable state. Otherwise, those adaptrs won't be operative and the host (for returning case) or the guest will fail to load the drivers for those adapters without exception. 0000:01:00.0 Ethernet controller: Emulex Corporation OneConnect \ 10Gb NIC (be3) (rev 02) 0000:01:00.0 0200: 19a2:0710 (rev 02) 0001:03:00.0 Ethernet controller: Emulex Corporation OneConnect \ NIC (Lancer) (rev 10) 0001:03:00.0 0200: 10df:e220 (rev 10) The patch adds mechanism to emulate EEH recovery (for hot reset on parent PCI bus) on 3 gates to fix the issue: open/release one adapter of the PE, enable EEH functionality on one adapter of the PE. Reported-by: Murilo Fossa Vicentini <muvic@br.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>