summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)Author
2023-03-15qcom: llcc/edac: Fix the base address used for accessing LLCC banksManivannan Sadhasivam
The Qualcomm LLCC/EDAC drivers were using a fixed register stride for accessing the (Control and Status Registers) CSRs of each LLCC bank. This stride only works for some SoCs like SDM845 for which driver support was initially added. But the later SoCs use different register stride that vary between the banks with holes in-between. So it is not possible to use a single register stride for accessing the CSRs of each bank. By doing so could result in a crash. For fixing this issue, let's obtain the base address of each LLCC bank from devicetree and get rid of the fixed stride. This also means, there is no need to rely on reg-names property and the base addresses can be obtained using the index. First index is LLCC bank 0 and last index is LLCC broadcast. If the SoC supports more than one bank, then those need to be defined in devicetree for index from 1..N-1. Reported-by: Parikshit Pareek <quic_ppareek@quicinc.com> Tested-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230314080443.64635-13-manivannan.sadhasivam@linaro.org
2023-03-13EDAC/skx: Fix overflows on the DRAM row address mapping arraysQiuxu Zhuo
The current DRAM row address mapping arrays skx_{open,close}_row[] only support ranks with sizes up to 16G. Decoding a rank address to a DRAM row address for a 32G rank by using either one of the above arrays by the skx_edac driver, will result in an overflow on the array. For a 32G rank, the most significant DRAM row address bit (the bit17) is mapped from the bit34 of the rank address. Add this new mapping item to both arrays to fix the overflow issue. Fixes: 4ec656bdf43a ("EDAC, skx_edac: Add EDAC driver for Skylake") Reported-by: Feng Xu <feng.f.xu@intel.com> Tested-by: Feng Xu <feng.f.xu@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230211011728.71764-1-qiuxu.zhuo@intel.com
2023-02-21Merge tag 'edac_updates_for_v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Add a driver for the RAS functionality on Xilinx's on chip memory controller - Add support for decoding errors from the first and second level memory on SKL-based hardware - Add support for the memory controllers in Intel Granite Rapids and Emerald Rapids machines - First round of amd64_edac driver simplification and removal of unneeded functionality - The usual cleanups and fixes * tag 'edac_updates_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positive EDAC/amd64: Remove early_channel_count() EDAC/amd64: Remove PCI Function 0 EDAC/amd64: Remove PCI Function 6 EDAC/amd64: Remove scrub rate control for Family 17h and later EDAC/amd64: Don't set up EDAC PCI control on Family 17h+ EDAC/i10nm: Add driver decoder for Sapphire Rapids server EDAC/i10nm: Add Intel Granite Rapids server support EDAC/i10nm: Make more configurations CPU model specific EDAC/i10nm: Add Intel Emerald Rapids server support EDAC/skx_common: Delete duplicated and unreachable code EDAC/skx_common: Enable EDAC support for the "near" memory EDAC/qcom: Add platform_device_id table for module autoloading EDAC/zynqmp: Add EDAC support for Xilinx ZynqMP OCM dt-bindings: edac: Add bindings for Xilinx ZynqMP OCM
2023-02-21Merge tag 'ras_core_for_v6.3_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Add support for reporting more bits of the physical address on error, on newer AMD CPUs - Mask out bits which don't belong to the address of the error being reported * tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Mask out non-address bits from machine check bank x86/mce: Add support for Extended Physical Address MCA changes x86/mce: Define a function to extract ErrorAddr from MCA_ADDR
2023-02-14EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positiveYazen Ghannam
Reportedly, clang cannot do interprocedural analysis: https://lore.kernel.org/r/20230213-amd64_edac-wsometimes-uninitialized-v1-1-5bde32b89e02@kernel.org and see that those arguments won't be used uninitialized. So, yeah, the code's fine even without this. Normally, such a "fix" won't be applied but that warning gets automatically enabled in -Wall builds and when CONFIG_WERROR is set in allmodconfig builds, the build fails. So shut it up with a minimal fix as this code will see more reorganization very soon. [ bp: Write commit message. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/Y%2BqdVHidnrrKvxiD@dev-arch.thelio-3990X
2023-02-09EDAC/amd64: Remove early_channel_count()Yazen Ghannam
The early_channel_count() function seems to have been useful in the past for knowing how many EDAC mci structures to populate. However, this is no longer needed as the maximum channel count for a system is used instead. Remove the early_channel_count() helper functions and related code. Use the size of the channel layer when iterating over channel structures. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-6-yazen.ghannam@amd.com
2023-02-09EDAC/amd64: Remove PCI Function 0Yazen Ghannam
PCI Function 0 is used on Family 17h and later only to read the "dhar" value. This value is printed and provided through a module-specific debug sysfs file. The value is not used for any Family 17h and later code, and it does not have any apparent debug value on these systems. Remove "dhar", Function 0 PCI IDs, and all related code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-5-yazen.ghannam@amd.com
2023-02-09EDAC/amd64: Remove PCI Function 6Yazen Ghannam
PCI Function 6 is used on Family 17h and later to access scrub registers. With scrub access removed, this function has no other use. Remove all Function 6 PCI IDs and related code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-4-yazen.ghannam@amd.com
2023-02-09EDAC/amd64: Remove scrub rate control for Family 17h and laterYazen Ghannam
The scrub registers on AMD Family 17h and later may be inaccessible to the OS. Furthermore, hardware designers recommend that the scrubbing feature is managed by the firmware. Remove support for the sdram_scrub_rate interface for AMD Family 17h systems and later by not setting the scrub function pointers. The EDAC MC core will then not expose the scrub files in sysfs. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-3-yazen.ghannam@amd.com
2023-02-09EDAC/amd64: Don't set up EDAC PCI control on Family 17h+Yazen Ghannam
EDAC PCI control is used to detect/report legacy PCI errors like "Parity" and "SERROR". Modern AMD systems use PCIe Advanced Error Reporting (AER), and legacy PCI errors should not be reported. Remove EDAC PCI control setup on AMD Family 17h and later systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-2-yazen.ghannam@amd.com
2023-02-08EDAC/i10nm: Add driver decoder for Sapphire Rapids serverYouquan Song
Intel SDM (December 2022) vol3B 17.13.2 contains IMC MC error codes for Sapphire Rapids. Current i10nm_edac only supports firmware decoder (ACPI DSM methods) for Sapphire Rapids. So add the driver decoder (decoding DDR memory errors via extracting error information from the IMC MC error codes) for Sapphire Rapids for better decoding performance. Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-01-25EDAC/i10nm: Add Intel Granite Rapids server supportQiuxu Zhuo
The Granite Rapids CPU model uses similar memory controller registers as Sapphire Rapids server but with some different configurations: - Various memory controller numbers for different Granite Rapids CPUs. So detect the number of present memory controllers at run time. - Different MMIO offsets of memory controllers. - Different triples of bus/dev/fun of some PCI devices used in i10nm_edac. Add above configurations and Granite Rapids CPU model ID for EDAC support. [Tony: Fixed 2 typos s/strcture/structure/] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-25EDAC/i10nm: Make more configurations CPU model specificQiuxu Zhuo
The numbers of memory controllers per socket, channels per memory controller, DIMMs per channel and the triples of bus/device/function of PCI devices used in i10nm_edac can be CPU model specific. Add new fields to the structure res_config for above numbers and triples to make them CPU model specific. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-25EDAC/i10nm: Add Intel Emerald Rapids server supportQiuxu Zhuo
The Emerald Rapids CPU model uses similar memory controller registers as Sapphire Rapids server. Add Emerald Rapids CPU model number ID for EDAC support. Tested-by: Li Zhang <li4.zhang@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-25EDAC/skx_common: Delete duplicated and unreachable codeQiuxu Zhuo
skx_mce_check_error() returns early if the error isn't from memory. So when skx_mce_output_error() is invoked from skx_mce_check_error(), it doesn't need to re-check whether the error is from memory. Delete the duplicated and unreachable code from skx_mce_output_error(). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-25EDAC/skx_common: Enable EDAC support for the "near" memoryQiuxu Zhuo
The current {skx,i10nm}_edac miss the EDAC support to decode errors from the 1st level memory (the fast "near" memory as cache) of the 2-level memory system. Introduce a helper function skx_error_in_mem() to check whether errors are from memory at the beginning of skx_mce_check_error(). As long as the errors are from memory (either the 1-level memory system or the 2-level memory system), decode the errors. Reported-and-tested-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-20EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_infoManivannan Sadhasivam
The memory for llcc_driv_data is allocated by the LLCC driver. But when it is passed as the private driver info to the EDAC core, it will get freed during the qcom_edac driver release. So when the qcom_edac driver gets probed again, it will try to use the freed data leading to the use-after-free bug. Hence, do not pass llcc_driv_data as pvt_info but rather reference it using the platform_data pointer in the qcom_edac driver. Fixes: 27450653f1db ("drivers: edac: Add EDAC driver support for QCOM SoCs") Reported-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.20 Link: https://lore.kernel.org/r/20230118150904.26913-4-manivannan.sadhasivam@linaro.org
2023-01-20EDAC/qcom: Add platform_device_id table for module autoloadingManivannan Sadhasivam
Add a device ID table so that the driver loads automatically when the associated platform_device gets registered. Reported-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Link: https://lore.kernel.org/r/20230118150904.26913-3-manivannan.sadhasivam@linaro.org
2023-01-19EDAC/device: Respect any driver-supplied workqueue polling valueManivannan Sadhasivam
The EDAC drivers may optionally pass the poll_msec value. Use that value if available, else fall back to 1000ms. [ bp: Touchups. ] Fixes: e27e3dac6517 ("drivers/edac: add edac_device class") Reported-by: Luca Weiss <luca.weiss@fairphone.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.9 Link: https://lore.kernel.org/r/COZYL8MWN97H.MROQ391BGA09@otso
2023-01-10x86/mce: Mask out non-address bits from machine check bankTony Luck
Systems that support various memory encryption schemes (MKTME, TDX, SEV) use high order physical address bits to indicate which key should be used for a specific memory location. When a memory error is reported, some systems may report those key bits in the IA32_MCi_ADDR machine check MSR. The Intel SDM has a footnote for the contents of the address register that says: "Useful bits in this field depend on the address methodology in use when the register state is saved." AMD Processor Programming Reference has a more explicit description of the MCA_ADDR register: "For physical addresses, the most significant bit is given by Core::X86::Cpuid::LongModeInfo[PhysAddrSize]." Add a new #define MCI_ADDR_PHYSADDR for the mask of valid physical address bits within the machine check bank address register. Use this mask for recoverable machine check handling and in the EDAC driver to ignore any key bits that may be present. [ Tony: Based on independent fixes proposed by Fan Du and Isaku Yamahata ] Reported-by: Isaku Yamahata <isaku.yamahata@intel.com> Reported-by: Fan Du <fan.du@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20230109152936.397862-1-tony.luck@intel.com
2023-01-09EDAC/zynqmp: Add EDAC support for Xilinx ZynqMP OCMSai Krishna Potthuri
Add EDAC support for Xilinx ZynqMP OCM Controller, so this driver reports CE and UE errors upon interrupt generation. Also add debugfs files for error injection. On Xilinx ZynqMP platform, both OCM Controller driver(zynqmp_edac) and DDR Memory Controller driver(synopsys_edac) co-exist which means both can be loaded at a time. This scenario is tested on Xilinx ZynqMP platform. Fix following issue reported by the robot: "MAINTAINERS references a file that doesn't exist: Documentation/devicetree/bindings/edac/xlnx,zynqmp-ocmc.yaml" [ bp: - Massage commit message - s/EDAC_ZYNQMP_OCM/EDAC_ZYNQMP/ - Touchups ] Reported-by: kernel test robot <lkp@intel.com> Co-developed-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230104084512.1855243-3-sai.krishna.potthuri@amd.com
2023-01-03EDAC/highbank: Fix memory leak in highbank_mc_probe()Miaoqian Lin
When devres_open_group() fails, it returns -ENOMEM without freeing memory allocated by edac_mc_alloc(). Call edac_mc_free() on the error handling path to avoid a memory leak. [ bp: Massage commit message. ] Fixes: a1b01edb2745 ("edac: add support for Calxeda highbank memory controller") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20221229054825.1361993-1-linmq006@gmail.com
2022-12-30EDAC/device: Fix period calculation in edac_device_reset_delay_period()Eliav Farber
Fix period calculation in case user sets a value of 1000. The input of round_jiffies_relative() should be in jiffies and not in milli-seconds. [ bp: Use the same code pattern as in edac_device_workq_setup() for clarity. ] Fixes: c4cf3b454eca ("EDAC: Rework workqueue handling") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20221020124458.22153-1-farbere@amazon.com
2022-12-12Merge branches 'edac-ghes' and 'edac-misc' into edac-updates-for-v6.2Borislav Petkov (AMD)
Combine all queued EDAC changes for submission into v6.2: * ras/edac-ghes: EDAC/igen6: Return the correct error type when not the MC owner apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg() EDAC: Check for GHES preference in the chipset-specific EDAC drivers EDAC/ghes: Make ghes_edac a proper module EDAC/ghes: Prepare to make ghes_edac a proper module EDAC/ghes: Add a notifier for reporting memory errors efi/cper: Export several helpers for ghes_edac to use * ras/edac-misc: EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper() EDAC/i5400: Fix typo in comment: vaious -> various EDAC/mc_sysfs: Increase legacy channel support to 12 MAINTAINERS: Make Mauro EDAC reviewer MAINTAINERS: Make Manivannan Sadhasivam the maintainer of qcom_edac EDAC/i5000: Mark as BROKEN Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2022-11-28EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper()Yang Yingliang
As the comment of pci_get_domain_bus_and_slot() says, it returns a PCI device with refcount incremented, so it doesn't need to call an extra pci_dev_get() in pci_get_dev_wrapper(), and the PCI device needs to be put in the error path. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20221128065512.3572550-1-yangyingliang@huawei.com
2022-11-25EDAC/i5400: Fix typo in comment: vaious -> variousChen Zhang
Fix spelling typo in comment: vaious -> various. [ bp: Massage. ] Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Chen Zhang <chenzhang@kylinos.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221102081248.45694-1-chenzhang@kylinos.cn
2022-10-31EDAC/mc_sysfs: Increase legacy channel support to 12Yazen Ghannam
Newer AMD systems, such as Genoa, can support up to 12 channels per EDAC "mc" device. These are detected by the device's EDAC module, and the current EDAC interface is properly enumerated. However, the legacy EDAC sysfs interface provides device attributes only for channels 0 to 7. Therefore, channels 8 to 11 will not be visible in the legacy interface. This was overlooked in the initial support for AMD Genoa. Add additional device attributes so that up to 12 channels are visible in the legacy EDAC sysfs interface. Fixes: e2be5955a886 ("EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20221018153630.14664-1-yazen.ghannam@amd.com
2022-10-25EDAC/igen6: Return the correct error type when not the MC ownerJia He
Return -EBUSY instead of -ENODEV just like the other EDAC drivers do. [ bp: Rewrite text. ] Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221018082214.569504-8-justin.he@arm.com
2022-10-21EDAC: Check for GHES preference in the chipset-specific EDAC driversJia He
Call ghes_get_devices() to check whether ghes_edac should be used on the platform where it is preferred over the corresponding chipset-specific EDAC driver. Unlike the existing edac_get_owner() check, the ghes_get_devices() check works independent to the module_init ordering. [ bp: Massage. ] Suggested-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-6-justin.he@arm.com
2022-10-21EDAC/ghes: Make ghes_edac a proper moduleJia He
Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") introduced a bug leading to ghes_edac_register() to be invoked before edac_init(). Because at that time the bus "edac" hadn't been even registered, this created sysfs nodes as /devices/mc0 instead of /sys/devices/system/edac/mc/mc0 on an Ampere eMag server. Fix this by turning ghes_edac into a proper module. The list of GHES devices returned is not protected from being modified concurrently but it is pretty static as it gets created only during GHES init and latter is not a module so... [ bp: Massage. ] Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") Co-developed-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-5-justin.he@arm.com
2022-10-21EDAC/ghes: Prepare to make ghes_edac a proper moduleJia He
To make ghes_edac a proper module, prepare to decouple its dependencies from GHES. Move the ghes_edac.force_load parameter to ghes.c in order to properly control whether ghes_edac should be force-loaded: In ghes_edac_register() it is too late to set the module flag. Introduce a helper ghes_get_devices(), which returns the list of GHES devices which got probed when the platform-check passes on the system. The previous force_load check is not needed in ghes_edac_unregister() since it will be checked in the module's init function of ghes_edac later. [ bp: Massage. ] Suggested-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-4-justin.he@arm.com
2022-10-20EDAC/ghes: Add a notifier for reporting memory errorsJia He
In order to make it a proper module and disentangle it from facilities, add a notifier for reporting memory errors. Use an atomic notifier because calls sites like ghes_proc_in_irq() run in interrupt context. [ bp: Massage commit message. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-3-justin.he@arm.com
2022-10-17EDAC/i5000: Mark as BROKENAristeu Rozanski
i5000_edac supports very old hardware which isn't available and it's been broken for single/dual channel for many years without anyone noticing. Marking as BROKEN. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220921181009.oxytvicy6sry6it7@redhat.com
2022-10-13Merge patch series "Use composable cache instead of L2 cache"Palmer Dabbelt
Zong Li <zong.li@sifive.com> says: Since composable cache may be L3 cache if private L2 cache exists, we should use its original name "composable cache" to prevent confusion. This patchset contains the modification which is related to ccache, such as DT binding and EDAC driver. * b4-shazam-merge: riscv: Add cache information in AUX vector soc: sifive: ccache: define the macro for the register shifts soc: sifive: ccache: use pr_fmt() to remove CCACHE: prefixes soc: sifive: ccache: reduce printing on init soc: sifive: ccache: determine the cache level from dts soc: sifive: ccache: Rename SiFive L2 cache to Composable cache. dt-bindings: sifive-ccache: change Sifive L2 cache to Composable cache Link: https://lore.kernel.org/r/20220913061817.22564-1-zong.li@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-13soc: sifive: ccache: Rename SiFive L2 cache to Composable cache.Greentime Hu
Since composable cache may be L3 cache if there is a L2 cache, we should use its original name composable cache to prevent confusion. There are some new lines were generated due to adding the compatible "sifive,ccache0" into ID table and indent requirement. The sifive L2 has been renamed to sifive CCACHE, EDAC driver needs to apply the change as well. Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Zong Li <zong.li@sifive.com> Co-developed-by: Zong Li <zong.li@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20220913061817.22564-3-zong.li@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-10-04Merge branches 'edac-drivers' and 'edac-misc' into edac-updates-for-v6.1Borislav Petkov
Combine all queued EDAC changes for submission into v6.1: * edac-drivers: EDAC/ie31200: Add Skylake-S support * edac-misc: EDAC/i7300: Correct the i7300_exit() function name in comment x86/sb_edac: Add row column translation for Broadwell EDAC/i10nm: Print an extra register set of retry_rd_err_log EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM EDAC/skx_common: Add ChipSelect ADXL component EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarations EDAC: Remove obsolete declarations in edac_module.h EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs EDAC/skx_common: Make output format similar EDAC/skx_common: Use driver decoder first EDAC/mc: Drop duplicated dimm->nr_pages debug printout EDAC/mc: Replace spaces with tabs in memtype flags definition EDAC/wq: Remove unneeded flush_workqueue() Signed-off-by: Borislav Petkov <bp@suse.de>
2022-09-23EDAC/i7300: Correct the i7300_exit() function name in commentColin Ian King
The incorrect function name is being used in the comment for function i7300_exit. Correct this. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220805125008.2346559-1-colin.i.king@gmail.com
2022-09-23x86/sb_edac: Add row column translation for BroadwellYouquan Song
The sb_edac driver lacks translation for DIMM internal address. Add memory address translation for row/column/bank/bank_group on Broadwell. Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20220722233338.341567-1-tony.luck@intel.com
2022-09-23EDAC/i10nm: Print an extra register set of retry_rd_err_logQiuxu Zhuo
Sapphire Rapids server adds an extra register set for logging more retry_rd_err_log data. So add code to print the extra register set. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20220722233338.341567-1-tony.luck@intel.com
2022-09-23EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBMQiuxu Zhuo
An HBM memory channel is divided into two pseudo channels. Each pseudo channel has its own retry_rd_err_log registers. Retrieve and print retry_rd_err_log registers of the HBM pseudo channel if the memory error is from HBM. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20220722233338.341567-1-tony.luck@intel.com
2022-09-23EDAC/skx_common: Add ChipSelect ADXL componentQiuxu Zhuo
Each pseudo channel of HBM has its own retry_rd_err_log registers. The bit 0 of ChipSelect ADXL component encodes the pseudo channel number of HBM memory. So add ChipSelect ADXL component to get HBM pseudo channel number. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20220722233338.341567-1-tony.luck@intel.com
2022-09-18EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarationsUwe Kleine-König
When moving the definition of ppc4xx_edac_driver further down, the forward declarations can just be dropped. Do this to reduce needless line repetition. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220917232013.489931-1-u.kleine-koenig@pengutronix.de
2022-09-11EDAC: Remove obsolete declarations in edac_module.hGaosheng Cui
Commit 4de78c6877ec ("drivers/edac: mod PCI poll names"), renamed the respective variables and accessors but left the old accessor declarations edac_get_log_ce(), edac_get_log_ue(), edac_get_poll_msec() and edac_get_panic_on_ue() in place. Remove them. [ bp: Masssage commit message. ] Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220911094038.3224365-1-cuigaosheng1@huawei.com
2022-09-08EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUsYouquan Song
Current i10nm_edac only supports firmware decoder (ACPI DSM methods). MCA bank registers of Ice Lake or Tremont CPUs contain the information to decode DDR memory errors. To get better decoding performance, add the driver decoder (decoding DDR memory errors via extracting error information from MCA bank registers) for Ice Lake and Tremont CPUs. Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20220901194310.115427-1-tony.luck@intel.com/
2022-09-08EDAC/skx_common: Make output format similarQiuxu Zhuo
The decoded output format of driver decoder is different from the output format of firmware decoder. Make output format similar regardless of decode function (Align driver decoder's to firmware decoder's). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20220901194310.115427-1-tony.luck@intel.com/
2022-09-08EDAC/skx_common: Use driver decoder firstQiuxu Zhuo
The performance of driver decoder[1] is better than the performance of firmware decoder[2], especially on frequent correctable errors. So use the driver decoder first, fall back to firmware decoder if the driver decoder is unavailable. Also rename the function pointer skx_decode to driver_decode (better name to contrast with adxl_decode). [1] Decode errors by extracting error information from registers of memory controllers and/or MCA bank registers. [2] Decode errors by calling ACPI DSM methods. Co-developed-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20220901194310.115427-1-tony.luck@intel.com/
2022-09-01EDAC/mc: Drop duplicated dimm->nr_pages debug printoutSerge Semin
The duplicated edac_dbg()-based dimm->nr_pages print was introduced in 6e84d359b2be ("edac_mc: Cleanup per-dimm_info debug messages"). The duplicated line can be found even in the commit message text: [ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 [ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8 [ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 Drop the second edac_dbg() call. [ bp: Massage commit message. ] Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220822190730.27277-14-Sergey.Semin@baikalelectronics.ru
2022-08-25EDAC/wq: Remove unneeded flush_workqueue()ran jianping
destroy_workqueue() already takes care of flushing the workqueue so there is no need to flush it explicitly. [ bp: Massage commit message. ] Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ran jianping <ran.jianping@zte.com.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220424062127.3219542-1-ran.jianping@zte.com.cn
2022-08-25EDAC/ie31200: Add Skylake-S supportJosh Hant
Add device IDs for Skylake-S CPUs according to datasheet. Signed-off-by: Josh Hant <joshuahant@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jason Baron <jbaron@akamai.com> Link: https://lore.kernel.org/r/20220712102121.20812-1-joshuahant@gmail.com
2022-08-06Merge tag 'powerpc-6.0-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for syscall stack randomization - Add support for atomic operations to the 32 & 64-bit BPF JIT - Full support for KASAN on 64-bit Book3E - Add a watchdog driver for the new PowerVM hypervisor watchdog - Add a number of new selftests for the Power10 PMU support - Add a driver for the PowerVM Platform KeyStore - Increase the NMI watchdog timeout during live partition migration, to avoid timeouts due to increased memory access latency - Add support for using the 'linux,pci-domain' device tree property for PCI domain assignment - Many other small features and fixes Thanks to Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira Rajeev, Bagas Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Greg Kurz, Haowen Bai, Hari Bathini, Jason A. Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg Haefliger, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada, Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Ning Qiang, Pali Rohár, Petr Mladek, Rashmica Gupta, Sachin Sant, Scott Cheloha, Segher Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu Jianfeng, and Zhouyi Zhou. * tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (191 commits) powerpc/64e: Fix kexec build error EDAC/ppc_4xx: Include required of_irq header directly powerpc/pci: Fix PHB numbering when using opal-phbid powerpc/64: Init jump labels before parse_early_param() selftests/powerpc: Avoid GCC 12 uninitialised variable warning powerpc/cell/axon_msi: Fix refcount leak in setup_msi_msg_address powerpc/xive: Fix refcount leak in xive_get_max_prio powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader powerpc/perf: Include caps feature for power10 DD1 version powerpc: add support for syscall stack randomization powerpc: Move system_call_exception() to syscall.c powerpc/powernv: rename remaining rng powernv_ functions to pnv_ powerpc/powernv/kvm: Use darn for H_RANDOM on Power9 powerpc/powernv: Avoid crashing if rng is NULL selftests/powerpc: Fix matrix multiply assist test powerpc/signal: Update comment for clarity powerpc: make facility_unavailable_exception 64s powerpc/platforms/83xx/suspend: Remove write-only global variable powerpc/platforms/83xx/suspend: Prevent unloading the driver powerpc/platforms/83xx/suspend: Reorder to get rid of a forward declaration ...