summaryrefslogtreecommitdiff
path: root/drivers/edac/igen6_edac.c
AgeCommit message (Collapse)Author
2024-11-18Merge branch 'edac-misc' into edac-updatesBorislav Petkov (AMD)
* edac-misc: MAINTAINERS: Change FSL DDR EDAC maintainership RAS/AMD/ATL: Add debug prints for DF register reads EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2 EDAC/bluefield: Fix potential integer overflow EDAC/igen6: Add Intel Panther Lake-H SoCs support Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-11-08EDAC/igen6: Add polling supportOrange Kao
Some PCs with Intel N100 (with PCI device 8086:461c, DID_ADL_N_SKU4) experienced issues with error interrupts not working, even with the following configuration in the BIOS. In-Band ECC Support: Enabled In-Band ECC Operation Mode: 2 (make all requests protected and ignore range checks) IBECC Error Injection Control: Inject Correctable Error on insertion counter Error Injection Insertion Count: 251658240 (0xf000000) Add polling mode support for these machines to ensure that memory error events are handled. Signed-off-by: Orange Kao <orange@aiven.io> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/all/20241106114024.941659-3-orange@aiven.io
2024-11-08EDAC/igen6: Initialize edac_op_state according to the configuration dataQiuxu Zhuo
Currently, igen6_edac sets edac_op_state to EDAC_OPSTATE_NMI, while the driver also supports memory errors reported from Machine Check. Initialize edac_op_state to the correct value according to the configuration data that the driver probed. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20241106114024.941659-2-orange@aiven.io
2024-11-04EDAC/igen6: Avoid segmentation fault on module unloadOrange Kao
The segmentation fault happens because: During modprobe: 1. In igen6_probe(), igen6_pvt will be allocated with kzalloc() 2. In igen6_register_mci(), mci->pvt_info will point to &igen6_pvt->imc[mc] During rmmod: 1. In mci_release() in edac_mc.c, it will kfree(mci->pvt_info) 2. In igen6_remove(), it will kfree(igen6_pvt); Fix this issue by setting mci->pvt_info to NULL to avoid the double kfree. Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219360 Signed-off-by: Orange Kao <orange@aiven.io> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20241104124237.124109-2-orange@aiven.io
2024-10-14EDAC/igen6: Add Intel Panther Lake-H SoCs supportLili Li
Panther Lake-H SoCs share the same IBECC registers with Meteor Lake-P SoCs. Add Panther Lake-H SoC compute die IDs for EDAC support. Signed-off-by: Lili Li <lili.li@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241012071439.54165-1-qiuxu.zhuo@intel.com
2024-09-03EDAC/igen6: Fix conversion of system address to physical memory addressQiuxu Zhuo
The conversion of system address to physical memory address (as viewed by the memory controller) by igen6_edac is incorrect when the system address is above the TOM (Total amount Of populated physical Memory) for Elkhart Lake and Ice Lake (Neural Network Processor). Fix this conversion. Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/stable/20240814061011.43545-1-qiuxu.zhuo%40intel.com
2024-07-15Merge tag 'edac_updates_for_v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - The AMD memory controllers data fabric version 4.5 supports non-power-of-2 denormalization in the sense that certain bits of the system physical address cannot be reconstructed from the normalized address reported by the RAS hardware. Add support for handling such addresses - Switch the EDAC drivers to the new Intel CPU model defines - The usual fixes and cleanups all over the place * tag 'edac_updates_for_v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC: Add missing MODULE_DESCRIPTION() macros EDAC/dmc520: Use devm_platform_ioremap_resource() EDAC/igen6: Add Intel Arrow Lake-U/H SoCs support RAS/AMD/FMPM: Use atl internal.h for INVALID_SPA RAS/AMD/ATL: Implement DF 4.5 NP2 denormalization RAS/AMD/ATL: Validate address map when information is gathered RAS/AMD/ATL: Expand helpers for adding and removing base and hole RAS/AMD/ATL: Read DRAM hole base early RAS/AMD/ATL: Add amd_atl pr_fmt() prefix RAS/AMD/ATL: Add a missing module description EDAC, i10nm: make skx_common.o a separate module EDAC/skx: Switch to new Intel CPU model defines EDAC/sb_edac: Switch to new Intel CPU model defines EDAC, pnd2: Switch to new Intel CPU model defines EDAC/i10nm: Switch to new Intel CPU model defines EDAC/ghes: Add missing newline to pr_info() statement RAS/AMD/ATL: Add missing newline to pr_info() statement EDAC/thunderx: Remove unused struct error_syndrome
2024-06-14EDAC/igen6: Add Intel Arrow Lake-U/H SoCs supportQiuxu Zhuo
Arrow Lake-U/H SoCs share same IBECC registers with Meteor Lake-P SoCs. Add Arrow Lake-U/H SoC compute die IDs for EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240614030354.69180-1-qiuxu.zhuo@intel.com
2024-06-04EDAC/igen6: Convert PCIBIOS_* return codes to errnosIlpo Järvinen
errcmd_enable_error_reporting() uses pci_{read,write}_config_word() that return PCIBIOS_* codes. The return code is then returned all the way into the probe function igen6_probe() that returns it as is. The probe functions, however, should return normal errnos. Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal errno before returning it from errcmd_enable_error_reporting(). Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240527132236.13875-2-ilpo.jarvinen@linux.intel.com
2024-02-01EDAC/igen6: Add one more Intel Alder Lake-N SoC supportLili Li
Add a new Intel Alder Lake-N SoC compute die ID for EDAC support. Signed-off-by: Lili Li <lili.li@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20240129062040.60809-2-qiuxu.zhuo@intel.com
2023-12-05EDAC/igen6: Add Intel Meteor Lake-P SoCs supportQiuxu Zhuo
Add Intel Meteor Lake-P SoC compute die IDs for EDAC support. These Meteor Lake-P SoCs share similar IBECC registers with Alder Lake-P SoCs. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05EDAC/igen6: Add Intel Meteor Lake-PS SoCs supportQiuxu Zhuo
Add Intel Meteor Lake-PS SoC compute die IDs for EDAC support. These SoCs share similar IBECC registers with Alder Lake-P SoCs. The only difference is that IBECC presence is detected through an MMIO-mapped register instead of the capability register in the PCI configuration space. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05EDAC/igen6: Add Intel Raptor Lake-P SoCs supportQiuxu Zhuo
Add Intel Raptor Lake-P SoC compute die IDs for EDAC support. These Raptor Lake-P SoCs share similar IBECC registers with Alder Lake-P SoCs but extend the most significant bit of the error address logged in IBECC from bit 38 to bit 45. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05EDAC/igen6: Add Intel Alder Lake-N SoCs supportQiuxu Zhuo
Add Intel Alder Lake-N SoC compute die IDs for EDAC support. Alder Lake-N, with one memory controller, is a reduced version of Alder Lake-P, which has two memory controllers. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-12-05EDAC/igen6: Make get_mchbar() helper functionQiuxu Zhuo
Make get_mchbar() helper function to retrieve the BAR address of the memory controller. No function changes. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-08-02EDAC/igen6: Fix the issue of no error eventsQiuxu Zhuo
Current igen6_edac checks for pending errors before the registration of the error handler. However, there is a possibility that the error occurs during the registration process, leading to unhandled pending errors and no future error events. This issue can be reproduced by repeatedly injecting errors during the loading of the igen6_edac. Fix this issue by moving the pending error handler after the registration of the error handler, ensuring that no pending errors are left unhandled. Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Reported-by: Ee Wey Lim <ee.wey.lim@intel.com> Tested-by: Ee Wey Lim <ee.wey.lim@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20230725080427.23883-1-qiuxu.zhuo@intel.com
2022-10-25EDAC/igen6: Return the correct error type when not the MC ownerJia He
Return -EBUSY instead of -ENODEV just like the other EDAC drivers do. [ bp: Rewrite text. ] Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221018082214.569504-8-justin.he@arm.com
2022-10-21EDAC: Check for GHES preference in the chipset-specific EDAC driversJia He
Call ghes_get_devices() to check whether ghes_edac should be used on the platform where it is preferred over the corresponding chipset-specific EDAC driver. Unlike the existing edac_get_owner() check, the ghes_get_devices() check works independent to the module_init ordering. [ bp: Massage. ] Suggested-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-6-justin.he@arm.com
2021-06-17EDAC/igen6: Add Intel Alder Lake SoC supportQiuxu Zhuo
Alder Lake SoC shares the same memory controller and In-Band ECC (IBECC) IP with Tiger Lake SoC. Like Tiger Lake, it also has two memory controllers each associated one IBECC instance. The minor differences include the MMIO offset of each memory controller and the type of memory error address logged in the IBECC. So add Alder Lake compute die IDs, adjust the MMIO offset for each memory controller and handle the type of memory error address logged in the IBECC for Alder Lake EDAC support. Tested-by: Vrukesh V Panse <vrukesh.v.panse@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-7-tony.luck@intel.com
2021-06-17EDAC/igen6: Add Intel Tiger Lake SoC supportQiuxu Zhuo
Tiger Lake SoC shares the same memory controller and In-Band ECC (IBECC) IP with Elkhart Lake SoC. The main differences are that Tiger Lake has two memory controllers each associated with one IBECC and uses Machine Check for the memory error notification. So add Tiger Lake compute die IDs, MCE decoding chain registration, and memory slice decoding for Tiger Lake EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-6-tony.luck@intel.com
2021-06-17EDAC/igen6: Add Intel ICL-NNPI SoC supportQiuxu Zhuo
The Ice Lake Neural Network Processor for Deep Learning Inference (ICL-NNPI) SoC shares the same memory controller and In-Band ECC with Elkhart Lake SoC. Add the ICL-NNPI compute die IDs for EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210611170123.1057025-5-tony.luck@intel.com
2020-11-23EDAC/igen6: ecclog_llist can be statickernel test robot
Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20201123031850.GA20416@aef56166e5fc Signed-off-by: Tony Luck <tony.luck@intel.com>
2020-11-19EDAC/igen6: Add debugfs interface for Intel client SoC EDAC driverQiuxu Zhuo
Add debugfs support to fake memory correctable errors to test the error reporting path and the error address decoding logic in the igen6_edac driver. Please note that the fake errors are also reported to EDAC core and then the CE counter in EDAC sysfs is also increased. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2020-11-19EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECCQiuxu Zhuo
This driver supports Intel client SoC with integrated memory controller using In-Band ECC(IBECC). The memory correctable and uncorrectable errors are reported via NMIs. The driver handles the NMIs and decodes the memory error address to platform specific address. The first IBECC-supported SoC is Elkhart Lake. [Tony: s/#include <linux/nmi.h>/#include <asm/nmi.h>/ to fix randconfig build] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>