summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)Author
2023-04-27Merge tag 'driver-core-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
2023-04-25Merge tag 'soc-drivers-6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC driver updates from Arnd Bergmann: "The most notable updates this time are for Qualcomm Snapdragon platforms. The Inline-Crypto-Engine gets a new DT binding and driver, and a number of drivers now support additional Snapdragon variants, in particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc (edac) and rpm drivers get notable functionality updates. Updates on other platforms include: - Various updates to the Mediatek mutex and mmsys drivers, including support for the Helio X10 SoC - Support for unidirectional mailbox channels in Arm SCMI firmware - Support for per cpu asynchronous notification in OP-TEE firmware - Minor updates for memory controller drivers. - Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra, Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX SoC drivers, mainly updating the use of MODULE_LICENSE() macros and obsolete DT driver interfaces" * tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits) soc: ti: smartreflex: Simplify getting the opam_sr pointer bus: vexpress-config: Add explicit of_platform.h include soc: mediatek: Kconfig: Add MTK_CMDQ dependency to MTK_MMSYS memory: mtk-smi: mt8365: Add SMI Support dt-bindings: memory-controllers: mediatek,smi-larb: add mt8365 dt-bindings: memory-controllers: mediatek,smi-common: add mt8365 memory: tegra: read values from correct device dt-bindings: crypto: Add Qualcomm Inline Crypto Engine soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver dt-bindings: firmware: document Qualcomm QCM2290 SCM soc: qcom: rpmh-rsc: Support RSC v3 minor versions soc: qcom: smd-rpm: Use GFP_ATOMIC in write path soc/tegra: fuse: Remove nvmem root only access soc/tegra: cbb: tegra194: Use of_address_count() helper soc/tegra: cbb: Remove MODULE_LICENSE in non-modules ARM: tegra: Remove MODULE_LICENSE in non-modules soc/tegra: flowctrl: Use devm_platform_get_and_ioremap_resource() soc: tegra: cbb: Drop empty platform remove function firmware: arm_scmi: Add support for unidirectional mailbox channels dt-bindings: firmware: arm,scmi: Support mailboxes unidirectional channels ...
2023-04-24Merge branches 'edac-drivers', 'edac-amd64' and 'edac-misc' into edac-updatesBorislav Petkov (AMD)
Combine all queued EDAC changes for submission into v6.4: * ras/edac-drivers: EDAC/i10nm: Add Intel Sierra Forest server support EDAC/skx: Fix overflows on the DRAM row address mapping arrays * ras/edac-amd64: (27 commits) EDAC/amd64: Fix indentation in umc_determine_edac_cap() EDAC/amd64: Add get_err_info() to pvt->ops EDAC/amd64: Split dump_misc_regs() into dct/umc functions EDAC/amd64: Split init_csrows() into dct/umc functions EDAC/amd64: Split determine_edac_cap() into dct/umc functions EDAC/amd64: Rename f17h_determine_edac_ctl_cap() EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions EDAC/amd64: Split ecc_enabled() into dct/umc functions EDAC/amd64: Split read_mc_regs() into dct/umc functions EDAC/amd64: Split determine_memory_type() into dct/umc functions EDAC/amd64: Split read_base_mask() into dct/umc functions EDAC/amd64: Split prep_chip_selects() into dct/umc functions EDAC/amd64: Rework hw_info_{get,put} EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt EDAC/amd64: Do not discover ECC symbol size for Family 17h and later EDAC/amd64: Drop dbam_to_cs() for Family 17h and later EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions EDAC/amd64: Rename debug_display_dimm_sizes() * ras/edac-misc: EDAC/altera: Remove MODULE_LICENSE in non-module EDAC: Sanitize MODULE_AUTHOR strings EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR EDAC/i5100: Fix typo in comment EDAC/altera: Remove redundant error logging Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-04-10EDAC/i10nm: Add Intel Sierra Forest server supportQiuxu Zhuo
The Sierra Forest CPU model uses similar memory controller registers as Granite Rapids server. Add Sierra Forest CPU model ID for EDAC support. Tested-by: Li Zhang <li4.zhang@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20230410131531.11914-1-qiuxu.zhuo@intel.com
2023-04-04EDAC/amd64: Fix indentation in umc_determine_edac_cap()Yang Li
Use consistent indentation to improve the readability and fix: drivers/edac/amd64_edac.c:1279 umc_determine_edac_cap() warn: inconsistent indenting Fixes: f6a4b4a1aa16 ("EDAC/amd64: Split determine_edac_cap() into dct/umc functions") Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230404022557.46409-1-yang.lee@linux.alibaba.com
2023-04-01EDAC/altera: Remove MODULE_LICENSE in non-moduleNick Alcock
Since 8b41fc4454e3 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. altera_edac is not a module for a while now, remove the macro call. Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230217141059.392471-24-nick.alcock@oracle.com
2023-03-28EDAC: Sanitize MODULE_AUTHOR stringsBorislav Petkov (AMD)
Fixup the remaining MODULE_AUTHOR strings to not contain newlines. Shorten and unbreak others. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230328134309.23159-1-bp@alien8.de
2023-03-28EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHORJonathan Neuschäfer
MODULE_AUTHOR strings don't usually include a newline character. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230129165054.1675554-1-j.neuschaefer@gmx.net
2023-03-24EDAC/amd64: Add get_err_info() to pvt->opsMuralidhara M K
GPU Nodes will use a different method to determine the chip select and channel of an error. A function pointer should be used rather than introduce another branching condition. Prepare for this by adding get_err_info() to pvt->ops. This function is only called from the modern code path, so a legacy function is not defined. Make sure to call this after MCA_STATUS[SyndV] is checked, since the csrow value is found in MCA_SYND. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-23-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split dump_misc_regs() into dct/umc functionsMuralidhara M K
Add a function pointer to pvt->ops. No functional change is intended. [ Yazen: Rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-22-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split init_csrows() into dct/umc functionsMuralidhara M K
Call them from their respective setup_mci_misc_attrs() paths. Also, drop the check for an "empty" device, i.e. one without memory. This is redundant and already done in instance_has_memory() earlier in the init path. No functional change is intended. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-21-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split determine_edac_cap() into dct/umc functionsMuralidhara M K
Call them from their respective setup_mci_misc_attrs() paths. No functional change is intended. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-20-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Rename f17h_determine_edac_ctl_cap()Yazen Ghannam
...to match the "umc_" prefix convention. No functional change is intended. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-19-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functionsMuralidhara M K
The init_one_instance() path is shared between legacy and modern systems. So add the new functions to a function pointer in pvt->ops. No functional change is intended. [ Yazen: Rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-18-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split ecc_enabled() into dct/umc functionsMuralidhara M K
Call them using a function pointer in pvt->ops. The "ECC enabled" check is done outside of the hardware information gathering done in hw_info_get(). So a high-level function pointer is needed to separate the legacy and modern paths. No functional change is intended. [Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-17-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split read_mc_regs() into dct/umc functionsMuralidhara M K
Call them from their respective hw_info_get() paths. ECC symbol size is not needed on UMC systems, so determine_ecc_sym_sz() is left out of the UMC path. Do not save TOP_MEM* values on modern controllers because they're not needed there (read: they were used only for debugging, if anything). [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-16-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split determine_memory_type() into dct/umc functionsMuralidhara M K
Call them from their respective hw_info_get() paths. Call them after all other hardware registers have been saved, since the memory type for a device will be determined based on the saved information. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-15-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split read_base_mask() into dct/umc functionsMuralidhara M K
Call them from their respective hw_info_get() paths. Call the new functions after the setting the chip select base and mask counts, since those are need to read the correct number of chip select base and mask registers. And call the new functions before the remaining set up, because the base and mask register values will be needed later. [Yazen: Rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-14-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split prep_chip_selects() into dct/umc functionsMuralidhara M K
Call them from their respective hw_info_get() function. Avoid the need for family/model-based function pointers. Add the calls before reading hardware registers from the memory controllers, since the number of chip select bases and masks needs to be known first. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-13-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Rework hw_info_{get,put}Yazen Ghannam
The bulk of system-specific information is gathered at init time with hw_info_get(). This function calls a number of helper functions, and many of these helper functions are split between a modern UMC/DF path and a legacy DCT path. Split hw_info_get() into legacy and modern versions. This creates two separate code paths early on, and legacy and modern helper functions can be called directly in the appropriate code path. Also, simplify hw_info_put() and share it between legacy and modern systems. NULL pointer checks are done in pci_dev_put() and kfree(), so they can be called unconditionally. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-12-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvtMuralidhara M K
Future AMD systems will support heterogeneous "AMD Node" types, e.g. CPU and GPU types. Therefore, a global family type shared across all AMD nodes is no longer appropriate. Move struct low_ops routines and members of struct amd64_family_type to struct amd64_pvt. Currently, there are many code branches that split between "modern" and "legacy" systems. Another code branch will be needed in order to cover GPU cases. However, rather than introduce another branching case in multiple functions, the current branching code should be switched to a set of function pointers. This change makes the code more readable and simplifies adding support for new families/models. In order to reuse code, define two sets of function pointers. Use one for modern systems (Family 17h and later). This will not change between current CPU families. Use another set of function pointers for legacy systems (before Family 17h). Use the Family 16h versions as default for the legacy ops since these are the latest, and adjust the function pointers as needed for older families. [ Yazen: rebased/reworked patch and reworded commit message. ] [ bp: Fix rev8 or later check. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-11-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Do not discover ECC symbol size for Family 17h and laterYazen Ghannam
The ECC symbol size was needed on legacy system to lookup the ECC syndrome. This is not needed on modern systems because the ECC syndrome is explicitly provided in the MCA information. Remove the ECC symbol size discovery code for modern UMC-based systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-10-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Drop dbam_to_cs() for Family 17h and laterYazen Ghannam
The same function is used to calculate chip select size for all Zen-based family/models. Therefore, a family/model function pointer is not necessary. Drop the dbam_to_cs() function pointer for Family 17h and later systems. Also, move the Family 17h function to avoid a forward declaration. Rename it to indicate that the UMC Address Mask is used rather than the legacy DBAM value. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-9-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functionsYazen Ghannam
Split get_csrow_nr_pages() into a legacy and modern versions in preparation for further legacy/modern refactoring. Also, rename f17_get_cs_mode() to match the new convention. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-8-yazen.ghannam@amd.com
2023-03-24EDAC/amd64: Rename debug_display_dimm_sizes()Yazen Ghannam
Use the "dct" and "umc" prefixes for legacy and modern versions respectively. Also, move the "dct" version to avoid a forward declaration, and fixup some checkpatch warnings in the process. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-7-yazen.ghannam@amd.com
2023-03-23EDAC/i5100: Fix typo in commentJongwoo Han
Correct typo from 'preform' to 'perform' in comment. Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230302021120.56794-1-jongwooo.han@gmail.com
2023-03-22EDAC/sysfs: move to use bus_get_dev_root()Greg Kroah-Hartman
Direct access to the struct bus_type dev_root pointer is going away soon so replace that with a call to bus_get_dev_root() instead, which is what it is there for. Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: linux-edac@vger.kernel.org Link: https://lore.kernel.org/r/20230313182918.1312597-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-21EDAC/altera: Remove redundant error loggingDeepak R Varma
A call to platform_get_irq() already prints an error on failure within its own implementation. So printing another error based on its return value in the caller is redundant and should be removed. The clean up also makes if condition block braces unnecessary. Remove that as well. Issue identified using platform_get_irq.cocci coccinelle semantic patch. Signed-off-by: Deepak R Varma <drv@mailo.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y/+j27kqdhflPtaj@ubun2204.myguest.virtualbox.org
2023-03-15qcom: llcc/edac: Support polling mode for ECC handlingManivannan Sadhasivam
Not all Qcom platforms support IRQ mode for ECC handling. For those platforms, the current EDAC driver will not be probed due to missing ECC IRQ in devicetree. So add support for polling mode so that the EDAC driver can be used on all Qcom platforms supporting LLCC. The polling delay of 5000ms is chosen based on Qcom downstream/vendor driver. Reported-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230314080443.64635-14-manivannan.sadhasivam@linaro.org
2023-03-15qcom: llcc/edac: Fix the base address used for accessing LLCC banksManivannan Sadhasivam
The Qualcomm LLCC/EDAC drivers were using a fixed register stride for accessing the (Control and Status Registers) CSRs of each LLCC bank. This stride only works for some SoCs like SDM845 for which driver support was initially added. But the later SoCs use different register stride that vary between the banks with holes in-between. So it is not possible to use a single register stride for accessing the CSRs of each bank. By doing so could result in a crash. For fixing this issue, let's obtain the base address of each LLCC bank from devicetree and get rid of the fixed stride. This also means, there is no need to rely on reg-names property and the base addresses can be obtained using the index. First index is LLCC bank 0 and last index is LLCC broadcast. If the SoC supports more than one bank, then those need to be defined in devicetree for index from 1..N-1. Reported-by: Parikshit Pareek <quic_ppareek@quicinc.com> Tested-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230314080443.64635-13-manivannan.sadhasivam@linaro.org
2023-03-13EDAC/skx: Fix overflows on the DRAM row address mapping arraysQiuxu Zhuo
The current DRAM row address mapping arrays skx_{open,close}_row[] only support ranks with sizes up to 16G. Decoding a rank address to a DRAM row address for a 32G rank by using either one of the above arrays by the skx_edac driver, will result in an overflow on the array. For a 32G rank, the most significant DRAM row address bit (the bit17) is mapped from the bit34 of the rank address. Add this new mapping item to both arrays to fix the overflow issue. Fixes: 4ec656bdf43a ("EDAC, skx_edac: Add EDAC driver for Skylake") Reported-by: Feng Xu <feng.f.xu@intel.com> Tested-by: Feng Xu <feng.f.xu@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230211011728.71764-1-qiuxu.zhuo@intel.com
2023-02-21Merge tag 'edac_updates_for_v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Add a driver for the RAS functionality on Xilinx's on chip memory controller - Add support for decoding errors from the first and second level memory on SKL-based hardware - Add support for the memory controllers in Intel Granite Rapids and Emerald Rapids machines - First round of amd64_edac driver simplification and removal of unneeded functionality - The usual cleanups and fixes * tag 'edac_updates_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positive EDAC/amd64: Remove early_channel_count() EDAC/amd64: Remove PCI Function 0 EDAC/amd64: Remove PCI Function 6 EDAC/amd64: Remove scrub rate control for Family 17h and later EDAC/amd64: Don't set up EDAC PCI control on Family 17h+ EDAC/i10nm: Add driver decoder for Sapphire Rapids server EDAC/i10nm: Add Intel Granite Rapids server support EDAC/i10nm: Make more configurations CPU model specific EDAC/i10nm: Add Intel Emerald Rapids server support EDAC/skx_common: Delete duplicated and unreachable code EDAC/skx_common: Enable EDAC support for the "near" memory EDAC/qcom: Add platform_device_id table for module autoloading EDAC/zynqmp: Add EDAC support for Xilinx ZynqMP OCM dt-bindings: edac: Add bindings for Xilinx ZynqMP OCM
2023-02-21Merge tag 'ras_core_for_v6.3_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Add support for reporting more bits of the physical address on error, on newer AMD CPUs - Mask out bits which don't belong to the address of the error being reported * tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Mask out non-address bits from machine check bank x86/mce: Add support for Extended Physical Address MCA changes x86/mce: Define a function to extract ErrorAddr from MCA_ADDR
2023-02-14EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positiveYazen Ghannam
Reportedly, clang cannot do interprocedural analysis: https://lore.kernel.org/r/20230213-amd64_edac-wsometimes-uninitialized-v1-1-5bde32b89e02@kernel.org and see that those arguments won't be used uninitialized. So, yeah, the code's fine even without this. Normally, such a "fix" won't be applied but that warning gets automatically enabled in -Wall builds and when CONFIG_WERROR is set in allmodconfig builds, the build fails. So shut it up with a minimal fix as this code will see more reorganization very soon. [ bp: Write commit message. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/Y%2BqdVHidnrrKvxiD@dev-arch.thelio-3990X
2023-02-09EDAC/amd64: Remove early_channel_count()Yazen Ghannam
The early_channel_count() function seems to have been useful in the past for knowing how many EDAC mci structures to populate. However, this is no longer needed as the maximum channel count for a system is used instead. Remove the early_channel_count() helper functions and related code. Use the size of the channel layer when iterating over channel structures. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-6-yazen.ghannam@amd.com
2023-02-09EDAC/amd64: Remove PCI Function 0Yazen Ghannam
PCI Function 0 is used on Family 17h and later only to read the "dhar" value. This value is printed and provided through a module-specific debug sysfs file. The value is not used for any Family 17h and later code, and it does not have any apparent debug value on these systems. Remove "dhar", Function 0 PCI IDs, and all related code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-5-yazen.ghannam@amd.com
2023-02-09EDAC/amd64: Remove PCI Function 6Yazen Ghannam
PCI Function 6 is used on Family 17h and later to access scrub registers. With scrub access removed, this function has no other use. Remove all Function 6 PCI IDs and related code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-4-yazen.ghannam@amd.com
2023-02-09EDAC/amd64: Remove scrub rate control for Family 17h and laterYazen Ghannam
The scrub registers on AMD Family 17h and later may be inaccessible to the OS. Furthermore, hardware designers recommend that the scrubbing feature is managed by the firmware. Remove support for the sdram_scrub_rate interface for AMD Family 17h systems and later by not setting the scrub function pointers. The EDAC MC core will then not expose the scrub files in sysfs. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-3-yazen.ghannam@amd.com
2023-02-09EDAC/amd64: Don't set up EDAC PCI control on Family 17h+Yazen Ghannam
EDAC PCI control is used to detect/report legacy PCI errors like "Parity" and "SERROR". Modern AMD systems use PCIe Advanced Error Reporting (AER), and legacy PCI errors should not be reported. Remove EDAC PCI control setup on AMD Family 17h and later systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-2-yazen.ghannam@amd.com
2023-02-08EDAC/i10nm: Add driver decoder for Sapphire Rapids serverYouquan Song
Intel SDM (December 2022) vol3B 17.13.2 contains IMC MC error codes for Sapphire Rapids. Current i10nm_edac only supports firmware decoder (ACPI DSM methods) for Sapphire Rapids. So add the driver decoder (decoding DDR memory errors via extracting error information from the IMC MC error codes) for Sapphire Rapids for better decoding performance. Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2023-01-25EDAC/i10nm: Add Intel Granite Rapids server supportQiuxu Zhuo
The Granite Rapids CPU model uses similar memory controller registers as Sapphire Rapids server but with some different configurations: - Various memory controller numbers for different Granite Rapids CPUs. So detect the number of present memory controllers at run time. - Different MMIO offsets of memory controllers. - Different triples of bus/dev/fun of some PCI devices used in i10nm_edac. Add above configurations and Granite Rapids CPU model ID for EDAC support. [Tony: Fixed 2 typos s/strcture/structure/] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-25EDAC/i10nm: Make more configurations CPU model specificQiuxu Zhuo
The numbers of memory controllers per socket, channels per memory controller, DIMMs per channel and the triples of bus/device/function of PCI devices used in i10nm_edac can be CPU model specific. Add new fields to the structure res_config for above numbers and triples to make them CPU model specific. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-25EDAC/i10nm: Add Intel Emerald Rapids server supportQiuxu Zhuo
The Emerald Rapids CPU model uses similar memory controller registers as Sapphire Rapids server. Add Emerald Rapids CPU model number ID for EDAC support. Tested-by: Li Zhang <li4.zhang@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-25EDAC/skx_common: Delete duplicated and unreachable codeQiuxu Zhuo
skx_mce_check_error() returns early if the error isn't from memory. So when skx_mce_output_error() is invoked from skx_mce_check_error(), it doesn't need to re-check whether the error is from memory. Delete the duplicated and unreachable code from skx_mce_output_error(). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-25EDAC/skx_common: Enable EDAC support for the "near" memoryQiuxu Zhuo
The current {skx,i10nm}_edac miss the EDAC support to decode errors from the 1st level memory (the fast "near" memory as cache) of the 2-level memory system. Introduce a helper function skx_error_in_mem() to check whether errors are from memory at the beginning of skx_mce_check_error(). As long as the errors are from memory (either the 1-level memory system or the 2-level memory system), decode the errors. Reported-and-tested-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
2023-01-20EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_infoManivannan Sadhasivam
The memory for llcc_driv_data is allocated by the LLCC driver. But when it is passed as the private driver info to the EDAC core, it will get freed during the qcom_edac driver release. So when the qcom_edac driver gets probed again, it will try to use the freed data leading to the use-after-free bug. Hence, do not pass llcc_driv_data as pvt_info but rather reference it using the platform_data pointer in the qcom_edac driver. Fixes: 27450653f1db ("drivers: edac: Add EDAC driver support for QCOM SoCs") Reported-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.20 Link: https://lore.kernel.org/r/20230118150904.26913-4-manivannan.sadhasivam@linaro.org
2023-01-20EDAC/qcom: Add platform_device_id table for module autoloadingManivannan Sadhasivam
Add a device ID table so that the driver loads automatically when the associated platform_device gets registered. Reported-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Link: https://lore.kernel.org/r/20230118150904.26913-3-manivannan.sadhasivam@linaro.org
2023-01-19EDAC/device: Respect any driver-supplied workqueue polling valueManivannan Sadhasivam
The EDAC drivers may optionally pass the poll_msec value. Use that value if available, else fall back to 1000ms. [ bp: Touchups. ] Fixes: e27e3dac6517 ("drivers/edac: add edac_device class") Reported-by: Luca Weiss <luca.weiss@fairphone.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.9 Link: https://lore.kernel.org/r/COZYL8MWN97H.MROQ391BGA09@otso
2023-01-10x86/mce: Mask out non-address bits from machine check bankTony Luck
Systems that support various memory encryption schemes (MKTME, TDX, SEV) use high order physical address bits to indicate which key should be used for a specific memory location. When a memory error is reported, some systems may report those key bits in the IA32_MCi_ADDR machine check MSR. The Intel SDM has a footnote for the contents of the address register that says: "Useful bits in this field depend on the address methodology in use when the register state is saved." AMD Processor Programming Reference has a more explicit description of the MCA_ADDR register: "For physical addresses, the most significant bit is given by Core::X86::Cpuid::LongModeInfo[PhysAddrSize]." Add a new #define MCI_ADDR_PHYSADDR for the mask of valid physical address bits within the machine check bank address register. Use this mask for recoverable machine check handling and in the EDAC driver to ignore any key bits that may be present. [ Tony: Based on independent fixes proposed by Fan Du and Isaku Yamahata ] Reported-by: Isaku Yamahata <isaku.yamahata@intel.com> Reported-by: Fan Du <fan.du@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20230109152936.397862-1-tony.luck@intel.com
2023-01-09EDAC/zynqmp: Add EDAC support for Xilinx ZynqMP OCMSai Krishna Potthuri
Add EDAC support for Xilinx ZynqMP OCM Controller, so this driver reports CE and UE errors upon interrupt generation. Also add debugfs files for error injection. On Xilinx ZynqMP platform, both OCM Controller driver(zynqmp_edac) and DDR Memory Controller driver(synopsys_edac) co-exist which means both can be loaded at a time. This scenario is tested on Xilinx ZynqMP platform. Fix following issue reported by the robot: "MAINTAINERS references a file that doesn't exist: Documentation/devicetree/bindings/edac/xlnx,zynqmp-ocmc.yaml" [ bp: - Massage commit message - s/EDAC_ZYNQMP_OCM/EDAC_ZYNQMP/ - Touchups ] Reported-by: kernel test robot <lkp@intel.com> Co-developed-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230104084512.1855243-3-sai.krishna.potthuri@amd.com