linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2025-05-30	Merge tag 'acpi-6.16-rc1-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These address issues introduced by recent ACPI changes merged previously: - Unbreak acpi_ut_safe_strncpy() by restoring its previous behavior changed incorrectly by a recent update (Ahmed Salem) - Make a new static checker warning in the recently introduced ACPI MRRM table parser go away (Dan Carpenter) - Fix ACPI table referece leak in error path of einj_probe() (Dan Carpenter)" * tag 'acpi-6.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPICA: Switch back to using strncpy() in acpi_ut_safe_strncpy() ACPI: MRRM: Silence error code static checker warning ACPI: APEI: EINJ: Clean up on error in einj_probe()
2025-05-28	Merge tag 'arm64-upstream' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The headline feature is the re-enablement of support for Arm's Scalable Matrix Extension (SME) thanks to a bumper crop of fixes from Mark Rutland. If matrices aren't your thing, then Ryan's page-table optimisation work is much more interesting. Summary: ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls - Fix a node refcount imbalance in the PSCI device-tree code CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4 - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code Entry code: - Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be selected Memory management: - Prevent BSS exports from being used by the early PI code - Propagate level and stride information to the low-level TLB invalidation routines when operating on hugetlb entries - Use the page-table contiguous hint for vmap() mappings with VM_ALLOW_HUGE_VMAP where possible - Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode" and hook this up on arm64 so that the trailing DSB (used to publish the updates to the hardware walker) can be deferred until the end of the mapping operation - Extend mmap() randomisation for 52-bit virtual addresses (on par with 48-bit addressing) and remove limited support for randomisation of the linear map Perf and PMUs: - Add support for probing the CMN-S3 driver using ACPI - Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers Selftests: - Fix FPSIMD and SME tests to align with the freshly re-enabled SME support - Fix default setting of the OUTPUT variable so that tests are installed in the right location vDSO: - Replace raw counter access from inline assembly code with a call to the the __arch_counter_get_cntvct() helper function Miscellaneous: - Add some missing header inclusions to the CCA headers - Rework rendering of /proc/cpuinfo to follow the x86-approach and avoid repeated buffer expansion (the user-visible format remains identical) - Remove redundant selection of CONFIG_CRC32 - Extend early error message when failing to map the device-tree blob" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits) arm64: cputype: Add cputype definition for HIP12 arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again perf/arm-cmn: Add CMN S3 ACPI binding arm64/boot: Disallow BSS exports to startup code arm64/boot: Move global CPU override variables out of BSS arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace perf/arm-cmn: Initialise cmn->cpu earlier kselftest/arm64: Set default OUTPUT path when undefined arm64: Update comment regarding values in __boot_cpu_mode arm64: mm: Drop redundant check in pmd_trans_huge() arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1 arm64/mm: Permit lazy_mmu_mode to be nested arm64/mm: Disable barrier batching in interrupt contexts arm64/cpuinfo: only show one cpu's info in c_show() arm64/mm: Batch barriers when updating kernel mappings mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes arm64/mm: Support huge pte-mapped pages in vmap mm/vmalloc: Gracefully unmap huge ptes mm/vmalloc: Warn on improper use of vunmap_range() arm64/mm: Hoist barriers out of set_ptes_anysz() loop ...
2025-05-27	ACPI: APEI: EINJ: Clean up on error in einj_probe()	Dan Carpenter
	Call acpi_put_table() before returning the error code. Fixes: e54b1dc1c4f0 ("ACPI: APEI: EINJ: Remove redundant calls to einj_get_available_error_type()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/aDVRBok33LZhXcId@stanley.mountain Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-09	ACPI: APEI: EINJ: Remove redundant calls to einj_get_available_error_type()	Zaid Alali
	A single call to einj_get_available_error_type() in init function is sufficient to save the return value in a global variable to be used later in various places in the code. This change has no functional impact, but only removes unnecessary redundant function calls. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zaid Alali <zaidal@os.amperecomputing.com> Link: https://patch.msgid.link/20250506213814.2365788-5-zaidal@os.amperecomputing.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-08	firmware: SDEI: Allow sdei initialization without ACPI_APEI_GHES	Huang Yiwei
	SDEI usually initialize with the ACPI table, but on platforms where ACPI is not used, the SDEI feature can still be used to handle specific firmware calls or other customized purposes. Therefore, it is not necessary for ARM_SDE_INTERFACE to depend on ACPI_APEI_GHES. In commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in acpi_init()"), to make APEI ready earlier, sdei_init was moved into acpi_ghes_init instead of being a standalone initcall, adding ACPI_APEI_GHES dependency to ARM_SDE_INTERFACE. This restricts the flexibility and usability of SDEI. This patch corrects the dependency in Kconfig and splits sdei_init() into two separate functions: sdei_init() and acpi_sdei_init(). sdei_init() will be called by arch_initcall and will only initialize the platform driver, while acpi_sdei_init() will initialize the device from acpi_ghes_init() when ACPI is ready. This allows the initialization of SDEI without ACPI_APEI_GHES enabled. Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") Cc: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Huang Yiwei <quic_hyiwei@quicinc.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250507045757.2658795-1-quic_hyiwei@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-07	ACPI: APEI: EINJ: Fix probe error message	Jon Hunter
	Commit 6cb9441bfe8d ("ACPI: APEI: EINJ: Transition to the faux device interface") updated the APEI error injection driver to use the faux device interface and now for devices that don't support ACPI, the following error message is seen on boot: ERR KERN faux acpi-einj: probe did not succeed, tearing down the device The APEI error injection driver returns -ENODEV in the probe function if ACPI is not supported and so after transitioning the driver to the faux device interface, the error returned from the probe now causes the above error message to be displayed. Fix this by moving the code that detects if ACPI is supported to the einj_init() function to fix the false error message displayed for devices that don't support ACPI. Fixes: 6cb9441bfe8d ("ACPI: APEI: EINJ: Transition to the faux device interface") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://patch.msgid.link/20250501124621.1251450-1-jonathanh@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-04-09	ACPI: APEI: EINJ: Transition to the faux device interface	Sudeep Holla
	The APEI error injection driver does not require the creation of a platform device. Originally, this approach was chosen for simplicity when the driver was first implemented. With the introduction of the lightweight faux device interface, we now have a more appropriate alternative. Migrate the driver to utilize the faux bus, given that the platform device it previously created was not a real one anyway. This will simplify the code, reducing its footprint while maintaining functionality. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://patch.msgid.link/20250317-plat2faux_dev-v1-8-5fe67c085ad5@arm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-14	acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors	Smita Koralahalli
	When PCIe AER is in FW-First, OS should process CXL Protocol errors from CPER records. Introduce support for handling and logging CXL Protocol errors. The defined trace events cxl_aer_uncorrectable_error and cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them to trace FW-First Protocol errors. Since the CXL code is required to be called from process context and GHES is in interrupt context, use workqueues for processing. Similar to CXL CPER event handling, use kfifo to handle errors as it simplifies queue processing by providing lock free fifo operations. Add the ability for the CXL sub-system to register a workqueue to process CXL CPER protocol errors. [DJ: return cxl_cper_register_prot_err_work() directly in cxl_ras_init()] Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250310223839.31342-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-06	acpi/ghes, cper: Recognize and cache CXL Protocol errors	Smita Koralahalli
	Add support in GHES to detect and process CXL CPER Protocol errors, as defined in UEFI v2.10, section N.2.13. Define struct cxl_cper_prot_err_work_data to cache CXL protocol error information, including RAS capabilities and severity, for further handling. These cached CXL CPER records will later be processed by workqueues within the CXL subsystem. Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250123084421.127697-5-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-01-14	APEI: GHES: Have GHES honor the panic= setting	Borislav Petkov
	The GHES driver overrides the panic= setting by force-rebooting the system after a fatal hw error has been reported. The intent being that such an error would be reported earlier. However, this is not optimal when a hard-to-debug issue requires long time to reproduce and when that happens, the box will get rebooted after 30 seconds and thus destroy the whole hw context of when the error happened. So rip out the default GHES panic timeout and honor the global one. In the panic disabled (panic=0) case, the error will still be logged to dmesg for later inspection and if panic after a hw error is really required, then that can be controlled the usual way - use panic= on the cmdline or set it in the kernel .config's CONFIG_PANIC_TIMEOUT. Reported-by: Feng Tang <feng.tang@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250113125224.GFZ4UMiNtWIJvgpveU@fat_crate.local Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-02	module: Convert symbol namespace to string literal	Peter Zijlstra
	Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS \| while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify$ns$/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify$ns$/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS$([^)])$/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(])$([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(])\(([^,]+), ([^)]+)$/ && $0 !~ /(EXPORT_SYMBOL_NS[^(])/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(])$([^,]+), ([^)]+)$/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-12	ACPI: Switch back to struct platform_driver::remove()	Uwe Kleine-König
	After commit 0edb555a65d1 ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/acpi to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/9ee1a9813f53698be62aab9d810b2d97a2a9f186.1731397722.git.u.kleine-koenig@baylibre.com [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-05	Merge tag 'cxl-fixes-6.12-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fix from Ira Weiny: - Fix calculation for SBDF in error injection * tag 'cxl-fixes-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: EINJ, CXL: Fix CXL device SBDF calculation
2024-10-02	move asm/unaligned.h to linux/unaligned.h	Al Viro
	asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-09-30	EINJ, CXL: Fix CXL device SBDF calculation	Ben Cheatham
	The SBDF of the target CXL 2.0 compliant root port is required to inject a CXL protocol error as per ACPI 6.5. The SBDF given has to be in the following format: 31 24 23 16 15 11 10 8 7 0 +-------------------------------------------------+ \| segment \| bus \| device \| function \| reserved \| +-------------------------------------------------+ The SBDF calculated in cxl_dport_get_sbdf() doesn't account for the reserved bits currently, causing the wrong SBDF to be used. Fix said calculation to properly shift the SBDF. Without this fix, error injection into CXL 2.0 root ports through the CXL debugfs interface (<debugfs>/cxl) is broken. Injection through the legacy interface (<debugfs>/apei/einj/) will still work because the SBDF is manually provided by the user. Fixes: 12fb28ea6b1cf ("EINJ: Add CXL error type support") Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com> Reviewed-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com> Link: https://patch.msgid.link/20240927163428.366557-1-Benjamin.Cheatham@amd.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-09-27	Merge tag 'cxl-for-6.12' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull compute express link (cxl) updates from Dave Jiang: "Major changes address HDM decoder initialization from DVSEC ranges, refactoring the code related to cxl mailboxes to be independent of the memory devices, and adding support for shared upstream link access_coordinate calculation, as well as a change to remove locking from memory notifier callback. In addition, a number of misc cleanups and refactoring of the code are also included. Address HDM decoder initialization from DVSEC ranges: - Only register non-zero DVSEC ranges - Remove duplicate implementation of waiting for memory_info_valid - Simplify the checking of mem_enabled in cxl_hdm_decode_init() Refactor the code related to cxl mailboxes to be independent of the memory devices: - Move cxl headers in include/linux/ to include/cxl - Move all mailbox related data to 'struct cxl_mailbox' - Refactor mailbox APIs with 'struct cxl_mailbox' as input instead of memory device state Add support for shared upstream link access_coordinate calculation for configurations that have multiple targets under a switch or a root port where the aggregated bandwidth can be greater than the upstream link of the switch/RP upstream link: - Preserve the CDAT access_coordinate from an endpoint - Add the support for shared upstream link access_coordinate calculation - Add documentation to explain how the calculations are done Remove locking from memory notifier callback. Misc cleanups: - Convert devm_cxl_add_root() to return using ERR_CAST() - cxl_test use dev_is_platform() instead of open coding - Remove duplicate include of header core.h in core/cdat.c - use scoped resource management to drop put_device() for cxl_port - Use scoped_guard to drop device_lock() for cxl_port - Refactor __devm_cxl_add_port() to drop gotos - Rename cxl_setup_parent_dport to cxl_dport_init_aer and cxl_dport_map_regs() to cxl_dport_map_ras() - Refactor cxl_dport_init_aer() to be more concise - Remove duplicate host_bridge->native_aer checking in cxl_dport_init_ras_reporting() - Fix comment for cxl_query_cmd()" * tag 'cxl-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits) cxl: Add documentation to explain the shared link bandwidth calculation cxl: Calculate region bandwidth of targets with shared upstream link cxl: Preserve the CDAT access_coordinate for an endpoint cxl: Fix comment regarding cxl_query_cmd() return data cxl: Convert cxl_internal_send_cmd() to use 'struct cxl_mailbox' as input cxl: Move mailbox related bits to the same context cxl: move cxl headers to new include/cxl/ directory cxl/region: Remove lock from memory notifier callback cxl/pci: simplify the check of mem_enabled in cxl_hdm_decode_init() cxl/pci: Check Mem_info_valid bit for each applicable DVSEC cxl/pci: Remove duplicated implementation of waiting for memory_info_valid cxl/pci: Fix to record only non-zero ranges cxl/pci: Remove duplicate host_bridge->native_aer checking cxl/pci: cxl_dport_map_rch_aer() cleanup cxl/pci: Rename cxl_setup_parent_dport() and cxl_dport_map_regs() cxl/port: Refactor __devm_cxl_add_port() to drop goto pattern cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port cxl/port: Use __free() to drop put_device() for cxl_port cxl: Remove duplicate included header file core.h tools/testing/cxl: Use dev_is_platform() ...
2024-09-27	[tree-wide] finally take no_llseek out	Al Viro
	no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek \| grep -v porting.rst \| while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-09	cxl: move cxl headers to new include/cxl/ directory	Dave Jiang
	Group all cxl related kernel headers into include/cxl/ directory. Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20240905223711.1990186-2-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-06-05	Merge branches 'acpi-ec', 'acpi-apei' and 'pnp'	Rafael J. Wysocki
	Merge ACPI EC driver fixes, an ACPI APEI fix and PNP fixes for 6.10-rc3: - Fix error handling during EC operation region accesses in the ACPI EC driver (Armin Wolf). - Fix a memory leak in the APEI error injection driver introduced during its converion to a platform driver (Dan Williams). - Fix build failures related to the dev_is_pnp() macro by redefining it as a proper function and exporting it to modules as appropriate and unexport pnp_bus_type which need not be exported any more (Andy Shevchenko). * acpi-ec: ACPI: EC: Avoid returning AE_OK on errors in address space handler ACPI: EC: Abort address space access upon error * acpi-apei: ACPI: APEI: EINJ: Fix einj_dev release leak * pnp: PNP: Hide pnp_bus_type from the non-PNP code PNP: Make dev_is_pnp() to be a function and export it for modules
2024-05-27	ACPI: APEI: EINJ: Fix einj_dev release leak	Dan Williams
	The platform driver conversion of EINJ mistakenly used platform_device_del() to unwind platform_device_register_full() at module exit. This leads to a small leak of one 'struct platform_device' instance per module load/unload cycle. Switch to platform_device_unregister() which performs both device_del() and final put_device(). Fixes: 5621fafaac00 ("EINJ: Migrate to a platform driver") Cc: 6.9+ <stable@vger.kernel.org> # 6.9+ Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ben Cheatham <Benjamin.Cheatham@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-15	Merge tag 'cxl-for-6.10' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL updates from Dave Jiang: - Three CXL mailbox passthrough commands are added to support the populating and clearing of vendor debug logs: - Get Log Capabilities - Get Supported Log Sub-List Commands - Clear Log - Add support of Device Phyiscal Address (DPA) to Host Physical Address (HPA) translation for CXL events of cxl_dram and cxl_general media. This allows user space to figure out which CXL region the event occured via trace event. - Connect CXL to CPER reporting. If a device is configured for firmware first, CXL event records are not sent directly to the host. Those records are reported through EFI Common Platform Error Records (CPER). Add support to route the CPER records through the CXL sub-system in order to provide DPA to HPA translation and also event decoding and tracing. This is useful for users to determine which system issues may correspond to specific hardware events. - A number of misc cleanups and fixes: - Fix for compile warning of cxl_security_ops - Add debug message for invalid interleave granularity - Enhancement to cxl-test event testing - Add dev_warn() on unsupported mixed mode decoder - Fix use of phys_to_target_node() for x86 - Use helper function for decoder enum instead of open coding - Include missing headers for cxl-event - Fix MAINTAINERS file entry - Fix cxlr_pmem memory leak - Cleanup __cxl_parse_cfmws via scope-based resource menagement - Convert cxl_pmem_region_alloc() to scope-based resource management * tag 'cxl-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits) cxl/cper: Remove duplicated GUID defines cxl/cper: Fix non-ACPI-APEI-GHES build cxl/pci: Process CPER events acpi/ghes: Process CXL Component Events cxl/region: Convert cxl_pmem_region_alloc to scope-based resource management cxl/acpi: Cleanup __cxl_parse_cfmws() cxl/region: Fix cxlr_pmem leaks cxl/core: Add region info to cxl_general_media and cxl_dram events cxl/region: Move cxl_trace_hpa() work to the region driver cxl/region: Move cxl_dpa_to_region() work to the region driver cxl/trace: Correct DPA field masks for general_media & dram events MAINTAINERS: repair file entry in COMPUTE EXPRESS LINK cxl/cxl-event: include missing <linux/types.h> and <linux/uuid.h> cxl/hdm: Debug, use decoder name function cxl: Fix use of phys_to_target_node() for x86 cxl/hdm: dev_warn() on unsupported mixed mode decoder cxl/test: Enhance event testing cxl/hdm: Add debug message for invalid interleave granularity cxl: Fix compile warning for cxl_security_ops extern cxl/mbox: Add Clear Log mailbox command ...
2024-05-02	cxl/cper: Remove duplicated GUID defines	Ira Weiny
	Commit 54ce1927eb78 ("cxl/cper: Fix errant CPER prints for CXL events") moved the CXL CPER section defines to include/linux/cper.h from ghes.c When the latest cxl/cper series was reworked those defines were kept in ghes.c by accident. Thus they were duplicated. Delete the duplicate defines keeping them in the header to be shared between efi and apei. Suggested-by: Fabio M. De Francesco <fabio.maria.de.francesco@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20240502-cper-fix-dup-guid-v1-1-283cc447c7bf@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-01	acpi/ghes: Process CXL Component Events	Ira Weiny
	BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then inform the OS of these events via UEFI. UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference lies in the use of a GUID as the CPER Section Type which matches the UUID defined in CXL 3.1 Table 8-43. Currently a configuration such as this will trace a non standard event in the log omitting useful details of the event. In addition the CXL sub-system contains additional region and HPA information useful to the user.[0] The CXL code is required to be called from process context as it needs to take a device lock. The GHES code may be in interrupt context. This complicated the use of a callback. Dan Williams suggested the use of work items as an atomic way of switching between the callback execution and a default handler.[1] The use of a kfifo simplifies queue processing by providing lock free fifo operations. cxl_cper_kfifo_get() allows easier management of the kfifo between the ghes and cxl modules. CXL 3.1 Table 8-127 requires a device to have a queue depth of 1 for each of the four event logs. A combined queue depth of 32 is chosen to provide room for 8 entries of each log type. Add GHES support to detect CXL CPER records. Add the ability for the CXL sub-system to register a work queue to process the events. This patch adds back the functionality which was removed to fix the report by Dan Carpenter[2]. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Link: http://lore.kernel.org/r/cover.1711598777.git.alison.schofield@intel.com [0] Link: http://lore.kernel.org/r/65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch [1] Link: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Link: https://lore.kernel.org/r/20240426-cxl-cper3-v4-1-58076cce1624@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08	ACPI: APEI: EINJ: mark remove callback as __exit	Uwe Kleine-König
	The einj_driver driver is registered using platform_driver_probe(). In this case it cannot get unbound via sysfs and it's ok to put the remove callback into an exit section. To prevent the modpost warning about einj_driver referencing .exit.text, mark the driver struct with __refdata and explain the situation in a comment. This is an improvement over commit a24118a8a687 ("ACPI: APEI: EINJ: mark remove callback as non-__exit") which recently addressed the same issue, but picked a less optimal variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-26	ACPI: APEI: EINJ: mark remove callback as non-__exit	Arnd Bergmann
	The remove callback of a device is called whenever it is unbound, which may happen during runtime e.g. through sysfs, so this is not allowed to be dropped from the binary: WARNING: modpost: vmlinux: section mismatch in reference: einj_driver+0x8 (section: .data) -> einj_remove (section: .exit.text) ERROR: modpost: Section mismatches detected. Remove that annotation. Fixes: 12fb28ea6b1c ("EINJ: Add CXL error type support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-16	Merge tag 'cxl-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl	Linus Torvalds
	Pull CXL updates from Dan Williams: "CXL has mechanisms to enumerate the performance characteristics of memory devices. Those mechanisms allow Linux to build the equivalent of ACPI SRAT, SLIT, and HMAT tables dynamically at runtime. That capability is necessary because static ACPI can not represent dynamic CXL configurations (and reconfigurations). So, building on the v6.8 work to add "Quality of Service" enumeration, this update plumbs CXL "access coordinates" (read/write access latency and bandwidth) in all the same places that ACPI HMAT feeds similar data. Follow-on patches from the -mm side can then use that data to feed mechanisms like mm/memory-tiers.c. Greg has acked the touch to drivers/base/. The other feature update this cycle is support for CXL error injection via the ACPI EINJ module. That facility enables injection of bus protocol errors provided the user knows the magic address values to insert in the interface. To hide that magic, and make this easier to use, new error injection attributes were added to CXL debugfs. That interface injects the errors relative to a CXL object rather than require user tooling to know how to lookup and inject RCRB (Root Complex Register Block) addresses into the raw EINJ debugfs interface. It received some helpful review comments from Tony, but no explicit acks from the ACPI side. The primary user visible change for existing EINJ users is that they may find that einj.ko was already loaded by cxl_core.ko. Previously, einj.ko was only loaded on demand. The usual collection of miscellaneous cleanups are also present this cycle. Summary: - Supplement ACPI HMAT reported memory performance with native CXL memory performance enumeration - Add support for CXL error injection via the ACPI EINJ mechanism - Cleanup CXL DOE and CDAT integration - Miscellaneous cleanups and fixes" * tag 'cxl-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits) Documentation/ABI/testing/debugfs-cxl: Fix "Unexpected indentation" lib/firmware_table: Provide buffer length argument to cdat_table_parse() cxl/pci: Get rid of pointer arithmetic reading CDAT table cxl/pci: Rename DOE mailbox handle to doe_mb cxl: Fix the incorrect assignment of SSLBIS entry pointer initial location cxl/core: Add CXL EINJ debugfs files EINJ, Documentation: Update EINJ kernel doc EINJ: Add CXL error type support EINJ: Migrate to a platform driver cxl/region: Deal with numa nodes not enumerated by SRAT cxl/region: Add memory hotplug notifier for cxl region cxl/region: Add sysfs attribute for locality attributes of CXL regions cxl/region: Calculate performance data for a region cxl: Set cxlmd->endpoint before adding port device cxl: Move QoS class to be calculated from the nearest CPU cxl: Split out host bridge access coordinates cxl: Split out combine_coordinates() for common shared usage ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes ACPI: HMAT: Introduce 2 levels of generic port access class base/node / ACPI: Enumerate node access class for 'struct access_coordinate' ...
2024-03-12	EINJ: Add CXL error type support	Ben Cheatham
	Move CXL protocol error types from einj.c (now einj-core.c) to einj-cxl.c. einj-cxl.c implements the necessary handling for CXL protocol error injection and exposes an API for the CXL core to use said functionality, while also allowing the EINJ module to be built without CXL support. Because CXL error types targeting CXL 1.0/1.1 ports require special handling, only allow them to be injected through the new cxl debugfs interface (next commit) and return an error when attempting to inject through the legacy interface. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> Link: https://lore.kernel.org/r/20240311142508.31717-3-Benjamin.Cheatham@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12	EINJ: Migrate to a platform driver	Ben Cheatham
	Change the EINJ module to install a platform device/driver on module init and move the module init() and exit() functions to driver probe and remove. This change allows the EINJ module to load regardless of whether setting up EINJ succeeds, which allows dependent modules to still load (i.e. the CXL core). Since EINJ may no longer be initialized when the module loads, any functions that are called from dependent/external modules should safegaurd against the case EINJ didn't load. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> Link: https://lore.kernel.org/r/20240311142508.31717-2-Benjamin.Cheatham@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-29	ACPI: APEI: Skip initialization of GHES_ASSIST structures for Machine Check ↵	Avadhut Naik
	Architecture To support GHES_ASSIST on Machine Check Architecture (MCA) error sources, a set of GHES structures is provided by the system firmware for each MCA error source. Each of these sets consists of a GHES structure for each MCA bank on each logical CPU, with all structures of a set sharing a common Related Source ID, equal to the Source ID of one of the MCA error source structures.[1] On SOCs with large core counts, this typically equates to tens of thousands of GHES_ASSIST structures for MCA under "/sys/bus/platform/drivers/GHES". Support for GHES_ASSIST however, hasn't been implemented in the kernel. As such, the information provided through these structures is not consumed by Linux. Moreover, these GHES_ASSIST structures for MCA, which are supposed to provide supplemental information in context of an error reported by hardware, are setup as independent error sources by the kernel during HEST initialization. Additionally, if the Type field of the Notification structure, associated with these GHES_ASSIST structures for MCA, is set to Polled, the kernel sets up a timer for each individual structure. The duration of the timer is derived from the Poll Interval field of the Notification structure. On SOCs with high core counts, this will result in tens of thousands of timers expiring periodically causing unnecessary preemptions and wastage of CPU cycles. The problem will particularly intensify if Poll Interval duration is not sufficiently high. Since GHES_ASSIST support is not present in kernel, skip initialization of GHES_ASSIST structures for MCA to eliminate their performance impact. [1] ACPI specification 6.5, section 18.7 Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-27	ACPI: APEI: GHES: Convert to platform remove callback returning void	Uwe Kleine-König
	The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Instead of returning an error code, emit a better error message than the core. Apart from the improved error message this patch has no effects for the driver. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-20	acpi/ghes: Remove CXL CPER notifications	Dan Williams
	Initial tests with the CXL CPER implementation identified that error reports were being duplicated in the log and the trace event [1]. Then it was discovered that the notification handler took sleeping locks while the GHES event handling runs in spin_lock_irqsave() context [2] While the duplicate reporting was fixed in v6.8-rc4, the fix for the sleeping-lock-vs-atomic collision would enjoy more time to settle and gain some test cycles. Given how late it is in the development cycle, remove the CXL hookup for now and try again during the next merge window. Note that end result is that v6.8 does not emit CXL CPER payloads to the kernel log, but this is in line with the CXL trend to move error reporting to trace events instead of the kernel log. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: http://lore.kernel.org/r/20240108165855.00002f5a@Huawei.com [1] Closes: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-03	cxl/cper: Fix errant CPER prints for CXL events	Ira Weiny
	Jonathan reports that CXL CPER events dump an extra generic error message. {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 {1}[Hardware Error]: event severity: recoverable {1}[Hardware Error]: Error 0, type: recoverable {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6 {1}[Hardware Error]: section length: 0x90 {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................ {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................ ... CXL events were rerouted though the CXL subsystem for additional processing. However, when that work was done it was missed that cper_estatus_print_section() continued with a generic error message which is confusing. Teach CPER print code to ignore printing details of some section types. Assign the CXL event GUIDs to this set to prevent confusing unknown prints. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-01-18	Merge tag 'cxl-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl	Linus Torvalds
	Pull CXL (Compute Express Link) updates from Dan Williams: "The bulk of this update is support for enumerating the performance capabilities of CXL memory targets and connecting that to a platform CXL memory QoS class. Some follow-on work remains to hook up this data into core-mm policy, but that is saved for v6.9. The next significant update is unifying how CXL event records (things like background scrub errors) are processed between so called "firmware first" and native error record retrieval. The CXL driver handler that processes the record retrieved from the device mailbox is now the handler for that same record format coming from an EFI/ACPI notification source. This also contains miscellaneous feature updates, like Get Timestamp, and other fixups. Summary: - Add support for parsing the Coherent Device Attribute Table (CDAT) - Add support for calculating a platform CXL QoS class from CDAT data - Unify the tracing of EFI CXL Events with native CXL Events. - Add Get Timestamp support - Miscellaneous cleanups and fixups" * tag 'cxl-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (41 commits) cxl/core: use sysfs_emit() for attr's _show() cxl/pci: Register for and process CPER events PCI: Introduce cleanup helpers for device reference counts and locks acpi/ghes: Process CXL Component Events cxl/events: Create a CXL event union cxl/events: Separate UUID from event structures cxl/events: Remove passing a UUID to known event traces cxl/events: Create common event UUID defines cxl/events: Promote CXL event structures to a core header cxl: Refactor to use __free() for cxl_root allocation in cxl_endpoint_port_probe() cxl: Refactor to use __free() for cxl_root allocation in cxl_find_nvdimm_bridge() cxl: Fix device reference leak in cxl_port_perf_data_calculate() cxl: Convert find_cxl_root() to return a 'struct cxl_root *' cxl: Introduce put_cxl_root() helper cxl/port: Fix missing target list lock cxl/port: Fix decoder initialization when nr_targets > interleave_ways cxl/region: fix x9 interleave typo cxl/trace: Pass UUID explicitly to event traces cxl/region: use %pap format to print resource_size_t cxl/region: Add dev_dbg() detail on failure to allocate HPA space ...
2024-01-09	acpi/ghes: Process CXL Component Events	Ira Weiny
	BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then send these events to the OS via UEFI. UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference is the use of a GUID in the Section Type rather than a UUID as part of the event itself. Add GHES support to detect CXL CPER records and call a registered callback with the event. A notifier chain was considered for the callback but the complexity did not justify the use case as only the CXL subsystem requires this event. Enforce that only one callback can be registered at any time. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-7-1bb8a4ca2c7a@intel.com [djbw: fixup checkpatch errors] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-21	ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events	Shuai Xue
	There are two major types of uncorrected recoverable (UCR) errors : - Synchronous error: The error is detected and raised at the point of the consumption in the execution flow, e.g. when a CPU tries to access a poisoned cache line. The CPU will take a synchronous error exception such as Synchronous External Abort (SEA) on Arm64 and Machine Check Exception (MCE) on X86. OS requires to take action (for example, offline failure page/kill failure thread) to recover this uncorrectable error. - Asynchronous error: The error is detected out of processor execution context, e.g. when an error is detected by a background scrubber. Some data in the memory are corrupted. But the data have not been consumed. OS is optional to take action to recover this uncorrectable error. When APEI firmware first is enabled, a platform may describe one error source for the handling of synchronous errors (e.g. MCE or SEA notification ), or for handling asynchronous errors (e.g. SCI or External Interrupt notification). In other words, we can distinguish synchronous errors by APEI notification. For synchronous errors, kernel will kill the current process which accessing the poisoned page by sending SIGBUS with BUS_MCEERR_AR. In addition, for asynchronous errors, kernel will notify the process who owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in early kill mode. However, the GHES driver always sets mf_flags to 0 so that all synchronous errors are handled as asynchronous errors in memory failure. To this end, set memory failure flags as MF_ACTION_REQUIRED on synchronous events. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Tested-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-21	ACPI: APEI: EINJ: Add support for vendor defined error types	Avadhut Naik
	Vendor-Defined Error types are supported by the platform apart from standard error types if bit 31 is set in the output of GET_ERROR_TYPE Error Injection Action.[1] While the errors themselves and the length of their associated "OEM Defined data structure" might vary between vendors, the physical address of this structure can be computed through vendor_extension and length fields of "SET_ERROR_TYPE_WITH_ADDRESS" and "Vendor Error Type Extension" Structures respectively.[2][3] Currently, however, the einj module only computes the physical address of Vendor Error Type Extension Structure. Neither does it compute the physical address of OEM Defined structure nor does it establish the memory mapping required for injecting Vendor-defined errors. Consequently, userspace tools have to establish the very mapping through /dev/mem, nopat kernel parameter and system calls like mmap/munmap initially before injecting Vendor-defined errors. Circumvent the issue by computing the physical address of OEM Defined data structure and establishing the required mapping with the structure. Create a new file "oem_error", if the system supports Vendor-defined errors, to export this mapping, through debugfs_create_blob(). Userspace tools can then populate their respective OEM Defined structure instances and just write to the file as part of injecting Vendor-defined Errors. Similarly, the tools can also read from the file if the system firmware provides some information through the OEM defined structure after error injection. [1] ACPI specification 6.5, section 18.6.4 [2] ACPI specification 6.5, Table 18.31 [3] ACPI specification 6.5, Table 18.32 Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-21	ACPI: APEI: EINJ: Refactor available_error_type_show()	Avadhut Naik
	OSPM can discover the error injection capabilities of the platform by executing GET_ERROR_TYPE error injection action.[1] The action returns a DWORD representing a bitmap of platform supported error injections.[2] The available_error_type_show() function determines the bits set within this DWORD and provides a verbose output, from einj_error_type_string array, through /sys/kernel/debug/apei/einj/available_error_type file. The function however, assumes one to one correspondence between an error's position in the bitmap and its array entry offset. Consequently, some errors like Vendor Defined Error Type fail this assumption and will incorrectly be shown as not supported, even if their corresponding bit is set in the bitmap and they have an entry in the array. Navigate around the issue by converting einj_error_type_string into an array of structures with a predetermined mask for all error types corresponding to their bit position in the DWORD returned by GET_ERROR_TYPE action. The same breaks the aforementioned assumption resulting in all supported error types by a platform being outputted through the above available_error_type file. [1] ACPI specification 6.5, Table 18.25 [2] ACPI specification 6.5, Table 18.30 Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-24	ACPI: APEI: Use ERST timeout for slow devices	Jeshua Smith
	Slow devices such as flash may not meet the default 1ms timeout value, so use the ERST max execution time value that they provide as the timeout if it is larger. Signed-off-by: Jeshua Smith <jeshuas@nvidia.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-21	ACPI: APEI: Fix AER info corruption when error status data has multiple sections	Shiju Jose
	ghes_handle_aer() passes AER data to the PCI core for logging and recovery by calling aer_recover_queue() with a pointer to struct aer_capability_regs. The problem was that aer_recover_queue() queues the pointer directly without copying the aer_capability_regs data. The pointer was to the ghes->estatus buffer, which could be reused before aer_recover_work_func() reads the data. To avoid this problem, allocate a new aer_capability_regs structure from the ghes_estatus_pool, copy the AER data from the ghes->estatus buffer into it, pass a pointer to the new struct to aer_recover_queue(), and free it after aer_recover_work_func() has processed it. Reported-by: Bjorn Helgaas <helgaas@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12	APEI: GHES: correctly return NULL for ghes_get_devices()	Li Yang
	Since 315bada690e0 ("EDAC: Check for GHES preference in the chipset-specific EDAC drivers"), vendor specific EDAC driver will not probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device is present. Make ghes_get_devices() return NULL when the GHES device list is empty to fix the problem. Fixes: 9057a3f7ac36 ("EDAC/ghes: Prepare to make ghes_edac a proper module") Signed-off-by: Li Yang <leoyang.li@nxp.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12	ACPI: APEI: mark bert_disable as __initdata	Miaohe Lin
	It's only used inside the __init section. Mark it __initdata. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-05	ACPI: APEI: GHES: Remove unused ghes_estatus_pool_size_request()	Miaohe Lin
	ghes_estatus_pool_size_request() is unused now, so remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-25	efi: fix missing prototype warnings	Arnd Bergmann
	The cper.c file needs to include an extra header, and efi_zboot_entry needs an extern declaration to avoid these 'make W=1' warnings: drivers/firmware/efi/libstub/zboot.c:65:1: error: no previous prototype for 'efi_zboot_entry' [-Werror=missing-prototypes] drivers/firmware/efi/efi.c:176:16: error: no previous prototype for 'efi_attr_is_visible' [-Werror=missing-prototypes] drivers/firmware/efi/cper.c:626:6: error: no previous prototype for 'cper_estatus_print' [-Werror=missing-prototypes] drivers/firmware/efi/cper.c:649:5: error: no previous prototype for 'cper_estatus_check_header' [-Werror=missing-prototypes] drivers/firmware/efi/cper.c:662:5: error: no previous prototype for 'cper_estatus_check' [-Werror=missing-prototypes] To make this easier, move the cper specific declarations to include/linux/cper.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-03-27	ACPI: APEI: EINJ: warn on invalid argument when explicitly indicated by platform	Shuai Xue
	OSPM executes an EXECUTE_OPERATION action to instruct the platform to begin the injection operation, then executes a GET_COMMAND_STATUS action to determine the status of the completed operation. The ACPI Specification documented error codes[1] are: 0 = Success (Linux #define EINJ_STATUS_SUCCESS) 1 = Unknown failure (Linux #define EINJ_STATUS_FAIL) 2 = Invalid Access (Linux #define EINJ_STATUS_INVAL) The original code report -EBUSY for both "Unknown Failure" and "Invalid Access" cases. Actually, firmware could do some platform dependent sanity checks and returns different error codes, e.g. "Invalid Access" to indicate to the user that the parameters they supplied cannot be used for injection. To this end, fix to return -EINVAL in the __einj_error_inject() error handling case instead of always -EBUSY, when explicitly indicated by the platform in the status of the completed operation. [1] ACPI Specification 6.5 18.6.1. Error Injection Table Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-20	ACPI: APEI: EINJ: Add CXL error types	Tony Luck
	ACPI 6.5 added six new error types for CXL. See chapter 18 table 18.30. Add strings for the new types so that Linux will list them in the /sys/kernel/debug/apei/einj/available_error_types file. It seems no other changes are needed. Linux already accepts the CXL codes (on a BIOS that advertises them). Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-30	ACPI: APEI: EINJ: Limit error type to 32-bit width	Shuai Xue
	The bit map of error types to inject is 32-bit width [1]. Add parameter check to reflect the fact. [1] ACPI Specification 6.4, Section 18.6.4. Error Types Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-25	treewide: Convert del_timer() to timer_shutdown()	Steven Rostedt (Google)
	Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed. The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case. This was created by using a coccinelle script and the following commands: $ cat timer.cocci @@ expression ptr, slab; identifier timer, rfield; @@ ( - del_timer(&ptr->timer); + timer_shutdown(&ptr->timer); \| - del_timer_sync(&ptr->timer); + timer_shutdown_sync(&ptr->timer); ) ... when strict when != ptr->timer ( kfree_rcu(ptr, rfield); \| kmem_cache_free(slab, ptr); \| kfree(ptr); ) $ spatch timer.cocci . > /tmp/t.patch $ patch -p1 < /tmp/t.patch Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ] Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ] Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-12	Merge tag 'edac_updates_for_6.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Make ghes_edac a simple module like the rest of the EDAC drivers and drop the forced built-in only configuration by disentangling it from GHES (Jia He) - The usual small cleanups and improvements all over EDAC land * tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper() EDAC/i5400: Fix typo in comment: vaious -> various EDAC/mc_sysfs: Increase legacy channel support to 12 MAINTAINERS: Make Mauro EDAC reviewer MAINTAINERS: Make Manivannan Sadhasivam the maintainer of qcom_edac EDAC/igen6: Return the correct error type when not the MC owner apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg() EDAC: Check for GHES preference in the chipset-specific EDAC drivers EDAC/ghes: Make ghes_edac a proper module EDAC/ghes: Prepare to make ghes_edac a proper module EDAC/ghes: Add a notifier for reporting memory errors efi/cper: Export several helpers for ghes_edac to use EDAC/i5000: Mark as BROKEN
2022-12-07	ACPI: APEI: EINJ: Refactor available_error_type_show()	Jay Lu
	Move error type descriptions into an array and loop over error types to improve readability and maintainability. Replace seq_printf() with seq_puts() as recommended by checkpatch.pl. Signed-off-by: Jay Lu <jaylu102@amd.com> Co-developed-by: Ben Cheatham <benjamin.cheatham@amd.com> Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-07	ACPI: APEI: EINJ: Fix formatting errors	Jay Lu
	Checkpatch reveals warnings and an error due to missing lines and incorrect indentations. Add the missing lines after declarations and fix the suspect indentations. Signed-off-by: Jay Lu <jaylu102@amd.com> Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>