summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/mce/severity.c
AgeCommit message (Collapse)Author
2025-03-17x86/mce: use is_copy_from_user() to determine copy-from-user contextShuai Xue
Patch series "mm/hwpoison: Fix regressions in memory failure handling", v4. ## 1. What am I trying to do: This patchset resolves two critical regressions related to memory failure handling that have appeared in the upstream kernel since version 5.17, as compared to 5.10 LTS. - copyin case: poison found in user page while kernel copying from user space - instr case: poison found while instruction fetching in user space ## 2. What is the expected outcome and why - For copyin case: Kernel can recover from poison found where kernel is doing get_user() or copy_from_user() if those places get an error return and the kernel return -EFAULT to the process instead of crashing. More specifily, MCE handler checks the fixup handler type to decide whether an in kernel #MC can be recovered. When EX_TYPE_UACCESS is found, the PC jumps to recovery code specified in _ASM_EXTABLE_FAULT() and return a -EFAULT to user space. - For instr case: If a poison found while instruction fetching in user space, full recovery is possible. User process takes #PF, Linux allocates a new page and fills by reading from storage. ## 3. What actually happens and why - For copyin case: kernel panic since v5.17 Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the extable fixup type for copy-from-user operations, changing it from EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. It breaks previous EX_TYPE_UACCESS handling when posion found in get_user() or copy_from_user(). - For instr case: user process is killed by a SIGBUS signal due to #CMCI and #MCE race When an uncorrected memory error is consumed there is a race between the CMCI from the memory controller reporting an uncorrected error with a UCNA signature, and the core reporting and SRAR signature machine check when the data is about to be consumed. ### Background: why *UN*corrected errors tied to *C*MCI in Intel platform [1] Prior to Icelake memory controllers reported patrol scrub events that detected a previously unseen uncorrected error in memory by signaling a broadcast machine check with an SRAO (Software Recoverable Action Optional) signature in the machine check bank. This was overkill because it's not an urgent problem that no core is on the verge of consuming that bad data. It's also found that multi SRAO UCE may cause nested MCE interrupts and finally become an IERR. Hence, Intel downgrades the machine check bank signature of patrol scrub from SRAO to UCNA (Uncorrected, No Action required), and signal changed to #CMCI. Just to add to the confusion, Linux does take an action (in uc_decode_notifier()) to try to offline the page despite the UC*NA* signature name. ### Background: why #CMCI and #MCE race when poison is consuming in Intel platform [1] Having decided that CMCI/UCNA is the best action for patrol scrub errors, the memory controller uses it for reads too. But the memory controller is executing asynchronously from the core, and can't tell the difference between a "real" read and a speculative read. So it will do CMCI/UCNA if an error is found in any read. Thus: 1) Core is clever and thinks address A is needed soon, issues a speculative read. 2) Core finds it is going to use address A soon after sending the read request 3) The CMCI from the memory controller is in a race with MCE from the core that will soon try to retire the load from address A. Quite often (because speculation has got better) the CMCI from the memory controller is delivered before the core is committed to the instruction reading address A, so the interrupt is taken, and Linux offlines the page (marking it as poison). ## Why user process is killed for instr case Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"") tries to fix noise message "Memory error not recovered" and skips duplicate SIGBUSs due to the race. But it also introduced a bug that kill_accessing_process() return -EHWPOISON for instr case, as result, kill_me_maybe() send a SIGBUS to user process. # 4. The fix, in my opinion, should be: - For copyin case: The key point is whether the error context is in a read from user memory. We do not care about the ex-type if we know its a MOV reading from userspace. is_copy_from_user() return true when both of the following two checks are true: - the current instruction is copy - source address is user memory If copy_user is true, we set m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV; Then do_machine_check() will try fixup_exception() first. - For instr case: let kill_accessing_process() return 0 to prevent a SIGBUS. - For patch 3: The return value of memory_failure() is quite important while discussed instr case regression with Tony and Miaohe for patch 2, so add comment about the return value. This patch (of 3): Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new extable fixup type, EX_TYPE_EFAULT_REG, and commit 4c132d1d844a ("x86/futex: Remove .fixup usage") updated the extable fixup type for copy-from-user operations, changing it from EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. The error context for copy-from-user operations no longer functions as an in-kernel recovery context. Consequently, the error context for copy-from-user operations no longer functions as an in-kernel recovery context, resulting in kernel panics with the message: "Machine check: Data load in unrecoverable area of kernel." To address this, it is crucial to identify if an error context involves a read operation from user memory. The function is_copy_from_user() can be utilized to determine: - the current operation is copy - when reading user memory When these conditions are met, is_copy_from_user() will return true, confirming that it is indeed a direct copy from user memory. This check is essential for correctly handling the context of errors in these operations without relying on the extable fixup types that previously allowed for in-kernel recovery. So, use is_copy_from_user() to determine if a context is copy user directly. Link: https://lkml.kernel.org/r/20250312112852.82415-1-xueshuai@linux.alibaba.com Link: https://lkml.kernel.org/r/20250312112852.82415-2-xueshuai@linux.alibaba.com Fixes: 4c132d1d844a ("x86/futex: Remove .fixup usage") Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tony Luck <tony.luck@intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-13Merge tag 'x86-cpu-2024-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: - Rework the x86 CPU vendor/family/model code: introduce the 'VFM' value that is an 8+8+8 bit concatenation of the vendor/family/model value, and add macros that work on VFM values. This simplifies the addition of new Intel models & families, and simplifies existing enumeration & quirk code. - Add support for the AMD 0x80000026 leaf, to better parse topology information - Optimize the NUMA allocation layout of more per-CPU data structures - Improve the workaround for AMD erratum 1386 - Clear TME from /proc/cpuinfo as well, when disabled by the firmware - Improve x86 self-tests - Extend the mce_record tracepoint with the ::ppin and ::microcode fields - Implement recovery for MCE errors in TDX/SEAM non-root mode - Misc cleanups and fixes * tag 'x86-cpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Switch to new Intel CPU model defines x86/tsc_msr: Switch to new Intel CPU model defines x86/tsc: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/resctrl: Switch to new Intel CPU model defines x86/microcode/intel: Switch to new Intel CPU model defines x86/mce: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines x86/cpu/intel_epb: Switch to new Intel CPU model defines x86/aperfmperf: Switch to new Intel CPU model defines x86/apic: Switch to new Intel CPU model defines perf/x86/msr: Switch to new Intel CPU model defines perf/x86/intel/uncore: Switch to new Intel CPU model defines perf/x86/intel/pt: Switch to new Intel CPU model defines perf/x86/lbr: Switch to new Intel CPU model defines perf/x86/intel/cstate: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/bugs: Switch to new Intel CPU model defines x86/cpu/vfm: Update arch/x86/include/asm/intel-family.h x86/cpu/vfm: Add new macros to work with (vendor/family/model) values ...
2024-04-29x86/mce: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. [ bp: Squash *three* mce patches into one, fold in fix: https://lore.kernel.org/r/20240429022051.63360-1-tony.luck@intel.com ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/20240424181511.41772-1-tony.luck%40intel.com
2024-04-09x86/mce: Implement recovery for errors in TDX/SEAM non-root modeTony Luck
Machine check SMIs (MSMI) signaled during SEAM operation (typically inside TDX guests), on a system with Intel eMCA enabled, might eventually be reported to the kernel #MC handler with the saved RIP on the stack pointing to the instruction in kernel code after the SEAMCALL instruction that entered the SEAM operation. Linux currently says that is a fatal error and shuts down. There is a new bit in IA32_MCG_STATUS that, when set to 1, indicates that the machine check didn't originally occur at that saved RIP, but during SEAM non-root operation. Add new entries to the severity table to detect this for both data load and instruction fetch that set the severity to "AR" (action required). Increase the width of the mcgmask/mcgres fields in "struct severity" from unsigned char to unsigned short since the new bit is in position 12. Action required for these errors is just mark the page as poisoned and return from the machine check handler. HW ABI notes: ============= The SEAM_NR bit in IA32_MCG_STATUS hasn't yet made it into the Intel Software Developers' Manual. But it is described in section 16.5.2 of "Intel(R) Trust Domain Extensions (Intel(R) TDX) Module Base Architecture Specification" downloadable from: https://cdrdv2.intel.com/v1/dl/getContent/733575 Backport notes: =============== Little value in backporting this patch to stable or LTS kernels as this is only relevant with support for TDX, which I assume won't be backported. But for anyone taking this to v6.1 or older, you also need commit: a51cbd0d86d3 ("x86/mce: Use severity table to handle uncorrected errors in kernel") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240408180944.44638-1-tony.luck@intel.com
2024-04-04x86/extable: Remove unused fixup type EX_TYPE_COPYTong Tiangen
After 034ff37d3407 ("x86: rewrite '__copy_user_nocache' function") rewrote __copy_user_nocache() to use EX_TYPE_UACCESS instead of the EX_TYPE_COPY exception type, there are no more EX_TYPE_COPY users, so remove it. [ bp: Massage commit message. ] Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240204082627.3892816-2-tongtiangen@huawei.com
2022-10-31x86/mce: Use severity table to handle uncorrected errors in kernelTony Luck
mce_severity_intel() has a special case to promote UC and AR errors in kernel context to PANIC severity. The "AR" case is already handled with separate entries in the severity table for all instruction fetch errors, and those data fetch errors that are not in a recoverable area of the kernel (i.e. have an extable fixup entry). Add an entry to the severity table for UC errors in kernel context that reports severity = PANIC. Delete the special case code from mce_severity_intel(). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220922195136.54575-2-tony.luck@intel.com
2022-04-25x86/mce: Add messages for panic errors in AMD's MCE gradingCarlos Bilbao
When a machine error is graded as PANIC by the AMD grading logic, the MCE handler calls mce_panic(). The notification chain does not come into effect so the AMD EDAC driver does not decode the errors. In these cases, the messages displayed to the user are more cryptic and miss information that might be relevant, like the context in which the error took place. Add messages to the grading logic for machine errors so that it is clear what error it was. [ bp: Massage commit message. ] Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20220405183212.354606-3-carlos.bilbao@amd.com
2022-04-25x86/mce: Simplify AMD severity grading logicCarlos Bilbao
The MCE handler needs to understand the severity of the machine errors to act accordingly. Simplify the AMD grading logic following a logic that closely resembles the descriptions of the public PPR documents. This will help include more fine-grained grading of errors in the future. [ bp: Touchups. ] Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20220405183212.354606-2-carlos.bilbao@amd.com
2022-02-23x86/mce: Remove the tolerance level controlBorislav Petkov
This is pretty much unused and not really useful. What is more, all relevant MCA hardware has recoverable machine checks support so there's no real need to tweak MCA tolerance levels in order to *maybe* extend machine lifetime. So rip it out. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/YcDq8PxvKtTENl/e@zn.tnic
2022-02-13x86/mce: Use arch atomic and bit helpersBorislav Petkov
The arch helpers do not have explicit KASAN instrumentation. Use them in noinstr code. Inline a couple more functions with single call sites, while at it: mce_severity_amd_smca() has a single call-site which is noinstr so force the inlining and fix: vmlinux.o: warning: objtool: mce_severity_amd.constprop.0()+0xca: call to \ mce_severity_amd_smca() leaves .noinstr.text section Always inline mca_msr_reg(): text data bss dec hex filename 16065240 128031326 36405368 180501934 ac23dae vmlinux.before 16065240 128031294 36405368 180501902 ac23d8e vmlinux.after and mce_no_way_out() as the latter one is used only once, to fix: vmlinux.o: warning: objtool: mce_read_aux()+0x53: call to mca_msr_reg() leaves .noinstr.text section vmlinux.o: warning: objtool: do_machine_check()+0xc9: call to mce_no_way_out() leaves .noinstr.text section Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/20220204083015.17317-4-bp@alien8.de
2021-12-20x86/mce: Check regs before accessing itBorislav Petkov
Commit in Fixes accesses pt_regs before checking whether it is NULL or not. Make sure the NULL pointer check happens first. Fixes: 0a5b288e85bb ("x86/mce: Prevent severity computation from being instrumented") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211217102029.GA29708@kili
2021-12-13x86/mce: Prevent severity computation from being instrumentedBorislav Petkov
Mark all the MCE severity computation logic noinstr and allow instrumentation when it "calls out". Fixes vmlinux.o: warning: objtool: do_machine_check()+0xc5d: call to mce_severity() leaves .noinstr.text section Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208111343.8130-7-bp@alien8.de
2021-12-13x86/mce: Use mce_rdmsrl() in severity checking codeBorislav Petkov
MCA has its own special MSR accessors. Use them. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208111343.8130-4-bp@alien8.de
2021-11-01Merge tag 'ras_core_for_v5.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Get rid of a bunch of function pointers used in MCA land in favor of normal functions. This is in preparation of making the MCA code noinstr-aware - When the kernel copies data from user addresses and it encounters a machine check, a SIGBUS is sent to that process. Change this action to either an -EFAULT which is returned to the user or a short write, making the recovery action a lot more user-friendly * tag 'ras_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Sort mca_config members to get rid of unnecessary padding x86/mce: Get rid of the ->quirk_no_way_out() indirect call x86/mce: Get rid of msr_ops x86/mce: Get rid of machine_check_vector x86/mce: Get rid of the mce_severity function pointer x86/mce: Drop copyin special case for #MC x86/mce: Change to not send SIGBUS error during copy from user
2021-09-23x86/mce: Get rid of the mce_severity function pointerBorislav Petkov
Turn it into a normal function which calls an AMD- or Intel-specific variant depending on the CPU it runs on. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20210922165101.18951-2-bp@alien8.de
2021-09-13x86/extable: Remove EX_TYPE_FAULT from MCE safe fixupsThomas Gleixner
Now that the MC safe copy and FPU have been converted to use the MCE safe fixup types remove EX_TYPE_FAULT from the list of types which MCE considers to be safe to be recovered in kernel. This removes the SGX exception handling of ENCLS from the #MC safe handling, but according to the SGX wizards the current SGX implementations cannot survive #MC on ENCLS: https://lore.kernel.org/r/YS+upEmTfpZub3s9@google.com The code relies on the trap number being stored if ENCLS raised an exception. That's still working, but it does no longer trick the MCE code into assuming that #MC is handled correctly for ENCLS. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210908132525.445255957@linutronix.de
2021-09-13x86/extable: Provide EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFEThomas Gleixner
Provide exception fixup types which can be used to identify fixups which allow in kernel #MC recovery and make them invoke the existing handlers. These will be used at places where #MC recovery is handled correctly by the caller. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210908132525.269689153@linutronix.de
2021-09-13x86/extable: Rework the exception table mechanicsThomas Gleixner
The exception table entries contain the instruction address, the fixup address and the handler address. All addresses are relative. Storing the handler address has a few downsides: 1) Most handlers need to be exported 2) Handlers can be defined everywhere and there is no overview about the handler types 3) MCE needs to check the handler type to decide whether an in kernel #MC can be recovered. The functionality of the handler itself is not in any way special, but for these checks there need to be separate functions which in the worst case have to be exported. Some of these 'recoverable' exception fixups are pretty obscure and just reuse some other handler to spare code. That obfuscates e.g. the #MC safe copy functions. Cleaning that up would require more handlers and exports Rework the exception fixup mechanics by storing a fixup type number instead of the handler address and invoke the proper handler for each fixup type. Also teach the extable sort to leave the type field alone. This makes most handlers static except for special cases like the MCE MSR fixup and the BPF fixup. This allows to add more types for cleaning up the obscure places without adding more handler code and exports. There is a marginal code size reduction for a production config and it removes _eight_ exported symbols. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lkml.kernel.org/r/20210908132525.211958725@linutronix.de
2021-04-27Merge tag 'x86_core_for_v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Borislav Petkov: - Turn the stack canary into a normal __percpu variable on 32-bit which gets rid of the LAZY_GS stuff and a lot of code. - Add an insn_decode() API which all users of the instruction decoder should preferrably use. Its goal is to keep the details of the instruction decoder away from its users and simplify and streamline how one decodes insns in the kernel. Convert its users to it. - kprobes improvements and fixes - Set the maximum DIE per package variable on Hygon - Rip out the dynamic NOP selection and simplify all the machinery around selecting NOPs. Use the simplified NOPs in objtool now too. - Add Xeon Sapphire Rapids to list of CPUs that support PPIN - Simplify the retpolines by folding the entire thing into an alternative now that objtool can handle alternatives with stack ops. Then, have objtool rewrite the call to the retpoline with the alternative which then will get patched at boot time. - Document Intel uarch per models in intel-family.h - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the exception on Intel. * tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) x86, sched: Treat Intel SNC topology as default, COD as exception x86/cpu: Comment Skylake server stepping too x86/cpu: Resort and comment Intel models objtool/x86: Rewrite retpoline thunk calls objtool: Skip magical retpoline .altinstr_replacement objtool: Cache instruction relocs objtool: Keep track of retpoline call sites objtool: Add elf_create_undef_symbol() objtool: Extract elf_symbol_add() objtool: Extract elf_strtab_concat() objtool: Create reloc sections implicitly objtool: Add elf_create_reloc() helper objtool: Rework the elf_rebuild_reloc_section() logic objtool: Fix static_call list generation objtool: Handle per arch retpoline naming objtool: Correctly handle retpoline thunk calls x86/retpoline: Simplify retpolines x86/alternatives: Optimize optimize_nops() x86: Add insn_decode_kernel() x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration ...
2021-03-31x86: Add insn_decode_kernel()Peter Zijlstra
Add a helper to decode kernel instructions; there's no point in endlessly repeating those last two arguments. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210326151259.379242587@infradead.org
2021-03-21x86: Fix various typos in comments, take #2Ingo Molnar
Fix another ~42 single-word typos in arch/x86/ code comments, missed a few in the first pass, in particular in .S files. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-15x86/mce: Convert to insn_decode()Borislav Petkov
Simplify code, no functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210304174237.31945-11-bp@alien8.de
2020-10-07x86/mce: Decode a kernel instruction to determine if it is copying from userTony Luck
All instructions copying data between kernel and user memory are tagged with either _ASM_EXTABLE_UA or _ASM_EXTABLE_CPY entries in the exception table. ex_fault_handler_type() returns EX_HANDLER_UACCESS for both of these. Recovery is only possible when the machine check was triggered on a read from user memory. In this case the same strategy for recovery applies as if the user had made the access in ring3. If the fault was in kernel memory while copying to user there is no current recovery plan. For MOV and MOVZ instructions a full decode of the instruction is done to find the source address. For MOVS instructions the source address is in the %rsi register. The function fault_in_kernel_space() determines whether the source address is kernel or user, upgrade it from "static" so it can be used here. Co-developed-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-7-tony.luck@intel.com
2020-10-07x86/mce: Provide method to find out the type of an exception handlerTony Luck
Avoid a proliferation of ex_has_*_handler() functions by having just one function that returns the type of the handler (if any). Drop the __visible attribute for this function. It is not called from assembler so the attribute is not necessary. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-3-tony.luck@intel.com
2020-10-07x86/mce: Pass pointer to saved pt_regs to severity calculation routinesYouquan Song
New recovery features require additional information about processor state when a machine check occurred. Pass pt_regs down to the routines that need it. No functional change. Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201006210910.21062-2-tony.luck@intel.com
2020-09-30x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule listTony Luck
Way back in v3.19 Intel and AMD shared the same machine check severity grading code. So it made sense to add a case for AMD DEFERRED errors in commit e3480271f592 ("x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error") But later in v4.2 AMD switched to a separate grading function in commit bf80bbd7dcf5 ("x86/mce: Add an AMD severities-grading function") Belatedly drop the DEFERRED case from the Intel rule list. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200930021313.31810-3-tony.luck@intel.com
2020-09-30x86/mce: Add Skylake quirk for patrol scrub reported errorsBorislav Petkov
The patrol scrubber in Skylake and Cascade Lake systems can be configured to report uncorrected errors using a special signature in the machine check bank and to signal using CMCI instead of machine check. Update the severity calculation mechanism to allow specifying the model, minimum stepping and range of machine check bank numbers. Add a new rule to detect the special signature (on model 0x55, stepping >=4 in any of the memory controller banks). [ bp: Rewrite it. aegl: Productize it. ] Suggested-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200930021313.31810-2-tony.luck@intel.com
2020-04-14x86/mce: Fixup exception only for the correct MCEsBorislav Petkov
The severity grading code returns IN_KERNEL_RECOV error context for errors which have happened in kernel space but from which the kernel can recover. Whether the recovery can happen is determined by the exception table entry having as handler ex_handler_fault() and which has been declared at build time using _ASM_EXTABLE_FAULT(). IN_KERNEL_RECOV is used in mce_severity_intel() to lookup the corresponding error severity in the severities table. However, the mapping back from error severity to whether the error is IN_KERNEL_RECOV is ambiguous and in the very paranoid case - which might not be possible right now - but be better safe than sorry later, an exception fixup could be attempted for another MCE whose address is in the exception table and has the proper severity. Which would be unfortunate, to say the least. Therefore, mark such MCEs explicitly as MCE_IN_KERNEL_RECOV so that the recovery attempt is done only for them. Document the whole handling, while at it, as it is not trivial. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200407163414.18058-10-bp@alien8.de
2019-08-05x86/mce: Don't check for the overflow bit on action optional machine checksTony Luck
We currently do not process SRAO (Software Recoverable Action Optional) machine checks if they are logged with the overflow bit set to 1 in the machine check bank status register. This is overly conservative. There are two cases where we could end up with an SRAO+OVER log based on the SDM volume 3 overwrite rules in "Table 15-8. Overwrite Rules for UC, CE, and UCR Errors" 1) First a corrected error is logged, then the SRAO error overwrites. The second error overwrites the first because uncorrected errors have a higher severity than corrected errors. 2) The SRAO error was logged first, followed by a correcetd error. In this case the first error is retained in the bank. So in either case the machine check bank will contain the address of the SRAO error. So we can process that even if the overflow bit was set. Reported-by: Yongkai Wu <yongkaiwu@tencent.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190718182920.32621-1-tony.luck@intel.com
2019-07-08Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Boris is on vacation so I'm sending the RAS bits this time. The main changes were: - Various RAS/CEC improvements and fixes by Borislav Petkov: - error insertion fixes - offlining latency fix - memory leak fix - additional sanity checks - cleanups - debug output improvements - More SMCA enhancements by Yazen Ghannam: - make banks truly per-CPU which they are in the hardware - don't over-cache certain registers - make the number of MCA banks per-CPU variable The long term goal with these changes is to support future heterogenous SMCA extensions. - Misc fixes and improvements" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Do not check return value of debugfs_create functions x86/MCE: Determine MCA banks' init state properly x86/MCE: Make the number of MCA banks a per-CPU variable x86/MCE/AMD: Don't cache block addresses on SMCA systems x86/MCE: Make mce_banks a per-CPU array x86/MCE: Make struct mce_banks[] static RAS/CEC: Add copyright RAS/CEC: Add CONFIG_RAS_CEC_DEBUG and move CEC debug features there RAS/CEC: Dump the different array element sections RAS/CEC: Rename count_threshold to action_threshold RAS/CEC: Sanity-check array on every insertion RAS/CEC: Fix potential memory leak RAS/CEC: Do not set decay value on error RAS/CEC: Check count_threshold unconditionally RAS/CEC: Fix pfn insertion
2019-06-14x86/mce: Do not check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. The only way this can fail is if: * debugfs superblock can not be pinned - something really went wrong with the vfs layer. * file is created with same name - the caller's fault. * new_inode() fails - happens if memory is exhausted. so failing to clean up debugfs properly is the least of the system's sproblems in uch a situation. [ bp: Extend commit message, remove unused err var in inject_init(). ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190612151531.GA16278@kroah.com
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 of the license extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 315 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-25x86/mce: Improve error message when kernel cannot recover, p2Tony Luck
In c7d606f560e4 ("x86/mce: Improve error message when kernel cannot recover") a case was added for a machine check caused by a DATA access to poison memory from the kernel. A case should have been added also for an uncorrectable error during an instruction fetch in the kernel. Add that extra case so the error message now reads: mce: [Hardware Error]: Machine check: Instruction fetch error in kernel Fixes: c7d606f560e4 ("x86/mce: Improve error message when kernel cannot recover") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Pu Wen <puwen@hygon.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190225205940.15226-1-tony.luck@intel.com
2018-12-05x86/mce: Streamline MCE subsystem's namingBorislav Petkov
Rename the containing folder to "mce" which is the most widespread name. Drop the "mce[-_]" filename prefix of some compilation units (while others don't have it). This unifies the file naming in the MCE subsystem: mce/ |-- amd.c |-- apei.c |-- core.c |-- dev-mcelog.c |-- genpool.c |-- inject.c |-- intel.c |-- internal.h |-- Makefile |-- p5.c |-- severity.c |-- therm_throt.c |-- threshold.c `-- winchip.c No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20181205141323.14995-1-bp@alien8.de