summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2025-04-23x86/boot: Disable jump tables in PIC codeArd Biesheuvel
objtool already struggles to identify jump tables correctly in non-PIC code, where the idiom is something like jmpq *table(,%idx,8) and the table is a list of absolute addresses of jump targets. When using -fPIC, both the table reference as well as the jump targets are emitted in a RIP-relative manner, resulting in something like leaq table(%rip), %tbl movslq (%tbl,%idx,4), %offset addq %offset, %tbl jmpq *%tbl and the table is a list of offsets of the jump targets relative to the start of the entire table. Considering that this sequence of instructions can be interleaved with other instructions that have nothing to do with the jump table in question, it is extremely difficult to infer the control flow by deriving the jump targets from the indirect jump, the location of the table and the relative offsets it contains. So let's not bother and disable jump tables for code built with -fPIC under arch/x86/boot/startup. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250422210510.600354-2-ardb+git@google.com
2025-04-23crypto: x86/sha1 - Use API partial block handlingHerbert Xu
Use the Crypto API partial block handling. Also remove the unnecessary SIMD fallback path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23crypto: x86/ghash - Use API partial block handlingHerbert Xu
Use the Crypto API partial block handling. Also remove the unnecessary SIMD fallback path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-22x86/vdso: Remove redundant #ifdeffery around in_ia32_syscall()Thomas Weißschuh
The #ifdefs only guard code that is also guarded by in_ia32_syscall(), which already contains the same #ifdef itself. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <kees@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20240910-x86-vdso-ifdef-v1-2-877c9df9b081@linutronix.de
2025-04-22x86/vdso: Remove #ifdeffery around page setup variantsThomas Weißschuh
Replace the open-coded ifdefs in C sources files with IS_ENABLED(). This makes the code easier to read and enables the compiler to typecheck also the disabled parts, before optimizing them away. To make this work, also remove the ifdefs from declarations of used variables. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <kees@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20240910-x86-vdso-ifdef-v1-1-877c9df9b081@linutronix.de
2025-04-22x86/asm: Retire RIP_REL_REF()Ard Biesheuvel
Now that all users have been moved into startup/ where PIC codegen is used, RIP_REL_REF() is no longer needed. Remove it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-14-ardb+git@google.com
2025-04-22x86/boot: Drop RIP_REL_REF() uses from early SEV codeArd Biesheuvel
Now that the early SEV code is built with -fPIC, RIP_REL_REF() has no effect and can be dropped. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-13-ardb+git@google.com
2025-04-22x86/boot: Move SEV startup code into startup/Ard Biesheuvel
Move the SEV startup code into arch/x86/boot/startup/, where it will reside along with other code that executes extremely early, and therefore needs to be built in a special manner. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-12-ardb+git@google.com
2025-04-22x86/sev: Split off startup code from core codeArd Biesheuvel
Disentangle the SEV core code and the SEV code that is called during early boot. The latter piece will be moved into startup/ in a subsequent patch. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-11-ardb+git@google.com
2025-04-22x86/sev: Move noinstr NMI handling code into separate source fileArd Biesheuvel
GCC may ignore the __no_sanitize_address function attribute when inlining, resulting in KASAN instrumentation in code tagged as 'noinstr'. Move the SEV NMI handling code, which is noinstr, into a separate source file so KASAN can be disabled on the whole file without losing coverage of other SEV core code, once the startup code is split off from it too. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-10-ardb+git@google.com
2025-04-22Merge branch 'x86/urgent' into x86/boot, to merge dependent commit and ↵Ingo Molnar
upstream fixes In particular we need this fix before applying subsequent changes: d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-22x86/cpu: Help users notice when running old Intel microcodeDave Hansen
Old microcode is bad for users and for kernel developers. For users, it exposes them to known fixed security and/or functional issues. These obviously rarely result in instant dumpster fires in every environment. But it is as important to keep your microcode up to date as it is to keep your kernel up to date. Old microcode also makes kernels harder to debug. A developer looking at an oops need to consider kernel bugs, known CPU issues and unknown CPU issues as possible causes. If they know the microcode is up to date, they can mostly eliminate known CPU issues as the cause. Make it easier to tell if CPU microcode is out of date. Add a list of released microcode. If the loaded microcode is older than the release, tell users in a place that folks can find it: /sys/devices/system/cpu/vulnerabilities/old_microcode Tell kernel kernel developers about it with the existing taint flag: TAINT_CPU_OUT_OF_SPEC == Discussion == When a user reports a potential kernel issue, it is very common to ask them to reproduce the issue on mainline. Running mainline, they will (independently from the distro) acquire a more up-to-date microcode version list. If their microcode is old, they will get a warning about the taint and kernel developers can take that into consideration when debugging. Just like any other entry in "vulnerabilities/", users are free to make their own assessment of their exposure. == Microcode Revision Discussion == The microcode versions in the table were generated from the Intel microcode git repo: 8ac9378a8487 ("microcode-20241112 Release") which as of this writing lags behind the latest microcode-20250211. It can be argued that the versions that the kernel picks to call "old" should be a revision or two old. Which specific version is picked is less important to me than picking *a* version and enforcing it. This repository contains only microcode versions that Intel has deemed to be OS-loadable. It is quite possible that the BIOS has loaded a newer microcode than the latest in this repo. If this happens, the system is considered to have new microcode, not old. Specifically, the sysfs file and taint flag answer the question: Is the CPU running on the latest OS-loadable microcode, or something even later that the BIOS loaded? In other words, Intel never publishes an authoritative list of CPUs and latest microcode revisions. Until it does, this is the best that Linux can do. Also note that the "intel-ucode-defs.h" file is simple, ugly and has lots of magic numbers. That's on purpose and should allow a single file to be shared across lots of stable kernel regardless of if they have the new "VFM" infrastructure or not. It was generated with a dumb script. == FAQ == Q: Does this tell me if my system is secure or insecure? A: No. It only tells you if your microcode was old when the system booted. Q: Should the kernel warn if the microcode list itself is too old? A: No. New kernels will get new microcode lists, both mainline and stable. The only way to have an old list is to be running an old kernel in which case you have bigger problems. Q: Is this for security or functional issues? A: Both. Q: If a given microcode update only has functional problems but no security issues, will it be considered old? A: Yes. All microcode image versions within a microcode release are treated identically. Intel appears to make security updates without disclosing them in the release notes. Thus, all updates are considered to be security-relevant. Q: Who runs old microcode? A: Anybody with an old distro. This happens all the time inside of Intel where there are lots of weird systems in labs that might not be getting regular distro updates and might also be running rather exotic microcode images. Q: If I update my microcode after booting will it stop saying "Vulnerable"? A: No. Just like all the other vulnerabilies, you need to reboot before the kernel will reassess your vulnerability. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "Ahmed S. Darwish" <darwi@linutronix.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/all/20250421195659.CF426C07%40davehans-spike.ostc.intel.com (cherry picked from commit 9127865b15eb0a1bd05ad7efe29489c44394bdc1)
2025-04-22Merge branch 'x86/cpu' into x86/microcode, to pick up dependent commitsIngo Molnar
Avoid a conflict in <asm/cpufeatures.h> by merging pending x86/cpu changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-21fs: remove uselib() system callChristian Brauner
This system call has been deprecated for quite a while now. Let's try and remove it from the kernel completely. Link: https://lore.kernel.org/20250415-kanufahren-besten-02ac00e6becd@brauner Acked-by: Kees Cook <kees@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-19x86/e820: Discard high memory that can't be addressed by 32-bit systemsMike Rapoport (Microsoft)
Dave Hansen reports the following crash on a 32-bit system with CONFIG_HIGHMEM=y and CONFIG_X86_PAE=y: > 0xf75fe000 is the mem_map[] entry for the first page >4GB. It > obviously wasn't allocated, thus the oops. BUG: unable to handle page fault for address: f75fe000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page *pdpt = 0000000002da2001 *pde = 000000000300c067 *pte = 0000000000000000 Oops: Oops: 0002 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-00288-ge618ee89561b-dirty #311 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 EIP: __free_pages_core+0x3c/0x74 ... Call Trace: memblock_free_pages+0x11/0x2c memblock_free_all+0x2ce/0x3a0 mm_core_init+0xf5/0x320 start_kernel+0x296/0x79c i386_start_kernel+0xad/0xb0 startup_32_smp+0x151/0x154 The mem_map[] is allocated up to the end of ZONE_HIGHMEM which is defined by max_pfn. The bug was introduced by this recent commit: 6faea3422e3b ("arch, mm: streamline HIGHMEM freeing") Previously, freeing of high memory was also clamped to the end of ZONE_HIGHMEM but after this change, memblock_free_all() tries to free memory above the of ZONE_HIGHMEM as well and that causes access to mem_map[] entries beyond the end of the memory map. To fix this, discard the memory after max_pfn from memblock on 32-bit systems so that core MM would be aware only of actually usable memory. Fixes: 6faea3422e3b ("arch, mm: streamline HIGHMEM freeing") Reported-by: Dave Hansen <dave.hansen@intel.com> Tested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Shevchenko <andy@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davide Ciminaghi <ciminaghi@gnudd.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: kvm@vger.kernel.org Link: https://lore.kernel.org/r/20250413080858.743221-1-rppt@kernel.org # discussion and submission
2025-04-19crypto: lib/poly1305 - restore ability to remove modulesEric Biggers
Though the module_exit functions are now no-ops, they should still be defined, since otherwise the modules become unremovable. Fixes: 1f81c58279c7 ("crypto: arm/poly1305 - remove redundant shash algorithm") Fixes: f4b1a73aec5c ("crypto: arm64/poly1305 - remove redundant shash algorithm") Fixes: 378a337ab40f ("crypto: powerpc/poly1305 - implement library instead of shash") Fixes: 21969da642a2 ("crypto: x86/poly1305 - remove redundant shash algorithm") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-19crypto: lib/chacha - restore ability to remove modulesEric Biggers
Though the module_exit functions are now no-ops, they should still be defined, since otherwise the modules become unremovable. Fixes: 08820553f33a ("crypto: arm/chacha - remove the redundant skcipher algorithms") Fixes: 8c28abede16c ("crypto: arm64/chacha - remove the skcipher algorithms") Fixes: f7915484c020 ("crypto: powerpc/chacha - remove the skcipher algorithms") Fixes: ceba0eda8313 ("crypto: riscv/chacha - implement library instead of skcipher") Fixes: 632ab0978f08 ("crypto: x86/chacha - remove the skcipher algorithms") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-18Merge tag 'x86-urgent-2025-04-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix hypercall detection on Xen guests - Extend the AMD microcode loader SHA check to Zen5, to block loading of any unreleased standalone Zen5 microcode patches - Add new Intel CPU model number for Bartlett Lake - Fix the workaround for AMD erratum 1054 - Fix buggy early memory acceptance between SEV-SNP guests and the EFI stub * tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/sev: Avoid shared GHCB page for early memory acceptance x86/cpu/amd: Fix workaround for erratum 1054 x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches x86/xen: Fix __xen_hypercall_setfunc()
2025-04-18Merge tag 'timers-urgent-2025-04-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a lockdep false positive in the i8253 driver" * tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/i8253: Call clockevent_i8253_disable() with interrupts disabled
2025-04-18Merge tag 'perf-urgent-2025-04-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event fixes from Ingo Molnar: "Miscellaneous fixes and a hardware-enabling change: - Fix Intel uncore PMU IIO free running counters on SPR, ICX and SNR systems - Fix Intel PEBS buffer overflow handling - Fix skid in Intel PEBS sampling of user-space general purpose registers - Enable Panther Lake PMU support - similar to Lunar Lake" * tag 'perf-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Add Panther Lake support perf/x86/intel: Allow to update user space GPRs from PEBS records perf/x86/intel: Don't clear perf metrics overflow bit unconditionally perf/x86/intel/uncore: Fix the scale of IIO free running counters on SPR perf/x86/intel/uncore: Fix the scale of IIO free running counters on ICX perf/x86/intel/uncore: Fix the scale of IIO free running counters on SNR
2025-04-18x86/mm: Fix {,un}use_temporary_mm() IRQ statePeter Zijlstra
As the function switch_mm_irqs_off() implies, it ought to be called with IRQs *off*. Commit 58f8ffa91766 ("x86/mm: Allow temporary MMs when IRQs are on") caused this to not be the case for EFI. Ensure IRQs are off where it matters. Fixes: 58f8ffa91766 ("x86/mm: Allow temporary MMs when IRQs are on") Reported-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20250418095034.GR38216@noisy.programming.kicks-ass.net
2025-04-18x86/boot/sev: Avoid shared GHCB page for early memory acceptanceArd Biesheuvel
Communicating with the hypervisor using the shared GHCB page requires clearing the C bit in the mapping of that page. When executing in the context of the EFI boot services, the page tables are owned by the firmware, and this manipulation is not possible. So switch to a different API for accepting memory in SEV-SNP guests, one which is actually supported at the point during boot where the EFI stub may need to accept memory, but the SEV-SNP init code has not executed yet. For simplicity, also switch the memory acceptance carried out by the decompressor when not booting via EFI - this only involves the allocation for the decompressed kernel, and is generally only called after kexec, as normal boot will jump straight into the kernel from the EFI stub. Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250404082921.2767593-8-ardb+git@google.com # discussion thread #1 Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com # discussion thread #2 Link: https://lore.kernel.org/r/20250417202120.1002102-2-ardb+git@google.com # final submission
2025-04-18x86/cpu/amd: Fix workaround for erratum 1054Sandipan Das
Erratum 1054 affects AMD Zen processors that are a part of Family 17h Models 00-2Fh and the workaround is to not set HWCR[IRPerfEn]. However, when X86_FEATURE_ZEN1 was introduced, the condition to detect unaffected processors was incorrectly changed in a way that the IRPerfEn bit gets set only for unaffected Zen 1 processors. Ensure that HWCR[IRPerfEn] is set for all unaffected processors. This includes a subset of Zen 1 (Family 17h Models 30h and above) and all later processors. Also clear X86_FEATURE_IRPERF on affected processors so that the IRPerfCount register is not used by other entities like the MSR PMU driver. Fixes: 232afb557835 ("x86/CPU/AMD: Add X86_FEATURE_ZEN1") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/caa057a9d6f8ad579e2f1abaa71efbd5bd4eaf6d.1744956467.git.sandipan.das@amd.com
2025-04-18perf/x86/amd/uncore: Prevent UMC counters from saturatingSandipan Das
Unlike L3 and DF counters, UMC counters (PERF_CTRs) set the Overflow bit (bit 48) and saturate on overflow. A subsequent pmu->read() of the event reports an incorrect accumulated count as there is no difference between the previous and the current values of the counter. To avoid this, inspect the current counter value and proactively reset the corresponding PERF_CTR register on every pmu->read(). Combined with the periodic reads initiated by the hrtimer, the counters never get a chance saturate but the resolution reduces to 47 bits. Fixes: 25e56847821f ("perf/x86/amd/uncore: Add memory controller support") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Song Liu <song@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/dee9c8af2c6d66814cf4c6224529c144c620cf2c.1744906694.git.sandipan.das@amd.com
2025-04-18perf/x86/amd/uncore: Add parameter to configure hrtimerSandipan Das
Introduce a module parameter for configuring the hrtimer duration in milliseconds. The default duration is 60000 milliseconds and the intent is to allow users to customize it to suit jitter tolerances. It should be noted that a longer duration will reduce jitter but affect accuracy if the programmed events cause the counters to overflow multiple times in a single interval. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/6cb0101da74955fa9c8361f168ffdf481ae8a200.1744906694.git.sandipan.das@amd.com
2025-04-18perf/x86/amd/uncore: Use hrtimer for handling overflowsSandipan Das
Uncore counters do not provide mechanisms like interrupts to report overflows and the accumulated user-visible count is incorrect if there is more than one overflow between two successive read requests for the same event because the value of prev_count goes out-of-date for calculating the correct delta. To avoid this, start a hrtimer to periodically initiate a pmu->read() of the active counters for keeping prev_count up-to-date. It should be noted that the hrtimer duration should be lesser than the shortest time it takes for a counter to overflow for this approach to be effective. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/8ecf5fe20452da1cd19cf3ff4954d3e7c5137468.1744906694.git.sandipan.das@amd.com
2025-04-18perf/x86/intel/uncore: Use HRTIMER_MODE_HARD for detecting overflowsSandipan Das
hrtimer handlers can be deferred to softirq context and affect timely detection of counter overflows. Hence switch to HRTIMER_MODE_HARD. Disabling and re-enabling IRQs in the hrtimer handler is not required as pmu->start() and pmu->stop() can no longer intervene while updating event->hw.prev_count. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/0ad4698465077225769e8edd5b2c7e8f48f636d5.1744906694.git.sandipan.das@amd.com
2025-04-18perf/x86/amd/uncore: Remove unused 'struct amd_uncore_ctx::node' memberSandipan Das
Fixes: d6389d3ccc13 ("perf/x86/amd/uncore: Refactor uncore management") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/30f9254c2de6c4318dd0809ef85a1677f68eef10.1744906694.git.sandipan.das@amd.com
2025-04-18x86/asm: Rename rep_nop() to native_pause()Uros Bizjak
Rename rep_nop() function to what it really does. No functional change intended. Suggested-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lore.kernel.org/r/20250418080805.83679-1-ubizjak@gmail.com
2025-04-18x86/asm: Replace "REP; NOP" with PAUSE mnemonicUros Bizjak
Current minimum required version of binutils is 2.25, which supports PAUSE instruction mnemonic. Replace "REP; NOP" with this proper mnemonic. No functional change intended. Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lore.kernel.org/r/20250418080805.83679-2-ubizjak@gmail.com
2025-04-18x86/asm: Remove semicolon from "rep" prefixesUros Bizjak
Minimum version of binutils required to compile the kernel is 2.25. This version correctly handles the "rep" prefixes, so it is possible to remove the semicolon, which was used to support ancient versions of GNU as. Due to the semicolon, the compiler considers "rep; insn" (or its alternate "rep\n\tinsn" form) as two separate instructions. Removing the semicolon makes asm length calculations more accurate, consequently making scheduling and inlining decisions of the compiler more accurate. Removing the semicolon also enables assembler checks involving "rep" prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in: Error: invalid instruction `add' after `rep' Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20250418071437.4144391-2-ubizjak@gmail.com
2025-04-18x86/boot: Remove semicolon from "rep" prefixesUros Bizjak
Minimum version of binutils required to compile the kernel is 2.25. This version correctly handles the "rep" prefixes, so it is possible to remove the semicolon, which was used to support ancient versions of GNU as. Due to the semicolon, the compiler considers "rep; insn" (or its alternate "rep\n\tinsn" form) as two separate instructions. Removing the semicolon makes asm length calculations more accurate, consequently making scheduling and inlining decisions of the compiler more accurate. Removing the semicolon also enables assembler checks involving "rep" prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in: Error: invalid instruction `add' after `rep' Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Mares <mj@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250418071437.4144391-1-ubizjak@gmail.com
2025-04-18uprobes/x86: Add support to emulate NOP instructionsJiri Olsa
Add support to emulate all NOP instructions as the original uprobe instruction. This change speeds up uprobe on top of all NOP instructions and is a preparation for usdt probe optimization, that will be done on top of NOP5 instructions. With this change the usdt probe on top of NOP5s won't take the performance hit compared to usdt probe on top of standard NOP instructions. Suggested-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Hao Luo <haoluo@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20250414083647.1234007-1-jolsa@kernel.org
2025-04-17x86/PCI: Drop 'pci' suffix from intel_mid_pci.cAndy Shevchenko
CE4100 PCI specific code has no 'pci' suffix in the filename, intel_mid_pci.c is the only one that duplicates the folder name in its filename, drop that redundancy. While at it, group the respective modules in the Makefile. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20250407070321.3761063-1-andriy.shevchenko@linux.intel.com
2025-04-17x86/mm: Remove now unused SHARED_KERNEL_PMDDave Hansen
All the users of SHARED_KERNEL_PMD are gone. Zap it. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250414173244.1125BEC3%40davehans-spike.ostc.intel.com
2025-04-17x86/mm: Remove duplicated PMD preallocation macroDave Hansen
MAX_PREALLOCATED_PMDS and PREALLOCATED_PMDS are now identical. Just use PREALLOCATED_PMDS and remove "MAX". Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250414173242.5ED13A5B%40davehans-spike.ostc.intel.com
2025-04-17x86/mm: Preallocate all PAE page tablesDave Hansen
Finally, move away from having PAE kernels share any PMDs across processes. This was already the default on PTI kernels which are the common case. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250414173241.1288CAB4%40davehans-spike.ostc.intel.com
2025-04-17x86/mm: Fix up comments around PMD preallocationDave Hansen
The "paravirt environment" is no longer in the tree. Axe that part of the comment. Also add a blurb to remind readers that "USER_PMDS" refer to the PTI user *copy* of the page tables, not the user *portion*. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250414173240.5B1AB322%40davehans-spike.ostc.intel.com
2025-04-17x86/mm: Simplify PAE PGD sharing macrosDave Hansen
There are a few too many levels of abstraction here. First, just expand the PREALLOCATED_PMDS macro in place to make it clear that it is only conditional on PTI. Second, MAX_PREALLOCATED_PMDS is only used in one spot for an on-stack allocation. It has a *maximum* value of 4. Do not bother with the macro MAX() magic. Just set it to 4. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250414173238.6E3CDA56%40davehans-spike.ostc.intel.com
2025-04-17x86/mm: Always tell core mm to sync kernel mappingsDave Hansen
Each mm_struct has its own copy of the page tables. When core mm code makes changes to a copy of the page tables those changes sometimes need to be synchronized with other mms' copies of the page tables. But when this synchronization actually needs to happen is highly architecture and configuration specific. In cases where kernel PMDs are shared across processes (SHARED_KERNEL_PMD) the core mm does not itself need to do that synchronization for kernel PMD changes. The x86 code communicates this by clearing the PGTBL_PMD_MODIFIED bit cleared in those configs to avoid expensive synchronization. The kernel is moving toward never sharing kernel PMDs on 32-bit. Prepare for that and make 32-bit PAE always set PGTBL_PMD_MODIFIED, even if there is no modification to synchronize. This obviously adds some synchronization overhead in cases where the kernel page tables are being changed. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250414173237.EC790E95%40davehans-spike.ostc.intel.com
2025-04-17x86/mm: Always "broadcast" PMD setting operationsDave Hansen
Kernel PMDs can either be shared across processes or private to a process. On 64-bit, they are always shared. 32-bit non-PAE hardware does not have PMDs, but the kernel logically squishes them into the PGD and treats them as private. Here are the four cases: 64-bit: Shared 32-bit: non-PAE: Private 32-bit: PAE+ PTI: Private 32-bit: PAE+noPTI: Shared Note that 32-bit is all "Private" except for PAE+noPTI being an oddball. The 32-bit+PAE+noPTI case will be made like the rest of 32-bit shortly. But until that can be done, temporarily treat the 32-bit+PAE+noPTI case as Private. This will do unnecessary walks across pgd_list and unnecessary PTE setting but should be otherwise harmless. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250414173235.F63F50D1%40davehans-spike.ostc.intel.com
2025-04-17x86/mm: Always allocate a whole page for PAE PGDsDave Hansen
A hardware PAE PGD is only 32 bytes. A PGD is PAGE_SIZE in the other paging modes. But for reasons*, the kernel _sometimes_ allocates a whole page even though it only ever uses 32 bytes. Make PAE less weird. Just allocate a page like the other paging modes. This was already being done for PTI (and Xen in the past) and nobody screamed that loudly about it so it can't be that bad. * The original reason for PAGE_SIZE allocations for the PAE PGDs was Xen's need to detect page table writes. But 32-bit PTI forced it too for reasons I'm unclear about. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250414173234.D34F0C3E%40davehans-spike.ostc.intel.com
2025-04-17Merge tag 'for-linus-6.15a-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "Just a single fix for the Xen multicall driver avoiding a percpu variable referencing initdata by its initializer" * tag 'for-linus-6.15a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: fix multicall debug feature
2025-04-17x86/mm: Remove the mm_cpumask(prev) warning from switch_mm_irqs_off()Peter Zijlstra
The CONFIG_DEBUG_VM=y warning in switch_mm_irqs_off() started triggering in testing: VM_WARN_ON_ONCE(prev != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(prev))); AFAIU what happens is that unuse_temporary_mm() clears the mm_cpumask() for the current CPU, while switch_mm_irqs_off() then checks that the mm_cpumask() bit is set for the current CPU. While this behaviour hasn't really changed since the following commit: 209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily") introduced both, but the warning is wrong, so remove it. [ mingo: Patchified Peter's email. ] Reported-by: syzbot+c2537ce72a879a38113e@syzkaller.appspotmail.com Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/20250414135629.GA17910@noisy.programming.kicks-ass.net
2025-04-17perf/x86/intel: Introduce pairs of PEBS static callsDapeng Mi
Arch-PEBS retires IA32_PEBS_ENABLE and MSR_PEBS_DATA_CFG MSRs, so intel_pmu_pebs_enable/disable() and intel_pmu_pebs_enable/disable_all() are not needed to call for ach-PEBS. To make the code cleaner, introduce static calls x86_pmu_pebs_enable/disable() and x86_pmu_pebs_enable/disable_all() instead of adding "x86_pmu.arch_pebs" check directly in these helpers. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20250415114428.341182-7-dapeng1.mi@linux.intel.com
2025-04-17perf/x86/intel: Rename x86_pmu.pebs to x86_pmu.ds_pebsDapeng Mi
Since architectural PEBS would be introduced in subsequent patches, rename x86_pmu.pebs to x86_pmu.ds_pebs for distinguishing with the upcoming architectural PEBS. Besides restrict reserve_ds_buffers() helper to work only for the legacy DS based PEBS and avoid it to corrupt the pebs_active flag and release PEBS buffer incorrectly for arch-PEBS since the later patch would reuse these flags and alloc/release_pebs_buffer() helpers for arch-PEBS. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20250415114428.341182-6-dapeng1.mi@linux.intel.com
2025-04-17perf/x86/intel: Decouple BTS initialization from PEBS initializationDapeng Mi
Move x86_pmu.bts flag initialization into bts_init() from intel_ds_init() and rename intel_ds_init() to intel_pebs_init() since it fully initializes PEBS now after removing the x86_pmu.bts initialization. It's safe to move x86_pmu.bts into bts_init() since all x86_pmu.bts flag are called after bts_init() execution. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20250415114428.341182-5-dapeng1.mi@linux.intel.com
2025-04-17perf/x86/intel: Parse CPUID archPerfmonExt leaves for non-hybrid CPUsDapeng Mi
CPUID archPerfmonExt (0x23) leaves are supported to enumerate CPU level's PMU capabilities on non-hybrid processors as well. This patch supports to parse archPerfmonExt leaves on non-hybrid processors. Architectural PEBS leverages archPerfmonExt sub-leaves 0x4 and 0x5 to enumerate the PEBS capabilities as well. This patch is a precursor of the subsequent arch-PEBS enabling patches. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20250415114428.341182-4-dapeng1.mi@linux.intel.com
2025-04-17perf/x86/intel: Add PMU support for Clearwater ForestDapeng Mi
From the PMU's perspective, Clearwater Forest is similar to the previous generation Sierra Forest. The key differences are the ARCH PEBS feature and the new added 3 fixed counters for topdown L1 metrics events. The ARCH PEBS is supported in the following patches. This patch provides support for basic perfmon features and 3 new added fixed counters. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20250415114428.341182-3-dapeng1.mi@linux.intel.com
2025-04-17Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>