Add a module param to each KVM vendor module to allow disabling device
posted interrupts without having to sacrifice all of APICv/AVIC, and to
also effectively enumerate to userspace whether or not KVM may be
utilizing device posted IRQs. Disabling device posted interrupts is
very desirable for testing, and can even be desirable for production
environments, e.g. if the host kernel wants to interpose on device
interrupts.
Put the module param in kvm-{amd,intel}.ko instead of kvm.ko to match
the overall APICv/AVIC controls, and to avoid complications with said
controls. E.g. if the param is in kvm.ko, KVM needs to snapshot the
original user-defined value to play nice with a vendor module being
reloaded with different enable_apicv settings.
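A minimal sketch of such a knob, assuming the parameter name (the actual
name may differ) and that it is read-only after module load:

static bool enable_device_posted_irqs = true;
module_param(enable_device_posted_irqs, bool, 0444);
MODULE_PARM_DESC(enable_device_posted_irqs,
		 "Allow devices to post interrupts directly to vCPUs");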
Link: https://lore.kernel.org/r/20250401161804.842968-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rescan I/O APIC routes for a vCPU after handling an intercepted I/O APIC
EOI for an IRQ that is not targeting said vCPU, i.e. after handling what's
effectively a stale EOI VM-Exit. If a level-triggered IRQ is in-flight
when IRQ routing changes, e.g. because the guest changes routing from its
IRQ handler, then KVM intercepts EOIs on both the new and old target vCPUs,
so that the in-flight IRQ can be de-asserted when it's EOI'd.
However, only the EOI for the in-flight IRQ needs to be intercepted, as
IRQs on the same vector with the new routing are coincidental, i.e. occur
only if the guest is reusing the vector for multiple interrupt sources.
If the I/O APIC routes aren't rescanned, KVM will unnecessarily intercept
EOIs for the vector and negatively impact the vCPU's interrupt performance.
Note, both commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig
race") and commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI
and IOAPIC reconfigure race") mentioned this issue, but it was considered
a "rare" occurrence thus was not addressed. However in real environments,
this issue can happen even in a well-behaved guest.
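A rough sketch of the idea (the field name here is purely illustrative; only
the request/KVM_REQ_SCAN_IOAPIC mechanism is taken as given): when the
intercepted EOI turns out to be for the stale routing, queue a rescan so the
intercept bitmap gets recomputed:

	/* Illustrative: stale_eoi_vector is recorded when routing changes (reset at scan). */
	if (vector == vcpu->arch.stale_eoi_vector)
		kvm_make_request(KVM_REQ_SCAN_IOAPIC, vcpu);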
Cc: Kai Huang <kai.huang@intel.com>
Co-developed-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
Signed-off-by: xuyun <xuyun_xy.xy@linux.alibaba.com>
Signed-off-by: weizijie <zijie.wei@linux.alibaba.com>
[sean: massage changelog and comments, use int/-1, reset at scan]
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20250304013335.4155703-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The latest AMD platform has introduced a new instruction called PREFETCHI.
This instruction loads a cache line from a specified memory address into
the indicated data or instruction cache level, based on locality reference
hints.
Feature bit definition:
CPUID_Fn80000021_EAX [bit 20] - Indicates support for IC prefetch.
This feature is analogous to Intel's PREFETCHITI (CPUID.(EAX=7,ECX=1):EDX),
though the CPUID bit definitions differ between AMD and Intel.
Advertise support to userspace, as no additional enabling is necessary
(PREFETCHI can't be intercepted as there's no instruction-specific behavior
that needs to be virtualized).
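For reference, userspace can query the host bit directly with CPUID; a small
sketch using the GCC/Clang cpuid.h helpers:

#include <cpuid.h>
#include <stdbool.h>

static bool has_amd_prefetchi(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0x80000021, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return eax & (1u << 20);	/* CPUID_Fn80000021_EAX[20]: IC prefetch */
}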
The feature is documented in Processor Programming Reference (PPR)
for AMD Family 1Ah Model 02h, Revision C1 (Link below).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/ee1c08fc400bb574a2b8f2c6a0bd9def10a29d35.1744130533.git.babu.moger@amd.com
[sean: rewrite shortlog to highlight the KVM functionality]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rename the WRMSRNS instruction opcode macro so that it doesn't collide
with X86_FEATURE_WRMSRNS when using token pasting to generate references
to X86_FEATURE_WRMSRNS. KVM heavily uses token pasting to generate KVM's
set of supported feature bits, and adding WRMSRNS support in KVM will run
afoul of the opcode macro.
arch/x86/kvm/cpuid.c:719:37: error: pasting "X86_FEATURE_" and "" "" does not
give a valid preprocessing token
719 | u32 __leaf = __feature_leaf(X86_FEATURE_##name); \
| ^~~~~~~~~~~~
KVM has worked around one such collision in the past by #undef'ing the
problematic macro in order to avoid blocking a KVM rework, but such games
are generally undesirable, e.g. requires bleeding macro details into KVM,
risks weird behavior if what KVM is #undef'ing changes, etc.
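A simplified illustration of the collision (these are not the actual KVM
macros, and the opcode string is elided):

#define WRMSRNS			"opcode-bytes"	/* instruction opcode macro (old name) */

#define __FEATURE_LEAF(name)	__feature_leaf(X86_FEATURE_##name)
#define FEATURE_LEAF(name)	__FEATURE_LEAF(name)	/* extra expansion layer */

/*
 * FEATURE_LEAF(WRMSRNS) first expands the opcode macro to a string literal,
 * and only then pastes X86_FEATURE_ onto it, producing the invalid
 * preprocessing token error quoted above.
 */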
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20250227010111.3222742-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Per the APM [1]:
Some processors, identified by CPUID Fn8000_0008_EBX[IbrsSameMode]
(bit 19) = 1, provide additional speculation limits. For these
processors, when IBRS is set, indirect branch predictions are not
influenced by any prior indirect branches, regardless of mode (CPL
and guest/host) and regardless of whether the prior indirect branches
occurred before or after the setting of IBRS. This is referred to as
Same Mode IBRS.
Define this feature bit, which will be used by KVM to determine if an
IBPB is required on nested VM-exits in SVM.
[1] AMD64 Architecture Programmer's Manual Pub. 40332, Rev 4.08 - April
2024, Volume 2, 3.2.9 Speculation Control MSRs
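A sketch of what the definition might look like in cpufeatures.h, assuming
the Fn8000_0008 EBX flags still live in word 13 (the flag name is a guess,
not necessarily the one used):

#define X86_FEATURE_AMD_IBRS_SAME_MODE	(13*32+19) /* IBRS provides Same Mode protection */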
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250221163352.3818347-2-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
* Single fix for broken usage of 'multi-MIDR' infrastructure in PI
code, adding an open-coded erratum check for Cavium ThunderX
* Bugfixes from a planned posted interrupt rework
* Do not use kvm_rip_read() unconditionally to cater for guests
with inaccessible register state.
|
|
This commit breaks SNP guests:
234cf67fc3bd ("x86/sev: Split off startup code from core code")
The SNP guest boots, but no longer has access to the VMPCK keys needed
to communicate with the ASP, which is used, for example, to obtain an
attestation report.
The secrets_pa value is defined as static in both startup.c and
core.c. It is set by a function in startup.c and so when used in
core.c its value will be 0.
Share it again and add the sev_ prefix to put it into the global
SEV symbols namespace.
[ mingo: Renamed to sev_secrets_pa ]
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Link: https://lore.kernel.org/r/cf878810-81ed-3017-52c6-ce6aa41b5f01@amd.com
|
|
kvm_arch_has_irq_bypass() is a small function and even though it does
not appear in any *really* hot paths, it's also not entirely rare.
Make it inline---it also works out nicely in preparation for using it in
kvm-intel.ko and kvm-amd.ko, since the function is not currently exported.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Recently _pgd_alloc() was switched from using __get_free_pages() to
pagetable_alloc_noprof(), which might return a compound page in case
the allocation order is larger than 0.
On x86 this will be the case if CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
is set, even if PTI has been disabled at runtime.
When running as a Xen PV guest (this will always disable PTI), using
a compound page for a PGD will result in VM_BUG_ON_PGFLAGS being
triggered when the Xen code tries to pin the PGD.
Fix the Xen issue, together with the unneeded 8k allocation for a PGD
when PTI is disabled, by replacing PGD_ALLOCATION_ORDER with an inline
helper returning the needed order for PGD allocations.
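A minimal sketch of such a helper, assuming the runtime PTI state is what
decides the order (the helper name is illustrative):

static inline unsigned int pgd_allocation_order(void)
{
	/* Only a kernel with PTI enabled at runtime needs the 8k (order-1) PGD. */
	if (cpu_feature_enabled(X86_FEATURE_PTI))
		return 1;
	return 0;
}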
Fixes: a9b3c355c2e6 ("asm-generic: pgalloc: provide generic __pgd_{alloc,free}")
Reported-by: Petr Vaněk <arkamar@atlas.cz>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Petr Vaněk <arkamar@atlas.cz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250422131717.25724-1-jgross%40suse.com
|
|
Replace the open-coded ifdefs in C source files with IS_ENABLED().
This makes the code easier to read and lets the compiler type-check
the disabled parts as well, before optimizing them away.
To make this work, also remove the ifdefs from declarations of used
variables.
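The general shape of such a conversion (simplified, with an illustrative
function name, not an actual hunk from this patch):

/* Before: the disabled branch is invisible to the compiler. */
#ifdef CONFIG_X86_64
	setup_vdso_64(addr);
#endif

/* After: both branches are type-checked; the dead one is optimized away. */
if (IS_ENABLED(CONFIG_X86_64))
	setup_vdso_64(addr);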
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20240910-x86-vdso-ifdef-v1-1-877c9df9b081@linutronix.de
|
|
Now that all users have been moved into startup/ where PIC codegen is
used, RIP_REL_REF() is no longer needed. Remove it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250418141253.2601348-14-ardb+git@google.com
|
|
Now that the early SEV code is built with -fPIC, RIP_REL_REF() has no
effect and can be dropped.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250418141253.2601348-13-ardb+git@google.com
|
|
upstream fixes
In particular, we need this fix before applying subsequent changes:
d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Old microcode is bad for users and for kernel developers.
For users, it exposes them to known fixed security and/or functional
issues. These obviously rarely result in instant dumpster fires in
every environment. But it is as important to keep your microcode up
to date as it is to keep your kernel up to date.
Old microcode also makes kernels harder to debug. A developer looking
at an oops needs to consider kernel bugs, known CPU issues and unknown
CPU issues as possible causes. If they know the microcode is up to
date, they can mostly eliminate known CPU issues as the cause.
Make it easier to tell if CPU microcode is out of date. Add a list
of released microcode. If the loaded microcode is older than the
release, tell users in a place that folks can find it:
/sys/devices/system/cpu/vulnerabilities/old_microcode
Tell kernel developers about it with the existing taint
flag:
TAINT_CPU_OUT_OF_SPEC
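Roughly, the check boils down to something like this (a sketch with an
illustrative table lookup, not the actual implementation):

	/* latest_known_rev would come from the generated intel-ucode-defs.h table. */
	if (boot_cpu_data.microcode < latest_known_rev) {
		old_microcode = true;	/* reported as "Vulnerable" in sysfs */
		add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
	}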
== Discussion ==
When a user reports a potential kernel issue, it is very common
to ask them to reproduce the issue on mainline. Running mainline,
they will (independently from the distro) acquire a more up-to-date
microcode version list. If their microcode is old, they will
get a warning about the taint and kernel developers can take that
into consideration when debugging.
Just like any other entry in "vulnerabilities/", users are free to
make their own assessment of their exposure.
== Microcode Revision Discussion ==
The microcode versions in the table were generated from the Intel
microcode git repo:
8ac9378a8487 ("microcode-20241112 Release")
which as of this writing lags behind the latest microcode-20250211.
It can be argued that the versions that the kernel picks to call "old"
should be a revision or two old. Which specific version is picked is
less important to me than picking *a* version and enforcing it.
This repository contains only microcode versions that Intel has deemed
to be OS-loadable. It is quite possible that the BIOS has loaded a
newer microcode than the latest in this repo. If this happens, the
system is considered to have new microcode, not old.
Specifically, the sysfs file and taint flag answer the question:
Is the CPU running on the latest OS-loadable microcode,
or something even later that the BIOS loaded?
In other words, Intel never publishes an authoritative list of CPUs
and latest microcode revisions. Until it does, this is the best that
Linux can do.
Also note that the "intel-ucode-defs.h" file is simple, ugly and
has lots of magic numbers. That's on purpose and should allow a
single file to be shared across lots of stable kernels regardless of whether
they have the new "VFM" infrastructure or not. It was generated with
a dumb script.
== FAQ ==
Q: Does this tell me if my system is secure or insecure?
A: No. It only tells you if your microcode was old when the
system booted.
Q: Should the kernel warn if the microcode list itself is too old?
A: No. New kernels will get new microcode lists, both mainline
and stable. The only way to have an old list is to be running
an old kernel in which case you have bigger problems.
Q: Is this for security or functional issues?
A: Both.
Q: If a given microcode update only has functional problems but
no security issues, will it be considered old?
A: Yes. All microcode image versions within a microcode release
are treated identically. Intel appears to make security
updates without disclosing them in the release notes. Thus,
all updates are considered to be security-relevant.
Q: Who runs old microcode?
A: Anybody with an old distro. This happens all the time inside
of Intel where there are lots of weird systems in labs that
might not be getting regular distro updates and might also
be running rather exotic microcode images.
Q: If I update my microcode after booting will it stop saying
"Vulnerable"?
A: No. Just like all the other vulnerabilities, you need to
reboot before the kernel will reassess your vulnerability.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "Ahmed S. Darwish" <darwi@linutronix.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/20250421195659.CF426C07%40davehans-spike.ostc.intel.com
(cherry picked from commit 9127865b15eb0a1bd05ad7efe29489c44394bdc1)
|
|
Rename rep_nop() function to what it really does.
No functional change intended.
Suggested-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250418080805.83679-1-ubizjak@gmail.com
|
|
Current minimum required version of binutils is 2.25,
which supports PAUSE instruction mnemonic.
Replace "REP; NOP" with this proper mnemonic.
No functional change intended.
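The change itself is mechanical, roughly:

	asm volatile("rep; nop" ::: "memory");	/* before: "REP; NOP" spelling */
	asm volatile("pause" ::: "memory");	/* after: PAUSE mnemonic */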
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250418080805.83679-2-ubizjak@gmail.com
|
|
Minimum version of binutils required to compile the kernel is 2.25.
This version correctly handles the "rep" prefixes, so it is possible
to remove the semicolon, which was used to support ancient versions
of GNU as.
Due to the semicolon, the compiler considers "rep; insn" (or its
alternate "rep\n\tinsn" form) as two separate instructions. Removing
the semicolon makes asm length calculations more accurate, consequently
making scheduling and inlining decisions of the compiler more accurate.
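For example (illustrative snippet; the exact statements in the tree differ):

	asm volatile("rep; stosb");	/* counted as two instructions by the compiler */
	asm volatile("rep stosb");	/* counted as one */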
Removing the semicolon also enables assembler checks involving "rep"
prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in:
Error: invalid instruction `add' after `rep'
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@kernel.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20250418071437.4144391-2-ubizjak@gmail.com
|
|
All the users of SHARED_KERNEL_PMD are gone. Zap it.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173244.1125BEC3%40davehans-spike.ostc.intel.com
|
|
Each mm_struct has its own copy of the page tables. When core mm code
makes changes to a copy of the page tables those changes sometimes
need to be synchronized with other mms' copies of the page tables. But
when this synchronization actually needs to happen is highly
architecture and configuration specific.
In cases where kernel PMDs are shared across processes
(SHARED_KERNEL_PMD) the core mm does not itself need to do that
synchronization for kernel PMD changes. The x86 code communicates
this by clearing the PGTBL_PMD_MODIFIED bit in those configs to
avoid expensive synchronization.
The kernel is moving toward never sharing kernel PMDs on 32-bit.
Prepare for that and make 32-bit PAE always set PGTBL_PMD_MODIFIED,
even if there is no modification to synchronize. This obviously adds
some synchronization overhead in cases where the kernel page tables
are being changed.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20250414173237.EC790E95%40davehans-spike.ostc.intel.com
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The static key mmio_stale_data_clear controls the KVM-only mitigation for the
MMIO Stale Data vulnerability. Rename it to reflect its purpose.
No functional change.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250416-mmio-rename-v2-1-ad1f5488767c@linux.intel.com
|
|
With APX now secured against conflicting with MPX, it is ready to be enabled.
Include APX in the enabled xfeature set.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-5-chang.seok.bae@intel.com
|
|
Advanced Performance Extensions (APX) is associated with a new state
component number 19. To support saving and restoring of the corresponding
registers via the XSAVE mechanism, introduce the component definition
along with the necessary sanity checks.
Define the new component number, state name, and the corresponding register data
type. Then, extend the size checker to validate the register data type
and explicitly list the APX feature flag as a dependency for the new
component in xsave_cpuid_features[].
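A sketch of the shape of the new definitions; the layout below assumes the
APX extended GPRs (R16-R31) are sixteen 8-byte registers, and the names are
illustrative rather than the actual ones:

/* New xstate component number, as described above. */
#define XFEATURE_APX_EXAMPLE	19

struct apx_state_example {
	u64 egpr[16];		/* extended general-purpose registers */
};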
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-3-chang.seok.bae@intel.com
|
|
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated in CPUID leaf 7,
subleaf 1 (EDX). Since this CPUID sub-leaf does not yet have a dedicated
feature word, place the flag in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
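A userspace query could look roughly like this; the bit position comes from
the state component number 19 introduced for APX (a sketch, not a supported
example program):

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/prctl.h>		/* ARCH_GET_XCOMP_SUPP */

int main(void)
{
	unsigned long long supported = 0;

	if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &supported))
		return 1;
	printf("APX %ssupported\n", (supported & (1ULL << 19)) ? "" : "not ");
	return 0;
}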
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-2-chang.seok.bae@intel.com
|
|
Add missing header inclusions and protect against double inclusion.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Bartlett Lake has a P-core only product with Raptor Cove.
[ mingo: Switch around the define as pointed out by Christian Ludloff:
Raptor Cove is the core, Bartlett Lake is the product. ]
Signed-off-by: Pi Xiange <xiange.pi@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Christian Ludloff <ludloff@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: "Ahmed S. Darwish" <darwi@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250414032839.5368-1-xiange.pi@intel.com
|
|
Conflicts:
tools/arch/x86/include/asm/cpufeatures.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Shorten X86_FEATURE_AMD_HETEROGENEOUS_CORES to X86_FEATURE_AMD_HTR_CORES
to make the last column aligned consistently in the whole file.
No functional changes.
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250415175410.2944032-4-xin@zytor.com
|
|
Shorten X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT to
X86_FEATURE_CLEAR_BHB_VMEXIT to make the last column aligned
consistently in the whole file.
There's no need to explain in the name what the mitigation does.
No functional changes.
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250415175410.2944032-3-xin@zytor.com
|
|
It is a special file with special formatting, so fix one instance of
whitespace damage and format newer defines like the rest.
No functional changes.
[ Xin: Do the same to tools/arch/x86/include/asm/cpufeatures.h. ]
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250415175410.2944032-2-xin@zytor.com
|
|
Whack this thing because:
- the "unknown" handling is done only for this vuln and not for the
others
- it doesn't do anything besides reporting things differently. It
doesn't apply any mitigations - it is simply causing unnecessary
complications to the code which don't bring anything besides
maintenance overhead to what is already a very nasty spaghetti pile
- all the currently unaffected CPUs can also be in "unknown" status so
there's no need for special handling here
so get rid of it.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: David Kaplan <david.kaplan@amd.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/r/20250414150951.5345-1-bp@kernel.org
|
|
Align the backslashes vertically again, after recent cleanups.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ahmed S. Darwish <darwi@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250414094130.6768-1-bp@kernel.org
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-7-mingo@kernel.org
|
|
- There's no need for a newline after the SPDX line
- But there's a need for one before the closing header guard.
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Cc: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Link: https://lore.kernel.org/r/20250413084144.3746608-6-mingo@kernel.org
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Cc: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Link: https://lore.kernel.org/r/20250413084144.3746608-5-mingo@kernel.org
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-4-mingo@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-3-mingo@kernel.org
|
|
Collect AMD specific platform header files in <asm/amd/*.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-2-mingo@kernel.org
|
|
A few uses of 'fps' snuck in, which is rather confusing
(to me) as it suggests frames-per-second. ;-)
Rename them to the canonical 'fpstate' name.
No change in functionality.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250409211127.3544993-9-mingo@kernel.org
|
|
PF_KTHREAD tasks
init_task's FPU state initialization was a bit of a hack:
__x86_init_fpu_begin = .;
. = __x86_init_fpu_begin + 128*PAGE_SIZE;
__x86_init_fpu_end = .;
But the init task isn't supposed to be using the FPU context
in any case, so remove the hack and add in some debug warnings.
As Linus noted in the discussion, the init task (and other
PF_KTHREAD tasks) *can* use the FPU via kernel_fpu_begin()/_end(),
but they don't need the context area because their FPU use is not
preemptible or reentrant, and they don't return to user-space.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250409211127.3544993-8-mingo@kernel.org
|
|
This encapsulates the fpu__drop() functionality better, and it
will also enable other changes that want to check a task for
PF_KTHREAD before calling x86_task_fpu().
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250409211127.3544993-6-mingo@kernel.org
|
|
As suggested by Oleg, remove the thread::fpu pointer, as we can
calculate it via x86_task_fpu() at compile-time.
This improves code generation a bit:
kepler:~/tip> size vmlinux.before vmlinux.after
text data bss dec hex filename
26475405 10435342 1740804 38651551 24dc69f vmlinux.before
26475339 10959630 1216516 38651485 24dc65d vmlinux.after
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250409211127.3544993-5-mingo@kernel.org
|
|
Turn thread.fpu into a pointer. Since most FPU code internals work by passing
around the FPU pointer already, the code generation impact is small.
This allows us to remove the old kludge of task_struct being variable size:
struct task_struct {
...
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
*/
randomized_struct_fields_end
/* CPU-specific state of this task: */
struct thread_struct thread;
/*
* WARNING: on x86, 'thread_struct' contains a variable-sized
* structure. It *MUST* be at the end of 'task_struct'.
*
* Do not put anything below here!
*/
};
... which creates a number of problems, such as requiring thread_struct to be
the last member of the struct - not allowing it to be struct-randomized, etc.
But the primary motivation is to allow the decoupling of task_struct from
hardware details (<asm/processor.h> in particular), and to eventually allow
the per-task infrastructure:
DECLARE_PER_TASK(type, name);
...
per_task(current, name) = val;
... which requires task_struct to be a constant size struct.
The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
now that the FPU structure is not embedded in the task struct anymore, which
reduces text footprint a bit.
Fixed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250409211127.3544993-4-mingo@kernel.org
|
|
This will make the removal of the task_struct::thread.fpu array
easier.
No change in functionality - code generated before and after this
commit is identical on x86-defconfig:
kepler:~/tip> diff -up vmlinux.before.asm vmlinux.after.asm
kepler:~/tip>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250409211127.3544993-3-mingo@kernel.org
|
|
The per-task FPU context/save area is allocated right
next to task_struct, currently in a variable-size
array via task_struct::thread.fpu[], but we plan to
fully hide it from the C type scope.
Introduce the x86_task_fpu() accessor that gets to the
FPU context pointer explicitly from the task pointer.
Right now this is a simple (task)->thread.fpu wrapper.
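At this point the accessor is just a thin wrapper matching the description
above, roughly:

#define x86_task_fpu(task)	(&(task)->thread.fpu)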
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250409211127.3544993-2-mingo@kernel.org
|
|
Use asm_inline() to instruct the compiler that the size of asm()
is the minimum size of one instruction, ignoring how many instructions
the compiler thinks it is. The ALTERNATIVE macro, which expands to
several pseudo directives, causes the instruction length estimate to
count more than 20 instructions.
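The pattern is roughly the following (illustrative operands, not the actual
untagged_addr() code):

	/* Before: the asm() is sized by the full ALTERNATIVE expansion. */
	asm (ALTERNATIVE("", "some-insn %[addr]", X86_FEATURE_LAM)
	     : [addr] "+r" (addr));

	/* After: asm_inline() sizes it as a single instruction. */
	asm_inline (ALTERNATIVE("", "some-insn %[addr]", X86_FEATURE_LAM)
		    : [addr] "+r" (addr));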
bloat-o-meter reports minimal code size increase
(x86_64 defconfig with CONFIG_ADDRESS_MASKING, gcc-14.2.1):
add/remove: 2/2 grow/shrink: 5/1 up/down: 2365/-1995 (370)
Function old new delta
-----------------------------------------------------
do_get_mempolicy - 1449 +1449
copy_nodes_to_user - 226 +226
__x64_sys_get_mempolicy 35 213 +178
syscall_user_dispatch_set_config 157 332 +175
__ia32_sys_get_mempolicy 31 206 +175
set_syscall_user_dispatch 29 181 +152
__do_sys_mremap 2073 2083 +10
sp_insert 133 117 -16
task_set_syscall_user_dispatch 172 - -172
kernel_get_mempolicy 1807 - -1807
Total: Before=21423151, After=21423521, chg +0.00%
The code size increase is due to the compiler inlining
more functions that inline untagged_addr(), e.g.:
task_set_syscall_user_dispatch() is now fully inlined in
set_syscall_user_dispatch():
000000000010b7e0 <set_syscall_user_dispatch>:
10b7e0: f3 0f 1e fa endbr64
10b7e4: 49 89 c8 mov %rcx,%r8
10b7e7: 48 89 d1 mov %rdx,%rcx
10b7ea: 48 89 f2 mov %rsi,%rdx
10b7ed: 48 89 fe mov %rdi,%rsi
10b7f0: 65 48 8b 3d 00 00 00 mov %gs:0x0(%rip),%rdi
10b7f7: 00
10b7f8: e9 03 fe ff ff jmp 10b600 <task_set_syscall_user_dispatch>
that after inlining becomes:
000000000010b730 <set_syscall_user_dispatch>:
10b730: f3 0f 1e fa endbr64
10b734: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax
10b73b: 00
10b73c: 48 85 ff test %rdi,%rdi
10b73f: 74 54 je 10b795 <set_syscall_user_dispatch+0x65>
10b741: 48 83 ff 01 cmp $0x1,%rdi
10b745: 74 06 je 10b74d <set_syscall_user_dispatch+0x1d>
10b747: b8 ea ff ff ff mov $0xffffffea,%eax
10b74c: c3 ret
10b74d: 48 85 f6 test %rsi,%rsi
10b750: 75 7b jne 10b7cd <set_syscall_user_dispatch+0x9d>
10b752: 48 85 c9 test %rcx,%rcx
10b755: 74 1a je 10b771 <set_syscall_user_dispatch+0x41>
10b757: 48 89 cf mov %rcx,%rdi
10b75a: 49 b8 ef cd ab 89 67 movabs $0x123456789abcdef,%r8
10b761: 45 23 01
10b764: 90 nop
10b765: 90 nop
10b766: 90 nop
10b767: 90 nop
10b768: 90 nop
10b769: 90 nop
10b76a: 90 nop
10b76b: 90 nop
10b76c: 49 39 f8 cmp %rdi,%r8
10b76f: 72 6e jb 10b7df <set_syscall_user_dispatch+0xaf>
10b771: 48 89 88 48 08 00 00 mov %rcx,0x848(%rax)
10b778: 48 89 b0 50 08 00 00 mov %rsi,0x850(%rax)
10b77f: 48 89 90 58 08 00 00 mov %rdx,0x858(%rax)
10b786: c6 80 60 08 00 00 00 movb $0x0,0x860(%rax)
10b78d: f0 80 48 08 20 lock orb $0x20,0x8(%rax)
10b792: 31 c0 xor %eax,%eax
10b794: c3 ret
10b795: 48 09 d1 or %rdx,%rcx
10b798: 48 09 f1 or %rsi,%rcx
10b79b: 75 aa jne 10b747 <set_syscall_user_dispatch+0x17>
10b79d: 48 c7 80 48 08 00 00 movq $0x0,0x848(%rax)
10b7a4: 00 00 00 00
10b7a8: 48 c7 80 50 08 00 00 movq $0x0,0x850(%rax)
10b7af: 00 00 00 00
10b7b3: 48 c7 80 58 08 00 00 movq $0x0,0x858(%rax)
10b7ba: 00 00 00 00
10b7be: c6 80 60 08 00 00 00 movb $0x0,0x860(%rax)
10b7c5: f0 80 60 08 df lock andb $0xdf,0x8(%rax)
10b7ca: 31 c0 xor %eax,%eax
10b7cc: c3 ret
10b7cd: 48 8d 3c 16 lea (%rsi,%rdx,1),%rdi
10b7d1: 48 39 fe cmp %rdi,%rsi
10b7d4: 0f 82 78 ff ff ff jb 10b752 <set_syscall_user_dispatch+0x22>
10b7da: e9 68 ff ff ff jmp 10b747 <set_syscall_user_dispatch+0x17>
10b7df: b8 f2 ff ff ff mov $0xfffffff2,%eax
10b7e4: c3 ret
Please note a series of NOPs that get replaced with an alternative:
11f0: 65 48 23 05 00 00 00 and %gs:0x0(%rip),%rax
11f7: 00
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250407072129.33440-1-ubizjak@gmail.com
|
|
To reduce the impact of the API renames in -next, add compatibility
wrappers for the two most popular MSR access APIs: rdmsrl() and wrmsrl().
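Assuming the renamed APIs are the rdmsrq()/wrmsrq() forms, the wrappers are
just trivial aliases, e.g.:

#define rdmsrl(msr, val)	rdmsrq(msr, val)
#define wrmsrl(msr, val)	wrmsrq(msr, val)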
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
|
|
Since objtool's inception, frame pointer warnings have been manually
silenced for __arch_hweight*() to allow those functions' inline asm to
avoid using ASM_CALL_CONSTRAINT.
The potentially dubious reasoning for that decision over nine years ago
was that since !X86_FEATURE_POPCNT is exceedingly rare, it's not worth
hurting the code layout for a function call that will never happen on
the vast majority of systems.
However, those functions actually started using ASM_CALL_CONSTRAINT with
the following commit:
194a613088a8 ("x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()")
And rightfully so, as it makes the code correct. ASM_CALL_CONSTRAINT
will soon have no effect for non-FP configs anyway.
With ASM_CALL_CONSTRAINT in place, ANNOTATE_IGNORE_ALTERNATIVE no longer
has a purpose for the hweight functions. Remove it.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/e7070dba3278c90f1a836b16157dcd34ccd21e21.1744318586.git.jpoimboe@kernel.org
|
|
Refer __percpu_prefix to __force_percpu_prefix to avoid duplicate
definition. While there, slightly reorder definitions to a more
logical sequence, remove unneeded double quotes and move misplaced
comment to the right place.
No functional changes intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250411093130.81389-1-ubizjak@gmail.com
|