summaryrefslogtreecommitdiff
path: root/arch/x86/coco/sev
AgeCommit message (Collapse)Author
10 daysx86/sev: Work around broken noinstr on GCCArd Biesheuvel
Forcibly disable KCSAN for the sev-nmi.c source file, which only contains functions annotated as 'noinstr' but is emitted with calls to KCSAN instrumentation nonetheless. E.g., vmlinux.o: error: objtool: __sev_es_nmi_complete+0x58: call to __kcsan_check_access() leaves .noinstr.text section make[2]: *** [/usr/local/google/home/ardb/linux/scripts/Makefile.vmlinux_o:72: vmlinux.o] Error 1 Fixes: a3cbbb4717e1 ("x86/boot: Move SEV startup code into startup/") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/20250714073402.4107091-2-ardb+git@google.com
2025-07-01x86/sev: Use TSC_FACTOR for Secure TSC frequency calculationNikunj A Dadhania
When using Secure TSC, the GUEST_TSC_FREQ MSR reports a frequency based on the nominal P0 frequency, which deviates slightly (typically ~0.2%) from the actual mean TSC frequency due to clocking parameters. Over extended VM uptime, this discrepancy accumulates, causing clock skew between the hypervisor and a SEV-SNP VM, leading to early timer interrupts as perceived by the guest. The guest kernel relies on the reported nominal frequency for TSC-based timekeeping, while the actual frequency set during SNP_LAUNCH_START may differ. This mismatch results in inaccurate time calculations, causing the guest to perceive hrtimers as firing earlier than expected. Utilize the TSC_FACTOR from the SEV firmware's secrets page (see "Secrets Page Format" in the SNP Firmware ABI Specification) to calculate the mean TSC frequency, ensuring accurate timekeeping and mitigating clock skew in SEV-SNP VMs. Use early_ioremap_encrypted() to map the secrets page as ioremap_encrypted() uses kmalloc() which is not available during early TSC initialization and causes a panic. [ bp: Drop the silly dummy var: https://lore.kernel.org/r/20250630192726.GBaGLlHl84xIopx4Pt@fat_crate.local ] Fixes: 73bbf3b0fbba ("x86/tsc: Init the TSC for Secure TSC guests") Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250630081858.485187-1-nikunj@amd.com
2025-06-03Merge tag 'hyperv-next-signed-20250602' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel) - Fixes for Hyper-V UIO driver (Long Li) - Fixes for Hyper-V PCI driver (Michael Kelley) - Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley) - Documentation updates for Hyper-V VMBus (Michael Kelley) - Enhance logging for hv_kvp_daemon (Shradha Gupta) * tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits) Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir Documentation: hyperv: Update VMBus doc with new features and info PCI: hv: Remove unnecessary flex array in struct pci_packet Drivers: hv: Remove hv_alloc/free_* helpers Drivers: hv: Use kzalloc for panic page allocation uio_hv_generic: Align ring size to system page uio_hv_generic: Use correct size for interrupt and monitor pages Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary arch/x86: Provide the CPU number in the wakeup AP callback x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap() PCI: hv: Get vPCI MSI IRQ domain from DeviceTree ACPI: irq: Introduce acpi_get_gsi_dispatcher() Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device() Drivers: hv: vmbus: Get the IRQ number from DeviceTree dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties arm64, x86: hyperv: Report the VTL the system boots in arm64: hyperv: Initialize the Virtual Trust Level field Drivers: hv: Provide arch-neutral implementation of get_vtl() Drivers: hv: Enable VTL mode for arm64 ...
2025-05-27Merge tag 'x86_sev_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull AMD SEV update from Borislav Petkov: "Add a virtual TPM driver glue which allows a guest kernel to talk to a TPM device emulated by a Secure VM Service Module (SVSM) - a helper module of sorts which runs at a different privilege level in the SEV-SNP VM stack. The intent being that a TPM device is emulated by a trusted entity and not by the untrusted host which is the default assumption in the confidential computing scenarios" * tag 'x86_sev_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Register tpm-svsm platform device tpm: Add SNP SVSM vTPM driver svsm: Add header with SVSM_VTPM_CMD helpers x86/sev: Add SVSM vTPM probe/send_command functions
2025-05-23arch/x86: Provide the CPU number in the wakeup AP callbackRoman Kisel
When starting APs, confidential guests and paravisor guests need to know the CPU number, and the pattern of using the linear search has emerged in several places. With N processors that leads to the O(N^2) time complexity. Provide the CPU number in the AP wake up callback so that one can get the CPU number in constant time. Suggested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250507182227.7421-3-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250507182227.7421-3-romank@linux.microsoft.com>
2025-05-21Merge tag 'v6.15-rc7' into x86/core, to pick up fixesIngo Molnar
Pick up build fixes from upstream to make this tree more testable. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-15x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID headerAhmed S. Darwish
The main CPUID header <asm/cpuid.h> was originally a storefront for the headers: <asm/cpuid/api.h> <asm/cpuid/leaf_0x2_api.h> Now that the latter CPUID(0x2) header has been merged into the former, there is no practical difference between <asm/cpuid.h> and <asm/cpuid/api.h>. Migrate all users to the <asm/cpuid/api.h> header, in preparation of the removal of <asm/cpuid.h>. Don't remove <asm/cpuid.h> just yet, in case some new code in -next started using it. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-3-darwi@linutronix.de
2025-05-13x86/sev: Make sure pages are not skipped during kdumpAshish Kalra
When shared pages are being converted to private during kdump, additional checks are performed. They include handling the case of a GHCB page being contained within a huge page. Currently, this check incorrectly skips a page just below the GHCB page from being transitioned back to private during kdump preparation. This skipped page causes a 0x404 #VC exception when it is accessed later while dumping guest memory for vmcore generation. Correct the range to be checked for GHCB contained in a huge page. Also, ensure that the skipped huge page containing the GHCB page is transitioned back to private by applying the correct address mask later when changing GHCBs to private at end of kdump preparation. [ bp: Massage commit message. ] Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec") Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250506183529.289549-1-Ashish.Kalra@amd.com
2025-05-13x86/sev: Do not touch VMSA pages during SNP guest memory kdumpAshish Kalra
When kdump is running makedumpfile to generate vmcore and dump SNP guest memory it touches the VMSA page of the vCPU executing kdump. It then results in unrecoverable #NPF/RMP faults as the VMSA page is marked busy/in-use when the vCPU is running and subsequently a causes guest softlockup/hang. Additionally, other APs may be halted in guest mode and their VMSA pages are marked busy and touching these VMSA pages during guest memory dump will also cause #NPF. Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out of guest mode and then clear the VMSA bit on their VMSA pages. If the vCPU running kdump is an AP, mark it's VMSA page as offline to ensure that makedumpfile excludes that page while dumping guest memory. Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec") Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250428214151.155464-1-Ashish.Kalra@amd.com
2025-05-13Merge branch 'x86/msr' into x86/core, to resolve conflictsIngo Molnar
Conflicts: arch/x86/boot/startup/sme.c arch/x86/coco/sev/core.c arch/x86/kernel/fpu/core.c arch/x86/kernel/fpu/xstate.c Semantic conflict: arch/x86/include/asm/sev-internal.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-05x86/sev: Disentangle #VC handling code from startup codeArd Biesheuvel
Most of the SEV support code used to reside in a single C source file that was included in two places: the core kernel, and the decompressor. The code that is actually shared with the decompressor was moved into a separate, shared source file under startup/, on the basis that the decompressor also executes from the early 1:1 mapping of memory. However, while the elaborate #VC handling and instruction decoding that it involves is also performed by the decompressor, it does not actually occur in the core kernel at early boot, and therefore, does not need to be part of the confined early startup code. So split off the #VC handling code and move it back into arch/x86/coco where it came from, into another C source file that is included from both the decompressor and the core kernel. Code movement only - no functional change intended. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250504095230.2932860-31-ardb+git@google.com
2025-05-02x86/msr: Convert __rdmsr() uses to native_rdmsrq() usesXin Li (Intel)
__rdmsr() is the lowest level MSR write API, with native_rdmsr() and native_rdmsrq() serving as higher-level wrappers around it. #define native_rdmsr(msr, val1, val2) \ do { \ u64 __val = __rdmsr((msr)); \ (void)((val1) = (u32)__val); \ (void)((val2) = (u32)(__val >> 32)); \ } while (0) static __always_inline u64 native_rdmsrq(u32 msr) { return __rdmsr(msr); } However, __rdmsr() continues to be utilized in various locations. MSR APIs are designed for different scenarios, such as native or pvops, with or without trace, and safe or non-safe. Unfortunately, the current MSR API names do not adequately reflect these factors, making it challenging to select the most appropriate API for various situations. To pave the way for improving MSR API names, convert __rdmsr() uses to native_rdmsrq() to ensure consistent usage. Later, these APIs can be renamed to better reflect their implications, such as native or pvops, with or without trace, and safe or non-safe. No functional change intended. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-10-xin@zytor.com
2025-05-02x86/msr: Add explicit includes of <asm/msr.h>Xin Li (Intel)
For historic reasons there are some TSC-related functions in the <asm/msr.h> header, even though there's an <asm/tsc.h> header. To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h> and to eventually eliminate the inclusion of <asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency to the source files that reference definitions from <asm/msr.h>. [ mingo: Clarified the changelog. ] Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
2025-04-24x86/sev: Share the sev_secrets_pa value againTom Lendacky
This commits breaks SNP guests: 234cf67fc3bd ("x86/sev: Split off startup code from core code") The SNP guest boots, but no longer has access to the VMPCK keys needed to communicate with the ASP, which is used, for example, to obtain an attestation report. The secrets_pa value is defined as static in both startup.c and core.c. It is set by a function in startup.c and so when used in core.c its value will be 0. Share it again and add the sev_ prefix to put it into the global SEV symbols namespace. [ mingo: Renamed to sev_secrets_pa ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Link: https://lore.kernel.org/r/cf878810-81ed-3017-52c6-ce6aa41b5f01@amd.com
2025-04-22x86/boot: Move SEV startup code into startup/Ard Biesheuvel
Move the SEV startup code into arch/x86/boot/startup/, where it will reside along with other code that executes extremely early, and therefore needs to be built in a special manner. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-12-ardb+git@google.com
2025-04-22x86/sev: Split off startup code from core codeArd Biesheuvel
Disentangle the SEV core code and the SEV code that is called during early boot. The latter piece will be moved into startup/ in a subsequent patch. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-11-ardb+git@google.com
2025-04-22x86/sev: Move noinstr NMI handling code into separate source fileArd Biesheuvel
GCC may ignore the __no_sanitize_address function attribute when inlining, resulting in KASAN instrumentation in code tagged as 'noinstr'. Move the SEV NMI handling code, which is noinstr, into a separate source file so KASAN can be disabled on the whole file without losing coverage of other SEV core code, once the startup code is split off from it too. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250418141253.2601348-10-ardb+git@google.com
2025-04-12x86/sev: Prepare for splitting off early SEV codeArd Biesheuvel
Prepare for splitting off parts of the SEV core.c source file into a file that carries code that must tolerate being called from the early 1:1 mapping. This will allow special build-time handling of thise code, to ensure that it gets generated in a way that is compatible with the early execution context. So create a de-facto internal SEV API and put the definitions into sev-internal.h. No attempt is made to allow this header file to be included in arbitrary other sources - this is explicitly not the intent. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-20-ardb+git@google.com
2025-04-12x86/asm: Make rip_rel_ptr() usable from fPIC codeArd Biesheuvel
RIP_REL_REF() is used in non-PIC C code that is called very early, before the kernel virtual mapping is up, which is the mapping that the linker expects. It is currently used in two different ways: - to refer to the value of a global variable, including as an lvalue in assignments; - to take the address of a global variable via the mapping that the code currently executes at. The former case is only needed in non-PIC code, as PIC code will never use absolute symbol references when the address of the symbol is not being used. But taking the address of a variable in PIC code may still require extra care, as a stack allocated struct assignment may be emitted as a memcpy() from a statically allocated copy in .rodata. For instance, this void startup_64_setup_gdt_idt(void) { struct desc_ptr startup_gdt_descr = { .address = (__force unsigned long)gdt_page.gdt, .size = GDT_SIZE - 1, }; may result in an absolute symbol reference in PIC code, even though the struct is allocated on the stack and populated at runtime. To address this case, make rip_rel_ptr() accessible in PIC code, and update any existing uses where the address of a global variable is taken using RIP_REL_REF. Once all code of this nature has been moved into arch/x86/boot/startup and built with -fPIC, RIP_REL_REF() can be retired, and only rip_rel_ptr() will remain. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-14-ardb+git@google.com
2025-04-10x86/sev: Register tpm-svsm platform deviceStefano Garzarella
SNP platform can provide a vTPM device emulated by SVSM. The "tpm-svsm" device can be handled by the platform driver registered by the x86/sev core code. Register the platform device only when SVSM is available and it supports vTPM commands as checked by snp_svsm_vtpm_probe(). [ bp: Massage commit message. ] Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/r/20250410135118.133240-5-sgarzare@redhat.com
2025-04-10x86/sev: Add SVSM vTPM probe/send_command functionsStefano Garzarella
Add two new functions to probe and send commands to the SVSM vTPM. They leverage the two calls defined by the AMD SVSM specification [1] for the vTPM protocol: SVSM_VTPM_QUERY and SVSM_VTPM_CMD. Expose snp_svsm_vtpm_send_command() to be used by a TPM driver. [1] "Secure VM Service Module for SEV-SNP Guests" Publication # 58019 Revision: 1.00 [ bp: Some doc touchups. ] Co-developed-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Co-developed-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/r/20250403100943.120738-2-sgarzare@redhat.com
2025-04-10x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'Ingo Molnar
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-24Merge tag 'x86-sev-2025-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Ingo Molnar: - Improve sme_enable() PIC build robustness (Kevin Loughlin) - Simplify vc_handle_msr() a bit (Peng Hao) [ Just reminding myself and everybody else about the endless stream of x86 TLAs: "SEV" is AMD's Secure Encrypted Virtualization - Linus ] * tag 'x86-sev-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Simplify the code by removing unnecessary 'else' statement x86/sev: Add missing RIP_REL_REF() invocations during sme_enable()
2025-03-22tracing: Disable branch profiling in noinstr codeJosh Poimboeuf
CONFIG_TRACE_BRANCH_PROFILING inserts a call to ftrace_likely_update() for each use of likely() or unlikely(). That breaks noinstr rules if the affected function is annotated as noinstr. Disable branch profiling for files with noinstr functions. In addition to some individual files, this also includes the entire arch/x86 subtree, as well as the kernel/entry, drivers/cpuidle, and drivers/idle directories, all of which are noinstr-heavy. Due to the nature of how sched binaries are built by combining multiple .c files into one, branch profiling is disabled more broadly across the sched code than would otherwise be needed. This fixes many warnings like the following: vmlinux.o: warning: objtool: do_syscall_64+0x40: call to ftrace_likely_update() leaves .noinstr.text section vmlinux.o: warning: objtool: __rdgsbase_inactive+0x33: call to ftrace_likely_update() leaves .noinstr.text section vmlinux.o: warning: objtool: handle_bug.isra.0+0x198: call to ftrace_likely_update() leaves .noinstr.text section ... Reported-by: Ingo Molnar <mingo@kernel.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/fb94fc9303d48a5ed370498f54500cc4c338eb6d.1742586676.git.jpoimboe@kernel.org
2025-03-17x86/sev: Simplify the code by removing unnecessary 'else' statementPeng Hao
No need for an 'else' statement after a 'return'. [ mingo: Clarified the changelog ] Signed-off-by: Peng Hao <flyingpeng@tencent.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org
2025-03-07virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutexAlexey Kardashevskiy
Compared to the SNP Guest Request, the "Extended" version adds data pages for receiving certificates. If not enough pages provided, the HV can report to the VM how much is needed so the VM can reallocate and repeat. Commit ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex") moved handling of the allocated/desired pages number out of scope of said mutex and create a possibility for a race (multiple instances trying to trigger Extended request in a VM) as there is just one instance of snp_msg_desc per /dev/sev-guest and no locking other than snp_cmd_mutex. Fix the issue by moving the data blob/size and the GHCB input struct (snp_req_data) into snp_guest_req which is allocated on stack now and accessed by the GHCB caller under that mutex. Stop allocating SEV_FW_BLOB_MAX_SIZE in snp_msg_alloc() as only one of four callers needs it. Free the received blob in get_ext_report() right after it is copied to the userspace. Possible future users of snp_send_guest_request() are likely to have different ideas about the buffer size anyways. Fixes: ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex") Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250307013700.437505-3-aik@amd.com
2025-01-28x86/sev: Disable jump tables in SEV startup codeArd Biesheuvel
When retpolines and IBT are both disabled, the compiler is free to use jump tables to optimize switch instructions. However, these are emitted by Clang as absolute references into .rodata: jmp *-0x7dfffe90(,%r9,8) R_X86_64_32S .rodata+0x170 Given that this code will execute before that address in .rodata has even been mapped, it is guaranteed to crash a SEV-SNP guest in a way that is difficult to diagnose. So disable jump tables when building this code. It would be better if we could attach this annotation to the __head macro but this appears to be impossible. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250127114334.1045857-6-ardb+git@google.com
2025-01-26Merge tag 'mm-stable-2025-01-26-14-59' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ...
2025-01-25mm/memblock: add memblock_alloc_or_panic interfaceGuo Weikang
Before SLUB initialization, various subsystems used memblock_alloc to allocate memory. In most cases, when memory allocation fails, an immediate panic is required. To simplify this behavior and reduce repetitive checks, introduce `memblock_alloc_or_panic`. This function ensures that memory allocation failures result in a panic automatically, improving code readability and consistency across subsystems that require this behavior. [guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic] Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/ Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-24Merge tag 'x86-boot-2025-01-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: - A large and involved preparatory series to pave the way to add exception handling for relocate_kernel - which will be a debugging facility that has aided in the field to debug an exceptionally hard to debug early boot bug. Plus assorted cleanups and fixes that were discovered along the way, by David Woodhouse: - Clean up and document register use in relocate_kernel_64.S - Use named labels in swap_pages in relocate_kernel_64.S - Only swap pages for ::preserve_context mode - Allocate PGD for x86_64 transition page tables separately - Copy control page into place in machine_kexec_prepare() - Invoke copy of relocate_kernel() instead of the original - Move relocate_kernel to kernel .data section - Add data section to relocate_kernel - Drop page_list argument from relocate_kernel() - Eliminate writes through kernel mapping of relocate_kernel page - Clean up register usage in relocate_kernel() - Mark relocate_kernel page as ROX instead of RWX - Disable global pages before writing to control page - Ensure preserve_context flag is set on return to kernel - Use correct swap page in swap_pages function - Fix stack and handling of re-entry point for ::preserve_context - Mark machine_kexec() with __nocfi - Cope with relocate_kernel() not being at the start of the page - Use typedef for relocate_kernel_fn function prototype - Fix location of relocate_kernel with -ffunction-sections (fix by Nathan Chancellor) - A series to remove the last remaining absolute symbol references from .head.text, and enforce this at build time, by Ard Biesheuvel: - Avoid WARN()s and panic()s in early boot code - Don't hang but terminate on failure to remap SVSM CA - Determine VA/PA offset before entering C code - Avoid intentional absolute symbol references in .head.text - Disable UBSAN in early boot code - Move ENTRY_TEXT to the start of the image - Move .head.text into its own output section - Reject absolute references in .head.text - The above build-time enforcement uncovered a handful of bugs of essentially non-working code, and a wrokaround for a toolchain bug, fixed by Ard Biesheuvel as well: - Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12 - Disable UBSAN on SEV code that may execute very early - Disable ftrace branch profiling in SEV startup code - And miscellaneous cleanups: - kexec_core: Add and update comments regarding the KEXEC_JUMP flow (Rafael J. Wysocki) - x86/sysfs: Constify 'struct bin_attribute' (Thomas Weißschuh)" * tag 'x86-boot-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) x86/sev: Disable ftrace branch profiling in SEV startup code x86/kexec: Use typedef for relocate_kernel_fn function prototype x86/kexec: Cope with relocate_kernel() not being at the start of the page kexec_core: Add and update comments regarding the KEXEC_JUMP flow x86/kexec: Mark machine_kexec() with __nocfi x86/kexec: Fix location of relocate_kernel with -ffunction-sections x86/kexec: Fix stack and handling of re-entry point for ::preserve_context x86/kexec: Use correct swap page in swap_pages function x86/kexec: Ensure preserve_context flag is set on return to kernel x86/kexec: Disable global pages before writing to control page x86/sev: Don't hang but terminate on failure to remap SVSM CA x86/sev: Disable UBSAN on SEV code that may execute very early x86/boot/64: Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12 x86/sysfs: Constify 'struct bin_attribute' x86/kexec: Mark relocate_kernel page as ROX instead of RWX x86/kexec: Clean up register usage in relocate_kernel() x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page x86/kexec: Drop page_list argument from relocate_kernel() x86/kexec: Add data section to relocate_kernel x86/kexec: Move relocate_kernel to kernel .data section ...
2025-01-14x86/sev: Disable ftrace branch profiling in SEV startup codeArd Biesheuvel
Ftrace branch profiling inserts absolute references to its metadata at call sites, and this implies that this kind of instrumentation cannot be used while executing from the 1:1 mapping of memory. Therefore, disable ftrace branch profiling in the SEV startup routines, by disabling it for the entire SEV core source file. Closes: https://lore.kernel.org/oe-kbuild-all/202501072244.zZrx9864-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250107151826.820147-2-ardb+git@google.com
2025-01-08x86/tsc: Init the TSC for Secure TSC guestsNikunj A Dadhania
Use the GUEST_TSC_FREQ MSR to discover the TSC frequency instead of relying on kvm-clock based frequency calibration. Override both CPU and TSC frequency calibration callbacks with securetsc_get_tsc_khz(). Since the difference between CPU base and TSC frequency does not apply in this case, the same callback is being used. [ bp: Carve out from https://lore.kernel.org/r/20250106124633.1418972-11-nikunj@amd.com ] Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250106124633.1418972-11-nikunj@amd.com
2025-01-07x86/sev: Prevent RDTSC/RDTSCP interception for Secure TSC enabled guestsNikunj A Dadhania
The hypervisor should not be intercepting RDTSC/RDTSCP when Secure TSC is enabled. A #VC exception will be generated if the RDTSC/RDTSCP instructions are being intercepted. If this should occur and Secure TSC is enabled, guest execution should be terminated as the guest cannot rely on the TSC value provided by the hypervisor. Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Peter Gonda <pgonda@google.com> Link: https://lore.kernel.org/r/20250106124633.1418972-9-nikunj@amd.com
2025-01-07x86/sev: Prevent GUEST_TSC_FREQ MSR interception for Secure TSC enabled guestsNikunj A Dadhania
The hypervisor should not be intercepting GUEST_TSC_FREQ MSR(0xcOO10134) when Secure TSC is enabled. A #VC exception will be generated otherwise. If this should occur and Secure TSC is enabled, terminate guest execution. Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250106124633.1418972-8-nikunj@amd.com
2025-01-07x86/sev: Change TSC MSR behavior for Secure TSC enabled guestsNikunj A Dadhania
Secure TSC enabled guests should not write to the MSR_IA32_TSC (0x10) register as the subsequent TSC value reads are undefined. On AMD, MSR_IA32_TSC is intercepted by the hypervisor by default. MSR_IA32_TSC read/write accesses should not exit to the hypervisor for such guests. Accesses to MSR_IA32_TSC need special handling in the #VC handler for the guests with Secure TSC enabled. Writes to MSR_IA32_TSC should be ignored and flagged once with a warning, and reads of MSR_IA32_TSC should return the result of the RDTSC instruction. [ bp: Massage commit message. ] Suggested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250106124633.1418972-7-nikunj@amd.com
2025-01-07x86/sev: Add Secure TSC support for SNP guestsNikunj A Dadhania
Add support for Secure TSC in SNP-enabled guests. Secure TSC allows guests to securely use RDTSC/RDTSCP instructions, ensuring that the parameters used cannot be altered by the hypervisor once the guest is launched. Secure TSC-enabled guests need to query TSC information from the AMD Security Processor. This communication channel is encrypted between the AMD Security Processor and the guest, with the hypervisor acting merely as a conduit to deliver the guest messages to the AMD Security Processor. Each message is protected with AEAD (AES-256 GCM). [ bp: Zap a stray newline over amd_cc_platform_has() while at it, simplify CC_ATTR_GUEST_SNP_SECURE_TSC check ] Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250106124633.1418972-6-nikunj@amd.com
2025-01-07x86/sev: Don't hang but terminate on failure to remap SVSM CAArd Biesheuvel
Commit 09d35045cd0f ("x86/sev: Avoid WARN()s and panic()s in early boot code") replaced a panic() that could potentially hit before the kernel is even mapped with a deadloop, to ensure that execution does not proceed when the condition in question hits. As Tom suggests, it is better to terminate and return to the hypervisor in this case, using a newly invented failure code to describe the failure condition. Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/all/9ce88603-20ca-e644-2d8a-aeeaf79cde69@amd.com
2025-01-07x86/sev: Relocate SNP guest messaging routines to common codeNikunj A Dadhania
At present, the SEV guest driver exclusively handles SNP guest messaging. All routines for sending guest messages are embedded within it. To support Secure TSC, SEV-SNP guests must communicate with the AMD Security Processor during early boot. However, these guest messaging functions are not accessible during early boot since they are currently part of the guest driver. Hence, relocate the core SNP guest messaging functions to SEV common code and provide an API for sending SNP guest messages. No functional change, but just an export symbol added for snp_send_guest_request() and dropped the export symbol on snp_issue_guest_request() and made it static. Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250106124633.1418972-5-nikunj@amd.com
2025-01-07x86/sev: Carve out and export SNP guest messaging init routinesNikunj A Dadhania
Currently, the sev-guest driver is the only user of SNP guest messaging. All routines for initializing SNP guest messaging are implemented within the sev-guest driver and are not available during early boot. In preparation for adding Secure TSC guest support, carve out APIs to allocate and initialize the guest messaging descriptor context and make it part of coco/sev/core.c. As there is no user of sev_guest_platform_data anymore, remove the structure. Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250106124633.1418972-4-nikunj@amd.com
2025-01-02x86/sev: Disable UBSAN on SEV code that may execute very earlyArd Biesheuvel
Clang 14 and older may emit UBSAN instrumentation into code that is inlined into functions marked with __no_sanitize_undefined¹. This may result in faults when the code is executed very early, which may be the case for functions annotated as __head. Now that this requirement is strictly enforced, the build will fail in this case with the following message Absolute reference to symbol '.data' not permitted in .head.text Work around this by disabling UBSAN instrumentation on all SEV core code. ¹ https://lore.kernel.org/r/20250101024348.GA1828419@ax162 [ bp: Add a footnote with Nathan's detailed explanation and a Fixes tag ] Fixes: 3b6f99a94b04 ("x86/boot: Disable UBSAN in early boot code") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250101115119.114584-2-ardb@kernel.org
2024-12-05x86/boot: Disable UBSAN in early boot codeArd Biesheuvel
The early boot code runs from a 1:1 mapping of memory, and may execute before the kernel virtual mapping is even up. This means absolute symbol references cannot be permitted in this code. UBSAN injects references to global data structures into the code, and without -fPIC, those references are emitted as absolute references to kernel virtual addresses. Accessing those will fault before the kernel virtual mapping is up, so UBSAN needs to be disabled in early boot code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205112804.3416920-13-ardb+git@google.com
2024-12-05x86/sev: Avoid WARN()s and panic()s in early boot codeArd Biesheuvel
Using WARN() or panic() while executing from the early 1:1 mapping is unlikely to do anything useful: the string literals are passed using their kernel virtual addresses which are not even mapped yet. But even if they were, calling into the printk() machinery from the early 1:1 mapped code is not going to get very far. So drop the WARN()s entirely, and replace panic() with a deadloop. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20241205112804.3416920-10-ardb+git@google.com
2024-11-07x86/sev: Cleanup vc_handle_msr()Borislav Petkov (AMD)
Carve out the MSR_SVSM_CAA into a helper with the suggestion that upcoming future users should do the same. Rename that silly exit_info_1 into what it actually means in this function - whether the MSR access is a read or a write. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Link: https://lore.kernel.org/r/20241106172647.GAZyum1zngPDwyD2IJ@fat_crate.local
2024-10-28x86/sev: Convert shared memory back to private on kexecAshish Kalra
SNP guests allocate shared buffers to perform I/O. It is done by allocating pages normally from the buddy allocator and converting them to shared with set_memory_decrypted(). The second, kexec-ed, kernel has no idea what memory is converted this way. It only sees E820_TYPE_RAM. Accessing shared memory via private mapping will cause unrecoverable RMP page-faults. On kexec, walk direct mapping and convert all shared memory back to private. It makes all RAM private again and second kernel may use it normally. Additionally, for SNP guests, convert all bss decrypted section pages back to private. The conversion occurs in two steps: stopping new conversions and unsharing all memory. In the case of normal kexec, the stopping of conversions takes place while scheduling is still functioning. This allows for waiting until any ongoing conversions are finished. The second step is carried out when all CPUs except one are inactive and interrupts are disabled. This prevents any conflicts with code that may access shared memory. Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/05a8c15fb665dbb062b04a8cb3d592a63f235937.1722520012.git.ashish.kalra@amd.com
2024-10-16virt: sev-guest: Consolidate SNP guest messaging parameters to a structNikunj A Dadhania
Add a snp_guest_req structure to eliminate the need to pass a long list of parameters. This structure will be used to call the SNP Guest message request API, simplifying the function arguments. Update the snp_issue_guest_request() prototype to include the new guest request structure. Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241009092850.197575-5-nikunj@amd.com
2024-10-16x86/sev: Cache the secrets page addressNikunj A Dadhania
Instead of calling get_secrets_page(), which parses the CC blob every time to get the secrets page physical address (secrets_pa), save the secrets page physical address during snp_init() from the CC blob. Since get_secrets_page() is no longer used, remove the function. Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241009092850.197575-4-nikunj@amd.com
2024-10-15x86/virt: Move SEV-specific parsing into arch/x86/virt/svmPavan Kumar Paluri
Move SEV-specific kernel command line option parsing support from arch/x86/coco/sev/core.c to arch/x86/virt/svm/cmdline.c so that both host and guest related SEV command line options can be supported. No functional changes intended. Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241014130948.1476946-2-papaluri@amd.com
2024-07-30x86/sev: Fix __reserved field in sev_configPavan Kumar Paluri
sev_config currently has debug, ghcbs_initialized, and use_cas fields. However, __reserved count has not been updated. Fix this. Fixes: 34ff65901735 ("x86/sev: Use kernel provided SVSM Calling Areas") Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240729180808.366587-1-papaluri@amd.com
2024-07-11x86/sev: Move SEV compilation unitsBorislav Petkov (AMD)
A long time ago it was agreed upon that the coco stuff needs to go where it belongs: https://lore.kernel.org/all/Yg5nh1RknPRwIrb8@zn.tnic and not keep it in arch/x86/kernel. TDX did that and SEV can't find time to do so. So lemme do it. If people have trouble converting their ongoing featuritis patches, ask me for a sed script. No functional changes. Move the instrumentation exclusion bits too, as helpfully caught and reported by the 0day folks. Closes: https://lore.kernel.org/oe-kbuild-all/202406220748.hG3qlmDx-lkp@intel.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-lkp/202407091342.46d7dbb-oliver.sang@intel.com Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Ashish Kalra <ashish.kalra@amd.com> Tested-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/r/20240619093014.17962-1-bp@kernel.org