summaryrefslogtreecommitdiff
path: root/arch/x86/virt/svm
AgeCommit message (Collapse)Author
2025-05-15x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID headerAhmed S. Darwish
The main CPUID header <asm/cpuid.h> was originally a storefront for the headers: <asm/cpuid/api.h> <asm/cpuid/leaf_0x2_api.h> Now that the latter CPUID(0x2) header has been merged into the former, there is no practical difference between <asm/cpuid.h> and <asm/cpuid/api.h>. Migrate all users to the <asm/cpuid/api.h> header, in preparation of the removal of <asm/cpuid.h>. Don't remove <asm/cpuid.h> just yet, in case some new code in -next started using it. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-3-darwi@linutronix.de
2025-05-02x86/msr: Add explicit includes of <asm/msr.h>Xin Li (Intel)
For historic reasons there are some TSC-related functions in the <asm/msr.h> header, even though there's an <asm/tsc.h> header. To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h> and to eventually eliminate the inclusion of <asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency to the source files that reference definitions from <asm/msr.h>. [ mingo: Clarified the changelog. ] Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
2025-04-10x86/msr: Rename 'wrmsrl()' to 'wrmsrq()'Ingo Molnar
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-10x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'Ingo Molnar
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-02-22x86/kexec: Export e820_table_kexec[] to sysfsDave Young
Previously the e820_table_kexec[] was exported to sysfs since kexec-tools uses the memmap entries to prepare the e820 table for the new kernel. The following commit, ~8 years ago, introduced e820_table_firmware[] and changed the behavior to export the firmware table instead: 12df216c61c8 ("x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table") Originally the kexec_file_load and kexec_load syscalls both used e820_table_kexec[]. Since the sysfs exported entries are from e820_table_firmware[] people now need to tune both tables for kexec. Restore the old behavior so the kexec_load and kexec_file_load syscalls work with only one table update. The e820_table_firmware[] is used by hibernation kernel code and it works without the sysfs exporting. Also remove the SEV e820_table_firmware[] updating code. Also update the code comments here and drop the comments about setup_data reservation since it is not needed any more after this change was made a year ago: fc7f27cda843 ("x86/kexec: Do not update E820 kexec table for setup_data") [ mingo: Tidy up the changelog and comments. ] Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Link: https://lore.kernel.org/r/Z5jcb1GKhLvH8kDc@darkstar.users.ipa.redhat.com
2025-02-14x86/sev: Fix broken SNP support with KVM module built-inAshish Kalra
Fix issues with enabling SNP host support and effectively SNP support which is broken with respect to the KVM module being built-in. SNP host support is enabled in snp_rmptable_init() which is invoked as device_initcall(). SNP check on IOMMU is done during IOMMU PCI init (IOMMU_PCI_INIT stage). And for that reason snp_rmptable_init() is currently invoked via device_initcall() and cannot be invoked via subsys_initcall() as core IOMMU subsystem gets initialized via subsys_initcall(). Now, if kvm_amd module is built-in, it gets initialized before SNP host support is enabled in snp_rmptable_init() : [ 10.131811] kvm_amd: TSC scaling supported [ 10.136384] kvm_amd: Nested Virtualization enabled [ 10.141734] kvm_amd: Nested Paging enabled [ 10.146304] kvm_amd: LBR virtualization supported [ 10.151557] kvm_amd: SEV enabled (ASIDs 100 - 509) [ 10.156905] kvm_amd: SEV-ES enabled (ASIDs 1 - 99) [ 10.162256] kvm_amd: SEV-SNP enabled (ASIDs 1 - 99) [ 10.171508] kvm_amd: Virtual VMLOAD VMSAVE supported [ 10.177052] kvm_amd: Virtual GIF supported ... ... [ 10.201648] kvm_amd: in svm_enable_virtualization_cpu And then svm_x86_ops->enable_virtualization_cpu() (svm_enable_virtualization_cpu) programs MSR_VM_HSAVE_PA as following: wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa); So VM_HSAVE_PA is non-zero before SNP support is enabled on all CPUs. snp_rmptable_init() gets invoked after svm_enable_virtualization_cpu() as following : ... [ 11.256138] kvm_amd: in svm_enable_virtualization_cpu ... [ 11.264918] SEV-SNP: in snp_rmptable_init This triggers a #GP exception in snp_rmptable_init() when snp_enable() is invoked to set SNP_EN in SYSCFG MSR: [ 11.294289] unchecked MSR access error: WRMSR to 0xc0010010 (tried to write 0x0000000003fc0000) at rIP: 0xffffffffaf5d5c28 (native_write_msr+0x8/0x30) ... [ 11.294404] Call Trace: [ 11.294482] <IRQ> [ 11.294513] ? show_stack_regs+0x26/0x30 [ 11.294522] ? ex_handler_msr+0x10f/0x180 [ 11.294529] ? search_extable+0x2b/0x40 [ 11.294538] ? fixup_exception+0x2dd/0x340 [ 11.294542] ? exc_general_protection+0x14f/0x440 [ 11.294550] ? asm_exc_general_protection+0x2b/0x30 [ 11.294557] ? __pfx_snp_enable+0x10/0x10 [ 11.294567] ? native_write_msr+0x8/0x30 [ 11.294570] ? __snp_enable+0x5d/0x70 [ 11.294575] snp_enable+0x19/0x20 [ 11.294578] __flush_smp_call_function_queue+0x9c/0x3a0 [ 11.294586] generic_smp_call_function_single_interrupt+0x17/0x20 [ 11.294589] __sysvec_call_function+0x20/0x90 [ 11.294596] sysvec_call_function+0x80/0xb0 [ 11.294601] </IRQ> [ 11.294603] <TASK> [ 11.294605] asm_sysvec_call_function+0x1f/0x30 ... [ 11.294631] arch_cpu_idle+0xd/0x20 [ 11.294633] default_idle_call+0x34/0xd0 [ 11.294636] do_idle+0x1f1/0x230 [ 11.294643] ? complete+0x71/0x80 [ 11.294649] cpu_startup_entry+0x30/0x40 [ 11.294652] start_secondary+0x12d/0x160 [ 11.294655] common_startup_64+0x13e/0x141 [ 11.294662] </TASK> This #GP exception is getting triggered due to the following errata for AMD family 19h Models 10h-1Fh Processors: Processor may generate spurious #GP(0) Exception on WRMSR instruction: Description: The Processor will generate a spurious #GP(0) Exception on a WRMSR instruction if the following conditions are all met: - the target of the WRMSR is a SYSCFG register. - the write changes the value of SYSCFG.SNPEn from 0 to 1. - One of the threads that share the physical core has a non-zero value in the VM_HSAVE_PA MSR. The document being referred to above: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf To summarize, with kvm_amd module being built-in, KVM/SVM initialization happens before host SNP is enabled and this SVM initialization sets VM_HSAVE_PA to non-zero, which then triggers a #GP when SYSCFG.SNPEn is being set and this will subsequently cause SNP_INIT(_EX) to fail with INVALID_CONFIG error as SYSCFG[SnpEn] is not set on all CPUs. Essentially SNP host enabling code should be invoked before KVM initialization, which is currently not the case when KVM is built-in. Add fix to call snp_rmptable_init() early from iommu_snp_enable() directly and not invoked via device_initcall() which enables SNP host support before KVM initialization with kvm_amd module built-in. Add additional handling for `iommu=off` or `amd_iommu=off` options. Note that IOMMUs need to be enabled for SNP initialization, therefore, if host SNP support is enabled but late IOMMU initialization fails then that will cause PSP driver's SNP_INIT to fail as IOMMU SNP sanity checks in SNP firmware will fail with invalid configuration error as below: [ 9.723114] ccp 0000:23:00.1: sev enabled [ 9.727602] ccp 0000:23:00.1: psp enabled [ 9.732527] ccp 0000:a2:00.1: enabling device (0000 -> 0002) [ 9.739098] ccp 0000:a2:00.1: no command queues available [ 9.745167] ccp 0000:a2:00.1: psp enabled [ 9.805337] ccp 0000:23:00.1: SEV-SNP: failed to INIT rc -5, error 0x3 [ 9.866426] ccp 0000:23:00.1: SEV API:1.53 build:5 Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature") Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Acked-by: Joerg Roedel <jroedel@suse.de> Message-ID: <138b520fb83964782303b43ade4369cd181fdd9c.1739226950.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-14x86/sev: Add full support for a segmented RMP tableTom Lendacky
A segmented RMP table allows for improved locality of reference between the memory protected by the RMP and the RMP entries themselves. Add support to detect and initialize a segmented RMP table with multiple segments as configured by the system BIOS. While the RMPREAD instruction will be used to read an RMP entry in a segmented RMP, initialization and debugging capabilities will require the mapping of the segments. The RMP_CFG MSR indicates if segmented RMP support is enabled and, if enabled, the amount of memory that an RMP segment covers. When segmented RMP support is enabled, the RMP_BASE MSR points to the start of the RMP bookkeeping area, which is 16K in size. The RMP Segment Table (RST) is located immediately after the bookkeeping area and is 4K in size. The RST contains up to 512 8-byte entries that identify the location of the RMP segment and amount of memory mapped by the segment (which must be less than or equal to the configured segment size). The physical address that is covered by a segment is based on the segment size and the index of the segment in the RST. The RMP entry for a physical address is based on the offset within the segment. For example, if the segment size is 64GB (0x1000000000 or 1 << 36), then physical address 0x9000800000 is RST entry 9 (0x9000800000 >> 36) and RST entry 9 covers physical memory 0x9000000000 to 0x9FFFFFFFFF. The RMP entry index within the RMP segment is the physical address AND-ed with the segment mask, 64GB - 1 (0xFFFFFFFFF), and then right-shifted 12 bits or PHYS_PFN(0x9000800000 & 0xFFFFFFFFF), which is 0x800. CPUID 0x80000025_EBX[9:0] describes the number of RMP segments that can be cached by the hardware. Additionally, if CPUID 0x80000025_EBX[10] is set, then the number of actual RMP segments defined cannot exceed the number of RMP segments that can be cached and can be used as a maximum RST index. [ bp: Unify printk hex format specifiers. ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/02afd0ffd097a19cb6e5fb1bb76eb110496c5b11.1734101742.git.thomas.lendacky@amd.com
2024-12-14x86/sev: Treat the contiguous RMP table as a single RMP segmentTom Lendacky
In preparation for support of a segmented RMP table, treat the contiguous RMP table as a segmented RMP table with a single segment covering all of memory. By treating a contiguous RMP table as a single segment, much of the code that initializes and accesses the RMP can be re-used. Segmented RMP tables can have up to 512 segment entries. Each segment will have metadata associated with it to identify the segment location, the segment size, etc. The segment data and the physical address are used to determine the index of the segment within the table and then the RMP entry within the segment. For an actual segmented RMP table environment, much of the segment information will come from a configuration MSR. For the contiguous RMP, though, much of the information will be statically defined. [ bp: Touchups, explain array_index_nospec() usage. ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/8c40fbc9c5217f0d79b37cf861eff03ab0330bef.1733172653.git.thomas.lendacky@amd.com
2024-12-14x86/sev: Map only the RMP table entries instead of the full RMP rangeTom Lendacky
In preparation for support of a segmented RMP table, map only the RMP table entries. The RMP bookkeeping area is only ever accessed when first enabling SNP and does not need to remain mapped. To accomplish this, split the initialization of the RMP bookkeeping area and the initialization of the RMP entry area. The RMP bookkeeping area will be mapped only while it is being initialized. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Reviewed-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/22f179998d319834f49c13a8c01187fbf0fd308d.1733172653.git.thomas.lendacky@amd.com
2024-12-14x86/sev: Move the SNP probe routine out of the wayTom Lendacky
To make patch review easier for the segmented RMP support, move the SNP probe function out from in between the initialization-related routines. No functional change. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/6c2975bbf132d567dd12e1435be1d18c0bf9131c.1733172653.git.thomas.lendacky@amd.com
2024-12-14x86/sev: Add support for the RMPREAD instructionTom Lendacky
The RMPREAD instruction returns an architecture defined format of an RMP table entry. This is the preferred method for examining RMP entries. The instruction is advertised in CPUID 0x8000001f_EAX[21]. Use this instruction when available. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Reviewed-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/72c734ac8b324bbc0c839b2c093a11af4a8881fa.1733172653.git.thomas.lendacky@amd.com
2024-12-13x86/sev: Prepare for using the RMPREAD instruction to access the RMPTom Lendacky
The RMPREAD instruction returns an architecture defined format of an RMP entry. This is the preferred method for examining RMP entries. In preparation for using the RMPREAD instruction, convert the existing code that directly accesses the RMP to map the raw RMP information into the architecture defined format. RMPREAD output returns a status bit for the 2MB region status. If the input page address is 2MB aligned and any other pages within the 2MB region are assigned, then 2MB region status will be set to 1. Otherwise, the 2MB region status will be set to 0. For systems that do not support RMPREAD, calculating this value would require looping over all of the RMP table entries within that range until one is found with the assigned bit set. Since this bit is not defined in the current format, and so not used today, do not incur the overhead associated with calculating it. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/da49d5af1eb7f9039f35f14a32ca091efb2dd818.1733172653.git.thomas.lendacky@amd.com
2024-11-19Merge tag 'x86_sev_for_v6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Do the proper memory conversion of guest memory in order to be able to kexec kernels in SNP guests along with other adjustments and cleanups to that effect - Start converting and moving functionality from the sev-guest driver into core code with the purpose of supporting the secure TSC SNP feature where the hypervisor cannot influence the TSC exposed to the guest anymore - Add a "nosnp" cmdline option in order to be able to disable SNP support in the hypervisor and thus free-up resources which are not going to be used - Cleanups [ Reminding myself about the endless TLA's again: SEV is the AMD Secure Encrypted Virtualization - Linus ] * tag 'x86_sev_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Cleanup vc_handle_msr() x86/sev: Convert shared memory back to private on kexec x86/mm: Refactor __set_clr_pte_enc() x86/boot: Skip video memory access in the decompressor for SEV-ES/SNP virt: sev-guest: Carve out SNP message context structure virt: sev-guest: Reduce the scope of SNP command mutex virt: sev-guest: Consolidate SNP guest messaging parameters to a struct x86/sev: Cache the secrets page address x86/sev: Handle failures from snp_init() virt: sev-guest: Use AES GCM crypto library x86/virt: Provide "nosnp" boot option for sev kernel command line x86/virt: Move SEV-specific parsing into arch/x86/virt/svm
2024-10-23x86/sev: Ensure that RMP table fixups are reservedAshish Kalra
The BIOS reserves RMP table memory via e820 reservations. This can still lead to RMP page faults during kexec if the host tries to access memory within the same 2MB region. Commit 400fea4b9651 ("x86/sev: Add callback to apply RMP table fixups for kexec" adjusts the e820 reservations for the RMP table so that the entire 2MB range at the start/end of the RMP table is marked reserved. The e820 reservations are then passed to firmware via SNP_INIT where they get marked HV-Fixed. The RMP table fixups are done after the e820 ranges have been added to memblock, allowing the fixup ranges to still be allocated and used by the system. The problem is that this memory range is now marked reserved in the e820 tables and during SNP initialization these reserved ranges are marked as HV-Fixed. This means that the pages cannot be used by an SNP guest, only by the hypervisor. However, the memory management subsystem does not make this distinction and can allocate one of those pages to an SNP guest. This will ultimately result in RMPUPDATE failures associated with the guest, causing it to fail to start or terminate when accessing the HV-Fixed page. The issue is captured below with memblock=debug: [ 0.000000] SEV-SNP: *** DEBUG: snp_probe_rmptable_info:352 - rmp_base=0x280d4800000, rmp_end=0x28357efffff ... [ 0.000000] BIOS-provided physical RAM map: ... [ 0.000000] BIOS-e820: [mem 0x00000280d4800000-0x0000028357efffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000028357f00000-0x0000028357ffffff] usable ... ... [ 0.183593] memblock add: [0x0000028357f00000-0x0000028357ffffff] e820__memblock_setup+0x74/0xb0 ... [ 0.203179] MEMBLOCK configuration: [ 0.207057] memory size = 0x0000027d0d194000 reserved size = 0x0000000009ed2c00 [ 0.215299] memory.cnt = 0xb ... [ 0.311192] memory[0x9] [0x0000028357f00000-0x0000028357ffffff], 0x0000000000100000 bytes flags: 0x0 ... ... [ 0.419110] SEV-SNP: Reserving start/end of RMP table on a 2MB boundary [0x0000028357e00000] [ 0.428514] e820: update [mem 0x28357e00000-0x28357ffffff] usable ==> reserved [ 0.428517] e820: update [mem 0x28357e00000-0x28357ffffff] usable ==> reserved [ 0.428520] e820: update [mem 0x28357e00000-0x28357ffffff] usable ==> reserved ... ... [ 5.604051] MEMBLOCK configuration: [ 5.607922] memory size = 0x0000027d0d194000 reserved size = 0x0000000011faae02 [ 5.616163] memory.cnt = 0xe ... [ 5.754525] memory[0xc] [0x0000028357f00000-0x0000028357ffffff], 0x0000000000100000 bytes on node 0 flags: 0x0 ... ... [ 10.080295] Early memory node ranges[ 10.168065] ... node 0: [mem 0x0000028357f00000-0x0000028357ffffff] ... ... [ 8149.348948] SEV-SNP: RMPUPDATE failed for PFN 28357f7c, pg_level: 1, ret: 2 As shown above, the memblock allocations show 1MB after the end of the RMP as available for allocation, which is what the RMP table fixups have reserved. This memory range subsequently gets allocated as SNP guest memory, resulting in an RMPUPDATE failure. This can potentially be fixed by not reserving the memory range in the e820 table, but that causes kexec failures when using the KEXEC_FILE_LOAD syscall. The solution is to use memblock_reserve() to mark the memory reserved for the system, ensuring that it cannot be allocated to an SNP guest. Since HV-Fixed memory is still readable/writable by the host, this only ends up being a problem if the memory in this range requires a page state change, which generally will only happen when allocating memory in this range to be used for running SNP guests, which is now possible with the SNP hypervisor support in kernel 6.11. Backporter note: Fixes tag points to a 6.9 change but as the last paragraph above explains, this whole thing can happen after 6.11 received SNP HV support, therefore backporting to 6.9 is not really necessary. [ bp: Massage commit message. ] Fixes: 400fea4b9651 ("x86/sev: Add callback to apply RMP table fixups for kexec") Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: <stable@kernel.org> # 6.11, see Backporter note above. Link: https://lore.kernel.org/r/20240815221630.131133-1-Ashish.Kalra@amd.com
2024-10-15x86/virt: Provide "nosnp" boot option for sev kernel command linePavan Kumar Paluri
Provide a "nosnp" kernel command line option to prevent enabling of the RMP and SEV-SNP features in the host/hypervisor. Not initializing the RMP removes system overhead associated with RMP checks. [ bp: Actually make it a HV-only cmdline option. ] Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241014130948.1476946-3-papaluri@amd.com
2024-10-15x86/virt: Move SEV-specific parsing into arch/x86/virt/svmPavan Kumar Paluri
Move SEV-specific kernel command line option parsing support from arch/x86/coco/sev/core.c to arch/x86/virt/svm/cmdline.c so that both host and guest related SEV command line options can be supported. No functional changes intended. Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241014130948.1476946-2-papaluri@amd.com
2024-07-11x86/sev: Do RMP memory coverage check after max_pfn has been setTom Lendacky
The RMP table is probed early in the boot process before max_pfn has been set, so the logic to check if the RMP covers all of system memory is not valid. Move the RMP memory coverage check from snp_probe_rmptable_info() into snp_rmptable_init(), which is well after max_pfn has been set. Also, fix the calculation to use PFN_UP instead of PHYS_PFN, in order to compute the required RMP size properly. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/bec4364c7e34358cc576f01bb197a7796a109169.1718984524.git.thomas.lendacky@amd.com
2024-04-29x86/sev: Add callback to apply RMP table fixups for kexecAshish Kalra
Handle cases where the RMP table placement in the BIOS is not 2M aligned and the kexec-ed kernel could try to allocate from within that chunk which then causes a fatal RMP fault. The kexec failure is illustrated below: SEV-SNP: RMP table physical range [0x0000007ffe800000 - 0x000000807f0fffff] BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x000000000008efff] usable BIOS-e820: [mem 0x000000000008f000-0x000000000008ffff] ACPI NVS ... BIOS-e820: [mem 0x0000004080000000-0x0000007ffe7fffff] usable BIOS-e820: [mem 0x0000007ffe800000-0x000000807f0fffff] reserved BIOS-e820: [mem 0x000000807f100000-0x000000807f1fefff] usable As seen here in the e820 memory map, the end range of the RMP table is not aligned to 2MB and not reserved but it is usable as RAM. Subsequently, kexec -s (KEXEC_FILE_LOAD syscall) loads it's purgatory code and boot_param, command line and other setup data into this RAM region as seen in the kexec logs below, which leads to fatal RMP fault during kexec boot. Loaded purgatory at 0x807f1fa000 Loaded boot_param, command line and misc at 0x807f1f8000 bufsz=0x1350 memsz=0x2000 Loaded 64bit kernel at 0x7ffae00000 bufsz=0xd06200 memsz=0x3894000 Loaded initrd at 0x7ff6c89000 bufsz=0x4176014 memsz=0x4176014 E820 memmap: 0000000000000000-000000000008efff (1) 000000000008f000-000000000008ffff (4) 0000000000090000-000000000009ffff (1) ... 0000004080000000-0000007ffe7fffff (1) 0000007ffe800000-000000807f0fffff (2) 000000807f100000-000000807f1fefff (1) 000000807f1ff000-000000807fffffff (2) nr_segments = 4 segment[0]: buf=0x00000000e626d1a2 bufsz=0x4000 mem=0x807f1fa000 memsz=0x5000 segment[1]: buf=0x0000000029c67bd6 bufsz=0x1350 mem=0x807f1f8000 memsz=0x2000 segment[2]: buf=0x0000000045c60183 bufsz=0xd06200 mem=0x7ffae00000 memsz=0x3894000 segment[3]: buf=0x000000006e54f08d bufsz=0x4176014 mem=0x7ff6c89000 memsz=0x4177000 kexec_file_load: type:0, start:0x807f1fa150 head:0x1184d0002 flags:0x0 Check if RMP table start and end physical range in the e820 tables are not aligned to 2MB and in that case map this range to reserved in all the three e820 tables. [ bp: Massage. ] Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature") Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/df6e995ff88565262c2c7c69964883ff8aa6fc30.1714090302.git.ashish.kalra@amd.com
2024-04-04x86/CPU/AMD: Track SNP host status with cc_platform_*()Borislav Petkov (AMD)
The host SNP worthiness can determined later, after alternatives have been patched, in snp_rmptable_init() depending on cmdline options like iommu=pt which is incompatible with SNP, for example. Which means that one cannot use X86_FEATURE_SEV_SNP and will need to have a special flag for that control. Use that newly added CC_ATTR_HOST_SEV_SNP in the appropriate places. Move kdump_sev_callback() to its rightful place, while at it. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Link: https://lore.kernel.org/r/20240327154317.29909-6-bp@alien8.de
2024-01-29crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdumpAshish Kalra
Add a kdump safe version of sev_firmware_shutdown() and register it as a crash_kexec_post_notifier so it will be invoked during panic/crash to do SEV/SNP shutdown. This is required for transitioning all IOMMU pages to reclaim/hypervisor state, otherwise re-init of IOMMU pages during crashdump kernel boot fails and panics the crashdump kernel. This panic notifier runs in atomic context, hence it ensures not to acquire any locks/mutexes and polls for PSP command completion instead of depending on PSP command completion interrupt. [ mdr: Remove use of "we" in comments. ] Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240126041126.1927228-21-michael.roth@amd.com
2024-01-29x86/sev: Introduce an SNP leaked pages listAshish Kalra
Pages are unsafe to be released back to the page-allocator if they have been transitioned to firmware/guest state and can't be reclaimed or transitioned back to hypervisor/shared state. In this case, add them to an internal leaked pages list to ensure that they are not freed or touched/accessed to cause fatal page faults. [ mdr: Relocate to arch/x86/virt/svm/sev.c ] Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20240126041126.1927228-16-michael.roth@amd.com
2024-01-29x86/sev: Adjust the directmap to avoid inadvertent RMP faultsMichael Roth
If the kernel uses a 2MB or larger directmap mapping to write to an address, and that mapping contains any 4KB pages that are set to private in the RMP table, an RMP #PF will trigger and cause a host crash. SNP-aware code that owns the private PFNs will never attempt such a write, but other kernel tasks writing to other PFNs in the range may trigger these checks inadvertently due to writing to those other PFNs via a large directmap mapping that happens to also map a private PFN. Prevent this by splitting any 2MB+ mappings that might end up containing a mix of private/shared PFNs as a result of a subsequent RMPUPDATE for the PFN/rmp_level passed in. Another way to handle this would be to limit the directmap to 4K mappings in the case of hosts that support SNP, but there is potential risk for performance regressions of certain host workloads. Handling it as-needed results in the directmap being slowly split over time, which lessens the risk of a performance regression since the more the directmap gets split as a result of running SNP guests, the more likely the host is being used primarily to run SNP guests, where a mostly-split directmap is actually beneficial since there is less chance of TLB flushing and cpa_lock contention being needed to perform these splits. Cases where a host knows in advance it wants to primarily run SNP guests and wishes to pre-split the directmap can be handled by adding a tuneable in the future, but preliminary testing has shown this to not provide a signficant benefit in the common case of guests that are backed primarily by 2MB THPs, so it does not seem to be warranted currently and can be added later if a need arises in the future. Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20240126041126.1927228-12-michael.roth@amd.com
2024-01-29x86/sev: Add helper functions for RMPUPDATE and PSMASH instructionBrijesh Singh
The RMPUPDATE instruction updates the access restrictions for a page via its corresponding entry in the RMP Table. The hypervisor will use the instruction to enforce various access restrictions on pages used for confidential guests and other specialized functionality. See APM3 for details on the instruction operations. The PSMASH instruction expands a 2MB RMP entry in the RMP table into a corresponding set of contiguous 4KB RMP entries while retaining the state of the validated bit from the original 2MB RMP entry. The hypervisor will use this instruction in cases where it needs to re-map a page as 4K rather than 2MB in a guest's nested page table. Add helpers to make use of these instructions. [ mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lore.kernel.org/r/20240126041126.1927228-11-michael.roth@amd.com
2024-01-29x86/fault: Add helper for dumping RMP entriesBrijesh Singh
This information will be useful for debugging things like page faults due to RMP access violations and RMPUPDATE failures. [ mdr: move helper to standalone patch, rework dump logic as suggested by Boris. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240126041126.1927228-8-michael.roth@amd.com
2024-01-29x86/sev: Add RMP entry lookup helpersBrijesh Singh
Add a helper that can be used to access information contained in the RMP entry corresponding to a particular PFN. This will be needed to make decisions on how to handle setting up mappings in the NPT in response to guest page-faults and handling things like cleaning up pages and setting them back to the default hypervisor-owned state when they are no longer being used for private data. [ mdr: separate 'assigned' indicator from return code, and simplify function signatures for various helpers. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240126041126.1927228-7-michael.roth@amd.com
2024-01-29x86/sev: Add SEV-SNP host initialization supportBrijesh Singh
The memory integrity guarantees of SEV-SNP are enforced through a new structure called the Reverse Map Table (RMP). The RMP is a single data structure shared across the system that contains one entry for every 4K page of DRAM that may be used by SEV-SNP VMs. The APM Volume 2 section on Secure Nested Paging (SEV-SNP) details a number of steps needed to detect/enable SEV-SNP and RMP table support on the host: - Detect SEV-SNP support based on CPUID bit - Initialize the RMP table memory reported by the RMP base/end MSR registers and configure IOMMU to be compatible with RMP access restrictions - Set the MtrrFixDramModEn bit in SYSCFG MSR - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR - Configure IOMMU RMP table entry format is non-architectural and it can vary by processor. It is defined by the PPR document for each respective CPU family. Restrict SNP support to CPU models/families which are compatible with the current RMP table entry format to guard against any undefined behavior when running on other system types. Future models/support will handle this through an architectural mechanism to allow for broader compatibility. SNP host code depends on CONFIG_KVM_AMD_SEV config flag which may be enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV instead of CONFIG_AMD_MEM_ENCRYPT. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Co-developed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Link: https://lore.kernel.org/r/20240126041126.1927228-5-michael.roth@amd.com