summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/svm/svm.c
AgeCommit message (Collapse)Author
2021-08-02KVM: SVM: Drop redundant clearing of vcpu->arch.hflags at INIT/RESETSean Christopherson
Drop redundant clears of vcpu->arch.hflags in init_vmcb() since kvm_vcpu_reset() always clears hflags, and it is also always zero at vCPU creation time. And of course, the second clearing in init_vmcb() was always redundant. Suggested-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-46-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Emulate #INIT in response to triple fault shutdownSean Christopherson
Emulate a full #INIT instead of simply initializing the VMCB if the guest hits a shutdown. Initializing the VMCB but not other vCPU state, much of which is mirrored by the VMCB, results in incoherent and broken vCPU state. Ideally, KVM would not automatically init anything on shutdown, and instead put the vCPU into e.g. KVM_MP_STATE_UNINITIALIZED and force userspace to explicitly INIT or RESET the vCPU. Even better would be to add KVM_MP_STATE_SHUTDOWN, since technically NMI can break shutdown (and SMI on Intel CPUs). But, that ship has sailed, and emulating #INIT is the next best thing as that has at least some connection with reality since there exist bare metal platforms that automatically INIT the CPU if it hits shutdown. Fixes: 46fe4ddd9dbb ("[PATCH] KVM: SVM: Propagate cpu shutdown events to userspace") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-45-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: x86: Move setting of sregs during vCPU RESET/INIT to common x86Sean Christopherson
Move the setting of CR0, CR4, EFER, RFLAGS, and RIP from vendor code to common x86. VMX and SVM now have near-identical sequences, the only difference being that VMX updates the exception bitmap. Updating the bitmap on SVM is unnecessary, but benign. Unfortunately it can't be left behind in VMX due to the need to update exception intercepts after the control registers are set. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-37-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Stuff save->dr6 at during VMSA sync, not at RESET/INITSean Christopherson
Move code to stuff vmcb->save.dr6 to its architectural init value from svm_vcpu_reset() into sev_es_sync_vmsa(). Except for protected guests, a.k.a. SEV-ES guests, vmcb->save.dr6 is set during VM-Enter, i.e. the extra write is unnecessary. For SEV-ES, stuffing save->dr6 handles a theoretical case where the VMSA could be encrypted before the first KVM_RUN. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-33-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Drop redundant writes to vmcb->save.cr4 at RESET/INITSean Christopherson
Drop direct writes to vmcb->save.cr4 during vCPU RESET/INIT, as the values being written are fully redundant with respect to svm_set_cr4(vcpu, 0) a few lines earlier. Note, svm_set_cr4() also correctly forces X86_CR4_PAE when NPT is disabled. No functional change intended. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-32-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Tweak order of cr0/cr4/efer writes at RESET/INITSean Christopherson
Hoist svm_set_cr0() up in the sequence of register initialization during vCPU RESET/INIT, purely to match VMX so that a future patch can move the sequences to common x86. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-31-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Don't bother writing vmcb->save.rip at vCPU RESET/INITSean Christopherson
Drop unnecessary initialization of vmcb->save.rip during vCPU RESET/INIT, as svm_vcpu_run() unconditionally propagates VCPU_REGS_RIP to save.rip. No true functional change intended. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-21-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: x86: Move EDX initialization at vCPU RESET to common codeSean Christopherson
Move the EDX initialization at vCPU RESET, which is now identical between VMX and SVM, into common code. No functional change intended. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: x86: Consolidate APIC base RESET initialization codeSean Christopherson
Consolidate the APIC base RESET logic, which is currently spread out across both x86 and vendor code. For an in-kernel APIC, the vendor code is redundant. But for a userspace APIC, KVM relies on the vendor code to initialize vcpu->arch.apic_base. Hoist the vcpu->arch.apic_base initialization above the !apic check so that it applies to both flavors of APIC emulation, and delete the vendor code. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Drop explicit MMU reset at RESET/INITSean Christopherson
Drop an explicit MMU reset in SVM's vCPU RESET/INIT flow now that the common x86 path correctly handles conditional MMU resets, e.g. if INIT arrives while the vCPU is in 64-bit mode. This reverts commit ebae871a509d ("kvm: svm: reset mmu on VCPU reset"). Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Fall back to KVM's hardcoded value for EDX at RESET/INITSean Christopherson
At vCPU RESET/INIT (mostly RESET), stuff EDX with KVM's hardcoded, default Family-Model-Stepping ID of 0x600 if CPUID.0x1 isn't defined. At RESET, the CPUID lookup is guaranteed to "miss" because KVM emulates RESET before exposing the vCPU to userspace, i.e. userspace can't possibly have done set the vCPU's CPUID model, and thus KVM will always write '0'. At INIT, using 0x600 is less bad than using '0'. While initializing EDX to '0' is _extremely_ unlikely to be noticed by the guest, let alone break the guest, and can be overridden by userspace for the RESET case, using 0x600 is preferable as it will allow consolidating the relevant VMX and SVM RESET/INIT logic in the future. And, digging through old specs suggests that neither Intel nor AMD have ever shipped a CPU that initialized EDX to '0' at RESET. Regarding 0x600 as KVM's default Family, it is a sane default and in many ways the most appropriate. Prior to the 386 implementations, DX was undefined at RESET. With the 386, 486, 586/P5, and 686/P6/Athlon, both Intel and AMD set EDX to 3, 4, 5, and 6 respectively. AMD switched to using '15' as its primary Family with the introduction of AMD64, but Intel has continued using '6' for the last few decades. So, '6' is a valid Family for both Intel and AMD CPUs, is compatible with both 32-bit and 64-bit CPUs (albeit not a perfect fit for 64-bit AMD), and of the common Families (3 - 6), is the best fit with respect to KVM's virtual CPU model. E.g. prior to the P6, Intel CPUs did not have a STI window. Modern operating systems, Linux included, rely on the STI window, e.g. for "safe halt", and KVM unconditionally assumes the virtual CPU has an STI window. Thus enumerating a Family ID of 3, 4, or 5 would be provably wrong. Opportunistically remove a stale comment. Fixes: 66f7b72e1171 ("KVM: x86: Make register state after reset conform to specification") Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Require exact CPUID.0x1 match when stuffing EDX at INITSean Christopherson
Do not allow an inexact CPUID "match" when querying the guest's CPUID.0x1 to stuff EDX during INIT. In the common case, where the guest CPU model is an AMD variant, allowing an inexact match is a nop since KVM doesn't emulate Intel's goofy "out-of-range" logic for AMD and Hygon. If the vCPU model happens to be an Intel variant, an inexact match is possible if and only if the max CPUID leaf is precisely '0'. Aside from the fact that there's probably no CPU in existence with a single CPUID leaf, if the max CPUID leaf is '0', that means that CPUID.0.EAX is '0', and thus an inexact match for CPUID.0x1.EAX will also yield '0'. So, with lots of twisty logic, no functional change intended. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: SVM: Zero out GDTR.base and IDTR.base on INITSean Christopherson
Explicitly set GDTR.base and IDTR.base to zero when intializing the VMCB. Functionally this only affects INIT, as the bases are implicitly set to zero on RESET by virtue of the VMCB being zero allocated. Per AMD's APM, GDTR.base and IDTR.base are zeroed after RESET and INIT. Fixes: 04d2cc7780d4 ("KVM: Move main vcpu loop into subarch independent code") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-02KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the VMSean Christopherson
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <0e8760a26151f47dc47052b25ca8b84fffe0641e.1625186503.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be ↵Maxim Levitsky
deactivated It is possible for AVIC inhibit and AVIC active state to be mismatched. Currently we disable AVIC right away on vCPU which started the AVIC inhibit request thus this warning doesn't trigger but at least in theory, if svm_set_vintr is called at the same time on multiple vCPUs, the warning can happen. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-27KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initializedPaolo Bonzini
Right now, svm_hv_vmcb_dirty_nested_enlightenments has an incorrect dereference of vmcb->control.reserved_sw before the vmcb is checked for being non-NULL. The compiler is usually sinking the dereference after the check; instead of doing this ourselves in the source, ensure that svm_hv_vmcb_dirty_nested_enlightenments is only called with a non-NULL VMCB. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Vineeth Pillai <viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Untested for now due to issues with my AMD machine. - Paolo]
2021-07-26KVM: nSVM: Swap the parameter order for ↵Vitaly Kuznetsov
svm_copy_vmrun_state()/svm_copy_vmloadsave_state() Make svm_copy_vmrun_state()/svm_copy_vmloadsave_state() interface match 'memcpy(dest, src)' to avoid any confusion. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210719090322.625277-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-26KVM: nSVM: Rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state()Vitaly Kuznetsov
To match svm_copy_vmrun_state(), rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state(). Opportunistically add missing braces to 'else' branch in vmload_vmsave_interception(). No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210716144104.465269-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15KVM: nSVM: Restore nested control upon leaving SMMVitaly Kuznetsov
If the VM was migrated while in SMM, no nested state was saved/restored, and therefore svm_leave_smm has to load both save and control area of the vmcb12. Save area is already loaded from HSAVE area, so now load the control area as well from the vmcb12. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15KVM: nSVM: Fix L1 state corruption upon return from SMMVitaly Kuznetsov
VMCB split commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest") broke return from SMM when we entered there from guest (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem manifests itself like this: kvm_exit: reason EXIT_RSM rip 0x7ffbb280 info 0 0 kvm_emulate_insn: 0:7ffbb280: 0f aa kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x7ffb3000 kvm_nested_vmrun: rip: 0x000000007ffbb280 vmcb: 0x0000000008224000 nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000 npt: on kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002 intercepts: fd44bfeb 0000217f 00000000 kvm_entry: vcpu 0, rip 0xffffffffffbbe119 kvm_exit: reason EXIT_NPF rip 0xffffffffffbbe119 info 200000006 1ab000 kvm_nested_vmexit: vcpu 0 reason npf rip 0xffffffffffbbe119 info1 0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000 error_code 0x00000000 kvm_page_fault: address 1ab000 error_code 6 kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000 int_info 0 int_info_err 0 kvm_entry: vcpu 0, rip 0x7ffbb280 kvm_exit: reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0 kvm_emulate_insn: 0:7ffbb280: 0f aa kvm_inj_exception: #GP (0x0) Note: return to L2 succeeded but upon first exit to L1 its RIP points to 'RSM' instruction but we're not in SMM. The problem appears to be that VMCB01 gets irreversibly destroyed during SMM execution. Previously, we used to have 'hsave' VMCB where regular (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just switch to VMCB01 from VMCB02. Pre-split (working) flow looked like: - SMM is triggered during L2's execution - L2's state is pushed to SMRAM - nested_svm_vmexit() restores L1's state from 'hsave' - SMM -> RSM - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have pre-SMM (and pre L2 VMRUN) L1's state there - L2's state is restored from SMRAM - upon first exit L1's state is restored from L1. This was always broken with regards to svm_get_nested_state()/ svm_set_nested_state(): 'hsave' was never a part of what's being save and restored so migration happening during SMM triggered from L2 would never restore L1's state correctly. Post-split flow (broken) looks like: - SMM is triggered during L2's execution - L2's state is pushed to SMRAM - nested_svm_vmexit() switches to VMCB01 from VMCB02 - SMM -> RSM - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01 is already lost. - L2's state is restored from SMRAM - upon first exit L1's state is restored from VMCB01 but it is corrupted (reflects the state during 'RSM' execution). VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest and host state so when we switch back to VMCS02 L1's state is intact there. To resolve the issue we need to save L1's state somewhere. We could've created a third VMCB for SMM but that would require us to modify saved state format. L1's architectural HSAVE area (pointed by MSR_VM_HSAVE_PA) seems appropriate: L0 is free to save any (or none) of L1's state there. Currently, KVM does 'none'. Note, for nested state migration to succeed, both source and destination hypervisors must have the fix. We, however, don't need to create a new flag indicating the fact that HSAVE area is now populated as migration during SMM triggered from L2 was always broken. Fixes: 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15KVM: nSVM: Check the value written to MSR_VM_HSAVE_PAVitaly Kuznetsov
APM states that #GP is raised upon write to MSR_VM_HSAVE_PA when the supplied address is not page-aligned or is outside of "maximum supported physical address for this implementation". page_address_valid() check seems suitable. Also, forcefully page-align the address when it's written from VMM. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-2-vkuznets@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> [Add comment about behavior for host-provided values. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15KVM: SVM: add module param to control the #SMI interceptionMaxim Levitsky
In theory there are no side effects of not intercepting #SMI, because then #SMI becomes transparent to the OS and the KVM. Plus an observation on recent Zen2 CPUs reveals that these CPUs ignore #SMI interception and never deliver #SMI VMexits. This is also useful to test nested KVM to see that L1 handles #SMIs correctly in case when L1 doesn't intercept #SMI. Finally the default remains the same, the SMI are intercepted by default thus this patch doesn't have any effect unless non default module param value is used. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15KVM: SVM: remove INIT intercept handlerMaxim Levitsky
Kernel never sends real INIT even to CPUs, other than on boot. Thus INIT interception is an error which should be caught by a check for an unknown VMexit reason. On top of that, the current INIT VM exit handler skips the current instruction which is wrong. That was added in commit 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code"). Fixes: 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-3-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-15KVM: SVM: #SMI interception must not skip the instructionMaxim Levitsky
Commit 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code"), unfortunately made a mistake of treating nop_on_interception and nop_interception in the same way. Former does truly nothing while the latter skips the instruction. SMI VM exit handler should do nothing. (SMI itself is handled by the host when we do STGI) Fixes: 5ff3a351f687 ("KVM: x86: Move trivial instruction-based exit handlers to common code") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-14KVM: SVM: Revert clearing of C-bit on GPA in #NPF handlerSean Christopherson
Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for non-SEV guests, and for SEV guests the C-bit is dropped before the GPA hits the NPT in hardware. Clearing the bit for non-SEV guests causes KVM to mishandle #NPFs with that collide with the host's C-bit. Although the APM doesn't explicitly state that the C-bit is not reserved for non-SEV, Tom Lendacky confirmed that the following snippet about the effective reduction due to the C-bit does indeed apply only to SEV guests. Note that because guest physical addresses are always translated through the nested page tables, the size of the guest physical address space is not impacted by any physical address space reduction indicated in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however, the guest physical address space is effectively reduced by 1 bit. And for SEV guests, the APM clearly states that the bit is dropped before walking the nested page tables. If the C-bit is an address bit, this bit is masked from the guest physical address when it is translated through the nested page tables. Consequently, the hypervisor does not need to be aware of which pages the guest has chosen to mark private. Note, the bogus C-bit clearing was removed from legacy #PF handler in commit 6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF interception"). Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address") Cc: Peter Gonda <pgonda@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210625020354.431829-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24KVM: x86: Print CPU of last attempted VM-entry when dumping VMCS/VMCBJim Mattson
Failed VM-entry is often due to a faulty core. To help identify bad cores, print the id of the last logical processor that attempted VM-entry whenever dumping a VMCS or VMCB. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210621221648.1833148-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-23Merge branch 'topic/ppc-kvm' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes
2021-06-18KVM: SVM: Refuse to load kvm_amd if NX support is not availableSean Christopherson
Refuse to load KVM if NX support is not available. Shadow paging has assumed NX support since commit 9167ab799362 ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active"), and NPT has assumed NX support since commit b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation"). While the NX huge pages mitigation should not be enabled by default for AMD CPUs, it can be turned on by userspace at will. Unlike Intel CPUs, AMD does not provide a way for firmware to disable NX support, and Linux always sets EFER.NX=1 if it is supported. Given that it's extremely unlikely that a CPU supports NPT but not NX, making NX a formal requirement is far simpler than adding requirements to the mitigation flow. Fixes: 9167ab799362 ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active") Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210615164535.2146172-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: introduce kvm_register_clear_availableMaxim Levitsky
Small refactoring that will be used in the next patch. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: SVM: hyper-v: Direct Virtual Flush supportVineeth Pillai
From Hyper-V TLFS: "The hypervisor exposes hypercalls (HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx, HvFlushVirtualAddressList, and HvFlushVirtualAddressListEx) that allow operating systems to more efficiently manage the virtual TLB. The L1 hypervisor can choose to allow its guest to use those hypercalls and delegate the responsibility to handle them to the L0 hypervisor. This requires the use of a partition assist page." Add the Direct Virtual Flush support for SVM. Related VMX changes: commit 6f6a657c9998 ("KVM/Hyper-V/VMX: Add direct tlb flush support") Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <fc8d24d8eb7017266bb961e39a171b0caf298d7f.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: SVM: hyper-v: Enlightened MSR-Bitmap supportVineeth Pillai
Enlightened MSR-Bitmap as per TLFS: "The L1 hypervisor may collaborate with the L0 hypervisor to make MSR accesses more efficient. It can enable enlightened MSR bitmaps by setting the corresponding field in the enlightened VMCS to 1. When enabled, L0 hypervisor does not monitor the MSR bitmaps for changes. Instead, the L1 hypervisor must invalidate the corresponding clean field after making changes to one of the MSR bitmaps." Enable this for SVM. Related VMX changes: commit ceef7d10dfb6 ("KVM: x86: VMX: hyper-v: Enlightened MSR-Bitmap support") Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <87df0710f95d28b91cc4ea014fc4d71056eebbee.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: SVM: hyper-v: Remote TLB flush for SVMVineeth Pillai
Enable remote TLB flush for SVM. Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Message-Id: <1ee364e397e142aed662d2920d198cd03772f1a5.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: nVMX: nSVM: 'nested_run' should count guest-entry attempts that make it ↵Krish Sadhukhan
to guest code Currently, the 'nested_run' statistic counts all guest-entry attempts, including those that fail during vmentry checks on Intel and during consistency checks on AMD. Convert this statistic to count only those guest-entries that make it past these state checks and make it to guest code. This will tell us the number of guest-entries that actually executed or tried to execute guest code. Signed-off-by: Krish Sadhukhan <Krish.Sadhukhan@oracle.com> Message-Id: <20210609180340.104248-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Drop "pre_" from enter/leave_smm() helpersSean Christopherson
Now that .post_leave_smm() is gone, drop "pre_" from the remaining helpers. The helpers aren't invoked purely before SMI/RSM processing, e.g. both helpers are invoked after state is snapshotted (from regs or SMRAM), and the RSM helper is invoked after some amount of register state has been stuffed. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Drop vendor specific functions for APICv/AVIC enablementVitaly Kuznetsov
Now that APICv/AVIC enablement is kept in common 'enable_apicv' variable, there's no need to call kvm_apicv_init() from vendor specific code. No functional change intended. Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210609150911.1471882-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Use common 'enable_apicv' variable for both APICv and AVICVitaly Kuznetsov
Unify VMX and SVM code by moving APICv/AVIC enablement tracking to common 'enable_apicv' variable. Note: unlike APICv, AVIC is disabled by default. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210609150911.1471882-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: X86: Add vendor callbacks for writing the TSC multiplierIlias Stamatis
Currently vmx_vcpu_load_vmcs() writes the TSC_MULTIPLIER field of the VMCS every time the VMCS is loaded. Instead of doing this, set this field from common code on initialization and whenever the scaling ratio changes. Additionally remove vmx->current_tsc_ratio. This field is redundant as vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling ratio. The vmx->current_tsc_ratio field is only used for avoiding unnecessary writes but it is no longer needed after removing the code from the VMCS load path. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Message-Id: <20210607105438.16541-1-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: X86: Move write_l1_tsc_offset() logic to common code and rename itIlias Stamatis
The write_l1_tsc_offset() callback has a misleading name. It does not set L1's TSC offset, it rather updates the current TSC offset which might be different if a nested guest is executing. Additionally, both the vmx and svm implementations use the same logic for calculating the current TSC before writing it to hardware. Rename the function and move the common logic to the caller. The vmx/svm specific code now merely sets the given offset to the corresponding hardware structure. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-9-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: X86: Add functions for retrieving L2 TSC fields from common codeIlias Stamatis
In order to implement as much of the nested TSC scaling logic as possible in common code, we need these vendor callbacks for retrieving the TSC offset and the TSC multiplier that L1 has set for L2. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-7-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-24KVM: SVM: Drop unneeded CONFIG_X86_LOCAL_APIC checkVitaly Kuznetsov
AVIC dependency on CONFIG_X86_LOCAL_APIC is dead code since commit e42eef4ba388 ("KVM: add X86_LOCAL_APIC dependency"). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210518144339.1987982-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com>
2021-05-17Merge tag 'kvmarm-fixes-5.13-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.13, take #1 - Fix regression with irqbypass not restarting the guest on failed connect - Fix regression with debug register decoding resulting in overlapping access - Commit exception state on exit to usrspace - Fix the MMU notifier return values - Add missing 'static' qualifiers in the new host stage-2 code
2021-05-16Merge tag 'x86_urgent_for_v5.13_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "The three SEV commits are not really urgent material. But we figured since getting them in now will avoid a huge amount of conflicts between future SEV changes touching tip, the kvm and probably other trees, sending them to you now would be best. The idea is that the tip, kvm etc branches for 5.14 will all base ontop of -rc2 and thus everything will be peachy. What is more, those changes are purely mechanical and defines movement so they should be fine to go now (famous last words). Summary: - Enable -Wundef for the compressed kernel build stage - Reorganize SEV code to streamline and simplify future development" * tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/compressed: Enable -Wundef x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG x86/sev: Move GHCB MSR protocol and NAE definitions in a common header x86/sev-es: Rename sev-es.{ch} to sev.{ch}
2021-05-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: - Lots of bug fixes. - Fix virtualization of RDPID - Virtualization of DR6_BUS_LOCK, which on bare metal is new to this release - More nested virtualization migration fixes (nSVM and eVMCS) - Fix for KVM guest hibernation - Fix for warning in SEV-ES SRCU usage - Block KVM from loading on AMD machines with 5-level page tables, due to the APM not mentioning how host CR4.LA57 exactly impacts the guest. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (48 commits) KVM: SVM: Move GHCB unmapping to fix RCU warning KVM: SVM: Invert user pointer casting in SEV {en,de}crypt helpers kvm: Cap halt polling at kvm->max_halt_poll_ns tools/kvm_stat: Fix documentation typo KVM: x86: Prevent deadlock against tk_core.seq KVM: x86: Cancel pvclock_gtod_work on module removal KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging KVM: X86: Expose bus lock debug exception to guest KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks KVM: x86: Hide RDTSCP and RDPID if MSR_TSC_AUX probing failed KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model KVM: x86: Move uret MSR slot management to common x86 KVM: x86: Export the number of uret MSRs to vendor modules KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way KVM: VMX: Use common x86's uret MSR list as the one true list KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list KVM: VMX: Configure list of user return MSRs at module init KVM: x86: Add support for RDPID without RDTSCP KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in host ...
2021-05-10x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFGBrijesh Singh
The SYSCFG MSR continued being updated beyond the K8 family; drop the K8 name from it. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lkml.kernel.org/r/20210427111636.1207-4-brijesh.singh@amd.com
2021-05-07KVM: SVM: Move GHCB unmapping to fix RCU warningTom Lendacky
When an SEV-ES guest is running, the GHCB is unmapped as part of the vCPU run support. However, kvm_vcpu_unmap() triggers an RCU dereference warning with CONFIG_PROVE_LOCKING=y because the SRCU lock is released before invoking the vCPU run support. Move the GHCB unmapping into the prepare_guest_switch callback, which is invoked while still holding the SRCU lock, eliminating the RCU dereference warning. Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <b2f9b79d15166f2c3e4375c0d9bc3268b7696455.1620332081.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07KVM: x86: Prevent KVM SVM from loading on kernels with 5-level pagingSean Christopherson
Disallow loading KVM SVM if 5-level paging is supported. In theory, NPT for L1 should simply work, but there unknowns with respect to how the guest's MAXPHYADDR will be handled by hardware. Nested NPT is more problematic, as running an L1 VMM that is using 2-level page tables requires stacking single-entry PDP and PML4 tables in KVM's NPT for L2, as there are no equivalent entries in L1's NPT to shadow. Barring hardware magic, for 5-level paging, KVM would need stack another layer to handle PML5. Opportunistically rename the lm_root pointer, which is used for the aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to call out that it's specifically for PML4. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210505204221.1934471-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU modelSean Christopherson
Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to the guest CPU model instead of the host CPU behavior. While not strictly necessary to avoid guest breakage, emulating cross-vendor "architecture" will provide consistent behavior for the guest, e.g. WRMSR fault behavior won't change if the vCPU is migrated to a host with divergent behavior. Note, the "new" kvm_is_supported_user_return_msr() checks do not add new functionality on either SVM or VMX. On SVM, the equivalent was "tsc_aux_uret_slot < 0", and on VMX the check was buried in the vmx_find_uret_msr() call at the find_uret_msr label. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07KVM: x86: Move uret MSR slot management to common x86Sean Christopherson
Now that SVM and VMX both probe MSRs before "defining" user return slots for them, consolidate the code for probe+define into common x86 and eliminate the odd behavior of having the vendor code define the slot for a given MSR. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07KVM: x86: Add support for RDPID without RDTSCPSean Christopherson
Allow userspace to enable RDPID for a guest without also enabling RDTSCP. Aside from checking for RDPID support in the obvious flows, VMX also needs to set ENABLE_RDTSCP=1 when RDPID is exposed. For the record, there is no known scenario where enabling RDPID without RDTSCP is desirable. But, both AMD and Intel architectures allow for the condition, i.e. this is purely to make KVM more architecturally accurate. Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID") Cc: stable@vger.kernel.org Reported-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in hostSean Christopherson
Probe MSR_TSC_AUX whether or not RDTSCP is supported in the host, and if probing succeeds, load the guest's MSR_TSC_AUX into hardware prior to VMRUN. Because SVM doesn't support interception of RDPID, RDPID cannot be disallowed in the guest (without resorting to binary translation). Leaving the host's MSR_TSC_AUX in hardware would leak the host's value to the guest if RDTSCP is not supported. Note, there is also a kernel bug that prevents leaking the host's value. The host kernel initializes MSR_TSC_AUX if and only if RDTSCP is supported, even though the vDSO usage consumes MSR_TSC_AUX via RDPID. I.e. if RDTSCP is not supported, there is no host value to leak. But, if/when the host kernel bug is fixed, KVM would start leaking MSR_TSC_AUX in the case where hardware supports RDPID but RDTSCP is unavailable for whatever reason. Probing MSR_TSC_AUX will also allow consolidating the probe and define logic in common x86, and will make it simpler to condition the existence of MSR_TSX_AUX (from the guest's perspective) on RDTSCP *or* RDPID. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>