path: root/arch/x86/kvm/vmx
Age  Commit message  Author
2021-04-13  KVM: VMX: Don't use vcpu->run->internal.ndata as an array index  (Reiji Watanabe)
__vmx_handle_exit() uses vcpu->run->internal.ndata as an index for an array access. Since vcpu->run can be mapped into a user address space with write permission, 'ndata' can be updated by the user process at any time, e.g. set to a value outside the bounds of the array. It is therefore not safe for __vmx_handle_exit() to use 'ndata' that way. Fixes: 1aa561b1a4c0 ("kvm: x86: Add "last CPU" to some KVM_EXIT information") Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210413154739.490299-1-reijiw@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
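The safe pattern is to build the index in kernel-private storage and publish it to the user-mappable struct only at the end. A minimal sketch of that idea (a code fragment; the fields stored are illustrative, not the exact patch):

    u32 ndata = 0;

    /* Accumulate entries via a local index the user process cannot touch. */
    vcpu->run->internal.data[ndata++] = vmx->exit_reason.full;
    vcpu->run->internal.data[ndata++] = vcpu->arch.last_vmentry_cpu;
    /* Publish the count to the user-mappable struct only once, at the end. */
    vcpu->run->internal.ndata = ndata;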
2021-03-21  x86: Fix various typos in comments, take #2  (Ingo Molnar)
Fix another ~42 single-word typos in arch/x86/ code comments that were missed in the first pass, in particular in .S files. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-18  x86: Fix various typos in comments  (Ingo Molnar)
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-15  KVM: VMX: Track root HPA instead of EPTP for paravirt Hyper-V TLB flush  (Sean Christopherson)
Track the address of the top-level EPT struct, a.k.a. the root HPA, instead of the EPTP itself for Hyper-V's paravirt TLB flush. The paravirt API takes only the address, not the full EPTP, and in theory tracking the EPTP could lead to false negatives, e.g. if the HPA matches but the attributes in the EPTP do not. In practice, such a mismatch is extremely unlikely, if not flat out impossible, given how KVM generates the EPTP. Opportunistically rename the related fields to use the 'root' nomenclature, and to prefix them with 'hv_' to connect them to Hyper-V's paravirt TLB flushing. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
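For context, an EPTP packs the root HPA together with attribute bits, which is why two EPTPs can differ while naming the same root. KVM's construct_eptp() builds it roughly like this (a simplified sketch using the kernel's VMX_EPTP_* defines):

    u64 eptp = VMX_EPTP_MT_WB | VMX_EPTP_PWL_4;  /* memory type, walk length */

    if (enable_ept_ad_bits)
            eptp |= VMX_EPTP_AD_ENABLE_BIT;      /* accessed/dirty assists */
    eptp |= root_hpa;                            /* bits 12 and up */

Since Hyper-V's API takes only the address, tracking the bare root HPA is the natural fit.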
2021-03-15  KVM: VMX: Skip additional Hyper-V TLB EPTP flushes if one fails  (Sean Christopherson)
Skip additional EPTP flushes if one fails when processing EPTPs for Hyper-V's paravirt TLB flushing. If _any_ flush fails, KVM falls back to a full global flush, i.e. additional flushes are unnecessary (and will likely fail anyway). Continue processing the loop unless a mismatch was already detected, e.g. to handle the case where the first flush fails and there is a yet-to-be-detected mismatch. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
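Taken together with the related commits below, the flush path converges on roughly the following shape (a sketch assembled from these changelogs; treat helper names as approximate):

    struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
    struct kvm_vcpu *vcpu;
    hpa_t tmp_eptp;
    bool mismatch = false;
    int ret = 0, i;

    if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
            kvm_for_each_vcpu(i, vcpu, kvm) {
                    tmp_eptp = to_vmx(vcpu)->ept_pointer;
                    if (!VALID_PAGE(tmp_eptp) ||
                        tmp_eptp == kvm_vmx->hv_tlb_eptp)
                            continue;
                    if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp))
                            kvm_vmx->hv_tlb_eptp = tmp_eptp;
                    else
                            mismatch = true;  /* keep scanning regardless */
                    if (!ret)                 /* skip flushes once one fails */
                            ret = hv_remote_flush_eptp(tmp_eptp, range);
            }
            if (mismatch)
                    kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
    } else {
            ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
    }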
2021-03-15  KVM: VMX: Define Hyper-V paravirt TLB flush fields iff Hyper-V is enabled  (Sean Christopherson)
Ifdef away the Hyper-V specific fields in structs kvm_vmx and vcpu_vmx, as each field has only a single reference outside of the struct itself that isn't already wrapped in ifdeffery (and both of those references are in initialization code). vcpu_vmx.ept_pointer in particular should be wrapped, as it is valid if and only if Hyper-V is active, i.e. non-Hyper-V code cannot rely on it to actually track the current EPTP (without additional code changes). Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
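The shape of the change is straightforward (a sketch; the elided fields stand in for the rest of the struct):

    struct vcpu_vmx {
            /* ... non-Hyper-V fields ... */
    #if IS_ENABLED(CONFIG_HYPERV)
            u64 ept_pointer;  /* valid iff Hyper-V's PV TLB flush is in use */
    #endif
    };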
2021-03-15  KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd  (Sean Christopherson)
Explicitly check that kvm_x86_ops.tlb_remote_flush() points at Hyper-V's implementation for PV flushing instead of assuming that a non-NULL implementation means running on Hyper-V. Wrap the related logic in ifdeffery as hv_remote_flush_tlb() is defined iff CONFIG_HYPERV!=n. Short term, the explicit check makes it more obvious why a non-NULL tlb_remote_flush() triggers EPTP shenanigans. Long term, this will allow TDX to define its own implementation of tlb_remote_flush() without running afoul of Hyper-V. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
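A sketch of the explicit check in the PGD-load path (the tracking helper here is hypothetical):

    #if IS_ENABLED(CONFIG_HYPERV)
            if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb)
                    hv_track_root_eptp(vcpu, root_hpa);  /* hypothetical helper */
    #endif

Comparing against the specific function, rather than testing for non-NULL, is what leaves room for other backends (e.g. TDX) to hook tlb_remote_flush() without inheriting the Hyper-V bookkeeping.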
2021-03-15  KVM: VMX: Don't invalidate hv_tlb_eptp if the new EPTP matches  (Sean Christopherson)
Don't invalidate the common EPTP, and thus trigger rechecking of EPTPs across all vCPUs, if the new EPTP matches the old/common EPTP. In all likelihood this is a meaningless optimization, but there are (uncommon) scenarios where KVM can reload the same EPTP. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch  (Sean Christopherson)
Drop the dedicated 'ept_pointers_match' field in favor of stuffing 'hv_tlb_eptp' with INVALID_PAGE to mark it as invalid, i.e. to denote that there is at least one EPTP mismatch. Use a local variable to track whether or not a mismatch is detected so that hv_tlb_eptp can be used to skip redundant flushes. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Do Hyper-V TLB flush iff vCPU's EPTP hasn't been flushed  (Sean Christopherson)
Combine the for-loops for Hyper-V TLB EPTP checking and flushing, and in doing so skip flushes for vCPUs whose EPTP matches the target EPTP. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Fold Hyper-V EPTP checking into its only caller  (Sean Christopherson)
Fold check_ept_pointer_match() into hv_remote_flush_tlb_with_range() in preparation for combining the kvm_for_each_vcpu loops of the ==CHECK and !=MATCH statements. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Stash kvm_vmx in a local variable for Hyper-V paravirt TLB flush  (Sean Christopherson)
Capture kvm_vmx in a local variable instead of polluting hv_remote_flush_tlb_with_range() with to_kvm_vmx(kvm). No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Track common EPTP for Hyper-V's paravirt TLB flush  (Sean Christopherson)
Explicitly track the EPTP that is common to all vCPUs instead of grabbing vCPU0's EPTP when invoking Hyper-V's paravirt TLB flush. Tracking the EPTP will allow optimizing the checks when loading a new EPTP and will also allow dropping ept_pointer_match, e.g. by marking the common EPTP as invalid. This also technically fixes a bug where KVM could theoretically flush an invalid GPA if all vCPUs have an invalid root. In practice, it's likely impossible to trigger a remote TLB flush in such a scenario. In any case, the superfluous flush is completely benign. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Get active PCID only when writing a CR3 value  (Sean Christopherson)
Retrieve the active PCID only when writing a guest CR3 value, i.e. don't get the PCID when using EPT or NPT. The PCID is especially problematic for EPT, as the bits have a different meaning, and so the PCID must be manually stripped, which is annoying and unnecessary. And on VMX, getting the active PCID also involves reading the guest's CR3 and CR4.PCIDE, i.e. may add pointless VMREADs. Opportunistically rename the pgd/pgd_level params to root_hpa and root_level to better reflect their new roles. Keep the function names, as "load the guest PGD" is still accurate/correct. Last, and probably least, pass root_hpa as a hpa_t/u64 instead of an unsigned long. The EPTP holds a 64-bit value, even in 32-bit mode, so in theory EPT could support HIGHMEM for 32-bit KVM. Never mind that doing so would require changing the MMU page allocators and reworking the MMU to use kmap(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-2-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
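"Active PCID" here means the PCID that is architecturally in effect, which is non-zero only when CR4.PCIDE=1. A sketch of the helper this implies:

    static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
    {
            if (!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))
                    return 0;
            return kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK;  /* low 12 bits */
    }

With EPT or NPT the hardware root is not a guest CR3 at all, so no PCID belongs in it in the first place.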
2021-03-15  KVM: x86/mmu: Move logic for setting SPTE masks for EPT into the MMU proper  (Sean Christopherson)
Let the MMU deal with the SPTE masks to avoid splitting the logic and knowledge across the MMU and VMX. The SPTE masks that are used for EPT are very, very tightly coupled to the MMU implementation. The use of available bits, the existence of A/D types, the fact that shadow_x_mask even exists, and so on and so forth are all baked into the MMU implementation. Cross referencing the params to the masks is also a nightmare, as pretty much every param is a u64. A future patch will make the location of the MMU_WRITABLE and HOST_WRITABLE bits MMU specific, to free up bit 11 for a MMU_PRESENT bit. Doing that change with the current kvm_mmu_set_mask_ptes() would be an absolute mess. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86/mmu: Co-locate code for setting various SPTE masks  (Sean Christopherson)
Squish all the code for (re)setting the various SPTE masks into one location. With the split code, it's not at all clear that the masks are set once during module initialization. This will allow a future patch to clean up initialization of the masks without shuffling code all over tarnation. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs  (Sean Christopherson)
Stop tagging MMIO SPTEs with specific available bits and instead detect MMIO SPTEs by checking for their unique SPTE value. The value is guaranteed to be unique on shadow paging and NPT, as setting reserved physical address bits on any other type of SPTE would constitute a KVM bug. Ditto for EPT, as creating a WX non-MMIO SPTE would also be a bug. Note, this approach is also future-compatible with TDX, which will need to reflect MMIO EPT violations as #VEs into the guest. To create an EPT violation instead of a misconfig, TDX EPTs will need to have RWX=0, but MMIO SPTEs will also be the only case where KVM clears SUPPRESS_VE, so MMIO SPTEs will still be guaranteed to have a unique value within a given MMU context. The main motivation is to make it easier to reason about which types of SPTEs use which available bits. As a happy side effect, this frees up two more bits for storing the MMIO generation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
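Detection by value then boils down to a single mask-and-compare, along the lines of:

    static inline bool is_mmio_spte(u64 spte)
    {
            /* An SPTE is MMIO iff the designated bits hold the magic value. */
            return (spte & shadow_mmio_mask) == shadow_mmio_value;
    }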
2021-03-15  KVM: x86: Move RDPMC emulation to common code  (Sean Christopherson)
Move the entirety of the accelerated RDPMC emulation to x86.c, and assign the common handler directly to the exit handler array for VMX. SVM has bizarre nrips behavior that prevents it from directly invoking the common handler. The nrips goofiness will be addressed in a future patch. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Move trivial instruction-based exit handlers to common code  (Sean Christopherson)
Move the trivial exit handlers, e.g. for instructions that KVM "emulates" as nops, to common x86 code. Assign the common handlers directly to the exit handler arrays and drop the vendor trampolines. Opportunistically use pr_warn_once() where appropriate. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Move XSETBV emulation to common code  (Sean Christopherson)
Move the entirety of XSETBV emulation to x86.c, and assign the function directly to both VMX's and SVM's exit handlers, i.e. drop the unnecessary trampolines. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
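The three commits above share one pattern: the common kvm_emulate_*() handlers are wired directly into the vendor exit tables. For VMX this looks roughly like the following (the real table has many more entries):

    static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
            /* ... */
            [EXIT_REASON_RDPMC]  = kvm_emulate_rdpmc,
            [EXIT_REASON_WBINVD] = kvm_emulate_wbinvd,
            [EXIT_REASON_XSETBV] = kvm_emulate_xsetbv,
            /* ... */
    };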
2021-03-15  KVM: x86: Handle triple fault in L2 without killing L1  (Sean Christopherson)
Synthesize a nested VM-Exit if L2 triggers an emulated triple fault instead of exiting to userspace, which would likely kill L1. Any flow that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that is _not_ intercepted by L1. E.g. if KVM is intercepting #GPs for the VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF will cause KVM to emulate triple fault. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
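Conceptually, the triple-fault request handling gains a guest-mode branch; a rough sketch (assuming a nested_ops-style triple-fault callback):

    if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
            if (is_guest_mode(vcpu)) {
                    /* Reflect the triple fault into L1 as a nested VM-Exit. */
                    kvm_x86_ops.nested_ops->triple_fault(vcpu);
            } else {
                    vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
                    r = 0;
                    goto out;
            }
    }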
2021-03-15  KVM: x86: Move nVMX's consistency check macro to common code  (Sean Christopherson)
Move KVM's CC() macro to x86.h so that it can be reused by nSVM. Debugging VM-Enter is as painful on SVM as it is on VMX. Rename the more visible macro to KVM_NESTED_VMENTER_CONSISTENCY_CHECK to avoid any collisions with the uber-concise "CC". Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
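The macro itself is small; a sketch of what lands in x86.h under the new name:

    #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)         \
    ({                                                                      \
            bool failed = (consistency_check);                              \
            if (failed)                                                     \
                    trace_kvm_nested_vmenter_failed(#consistency_check, 0); \
            failed;                                                         \
    })

Each nested implementation can then keep a terse local alias, e.g. '#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK'.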
2021-03-15  KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch  (Sean Christopherson)
Defer reloading the MMU until after a successful EPTP switch. The VMFUNC instruction itself is executed in the previous EPTP context, so any side effects, e.g. updating RIP, should occur in the old context. Practically speaking, this bug is benign, as VMX doesn't touch the MMU when skipping an emulated instruction, nor does queuing a single-step #DB. No other post-switch side effects exist. Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Track if L1 is running an L2 VM  (Dongli Zhang)
Introduce a new per-vCPU stat, 'nested_run', to track whether an L1 VM is running, or has been used to run, an L2 VM. One use of 'nested_run' is to help the host administrator easily determine whether any L1 VM is being used to run an L2 VM. If an issue that may be related to nested virtualization comes up, the administrator can quickly narrow down and confirm via 'nested_run' whether nested virtualization is in play, e.g. whether a fix like commit 88dddc11a8d6 ("KVM: nVMX: do not use dangling shadow VMCS after guest reset") is required. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
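The stat itself is a one-line bump on a successful nested VM-Enter (sketch):

    /* In the nested VM-Enter path, once entry to L2 is committed: */
    ++vcpu->stat.nested_run;

It is then visible through the usual channels, e.g. the kvm debugfs statistics or the kvm_stat tool.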
2021-03-10  x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case  (Sean Christopherson)
Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop" case. Patching in perf_guest_get_msrs_nop() during setup does not work if there is no PMU, as setup bails before updating the static calls, leaving x86_pmu.guest_get_msrs NULL and thus a complete nop. Ultimately, this causes VMX abort on VM-Exit due to KVM putting random garbage from the stack into the MSR load list. Add a comment in KVM to note that nr_msrs is valid if and only if the return value is non-NULL. Fixes: abd562df94d1 ("x86/perf: Use static_call for x86_pmu.guest_get_msrs") Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: syzbot+cce9ef2dd25246f815ee@syzkaller.appspotmail.com Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210309171019.1125243-1-seanjc@google.com
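The KVM-side contract mentioned in the last sentence reads roughly like this in the perf-MSR switching path:

    struct perf_guest_switch_msr *msrs;
    int nr_msrs, i;

    msrs = perf_guest_get_msrs(&nr_msrs);
    if (!msrs)  /* nr_msrs is valid iff the return value is non-NULL */
            return;
    for (i = 0; i < nr_msrs; i++)
            add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest,
                                  msrs[i].host, false);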
2021-02-23  KVM: vmx/pmu: Fix dummy check if lbr_desc->event is created  (Like Xu)
If lbr_desc->event is successfully created, intel_pmu_create_guest_lbr_event() will return 0; otherwise it will return -ENOENT, and the caller then falls back to the dummy LBR MSR handling. Fixes: 1b5ac3226a1a ("KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE") Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210223013958.1280444-1-like.xu@linux.intel.com> [Add "< 0" and PTR_ERR to make the code clearer. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-19  KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging  (Makarand Sonare)
Currently, if enable_pml=1, PML remains enabled for the entire lifetime of the VM irrespective of whether dirty logging is enabled or disabled. When dirty logging is disabled, all the pages of the VM are manually marked dirty, so that PML is effectively non-operational. Setting the dirty bits is an expensive operation which can cause severe MMU lock contention in a performance-sensitive path when dirty logging is disabled after a failed or canceled live migration. Manually setting dirty bits also fails to prevent PML activity if some code path clears dirty bits, which can incur unnecessary VM-Exits. In order to avoid this extra overhead, dynamically enable/disable PML when dirty logging gets turned on/off for the first/last memslot. Signed-off-by: Makarand Sonare <makarandsonare@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
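On the VMX side, the toggle amounts to flipping the PML secondary execution control as dirty logging comes and goes; a sketch (names as introduced around this series):

    static void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
    {
            struct vcpu_vmx *vmx = to_vmx(vcpu);

            if (vcpu->kvm->arch.cpu_dirty_logging_count)
                    secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_ENABLE_PML);
            else
                    secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_ENABLE_PML);
    }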
2021-02-19  KVM: x86: Move MMU's PML logic to common code  (Sean Christopherson)
Drop the facade of KVM's PML logic being vendor specific and move the bits that aren't truly VMX specific into common x86 code. The MMU logic for dealing with PML is tightly coupled to the feature and to VMX's implementation, bouncing through kvm_x86_ops obfuscates the code without providing any meaningful separation of concerns or encapsulation. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-19  KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function  (Sean Christopherson)
Store the vendor-specific dirty log size in a variable; there's no need to wrap it in a function, since the value is constant after hardware_setup() runs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
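For VMX, the value is simply the PML buffer capacity, so the hook reduces to something like:

    /* In vmx_x86_ops: the PML buffer holds 512 GPA entries. */
    .cpu_dirty_log_size = PML_ENTITY_NUM,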
2021-02-19  KVM: nVMX: Disable PML in hardware when running L2  (Sean Christopherson)
Unconditionally disable PML in vmcs02, as KVM emulates PML purely in the MMU; e.g. vmx_flush_pml_buffer() doesn't even try to copy the L2 GPAs from vmcs02's buffer to vmcs12. At best, enabling PML is a nop. At worst, it will cause vmx_flush_pml_buffer() to record bogus GFNs in the dirty logs. Initialize vmcs02.GUEST_PML_INDEX such that PML writes would trigger VM-Exit if PML were somehow enabled, skip flushing the buffer for guest mode since the index is bogus, and freak out if a PML-full exit occurs when L2 is active. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18  KVM: nVMX: no need to undo inject_page_fault change on nested vmexit  (Paolo Bonzini)
This is not needed because the tweak was done on the guest_mmu, while nested_ept_uninit_mmu_context has just changed vcpu->arch.walk_mmu back to the root_mmu. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18  KVM: VMX: read idt_vectoring_info a bit earlier  (Maxim Levitsky)
trace_kvm_exit prints this value (using vmx_get_exit_info) so it makes sense to read it before the trace point. Fixes: dcf068da7eb2 ("KVM: VMX: Introduce generic fastpath handler") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210217145718.1217358-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18  KVM: VMX: Allow INVPCID in guest without PCID  (Sean Christopherson)
Remove the restriction that prevents VMX from exposing INVPCID to the guest without PCID also being exposed to the guest. The justification for the restriction is that INVPCID will #UD if it's disabled in the VMCS. While that is a true statement, it's also true that RDTSCP will #UD if it's disabled in the VMCS. Neither of those things has any dependency whatsoever on the guest being able to set CR4.PCIDE=1, which is what is effectively allowed by exposing PCID to the guest. Removing the bogus restriction aligns VMX with SVM, and also allows for an interesting configuration. INVPCID is the fastest way to do a global TLB flush, e.g. see native_flush_tlb_global(). Allowing INVPCID without PCID would let a guest use the expedited flush while also limiting the number of ASIDs consumed by the guest. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210212003411.1102677-4-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-18  KVM: x86: Advertise INVPCID by default  (Sean Christopherson)
Advertise INVPCID by default (if supported by the host kernel) instead of having both SVM and VMX opt in. INVPCID was opt in when it was a VMX only feature so that KVM wouldn't prematurely advertise support if/when it showed up in the kernel on AMD hardware. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210212003411.1102677-3-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09  KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context  (Vitaly Kuznetsov)
Currently, Hyper-V context is part of 'struct kvm_vcpu_arch' and is always available. As a preparation to allocating it dynamically, check that it is not NULL at call sites which can normally proceed without it, i.e. where the behavior is identical to the situation when Hyper-V emulation is not being used by the guest. When the Hyper-V context for a particular vCPU is not allocated, we may still need to get 'vp_index' from there. E.g. in a hypothetical situation when Hyper-V emulation was enabled on one CPU and wasn't on another, a Hyper-V style send-IPI hypercall may still be used. Luckily, vp_index is always initialized to kvm_vcpu_get_idx() and can only be changed when Hyper-V context is present. Introduce kvm_hv_get_vpindex() helper for simplification. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-12-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
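Given the invariants above, the helper can be a NULL-safe one-liner (sketch):

    static inline u32 kvm_hv_get_vpindex(struct kvm_vcpu *vcpu)
    {
            struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

            /* vp_index defaults to the vCPU index until Hyper-V changes it. */
            return hv_vcpu ? hv_vcpu->vp_index : kvm_vcpu_get_idx(vcpu);
    }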
2021-02-09  KVM: x86: hyper-v: Always use to_hv_vcpu() accessor to get to 'struct kvm_vcpu_hv'  (Vitaly Kuznetsov)
As a preparation to allocating Hyper-V context dynamically, make it clear who the user of said context is. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-11-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09  KVM: x86: hyper-v: Introduce to_kvm_hv() helper  (Vitaly Kuznetsov)
Spelling '&kvm->arch.hyperv' correctly is hard. Also, this makes the code more consistent with vmx/svm, where to_kvm_vmx()/to_kvm_svm() are already being used. Opportunistically change kvm_hv_msr_{get,set}_crash_{data,ctl}() to take 'kvm' instead of 'vcpu', as these MSRs are partition wide. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
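The helper is a thin wrapper, mirroring to_kvm_vmx()/to_kvm_svm() (sketch):

    static inline struct kvm_hv *to_kvm_hv(struct kvm *kvm)
    {
            return &kvm->arch.hyperv;
    }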
2021-02-09  KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers  (Paolo Bonzini)
Push the injection of #GP up to the callers, so that they can just use kvm_complete_insn_gp. __kvm_set_dr is pretty much what the callers can use together with kvm_complete_insn_gp, so rename it to kvm_set_dr and drop the old kvm_set_dr wrapper. This also allows nested VMX code, which really wanted to use __kvm_set_dr, to use the right function. While at it, remove the kvm_require_dr() check from the SVM interception. The APM states that "all normal exception checks take precedence over the SVM intercepts", which includes the CR4.DE=1 #UD. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
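The resulting caller pattern is compact (sketch): kvm_complete_insn_gp() injects #GP when the result is non-zero and otherwise skips the emulated instruction.

    int err = kvm_set_dr(vcpu, dr, val);

    return kvm_complete_insn_gp(vcpu, err);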
2021-02-09  KVM: x86: reading DR cannot fail  (Paolo Bonzini)
kvm_get_dr and emulator_get_dr expect an in-range value for the register number, so they cannot fail. Change the return type to void. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: VMX: Use GPA legality helpers to replace open coded equivalents  (Sean Christopherson)
Replace a variety of open coded GPA checks with the recently introduced common helpers. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: x86: move kvm_inject_gp up from kvm_set_xcr to callers  (Paolo Bonzini)
Push the injection of #GP up to the callers, so that they can just use kvm_complete_insn_gp. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: move EXIT_FASTPATH_REENTER_GUEST to common code  (Paolo Bonzini)
Now that KVM is using static calls, calling vmx_vcpu_run and vmx_sync_pir_to_irr no longer incurs the cost of a retpoline. Therefore there is no longer any need to handle EXIT_FASTPATH_REENTER_GUEST in vendor code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: VMX: Use the kernel's version of VMXOFF  (Sean Christopherson)
Drop kvm_cpu_vmxoff() in favor of the kernel's cpu_vmxoff(). Modify the latter to return -EIO on fault so that KVM can invoke kvm_spurious_fault() when appropriate. In addition to the obvious code reuse, dropping kvm_cpu_vmxoff() also eliminates VMX's last usage of the __ex()/__kvm_handle_fault_on_reboot() macros, thus helping pave the way toward dropping them entirely. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201231002702.2223707-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
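The kernel helper then becomes a fault-tolerant wrapper around VMXOFF, roughly:

    static inline int cpu_vmxoff(void)
    {
            asm_volatile_goto("1: vmxoff\n\t"
                              _ASM_EXTABLE(1b, %l[fault])
                              ::: "cc", "memory" : fault);

            cr4_clear_bits(X86_CR4_VMXE);
            return 0;

    fault:
            cr4_clear_bits(X86_CR4_VMXE);
            return -EIO;  /* lets callers invoke kvm_spurious_fault() */
    }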
2021-02-04  KVM: VMX: Move Intel PT shenanigans out of VMXON/VMXOFF flows  (Sean Christopherson)
Move the Intel PT tracking outside of the VMXON/VMXOFF helpers so that a future patch can drop KVM's kvm_cpu_vmxoff() in favor of the kernel's cpu_vmxoff() without an associated PT functional change, and without losing symmetry between the VMXON and VMXOFF flows. Barring undocumented behavior, this should have no meaningful effects as Intel PT behavior does not interact with CR4.VMXE. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201231002702.2223707-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM/nVMX: Use __vmx_vcpu_run in nested_vmx_check_vmentry_hw  (Uros Bizjak)
Replace inline assembly in nested_vmx_check_vmentry_hw with a call to __vmx_vcpu_run. The function is not performance critical, so (double) GPR save/restore in __vmx_vcpu_run can be tolerated, as far as performance effects are concerned. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Reviewed-and-tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> [sean: dropped versioning info from changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201231002702.2223707-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
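The replacement is a single call into the existing assembly helper (a sketch of the call site):

    /* __vmx_vcpu_run() returns the VM-Fail flag for the attempted entry. */
    vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs,
                               vmx->loaded_vmcs->launched);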
2021-02-04  KVM: X86: prepend vmx/svm prefix to additional kvm_x86_ops functions  (Jason Baron)
A subsequent patch introduces macros in preparation for simplifying the definition for vmx_x86_ops and svm_x86_ops. Making the naming more uniform expands the coverage of the macros. Add vmx/svm prefix to the following functions: update_exception_bitmap(), enable_nmi_window(), enable_irq_window(), update_cr8_intercept and enable_smi_window(). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Message-Id: <ed594696f8e2c2b2bfc747504cee9bbb2a269300.1610680941.git.jbaron@akamai.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW  (Chenyi Qiang)
DR6_INIT contains the 1-reserved bits as well as the bits that are cleared to 0 when the corresponding condition (e.g. RTM) occurs. The value can be used to initialize DR6 and is also the XOR mask between the #DB exit qualification (or payload) and DR6. Since DR6_INIT is used as an initial value only once, rename it to DR6_ACTIVE_LOW and apply it in other places, which will make the incoming changes for the bus lock debug exception simpler. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com> [Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
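Concretely (a sketch of the defines and the XOR relationship described above; values per the architecture):

    #define DR6_ACTIVE_LOW  0xffff0ff0UL  /* was DR6_INIT */
    #define DR6_VOLATILE    0x0001e00fUL
    #define DR6_FIXED_1     (DR6_ACTIVE_LOW & ~DR6_VOLATILE)

    /* e.g. mapping a #DB exit qualification/payload to DR6 semantics: */
    dr6 = exit_qualification ^ DR6_ACTIVE_LOW;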
2021-02-04  KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES  (Like Xu)
Userspace can enable the guest LBR feature when the exact LBR format value supported by the host is set in MSR_IA32_PERF_CAPABILITIES and LBR is compatible with the vPMU version and host CPU model. The LBR can be enabled in the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vCPU model is compatible with the host one. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-11-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: vmx/pmu: Release guest LBR event via lazy release mechanism  (Like Xu)
The vPMU uses GUEST_LBR_IN_USE_IDX (bit 58) in 'pmu->pmc_in_use' to indicate whether a guest LBR event is still needed by the vCPU. If the vCPU no longer accesses the LBR-related registers within a scheduling time slice and the LBR enable bit is unset, the vPMU treats the guest LBR event like an ordinary event of a vPMC counter and releases it as usual. The pass-through state of the LBR record MSRs is also cancelled. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-10-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI  (Like Xu)
The current vPMU only supports Architecture Version 2. According to Intel SDM section 17.4.7, "Freezing LBR and Performance Counters on PMI", if IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on a virtual PMI, and KVM emulates this by clearing the LBR bit (bit 0) in IA32_DEBUGCTL. The guest then needs to re-enable IA32_DEBUGCTL.LBR to resume recording branches. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-9-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
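A sketch of the emulation on virtual PMI delivery (MSR and bit names from msr-index.h):

    u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);

    if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
            data &= ~DEBUGCTLMSR_LBR;  /* freeze: clear DEBUGCTL bit 0 */
            vmcs_write64(GUEST_IA32_DEBUGCTL, data);
    }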