summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/vmx/vmx.c
AgeCommit message (Collapse)Author
2020-10-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Two fixes for this merge window, and an unrelated bugfix for a host hang" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: ioapic: break infinite recursion on lazy EOI KVM: vmx: rename pi_init to avoid conflict with paride KVM: x86/mmu: Avoid modulo operator on 64-bit value to fix i386 build
2020-10-24KVM: vmx: rename pi_init to avoid conflict with paridePaolo Bonzini
allyesconfig results in: ld: drivers/block/paride/paride.o: in function `pi_init': (.text+0x1340): multiple definition of `pi_init'; arch/x86/kvm/vmx/posted_intr.o:posted_intr.c:(.init.text+0x0): first defined here make: *** [Makefile:1164: vmlinux] Error 1 because commit: commit 8888cdd0996c2d51cd417f9a60a282c034f3fa28 Author: Xiaoyao Li <xiaoyao.li@intel.com> Date: Wed Sep 23 11:31:11 2020 -0700 KVM: VMX: Extract posted interrupt support to separate files added another pi_init(), though one already existed in the paride code. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "For x86, there is a new alternative and (in the future) more scalable implementation of extended page tables that does not need a reverse map from guest physical addresses to host physical addresses. For now it is disabled by default because it is still lacking a few of the existing MMU's bells and whistles. However it is a very solid piece of work and it is already available for people to hammer on it. Other updates: ARM: - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation PPC: - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes x86: - allow trapping unknown MSRs to userspace - allow userspace to force #GP on specific MSRs - INVPCID support on AMD - nested AMD cleanup, on demand allocation of nested SVM state - hide PV MSRs and hypercalls for features not enabled in CPUID - new test for MSR_IA32_TSC writes from host and guest - cleanups: MMU, CPUID, shared MSRs - LAPIC latency optimizations ad bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits) kvm: x86/mmu: NX largepage recovery for TDP MMU kvm: x86/mmu: Don't clear write flooding count for direct roots kvm: x86/mmu: Support MMIO in the TDP MMU kvm: x86/mmu: Support write protection for nesting in tdp MMU kvm: x86/mmu: Support disabling dirty logging for the tdp MMU kvm: x86/mmu: Support dirty logging for the TDP MMU kvm: x86/mmu: Support changed pte notifier in tdp MMU kvm: x86/mmu: Add access tracking for tdp_mmu kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU kvm: x86/mmu: Add TDP MMU PF handler kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg kvm: x86/mmu: Support zapping SPTEs in the TDP MMU KVM: Cache as_id in kvm_memory_slot kvm: x86/mmu: Add functions to handle changed TDP SPTEs kvm: x86/mmu: Allocate and free TDP MMU roots kvm: x86/mmu: Init / Uninit the TDP MMU kvm: x86/mmu: Introduce tdp_iter KVM: mmu: extract spte.h and spte.c KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp ...
2020-10-21Merge branch 'kvm-fixes' into 'next'Paolo Bonzini
Pick up bugfixes from 5.9, otherwise various tests fail.
2020-10-21KVM: VMX: Intercept guest reserved CR4 bits to inject #GP faultSean Christopherson
Intercept CR4 bits that are guest reserved so that KVM correctly injects a #GP fault if the guest attempts to set a reserved bit. If a feature is supported by the CPU but is not exposed to the guest, and its associated CR4 bit is not intercepted by KVM by default, then KVM will fail to inject a #GP if the guest sets the CR4 bit without triggering an exit, e.g. by toggling only the bit in question. Note, KVM doesn't give the guest direct access to any CR4 bits that are also dependent on guest CPUID. Yet. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200930041659.28181-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21KVM: x86: Move call to update_exception_bitmap() into VMX codeSean Christopherson
Now that vcpu_after_set_cpuid() and update_exception_bitmap() are called back-to-back, subsume the exception bitmap update into the common CPUID update. Drop the SVM invocation entirely as SVM's exception bitmap doesn't vary with respect to guest CPUID. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200930041659.28181-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21KVM: x86: allow kvm_x86_ops.set_efer to return an error valueMaxim Levitsky
This will be used to signal an error to the userspace, in case the vendor code failed during handling of this msr. (e.g -ENOMEM) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21KVM: VMX: Ignore userspace MSR filters for x2APICSean Christopherson
Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore userspace filtering. Allowing userspace to intercept reads to x2APIC MSRs when APICV is fully enabled for the guest simply can't work; the LAPIC and thus virtual APIC is in-kernel and cannot be directly accessed by userspace. To keep things simple we will in fact forbid intercepting x2APIC MSRs altogether, independent of the default_allow setting. Cc: Alexander Graf <graf@amazon.com> Cc: Aaron Lewis <aaronlewis@google.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201005195532.8674-3-sean.j.christopherson@intel.com> [Modified to operate even if APICv is disabled, adjust documentation. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-19KVM: VMX: Fix x2APIC MSR intercept handling on !APICV platformsPeter Xu
Fix an inverted flag for intercepting x2APIC MSRs and intercept writes by default, even when APICV is enabled. Fixes: 3eb900173c71 ("KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied") Co-developed-by: Peter Xu <peterx@redhat.com> [sean: added changelog] Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201005195532.8674-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-14Merge tag 'objtool-core-2020-10-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "Most of the changes are cleanups and reorganization to make the objtool code more arch-agnostic. This is in preparation for non-x86 support. Other changes: - KASAN fixes - Handle unreachable trap after call to noreturn functions better - Ignore unreachable fake jumps - Misc smaller fixes & cleanups" * tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf build: Allow nested externs to enable BUILD_BUG() usage objtool: Allow nested externs to enable BUILD_BUG() objtool: Permit __kasan_check_{read,write} under UACCESS objtool: Ignore unreachable trap after call to noreturn functions objtool: Handle calling non-function symbols in other sections objtool: Ignore unreachable fake jumps objtool: Remove useless tests before save_reg() objtool: Decode unwind hint register depending on architecture objtool: Make unwind hint definitions available to other architectures objtool: Only include valid definitions depending on source file type objtool: Rename frame.h -> objtool.h objtool: Refactor jump table code to support other architectures objtool: Make relocation in alternative handling arch dependent objtool: Abstract alternative special case handling objtool: Move macros describing structures to arch-dependent code objtool: Make sync-check consider the target architecture objtool: Group headers to check in a single list objtool: Define 'struct orc_entry' only when needed objtool: Skip ORC entry creation for non-text sections objtool: Move ORC logic out of check() ...
2020-10-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Two bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF intercept KVM: arm64: Restore missing ISB on nVHE __tlb_switch_to_guest
2020-10-03KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF interceptPaolo Bonzini
The PFEC_MASK and PFEC_MATCH fields in the VMCS reverse the meaning of the #PF intercept bit in the exception bitmap when they do not match. This means that, if PFEC_MASK and/or PFEC_MATCH are set, the hypervisor can get a vmexit for #PF exceptions even when the corresponding bit is clear in the exception bitmap. This is unexpected and is promptly detected by a WARN_ON_ONCE. To fix it, reset PFEC_MASK and PFEC_MATCH when the #PF intercept is disabled (as is common with enable_ept && !allow_smaller_maxphyaddr). Reported-by: Qian Cai <cai@redhat.com>> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-29KVM: VMX: vmx_uret_msrs_list[] can be statickernel test robot
Fixes: 14a61b642de9 ("KVM: VMX: Rename "vmx_msr_index" to "vmx_uret_msrs_list"") Signed-off-by: kernel test robot <lkp@intel.com> Message-Id: <20200928153714.GA6285@a3a878002045> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: VMX: Prevent MSR passthrough when MSR access is deniedAlexander Graf
We will introduce the concept of MSRs that may not be handled in kernel space soon. Some MSRs are directly passed through to the guest, effectively making them handled by KVM from user space's point of view. This patch introduces all logic required to ensure that MSRs that user space wants trapped are not marked as direct access for guests. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-7-graf@amazon.com> [Replace "_idx" with "_slot". - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Prepare MSR bitmaps for userspace tracked MSRsAaron Lewis
Prepare vmx and svm for a subsequent change that ensures the MSR permission bitmap is set to allow an MSR that userspace is tracking to force a vmx_vmexit in the guest. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Oliver Upton <oupton@google.com> [agraf: rebase, adapt SVM scheme to nested changes that came in between] Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-5-graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename vmx_uret_msr's "index" to "slot"Sean Christopherson
Rename "index" to "slot" in struct vmx_uret_msr to align with the terminology used by common x86's kvm_user_return_msrs, and to avoid conflating "MSR's ECX index" with "MSR's index into an array". No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-16-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename "vmx_msr_index" to "vmx_uret_msrs_list"Sean Christopherson
Rename "vmx_msr_index" to "vmx_uret_msrs_list" to associate it with the uret MSRs array, and to avoid conflating "MSR's ECX index" with "MSR's index into an array". Similarly, don't use "slot" in the name as that terminology is claimed by the common x86 "user_return_msrs" mechanism. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-15-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename "vmx_set_guest_msr" to "vmx_set_guest_uret_msr"Sean Christopherson
Add "uret" to vmx_set_guest_msr() to explicitly associate it with the guest_uret_msrs array, and to differentiate it from vmx_set_msr() as well as VMX's load/store MSRs. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-14-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr"Sean Christopherson
Rename "find_msr_entry" to scope it to VMX and to associate it with guest_uret_msrs. Drop the "entry" so that the function name pairs with the existing __vmx_find_uret_msr(), which intentionally uses a double underscore prefix instead of appending "index" or "slot" as those names are already claimed by other pieces of the user return MSR stack. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-13-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Add vmx_setup_uret_msr() to handle lookup and swapSean Christopherson
Add vmx_setup_uret_msr() to wrap the lookup and manipulation of the uret MSRs array during setup_msrs(). In addition to consolidating code, this eliminates move_msr_up(), which while being a very literally description of the function, isn't exacly helpful in understanding the net effect of the code. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-12-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Move uret MSR lookup into update_transition_efer()Sean Christopherson
Move checking for the existence of MSR_EFER in the uret MSR array into update_transition_efer() so that the lookup and manipulation of the array in setup_msrs() occur back-to-back. This paves the way toward adding a helper to wrap the lookup and manipulation. To avoid unnecessary overhead, defer the lookup until the uret array would actually be modified in update_transition_efer(). EFER obviously exists on CPUs that support the dedicated VMCS fields for switching EFER, and EFER must exist for the guest and host EFER.NX value to diverge, i.e. there is no danger of attempting to read/write EFER when it doesn't exist. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-11-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Check guest support for RDTSCP before processing MSR_TSC_AUXSean Christopherson
Check for RDTSCP support prior to checking if MSR_TSC_AUX is in the uret MSRs array so that the array lookup and manipulation are back-to-back. This paves the way toward adding a helper to wrap the lookup and manipulation. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-10-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename "__find_msr_index" to "__vmx_find_uret_msr"Sean Christopherson
Rename "__find_msr_index" to scope it to VMX, associate it with guest_uret_msrs, and to avoid conflating "MSR's ECX index" with "MSR's array index". Similarly, don't use "slot" in the name so as to avoid colliding the common x86's half of "user_return_msrs" (the slot in kvm_user_return_msrs is not the same slot in guest_uret_msrs). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-9-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename vcpu_vmx's "guest_msrs_ready" to "guest_uret_msrs_loaded"Sean Christopherson
Add "uret" to "guest_msrs_ready" to explicitly associate it with the "guest_uret_msrs" array, and replace "ready" with "loaded" to more precisely reflect what it tracks, e.g. "ready" could be interpreted as meaning ready for processing (setup_msrs() has run), which is wrong. "loaded" also aligns with the similar "guest_state_loaded" field. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-8-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename vcpu_vmx's "save_nmsrs" to "nr_active_uret_msrs"Sean Christopherson
Add "uret" into the name of "save_nmsrs" to explicitly associate it with the guest_uret_msrs array, and replace "save" with "active" (for lack of a better word) to better describe what is being tracked. While "save" is more or less accurate when viewed as a literal description of the field, e.g. it holds the number of MSRs that were saved into the array the last time setup_msrs() was invoked, it can easily be misinterpreted by the reader, e.g. as meaning the number of MSRs that were saved from hardware at some point in the past, or as the number of MSRs that need to be saved at some point in the future, both of which are wrong. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename vcpu_vmx's "nmsrs" to "nr_uret_msrs"Sean Christopherson
Rename vcpu_vmx.nsmrs to vcpu_vmx.nr_uret_msrs to explicitly associate it with the guest_uret_msrs array. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr"Sean Christopherson
Rename struct "shared_msr_entry" to "vmx_uret_msr" to align with x86's rename of "shared_msrs" to "user_return_msrs", and to call out that the struct is specific to VMX, i.e. not part of the generic "shared_msrs" framework. Abbreviate "user_return" as "uret" to keep line lengths marginally sane and code more or less readable. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename "vmx_find_msr_index" to "vmx_find_loadstore_msr_slot"Sean Christopherson
Add "loadstore" to vmx_find_msr_index() to differentiate it from the so called shared MSRs helpers (which will soon be renamed), and replace "index" with "slot" to better convey that the helper returns slot in the array, not the MSR index (the value that gets stuffed into ECX). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Prepend "MAX_" to MSR array size definesSean Christopherson
Add "MAX" to the LOADSTORE and so called SHARED MSR defines to make it more clear that the define controls the array size, as opposed to the actual number of valid entries that are in the array. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Rename "shared_msrs" to "user_return_msrs"Sean Christopherson
Rename the "shared_msrs" mechanism, which is used to defer restoring MSRs that are only consumed when running in userspace, to a more banal but less likely to be confusing "user_return_msrs". The "shared" nomenclature is confusing as it's not obvious who is sharing what, e.g. reasonable interpretations are that the guest value is shared by vCPUs in a VM, or that the MSR value is shared/common to guest and host, both of which are wrong. "shared" is also misleading as the MSR value (in hardware) is not guaranteed to be shared/reused between VMs (if that's indeed the correct interpretation of the name), as the ability to share values between VMs is simply a side effect (albiet a very nice side effect) of deferring restoration of the host value until returning from userspace. "user_return" avoids the above confusion by describing the mechanism itself instead of its effects. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Add intr/vectoring info and error code to kvm_exit tracepointSean Christopherson
Extend the kvm_exit tracepoint to align it with kvm_nested_vmexit in terms of what information is captured. On SVM, add interrupt info and error code, while on VMX it add IDT vectoring and error code. This sets the stage for macrofying the kvm_exit tracepoint definition so that it can be reused for kvm_nested_vmexit without loss of information. Opportunistically stuff a zero for VM_EXIT_INTR_INFO if the VM-Enter failed, as the field is guaranteed to be invalid. Note, it'd be possible to further filter the interrupt/exception fields based on the VM-Exit reason, but the helper is intended only for tracepoints, i.e. an extra VMREAD or two is a non-issue, the failed VM-Enter case is just low hanging fruit. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923201349.16097-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nVMX: Explicitly check for valid guest state for !unrestricted guestSean Christopherson
Call guest_state_valid() directly instead of querying emulation_required when checking if L1 is attempting VM-Enter with invalid guest state. If emulate_invalid_guest_state is false, KVM will fixup segment regs to avoid emulation and will never set emulation_required, i.e. KVM will incorrectly miss the associated consistency checks because the nested path stuffs segments directly into vmcs02. Opportunsitically add Consistency Check tracing to make future debug suck a little less. Fixes: 2bb8cafea80bf ("KVM: vVMX: signal failure for nested VMEntry if emulation_required") Fixes: 3184a995f782c ("KVM: nVMX: fix vmentry failure code when L2 state would require emulation") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923184452.980-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename ops.h to vmx_ops.hSean Christopherson
Rename ops.h to vmx_ops.h to allow adding a tdx_ops.h in the future without causing massive confusion. Trust Domain Extensions (TDX) is built on VMX, but KVM cannot directly access the VMCS(es) for a TDX guest, thus TDX will need its own "ops" implementation for wrapping the low level operations. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923183112.3030-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Extract posted interrupt support to separate filesXiaoyao Li
Extract the posted interrupt code so that it can be reused for Trust Domain Extensions (TDX), which requires posted interrupts and can use KVM VMX's implementation almost verbatim. TDX is different enough from raw VMX that it is highly desirable to implement the guts of TDX in a separate file, i.e. reusing posted interrupt code by shoving TDX support into vmx.c would be a mess. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923183112.3030-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Add a helper and macros to reduce boilerplate for sec exec ctlsSean Christopherson
Add a helper function and several wrapping macros to consolidate the copy-paste code in vmx_compute_secondary_exec_control() for adjusting controls that are dependent on guest CPUID bits. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200925003011.21016-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename RDTSCP secondary exec control name to insert "ENABLE"Sean Christopherson
Rename SECONDARY_EXEC_RDTSCP to SECONDARY_EXEC_ENABLE_RDTSCP in preparation for consolidating the logic for adjusting secondary exec controls based on the guest CPUID model. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923165048.20486-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Unconditionally clear CPUID.INVPCID if !CPUID.PCIDSean Christopherson
If PCID is not exposed to the guest, clear INVPCID in the guest's CPUID even if the VMCS INVPCID enable is not supported. This will allow consolidating the secondary execution control adjustment code without having to special case INVPCID. Technically, this fixes a bug where !CPUID.PCID && CPUID.INVCPID would result in unexpected guest behavior (#UD instead of #GP/#PF), but KVM doesn't support exposing INVPCID if it's not supported in the VMCS, i.e. such a config is broken/bogus no matter what. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923165048.20486-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename vmx_*_supported() helpers to cpu_has_vmx_*()Sean Christopherson
Rename helpers for a few controls to conform to the more prevelant style of cpu_has_vmx_<feature>(). Consistent names will allow adding macros to consolidate the boilerplate code for adjusting secondary execution controls. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923165048.20486-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Use "illegal GPA" helper for PT/RTIT output base checkSean Christopherson
Use kvm_vcpu_is_illegal_gpa() to check for a legal GPA when validating a PT output base instead of open coding a clever, but difficult to read, variant. Code readability is far more important than shaving a few uops in a slow path. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200924194250.19137-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Move illegal GPA helper out of the MMU codeSean Christopherson
Rename kvm_mmu_is_illegal_gpa() to kvm_vcpu_is_illegal_gpa() and move it to cpuid.h so that's it's colocated with cpuid_maxphyaddr(). The helper is not MMU specific and will gain a user that is completely unrelated to the MMU in a future patch. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200924194250.19137-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Replace MSR_IA32_RTIT_OUTPUT_BASE_MASK with helper functionSean Christopherson
Replace the subtly not-a-constant MSR_IA32_RTIT_OUTPUT_BASE_MASK with a proper helper function to check whether or not the specified base is valid. Blindly referencing the local 'vcpu' is especially nasty. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200924194250.19137-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Use precomputed MAXPHYADDR for RTIT base MSR checkSean Christopherson
Use cpuid_maxphyaddr() instead of cpuid_query_maxphyaddr() for the RTIT base MSR check. There is no reason to recompute MAXPHYADDR as the precomputed version is synchronized with CPUID updates, and MSR_IA32_RTIT_OUTPUT_BASE is not written between stuffing CPUID and refreshing vcpu->arch.maxphyaddr. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200924194250.19137-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Do not perform emulation for INVD interceptTom Lendacky
The INVD instruction is emulated as a NOP, just skip the instruction instead. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <addd41be2fbf50f5f4059e990a2a0cff182d2136.1600972918.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Invoke NMI handler via indirect call instead of INTnSean Christopherson
Rework NMI VM-Exit handling to invoke the kernel handler by function call instead of INTn. INTn microcode is relatively expensive, and aligning the IRQ and NMI handling will make it easier to update KVM should some newfangled method for invoking the handlers come along. Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200915191505.10355-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Move IRQ invocation to assembly subroutineSean Christopherson
Move the asm blob that invokes the appropriate IRQ handler after VM-Exit into a proper subroutine. Unconditionally create a stack frame in the subroutine so that, as objtool sees things, the function has standard stack behavior. The dynamic stack adjustment makes using unwind hints problematic. Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200915191505.10355-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: x86: Add kvm_x86_ops hook to short circuit emulationSean Christopherson
Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a more generic is_emulatable(), and unconditionally call the new function in x86_emulate_instruction(). KVM will use the generic hook to support multiple security related technologies that prevent emulation in one way or another. Similar to the existing AMD #NPF case where emulation of the current instruction is not possible due to lack of information, AMD's SEV-ES and Intel's SGX and TDX will introduce scenarios where emulation is impossible due to the guest's register state being inaccessible. And again similar to the existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(), i.e. outside of the control of vendor-specific code. While the cause and architecturally visible behavior of the various cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean resume or complete shutdown, and SEV-ES and TDX "return" an error, the impact on the common emulation code is identical: KVM must stop emulation immediately and resume the guest. Query is_emulatable() in handle_ud() as well so that the force_emulation_prefix code doesn't incorrectly modify RIP before calling emulate_instruction() in the absurdly unlikely scenario that KVM encounters forced emulation in conjunction with "do not emulate". Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in ↵Krish Sadhukhan
vmcs02 if vmcs12 doesn't set it Currently, prepare_vmcs02_early() does not check if the "unrestricted guest" VM-execution control in vmcs12 is turned off and leaves the corresponding bit on in vmcs02. Due to this setting, vmentry checks which are supposed to render the nested guest state as invalid when this VM-execution control is not set, are passing in hardware. This patch turns off the "unrestricted guest" VM-execution control in vmcs02 if vmcs12 has turned it off. Suggested-by: Jim Mattson <jmattson@google.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20200921081027.23047-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: X86: Move handling of INVPCID types to x86Babu Moger
INVPCID instruction handling is mostly same across both VMX and SVM. So, move the code to common x86.c. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.cBabu Moger
Handling of kvm_read/write_guest_virt*() errors can be moved to common code. The same code can be used by both VMX and SVM. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: LAPIC: Reduce world switch latency caused by timer_advance_nsWanpeng Li
All the checks in lapic_timer_int_injected(), __kvm_wait_lapic_expire(), and these function calls waste cpu cycles when the timer mode is not tscdeadline. We can observe ~1.3% world switch time overhead by kvm-unit-tests/vmexit.flat vmcall testing on AMD server. This patch reduces the world switch latency caused by timer_advance_ns feature when the timer mode is not tscdeadline by simpling move the check against apic->lapic_timer.expired_tscdeadline much earlier. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1599731444-3525-7-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>