summaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2019-10-22KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_updateMiaohe Lin
Guest physical APIC ID may not equal to vcpu->vcpu_id in some case. We may set the wrong physical id in avic_handle_ldr_update as we always use vcpu->vcpu_id. Get physical APIC ID from vAPIC page instead. Export and use kvm_xapic_id here and in avic_handle_apic_id_update as suggested by Vitaly. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: svm: Update svm_xsaves_supportedAaron Lewis
AMD CPUs now support XSAVES in a limited fashion (they require IA32_XSS to be zero). AMD has no equivalent of Intel's "Enable XSAVES/XRSTORS" VM-execution control. Instead, XSAVES is always available to the guest when supported on the host. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I40dc2c682eb0d38c2208d95d5eb7bbb6c47f6317 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: x86: Move IA32_XSS to kvm_{get,set}_msr_commonAaron Lewis
Hoist support for RDMSR/WRMSR of IA32_XSS from vmx into common code so that it can be used for svm as well. Right now, kvm only allows the guest IA32_XSS to be zero, so the guest's usage of XSAVES will be exactly the same as XSAVEC. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Ie4b0f777d71e428fbee6e82071ac2d7618e9bb40 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Move IA32_XSS-swapping on VM-entry/VM-exit to common x86 codeAaron Lewis
Hoist the vendor-specific code related to loading the hardware IA32_XSS MSR with guest/host values on VM-entry/VM-exit to common x86 code. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Ic6e3430833955b98eb9b79ae6715cf2a3fdd6d82 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Use wrmsr for switching between guest and host IA32_XSS on IntelAaron Lewis
When the guest can execute the XSAVES/XRSTORS instructions, use wrmsr to set the hardware IA32_XSS MSR to guest/host values on VM-entry/VM-exit, rather than the MSR-load areas. By using the same approach as AMD, we will be able to use a common implementation for both (in the next patch). Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I9447d104b2615c04e39e4af0c911e1e7309bf464 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Use wrmsr for switching between guest and host IA32_XSS on AMDAaron Lewis
When the guest can execute the XSAVES/XRSTORS instructions, set the hardware IA32_XSS MSR to guest/host values on VM-entry/VM-exit. Note that vcpu->arch.ia32_xss is currently guaranteed to be 0 on AMD, since there is no way to change it. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Id51a782462086e6d7a3ab621838e200f1c005afd Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Remove unneeded kvm_vcpu variable, guest_xcr0_loadedAaron Lewis
The kvm_vcpu variable, guest_xcr0_loaded, is a waste of an 'int' and a conditional branch. VMX and SVM are the only users, and both unconditionally pair kvm_load_guest_xcr0() with kvm_put_guest_xcr0() making this check unnecessary. Without this variable, the predicates in kvm_load_guest_xcr0 and kvm_put_guest_xcr0 should match. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I7b1eb9b62969d7bbb2850f27e42f863421641b23 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Fix conditions for guest IA32_XSS supportAaron Lewis
Volume 4 of the SDM says that IA32_XSS is supported if CPUID(EAX=0DH,ECX=1):EAX.XSS[bit 3] is set, so only the X86_FEATURE_XSAVES check is necessary (X86_FEATURE_XSAVES is the Linux name for CPUID(EAX=0DH,ECX=1):EAX.XSS[bit 3]). Fixes: 4d763b168e9c5 ("KVM: VMX: check CPUID before allowing read/write of IA32_XSS") Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I9059b9f2e3595e4b09a4cdcf14b933b22ebad419 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Introduce vcpu->arch.xsaves_enabledAaron Lewis
Cache whether XSAVES is enabled in the guest by adding xsaves_enabled to vcpu->arch. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: If4638e0901c28a4494dad2e103e2c075e8ab5d68 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Rename {vmx,nested_vmx}_vcpu_setup()Xiaoyao Li
Rename {vmx,nested_vmx}_vcpu_setup() to match what they really do. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Initialize vmx->guest_msrs[] right after allocationXiaoyao Li
Move the initialization of vmx->guest_msrs[] from vmx_vcpu_setup() to vmx_create_vcpu(), and put it right after its allocation. This also is the preperation for next patch. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Remove vmx->hv_deadline_tsc initialization from vmx_vcpu_setup()Xiaoyao Li
... It can be removed here because the same code is called later in vmx_vcpu_reset() as the flow: kvm_arch_vcpu_setup() -> kvm_vcpu_reset() -> vmx_vcpu_reset() Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Write VPID to vmcs when creating vcpuXiaoyao Li
Move the code that writes vmx->vpid to vmcs from vmx_vcpu_reset() to vmx_vcpu_setup(), because vmx->vpid is allocated when creating vcpu and never changed. So we don't need to update the vmcs.vpid when resetting vcpu. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86/vPMU: Declare kvm_pmu->reprogram_pmi field using DECLARE_BITMAPLike Xu
Replace the explicit declaration of "u64 reprogram_pmi" with the generic macro DECLARE_BITMAP for all possible appropriate number of bits. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: remove redundant code in kvm_arch_vm_ioctlMiaohe Lin
If we reach here with r = 0, we will reassign r = 0 unnecesarry, then do the label set_irqchip_out work. If we reach here with r != 0, then we will do the label work directly. So this if statement and r = 0 assignment is redundant. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: x86: Modify kvm_x86_ops.get_enable_apicv() to use struct kvm parameterSuthikulpanit, Suravee
Generally, APICv for all vcpus in the VM are enable/disable in the same manner. So, get_enable_apicv() should represent APICv status of the VM instead of each VCPU. Modify kvm_x86_ops.get_enable_apicv() to take struct kvm as parameter instead of struct kvm_vcpu. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Fold decache_cr3() into cache_reg()Sean Christopherson
Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common cache_reg() callback and drop the dedicated decache_cr3(). The name decache_cr3() is somewhat confusing as the caching behavior of CR3 follows that of GPRs, RFLAGS and PDPTRs, (handled via cache_reg()), and has nothing in common with the caching behavior of CR0/CR4 (whose decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage). This would effectivel adds a BUG() if KVM attempts to cache CR3 on SVM. Change it to a WARN_ON_ONCE() -- if the cache never requires filling, the value is already in the right place -- and opportunistically add one in VMX to provide an equivalent check. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Add helpers to test/mark reg availability and dirtinessSean Christopherson
Add helpers to prettify code that tests and/or marks whether or not a register is available and/or dirty. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Fold 'enum kvm_ex_reg' definitions into 'enum kvm_reg'Sean Christopherson
Now that indexing into arch.regs is either protected by WARN_ON_ONCE or done with hardcoded enums, combine all definitions for registers that are tracked by regs_avail and regs_dirty into 'enum kvm_reg'. Having a single enum type will simplify additional cleanup related to regs_avail and regs_dirty. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: Add WARNs to detect out-of-bounds register indicesSean Christopherson
Add WARN_ON_ONCE() checks in kvm_register_{read,write}() to detect reg values that would cause KVM to overflow vcpu->arch.regs. Change the reg param to an 'int' to make it clear that the reg index is unverified. Regarding the overhead of WARN_ON_ONCE(), now that all fixed GPR reads and writes use dedicated accessors, e.g. kvm_rax_read(), the overhead is limited to flows where the reg index is generated at runtime. And there is at least one historical bug where KVM has generated an out-of- bounds access to arch.regs (see commit b68f3cc7d9789, "KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels"). Adding the WARN_ON_ONCE() protection paves the way for additional cleanup related to kvm_reg and kvm_reg_ex. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Optimize vmx_set_rflags() for unrestricted guestSean Christopherson
Rework vmx_set_rflags() to avoid the extra code need to handle emulation of real mode and invalid state when unrestricted guest is disabled. The primary reason for doing so is to avoid the call to vmx_get_rflags(), which will incur a VMREAD when RFLAGS is not already available. When running nested VMs, the majority of calls to vmx_set_rflags() will occur without an associated vmx_get_rflags(), i.e. when stuffing GUEST_RFLAGS during transitions between vmcs01 and vmcs02. Note, vmx_get_rflags() guarantees RFLAGS is marked available. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Replace "else" with early "return" in the unrestricted guest branch. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Consolidate to_vmx() usage in RFLAGS accessorsSean Christopherson
Capture struct vcpu_vmx in a local variable to improve the readability of vmx_{g,s}et_rflags(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-dateSean Christopherson
Skip the VMWRITE to update GUEST_CR3 if CR3 is not available, i.e. has not been read from the VMCS since the last VM-Enter. If vcpu->arch.cr3 is stale, kvm_read_cr3(vcpu) will refresh vcpu->arch.cr3 from the VMCS, meaning KVM will do a VMREAD and then VMWRITE the value it just pulled from the VMCS. Note, this is a purely theoretical change, no instances of skipping the VMREAD+VMWRITE have been observed with this change. Tested-by: Reto Buerki <reet@codelabs.ch> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Reduce WBINVD/DF_FLUSH invocationsTom Lendacky
Performing a WBINVD and DF_FLUSH are expensive operations. Currently, a WBINVD/DF_FLUSH is performed every time an SEV guest terminates. However, the WBINVD/DF_FLUSH is only required when an ASID is being re-allocated to a new SEV guest. Also, a single WBINVD/DF_FLUSH can enable all ASIDs that have been disassociated from guests through DEACTIVATE. To reduce the number of WBINVD/DF_FLUSH invocations, introduce a new ASID bitmap to track ASIDs that need to be reclaimed. When an SEV guest is terminated, add its ASID to the reclaim bitmap instead of clearing the bitmap in the existing SEV ASID bitmap. This delays the need to perform a WBINVD/DF_FLUSH invocation when an SEV guest terminates until all of the available SEV ASIDs have been used. At that point, the WBINVD/DF_FLUSH invocation can be performed and all ASIDs in the reclaim bitmap moved to the available ASIDs bitmap. The semaphore around DEACTIVATE can be changed to a read semaphore with the semaphore taken in write mode before performing the WBINVD/DF_FLUSH. Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Remove unneeded WBINVD and DF_FLUSH when starting SEV guestsTom Lendacky
Performing a WBINVD and DF_FLUSH are expensive operations. The SEV support currently performs this WBINVD/DF_FLUSH combination when an SEV guest is terminated, so there is no need for it to be done before LAUNCH. However, when the SEV firmware transitions the platform from UNINIT state to INIT state, all ASIDs will be marked invalid across all threads. Therefore, as part of transitioning the platform to INIT state, perform a WBINVD/DF_FLUSH after a successful INIT in the PSP/SEV device driver. Since the PSP/SEV device driver is x86 only, it can reference and use the WBINVD related functions directly. Cc: Gary Hook <gary.hook@amd.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-EnterSean Christopherson
Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter instead of deferring the VMWRITE until vmx_set_cr3(). If the VMWRITE is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated VM-Exit occurs without actually entering L2, e.g. if the nested run is squashed because nested VM-Enter (from L1) is putting L2 into HLT. Note, the above scenario can occur regardless of whether L1 is intercepting HLT, e.g. L1 can intercept HLT and then re-enter L2 with vmcs.GUEST_ACTIVITY_STATE=HALTED. But practically speaking, a VMM will likely put a guest into HALTED if and only if it's not intercepting HLT. In an ideal world where EPT *requires* unrestricted guest (and vice versa), VMX could handle CR3 similar to how it handles RSP and RIP, e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run(). But the unrestricted guest silliness complicates the dirty tracking logic to the point that explicitly handling vmcs02.GUEST_CR3 during nested VM-Enter is a simpler overall implementation. Cc: stable@vger.kernel.org Reported-and-tested-by: Reto Buerki <reet@codelabs.ch> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Guard against DEACTIVATE when performing WBINVD/DF_FLUSHTom Lendacky
The SEV firmware DEACTIVATE command disassociates an SEV guest from an ASID, clears the WBINVD indicator on all threads and indicates that the SEV firmware DF_FLUSH command must be issued before the ASID can be re-used. The SEV firmware DF_FLUSH command will return an error if a WBINVD has not been performed on every thread before it has been invoked. A window exists between the WBINVD and the invocation of the DF_FLUSH command where an SEV firmware DEACTIVATE command could be invoked on another thread, clearing the WBINVD indicator. This will cause the subsequent SEV firmware DF_FLUSH command to fail which, in turn, results in the SEV firmware ACTIVATE command failing for the reclaimed ASID. This results in the SEV guest failing to start. Use a mutex to close the WBINVD/DF_FLUSH window by obtaining the mutex before the DEACTIVATE and releasing it after the DF_FLUSH. This ensures that any DEACTIVATE cannot run before a DF_FLUSH has completed. Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command") Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Serialize access to the SEV ASID bitmapTom Lendacky
The SEV ASID bitmap currently is not protected against parallel SEV guest startups. This can result in an SEV guest failing to start because another SEV guest could have been assigned the same ASID value. Use a mutex to serialize access to the SEV ASID bitmap. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: clear kvmclock MSR on resetPaolo Bonzini
After resetting the vCPU, the kvmclock MSR keeps the previous value but it is not enabled. This can be confusing, so fix it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: fix bugon.cocci warningskbuild test robot
Use BUG_ON instead of a if condition followed by BUG. Generated by: scripts/coccinelle/misc/bugon.cocci Fixes: 4b526de50e39 ("KVM: x86: Check kvm_rebooting in kvm_spurious_fault()") CC: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Remove specialized handling of unexpected exit-reasonsLiran Alon
Commit bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") introduced specialized handling of specific exit-reasons that should not be raised by CPU because KVM configures VMCS such that they should never be raised. However, since commit 7396d337cfad ("KVM: x86: Return to userspace with internal error on unexpected exit reason"), VMX & SVM exit handlers were modified to generically handle all unexpected exit-reasons by returning to userspace with internal error. Therefore, there is no need for specialized handling of specific unexpected exit-reasons (This specialized handling also introduced inconsistency for these exit-reasons to silently skip guest instruction instead of return to userspace on internal-error). Fixes: bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUIDJim Mattson
When the RDPID instruction is supported on the host, enumerate it in KVM_GET_SUPPORTED_CPUID. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-18x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_*Jiri Slaby
These are all functions which are invoked from elsewhere, so annotate them as global using the new SYM_FUNC_START and their ENDPROC's by SYM_FUNC_END. Make sure ENTRY/ENDPROC is not defined on X86_64, given these were the last users. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernate] Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits] Acked-by: Herbert Xu <herbert@gondor.apana.org.au> [crypto] Cc: Allison Randal <allison@lohutok.net> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andy@infradead.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Armijn Hemel <armijn@tjaldur.nl> Cc: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Enrico Weigelt <info@metux.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: kvm ML <kvm@vger.kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: linux-arch@vger.kernel.org Cc: linux-crypto@vger.kernel.org Cc: linux-efi <linux-efi@vger.kernel.org> Cc: linux-efi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: platform-driver-x86@vger.kernel.org Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Wei Huang <wei@redhat.com> Cc: x86-ml <x86@kernel.org> Cc: xen-devel@lists.xenproject.org Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com> Link: https://lkml.kernel.org/r/20191011115108.12392-25-jslaby@suse.cz
2019-10-17x86: xen: kvm: Gather the definition of emulate prefixesMasami Hiramatsu
Gather the emulate prefixes, which forcibly make the following instruction emulated on virtualization, in one place. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: x86@kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: xen-devel@lists.xenproject.org Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/156777563917.25081.7286628561790289995.stgit@devnote2
2019-10-04KVM: x86: omit "impossible" pmu MSRs from MSR listPaolo Bonzini
INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous MSR indices reserved by Intel for event selectors. Since some machines actually have MSRs past the reserved range, filtering them against x86_pmu.num_counters_gp may have false positives. Cut the list to 18 entries to avoid this. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jim Mattson <jamttson@google.com> Fixes: e2ada66ec418 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-03KVM: nVMX: Fix consistency check on injected exception error codeSean Christopherson
Current versions of Intel's SDM incorrectly state that "bits 31:15 of the VM-Entry exception error-code field" must be zero. In reality, bits 31:16 must be zero, i.e. error codes are 16-bit values. The bogus error code check manifests as an unexpected VM-Entry failure due to an invalid code field (error number 7) in L1, e.g. when injecting a #GP with error_code=0x9f00. Nadav previously reported the bug[*], both to KVM and Intel, and fixed the associated kvm-unit-test. [*] https://patchwork.kernel.org/patch/11124749/ Reported-by: Nadav Amit <namit@vmware.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-03KVM: x86: omit absent pmu MSRs from MSR listPaolo Bonzini
INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous MSR indices reserved by Intel for event selectors. Since some machines actually have MSRs past the reserved range, these may survive the filtering of msrs_to_save array and would be rejected by KVM_GET/SET_MSR. To avoid this, cut the list to whatever CPUID reports for the host's architectural PMU. Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Jim Mattson <jmattson@google.com> Fixes: e2ada66ec418 ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]", 2019-08-21) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-01kvm: vmx: Limit guest PMCs to those supported on the hostJim Mattson
KVM can only virtualize as many PMCs as the host supports. Limit the number of generic counters and fixed counters to the number of corresponding counters supported on the host, rather than to INTEL_PMC_MAX_GENERIC and INTEL_PMC_MAX_FIXED, respectively. Note that INTEL_PMC_MAX_GENERIC is currently 32, which exceeds the 18 contiguous MSR indices reserved by Intel for event selectors. Since the existing code relies on a contiguous range of MSR indices for event selectors, it can't possibly work for more than 18 general purpose counters. Fixes: f5132b01386b5a ("KVM: Expose a version 2 architectural PMU to a guests") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-30kvm: x86, powerpc: do not allow clearing largepages debugfs entryPaolo Bonzini
The largepages debugfs entry is incremented/decremented as shadow pages are created or destroyed. Clearing it will result in an underflow, which is harmless to KVM but ugly (and could be misinterpreted by tools that use debugfs information), so make this particular statistic read-only. Cc: kvm-ppc@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-27KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TFWaiman Long
The l1tf_vmx_mitigation is only set to VMENTER_L1D_FLUSH_NOT_REQUIRED when the ARCH_CAPABILITIES MSR indicates that L1D flush is not required. However, if the CPU is not affected by L1TF, l1tf_vmx_mitigation will still be set to VMENTER_L1D_FLUSH_AUTO. This is certainly not the best option for a !X86_BUG_L1TF CPU. So force l1tf_vmx_mitigation to VMENTER_L1D_FLUSH_NOT_REQUIRED to make it more explicit in case users are checking the vmentry_l1d_flush parameter. Signed-off-by: Waiman Long <longman@redhat.com> [Patch rewritten accoring to Borislav Petkov's suggestion. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-27KVM: x86: fix nested guest live migration with PMLPaolo Bonzini
Shadow paging is fundamentally incompatible with the page-modification log, because the GPAs in the log come from the wrong memory map. In particular, for the EPT page-modification log, the GPAs in the log come from L2 rather than L1. (If there was a non-EPT page-modification log, we couldn't use it for shadow paging because it would log GVAs rather than GPAs). Therefore, we need to rely on write protection to record dirty pages. This has the side effect of bypassing PML, since writes now result in an EPT violation vmexit. This is relatively easy to add to KVM, because pretty much the only place that needs changing is spte_clear_dirty. The first access to the page already goes through the page fault path and records the correct GPA; it's only subsequent accesses that are wrong. Therefore, we can equip set_spte (where the first access happens) to record that the SPTE will have to be write protected, and then spte_clear_dirty will use this information to do the right thing. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-27KVM: x86: assign two bits to track SPTE kindsPaolo Bonzini
Currently, we are overloading SPTE_SPECIAL_MASK to mean both "A/D bits unavailable" and MMIO, where the difference between the two is determined by mio_mask and mmio_value. However, the next patch will need two bits to distinguish availability of A/D bits from write protection. So, while at it give MMIO its own bit pattern, and move the two bits from bit 62 to bits 52..53 since Intel is allocating EPT page table bits from the top. Reviewed-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-26KVM: x86: Expose XSAVEERPTR to the guestSebastian Andrzej Siewior
I was surprised to see that the guest reported `fxsave_leak' while the host did not. After digging deeper I noticed that the bits are simply masked out during enumeration. The XSAVEERPTR feature is actually a bug fix on AMD which means the kernel can disable a workaround. Pass XSAVEERPTR to the guest if available on the host. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-26kvm: x86: Enumerate support for CLZERO instructionJim Mattson
CLZERO is available to the guest if it is supported on the host. Therefore, enumerate support for the instruction in KVM_GET_SUPPORTED_CPUID whenever it is supported on the host. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-26kvm: x86: Use AMD CPUID semantics for AMD vCPUsJim Mattson
When the guest CPUID information represents an AMD vCPU, return all zeroes for queries of undefined CPUID leaves, whether or not they are in range. Signed-off-by: Jim Mattson <jmattson@google.com> Fixes: bd22f5cfcfe8f6 ("KVM: move and fix substitue search for missing CPUID entries") Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Jacob Xu <jacobhxu@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-26kvm: x86: Improve emulation of CPUID leaves 0BH and 1FHJim Mattson
For these CPUID leaves, the EDX output is not dependent on the ECX input (i.e. the SIGNIFCANT_INDEX flag doesn't apply to EDX). Furthermore, the low byte of the ECX output is always identical to the low byte of the ECX input. KVM does not produce the correct ECX and EDX outputs for any undefined subleaves beyond the first. Special-case these CPUID leaves in kvm_cpuid, so that the ECX and EDX outputs are properly generated for all undefined subleaves. Fixes: 0771671749b59a ("KVM: Enhance guest cpuid management") Fixes: a87f2d3a6eadab ("KVM: x86: Add Intel CPUID.1F cpuid emulation support") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Jacob Xu <jacobhxu@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-26KVM: X86: Fix userspace set invalid CR4Wanpeng Li
Reported by syzkaller: WARNING: CPU: 0 PID: 6544 at /home/kernel/data/kvm/arch/x86/kvm//vmx/vmx.c:4689 handle_desc+0x37/0x40 [kvm_intel] CPU: 0 PID: 6544 Comm: a.out Tainted: G OE 5.3.0-rc4+ #4 RIP: 0010:handle_desc+0x37/0x40 [kvm_intel] Call Trace: vmx_handle_exit+0xbe/0x6b0 [kvm_intel] vcpu_enter_guest+0x4dc/0x18d0 [kvm] kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm] kvm_vcpu_ioctl+0x3ad/0x690 [kvm] do_vfs_ioctl+0xa2/0x690 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x74/0x720 entry_SYSCALL_64_after_hwframe+0x49/0xbe When CR4.UMIP is set, guest should have UMIP cpuid flag. Current kvm set_sregs function doesn't have such check when userspace inputs sregs values. SECONDARY_EXEC_DESC is enabled on writes to CR4.UMIP in vmx_set_cr4 though guest doesn't have UMIP cpuid flag. The testcast triggers handle_desc warning when executing ltr instruction since guest architectural CR4 doesn't set UMIP. This patch fixes it by adding valid CR4 and CPUID combination checking in __set_sregs. syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=138efb99600000 Reported-by: syzbot+0f1819555fbdce992df9@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-26kvm: x86: Fix a spurious -E2BIG in __do_cpuid_funcJim Mattson
Don't return -E2BIG from __do_cpuid_func when processing function 0BH or 1FH and the last interesting subleaf occupies the last allocated entry in the result array. Cc: Paolo Bonzini <pbonzini@redhat.com> Fixes: 831bf664e9c1fc ("KVM: Refactor and simplify kvm_dev_ioctl_get_supported_cpuid") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-26KVM: LAPIC: Loosen filter for adaptive tuning of lapic_timer_advance_nsWanpeng Li
5000 guest cycles delta is easy to encounter on desktop, per-vCPU lapic_timer_advance_ns always keeps at 1000ns initial value, let's loosen the filter a bit to let adaptive tuning make progress. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-25KVM: nVMX: cleanup and fix host 64-bit mode checksPaolo Bonzini
KVM was incorrectly checking vmcs12->host_ia32_efer even if the "load IA32_EFER" exit control was reset. Also, some checks were not using the new CC macro for tracing. Cleanup everything so that the vCPU's 64-bit mode is determined directly from EFER_LMA and the VMCS checks are based on that, which matches section 26.2.4 of the SDM. Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Fixes: 5845038c111db27902bc220a4f70070fe945871c Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>