Age | Commit message | Author
2021-02-04 | x86: kvm: style: Simplify bool comparison | YANG LI
Fix the following coccicheck warning: ./arch/x86/kvm/x86.c:8012:5-48: WARNING: Comparison to bool Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Message-Id: <1610357578-66081-1-git-send-email-abaci-bugfix@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 | KVM: x86: Zap the oldest MMU pages, not the newest | Sean Christopherson
Walk the list of MMU pages in reverse in kvm_mmu_zap_oldest_mmu_pages(). The list is FIFO, meaning new pages are inserted at the head and thus the oldest pages are at the tail. Using a "forward" iterator causes KVM to zap MMU pages that were just added, which obliterates guest performance once the max number of shadow MMU pages is reached. Fixes: 6b82ef2c9cf1 ("KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages") Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210113205030.3481307-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
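The ordering issue can be sketched with a small userspace doubly linked list. The names `struct page_node`, `push_head()` and `zap_oldest()` are hypothetical, and this illustrates the idea rather than reproducing KVM's code: because new entries are inserted at the head, reclaiming the oldest entries means starting at the tail and walking backwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node standing in for a shadow MMU page. */
struct page_node {
	int id;       /* allocation order: higher means newer */
	int zapped;
	struct page_node *prev, *next;
};

/* Circular list with a sentinel: head.next is newest, head.prev is oldest. */
struct page_list {
	struct page_node head;
};

static void list_init(struct page_list *l)
{
	l->head.prev = l->head.next = &l->head;
}

/* FIFO insertion: every new page goes in at the head. */
static void push_head(struct page_list *l, struct page_node *n)
{
	n->next = l->head.next;
	n->prev = &l->head;
	l->head.next->prev = n;
	l->head.next = n;
}

static void unlink_node(struct page_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Zap up to nr pages, starting from the tail (the oldest) and moving
 * towards the head; returns how many were zapped. */
static int zap_oldest(struct page_list *l, int nr)
{
	int zapped = 0;
	struct page_node *n = l->head.prev;

	while (n != &l->head && zapped < nr) {
		struct page_node *prev = n->prev; /* save before unlinking */

		unlink_node(n);
		n->zapped = 1;
		zapped++;
		n = prev;
	}
	return zapped;
}
```

Walking forward from `head.next` instead would hit the pages that were just added, which is the behaviour the commit describes as obliterating guest performance.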
2021-02-04 | KVM: x86/mmu: Use boolean returns for (S)PTE accessors | Sean Christopherson
Return a 'bool' instead of an 'int' for various PTE accessors that are boolean in nature, e.g. is_shadow_present_pte(). Returning an int is goofy and potentially dangerous, e.g. if a flag being checked is moved into the upper 32 bits of a SPTE, then the compiler may silently squash the entire check since casting to an int is guaranteed to yield a return value of '0'. Opportunistically refactor is_last_spte() so that it naturally returns a bool value instead of letting it implicitly cast 0/1 to false/true. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210123003003.3137525-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
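A minimal sketch of the hazard described above, assuming a hypothetical flag at bit 40 (`HYPOTHETICAL_SPTE_FLAG`) and standalone helpers rather than KVM's real accessors. Converting an out-of-range `uint64_t` to `int` is implementation-defined in C; GCC and Clang reduce it modulo 2^32, so the `int` version loses the flag, while conversion to `bool` only tests for non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SPTE flag placed above bit 31 for illustration. */
#define HYPOTHETICAL_SPTE_FLAG (1ULL << 40)

/* Int-returning style: the 64-bit result is converted to int on return.
 * With GCC/Clang the upper 32 bits are dropped, so this yields 0 even
 * when the flag is set. */
static int flag_set_as_int(uint64_t spte)
{
	return spte & HYPOTHETICAL_SPTE_FLAG;
}

/* Bool-returning style: any non-zero value converts to true. */
static bool flag_set_as_bool(uint64_t spte)
{
	return spte & HYPOTHETICAL_SPTE_FLAG;
}
```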
2021-02-04 | KVM: X86: use vzalloc() instead of vmalloc/memset | Tian Tao
Fix the following warning: /virt/kvm/dirty_ring.c:70:20-27: WARNING: vzalloc should be used for ring -> dirty_gfns, instead of vmalloc/memset. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Message-Id: <1611547045-13669-1-git-send-email-tiantao6@hisilicon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
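`vzalloc()` is kernel-only, so the sketch below uses the userspace analogue of the same cleanup, replacing `malloc()` plus `memset()` with a single `calloc()` call. The `struct gfn_entry` layout and the `alloc_gfns()` helper are hypothetical stand-ins for the dirty-ring allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ring entry, standing in for the dirty-ring element type. */
struct gfn_entry {
	uint32_t flags;
	uint32_t slot;
	uint64_t offset;
};

/* One call returns zero-filled memory, so no separate memset() is needed
 * (the same idea as vzalloc() versus vmalloc() + memset() in-kernel). */
static struct gfn_entry *alloc_gfns(size_t count)
{
	return calloc(count, sizeof(struct gfn_entry));
}
```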
2021-02-04 | KVM: x86: Take KVM's SRCU lock only if steal time update is needed | Sean Christopherson
Enter a SRCU critical section for a memslots lookup during steal time update if and only if a steal time update is actually needed. Taking the lock can be avoided if steal time is disabled by the guest, or if KVM knows it has already flagged the vCPU as being preempted. Reword the comment to be more precise as to exactly why memslots will be queried. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210123000334.3123628-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 | KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put() | Sean Christopherson
Remove the disabling of page faults across kvm_steal_time_set_preempted() as KVM now accesses the steal time struct (shared with the guest) via a cached mapping (see commit b043138246a4, "x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed".) The cache lookup is flagged as atomic, thus it would be a bug if KVM tried to resolve a new pfn, i.e. we want the splat that would be reached via might_fault(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210123000334.3123628-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 | KVM: do not assume PTE is writable after follow_pfn | Paolo Bonzini
In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This however is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn. In doing this however KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment; however, it is a bug. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped or passed back to the caller via *writable. Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738af33 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable. Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page") Reported-by: David Stevens <stevensd@google.com> Cc: 3pvd@google.com Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04 | KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs | Ben Gardon
There is a bug in the TDP MMU function that zaps SPTEs which could be replaced with a larger mapping: the bug prevents the function from doing anything. Fix this by correctly zapping the last level SPTEs. Cc: stable@vger.kernel.org Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-11-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-03 | KVM: arm64: Stub EXPORT_SYMBOL for nVHE EL2 code | Quentin Perret
In order to ensure the module loader does not get confused if a symbol is exported in EL2 nVHE code (as will be the case when we compile e.g. lib/memset.S into the EL2 object), make sure to stub all exports using __DISABLE_EXPORTS in the nvhe folder. Suggested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210203141931.615898-3-qperret@google.com
2021-02-03 | asm-generic: export: Stub EXPORT_SYMBOL with __DISABLE_EXPORTS | Quentin Perret
It is currently possible to stub EXPORT_SYMBOL() macros in C code using __DISABLE_EXPORTS, which is necessary to run in constrained environments such as the EFI stub or the decompressor. But this currently doesn't apply to exports from assembly, which can lead to somewhat confusing situations. Consolidate the __DISABLE_EXPORTS infrastructure by checking it from asm-generic/export.h as well. Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210203141931.615898-2-qperret@google.com
2021-02-03 | KVM: arm64: Correct spelling of DBGDIDR register | Alexandru Elisei
The aarch32 debug ID register is called DBG*D*IDR (emphasis added), not DBGIDR; use the correct spelling. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210128132823.35067-1-alexandru.elisei@arm.com
2021-02-03 | KVM: arm64: Use symbolic names for the PMU versions | Marc Zyngier
Instead of using a bunch of magic numbers, use the existing definitions that have been added since 8673e02e58410 ("arm64: perf: Add support for ARMv8.5-PMU 64-bit counters") Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-02-03 | KVM: arm64: Upgrade PMU support to ARMv8.4 | Marc Zyngier
Upgrading the PMU code from ARMv8.1 to ARMv8.4 turns out to be pretty easy. All that is required is support for PMMIR_EL1, which is read-only, and for which returning 0 is a valid option as long as we don't advertise STALL_SLOT as an implemented event. Let's just do that and adjust what we return to the guest. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-02-03 | KVM: arm64: Limit the debug architecture to ARMv8.0 | Marc Zyngier
Let's not pretend we support anything but ARMv8.0 as far as the debug architecture is concerned. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-02-03 | KVM: arm64: Refactor filtering of ID registers | Marc Zyngier
Our current ID register filtering is starting to be a mess of if() statements, and isn't going to get any saner. Let's turn it into a switch(), which has a chance of being more readable, and introduce a FEATURE() macro that allows easy generation of feature masks. No functional change intended. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
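The switch()-plus-mask style can be sketched as follows. The field offsets (PMUVer at bits [11:8] and DebugVer at bits [3:0] of ID_AA64DFR0_EL1, with 0x5 meaning an ARMv8.4 PMU and 0x6 meaning ARMv8.0 debug) follow the Arm architecture; the macro and function names are hypothetical, not the kernel's. This naive unsigned clamp ignores the IMPLEMENTATION DEFINED PMUVer value 0xf, which real code has to treat separately.

```c
#include <assert.h>
#include <stdint.h>

/* ID_AA64DFR0_EL1 field positions; macro names are hypothetical. */
#define DFR0_DEBUGVER_SHIFT 0
#define DFR0_PMUVER_SHIFT   8

#define PMUVER_8_4   0x5ULL /* PMUv3 for ARMv8.4 */
#define DEBUGVER_8_0 0x6ULL /* ARMv8.0 debug architecture */

/* Mask covering the 4-bit ID field that starts at x##_SHIFT. */
#define FEATURE(x) (0xfULL << (x##_SHIFT))

/* Clamp the field selected by mask/shift to at most max. */
static uint64_t cap_field(uint64_t val, uint64_t mask, int shift, uint64_t max)
{
	uint64_t field = (val & mask) >> shift;

	if (field > max) {
		val &= ~mask;
		val |= max << shift;
	}
	return val;
}

#define CAP_FEATURE(val, x, max) cap_field((val), FEATURE(x), x##_SHIFT, (max))

/* Hypothetical register selector for the switch() below. */
enum id_reg { ID_AA64DFR0, ID_OTHER };

static uint64_t filter_id_reg(enum id_reg reg, uint64_t val)
{
	switch (reg) {
	case ID_AA64DFR0:
		val = CAP_FEATURE(val, DFR0_PMUVER, PMUVER_8_4);
		val = CAP_FEATURE(val, DFR0_DEBUGVER, DEBUGVER_8_0);
		break;
	default:
		break; /* other registers pass through unchanged */
	}
	return val;
}
```

Keeping one case per register lets each register's rules sit together instead of being spread across a chain of if() tests.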
2021-02-03 | KVM: arm64: Add handling of AArch32 PCMEID{2,3} PMUv3 registers | Marc Zyngier
Despite advertising support for AArch32 PMUv3p1, we fail to handle the PMCEID{2,3} registers, which conveniently alias with the top bits of PMCEID{0,1}_EL1. Implement these registers with the usual AA32(HI/LO) aliasing mechanism. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
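The HI/LO aliasing amounts to picking one 32-bit half of the 64-bit AArch64 register: PMCEID0 and PMCEID1 read bits [31:0] of PMCEID0_EL1 and PMCEID1_EL1, while PMCEID2 and PMCEID3 read bits [63:32]. The accessor below is a hypothetical sketch, not KVM's sys_reg table entries.

```c
#include <assert.h>
#include <stdint.h>

/* Which half of a 64-bit AArch64 register an AArch32 register aliases. */
enum aa32_half { AA32_LO, AA32_HI };

/* Return the 32-bit AArch32 view of a 64-bit system register. */
static uint32_t aa32_read(uint64_t reg64, enum aa32_half half)
{
	return half == AA32_HI ? (uint32_t)(reg64 >> 32) : (uint32_t)reg64;
}
```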
2021-02-03 | KVM: arm64: Fix AArch32 PMUv3 capping | Marc Zyngier
We shouldn't expose *any* PMU capability when no PMU has been configured for this VM. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-02-03 | KVM: arm64: Fix missing RES1 in emulation of DBGBIDR | Marc Zyngier
The AArch32 CP14 DBGDIDR has bit 15 set to RES1, which our current emulation doesn't set. Just add the missing bit. Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-02-03 | KVM: x86: cleanup CR3 reserved bits checks | Paolo Bonzini
If not in long mode, the low bits of CR3 are reserved but not enforced to be zero, so remove those checks. If in long mode, however, the MBZ bits extend down to the highest physical address bit of the guest, excluding the encryption bit. Make the checks consistent with the above, and match them between nested_vmcb_checks and KVM_SET_SREGS. Cc: stable@vger.kernel.org Fixes: 761e41693465 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests") Fixes: a780a3ea6282 ("KVM: X86: Fix reserved bits check for MOV to CR3") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
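A sketch of the long-mode rule above, under stated assumptions: `maxphyaddr` is the guest's physical-address width, `cbit` is the memory-encryption bit position (negative if there is none), and the helper names are hypothetical. Handling of the PCID no-flush bit (bit 63 of the MOV-to-CR3 source operand) is deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; requires 0 <= s <= e <= 63. */
static uint64_t bit_range(int s, int e)
{
	assert(0 <= s && s <= e && e <= 63);
	/* 2ULL << 63 wraps to 0, so the full 0..63 range still works. */
	return ((2ULL << (e - s)) - 1) << s;
}

/* Hypothetical long-mode CR3 check: bits from MAXPHYADDR up to bit 63
 * must be zero, except the memory-encryption bit if there is one. */
static bool cr3_valid_long_mode(uint64_t cr3, int maxphyaddr, int cbit)
{
	uint64_t mbz = bit_range(maxphyaddr, 63);

	if (cbit >= 0)
		mbz &= ~(1ULL << cbit);
	return (cr3 & mbz) == 0;
}
```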
2021-02-03 | KVM: SVM: Treat SVM as unsupported when running as an SEV guest | Sean Christopherson
Don't let KVM load when running as an SEV guest, regardless of what CPUID says. Memory is encrypted with a key that is not accessible to the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll see garbage when reading the VMCB. Technically, KVM could decrypt all memory that needs to be accessible to the L0 and use shadow paging so that L0 does not need to shadow NPT, but exposing such information to L0 largely defeats the purpose of running as an SEV guest. This can always be revisited if someone comes up with a use case for running VMs inside SEV guests. Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM is doomed even if the SEV guest is debuggable and the hypervisor is willing to decrypt the VMCB. This may or may not be fixed on CPUs that have the SVME_ADDR_CHK fix. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210202212017.2486595-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-02 | KVM: x86: Update emulator context mode if SYSENTER xfers to 64-bit mode | Sean Christopherson
Set the emulator context to PROT64 if SYSENTER transitions from 32-bit userspace (compat mode) to a 64-bit kernel, otherwise the RIP update at the end of x86_emulate_insn() will incorrectly truncate the new RIP. Note, this bug is mostly limited to running an Intel virtual CPU model on an AMD physical CPU, as other combinations of virtual and physical CPUs do not trigger full emulation. On Intel CPUs, SYSENTER in compatibility mode is legal, and unconditionally transitions to 64-bit mode. On AMD CPUs, SYSENTER is illegal in compatibility mode and #UDs. If the vCPU is AMD, KVM injects a #UD on SYSENTER in compat mode. If the pCPU is Intel, SYSENTER will execute natively and not trigger #UD->VM-Exit (ignoring guest TLB shenanigans). Fixes: fede8076aab4 ("KVM: x86: handle wrap around 32-bit address space") Cc: stable@vger.kernel.org Signed-off-by: Jonny Barker <jonny@jonnybarker.com> [sean: wrote changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210202165546.2390296-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
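Why a stale 32-bit mode corrupts the jump target can be shown with a hypothetical write-back helper; the mode names and `commit_rip()` are illustrative, not the emulator's API. Outside 64-bit mode only the low 32 bits of the new RIP are kept, so a 64-bit SYSENTER target is truncated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of emulator execution modes. */
enum emul_mode { MODE_PROT32, MODE_PROT64 };

/* Final RIP write-back: outside 64-bit mode, keep only the low 32 bits. */
static uint64_t commit_rip(enum emul_mode mode, uint64_t new_rip)
{
	return mode == MODE_PROT64 ? new_rip : (uint32_t)new_rip;
}
```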
2021-02-01 | KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check | Vitaly Kuznetsov
Commit 7a873e455567 ("KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2") reveals that KVM allows setting X86_CR4_PCIDE even when PCID support is missing:
  ==== Test Assertion Failure ====
  x86_64/set_sregs_test.c:41: rc
  pid=6956 tid=6956 - Invalid argument
     1  0x000000000040177d: test_cr4_feature_bit at set_sregs_test.c:41
     2  0x00000000004014fc: main at set_sregs_test.c:119
     3  0x00007f2d9346d041: ?? ??:0
     4  0x000000000040164d: _start at ??:?
  KVM allowed unsupported CR4 bit (0x20000)
Add an X86_FEATURE_PCID feature check to __cr4_reserved_bits() to make kvm_is_valid_cr4() fail. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210201142843.108190-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
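The fix makes the set of reserved CR4 bits depend on what the guest's CPUID exposes. Below is a cut-down sketch with a hypothetical `struct cpu_features` and helper names; bit 17 (0x20000) is CR4.PCIDE, matching the bit reported by the selftest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_PCIDE (1ULL << 17) /* 0x20000 */

/* Hypothetical, cut-down view of the guest's CPUID features. */
struct cpu_features {
	bool pcid;
};

/* CR4 bits the guest may not set, given its features. */
static uint64_t cr4_reserved_bits(const struct cpu_features *f)
{
	uint64_t reserved = 0;

	if (!f->pcid)
		reserved |= CR4_PCIDE;
	return reserved;
}

static bool cr4_valid(uint64_t cr4, const struct cpu_features *f)
{
	return (cr4 & cr4_reserved_bits(f)) == 0;
}
```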
2021-02-01 | KVM/x86: assign hva with the right value to vm_munmap the pages | Zheng Zhan Liang
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Zheng Zhan Liang <zhengzhanliang@huorong.cn> Message-Id: <20210201055310.267029-1-zhengzhanliang@huorong.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-01 | KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off | Paolo Bonzini
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST will generally use the default value for MSR_IA32_ARCH_CAPABILITIES. When this happens and the host has tsx=on, it is possible to end up with virtual machines that have HLE and RTM disabled, but TSX_CTRL available. If the fleet is then switched to tsx=off, kvm_get_arch_capabilities() will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to use the tsx=off hosts as migration destinations, even though the guests do not have TSX enabled. To allow this migration, allow guests to write to their TSX_CTRL MSR, while keeping the host MSR unchanged for the entire life of the guests. This ensures that TSX remains disabled and also saves MSR reads and writes, and it's okay to do because with tsx=off we know that guests will not have the HLE and RTM features in their CPUID. (If userspace sets bogus CPUID data, we do not expect HLE and RTM to work in guests anyway). Cc: stable@vger.kernel.org Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-01 | KVM: arm64: Make gen-hyprel endianness agnostic | Marc Zyngier
gen-hyprel is, for better or worse, a native-endian program: it assumes that the ELF data structures are in the host's endianness, and even assumes that the compiled kernel is little-endian in one particular case. None of these assumptions hold true though: people actually build (use?) BE arm64 kernels, and seem to avoid doing so on BE hosts. Madness! In order to solve this, wrap each access to the ELF data structures with the required byte-swapping magic. This requires obtaining the kernel data structure and providing per-endianness wrappers. This results in a kernel that links and even boots in a model. Fixes: 8c49b5d43d4c ("KVM: arm64: Generate hyp relocation data") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
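Endianness-agnostic access boils down to assembling multi-byte ELF fields from bytes in the file's declared order instead of casting pointers to host integers. The helpers below are a hypothetical sketch, not gen-hyprel's own wrappers; `ei_data` is the ELF header's EI_DATA byte (1 = ELFDATA2LSB, 2 = ELFDATA2MSB).

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian 32-bit field: least significant byte first. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Big-endian 32-bit field: most significant byte first. */
static uint32_t read_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Pick the decoder from the file's byte order, never the host's. */
static uint32_t elf_read32(const uint8_t *p, int ei_data)
{
	return ei_data == 2 ? read_be32(p) : read_le32(p);
}
```

Because the result never depends on how the host stores integers, the same tool produces identical output on LE and BE build machines.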
2021-01-28 | Fix unsynchronized access to sev members through svm_register_enc_region | Peter Gonda
Grab kvm->lock before pinning memory when registering an encrypted region; sev_pin_memory() relies on kvm->lock being held to ensure correctness when checking and updating the number of pinned pages. Add a lockdep assertion to help prevent future regressions. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active") Signed-off-by: Peter Gonda <pgonda@google.com> V2 - Fix up patch description - Correct file paths svm.c -> sev.c - Add unlock of kvm->lock on sev_pin_memory error V1 - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/ Message-Id: <20210127161524.2832400-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-28 | KVM: Documentation: Fix documentation for nested. | Yu Zhang
Nested VMX was enabled by default in commit 1e58e5e59148 ("KVM: VMX: enable nested virtualization by default"), which was merged in Linux 4.20. Fix the documentation accordingly. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210128154747.4242-1-yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-28 | Merge tag 'kvmarm-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD | Paolo Bonzini
KVM/arm64 fixes for 5.11, take #3
- Avoid clobbering extra registers on initialisation
2021-01-28 | KVM: x86: fix CPUID entries returned by KVM_GET_CPUID2 ioctl | Michael Roth
Recent commit 255cbecfe0 modified struct kvm_vcpu_arch to make 'cpuid_entries' a pointer to an array of kvm_cpuid_entry2 entries rather than embedding the array in the struct. KVM_SET_CPUID and KVM_SET_CPUID2 were updated accordingly, but KVM_GET_CPUID2 was missed. As a result, KVM_GET_CPUID2 currently returns random fields from struct kvm_vcpu_arch to userspace rather than the expected CPUID values. Fix this by treating 'cpuid_entries' as a pointer when copying its contents to userspace buffer. Fixes: 255cbecfe0c9 ("KVM: x86: allocate vcpu->arch.cpuid_entries dynamically") Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Michael Roth <michael.roth@amd.com.com> Message-Id: <20210128024451.1816770-1-michael.roth@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
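The pointer/array mix-up can be reproduced with trimmed-down, hypothetical types: once `cpuid_entries` is a pointer, the copy must read from the pointer's target. Taking its address, as array-style code such as `&arch->cpuid_entries` would, copies the pointer value and whatever follows it in the struct instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down CPUID entry and per-vCPU state. */
struct cpuid_entry {
	uint32_t function, index, eax, ebx, ecx, edx;
};

struct vcpu_arch {
	int nent;
	struct cpuid_entry *cpuid_entries; /* heap pointer, not an array */
};

/* Copy the entries themselves (not the pointer) into the caller's buffer;
 * returns the number copied, or -1 if the buffer is too small. */
static int get_cpuid(const struct vcpu_arch *arch,
		     struct cpuid_entry *out, int max)
{
	if (max < arch->nent)
		return -1;
	memcpy(out, arch->cpuid_entries,
	       (size_t)arch->nent * sizeof(*arch->cpuid_entries));
	return arch->nent;
}
```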
2021-01-27 | Merge branch 'arm64/for-next/misc' into kvm-arm64/hyp-reloc | Marc Zyngier
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-01-25 | KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX | Paolo Bonzini
VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS, which may need to be loaded outside guest mode. Therefore we cannot WARN in that case. However, that part of nested_get_vmcs12_pages is _not_ needed at vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling, so that both vmentry and migration (and in the latter case, independent of is_guest_mode) do the parts that are needed. Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES Cc: <stable@vger.kernel.org> # 5.10.x Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 | KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written" | Sean Christopherson
Revert the dirty/available tracking of GPRs now that KVM copies the GPRs to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty. Per commit de3cd117ed2f ("KVM: x86: Omit caching logic for always-available GPRs"), tracking for GPRs noticeably impacts KVM's code footprint. This reverts commit 1c04d8c986567c27c56c05205dceadc92efb14ff. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210122235049.3107620-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 | KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest | Sean Christopherson
Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB; the GPRs' dirty bits are set from time zero and never cleared, i.e. they will always be seen as dirty. The obvious alternative would be to clear the dirty bits when appropriate, but removing the dirty checks is desirable as it allows reverting GPR dirty+available tracking, which adds overhead to all flavors of x86 VMs. Note, unconditionally writing the GPRs in the GHCB is tacitly allowed by the GHCB spec, which allows the hypervisor (or guest) to provide unnecessary info; it's the guest's responsibility to consume only what it needs (the hypervisor is untrusted after all). The guest and hypervisor can supply additional state if desired but must not rely on that additional state being provided. Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210122235049.3107620-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 | KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration | Maxim Levitsky
Even when we are outside the nested guest, some vmcs02 fields may not be in sync vs vmcs12. This is intentional, even across nested VM-exit, because the sync can be delayed until the nested hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those rarely accessed fields. However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to be able to restore it. To fix that, call copy_vmcs02_to_vmcs12_rare() before the vmcs12 contents are copied to userspace. Fixes: 7952d769c29ca ("KVM: nVMX: Sync rarely accessed guest fields only when needed") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 | kvm: tracing: Fix unmatched kvm_entry and kvm_exit events | Lorenzo Brescia
On VMX, if we exit and then re-enter immediately without leaving the vmx_vcpu_run() function, the kvm_entry event is not logged. That means we will see one (or more) kvm_exit, without its (their) corresponding kvm_entry, as shown here:
    CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
    CPU-1979 [002] 89.871218: kvm_exit: reason MSR_WRITE
    CPU-1979 [002] 89.871259: kvm_exit: reason MSR_WRITE
It also seems possible for a kvm_entry event to be logged, but then we leave vmx_vcpu_run() right away (if vmx->emulation_required is true). In this case, we will have a spurious kvm_entry event in the trace. Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run() (where trace_kvm_exit() already is). A trace obtained with this patch applied looks like this:
    CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
    CPU-14295 [000] 8388.395392: kvm_exit: reason MSR_WRITE
    CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
    CPU-14295 [000] 8388.395503: kvm_exit: reason EXTERNAL_INTERRUPT
Of course, not calling trace_kvm_entry() in common x86 code any longer means that we need to adjust the SVM side of things too. Signed-off-by: Lorenzo Brescia <lorenzo.brescia@edu.unito.it> Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 | KVM: Documentation: Update description of KVM_{GET,CLEAR}_DIRTY_LOG | Zenghui Yu
Update various words, including the wrong parameter name and the vague description of the usage of the "slot" field. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Message-Id: <20201208043439.895-1-yuzenghui@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 | KVM: x86: get smi pending status correctly | Jay Zhou
The injection process of an SMI has two steps:
    Qemu                                   KVM
    Step1:
      cpu->interrupt_request &= \
          ~CPU_INTERRUPT_SMI;
      kvm_vcpu_ioctl(cpu, KVM_SMI)
                                           call kvm_vcpu_ioctl_smi() and
                                           kvm_make_request(KVM_REQ_SMI, vcpu);
    Step2:
      kvm_vcpu_ioctl(cpu, KVM_RUN, 0)
                                           call process_smi() if
                                           kvm_check_request(KVM_REQ_SMI, vcpu) is
                                           true, mark vcpu->arch.smi_pending = true;
The vcpu->arch.smi_pending flag is only set to true in step 2. Unfortunately, if the vCPU is paused between step 1 and step 2, kvm_run->immediate_exit will be set and the vCPU has to exit to Qemu immediately during step 2, before vcpu->arch.smi_pending is set to true. During VM migration, Qemu gets the SMI pending status from KVM using the KVM_GET_VCPU_EVENTS ioctl at downtime, so the SMI pending status will be lost. Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com> Signed-off-by: Shengen Zhuang <zhuangshengen@huawei.com> Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
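The lost-SMI window can be modelled with two flags: a request raised in step 1 and a pending bit set in step 2. One way to close the window, shown here as an assumption rather than a quote of the patch, is to fold any outstanding request into the pending state before reporting it. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down vCPU state. */
struct vcpu {
	bool smi_requested; /* stands in for the KVM_REQ_SMI request bit */
	bool smi_pending;   /* stands in for vcpu->arch.smi_pending */
};

/* Step 1 (KVM_SMI ioctl): only raise the request. */
static void ioctl_smi(struct vcpu *v)
{
	v->smi_requested = true;
}

/* Step 2 (normally reached via KVM_RUN): consume the request. */
static void process_smi_request(struct vcpu *v)
{
	if (v->smi_requested) {
		v->smi_requested = false;
		v->smi_pending = true;
	}
}

/* State reporting (the role of KVM_GET_VCPU_EVENTS): handle any request
 * that step 2 never reached, so a paused vCPU does not lose its SMI. */
static bool get_smi_pending(struct vcpu *v)
{
	process_smi_request(v);
	return v->smi_pending;
}
```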
2021-01-25 | KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[] | Like Xu
The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as 0x0300 in the intel_perfmon_event_map[]. Correct its usage. Fixes: 62079d8a4312 ("KVM: PMU: add proper support for fixed counter 2") Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 | KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh() | Like Xu
The vPMU cannot work properly when (1) the guest bit widths of the [gp|fixed] counters are greater than the host ones, or (2) the architectural events requested by the guest exceed the range supported by the host. In these cases, set up a smaller left-shift value and refresh the guest CPUID entry, which fixes the following UBSAN shift-out-of-bounds warning:
  shift exponent 197 is too large for 64-bit type 'long long unsigned int'
  Call Trace:
   __dump_stack lib/dump_stack.c:79 [inline]
   dump_stack+0x107/0x163 lib/dump_stack.c:120
   ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
   __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
   intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
   kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
   kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
   kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
   kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
   vfs_ioctl fs/ioctl.c:48 [inline]
   __do_sys_ioctl fs/ioctl.c:753 [inline]
   __se_sys_ioctl fs/ioctl.c:739 [inline]
   __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
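Clamping before shifting keeps the shift amount in range. The helper below is hypothetical, not `intel_pmu_refresh()` itself: the guest width comes from userspace-supplied CPUID and can be far larger than 63, so it is capped by the host width, and the 64-bit case is handled explicitly because shifting a 64-bit value by 64 or more is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Build a counter mask from the smaller of the guest and host widths,
 * never shifting a 64-bit value by 64 or more. */
static uint64_t counter_mask(unsigned int guest_width, unsigned int host_width)
{
	unsigned int w = guest_width < host_width ? guest_width : host_width;

	if (w >= 64)
		return UINT64_MAX;
	return (1ULL << w) - 1;
}
```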
2021-01-25 | KVM: x86: Add more protection against undefined behavior in rsvd_bits() | Sean Christopherson
Add compile-time asserts in rsvd_bits() to guard against KVM passing in garbage hardcoded values, and cap the upper bound at '63' for dynamic values to prevent generating a mask that would overflow a u64. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210113204515.3473079-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
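A userspace sketch of a `rsvd_bits()`-style mask helper with the upper bound capped at 63. The name `rsvd_bits_sketch()` is hypothetical, and the kernel's compile-time asserts for constant arguments are omitted here; a runtime `assert()` on the start bit stands in for them.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits s..e set. e is capped at 63 so the shift can never
 * reach 64, and an empty range yields 0. */
static uint64_t rsvd_bits_sketch(int s, int e)
{
	assert(s >= 0 && s <= 63); /* start bit must itself be valid */
	if (e > 63)
		e = 63;
	if (e < s)
		return 0;
	/* 2ULL << 63 wraps to 0, so s = 0, e = 63 yields all ones. */
	return ((2ULL << (e - s)) - 1) << s;
}
```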
2021-01-25 | KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM | Quentin Perret
The documentation classifies KVM_ENABLE_CAP with KVM_CAP_ENABLE_CAP_VM as a vcpu ioctl, which is incorrect. Fix it by specifying it as a VM ioctl. Fixes: e5d83c74a580 ("kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic") Signed-off-by: Quentin Perret <qperret@google.com> Message-Id: <20210108165349.747359-1-qperret@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-25 | Merge tag 'kvmarm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD | Paolo Bonzini
KVM/arm64 fixes for 5.11, take #2
- Don't allow tagged pointers to point to memslots
- Filter out ARMv8.1+ PMU events on v8.0 hardware
- Hide PMU registers from userspace when no PMU is configured
- More PMU cleanups
- Don't try to handle broken PSCI firmware
- More sys_reg() to reg_to_encoding() conversions
2021-01-25 | KVM: arm64: Implement the TRNG hypervisor call | Ard Biesheuvel
Provide a hypervisor implementation of the ARM architected TRNG firmware interface described in ARM spec DEN0098. All function IDs are implemented, including both 32-bit and 64-bit versions of the TRNG_RND service, which is the centerpiece of the API. The API is backed by the kernel's entropy pool only, to avoid guests draining more precious direct entropy sources. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> [Andre: minor fixes, drop arch_get_random() usage] Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210106103453.152275-6-andre.przywara@arm.com
2021-01-25 | KVM: arm64: Mark the page dirty only if the fault is handled successfully | Yanan Wang
Currently, we set the pfn dirty and mark the page dirty before calling the fault handlers in user_mem_abort(), so we might end up with spurious dirty pages if the update of permissions or mapping fails. Let's move these two operations after the fault handlers, so that they are done only if the fault has been handled successfully. When -EAGAIN is returned from the map handler, we want the vCPU to enter the guest directly instead of exiting back to userspace, so adjust the return value at the end of the function. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210114121350.123684-4-wangyanan55@huawei.com
2021-01-25 | KVM: arm64: Filter out the case of only changing permissions from stage-2 map path | Yanan Wang
(1) While a VM with multiple vCPUs is running, if several vCPUs access the same GPA at almost the same time and the stage-2 mapping of the GPA has not been built yet, they will all take translation faults. The first vCPU builds the mapping, and the following ones end up updating the valid leaf PTE. Note that these vCPUs might want different access permissions (RO, RW, RX, RWX, etc.). (2) It's inevitable that we sometimes update an existing valid leaf PTE in the map path, and we perform break-before-make in this case. More unnecessary translation faults can then be caused if other vCPUs hit the *break stage* of BBM. With (1) and (2), something unsatisfactory could happen: vCPU A takes a translation fault and builds the mapping with RW permissions, then vCPU B updates the valid leaf PTE with break-before-make and the permissions are changed back to RO. Besides, the *break stage* of BBM may trigger more translation faults. Finally, some useless small loops could occur. We can make some optimization to solve the above problems: when we need to update a valid leaf PTE in the map path, filter out the case where the update only changes access permissions, and don't update the valid leaf PTE in this case. Instead, let the vCPU re-enter the guest; if it still wants more permissions, it will exit again and go through the relax_perms path without break-before-make. Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210114121350.123684-3-wangyanan55@huawei.com
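The filtering step can be sketched as an XOR of the old and new PTEs with the permission bits masked out. The bit positions (S2AP read/write at bits 6 and 7, XN at bit 54) are assumptions for illustration, and `only_perms_differ()` is a hypothetical helper rather than the stage-2 page-table code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stage-2 permission bits, for illustration only. */
#define S2AP_R   (1ULL << 6)
#define S2AP_W   (1ULL << 7)
#define S2_XN    (1ULL << 54)
#define S2_PERMS (S2AP_R | S2AP_W | S2_XN)

/* True if the two PTEs agree on everything except permission bits, in
 * which case the map path can skip break-before-make and leave any
 * permission upgrade to the relax-permissions path. */
static bool only_perms_differ(uint64_t old_pte, uint64_t new_pte)
{
	return ((old_pte ^ new_pte) & ~S2_PERMS) == 0;
}
```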
2021-01-25 | KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map | Yanan Wang
Procedures of hyp stage-1 map and guest stage-2 map are quite different, but they are closely tied together by the function kvm_set_valid_leaf_pte(). So adjust the related code for ease of code maintenance in the future. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210114121350.123684-2-wangyanan55@huawei.com
2021-01-25 | KVM: arm64: Simplify __kvm_hyp_init HVC detection | Andrew Scull
The arguments for __do_hyp_init are now passed with a pointer to a struct which means there are scratch registers available for use. Thanks to this, we no longer need to use clever, but hard to read, tricks that avoid the need for scratch registers when checking for the __kvm_hyp_init HVC. Tested-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210125145415.122439-2-ascull@google.com
2021-01-25 | KVM: arm64: Don't clobber x4 in __do_hyp_init | Andrew Scull
arm_smccc_1_1_hvc() only adds write constraints for x0-3 in the inline assembly for the HVC instruction, so make sure those are the only registers that change when __do_hyp_init is called. Tested-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210125145415.122439-3-ascull@google.com
2021-01-23 | KVM: arm64: Remove hyp_symbol_addr | David Brazdil
Hyp code used the hyp_symbol_addr helper to force PC-relative addressing because absolute addressing results in kernel VAs due to the way hyp code is linked. This is not true anymore, so remove the helper and update all of its users. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210105180541.65031-9-dbrazdil@google.com
2021-01-23 | KVM: arm64: Remove patching of fn pointers in hyp | David Brazdil
Storing a function pointer in hyp now generates relocation information used at early boot to convert the address to hyp VA. The existing alternative-based conversion mechanism is therefore obsolete. Remove it and simplify its users. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210105180541.65031-8-dbrazdil@google.com