summaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2020-02-05kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.Suravee Suthikulpanit
AMD SVM AVIC accelerates EOI write and does not trap. This causes in-kernel PIT re-injection mode to fail since it relies on irq-ack notifier mechanism. So, APICv is activated only when in-kernel PIT is in discard mode e.g. w/ qemu option: -global kvm-pit.lost_tick_policy=discard Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this reason. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Temporarily deactivate AVIC during ExtINT handlingSuravee Suthikulpanit
AMD AVIC does not support ExtINT. Therefore, AVIC must be temporary deactivated and fall back to using legacy interrupt injection via vINTR and interrupt window. Also, introduce APICV_INHIBIT_REASON_IRQWIN to be used for this reason. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Rename svm_request_update_avic to svm_toggle_avic_for_extint. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Deactivate AVIC when launching guest with nested SVM supportSuravee Suthikulpanit
Since AVIC does not currently work w/ nested virtualization, deactivate AVIC for the guest if setting CPUID Fn80000001_ECX[SVM] (i.e. indicate support for SVM, which is needed for nested virtualization). Also, introduce a new APICV_INHIBIT_REASON_NESTED bit to be used for this reason. Suggested-by: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: hyperv: Use APICv update request interfaceSuravee Suthikulpanit
Since disabling APICv has to be done for all vcpus on AMD-based system, adopt the newly introduced kvm_request_apicv_update() interface, and introduce a new APICV_INHIBIT_REASON_HYPERV. Also, remove the kvm_vcpu_deactivate_apicv() since no longer used. Cc: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Add support for dynamic APICvSuravee Suthikulpanit
Add necessary logics to support (de)activate AVIC at runtime. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce x86 ops hook for pre-update APICvSuravee Suthikulpanit
AMD SVM AVIC needs to update APIC backing page mapping before changing APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl function hook to be called prior KVM APICv update request to each vcpu. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce APICv x86 ops for checking APIC inhibit reasonsSuravee Suthikulpanit
Inibit reason bits are used to determine if APICv deactivation is applicable for a particular hardware virtualization architecture. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: svm: avic: Add support for dynamic setup/teardown of virtual APIC ↵Suravee Suthikulpanit
backing page Re-factor avic_init_access_page() to avic_update_access_page() since activate/deactivate AVIC requires setting/unsetting the memory region used for virtual APIC backing page (APIC_ACCESS_PAGE_PRIVATE_MEMSLOT). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: svm: Add support to (de)activate posted interruptsSuravee Suthikulpanit
Introduce interface for (de)activate posted interrupts, and implement SVM hooks to toggle AMD IOMMU guest virtual APIC mode. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Add APICv (de)activate request trace pointsSuravee Suthikulpanit
Add trace points when sending request to (de)activate APICv. Suggested-by: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Add support for dynamic APICv activationSuravee Suthikulpanit
Certain runtime conditions require APICv to be temporary deactivated during runtime. The current implementation only support run-time deactivation of APICv when Hyper-V SynIC is enabled, which is not temporary. In addition, for AMD, when APICv is (de)activated at runtime, all vcpus in the VM have to operate in the same mode. Thus the requesting vcpu must notify the others. So, introduce the following: * A new KVM_REQ_APICV_UPDATE request bit * Interfaces to request all vcpus to update APICv status * A new interface to update APICV-related parameters for each vcpu Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: x86: remove get_enable_apicv from kvm_x86_opsPaolo Bonzini
It is unused now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce APICv inhibit reason bitsSuravee Suthikulpanit
There are several reasons in which a VM needs to deactivate APICv e.g. disable APICv via parameter during module loading, or when enable Hyper-V SynIC support. Additional inhibit reasons will be introduced later on when dynamic APICv is supported, Introduce KVM APICv inhibit reason bits along with a new variable, apicv_inhibit_reasons, to help keep track of APICv state for each VM, Initially, the APICV_INHIBIT_REASON_DISABLE bit is used to indicate the case where APICv is disabled during KVM module load. (e.g. insmod kvm_amd avic=0 or insmod kvm_intel enable_apicv=0). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Do not use get_enable_apicv; consider irqchip_split in svm.c. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: lapic: Introduce APICv update helper functionSuravee Suthikulpanit
Re-factor code into a helper function for setting lapic parameters when activate/deactivate APICv, and export the function for subsequent usage. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-31Merge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "This is the first batch of KVM changes. ARM: - cleanups and corner case fixes. PPC: - Bugfixes x86: - Support for mapping DAX areas with large nested page table entries. - Cleanups and bugfixes here too. A particularly important one is a fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is also a race condition which could be used in guest userspace to exploit the guest kernel, for which the embargo expired today. - Fast path for IPI delivery vmexits, shaving about 200 clock cycles from IPI latency. - Protect against "Spectre-v1/L1TF" (bring data in the cache via speculative out of bound accesses, use L1TF on the sibling hyperthread to read it), which unfortunately is an even bigger whack-a-mole game than SpectreV1. Sean continues his mission to rewrite KVM. In addition to a sizable number of x86 patches, this time he contributed a pretty large refactoring of vCPU creation that affects all architectures but should not have any visible effect. s390 will come next week together with some more x86 patches" * tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) x86/KVM: Clean up host's steal time structure x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed x86/kvm: Cache gfn to pfn translation x86/kvm: Introduce kvm_(un)map_gfn() x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit KVM: PPC: Book3S PR: Fix -Werror=return-type build failure KVM: PPC: Book3S HV: Release lock on page-out failure path KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer KVM: arm64: pmu: Only handle supported event counters KVM: arm64: pmu: Fix chained SW_INCR counters KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabled KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset KVM: x86: Use a typedef for fastop functions KVM: X86: Add 'else' to unify fastop and execute call path KVM: x86: inline memslot_valid_for_gpte KVM: x86/mmu: Use huge pages for DAX-backed files KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte() KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust() KVM: x86/mmu: Zap any compound page when collapsing sptes KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch) ...
2020-01-30Merge branch 'cve-2019-3016' into kvm-next-5.6Paolo Bonzini
From Boris Ostrovsky: The KVM hypervisor may provide a guest with ability to defer remote TLB flush when the remote VCPU is not running. When this feature is used, the TLB flush will happen only when the remote VPCU is scheduled to run again. This will avoid unnecessary (and expensive) IPIs. Under certain circumstances, when a guest initiates such deferred action, the hypervisor may miss the request. It is also possible that the guest may mistakenly assume that it has already marked remote VCPU as needing a flush when in fact that request had already been processed by the hypervisor. In both cases this will result in an invalid translation being present in a vCPU, potentially allowing accesses to memory locations in that guest's address space that should not be accessible. Note that only intra-guest memory is vulnerable. The five patches address both of these problems: 1. The first patch makes sure the hypervisor doesn't accidentally clear a guest's remote flush request 2. The rest of the patches prevent the race between hypervisor acknowledging a remote flush request and guest issuing a new one. Conflicts: arch/x86/kvm/x86.c [move from kvm_arch_vcpu_free to kvm_arch_vcpu_destroy]
2020-01-30x86/KVM: Clean up host's steal time structureBoris Ostrovsky
Now that we are mapping kvm_steal_time from the guest directly we don't need keep a copy of it in kvm_vcpu_arch.st. The same is true for the stime field. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missedBoris Ostrovsky
There is a potential race in record_steal_time() between setting host-local vcpu->arch.st.steal.preempted to zero (i.e. clearing KVM_VCPU_PREEMPTED) and propagating this value to the guest with kvm_write_guest_cached(). Between those two events the guest may still see KVM_VCPU_PREEMPTED in its copy of kvm_steal_time, set KVM_VCPU_FLUSH_TLB and assume that hypervisor will do the right thing. Which it won't. Instad of copying, we should map kvm_steal_time and that will guarantee atomicity of accesses to @preempted. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30x86/kvm: Cache gfn to pfn translationBoris Ostrovsky
__kvm_map_gfn()'s call to gfn_to_pfn_memslot() is * relatively expensive * in certain cases (such as when done from atomic context) cannot be called Stashing gfn-to-pfn mapping should help with both cases. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bitBoris Ostrovsky
kvm_steal_time_set_preempted() may accidentally clear KVM_VCPU_FLUSH_TLB bit if it is called more than once while VCPU is preempted. This is part of CVE-2019-3016. (This bug was also independently discovered by Jim Mattson <jmattson@google.com>) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-28Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu-features updates from Ingo Molnar: "The biggest change in this cycle was a large series from Sean Christopherson to clean up the handling of VMX features. This both fixes bugs/inconsistencies and makes the code more coherent and future-proof. There are also two cleanups and a minor TSX syslog messages enhancement" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/cpu: Remove redundant cpu_detect_cache_sizes() call x86/cpu: Print "VMX disabled" error message iff KVM is enabled KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs perf/x86: Provide stubs of KVM helpers for non-Intel CPUs KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits KVM: VMX: Check for full VMX support when verifying CPU compatibility KVM: VMX: Use VMX feature flag to query BIOS enabling KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl() x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_* x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs x86/vmx: Introduce VMX_FEATURES_* x86/cpu: Clear VMX feature flag if VMX is not fully enabled x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization x86/centaur: Use common IA32_FEAT_CTL MSR initialization x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked x86/intel: Initialize IA32_FEAT_CTL MSR at boot tools/x86: Sync msr-index.h from kernel sources selftests, kvm: Replace manual MSR defs with common msr-index.h ...
2020-01-27KVM: x86: Use a typedef for fastop functionsSean Christopherson
Add a typedef to for the fastop function prototype to make the code more readable. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: X86: Add 'else' to unify fastop and execute call pathMiaohe Lin
It also helps eliminate some duplicated code. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86: inline memslot_valid_for_gptePaolo Bonzini
The function now has a single caller, so there is no point in keeping it separate. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Use huge pages for DAX-backed filesSean Christopherson
Walk the host page tables to identify hugepage mappings for ZONE_DEVICE pfns, i.e. DAX pages. Explicitly query kvm_is_zone_device_pfn() when deciding whether or not to bother walking the host page tables, as DAX pages do not set up the head/tail infrastructure, i.e. will return false for PageCompound() even when using huge pages. Zap ZONE_DEVICE sptes when disabling dirty logging, e.g. if live migration fails, to allow KVM to rebuild large pages for DAX-based mappings. Presumably DAX favors large pages, and worst case scenario is a minor performance hit as KVM will need to re-fault all DAX-based pages. Suggested-by: Barret Rhoden <brho@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jason Zeng <jason.zeng@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: linux-nvdimm <linux-nvdimm@lists.01.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()Sean Christopherson
Remove the late "lpage is disallowed" check from set_spte() now that the initial check is performed after acquiring mmu_lock. Fold the guts of the remaining helper, __mmu_gfn_lpage_is_disallowed(), into kvm_mmu_hugepage_adjust() to eliminate the unnecessary slot !NULL check. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()Sean Christopherson
Fold max_mapping_level() into kvm_mmu_hugepage_adjust() now that HugeTLB mappings are handled in kvm_mmu_hugepage_adjust(), i.e. there isn't a need to pre-calculate the max mapping level. Co-locating all hugepage checks eliminates a memslot lookup, at the cost of performing the __mmu_gfn_lpage_is_disallowed() checks while holding mmu_lock. The latency of lpage_is_disallowed() is likely negligible relative to the rest of the code run while holding mmu_lock, and can be offset to some extent by eliminating the mmu_gfn_lpage_is_disallowed() check in set_spte() in a future patch. Eliminating the check in set_spte() is made possible by performing the initial lpage_is_disallowed() checks while holding mmu_lock. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Zap any compound page when collapsing sptesSean Christopherson
Zap any compound page, e.g. THP or HugeTLB pages, when zapping sptes that can potentially be converted to huge sptes after disabling dirty logging on the associated memslot. Note, this approach could result in false positives, e.g. if a random compound page is mapped into the guest, but mapping non-huge compound pages into the guest is far from the norm, and toggling dirty logging is not a frequent operation. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)Sean Christopherson
Remove logic to retrieve the original gfn now that HugeTLB mappings are are identified in FNAME(fetch), i.e. FNAME(page_fault) no longer adjusts the level or gfn. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Rely on host page tables to find HugeTLB mappingsSean Christopherson
Remove KVM's HugeTLB specific logic and instead rely on walking the host page tables (already done for THP) to identify HugeTLB mappings. Eliminating the HugeTLB-only logic avoids taking mmap_sem and calling find_vma() for all hugepage compatible page faults, and simplifies KVM's page fault code by consolidating all hugepage adjustments into a common helper. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Drop level optimization from fast_page_fault()Sean Christopherson
Remove fast_page_fault()'s optimization to stop the shadow walk if the iterator level drops below the intended map level. The intended map level is only acccurate for HugeTLB mappings (THP mappings are detected after fast_page_fault()), i.e. it's not required for correctness, and a future patch will also move HugeTLB mapping detection to after fast_page_fault(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Walk host page tables to find THP mappingsSean Christopherson
Explicitly walk the host page tables to identify THP mappings instead of relying solely on the metadata in struct page. This sets the stage for using a common method of identifying huge mappings regardless of the underlying implementation (HugeTLB vs THB vs DAX), and hopefully avoids the pitfalls of relying on metadata to identify THP mappings, e.g. see commit 169226f7e0d2 ("mm: thp: handle page cache THP correctly in PageTransCompoundMap") and the need for KVM to explicitly check for a THP compound page. KVM will also naturally work with 1gb THP pages, if they are ever supported. Walking the tables for THP mappings is likely marginally slower than querying metadata, but a future patch will reuse the walk to identify HugeTLB mappings, at which point eliminating the existing VMA lookup for HugeTLB will make this a net positive. Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Barret Rhoden <brho@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Refactor THP adjust to prep for changing querySean Christopherson
Refactor transparent_hugepage_adjust() in preparation for walking the host page tables to identify hugepage mappings, initially for THP pages, and eventualy for HugeTLB and DAX-backed pages as well. The latter cases support 1gb pages, i.e. the adjustment logic needs access to the max allowed level. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: Use vcpu-specific gva->hva translation when querying host page sizeSean Christopherson
Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the correct set of memslots is used when handling x86 page faults in SMM. Fixes: 54bf36aac520 ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27mm: thp: KVM: Explicitly check for THP when populating secondary MMUSean Christopherson
Add a helper, is_transparent_hugepage(), to explicitly check whether a compound page is a THP and use it when populating KVM's secondary MMU. The explicit check fixes a bug where a remapped compound page, e.g. for an XDP Rx socket, is mapped into a KVM guest and is mistaken for a THP, which results in KVM incorrectly creating a huge page in its secondary MMU. Fixes: 936a5fe6e6148 ("thp: kvm mmu transparent hugepage support") Reported-by: syzbot+c9d1fb51ac9d0d10c39d@syzkaller.appspotmail.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86/mmu: Enforce max_level on HugeTLB mappingsSean Christopherson
Limit KVM's mapping level for HugeTLB based on its calculated max_level. The max_level check prior to invoking host_mapping_level() only filters out the case where KVM cannot create a 2mb mapping, it doesn't handle the scenario where KVM can create a 2mb but not 1gb mapping, and the host is using a 1gb HugeTLB mapping. Fixes: 2f57b7051fe8 ("KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_level") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27kvm/x86: export kvm_vector_hashing_enabled() is unnecessaryPeng Hao
kvm_vector_hashing_enabled() is just called in kvm.ko module. Signed-off-by: Peng Hao <richard.peng@oppo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: VMX: remove duplicated segment cache clearMiaohe Lin
vmx_set_segment() clears segment cache unconditionally, so we should not clear it again by calling vmx_segment_cache_clear(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27Adding 'else' to reduce checking.Haiwei Li
These two conditions are in conflict, adding 'else' to reduce checking. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: nVMX: Check GUEST_DR7 on vmentry of nested guestsKrish Sadhukhan
According to section "Checks on Guest Control Registers, Debug Registers, and and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry of nested guests: If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7 field must be 0. In KVM, GUEST_DR7 is set prior to the vmcs02 VM-entry by kvm_set_dr() and the latter synthesizes a #GP if any bit in the high dword in the former is set. Hence this field needs to be checked in software. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: X86: Drop x86_set_memory_region()Peter Xu
The helper x86_set_memory_region() is only used in vmx_set_tss_addr() and kvm_arch_destroy_vm(). Push the lock upper in both cases. With that, drop x86_set_memory_region(). This prepares to allow __x86_set_memory_region() to return a HVA mapped, because the HVA will need to be protected by the lock too even after __x86_set_memory_region() returns. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: X86: Don't take srcu lock in init_rmode_identity_map()Peter Xu
We've already got the slots_lock, so we should be safe. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27x86/kvm/hyper-v: remove stale evmcs_already_enabled check from ↵Vitaly Kuznetsov
nested_enable_evmcs() In nested_enable_evmcs() evmcs_already_enabled check doesn't really do anything: controls are already sanitized and we return '0' regardless. Just drop the check. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86: Perform non-canonical checks in 32-bit KVMSean Christopherson
Remove the CONFIG_X86_64 condition from the low level non-canonical helpers to effectively enable non-canonical checks on 32-bit KVM. Non-canonical checks are performed by hardware if the CPU *supports* 64-bit mode, whether or not the CPU is actually in 64-bit mode is irrelevant. For the most part, skipping non-canonical checks on 32-bit KVM is ok-ish because 32-bit KVM always (hopefully) drops bits 63:32 of whatever value it's checking before propagating it to hardware, and architecturally, the expected behavior for the guest is a bit of a grey area since the vCPU itself doesn't support 64-bit mode. I.e. a 32-bit KVM guest can observe the missed checks in several paths, e.g. INVVPID and VM-Enter, but it's debatable whether or not the missed checks constitute a bug because technically the vCPU doesn't support 64-bit mode. The primary motivation for enabling the non-canonical checks is defense in depth. As mentioned above, a guest can trigger a missed check via INVVPID or VM-Enter. INVVPID is straightforward as it takes a 64-bit virtual address as part of its 128-bit INVVPID descriptor and fails if the address is non-canonical, even if INVVPID is executed in 32-bit PM. Nested VM-Enter is a bit more convoluted as it requires the guest to write natural width VMCS fields via memory accesses and then VMPTRLD the VMCS, but it's still possible. In both cases, KVM is saved from a true bug only because its flows that propagate values to hardware (correctly) take "unsigned long" parameters and so drop bits 63:32 of the bad value. Explicitly performing the non-canonical checks makes it less likely that a bad value will be propagated to hardware, e.g. in the INVVPID case, if __invvpid() didn't implicitly drop bits 63:32 then KVM would BUG() on the resulting unexpected INVVPID failure due to hardware rejecting the non-canonical address. The only downside to enabling the non-canonical checks is that it adds a relatively small amount of overhead, but the affected flows are not hot paths, i.e. the overhead is negligible. Note, KVM technically could gate the non-canonical checks on 32-bit KVM with static_cpu_has(X86_FEATURE_LM), but on bare metal that's an even bigger waste of code for everyone except the 0.00000000000001% of the population running on Yonah, and nested 32-bit on 64-bit already fudges things with respect to 64-bit CPU behavior. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Also do so in nested_vmx_check_host_state as reported by Krish. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: nVMX: WARN on failure to set IA32_PERF_GLOBAL_CTRLOliver Upton
Writes to MSR_CORE_PERF_GLOBAL_CONTROL should never fail if the VM-exit and VM-entry controls are exposed to L1. Promote the checks to perform a full WARN if kvm_set_msr() fails and remove the now unused macro SET_MSR_OR_WARN(). Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86: Remove unused ctxt param from emulator's FPU accessorsSean Christopherson
Remove an unused struct x86_emulate_ctxt * param from low level helpers used to access guest FPU state. The unused param was left behind by commit 6ab0b9feb82a ("x86,kvm: remove KVM emulator get_fpu / put_fpu"). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86: Revert "KVM: X86: Fix fpu state crash in kvm guest"Sean Christopherson
Reload the current thread's FPU state, which contains the guest's FPU state, to the CPU registers if necessary during vcpu_enter_guest(). TIF_NEED_FPU_LOAD can be set any time control is transferred out of KVM, e.g. if I/O is triggered during a KVM call to get_user_pages() or if a softirq occurs while KVM is scheduled in. Moving the handling of TIF_NEED_FPU_LOAD from vcpu_enter_guest() to kvm_arch_vcpu_load(), effectively kvm_sched_in(), papered over a bug where kvm_put_guest_fpu() failed to account for TIF_NEED_FPU_LOAD. The easiest way to the kvm_put_guest_fpu() bug was to run with involuntary preemption enable, thus handling TIF_NEED_FPU_LOAD during kvm_sched_in() made the bug go away. But, removing the handling in vcpu_enter_guest() exposed KVM to the rare case of a softirq triggering kernel_fpu_begin() between vcpu_load() and vcpu_enter_guest(). Now that kvm_{load,put}_guest_fpu() correctly handle TIF_NEED_FPU_LOAD, revert the commit to both restore the vcpu_enter_guest() behavior and eliminate the superfluous switch_fpu_return() in kvm_arch_vcpu_load(). Note, leaving the handling in kvm_arch_vcpu_load() isn't wrong per se, but it is unnecessary, and most critically, makes it extremely difficult to find bugs such as the kvm_put_guest_fpu() issue due to shrinking the window where a softirq can corrupt state. A sample trace triggered by warning if TIF_NEED_FPU_LOAD is set while vcpu state is loaded: <IRQ> gcmaes_crypt_by_sg.constprop.12+0x26e/0x660 ? 0xffffffffc024547d ? __qdisc_run+0x83/0x510 ? __dev_queue_xmit+0x45e/0x990 ? ip_finish_output2+0x1a8/0x570 ? fib4_rule_action+0x61/0x70 ? fib4_rule_action+0x70/0x70 ? fib_rules_lookup+0x13f/0x1c0 ? helper_rfc4106_decrypt+0x82/0xa0 ? crypto_aead_decrypt+0x40/0x70 ? crypto_aead_decrypt+0x40/0x70 ? crypto_aead_decrypt+0x40/0x70 ? esp_output_tail+0x8f4/0xa5a [esp4] ? skb_ext_add+0xd3/0x170 ? xfrm_input+0x7a6/0x12c0 ? xfrm4_rcv_encap+0xae/0xd0 ? xfrm4_transport_finish+0x200/0x200 ? udp_queue_rcv_one_skb+0x1ba/0x460 ? udp_unicast_rcv_skb.isra.63+0x72/0x90 ? __udp4_lib_rcv+0x51b/0xb00 ? ip_protocol_deliver_rcu+0xd2/0x1c0 ? ip_local_deliver_finish+0x44/0x50 ? ip_local_deliver+0xe0/0xf0 ? ip_protocol_deliver_rcu+0x1c0/0x1c0 ? ip_rcv+0xbc/0xd0 ? ip_rcv_finish_core.isra.19+0x380/0x380 ? __netif_receive_skb_one_core+0x7e/0x90 ? netif_receive_skb_internal+0x3d/0xb0 ? napi_gro_receive+0xed/0x150 ? 0xffffffffc0243c77 ? net_rx_action+0x149/0x3b0 ? __do_softirq+0xe4/0x2f8 ? handle_irq_event_percpu+0x6a/0x80 ? irq_exit+0xe6/0xf0 ? do_IRQ+0x7f/0xd0 ? common_interrupt+0xf/0xf </IRQ> ? irq_entries_start+0x20/0x660 ? vmx_get_interrupt_shadow+0x2f0/0x710 [kvm_intel] ? kvm_set_msr_common+0xfc7/0x2380 [kvm] ? recalibrate_cpu_khz+0x10/0x10 ? ktime_get+0x3a/0xa0 ? kvm_arch_vcpu_ioctl_run+0x107/0x560 [kvm] ? kvm_init+0x6bf/0xd00 [kvm] ? __seccomp_filter+0x7a/0x680 ? do_vfs_ioctl+0xa4/0x630 ? security_file_ioctl+0x32/0x50 ? ksys_ioctl+0x60/0x90 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x5f/0x1a0 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 9564a1ccad733a90 ]--- This reverts commit e751732486eb3f159089a64d1901992b1357e7cc. Fixes: e751732486eb3 ("KVM: X86: Fix fpu state crash in kvm guest") Reported-by: Derek Yerger <derek@djy.llc> Reported-by: kernel@najdan.com Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Thomas Lambertz <mail@thomaslambertz.de> Cc: Rik van Riel <riel@surriel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86: Ensure guest's FPU state is loaded when accessing for emulationSean Christopherson
Lock the FPU regs and reload the current thread's FPU state, which holds the guest's FPU state, to the CPU registers if necessary prior to accessing guest FPU state as part of emulation. kernel_fpu_begin() can be called from softirq context, therefore KVM must ensure softirqs are disabled (locking the FPU regs disables softirqs) when touching CPU FPU state. Note, for all intents and purposes this reverts commit 6ab0b9feb82a7 ("x86,kvm: remove KVM emulator get_fpu / put_fpu"), but at the time it was applied, removing get/put_fpu() was correct. The re-introduction of {get,put}_fpu() is necessitated by the deferring of FPU state load. Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: x86: Handle TIF_NEED_FPU_LOAD in kvm_{load,put}_guest_fpu()Sean Christopherson
Handle TIF_NEED_FPU_LOAD similar to how fpu__copy() handles the flag when duplicating FPU state to a new task struct. TIF_NEED_FPU_LOAD can be set any time control is transferred out of KVM, be it voluntarily, e.g. if I/O is triggered during a KVM call to get_user_pages, or involuntarily, e.g. if softirq runs after an IRQ occurs. Therefore, KVM must account for TIF_NEED_FPU_LOAD whenever it is (potentially) accessing CPU FPU state. Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27Revert "KVM: x86: Add a WARN on TIF_NEED_FPU_LOAD in kvm_load_guest_fpu()"Paolo Bonzini
This reverts commit 95145c25a78cc0a9d3cbc75708abde432310c5a1. The next few patches will fix the issue so the warning is not needed anymore; revert it separately to simplify application to stable kernels. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>