summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2020-03-18x86: switch ia32_setup_sigcontext() to unsafe_put_user()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18x86: switch setup_sigcontext() to unsafe_put_user()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18x86: switch save_v86_state() to unsafe_put_user()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18x86: kill get_user_{try,catch,ex}Al Viro
no users left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18x86: get rid of get_user_ex() in restore_sigcontext()Al Viro
Just do copyin into a local struct and be done with that - we are on a shallow stack here. [reworked by tglx, removing the macro horrors while we are touching that] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18x86: get rid of get_user_ex() in ia32_restore_sigcontext()Al Viro
Just do copyin into a local struct and be done with that - we are on a shallow stack here. [reworked by tglx, removing the macro horrors while we are touching that] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18vm86: get rid of get_user_ex() useAl Viro
Just do a copyin of what we want into a local variable and be done with that. We are guaranteed to be on shallow stack here... Note that conditional expression for range passed to access_ok() in mainline had been pointless all along - the only difference between vm86plus_struct and vm86_struct is that the former has one extra field in the end and when we get to copyin of that field (conditional upon 'plus' argument), we use copy_from_user(). Moreover, all fields starting with ->int_revectored are copied that way, so we only need that check (be it done by access_ok() or by user_access_begin()) only on the beginning of the structure - the fields that used to be covered by that get_user_try() block. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18x86: get rid of small constant size cases in raw_copy_{to,from}_user()Al Viro
Very few call sites where that would be triggered remain, and none of those is anywhere near hot enough to bother. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18x86: switch sigframe sigset handling to explict __get_user()/__put_user()Al Viro
... and consolidate the definition of sigframe_ia32->extramask - it's always a 1-element array of 32bit unsigned. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-18KVM: x86: Code style cleanup in kvm_arch_dev_ioctl()Xiaoyao Li
In kvm_arch_dev_ioctl(), the brackets of case KVM_X86_GET_MCE_CAP_SUPPORTED accidently encapsulates case KVM_GET_MSR_FEATURE_INDEX_LIST and case KVM_GET_MSRS. It doesn't affect functionality but it's misleading. Remove unnecessary brackets and opportunistically add a "break" in the default path. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18KVM: x86: Add blurb to CPUID tracepoint when using max basic leaf valuesSean Christopherson
Tack on "used max basic" at the end of the CPUID tracepoint when the output values correspond to the max basic leaf, i.e. when emulating Intel's out-of-range CPUID behavior. Observing "cpuid entry not found" in the tracepoint with non-zero output values is confusing for users that aren't familiar with the out-of-range semantics, and qualifying the "not found" case hopefully makes it clear that "found" means "found the exact entry". Suggested-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18KVM: x86: Add requested index to the CPUID tracepointSean Christopherson
Output the requested index when tracing CPUID emulation; it's basically mandatory for leafs where the index is meaningful, and is helpful for verifying KVM correctness even when the index isn't meaningful, e.g. the trace for a Linux guest's hypervisor_cpuid_base() probing appears to be broken (returns all zeroes) at first glance, but is correct because the index is non-zero, i.e. the output values correspond to a random index in the maximum basic leaf. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18KVM: nSVM: check for EFER.SVME=1 before entering guestPaolo Bonzini
EFER is set for L2 using svm_set_efer, which hardcodes EFER_SVME to 1 and hides an incorrect value for EFER.SVME in the L1 VMCB. Perform the check manually to detect invalid guest state. Reported-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18x86: Fix bitops.h warning with a moved castJesse Brandeburg
Fix many sparse warnings when building with C=1. These are useless noise from the bitops.h file and getting rid of them helps developers make more use of the tools and possibly find real bugs. When the kernel is compiled with C=1, there are lots of messages like: arch/x86/include/asm/bitops.h:77:37: warning: cast truncates bits from constant value (ffffff7f becomes 7f) CONST_MASK() is using a signed integer "1" to create the mask which is later cast to (u8), in order to yield an 8-bit value for the assembly instructions to use. Simplify the expressions used to clearly indicate they are working on 8-bit values only, which still keeps sparse happy without an accidental promotion to a 32 bit integer. The warning was occurring because certain bitmasks that end with a bit set next to a natural boundary like 7, 15, 23, 31, end up with a mask like 0x7f, which then results in sign extension due to the integer type promotion rules[1]. It was really only clear_bit() that was having problems, and it was only on some bit checks that resulted in a mask like 0xffffff7f being generated after the inversion. Verify with a test module (see next patch) and assembly inspection that the fix doesn't introduce any change in generated code. [ bp: Massage. ] Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Acked-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules [1] Link: https://lkml.kernel.org/r/20200310221747.2848474-1-jesse.brandeburg@intel.com
2020-03-18KVM: x86: Expose AVX512 VP2INTERSECT in cpuid for TGLZhenyu Wang
On Tigerlake new AVX512 VP2INTERSECT feature is available. This allows to expose it via KVM_GET_SUPPORTED_CPUID. Cc: "Zhong, Yang" <yang.zhong@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18KVM: nVMX: remove side effects from nested_vmx_exit_reflectedPaolo Bonzini
The name of nested_vmx_exit_reflected suggests that it's purely a test, but it actually marks VMCS12 pages as dirty. Move this to vmx_handle_exit, observing that the initial nested_run_pending check in nested_vmx_exit_reflected is pointless---nested_run_pending has just been cleared in vmx_vcpu_run and won't be set until handle_vmlaunch or handle_vmresume. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-17x86/boot: Fix comment spellingGeert Uytterhoeven
Fix misspelling of "disconnect". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-03-17x86/mm: Remove the now redundant N_MEMORY checkBaoquan He
In commit f70029bbaacb ("mm, memory_hotplug: drop CONFIG_MOVABLE_NODE") the dependency on CONFIG_MOVABLE_NODE was removed for N_MEMORY. Before, CONFIG_HIGHMEM && !CONFIG_MOVABLE_NODE could make (N_MEMORY == N_NORMAL_MEMORY) be true. After that commit, N_MEMORY cannot be equal to N_NORMAL_MEMORY. So the conditional check in paging_init() is not needed anymore, remove it. [ bp: Massage. ] Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Link: https://lkml.kernel.org/r/20200311011823.27740-1-bhe@redhat.com
2020-03-17KVM: VMX: access regs array in vmenter.S in its natural orderUros Bizjak
Registers in "regs" array are indexed as rax/rcx/rdx/.../rsi/rdi/r8/... Reorder access to "regs" array in vmenter.S to follow its natural order. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-17perf/amd/uncore: Add support for Family 19h L3 PMUKim Phillips
Family 19h introduces change in slice, core and thread specification in its L3 Performance Event Select (ChL3PmcCfg) h/w register. The change is incompatible with Family 17h's version of the register. Introduce a new path in l3_thread_slice_mask() to do things differently for Family 19h vs. Family 17h, otherwise the new hardware doesn't get programmed correctly. Instead of a linear core--thread bitmask, Family 19h takes an encoded core number, and a separate thread mask. There are new bits that are set for all cores and all slices, of which only the latter is used, since the driver counts events for all slices on behalf of the specified CPU. Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric mode decision based on Family 17h or above, not just 17h and 18h: the Family 19h Data Fabric PMC is compatible with the Family 17h DF PMC. [ bp: Touchups. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.com
2020-03-17perf/amd/uncore: Make L3 thread mask code more readableKim Phillips
Convert the l3_thread_slice_mask() function to use the more readable topology_* helper functions, more intuitive variable names like shift and thread_mask, and BIT_ULL(). No functional changes. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-2-kim.phillips@amd.com
2020-03-17perf/amd/uncore: Prepare L3 thread mask code for Family 19hKim Phillips
In order to better accommodate the upcoming Family 19h, given the 80-char line limit, move the existing code into a new l3_thread_slice_mask() function. No functional changes. [ bp: Touchups. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200313231024.17601-1-kim.phillips@amd.com
2020-03-17x86: Don't let pgprot_modify() change the page encryption bitThomas Hellstrom
When SEV or SME is enabled and active, vm_get_page_prot() typically returns with the encryption bit set. This means that users of pgprot_modify(, vm_get_page_prot()) (mprotect_fixup(), do_mmap()) end up with a value of vma->vm_pg_prot that is not consistent with the intended protection of the PTEs. This is also important for fault handlers that rely on the VMA vm_page_prot to set the page protection. Fix this by not allowing pgprot_modify() to change the encryption bit, similar to how it's done for PAT bits. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20200304114527.3636-2-thomas_os@shipmail.org
2020-03-17x86/amd_nb, char/amd64-agp: Use amd_nb_num() accessorBorislav Petkov
... to find whether there are northbridges present on the system. Convert the last forgotten user and therefore, unexport amd_nb_misc_ids[] too. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Michal Kubecek <mkubecek@suse.cz> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20200316150725.925-1-bp@alien8.de
2020-03-16Merge tag 'kvm-s390-next-5.7-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Features and Enhancements for 5.7 part1 1. Allow to disable gisa 2. protected virtual machines Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's state like guest memory and guest registers anymore. Instead the PVMs are mostly managed by a new entity called Ultravisor (UV), which provides an API, so KVM and the PV can request management actions. PVMs are encrypted at rest and protected from hypervisor access while running. They switch from a normal operation into protected mode, so we can still use the standard boot process to load a encrypted blob and then move it into protected mode. Rebooting is only possible by passing through the unprotected/normal mode and switching to protected again. One mm related patch will go via Andrews mm tree ( mm/gup/writeback: add callbacks for inaccessible pages)
2020-03-16KVM: nVMX: properly handle errors in nested_vmx_handle_enlightened_vmptrld()Vitaly Kuznetsov
nested_vmx_handle_enlightened_vmptrld() fails in two cases: - when we fail to kvm_vcpu_map() the supplied GPA - when revision_id is incorrect. Genuine Hyper-V raises #UD in the former case (at least with *some* incorrect GPAs) and does VMfailInvalid() in the later. KVM doesn't do anything so L1 just gets stuck retrying the same faulty VMLAUNCH. nested_vmx_handle_enlightened_vmptrld() has two call sites: nested_vmx_run() and nested_get_vmcs12_pages(). The former needs to queue do much: the failure there happens after migration when L2 was running (and L1 did something weird like wrote to VP assist page from a different vCPU), just kill L1 with KVM_EXIT_INTERNAL_ERROR. Reported-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Squash kbuild autopatch. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nVMX: stop abusing need_vmcs12_to_shadow_sync for eVMCS mappingVitaly Kuznetsov
When vmx_set_nested_state() happens, we may not have all the required data to map enlightened VMCS: e.g. HV_X64_MSR_VP_ASSIST_PAGE MSR may not yet be restored so we need a postponed action. Currently, we (ab)use need_vmcs12_to_shadow_sync/nested_sync_vmcs12_to_shadow() for that but this is not ideal: - We may not need to sync anything if L2 is running - It is hard to propagate errors from nested_sync_vmcs12_to_shadow() as we call it from vmx_prepare_switch_to_guest() which happens just before we do VMLAUNCH, the code is not ready to handle errors there. Move eVMCS mapping to nested_get_vmcs12_pages() and request KVM_REQ_GET_VMCS12_PAGES, it seems to be is less abusive in nature. It would probably be possible to introduce a specialized KVM_REQ_EVMCS_MAP but it is undesirable to propagate eVMCS specifics all the way up to x86.c Note, we don't need to request KVM_REQ_GET_VMCS12_PAGES from vmx_set_nested_state() directly as nested_vmx_enter_non_root_mode() already does that. Requesting KVM_REQ_GET_VMCS12_PAGES is done to document the (non-obvious) side-effect and to be future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16Merge branch 'kvm-null-pointer-fix' into HEADPaolo Bonzini
2020-03-16KVM: nSVM: Remove an obsolete comment.Miaohe Lin
The function does not return bool anymore. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: X86: correct meaningless kvm_apicv_activated() checkPaolo Bonzini
After test_and_set_bit() for kvm->arch.apicv_inhibit_reasons, we will always get false when calling kvm_apicv_activated() because it's sure apicv_inhibit_reasons do not equal to 0. What the code wants to do, is check whether APICv was *already* active and if so skip the costly request; we can do this using cmpxchg. Reported-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nVMX: Consolidate nested MTF checks to helper functionOliver Upton
commit 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation") introduced a helper to check the MTF VM-execution control in vmcs12. Change pre-existing check in nested_vmx_exit_reflected() to instead use the helper. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: VMX: Micro-optimize vmexit time when not exposing PMUWanpeng Li
PMU is not exposed to guest by most of products from cloud providers since the bad performance of PMU emulation and security concern. However, it calls perf_guest_switch_get_msrs() and clear_atomic_switch_msr() unconditionally even if PMU is not exposed to the guest before each vmentry. ~2% vmexit time reduced can be observed by kvm-unit-tests/vmexit.flat on my SKX server. Before patch: vmcall 1559 After patch: vmcall 1529 Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16kvm: svm: Introduce GA Log tracepoint for AVICSuravee Suthikulpanit
GA Log tracepoint is useful when debugging AVIC performance issue as it can be used with perf to count the number of times IOMMU AVIC injects interrupts through the slow-path instead of directly inject interrupts to the target vcpu. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nSVM: avoid loss of pending IRQ/NMI before entering L2Paolo Bonzini
This patch reproduces for nSVM the change that was made for nVMX in commit b5861e5cf2fc ("KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2"). While I do not have a test that breaks without it, I cannot see why it would not be necessary since all events are unblocked by VMRUN's setting of GIF back to 1. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nSVM: implement check_nested_events for interruptsPaolo Bonzini
The current implementation of physical interrupt delivery to a nested guest is quite broken. It relies on svm_interrupt_allowed returning false if VINTR=1 so that the interrupt can be injected from enable_irq_window, but this does not work for guests that do not intercept HLT or that rely on clearing the host IF to block physical interrupts while L2 runs. This patch can be split in two logical parts, but including only one breaks tests so I am combining both changes together. The first and easiest is simply to return true for svm_interrupt_allowed if HF_VINTR_MASK is set and HIF is set. This way the semantics of svm_interrupt_allowed are respected: svm_interrupt_allowed being false does not mean "call enable_irq_window", it means "interrupts cannot be injected now". After doing this, however, we need another place to inject the interrupt, and fortunately we already have one, check_nested_events, which nested SVM does not implement but which is meant exactly for this purpose. It is called before interrupts are injected, and it can therefore do the L2->L1 switch while leaving inject_pending_event none the wiser. This patch was developed together with Cathy Avery, who wrote the test and did a lot of the initial debugging. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1Paolo Bonzini
If a nested VM is started while an IRQ was pending and with V_INTR_MASKING=1, the behavior of the guest depends on host IF. If it is 1, the VM should exit immediately, before executing the first instruction of the guest, because VMRUN sets GIF back to 1. If it is 0 and the host has VGIF, however, at the time of the VMRUN instruction L0 is running the guest with a pending interrupt window request. This interrupt window request is completely irrelevant to L2, since IF only controls virtual interrupts, so this patch drops INTERCEPT_VINTR from the VMCB while running L2 under these circumstances. To simplify the code, both steps of enabling the interrupt window (setting the VINTR intercept and requesting a fake virtual interrupt in svm_inject_irq) are grouped in the svm_set_vintr function, and likewise for dismissing the interrupt window request in svm_clear_vintr. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nSVM: do not change host intercepts while nested VM is runningPaolo Bonzini
Instead of touching the host intercepts so that the bitwise OR in recalc_intercepts just works, mask away uninteresting intercepts directly in recalc_intercepts. This is cleaner and keeps the logic in one place even for intercepts that can change even while L2 is running. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgdPaolo Bonzini
The set_cr3 callback is not setting the guest CR3, it is setting the root of the guest page tables, either shadow or two-dimensional. To make this clearer as well as to indicate that the MMU calls it via kvm_mmu_load_cr3, rename it to load_mmu_pgd. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: unify callbacks to load paging rootPaolo Bonzini
Similar to what kvm-intel.ko is doing, provide a single callback that merges svm_set_cr3, set_tdp_cr3 and nested_svm_set_tdp_cr3. This lets us unify the set_cr3 and set_tdp_cr3 entries in kvm_x86_ops. I'm doing that in this same patch because splitting it adds quite a bit of churn due to the need for forward declarations. For the same reason the assignment to vcpu->arch.mmu->set_cr3 is moved to kvm_init_shadow_mmu from init_kvm_softmmu and nested_svm_init_mmu_context. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Refactor kvm_cpuid() param that controls out-of-range logicSean Christopherson
Invert and rename the kvm_cpuid() param that controls out-of-range logic to better reflect the semantics of the affected callers, i.e. callers that bypass the out-of-range logic do so because they are looking up an exact guest CPUID entry, e.g. to query the maxphyaddr. Similarly, rename kvm_cpuid()'s internal "found" to "exact" to clarify that it tracks whether or not the exact requested leaf was found, as opposed to any usable leaf being found. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Refactor out-of-range logic to contain the madnessSean Christopherson
Move all of the out-of-range logic into a single helper, get_out_of_range_cpuid_entry(), to avoid an extra lookup of CPUID.0.0 and to provide a single location for documenting the out-of-range behavior. No functional change intended. Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Fix CPUID range checks for Hypervisor and Centaur classesSean Christopherson
Rework the masking in the out-of-range CPUID logic to handle the Hypervisor sub-classes, as well as the Centaur class if the guest virtual CPU vendor is Centaur. Masking against 0x80000000 only handles basic and extended leafs, which results in Hypervisor range checks being performed against the basic CPUID class, and Centuar range checks being performed against the Extended class. E.g. if CPUID.0x40000000.EAX returns 0x4000000A and there is no entry for CPUID.0x40000006, then function 0x40000006 would be incorrectly reported as out of bounds. While there is no official definition of what constitutes a class, the convention established for Hypervisor classes effectively uses bits 31:8 as the mask by virtue of checking for different bases in increments of 0x100, e.g. KVM advertises its CPUID functions starting at 0x40000100 when HyperV features are advertised at the default base of 0x40000000. The bad range check doesn't cause functional problems for any known VMM because out-of-range semantics only come into play if the exact entry isn't found, and VMMs either support a very limited Hypervisor range, e.g. the official KVM range is 0x40000000-0x40000001 (effectively no room for undefined leafs) or explicitly defines gaps to be zero, e.g. Qemu explicitly creates zeroed entries up to the Centaur and Hypervisor limits (the latter comes into play when providing HyperV features). The bad behavior can be visually confirmed by dumping CPUID output in the guest when running Qemu with a stable TSC, as Qemu extends the limit of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, without defining zeroed entries for 0x40000002 - 0x4000000f. Note, documentation of Centaur/VIA CPUs is hard to come by. Designating 0xc0000000 - 0xcfffffff as the Centaur class is a best guess as to the behavior of a real Centaur/VIA CPU. Fixes: 43561123ab37 ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH") Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM x86: Extend AMD specific guest behavior to Hygon virtual CPUsSean Christopherson
Extend guest_cpuid_is_amd() to cover Hygon virtual CPUs and rename it accordingly. Hygon CPUs use an AMD-based core and so have the same basic behavior as AMD CPUs. Fixes: b8f4abb652146 ("x86/kvm: Add Hygon Dhyana support to KVM") Cc: Pu Wen <puwen@hygon.cn> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Add helpers to perform CPUID-based guest vendor checkSean Christopherson
Add helpers to provide CPUID-based guest vendor checks, i.e. to do the ugly register comparisons. Use the new helpers to check for an AMD guest vendor in guest_cpuid_is_amd() as well as in the existing emulator flows. Using the new helpers fixes a _very_ theoretical bug where guest_cpuid_is_amd() would get a false positive on a non-AMD virtual CPU with a vendor string beginning with "Auth" due to the previous logic only checking EBX. It also fixes a marginally less theoretically bug where guest_cpuid_is_amd() would incorrectly return false for a guest CPU with "AMDisbetter!" as its vendor string. Fixes: a0c0feb57992c ("KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Trace the original requested CPUID function in kvm_cpuid()Jan Kiszka
Trace the requested CPUID function instead of the effective function, e.g. if the requested function is out-of-range and KVM is emulating an Intel CPU, as the intent of the tracepoint is to show if the output came from the actual leaf as opposed to the max basic leaf via redirection. Similarly, leave "found" as is, i.e. report that an entry was found if and only if the requested entry was found. Fixes: 43561123ab37 ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH") Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> [Sean: Drop "found" semantic change, reword changelong accordingly ] Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: CPUID: add support for supervisor statesPaolo Bonzini
Current CPUID 0xd enumeration code does not support supervisor states, because KVM only supports setting IA32_XSS to zero. Change it instead to use a new variable supported_xss, to be set from the hardware_setup callback which is in charge of CPU capabilities. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move nSVM CPUID 0x8000000A handling into common x86 codeSean Christopherson
Handle CPUID 0x8000000A in the main switch in __do_cpuid_func() and drop ->set_supported_cpuid() now that both VMX and SVM implementations are empty. Like leaf 0x14 (Intel PT) and leaf 0x8000001F (SEV), leaf 0x8000000A is is (obviously) vendor specific but can be queried in common code while respecting SVM's wishes by querying kvm_cpu_cap_has(). Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nSVM: Advertise and enable NRIPS for L1 iff nrips is enabledSean Christopherson
Set NRIPS in KVM capabilities if and only if nrips=true, which naturally incorporates the boot_cpu_has() check, and set nrips_enabled only if the KVM capability is enabled. Note, previously KVM would set nrips_enabled based purely on userspace input, but at worst that would cause KVM to propagate garbage into L1, i.e. userspace would simply be hosing its VM. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nSVM: Expose SVM features to L1 iff nested is enabledSean Christopherson
Set SVM feature bits in KVM capabilities if and only if nested=true, KVM shouldn't advertise features that realistically can't be used. Use kvm_cpu_cap_has(X86_FEATURE_SVM) to indirectly query "nested" in svm_set_supported_cpuid() in anticipation of moving CPUID 0x8000000A adjustments into common x86 code. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move VMX's host_efer to common x86 codeSean Christopherson
Move host_efer to common x86 code and use it for CPUID's is_efer_nx() to avoid constantly re-reading the MSR. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>