From 34a83deac31cd9fdecef331578422095af2db4b0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson
Date: Thu, 11 May 2023 16:33:48 -0700
Subject: KVM: x86: Use MTRR macros to define possible MTRR MSR ranges

Use the MTRR macros to identify the ranges of possible MTRR MSRs instead
of bounding the ranges with a mishmash of open-coded values and unrelated
MSR indices.  Carving out the gap for the machine check MSRs in particular
is confusing, as it's easy to incorrectly think the case statement handles
MCE MSRs instead of skipping them.

Drop the range-based funneling of MSRs between the end of the MCE MSRs
and MTRR_DEF_TYPE, i.e. 0x2A0-0x2FF, and instead handle MTRR_DEF_TYPE as
the one-off case that it is.

Extract PAT (0x277) as well in anticipation of dropping PAT "handling"
from the MTRR code.

Keep the range-based handling for the variable+fixed MTRRs even though
capturing unknown MSRs 0x214-0x24F is arguably "wrong".  There is a gap in
the fixed MTRRs, 0x260-0x267, i.e. the MTRR code needs to filter out
unknown MSRs anyway, and using a single range generates marginally better
code for the big switch statement.

Reviewed-by: Kai Huang
Link: https://lore.kernel.org/r/20230511233351.635053-6-seanjc@google.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

(limited to 'arch/x86/kvm/x86.c')

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0778ca39650..8017e0eb1e67 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3702,8 +3702,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		}
 		break;
-	case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
-	case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
+	case MSR_IA32_CR_PAT:
+	case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
+	case MSR_MTRRdefType:
 		return kvm_mtrr_set_msr(vcpu, msr, data);
 	case MSR_IA32_APICBASE:
 		return kvm_set_apic_base(vcpu, msr_info);
@@ -4110,9 +4111,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = kvm_scale_tsc(rdtsc(), ratio) + offset;
 		break;
 	}
+	case MSR_IA32_CR_PAT:
 	case MSR_MTRRcap:
-	case 0x200 ... MSR_IA32_MC0_CTL2 - 1:
-	case MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) ... 0x2ff:
+	case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
+	case MSR_MTRRdefType:
 		return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
 	case 0xcd: /* fsb frequency */
 		msr_info->data = 3;
-- cgit
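As a standalone illustration of the case-range funneling described above, the sketch below (not kernel code) routes the variable/fixed MTRR range plus the PAT and MTRRdefType one-offs the same way the reworked switch does. The MSR index macros are re-declared locally with the architectural values, the handler is an invented stand-in, and case ranges ("low ... high") are the same GCC/Clang extension the kernel relies on.

/* Standalone sketch, not kernel code: local stand-ins for the MSR macros. */
#include <stdint.h>
#include <stdio.h>

#define MTRRphysBase_MSR(n)  (0x200 + 2 * (n))  /* 0x200, 0x202, ... */
#define MSR_MTRRfix4K_F8000  0x26f              /* last fixed-range MTRR */
#define MSR_IA32_CR_PAT      0x277
#define MSR_MTRRdefType      0x2ff

static int demo_set_msr(uint32_t msr, uint64_t data)
{
	switch (msr) {
	case MSR_IA32_CR_PAT:				/* one-off, not an MTRR */
		printf("PAT write: 0x%llx\n", (unsigned long long)data);
		return 0;
	case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000: /* 0x200-0x26f */
	case MSR_MTRRdefType:				/* one-off at 0x2ff */
		printf("MTRR write to 0x%x\n", (unsigned)msr);
		return 0;
	default:
		return 1;				/* unknown MSR */
	}
}

int main(void)
{
	demo_set_msr(MSR_IA32_CR_PAT, 0x0007040600070406ull);
	demo_set_msr(MTRRphysBase_MSR(0), 0);
	demo_set_msr(0x2a0, 0);		/* formerly funneled, now rejected */
	return 0;
}

Note that PAT (0x277) sits outside the 0x200-0x26f MTRR block, which is why it needs its own label once the old 0x200-0x2ff catch-all is gone.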
From bc7fe2f0b7511ef6bb2765b6e1f923da449e00ef Mon Sep 17 00:00:00 2001
From: Sean Christopherson
Date: Thu, 11 May 2023 16:33:49 -0700
Subject: KVM: x86: Move PAT MSR handling out of mtrr.c

Drop handling of MSR_IA32_CR_PAT from mtrr.c now that SVM and VMX handle
writes without bouncing through kvm_set_msr_common().  PAT isn't truly an
MTRR even though it affects memory types, and more importantly KVM enables
hardware virtualization of guest PAT (by NOT setting "ignore guest PAT")
when a guest has non-coherent DMA, i.e. KVM doesn't need to zap SPTEs when
the guest PAT changes.

The read path is and always has been trivial, i.e. burying it in the MTRR
code does more harm than good.

WARN and continue for the PAT case in kvm_set_msr_common(), as that code
is _currently_ reached if and only if KVM is buggy.  Defer cleaning up the
lack of symmetry between the read and write paths to a future patch.

Reviewed-by: Kai Huang
Link: https://lore.kernel.org/r/20230511233351.635053-7-seanjc@google.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

(limited to 'arch/x86/kvm/x86.c')

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8017e0eb1e67..902c9935979f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3703,6 +3703,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_CR_PAT:
+		/*
+		 * Writes to PAT should be handled by vendor code as both SVM
+		 * and VMX track the guest's PAT in the VMCB/VMCS.
+		 */
+		WARN_ON_ONCE(1);
+
+		if (!kvm_pat_valid(data))
+			return 1;
+
+		vcpu->arch.pat = data;
+		break;
 	case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
 	case MSR_MTRRdefType:
 		return kvm_mtrr_set_msr(vcpu, msr, data);
@@ -4112,6 +4123,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	}
 	case MSR_IA32_CR_PAT:
+		msr_info->data = vcpu->arch.pat;
+		break;
 	case MSR_MTRRcap:
 	case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
 	case MSR_MTRRdefType:
-- cgit

From dee321977a230802a5065af4ad4f4f5e8558a738 Mon Sep 17 00:00:00 2001
From: Sean Christopherson
Date: Thu, 11 May 2023 16:33:51 -0700
Subject: KVM: x86: Move common handling of PAT MSR writes to kvm_set_msr_common()

Move the common check-and-set handling of PAT MSR writes out of vendor
code and into kvm_set_msr_common().  This aligns writes with reads, which
are already handled in common code, i.e. makes the handling of reads and
writes symmetrical in common code.

Alternatively, the common handling in kvm_get_msr_common() could be moved
to vendor code, but duplicating code is generally undesirable (even though
the duplicated code is trivial in this case), and guest writes to PAT
should be rare, i.e. the overhead of the extra function call is a
non-issue in practice.

Suggested-by: Kai Huang
Reviewed-by: Kai Huang
Link: https://lore.kernel.org/r/20230511233351.635053-9-seanjc@google.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 6 ------
 1 file changed, 6 deletions(-)

(limited to 'arch/x86/kvm/x86.c')

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 902c9935979f..5ad55ef71433 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3703,12 +3703,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_CR_PAT:
-		/*
-		 * Writes to PAT should be handled by vendor code as both SVM
-		 * and VMX track the guest's PAT in the VMCB/VMCS.
-		 */
-		WARN_ON_ONCE(1);
-
 		if (!kvm_pat_valid(data))
 			return 1;

-- cgit
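To tie the three PAT patches above together, here is a minimal standalone sketch (not KVM code) of the check-and-set flow that ends up in kvm_set_msr_common(). The per-entry rule spelled out below matches the architectural IA32_PAT encodings; KVM's kvm_pat_valid() expresses the same check with bit arithmetic, and the "return 1 on invalid" convention mirrors the MSR handlers above.

/* Standalone sketch, not kernel code. */
#include <stdbool.h>
#include <stdint.h>

static bool pat_entry_valid(uint8_t type)
{
	switch (type) {
	case 0: /* UC  */ case 1: /* WC */
	case 4: /* WT  */ case 5: /* WP */
	case 6: /* WB  */ case 7: /* UC- */
		return true;
	default:	/* 2 and 3 are reserved, anything above 7 is out of range */
		return false;
	}
}

static bool pat_valid(uint64_t pat)
{
	/* IA32_PAT packs eight one-byte entries, PA0..PA7. */
	for (int i = 0; i < 8; i++) {
		if (!pat_entry_valid((uint8_t)(pat >> (8 * i))))
			return false;
	}
	return true;
}

/* Mirrors the common "reject invalid, otherwise store" flow for the PAT MSR. */
static int set_pat_msr(uint64_t *arch_pat, uint64_t data)
{
	if (!pat_valid(data))
		return 1;	/* caller injects #GP into the guest */

	*arch_pat = data;
	return 0;
}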
From 02f1b0b736606f9870595b3089d9c124f9da8be9 Mon Sep 17 00:00:00 2001
From: Chao Gao
Date: Wed, 24 May 2023 14:16:32 +0800
Subject: KVM: x86: Correct the name for skipping VMENTER l1d flush

There is no VMENTER_L1D_FLUSH_NESTED_VM. It should be
ARCH_CAP_SKIP_VMENTRY_L1DFLUSH.

Signed-off-by: Chao Gao
Reviewed-by: Xiaoyao Li
Link: https://lore.kernel.org/r/20230524061634.54141-3-chao.gao@intel.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'arch/x86/kvm/x86.c')

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5ad55ef71433..7c7be4815eaa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1631,7 +1631,7 @@ static u64 kvm_get_arch_capabilities(void)
 	 * If we're doing cache flushes (either "always" or "cond")
 	 * we will do one whenever the guest does a vmlaunch/vmresume.
 	 * If an outer hypervisor is doing the cache flush for us
-	 * (VMENTER_L1D_FLUSH_NESTED_VM), we can safely pass that
+	 * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
 	 * capability to the guest too, and if EPT is disabled we're not
 	 * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
 	 * require a nested hypervisor to do a flush of its own.
-- cgit

From 0a8a5f2c8c266e9d94fb45f76a26cff135d0051c Mon Sep 17 00:00:00 2001
From: Sean Christopherson
Date: Thu, 1 Jun 2023 18:15:17 -0700
Subject: KVM: x86: Use standard mmu_notifier invalidate hooks for APIC access page

Now that KVM honors past and in-progress mmu_notifier invalidations when
reloading the APIC-access page, use KVM's "standard" invalidation hooks
to trigger a reload and delete the one-off usage of invalidate_range().

Aside from eliminating one-off code in KVM, dropping KVM's use of
invalidate_range() will allow common mmu_notifier to redefine the API to
be more strictly focused on invalidating secondary TLBs that share the
primary MMU's page tables.

Suggested-by: Jason Gunthorpe
Cc: Alistair Popple
Cc: Robin Murphy
Reviewed-by: Alistair Popple
Reviewed-by: Paolo Bonzini
Link: https://lore.kernel.org/r/20230602011518.787006-3-seanjc@google.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 14 --------------
 1 file changed, 14 deletions(-)

(limited to 'arch/x86/kvm/x86.c')

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0778ca39650..f962b7e3487e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10435,20 +10435,6 @@ static void vcpu_load_eoi_exitmap(struct kvm_vcpu *vcpu)
 		vcpu, (u64 *)vcpu->arch.ioapic_handled_vectors);
 }

-void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
-					    unsigned long start, unsigned long end)
-{
-	unsigned long apic_address;
-
-	/*
-	 * The physical address of apic access page is stored in the VMCS.
-	 * Update it when it becomes invalid.
-	 */
-	apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
-	if (start <= apic_address && apic_address < end)
-		kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
-}
-
 void kvm_arch_guest_memory_reclaimed(struct kvm *kvm)
 {
 	static_call_cond(kvm_x86_guest_memory_reclaimed)(kvm);
-- cgit
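The hook deleted above amounted to a half-open interval test against the invalidation window. As a standalone restatement (local names, not kernel symbols), assuming the tracked address is the APIC-access page's host virtual address:

/* Standalone sketch, not kernel code. */
#include <stdbool.h>

/*
 * The invalidation range is half-open, [start, end).  The tracked address is
 * "hit" only when it lies inside that window, which is what triggered the
 * APIC-access page reload request in the deleted code.
 */
static bool invalidation_hits_addr(unsigned long start, unsigned long end,
				   unsigned long tracked_hva)
{
	return start <= tracked_hva && tracked_hva < end;
}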
From 056b9919a16aedc560e6756a235ef5a62a68db05 Mon Sep 17 00:00:00 2001
From: Sean Christopherson
Date: Thu, 1 Jun 2023 18:05:50 -0700
Subject: KVM: x86: Use cpu_feature_enabled() for PKU instead of #ifdef

Replace an #ifdef on CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS with a
cpu_feature_enabled() check on X86_FEATURE_PKU.  The macro magic of
DISABLED_MASK_BIT_SET() means that cpu_feature_enabled() provides the
same end result (no code generated) when PKU is disabled by Kconfig.

No functional change intended.

Cc: Jon Kohler
Reviewed-by: Jon Kohler
Link: https://lore.kernel.org/r/20230602010550.785722-1-seanjc@google.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

(limited to 'arch/x86/kvm/x86.c')

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7c7be4815eaa..6f461afed800 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1017,13 +1017,11 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
 			wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
 	}

-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
-	if (static_cpu_has(X86_FEATURE_PKU) &&
+	if (cpu_feature_enabled(X86_FEATURE_PKU) &&
 	    vcpu->arch.pkru != vcpu->arch.host_pkru &&
 	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
 	     kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE)))
 		write_pkru(vcpu->arch.pkru);
-#endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 }
 EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);

@@ -1032,15 +1030,13 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.guest_state_protected)
 		return;

-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
-	if (static_cpu_has(X86_FEATURE_PKU) &&
+	if (cpu_feature_enabled(X86_FEATURE_PKU) &&
 	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
 	     kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))) {
 		vcpu->arch.pkru = rdpkru();
 		if (vcpu->arch.pkru != vcpu->arch.host_pkru)
 			write_pkru(vcpu->arch.host_pkru);
 	}
-#endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */

 	if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) {
-- cgit

From e12fa4b92a07f4274ffa02e581d059a7869dd50e Mon Sep 17 00:00:00 2001
From: Michal Luczaj
Date: Mon, 5 Jun 2023 22:01:21 +0200
Subject: KVM: x86: Clean up: remove redundant bool conversions

As test_bit() returns bool, explicitly converting the result to bool is
unnecessary. Get rid of '!!'.

No functional change intended.

Suggested-by: Sean Christopherson
Signed-off-by: Michal Luczaj
Link: https://lore.kernel.org/r/20230605200158.118109-1-mhal@rbox.co
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'arch/x86/kvm/x86.c')

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f461afed800..355f3df8dbad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1805,7 +1805,7 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
 		unsigned long *bitmap = ranges[i].bitmap;

 		if ((index >= start) && (index < end) && (flags & type)) {
-			allowed = !!test_bit(index - start, bitmap);
+			allowed = test_bit(index - start, bitmap);
 			break;
 		}
 	}
-- cgit
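As background for the PKU patch above, here is a standalone sketch (not kernel code) of the conditional PKRU swap performed by kvm_load_guest_xsave_state() and kvm_load_host_xsave_state(). The struct, flags, and accessors are invented stand-ins; the point is that the register is only written when the feature is enabled, the guest actually uses protection keys, and the values differ.

/* Standalone sketch, not kernel code. */
#include <stdbool.h>
#include <stdint.h>

struct vcpu_pkru_state {
	uint32_t guest_pkru;
	uint32_t host_pkru;
	bool pku_enabled;	/* stand-in for cpu_feature_enabled(X86_FEATURE_PKU) */
	bool guest_uses_pkru;	/* stand-in for "XCR0.PKRU set or CR4.PKE enabled" */
};

static uint32_t hw_pkru;				/* stand-in for the PKRU register */
static uint32_t read_pkru(void) { return hw_pkru; }	/* stand-in for rdpkru() */
static void write_pkru(uint32_t v) { hw_pkru = v; }	/* stand-in for write_pkru() */

/* Entry path: only touch PKRU when the guest value actually differs. */
static void load_guest_pkru(struct vcpu_pkru_state *s)
{
	if (s->pku_enabled && s->guest_uses_pkru &&
	    s->guest_pkru != s->host_pkru)
		write_pkru(s->guest_pkru);
}

/* Exit path: capture what the guest left behind, then restore the host value. */
static void load_host_pkru(struct vcpu_pkru_state *s)
{
	if (s->pku_enabled && s->guest_uses_pkru) {
		s->guest_pkru = read_pkru();
		if (s->guest_pkru != s->host_pkru)
			write_pkru(s->host_pkru);
	}
}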
From 4a2771895ca63a055df815be5e307cce8e85308c Mon Sep 17 00:00:00 2001
From: Like Xu
Date: Fri, 2 Jun 2023 18:10:57 -0700
Subject: KVM: x86/svm/pmu: Add AMD PerfMonV2 support

If AMD Performance Monitoring Version 2 (PerfMonV2) is detected by the
guest, it can use a new scheme to manage the Core PMCs using the new
global control and status registers.

In addition to benefiting from the PerfMonV2 functionality in the same
way as the host (higher precision), the guest can also reduce the number
of vm-exits by lowering the total number of MSR accesses.

In terms of implementation details, amd_is_valid_msr() is resurrected
since three newly added MSRs could not be mapped to one vPMC. The
possibility of emulating PerfMonV2 on the mainframe has also been
eliminated for reasons of precision.

Co-developed-by: Sandipan Das
Signed-off-by: Sandipan Das
Signed-off-by: Like Xu
[sean: drop "Based on the observed HW." comments]
Link: https://lore.kernel.org/r/20230603011058.1038821-12-seanjc@google.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'arch/x86/kvm/x86.c')

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0778ca39650..abfba3cae0ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1483,6 +1483,10 @@ static const u32 msrs_to_save_pmu[] = {
 	MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
 	MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
 	MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
+
+	MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+	MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+	MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
 };

 static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
@@ -7150,6 +7154,12 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		    kvm_pmu_cap.num_counters_fixed)
 			return;
 		break;
+	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+		if (!kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
+			return;
+		break;
 	case MSR_IA32_XFD:
 	case MSR_IA32_XFD_ERR:
 		if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
-- cgit
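A standalone sketch (not kernel code) of the probe-and-filter pattern kvm_probe_msr_to_save() applies to the new global PMU MSRs above: a candidate MSR is kept only when the matching CPU feature is reported. The feature flags and MSR identifiers here are invented stand-ins, not the real kernel constants.

/* Standalone sketch, not kernel code. */
#include <stdbool.h>
#include <stdio.h>

enum { FEAT_PERFMON_V2, FEAT_XFD, NR_FEATURES };
enum {
	DEMO_MSR_PERF_GLOBAL_CTL,
	DEMO_MSR_PERF_GLOBAL_STATUS,
	DEMO_MSR_XFD,
	NR_DEMO_MSRS,
};

static bool cpu_has(int feature)
{
	static const bool supported[NR_FEATURES] = {
		[FEAT_PERFMON_V2] = true,	/* pretend the host has PerfMonV2 */
		[FEAT_XFD] = false,
	};
	return supported[feature];
}

/* Mirror of the "return early if unsupported, otherwise keep the MSR" probe. */
static bool probe_msr_to_save(int msr)
{
	switch (msr) {
	case DEMO_MSR_PERF_GLOBAL_CTL:
	case DEMO_MSR_PERF_GLOBAL_STATUS:
		return cpu_has(FEAT_PERFMON_V2);
	case DEMO_MSR_XFD:
		return cpu_has(FEAT_XFD);
	default:
		return true;			/* unconditionally saved MSRs */
	}
}

int main(void)
{
	for (int msr = 0; msr < NR_DEMO_MSRS; msr++)
		printf("msr %d: %s\n", msr, probe_msr_to_save(msr) ? "save" : "skip");
	return 0;
}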
From a30642570855faa1992f333ce5cb60351b0a37da Mon Sep 17 00:00:00 2001
From: Sean Christopherson
Date: Tue, 6 Jun 2023 17:46:36 -0700
Subject: KVM: x86: Update comments about MSR lists exposed to userspace

Refresh comments about msrs_to_save, emulated_msrs, and msr_based_features
to remove stale references left behind by commit 2374b7310b66 (KVM:
x86/pmu: Use separate array for defining "PMU MSRs to save"), and to
better reflect the current reality, e.g. emulated_msrs is no longer just
for MSRs that are "kvm-specific".

Reported-by: Binbin Wu
Link: https://lore.kernel.org/r/20230607004636.1421424-1-seanjc@google.com
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

(limited to 'arch/x86/kvm/x86.c')

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 355f3df8dbad..07b0f960de07 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1423,15 +1423,14 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
 EXPORT_SYMBOL_GPL(kvm_emulate_rdpmc);

 /*
- * List of msr numbers which we expose to userspace through KVM_GET_MSRS
- * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.
- *
- * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features)
- * extract the supported MSRs from the related const lists.
- * msrs_to_save is selected from the msrs_to_save_all to reflect the
- * capabilities of the host cpu. This capabilities test skips MSRs that are
- * kvm-specific. Those are put in emulated_msrs_all; filtering of emulated_msrs
- * may depend on host virtualization features rather than host cpu features.
+ * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) track
+ * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS,
+ * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.  msrs_to_save holds MSRs that
+ * require host support, i.e. should be probed via RDMSR.  emulated_msrs holds
+ * MSRs that KVM emulates without strictly requiring host support.
+ * msr_based_features holds MSRs that enumerate features, i.e. are effectively
+ * CPUID leafs.  Note, msr_based_features isn't mutually exclusive with
+ * msrs_to_save and emulated_msrs.
  */

 static const u32 msrs_to_save_base[] = {
@@ -1527,11 +1526,11 @@ static const u32 emulated_msrs_all[] = {
 	MSR_IA32_UCODE_REV,

 	/*
-	 * The following list leaves out MSRs whose values are determined
-	 * by arch/x86/kvm/vmx/nested.c based on CPUID or other MSRs.
-	 * We always support the "true" VMX control MSRs, even if the host
-	 * processor does not, so I am putting these registers here rather
-	 * than in msrs_to_save_all.
+	 * KVM always supports the "true" VMX control MSRs, even if the host
+	 * does not.  The VMX MSRs as a whole are considered "emulated" as KVM
+	 * doesn't strictly require them to exist in the host (ignoring that
+	 * KVM would refuse to load in the first place if the core set of MSRs
+	 * aren't supported).
 	 */
 	MSR_IA32_VMX_BASIC,
 	MSR_IA32_VMX_TRUE_PINBASED_CTLS,
-- cgit
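To make the refreshed comment's distinction concrete, below is a standalone sketch (not kernel code) of building an msrs_to_save-style list by probing the host with a fault-safe read, whereas emulated MSRs would be advertised unconditionally. rdmsr_safe() here is a trivial local stub standing in for the kernel helper of the same name.

/* Standalone sketch, not kernel code. */
#include <stddef.h>
#include <stdint.h>

/* Stub: returns 0 on success, non-zero if the RDMSR would fault on this host. */
static int rdmsr_safe(uint32_t msr, uint64_t *val)
{
	(void)msr;
	*val = 0;
	return 0;	/* pretend every probed MSR exists on this host */
}

/* Keep only the candidate MSRs the host actually supports. */
static size_t build_msr_list(const uint32_t *candidates, size_t n, uint32_t *out)
{
	size_t count = 0;
	uint64_t val;

	for (size_t i = 0; i < n; i++) {
		if (!rdmsr_safe(candidates[i], &val))
			out[count++] = candidates[i];	/* host has it: expose it */
	}
	return count;
}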