linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message	Author
2024-12-18	KVM: x86: Initialize guest cpu_caps based on KVM support	Sean Christopherson
	Constrain all guest cpu_caps based on KVM support instead of constraining only the few features that KVM _currently_ needs to verify are actually supported by KVM. The intent of cpu_caps is to track what the guest is actually capable of using, not the raw, unfiltered CPUID values that the guest sees. I.e. KVM should always consult its own support when making decisions based on guest CPUID, and the only reason KVM has historically made the checks opt-in was due to lack of centralized tracking. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-45-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
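The constraint described above amounts to a per-word AND of the guest's CPUID-derived bits with KVM's supported bits. A minimal sketch, assuming a flat array of feature words; `NR_CAP_WORDS` and `constrain_guest_caps()` are hypothetical names for illustration, not KVM's actual definitions:

```c
#include <stdint.h>

/* Hypothetical word count, chosen only for this illustration. */
#define NR_CAP_WORDS 4

/*
 * Build the guest's effective capabilities: a feature is usable only if
 * userspace exposed it in guest CPUID AND KVM itself supports it.
 */
void constrain_guest_caps(uint32_t guest_caps[NR_CAP_WORDS],
                          const uint32_t guest_cpuid[NR_CAP_WORDS],
                          const uint32_t kvm_caps[NR_CAP_WORDS])
{
    for (int i = 0; i < NR_CAP_WORDS; i++)
        guest_caps[i] = guest_cpuid[i] & kvm_caps[i];
}
```

With this shape, a query against `guest_caps` never needs a second check against KVM's own support, which is the centralized tracking the message refers to.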
2024-12-18	KVM: x86: Treat MONITOR/MWAIT as a "partially emulated" feature	Sean Christopherson
	Enumerate MWAIT in cpuid_func_emulated(), but only if the caller wants to include "partially emulated" features, i.e. features that KVM kinda sorta emulates, but with major caveats. This will allow initializing the guest cpu_caps based on the set of features that KVM virtualizes and/or emulates, without needing to handle things like MONITOR/MWAIT as one-off exceptions. Adding one-off handling for individual features is quite painful, especially when considering future hardening. It's very doable to verify, at compile time, that every CPUID-based feature that KVM queries when emulating guest behavior is actually known to KVM, e.g. to prevent KVM bugs where KVM emulates some feature but fails to advertise support to userspace. In other words, any features that are special cased, i.e. not handled generically in the CPUID framework, would also need to be special cased for any hardening efforts that build on said framework. Link: https://lore.kernel.org/r/20241128013424.4096668-44-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Extract code for generating per-entry emulated CPUID information	Sean Christopherson
	Extract the meat of __do_cpuid_func_emulated() into a separate helper, cpuid_func_emulated(), so that cpuid_func_emulated() can be used with a single CPUID entry. This will allow marking emulated features as fully supported in the guest cpu_caps without needing to hardcode the set of emulated features in multiple locations. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-43-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Initialize guest cpu_caps based on guest CPUID	Sean Christopherson
	Initialize a vCPU's capabilities based on the guest CPUID provided by userspace instead of simply zeroing the entire array. This is the first step toward using cpu_caps to query all CPUID-based guest capabilities, i.e. will allow converting all usage of guest_cpuid_has() to guest_cpu_cap_has(). Zeroing the array was the logical choice when using cpu_caps was opt-in, e.g. "unsupported" was generally a safer default, and the whole point of governed features is that KVM would need to check host and guest support, i.e. making everything unsupported by default didn't require more code. But requiring KVM to manually "enable" every CPUID-based feature in cpu_caps would require an absurd amount of boilerplate code. Follow existing CPUID/kvm_cpu_caps nomenclature where possible, e.g. for the change() and clear() APIs. Replace check_and_set() with constrain() to try and capture that KVM is constraining userspace's desired guest feature set based on KVM's capabilities. This is intended to be a gigantic nop, i.e. should not have any impact on guest or KVM functionality. This is also an intermediate step; a future commit will also incorporate KVM support into the vCPU's cpu_caps before converting guest_cpuid_has() to guest_cpu_cap_has(). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-42-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps	Sean Christopherson
	Replace the internals of the governed features framework with a more comprehensive "guest CPU capabilities" implementation, i.e. with a guest version of kvm_cpu_caps. Keep the skeleton of governed features around for now as vmx_adjust_sec_exec_control() relies on detecting governed features to do the right thing for XSAVES, and switching all guest feature queries to guest_cpu_cap_has() requires subtle and non-trivial changes, i.e. is best done as a standalone change. Tracking all guest capabilities that KVM cares about will allow excising the poorly named "governed features" framework, and effectively optimizes all KVM queries of guest capabilities, i.e. doesn't require making a subjective decision as to whether or not a feature is worth "governing", and doesn't require adding the code to do so. The cost of tracking all features is currently 92 bytes per vCPU on 64-bit kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features. That cost is well worth paying even if the only benefit was eliminating the "governed features" terminology. And practically speaking, the real cost is zero unless those 92 bytes push the size of vcpu_vmx or vcpu_svm into a new order-N allocation, and if that happens there are better ways to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR state separate allocations. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-41-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap"	Sean Christopherson
	As the first step toward replacing KVM's so-called "governed features" framework with a more comprehensive, less poorly named implementation, replace the "kvm_governed_feature" function prefix with "guest_cpu_cap" and rename guest_can_use() to guest_cpu_cap_has(). The "guest_cpu_cap" naming scheme mirrors that of "kvm_cpu_cap", and provides a more clear distinction between guest capabilities, which are KVM controlled (heh, or one might say "governed"), and guest CPUID, which with few exceptions is fully userspace controlled. Opportunistically rewrite the comment about XSS passthrough for SEV-ES guests to avoid referencing so many functions, as such comments are prone to becoming stale (case in point...). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-40-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Advertise HYPERVISOR in KVM_GET_SUPPORTED_CPUID	Sean Christopherson
	Unconditionally advertise "support" for the HYPERVISOR feature in CPUID, as the flag simply communicates to the guest that it's running under a hypervisor. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-39-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID	Sean Christopherson
	Unconditionally advertise TSC_DEADLINE_TIMER via KVM_GET_SUPPORTED_CPUID, as KVM always emulates deadline mode, if the VM has an in-kernel local APIC. The odds of a VMM emulating the local APIC in userspace, not emulating the TSC deadline timer, _and_ reflecting KVM_GET_SUPPORTED_CPUID back into KVM_SET_CPUID2, i.e. the risk of over-advertising and breaking any setups, is extremely low. KVM has _unconditionally_ advertised X2APIC via CPUID since commit 0d1de2d901f4 ("KVM: Always report x2apic as supported feature"), and it is completely impossible for userspace to emulate X2APIC as KVM doesn't support forwarding the MSR accesses to userspace. I.e. KVM has relied on userspace VMMs to not misreport local APIC capabilities for nearly 13 years. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-38-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Remove all direct usage of cpuid_entry2_find()	Sean Christopherson
	Convert all use of cpuid_entry2_find() to kvm_find_cpuid_entry{,_index}() now that cpuid_entry2_find() operates on the vCPU state, i.e. now that there is no need to use cpuid_entry2_find() directly in order to pass in non-vCPU state. To help prevent unwanted usage of cpuid_entry2_find(), #undef KVM_CPUID_INDEX_NOT_SIGNIFICANT, i.e. force KVM to use kvm_find_cpuid_entry(). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-37-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near cpuid_entry2_find()	Sean Christopherson
	Move kvm_find_cpuid_entry{,_index}() "up" in cpuid.c so that they are colocated with cpuid_entry2_find(), e.g. to make it easier to see the effective guts of the helpers without having to bounce around cpuid.c. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-36-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find()	Sean Christopherson
	Now that KVM sets vcpu->arch.cpuid_{entries,nent} before processing the incoming CPUID entries during KVM_SET_CPUID{,2}, drop the @entries and @nent params from cpuid_entry2_find() and unconditionally operate on the vCPU state. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-35-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Remove unnecessary caching of KVM's PV CPUID base	Sean Christopherson
	Now that KVM only searches for KVM's PV CPUID base when userspace sets guest CPUID, drop the cache and simply do the search every time. Practically speaking, this is a nop except for situations where userspace sets CPUID _after_ running the vCPU, which is anything but a hot path, e.g. QEMU does so only when hotplugging a vCPU. And on the flip side, caching guest CPUID information, especially information that is used to query/modify _other_ CPUID state, is inherently dangerous as it's all too easy to use stale information, i.e. KVM should only cache CPUID state when the performance and/or programming benefits justify it. Link: https://lore.kernel.org/r/20241128013424.4096668-34-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets CPUID	Sean Christopherson
	Now that KVM disallows disabling HLT-exiting after vCPUs have been created, i.e. now that it's impossible for kvm_hlt_in_guest() to change while vCPUs are running, apply KVM's PV_UNHALT quirk only when userspace is setting guest CPUID. Opportunistically rename the helper to make it clear that KVM's behavior is a quirk that should never have been added. KVM's documentation explicitly states that userspace should not advertise PV_UNHALT if HLT-exiting is disabled, but for unknown reasons, commit caa057a2cad6 ("KVM: X86: Provide a capability to disable HLT intercepts") didn't stop at documenting the requirement and also massaged the incoming guest CPUID. Unfortunately, it's quite likely that userspace has come to rely on KVM's behavior, i.e. the code can't simply be deleted. The only reason KVM doesn't have an "official" quirk is that there is no known use case where disabling the quirk would make sense, i.e. letting userspace disable the quirk would further increase KVM's burden without any benefit. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-33-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2	Sean Christopherson
	When handling KVM_SET_CPUID{,2}, swap the old and new CPUID arrays and lengths before processing the new CPUID, and simply undo the swap if setting the new CPUID fails for whatever reason. To keep the diff reasonable, continue passing the entry array and length to most helpers, and defer the more complete cleanup to future commits. For any sane VMM, setting "bad" CPUID state is not a hot path (or even something that is survivable), and setting guest CPUID before it's known good will allow removing all of KVM's infrastructure for processing CPUID entries directly (as opposed to operating on vcpu->arch.cpuid_entries). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-32-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
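The swap-then-undo flow described above can be sketched in isolation. This is an illustration under stated assumptions, not KVM's code: the struct layouts, `cpuid_is_valid()` (a stand-in validator that merely requires a leaf 0 entry), and `set_cpuid()` are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct cpuid_entry {
    uint32_t function, index, eax, ebx, ecx, edx;
};

struct vcpu_cpuid {
    struct cpuid_entry *entries;
    size_t nent;
};

/* Hypothetical validator: accept the array only if it contains leaf 0. */
int cpuid_is_valid(const struct vcpu_cpuid *c)
{
    for (size_t i = 0; i < c->nent; i++)
        if (c->entries[i].function == 0)
            return 1;
    return 0;
}

/* Exchange the array pointers and lengths of two CPUID containers. */
void swap_cpuid(struct vcpu_cpuid *a, struct vcpu_cpuid *b)
{
    struct vcpu_cpuid tmp = *a;

    *a = *b;
    *b = tmp;
}

/*
 * Install the incoming CPUID first, then process it in place. On failure,
 * undo the swap so the vCPU keeps its previous state. On success, the
 * caller finds the old array in *incoming and can free it.
 */
int set_cpuid(struct vcpu_cpuid *vcpu, struct vcpu_cpuid *incoming)
{
    swap_cpuid(vcpu, incoming);
    if (!cpuid_is_valid(vcpu)) {
        swap_cpuid(vcpu, incoming);
        return -1;
    }
    return 0;
}
```

The benefit of this ordering is that every helper called during validation can read the vCPU's own state rather than taking the array and length as parameters.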
2024-12-18	KVM: x86: Add a macro to init CPUID features that KVM emulates in software	Sean Christopherson
	Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to OR-in features that KVM emulates in software, i.e. that don't depend on the feature being available in hardware. The contained scope of kvm_cpu_cap_init() allows using a local variable to track the set of emulated leaves, which in addition to avoiding confusing and/or unnecessary variables, helps prevent misuse of EMUL_F(). Link: https://lore.kernel.org/r/20241128013424.4096668-31-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Add a macro to init CPUID features that ignore host kernel support	Sean Christopherson
	Add a macro for use in kvm_set_cpu_caps() to automagically initialize features that KVM wants to support based solely on the CPU's capabilities, e.g. KVM advertises LA57 support if it's available in hardware, even if the host kernel isn't utilizing 57-bit virtual addresses. Track the features that are passed through to userspace (from hardware) in a local variable, and simply OR them in after adjusting the capabilities that came from boot_cpu_data. Note, eliminating the open-coded call to cpuid_ecx() also fixes a largely benign bug where KVM could incorrectly report LA57 support on Intel CPUs whose max supported CPUID is less than 7, i.e. if the max supported leaf (<7) happened to have bit 16 set. In practice, barring a funky virtual machine setup, the bug is benign as all known CPUs that support VMX also support leaf 7. Link: https://lore.kernel.org/r/20241128013424.4096668-30-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Harden CPU capabilities processing against out-of-scope features	Sean Christopherson
	Add compile-time assertions to verify that usage of F() and friends in kvm_set_cpu_caps() is scoped to the correct CPUID word, e.g. to detect bugs where KVM passes a feature bit from word X into word Y. Add a one-off assertion in the aliased feature macro to ensure that only word 0x8000_0001.EDX aliases the features defined for 0x1.EDX. To do so, convert kvm_cpu_cap_init() to a macro and have it define a local variable to track which CPUID word is being initialized that is then used to validate usage of F() (all of the inputs are compile-time constants and thus can be fed into BUILD_BUG_ON()). Redefine KVM_VALIDATE_CPU_CAP_USAGE after kvm_set_cpu_caps() to be a nop so that F() can be used in other flows that aren't as easily hardened, e.g. __do_cpuid_func_emulated() and __do_cpuid_func(). Invoke KVM_VALIDATE_CPU_CAP_USAGE() in SF() and X86_64_F() to ensure the validation occurs, e.g. if the usage of F() is completely compiled out (which shouldn't happen for boot_cpu_has(), but could happen in the future, e.g. if KVM were to use cpu_feature_enabled()). Link: https://lore.kernel.org/r/20241128013424.4096668-29-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
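The idea of rejecting an out-of-scope feature at build time can be shown with a small standalone sketch. Everything here is an assumption made for illustration: the `word * 32 + bit` encoding, the `WORD_*` and `FEAT_*` names, and this simplified two-argument `F()` are not KVM's real macros, and the negative-array-size trick merely stands in for the kernel's BUILD_BUG_ON().

```c
#include <stdint.h>

/* Hypothetical encoding: a feature is word * 32 + bit. */
#define FEATURE(word, bit)  ((word) * 32 + (bit))
#define FEATURE_WORD(f)     ((f) / 32)
#define FEATURE_BIT(f)      ((f) % 32)

/* Hypothetical word indices and features for this sketch. */
#define WORD_1_EDX  0
#define WORD_7_ECX  1
#define FEAT_FPU    FEATURE(WORD_1_EDX, 0)
#define FEAT_PKU    FEATURE(WORD_7_ECX, 3)
#define FEAT_LA57   FEATURE(WORD_7_ECX, 16)

/* Evaluates to 0, or breaks the build via a negative array size. */
#define BUILD_ASSERT_ZERO(cond) \
    (0u * (uint32_t)sizeof(char[(cond) ? 1 : -1]))

/* Yield the feature's bit mask, but only if it belongs to 'word'. */
#define F(word, f) \
    (BUILD_ASSERT_ZERO(FEATURE_WORD(f) == (word)) | \
     (UINT32_C(1) << FEATURE_BIT(f)))

uint32_t word_7_ecx_mask(void)
{
    /* F(WORD_7_ECX, FEAT_FPU) would fail to compile: FPU is in WORD_1_EDX. */
    return F(WORD_7_ECX, FEAT_PKU) | F(WORD_7_ECX, FEAT_LA57);
}
```

Because every input is a compile-time constant, the check costs nothing at run time; a mismatched word is simply a build error.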
2024-12-18	KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions	Sean Christopherson
	Undefine SPEC_CTRL_SSBD, which is #defined by msr-index.h to represent the enable flag in MSR_IA32_SPEC_CTRL, to avoid issues with the macro being unpacked into its raw value when passed to KVM's F() macro. This will allow using multiple layers of macros in F() and friends, e.g. to harden against incorrect usage of F(). No functional change intended (cpuid.c doesn't consume SPEC_CTRL_SSBD). Link: https://lore.kernel.org/r/20241128013424.4096668-28-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper	Sean Christopherson
	Merge kvm_cpu_cap_init() and kvm_cpu_cap_init_kvm_defined() into a single helper. The only advantage of separating the two was to make it somewhat obvious that KVM directly initializes the KVM-defined words, whereas using a common helper will allow for hardening both kernel- and KVM-defined CPUID words without needing copy+paste. No functional change intended. Link: https://lore.kernel.org/r/20241128013424.4096668-27-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features	Sean Christopherson
	Add a macro to precisely handle CPUID features that AMD duplicated from CPUID.0x1.EDX into CPUID.0x8000_0001.EDX. This will allow adding an assert that all features passed to kvm_cpu_cap_init() match the word being processed, e.g. to prevent passing a feature from CPUID 0x7 to CPUID 0x1. Because the kernel simply reuses the X86_FEATURE_* definitions from CPUID.0x1.EDX, KVM's use of the aliased features would result in false positives from such an assert. No functional change intended. Link: https://lore.kernel.org/r/20241128013424.4096668-26-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Add a macro to init CPUID features that are 64-bit only	Sean Christopherson
	Add a macro to mask-in feature flags that are supported only on 64-bit kernels/KVM. In addition to reducing overall #ifdeffery, using a macro will allow hardening the kvm_cpu_cap initialization sequences to assert that the features being advertised are indeed included in the word being initialized. And arguably using *F() macros throughout is more readable. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-25-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()	Sean Christopherson
	Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() in anticipation of merging it with kvm_cpu_cap_init_kvm_defined(), and in anticipation of _setting_ bits in the helper (a future commit will play macro games to set emulated feature flags via kvm_cpu_cap_init()). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Unpack F() CPUID feature flag macros to one flag per line of code	Sean Christopherson
	Refactor kvm_set_cpu_caps() to express each supported (or not) feature flag on a separate line, modulo a handful of cases where KVM does not, and likely will not, support a sequence of flags. This will allow adding fancier macros with longer, more descriptive names without resulting in absurd line lengths and/or weird code. Isolating each flag also makes it far easier to review changes, reduces code conflicts, and generally makes it easier to resolve conflicts. Lastly, it allows co-locating comments for notable flags, e.g. MONITOR, precisely with the relevant flag. No functional change intended. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID	Sean Christopherson
	Explicitly zero out the feature word in kvm_cpu_caps if the word's associated CPUID function is greater than the max leaf supported by the CPU. For such unsupported functions, Intel CPUs return the output from the last supported leaf, not all zeros. Practically speaking, this is likely a benign bug, as KVM uses the raw host CPUID to mask the kernel's computed capabilities, and the kernel does perform max leaf checks when populating boot_cpu_data. The only way KVM's goof could be problematic is if the kernel force-set a feature in a leaf that is completely unsupported, _and_ the max supported leaf happened to return a value with '1' in the same bit position. Which is theoretically possible, but extremely unlikely. And even if that did happen, it's entirely possible that KVM would still provide the correct functionality; the kernel did set the capability after all. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-22-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
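A minimal sketch of the max-leaf check described above, assuming the caller has already read the raw register value and the maximum basic and extended leaves (reported in EAX of leaves 0 and 0x80000000). The function names are hypothetical, and other ranges, such as Centaur's 0xC0000000, are deliberately ignored here.

```c
#include <stdint.h>

/*
 * Return the raw CPUID feature word, or 0 when 'function' lies beyond the
 * maximum leaf of its range. Without the check, an out-of-range query on
 * some CPUs echoes the data of the last supported leaf.
 */
uint32_t raw_feature_word(uint32_t function, uint32_t raw,
                          uint32_t max_basic, uint32_t max_extended)
{
    uint32_t max = (function & 0x80000000u) ? max_extended : max_basic;

    return function > max ? 0 : raw;
}

/* Mask the kernel's computed capabilities with the checked raw word. */
uint32_t masked_caps(uint32_t kernel_caps, uint32_t function, uint32_t raw,
                     uint32_t max_basic, uint32_t max_extended)
{
    return kernel_caps &
           raw_feature_word(function, raw, max_basic, max_extended);
}
```

For example, with a maximum basic leaf of 6, a query for leaf 7 contributes no feature bits even if the echoed register happens to have bit 16 set.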
2024-12-18	KVM: x86: Do reverse CPUID sanity checks in __feature_leaf()	Sean Christopherson
	Do the compile-time sanity checks on reverse_cpuid in __feature_leaf() so that higher level APIs don't need to "manually" perform the sanity checks. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Don't update PV features caches when enabling enforcement capability	Sean Christopherson
	Revert the chunk of commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features is initialized when enabling cap") that forced a PV features cache refresh during KVM_CAP_ENFORCE_PV_FEATURE_CPUID, as whatever ioctl() ordering issue it alleged to have fixed never existed upstream, and likely never existed in any kernel. At the time of the commit, there was a tangentially related ioctl() ordering issue, as toggling KVM_X86_DISABLE_EXITS_HLT after KVM_SET_CPUID2 would have resulted in KVM potentially leaving KVM_FEATURE_PV_UNHALT set. But (a) that bug affected the entire guest CPUID, not just the cache, (b) commit 01b4f510b9f4 didn't address that bug, it only refreshed the cache (with the bad CPUID), and (c) setting KVM_X86_DISABLE_EXITS_HLT after vCPU creation is completely broken as KVM configures HLT-exiting only during vCPU creation, which is why KVM_CAP_X86_DISABLE_EXITS is now disallowed if vCPUs have been created. Another tangentially related bug was KVM's failure to clear the cache when handling KVM_SET_CPUID2, but again commit 01b4f510b9f4 did nothing to fix that bug. The most plausible explanation for what commit 01b4f510b9f4 was trying to fix is a bug that existed in Google's internal kernel that was the source of commit 01b4f510b9f4. At the time, Google's internal kernel had not yet picked up commit 0d3b2ba16ba68 ("KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent"), i.e. KVM would not initialize the PV features cache if KVM_SET_CPUID2 was called without a CPUID.0x1 entry. Of course, no sane real world VMM would omit CPUID.0x1, including the KVM selftest added by commit ac4a4d6de22e ("selftests: kvm: test enforcement of paravirtual cpuid features"). And the test didn't actually try to verify multiple orderings, nor did the selftest enter the guest without doing KVM_SET_CPUID2, so who knows what motivated the change. 
Regardless of why commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features is initialized when enabling cap") was added, refreshing the cache during KVM_CAP_ENFORCE_PV_FEATURE_CPUID isn't necessary. Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Zero out PV features cache when the CPUID leaf is not present	Sean Christopherson
	Clear KVM's PV feature cache prior to processing a new guest CPUID so that KVM doesn't keep a stale cache entry if userspace does KVM_SET_CPUID2 multiple times, once with a PV features entry, and a second time without. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Update x86's KVM PV test to match KVM's disabling exits behavior	Sean Christopherson
	Rework x86's KVM PV features test to align with KVM's new, fixed behavior of not allowing userspace to disable HLT-exiting after vCPUs have been created. Rework the core testcase to disable HLT-exiting before creating a vCPU, and opportunistically keep the paired VM+vCPU creation to verify that KVM rejects KVM_CAP_X86_DISABLE_EXITS as expected. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test	Sean Christopherson
	Actually check for KVM support for disabling HLT-exiting instead of effectively checking that KVM_CAP_X86_DISABLE_EXITS is #defined to a non-zero value, and convert the TEST_REQUIRE() to a simple return so that only the sub-test is skipped if HLT-exiting is mandatory. The goof has likely gone unnoticed because all x86 CPUs support disabling HLT-exiting; only systems with the opt-in mitigate_smt_rsb KVM module param disallow disabling HLT-exiting. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Drop the now unused KVM_X86_DISABLE_VALID_EXITS	Sean Christopherson
	Drop the KVM_X86_DISABLE_VALID_EXITS definition, as it is misleading, and unused in KVM because it is misleading. The set of exits that can be disabled is dynamic, i.e. userspace (and KVM) must check KVM's actual capabilities. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed	Sean Christopherson
	Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT or HLT exits and KVM previously reported (via KVM_CHECK_EXTENSION) that disabling the exit(s) is not allowed. E.g. because MWAIT isn't supported or the CPU doesn't have an always-running APIC timer, or because KVM is configured to mitigate cross-thread vulnerabilities. Cc: Kechen Lu <kechenl@nvidia.com> Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts") Fixes: 6f0f2d5ef895 ("KVM: x86: Mitigate the cross-thread return address predictions bug") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation	Sean Christopherson
	Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless, e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after vCPU creation. vCPUs may also end up with an inconsistent configuration if exits are disabled between creation of multiple vCPUs. Cc: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/all/9227068821b275ac547eb2ede09ec65d2281fe07.1680179693.git.houwenlong.hwl@antgroup.com Link: https://lore.kernel.org/all/20230121020738.2973-2-kechenl@nvidia.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation	Sean Christopherson
	Drop the manual initialization of maxphyaddr and reserved_gpa_bits during vCPU creation now that kvm_arch_vcpu_create() unconditionally invokes kvm_vcpu_after_set_cpuid(), which handles all such CPUID caching. None of the helpers between the existing code in kvm_arch_vcpu_create() and the call to kvm_vcpu_after_set_cpuid() consume maxphyaddr or reserved_gpa_bits (though auditing vmx_vcpu_create() and svm_vcpu_create() isn't exactly easy). Link: https://lore.kernel.org/r/20241128013424.4096668-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86/pmu: Drop now-redundant refresh() during init()	Sean Christopherson
	Drop the manual kvm_pmu_refresh() from kvm_pmu_init() now that kvm_arch_vcpu_create() performs the refresh via kvm_vcpu_after_set_cpuid(). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h	Sean Christopherson
	Let vendor code inline __kvm_is_valid_cr4() now that x86.c's cr4_reserved_bits no longer exists, as keeping cr4_reserved_bits local to x86.c was the only reason for "hiding" the definition of __kvm_is_valid_cr4(). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes	Sean Christopherson
	Extend x86's set sregs test to verify that KVM sets/clears OSXSAVE and OSPKE according to CR4.OSXSAVE and CR4.PKE respectively. For performance reasons, KVM is responsible for emulating the architectural behavior of the OS CPUID bits tracking CR4. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
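The architectural behavior being verified is that two CPUID bits mirror CR4: CPUID.0x1:ECX bit 27 (OSXSAVE) follows CR4 bit 18, and CPUID.0x7.0:ECX bit 4 (OSPKE) follows CR4 bit 22 (PKE). A simplified sketch of that mirroring, with hypothetical helper names and without the feature-support checks a real implementation would also apply:

```c
#include <stdint.h>

#define CR4_OSXSAVE          (UINT64_C(1) << 18)
#define CR4_PKE              (UINT64_C(1) << 22)
#define CPUID_1_ECX_OSXSAVE  (UINT32_C(1) << 27)
#define CPUID_7_ECX_OSPKE    (UINT32_C(1) << 4)

/* Set or clear 'bit' in 'word' depending on 'on'. */
uint32_t change_bit(uint32_t word, uint32_t bit, int on)
{
    return on ? (word | bit) : (word & ~bit);
}

/* CPUID.0x1:ECX as the guest should see it for a given CR4 value. */
uint32_t runtime_cpuid_1_ecx(uint32_t ecx, uint64_t cr4)
{
    return change_bit(ecx, CPUID_1_ECX_OSXSAVE, (cr4 & CR4_OSXSAVE) != 0);
}

/* CPUID.0x7.0:ECX as the guest should see it for a given CR4 value. */
uint32_t runtime_cpuid_7_ecx(uint32_t ecx, uint64_t cr4)
{
    return change_bit(ecx, CPUID_7_ECX_OSPKE, (cr4 & CR4_PKE) != 0);
}
```

A test in this style would write CR4 with and without each bit and compare the CPUID output against these expected values.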
2024-12-18	KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry()	Sean Christopherson
	Refresh selftests' CPUID cache in the vCPU structure when querying a CPUID entry so that tests don't consume stale data when KVM modifies CPUID as a side effect to a completely unrelated change. E.g. KVM adjusts OSXSAVE in response to CR4.OSXSAVE changes. Unnecessarily invoking KVM_GET_CPUID is suboptimal, but vcpu->cpuid exists to simplify selftests development, not for performance reasons. And, unfortunately, trying to handle the side effects in tests or other flows is unpleasant, e.g. selftests could manually refresh if KVM_SET_SREGS is successful, but that would still leave a gap with respect to guest CR4 changes. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Assert that vcpu->cpuid is non-NULL when getting CPUID entries	Sean Christopherson
	Add a sanity check in __vcpu_get_cpuid_entry() to provide a friendlier error than a segfault when a test developer tries to use a vCPU CPUID helper on a barebones vCPU. Link: https://lore.kernel.org/r/20241128013424.4096668-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID enforcement	Sean Christopherson
	Rework x86's set sregs test to verify that KVM enforces CPUID vs. CR4 features even if userspace hasn't explicitly set guest CPUID. KVM used to allow userspace to set any KVM-supported CR4 value prior to KVM_SET_CPUID2, and the test verified that behavior. However, the testcase was written purely to verify KVM's existing behavior, i.e. was NOT written to match the needs of real world VMMs. Opportunistically verify that KVM continues to reject unsupported features after KVM_SET_CPUID2 (using KVM_GET_SUPPORTED_CPUID). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX	Sean Christopherson
	Drop x86.c's local pre-computed cr4_reserved bits and instead fold KVM's reserved bits into the guest's reserved bits. This fixes a bug where VMX's set_cr4_guest_host_mask() fails to account for KVM-reserved bits when deciding which bits can be passed through to the guest. In most cases, letting the guest directly write reserved CR4 bits is ok, i.e. attempting to set the bit(s) will still #GP, but not if a feature is available in hardware but explicitly disabled by the host, e.g. if FSGSBASE support is disabled via "nofsgsbase". Note, the extra overhead of computing host reserved bits every time userspace sets guest CPUID is negligible. The feature bits that are queried are packed nicely into a handful of words, and so checking and setting each reserved bit costs in the neighborhood of ~5 cycles, i.e. the total cost will be in the noise even if the number of checked CR4 bits doubles over the next few years. In other words, x86 will run out of CR4 bits long before the overhead becomes problematic. Note #2, __cr4_reserved_bits() starts from CR4_RESERVED_BITS, which is why the existing __kvm_cpu_cap_has() processing doesn't explicitly OR in CR4_RESERVED_BITS (and why the new code doesn't do so either). Fixes: 2ed41aa631fc ("KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
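	A minimal sketch of the fix's core idea, assuming a simplified model in which reserved bits are plain masks. toy_guest_rsvd_bits() and toy_passthrough_bits() are hypothetical helpers, not KVM functions; the CR4 bit positions are architectural.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_FSGSBASE (UINT64_C(1) << 16)	/* architectural bit position */
#define CR4_PKE      (UINT64_C(1) << 22)

/* Bits the guest may not set: anything KVM reserves OR guest CPUID lacks. */
static uint64_t toy_guest_rsvd_bits(uint64_t kvm_rsvd, uint64_t cpuid_rsvd)
{
	return kvm_rsvd | cpuid_rsvd;
}

/* Of the bits KVM could pass through, keep only those that are not reserved. */
static uint64_t toy_passthrough_bits(uint64_t candidates, uint64_t guest_rsvd)
{
	uint64_t mask = candidates & ~guest_rsvd;

	assert((mask & guest_rsvd) == 0);
	return mask;
}
```

	With a host booted with "nofsgsbase" (kvm_rsvd contains CR4_FSGSBASE) and a guest CPUID that still advertises FSGSBASE, a mask built from cpuid_rsvd alone would pass CR4.FSGSBASE through to the guest. Building it from the combined mask keeps the bit intercepted.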
2024-12-18	KVM: x86: Explicitly do runtime CPUID updates "after" initial setup	Sean Christopherson
	Explicitly perform runtime CPUID adjustments as part of the "after set CPUID" flow to guard against bugs where KVM consumes stale vCPU/CPUID state during kvm_update_cpuid_runtime(). E.g. see commit 4736d85f0d18 ("KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT"). Whacking each mole individually is not sustainable or robust, e.g. while the aforementioned commit fixed KVM's PV features, the same issue lurks for Xen and Hyper-V features; Xen and Hyper-V simply don't have any runtime features (though spoiler alert, neither should KVM). Updating runtime features in the "full" path will also simplify adding a snapshot of the guest's capabilities, i.e. of caching the intersection of guest CPUID and kvm_cpu_caps (modulo a few edge cases). Link: https://lore.kernel.org/r/20241128013424.4096668-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Do all post-set CPUID processing during vCPU creation	Sean Christopherson
	During vCPU creation, process KVM's default, empty CPUID as if userspace set an empty CPUID to ensure consistent and correct behavior with respect to guest CPUID. E.g. if userspace never sets guest CPUID, KVM will never configure cr4_guest_rsvd_bits, and thus create divergent, incorrect, guest-visible behavior due to letting the guest set any KVM-supported CR4 bits despite the features not being allowed per guest CPUID. Note! This changes KVM's ABI, as lack of full CPUID processing allowed userspace to stuff garbage vCPU state, e.g. userspace could set CR4 to a guest-unsupported value via KVM_SET_SREGS. But it's extremely unlikely that this is a breaking change, as KVM already has many flows that require userspace to set guest CPUID before loading vCPU state. E.g. multiple MSR flows consult guest CPUID on host writes, and KVM_SET_SREGS itself already relies on guest CPUID being up-to-date, as KVM's validity check on CR3 consumes CPUID.0x7.1 (for LAM) and CPUID.0x80000008 (for MAXPHYADDR). Furthermore, the plan is to commit to enforcing guest CPUID for userspace writes to MSRs, at which point bypassing sregs CPUID checks is even more nonsensical. Link: https://lore.kernel.org/r/20241128013424.4096668-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
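	The behavioral change can be sketched with a toy vCPU that tracks a single feature. All names here (struct toy_vcpu, toy_after_set_cpuid(), toy_vcpu_create(), toy_set_cpuid(), toy_is_valid_cr4()) are hypothetical and greatly simplified relative to KVM; only the CR4.PKE bit position is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_PKE (UINT64_C(1) << 22)	/* architectural bit position */

struct toy_vcpu {
	bool     guest_has_pku;		/* derived from guest CPUID */
	uint64_t cr4_guest_rsvd_bits;	/* bits the guest may not set */
};

/* Recompute derived state; stands in for the "after set CPUID" flow. */
static void toy_after_set_cpuid(struct toy_vcpu *v)
{
	assert(v != NULL);
	v->cr4_guest_rsvd_bits = v->guest_has_pku ? 0 : CR4_PKE;
}

/* Creation processes the default, empty CPUID like any other CPUID. */
static void toy_vcpu_create(struct toy_vcpu *v)
{
	assert(v != NULL);
	v->guest_has_pku = false;
	toy_after_set_cpuid(v);
}

/* Userspace later supplies a CPUID model; derived state is recomputed. */
static void toy_set_cpuid(struct toy_vcpu *v, bool has_pku)
{
	assert(v != NULL);
	v->guest_has_pku = has_pku;
	toy_after_set_cpuid(v);
}

/* A CR4 value is valid only if it sets no guest-reserved bit. */
static bool toy_is_valid_cr4(const struct toy_vcpu *v, uint64_t cr4)
{
	assert(v != NULL);
	return !(cr4 & v->cr4_guest_rsvd_bits);
}
```

	In this model, a freshly created vCPU rejects CR4.PKE until userspace sets a CPUID that includes the feature, which mirrors the ABI change described in the commit message.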
2024-12-18	KVM: x86: Limit use of F() and SF() to kvm_cpu_cap_{mask,init_kvm_defined}()	Sean Christopherson
	Define and undefine the F() and SF() macros precisely around kvm_set_cpu_caps() to make it all but impossible to use the macros outside of kvm_cpu_cap_{mask,init_kvm_defined}(). Currently, F() is a simple passthrough, but SF() is actively dangerous as it checks that the scattered feature is supported by the host kernel. And usage outside of the aforementioned helpers will run afoul of future changes to harden KVM's CPUID management. Opportunistically switch to feature_bit() when stuffing LA57 based on raw hardware support. No functional change intended. Link: https://lore.kernel.org/r/20241128013424.4096668-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Use feature_bit() to clear CONSTANT_TSC when emulating CPUID	Sean Christopherson
	When clearing CONSTANT_TSC during CPUID emulation due to a Hyper-V quirk, use feature_bit() instead of SF() to ensure the bit is actually cleared. SF() evaluates to zero if the _host_ doesn't support the feature. I.e. KVM could keep the bit set if userspace advertised CONSTANT_TSC despite it not being supported in hardware. Note, translating from a scattered feature to the hardware version is done by __feature_translate(), not SF(). The sole purpose of SF() is to check kernel support for the scattered feature, before translation. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
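	Why a host-conditional mask cannot be used to clear a bit can be shown with two toy helpers. toy_sf() and toy_feature_bit() are hypothetical models of the behavior described above, not the kernel macros; bit 8 of CPUID.0x80000007:EDX is the architectural invariant-TSC bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CONSTANT_TSC_BIT 8	/* CPUID.0x80000007:EDX bit 8 */

/* Models SF() as described: zero unless the host supports the feature. */
static uint32_t toy_sf(bool host_supports, unsigned int bit)
{
	assert(bit < 32);
	return host_supports ? (UINT32_C(1) << bit) : 0;
}

/* Models feature_bit(): always the bit's mask, regardless of host support. */
static uint32_t toy_feature_bit(unsigned int bit)
{
	assert(bit < 32);
	return UINT32_C(1) << bit;
}
```

	If userspace advertised the bit on a host that lacks the feature, "edx &= ~toy_sf(false, TOY_CONSTANT_TSC_BIT)" computes "edx &= ~0" and leaves the bit set, whereas "edx &= ~toy_feature_bit(TOY_CONSTANT_TSC_BIT)" always clears it.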
2024-12-18	KVM: selftests: Override ARCH for x86_64 instead of using ARCH_DIR	Sean Christopherson
	Now that KVM selftests uses the kernel's canonical arch paths, directly override ARCH to 'x86' when targeting x86_64 instead of defining ARCH_DIR to redirect to appropriate paths. ARCH_DIR was originally added to deal with KVM selftests using the target triple ARCH for directories, e.g. s390x and aarch64; keeping it around just to deal with the one-off alias from x86_64=>x86 is unnecessary and confusing. Note, even when selftests are built from the top-level Makefile, ARCH is scoped to KVM's makefiles, i.e. overriding ARCH won't trip up some other selftest that (somehow) expects x86_64 and can't work with x86. Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20241128005547.4077116-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Use canonical $(ARCH) paths for KVM selftests directories	Sean Christopherson
	Use the kernel's canonical $(ARCH) paths instead of the raw target triple for KVM selftests directories. KVM selftests are quite nearly the only place in the entire kernel that uses the target triple for directories, tools/testing/selftests/drivers/s390x being the lone holdout. Using the kernel's preferred nomenclature eliminates the minor, but annoying, friction of having to translate to KVM's selftests directories, e.g. for pattern matching, opening files, running selftests, etc. Opportunistically delete file comments that reference the full path of the file, as they are obviously prone to becoming stale, and serve no known purpose. Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Provide empty 'all' and 'clean' targets for unsupported ARCHs	Sean Christopherson
	Provide empty targets for KVM selftests if the target architecture is unsupported to make it obvious which architectures are supported, and so that various side effects don't fail and/or do weird things, e.g. as is, "mkdir -p $(sort $(dir $(TEST_GEN_PROGS)))" fails due to a missing operand, and conversely, "$(shell mkdir -p $(sort $(OUTPUT)/$(ARCH_DIR) ..." will create an empty, useless directory for the unsupported architecture. Move the guts of the Makefile to Makefile.kvm so that it's easier to see that the if-statement effectively guards all of KVM selftests. Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)	Sean Christopherson
	Add two phases to mmu_stress_test to verify that KVM correctly handles guest memory that was writable, and then made read-only in the primary MMU, and then made writable again. Add bonus coverage for x86 and arm64 to verify that all of guest memory was marked read-only. Making forward progress (without making memory writable) requires arch specific code to skip over the faulting instruction, but the test can at least verify each vCPU's starting page was made read-only for other architectures. Link: https://lore.kernel.org/r/20241128005547.4077116-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
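	The host-side sequence of protection changes (writable, read-only, writable again) can be sketched as a standalone userspace function. This is not the selftest: there is no guest or vCPU here, toy_mprotect_cycle() is a hypothetical name, and the sketch only exercises the primary-MMU transitions using the standard mmap(), mprotect() and munmap() calls.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one writable page, make it read-only, then writable again.
 * Returns 0 on success, -1 if any step fails.
 */
static int toy_mprotect_cycle(void)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile unsigned char *mem;
	void *p;
	int ret = -1;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* mprotect() needs a page-aligned address; mmap() provides one. */
	assert((uintptr_t)p % (uintptr_t)page == 0);
	mem = p;

	mem[0] = 0xaa;					/* writable */

	if (mprotect(p, (size_t)page, PROT_READ))	/* read-only */
		goto out;
	if (mem[0] != 0xaa)				/* reads still work */
		goto out;

	if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE))
		goto out;				/* writable again */
	mem[0] = 0x55;
	ret = (mem[0] == 0x55) ? 0 : -1;
out:
	munmap(p, (size_t)page);
	return ret;
}
```

	In the real test the interesting part is what happens when a vCPU writes during the read-only window, which requires the arch-specific handling the commit message mentions and is deliberately left out of this sketch.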
2024-12-18	KVM: selftests: Add a read-only mprotect() phase to mmu_stress_test	Sean Christopherson
	Add a third phase of mmu_stress_test to verify that mprotect()ing guest memory to make it read-only doesn't cause explosions, e.g. to verify KVM correctly handles the resulting mmu_notifier invalidations. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Precisely limit the number of guest loops in mmu_stress_test	Sean Christopherson
	Run the exact number of guest loops required in mmu_stress_test instead of looping indefinitely in anticipation of adding more stages that run different code (e.g. reads instead of writes). Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>