git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-12-18	KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper	Sean Christopherson
	Merge kvm_cpu_cap_init() and kvm_cpu_cap_init_kvm_defined() into a single helper. The only advantage of separating the two was to make it somewhat obvious that KVM directly initializes the KVM-defined words, whereas using a common helper will allow for hardening both kernel- and KVM-defined CPUID words without needing copy+paste. No functional change intended. Link: https://lore.kernel.org/r/20241128013424.4096668-27-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features	Sean Christopherson
	Add a macro to precisely handle CPUID features that AMD duplicated from CPUID.0x1.EDX into CPUID.0x8000_0001.EDX. This will allow adding an assert that all features passed to kvm_cpu_cap_init() match the word being processed, e.g. to prevent passing a feature from CPUID 0x7 to CPUID 0x1. Because the kernel simply reuses the X86_FEATURE_* definitions from CPUID.0x1.EDX, KVM's use of the aliased features would result in false positives from such an assert. No functional change intended. Link: https://lore.kernel.org/r/20241128013424.4096668-26-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Add a macro to init CPUID features that are 64-bit only	Sean Christopherson
	Add a macro to mask-in feature flags that are supported only on 64-bit kernels/KVM. In addition to reducing overall #ifdeffery, using a macro will allow hardening the kvm_cpu_cap initialization sequences to assert that the features being advertised are indeed included in the word being initialized. And arguably using *F() macros through is more readable. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-25-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()	Sean Christopherson
	Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() in anticipation of merging it with kvm_cpu_cap_init_kvm_defined(), and in anticipation of _setting_ bits in the helper (a future commit will play macro games to set emulated feature flags via kvm_cpu_cap_init()). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Unpack F() CPUID feature flag macros to one flag per line of code	Sean Christopherson
	Refactor kvm_set_cpu_caps() to express each supported (or not) feature flag on a separate line, modulo a handful of cases where KVM does not, and likely will not, support a sequence of flags. This will allow adding fancier macros with longer, more descriptive names without resulting in absurd line lengths and/or weird code. Isolating each flag also makes it far easier to review changes, reduces code conflicts, and generally makes it easier to resolve conflicts. Lastly, it allows co-locating comments for notable flags, e.g. MONITOR, precisely with the relevant flag. No functional change intended. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID	Sean Christopherson
	Explicitly zero out the feature word in kvm_cpu_caps if the word's associated CPUID function is greater than the max leaf supported by the CPU. For such unsupported functions, Intel CPUs return the output from the last supported leaf, not all zeros. Practically speaking, this is likely a benign bug, as KVM uses the raw host CPUID to mask the kernel's computed capabilities, and the kernel does perform max leaf checks when populating boot_cpu_data. The only way KVM's goof could be problematic is if the kernel force-set a feature in a leaf that is completely unsupported, _and_ the max supported leaf happened to return a value with '1' the same bit position. Which is theoretically possible, but extremely unlikely. And even if that did happen, it's entirely possible that KVM would still provide the correct functionality; the kernel did set the capability after all. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-22-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Do reverse CPUID sanity checks in __feature_leaf()	Sean Christopherson
	Do the compile-time sanity checks on reverse_cpuid in __feature_leaf() so that higher level APIs don't need to "manually" perform the sanity checks. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Don't update PV features caches when enabling enforcement capability	Sean Christopherson
	Revert the chunk of commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features is initialized when enabling cap") that forced a PV features cache refresh during KVM_CAP_ENFORCE_PV_FEATURE_CPUID, as whatever ioctl() ordering issue it alleged to have fixed never existed upstream, and likely never existed in any kernel. At the time of the commit, there was a tangentially related ioctl() ordering issue, as toggling KVM_X86_DISABLE_EXITS_HLT after KVM_SET_CPUID2 would have resulted in KVM potentially leaving KVM_FEATURE_PV_UNHALT set. But (a) that bug affected the entire guest CPUID, not just the cache, (b) commit 01b4f510b9f4 didn't address that bug, it only refreshed the cache (with the bad CPUID), and (c) setting KVM_X86_DISABLE_EXITS_HLT after vCPU creation is completely broken as KVM configures HLT-exiting only during vCPU creation, which is why KVM_CAP_X86_DISABLE_EXITS is now disallowed if vCPUs have been created. Another tangentially related bug was KVM's failure to clear the cache when handling KVM_SET_CPUID2, but again commit 01b4f510b9f4 did nothing to fix that bug. The most plausible explanation for the what commit 01b4f510b9f4 was trying to fix is a bug that existed in Google's internal kernel that was the source of commit 01b4f510b9f4. At the time, Google's internal kernel had not yet picked up commit 0d3b2ba16ba68 ("KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent"), i.e. KVM would not initialize the PV features cache if KVM_SET_CPUID2 was called without a CPUID.0x1 entry. Of course, no sane real world VMM would omit CPUID.0x1, including the KVM selftest added by commit ac4a4d6de22e ("selftests: kvm: test enforcement of paravirtual cpuid features"). And the test didn't actually try to verify multiple orderings, nor did the selftest enter the guest without doing KVM_SET_CPUID2, so who knows what motivated the change. Regardless of why commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features is initialized when enabling cap") was added, refreshing the cache during KVM_CAP_ENFORCE_PV_FEATURE_CPUID isn't necessary. Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Zero out PV features cache when the CPUID leaf is not present	Sean Christopherson
	Clear KVM's PV feature cache prior when processing a new guest CPUID so that KVM doesn't keep a stale cache entry if userspace does KVM_SET_CPUID2 multiple times, once with a PV features entry, and a second time without. Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Update x86's KVM PV test to match KVM's disabling exits behavior	Sean Christopherson
	Rework x86's KVM PV features test to align with KVM's new, fixed behavior of not allowing userspace to disable HLT-exiting after vCPUs have been created. Rework the core testcase to disable HLT-exiting before creating a vCPU, and opportunistically modify keep the paired VM+vCPU creation to verify that KVM rejects KVM_CAP_X86_DISABLE_EXITS as expected. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test	Sean Christopherson
	Actually check for KVM support for disabling HLT-exiting instead of effectively checking that KVM_CAP_X86_DISABLE_EXITS is #defined to a non-zero value, and convert the TEST_REQUIRE() to a simple return so that only the sub-test is skipped if HLT-exiting is mandatory. The goof has likely gone unnoticed because all x86 CPUs support disabling HLT-exiting, only systems with the opt-in mitigate_smt_rsb KVM module param disallow HLT-exiting. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Drop the now unused KVM_X86_DISABLE_VALID_EXITS	Sean Christopherson
	Drop the KVM_X86_DISABLE_VALID_EXITS definition, as it is misleading, and unused in KVM because it is misleading. The set of exits that can be disabled is dynamic, i.e. userspace (and KVM) must check KVM's actual capabilities. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed	Sean Christopherson
	Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT or HLT exits and KVM previously reported (via KVM_CHECK_EXTENSION) that disabling the exit(s) is not allowed. E.g. because MWAIT isn't supported or the CPU doesn't have an always-running APIC timer, or because KVM is configured to mitigate cross-thread vulnerabilities. Cc: Kechen Lu <kechenl@nvidia.com> Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts") Fixes: 6f0f2d5ef895 ("KVM: x86: Mitigate the cross-thread return address predictions bug") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation	Sean Christopherson
	Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless, e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after vCPU creation. vCPUs may also end up with an inconsistent configuration if exits are disabled between creation of multiple vCPUs. Cc: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/all/9227068821b275ac547eb2ede09ec65d2281fe07.1680179693.git.houwenlong.hwl@antgroup.com Link: https://lore.kernel.org/all/20230121020738.2973-2-kechenl@nvidia.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation	Sean Christopherson
	Drop the manual initialization of maxphyaddr and reserved_gpa_bits during vCPU creation now that kvm_arch_vcpu_create() unconditionally invokes kvm_vcpu_after_set_cpuid(), which handles all such CPUID caching. None of the helpers between the existing code in kvm_arch_vcpu_create() and the call to kvm_vcpu_after_set_cpuid() consume maxphyaddr or reserved_gpa_bits (though auditing vmx_vcpu_create() and svm_vcpu_create() isn't exactly easy). Link: https://lore.kernel.org/r/20241128013424.4096668-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86/pmu: Drop now-redundant refresh() during init()	Sean Christopherson
	Drop the manual kvm_pmu_refresh() from kvm_pmu_init() now that kvm_arch_vcpu_create() performs the refresh via kvm_vcpu_after_set_cpuid(). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h	Sean Christopherson
	Let vendor code inline __kvm_is_valid_cr4() now x86.c's cr4_reserved_bits no longer exists, as keeping cr4_reserved_bits local to x86.c was the only reason for "hiding" the definition of __kvm_is_valid_cr4(). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes	Sean Christopherson
	Extend x86's set sregs test to verify that KVM sets/clears OSXSAVE and OSKPKE according to CR4.XSAVE and CR4.PKE respectively. For performance reasons, KVM is responsible for emulating the architectural behavior of the OS CPUID bits tracking CR4. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry()	Sean Christopherson
	Refresh selftests' CPUID cache in the vCPU structure when querying a CPUID entry so that tests don't consume stale data when KVM modifies CPUID as a side effect to a completely unrelated change. E.g. KVM adjusts OSXSAVE in response to CR4.OSXSAVE changes. Unnecessarily invoking KVM_GET_CPUID is suboptimal, but vcpu->cpuid exists to simplify selftests development, not for performance reasons. And, unfortunately, trying to handle the side effects in tests or other flows is unpleasant, e.g. selftests could manually refresh if KVM_SET_SREGS is successful, but that would still leave a gap with respect to guest CR4 changes. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Assert that vcpu->cpuid is non-NULL when getting CPUID entries	Sean Christopherson
	Add a sanity check in __vcpu_get_cpuid_entry() to provide a friendlier error than a segfault when a test developer tries to use a vCPU CPUID helper on a barebones vCPU. Link: https://lore.kernel.org/r/20241128013424.4096668-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID enforcement	Sean Christopherson
	Rework x86's set sregs test to verify that KVM enforces CPUID vs. CR4 features even if userspace hasn't explicitly set guest CPUID. KVM used to allow userspace to set any KVM-supported CR4 value prior to KVM_SET_CPUID2, and the test verified that behavior. However, the testcase was written purely to verify KVM's existing behavior, i.e. was NOT written to match the needs of real world VMMs. Opportunistically verify that KVM continues to reject unsupported features after KVM_SET_CPUID2 (using KVM_GET_SUPPORTED_CPUID). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX	Sean Christopherson
	Drop x86.c's local pre-computed cr4_reserved bits and instead fold KVM's reserved bits into the guest's reserved bits. This fixes a bug where VMX's set_cr4_guest_host_mask() fails to account for KVM-reserved bits when deciding which bits can be passed through to the guest. In most cases, letting the guest directly write reserved CR4 bits is ok, i.e. attempting to set the bit(s) will still #GP, but not if a feature is available in hardware but explicitly disabled by the host, e.g. if FSGSBASE support is disabled via "nofsgsbase". Note, the extra overhead of computing host reserved bits every time userspace sets guest CPUID is negligible. The feature bits that are queried are packed nicely into a handful of words, and so checking and setting each reserved bit costs in the neighborhood of ~5 cycles, i.e. the total cost will be in the noise even if the number of checked CR4 bits doubles over the next few years. In other words, x86 will run out of CR4 bits long before the overhead becomes problematic. Note #2, __cr4_reserved_bits() starts from CR4_RESERVED_BITS, which is why the existing __kvm_cpu_cap_has() processing doesn't explicitly OR in CR4_RESERVED_BITS (and why the new code doesn't do so either). Fixes: 2ed41aa631fc ("KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Explicitly do runtime CPUID updates "after" initial setup	Sean Christopherson
	Explicitly perform runtime CPUID adjustments as part of the "after set CPUID" flow to guard against bugs where KVM consumes stale vCPU/CPUID state during kvm_update_cpuid_runtime(). E.g. see commit 4736d85f0d18 ("KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT"). Whacking each mole individually is not sustainable or robust, e.g. while the aforemention commit fixed KVM's PV features, the same issue lurks for Xen and Hyper-V features, Xen and Hyper-V simply don't have any runtime features (though spoiler alert, neither should KVM). Updating runtime features in the "full" path will also simplify adding a snapshot of the guest's capabilities, i.e. of caching the intersection of guest CPUID and kvm_cpu_caps (modulo a few edge cases). Link: https://lore.kernel.org/r/20241128013424.4096668-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Do all post-set CPUID processing during vCPU creation	Sean Christopherson
	During vCPU creation, process KVM's default, empty CPUID as if userspace set an empty CPUID to ensure consistent and correct behavior with respect to guest CPUID. E.g. if userspace never sets guest CPUID, KVM will never configure cr4_guest_rsvd_bits, and thus create divergent, incorrect, guest- visible behavior due to letting the guest set any KVM-supported CR4 bits despite the features not being allowed per guest CPUID. Note! This changes KVM's ABI, as lack of full CPUID processing allowed userspace to stuff garbage vCPU state, e.g. userspace could set CR4 to a guest-unsupported value via KVM_SET_SREGS. But it's extremely unlikely that this is a breaking change, as KVM already has many flows that require userspace to set guest CPUID before loading vCPU state. E.g. multiple MSR flows consult guest CPUID on host writes, and KVM_SET_SREGS itself already relies on guest CPUID being up-to-date, as KVM's validity check on CR3 consumes CPUID.0x7.1 (for LAM) and CPUID.0x80000008 (for MAXPHYADDR). Furthermore, the plan is to commit to enforcing guest CPUID for userspace writes to MSRs, at which point bypassing sregs CPUID checks is even more nonsensical. Link: https://lore.kernel.org/r/20241128013424.4096668-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Limit use of F() and SF() to kvm_cpu_cap_{mask,init_kvm_defined}()	Sean Christopherson
	Define and undefine the F() and SF() macros precisely around kvm_set_cpu_caps() to make it all but impossible to use the macros outside of kvm_cpu_cap_{mask,init_kvm_defined}(). Currently, F() is a simple passthrough, but SF() is actively dangerous as it checks that the scattered feature is supported by the host kernel. And usage outside of the aforementioned helpers will run afoul of future changes to harden KVM's CPUID management. Opportunistically switch to feature_bit() when stuffing LA57 based on raw hardware support. No functional change intended. Link: https://lore.kernel.org/r/20241128013424.4096668-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: x86: Use feature_bit() to clear CONSTANT_TSC when emulating CPUID	Sean Christopherson
	When clearing CONSTANT_TSC during CPUID emulation due to a Hyper-V quirk, use feature_bit() instead of SF() to ensure the bit is actually cleared. SF() evaluates to zero if the _host_ doesn't support the feature. I.e. KVM could keep the bit set if userspace advertised CONSTANT_TSC despite it not being supported in hardware. Note, translating from a scattered feature to a the hardware version is done by __feature_translate(), not SF(). The sole purpose of SF() is to check kernel support for the scattered feature, before translation. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	Merge tag 'for-6.13-rc3-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - tree-checker catches invalid number of inline extent references - zoned mode fixes: - enhance zone append IO command so it also detects emulated writes - handle bio splitting at sectorsize boundary - when deleting a snapshot, fix a condition for visiting nodes in reloc trees * tag 'for-6.13-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: tree-checker: reject inline extent items with 0 ref count btrfs: split bios to the fs sector size boundary btrfs: use bio_is_zone_append() in the completion handler btrfs: fix improper generation check in snapshot delete
2024-12-18	KVM: selftests: Override ARCH for x86_64 instead of using ARCH_DIR	Sean Christopherson
	Now that KVM selftests uses the kernel's canonical arch paths, directly override ARCH to 'x86' when targeting x86_64 instead of defining ARCH_DIR to redirect to appropriate paths. ARCH_DIR was originally added to deal with KVM selftests using the target triple ARCH for directories, e.g. s390x and aarch64; keeping it around just to deal with the one-off alias from x86_64=>x86 is unnecessary and confusing. Note, even when selftests are built from the top-level Makefile, ARCH is scoped to KVM's makefiles, i.e. overriding ARCH won't trip up some other selftests that (somehow) expects x86_64 and can't work with x86. Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20241128005547.4077116-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Use canonical $(ARCH) paths for KVM selftests directories	Sean Christopherson
	Use the kernel's canonical $(ARCH) paths instead of the raw target triple for KVM selftests directories. KVM selftests are quite nearly the only place in the entire kernel that using the target triple for directories, tools/testing/selftests/drivers/s390x being the lone holdout. Using the kernel's preferred nomenclature eliminates the minor, but annoying, friction of having to translate to KVM's selftests directories, e.g. for pattern matching, opening files, running selftests, etc. Opportunsitically delete file comments that reference the full path of the file, as they are obviously prone to becoming stale, and serve no known purpose. Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Provide empty 'all' and 'clean' targets for unsupported ARCHs	Sean Christopherson
	Provide empty targets for KVM selftests if the target architecture is unsupported to make it obvious which architectures are supported, and so that various side effects don't fail and/or do weird things, e.g. as is, "mkdir -p $(sort $(dir $(TEST_GEN_PROGS)))" fails due to a missing operand, and conversely, "$(shell mkdir -p $(sort $(OUTPUT)/$(ARCH_DIR) ..." will create an empty, useless directory for the unsupported architecture. Move the guts of the Makefile to Makefile.kvm so that it's easier to see that the if-statement effectively guards all of KVM selftests. Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)	Sean Christopherson
	Add two phases to mmu_stress_test to verify that KVM correctly handles guest memory that was writable, and then made read-only in the primary MMU, and then made writable again. Add bonus coverage for x86 and arm64 to verify that all of guest memory was marked read-only. Making forward progress (without making memory writable) requires arch specific code to skip over the faulting instruction, but the test can at least verify each vCPU's starting page was made read-only for other architectures. Link: https://lore.kernel.org/r/20241128005547.4077116-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Add a read-only mprotect() phase to mmu_stress_test	Sean Christopherson
	Add a third phase of mmu_stress_test to verify that mprotect()ing guest memory to make it read-only doesn't cause explosions, e.g. to verify KVM correctly handles the resulting mmu_notifier invalidations. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Precisely limit the number of guest loops in mmu_stress_test	Sean Christopherson
	Run the exact number of guest loops required in mmu_stress_test instead of looping indefinitely in anticipation of adding more stages that run different code (e.g. reads instead of writes). Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Use vcpu_arch_put_guest() in mmu_stress_test	Sean Christopherson
	Use vcpu_arch_put_guest() to write memory from the guest in mmu_stress_test as an easy way to provide a bit of extra coverage. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Enable mmu_stress_test on arm64	Sean Christopherson
	Enable the mmu_stress_test on arm64. The intent was to enable the test across all architectures when it was first added, but a few goofs made it unrunnable on !x86. Now that those goofs are fixed, at least for arm64, enable the test. Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: sefltests: Explicitly include ucall_common.h in mmu_stress_test.c	Sean Christopherson
	Explicitly include ucall_common.h in the MMU stress test, as unlike arm64 and x86-64, RISC-V doesn't include ucall_common.h in its processor.h, i.e. this will allow enabling the test on RISC-V. Reported-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Compute number of extra pages needed in mmu_stress_test	Sean Christopherson
	Create mmu_stress_tests's VM with the correct number of extra pages needed to map all of memory in the guest. The bug hasn't been noticed before as the test currently runs only on x86, which maps guest memory with 1GiB pages, i.e. doesn't need much memory in the guest for page tables. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Only muck with SREGS on x86 in mmu_stress_test	Sean Christopherson
	Try to get/set SREGS in mmu_stress_test only when running on x86, as the ioctls are supported only by x86 and PPC, and the latter doesn't yet support KVM selftests. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Rename max_guest_memory_test to mmu_stress_test	Sean Christopherson
	Rename max_guest_memory_test to mmu_stress_test so that the name isn't horribly misleading when future changes extend the test to verify things like mprotect() interactions, and because the test is useful even when its configured to populate far less than the maximum amount of guest memory. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Check for a potential unhandled exception iff KVM_RUN succeeded	Sean Christopherson
	Don't check for an unhandled exception if KVM_RUN failed, e.g. if it returned errno=EFAULT, as reporting unhandled exceptions is done via a ucall, i.e. requires KVM_RUN to exit cleanly. Theoretically, checking for a ucall on a failed KVM_RUN could get a false positive, e.g. if there were stale data in vcpu->run from a previous exit. Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Assert that vcpu_{g,s}et_reg() won't truncate	Sean Christopherson
	Assert that the register being read/written by vcpu_{g,s}et_reg() is no larger than a uint64_t, i.e. that a selftest isn't unintentionally truncating the value being read/written. Ideally, the assert would be done at compile-time, but that would limit the checks to hardcoded accesses and/or require fancier compile-time assertion infrastructure to filter out dynamic usage. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20241128005547.4077116-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: selftests: Return a value from vcpu_get_reg() instead of using an out-param	Sean Christopherson
	Return a uint64_t from vcpu_get_reg() instead of having the caller provide a pointer to storage, as none of the vcpu_get_reg() usage in KVM selftests accesses a register larger than 64 bits, and vcpu_set_reg() only accepts a 64-bit value. If a use case comes along that needs to get a register that is larger than 64 bits, then a utility can be added to assert success and take a void pointer, but until then, forcing an out param yields ugly code and prevents feeding the output of vcpu_get_reg() into vcpu_set_reg(). Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20241128005547.4077116-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18	KVM: arm64: Only apply PMCR_EL0.P to the guest range of counters	Oliver Upton
	An important distinction from other registers affected by HPMN is that PMCR_EL0 only affects the guest range of counters, regardless of the EL from which it is accessed. Ensure that PMCR_EL0.P is always applied to 'guest' counters by manually computing the mask rather than deriving it from the current context. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241217175611.3658290-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-12-18	KVM: arm64: nv: Reload PMU events upon MDCR_EL2.HPME change	Oliver Upton
	MDCR_EL2.HPME is the 'global' enable bit for event counters reserved for EL2. Give the PMU a kick when it's changed to ensure events are reprogrammed before returning to the guest. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241217175550.3658212-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-12-18	KVM: arm64: Use KVM_REQ_RELOAD_PMU to handle PMCR_EL0.E change	Oliver Upton
	Nested virt introduces yet another set of 'global' knobs for controlling event counters that are reserved for EL2 (i.e. >= HPMN). Get ready to share some plumbing with the NV controls by offloading counter reprogramming to KVM_REQ_RELOAD_PMU. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241217175532.3658134-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-12-18	KVM: arm64: Add unified helper for reprogramming counters by mask	Oliver Upton
	Having separate helpers for enabling/disabling counters provides the wrong abstraction, as the state of each counter needs to be evaluated independently and, in some cases, use a different global enable bit. Collapse the enable/disable accessors into a single, common helper that reconfigures every counter set in @mask, leaving the complexity of determining if an event is actually enabled in kvm_pmu_counter_is_enabled(). Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241217175513.3658056-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-12-18	KVM: arm64: Always check the state from hyp_ack_unshare()	Quentin Perret
	There are multiple pKVM memory transitions where the state of a page is not cross-checked from the completer's PoV for performance reasons. For example, if a page is PKVM_PAGE_OWNED from the initiator's PoV, we should be guaranteed by construction that it is PKVM_NOPAGE for everybody else, hence allowing us to save a page-table lookup. When it was introduced, hyp_ack_unshare() followed that logic and bailed out without checking the PKVM_PAGE_SHARED_BORROWED state in the hypervisor's stage-1. This was correct as we could safely assume that all host-initiated shares were directed at the hypervisor at the time. But with the introduction of other types of shares (e.g. for FF-A or non-protected guests), it is now very much required to cross check this state to prevent the host from running __pkvm_host_unshare_hyp() on a page shared with TZ or a non-protected guest. Thankfully, if an attacker were to try this, the hyp_unmap() call from hyp_complete_unshare() would fail, hence causing to WARN() from __do_unshare() with the host lock held, which is fatal. But this is fragile at best, and can hardly be considered a security measure. Let's just do the right thing and always check the state from hyp_ack_unshare(). Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241128154406.602875-1-qperret@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-12-18	Merge tag 'cxl-fixes-6.13-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Ira Weiny: - prevent probe failure when non-critical RAS unmasking fails - fix CXL 1.1 link status sysfs attribute - fix 4 way (and greater) switch interleave region creation * tag 'cxl-fixes-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Fix region creation for greater than x2 switches cxl/pci: Check dport->regs.rcd_pcie_cap availability before accessing cxl/pci: Fix potential bogus return value upon successful probing
2024-12-18	drm/amdgpu/nbio7.0: fix IP version check	Alex Deucher
	Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0ec43fbece784215d3c4469973e4556d70bce915) Cc: stable@vger.kernel.org
2024-12-18	drm/amd: Update strapping for NBIO 2.5.0	Mario Limonciello
	This helps to avoid a spurious PME event on hotplug to Azalia. Cc: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Reported-and-tested-by: ionut_n2001@yahoo.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215884 Tested-by: Gabriel Marcano <gabemarcano@yahoo.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241211024414.7840-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3f6f237b9dd189e1fb85b8a3f7c97a8f27c1e49a) Cc: stable@vger.kernel.org