summaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2023-05-05Merge tag 'kvm-x86-mmu-6.4-2' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
Fix a long-standing flaw in x86's TDP MMU where unloading roots on a vCPU can result in the root being freed even though the root is completely valid and can be reused as-is (with a TLB flush).
2023-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "s390: - More phys_to_virt conversions - Improvement of AP management for VSIE (nested virtualization) ARM64: - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements. x86: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Avoid unnecessary writes+flushes when the guest is only adding new PTEs - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations when emulating invalidations - Clean up the range-based flushing APIs - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork() - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features - Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - AMD SVM: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts - Intel AMX: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Fix a bug in emulation of ENCLS in compatibility mode - Allow emulation of NOP and PAUSE for L2 - AMX selftests improvements - Misc cleanups MIPS: - Constify MIPS's internal callbacks (a leftover from the hardware enabling rework that landed in 6.3) Generic: - Drop unnecessary casts from "void *" throughout kvm_main.c - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct size by 8 bytes on 64-bit kernels by utilizing a padding hole Documentation: - Fix goof introduced by the conversion to rST" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits) KVM: s390: pci: fix virtual-physical confusion on module unload/load KVM: s390: vsie: clarifications on setting the APCB KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() KVM: selftests: Test the PMU event "Instructions retired" KVM: selftests: Copy full counter values from guest in PMU event filter test KVM: selftests: Use error codes to signal errors in PMU event filter test KVM: selftests: Print detailed info in PMU event filter asserts KVM: selftests: Add helpers for PMC asserts in PMU event filter test KVM: selftests: Add a common helper for the PMU event filter guest code KVM: selftests: Fix spelling mistake "perrmited" -> "permitted" KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: nvhe: Synchronise with page table walker on vcpu run KVM: arm64: vgic: Don't acquire its_lock before config_lock KVM: selftests: Add test to verify KVM's supported XCR0 ...
2023-04-28Merge tag 'smp-core-2023-04-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP cross-CPU function-call updates from Ingo Molnar: - Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. * tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: trace,smp: Trace all smp_function_call*() invocations trace: Add trace_ipi_send_cpu() sched, smp: Trace smp callback causing an IPI smp: reword smp call IPI comment treewide: Trace IPIs sent via smp_send_reschedule() irq_work: Trace self-IPIs sent via arch_irq_work_raise() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() trace: Add trace_ipi_send_cpumask() kernel/smp: Make csdlock_debug= resettable locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging locking/csd_lock: Remove added data from CSD lock debugging locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-26KVM: x86: Preserve TDP MMU roots until they are explicitly invalidatedSean Christopherson
Preserve TDP MMU roots until they are explicitly invalidated by gifting the TDP MMU itself a reference to a root when it is allocated. Keeping a reference in the TDP MMU fixes a flaw where the TDP MMU exhibits terrible performance, and can potentially even soft-hang a vCPU, if a vCPU frequently unloads its roots, e.g. when KVM is emulating SMI+RSM. When KVM emulates something that invalidates _all_ TLB entries, e.g. SMI and RSM, KVM unloads all of the vCPUs roots (KVM keeps a small per-vCPU cache of previous roots). Unloading roots is a simple way to ensure KVM flushes and synchronizes all roots for the vCPU, as KVM flushes and syncs when allocating a "new" root (from the vCPU's perspective). In the shadow MMU, KVM keeps track of all shadow pages, roots included, in a per-VM hash table. Unloading a shadow MMU root just wipes it from the per-vCPU cache; the root is still tracked in the per-VM hash table. When KVM loads a "new" root for the vCPU, KVM will find the old, unloaded root in the per-VM hash table. Unlike the shadow MMU, the TDP MMU doesn't track "inactive" roots in a per-VM structure, where "active" in this case means a root is either in-use or cached as a previous root by at least one vCPU. When a TDP MMU root becomes inactive, i.e. the last vCPU reference to the root is put, KVM immediately frees the root (asterisk on "immediately" as the actual freeing may be done by a worker, but for all intents and purposes the root is gone). The TDP MMU behavior is especially problematic for 1-vCPU setups, as unloading all roots effectively frees all roots. The issue is mitigated to some degree in multi-vCPU setups as a different vCPU usually holds a reference to an unloaded root and thus keeps the root alive, allowing the vCPU to reuse its old root after unloading (with a flush+sync). The TDP MMU flaw has been known for some time, as until very recently, KVM's handling of CR0.WP also triggered unloading of all roots. The CR0.WP toggling scenario was eventually addressed by not unloading roots when _only_ CR0.WP is toggled, but such an approach doesn't Just Work for emulating SMM as KVM must emulate a full TLB flush on entry and exit to/from SMM. Given that the shadow MMU plays nice with unloading roots at will, teaching the TDP MMU to do the same is far less complex than modifying KVM to track which roots need to be flushed before reuse. Note, preserving all possible TDP MMU roots is not a concern with respect to memory consumption. Now that the role for direct MMUs doesn't include information about the guest, e.g. CR0.PG, CR0.WP, CR4.SMEP, etc., there are _at most_ six possible roots (where "guest_mode" here means L2): 1. 4-level !SMM !guest_mode 2. 4-level SMM !guest_mode 3. 5-level !SMM !guest_mode 4. 5-level SMM !guest_mode 5. 4-level !SMM guest_mode 6. 5-level !SMM guest_mode And because each vCPU can track 4 valid roots, a VM can already have all 6 root combinations live at any given time. Not to mention that, in practice, no sane VMM will advertise different guest.MAXPHYADDR values across vCPUs, i.e. KVM won't ever use both 4-level and 5-level roots for a single VM. Furthermore, the vast majority of modern hypervisors will utilize EPT/NPT when available, thus the guest_mode=%true cases are also unlikely to be utilized. Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Link: https://lore.kernel.org/all/959c5bce-beb5-b463-7158-33fc4a4f910c@linux.microsoft.com Link: https://lkml.kernel.org/r/20220209170020.1775368-1-pbonzini%40redhat.com Link: https://lore.kernel.org/all/20230322013731.102955-1-minipli@grsecurity.net Link: https://lore.kernel.org/all/000000000000a0bc2b05f9dd7fab@google.com Link: https://lore.kernel.org/all/000000000000eca0b905fa0f7756@google.com Cc: Ben Gardon <bgardon@google.com> Cc: David Matlack <dmatlack@google.com> Cc: stable@vger.kernel.org Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Link: https://lore.kernel.org/r/20230426220323.3079789-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-26Merge tag 'kvm-x86-vmx-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM VMX changes for 6.4: - Fix a bug in emulation of ENCLS in compatibility mode - Allow emulation of NOP and PAUSE for L2 - Misc cleanups
2023-04-26Merge tag 'kvm-x86-svm-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM SVM changes for 6.4: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts
2023-04-26Merge tag 'kvm-x86-selftests-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM selftests, and an AMX/XCR0 bugfix, for 6.4: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Overhaul the AMX selftests to improve coverage and cleanup the test - Misc cleanups
2023-04-26Merge tag 'kvm-x86-pmu-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 PMU changes for 6.4: - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - Misc cleanups and fixes
2023-04-26Merge tag 'kvm-x86-mmu-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 MMU changes for 6.4: - Tweak FNAME(sync_spte) to avoid unnecessary writes+flushes when the guest is only adding new PTEs - Overhaul .sync_page() and .invlpg() to share the .sync_page() implementation, i.e. utilize .sync_page()'s optimizations when emulating invalidations - Clean up the range-based flushing APIs - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork() - Misc cleanups
2023-04-26Merge tag 'kvm-x86-misc-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 changes for 6.4: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Misc cleanups
2023-04-26Merge tag 'kvmarm-6.4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.4 - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements.
2023-04-26Merge tag 'v6.4-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Total usage stats now include all that returned errors (instead of just some) - Remove maximum hash statesize limit - Add cloning support for hmac and unkeyed hashes - Demote BUG_ON in crypto_unregister_alg to a WARN_ON Algorithms: - Use RIP-relative addressing on x86 to prepare for PIE build - Add accelerated AES/GCM stitched implementation on powerpc P10 - Add some test vectors for cmac(camellia) - Remove failure case where jent is unavailable outside of FIPS mode in drbg - Add permanent and intermittent health error checks in jitter RNG Drivers: - Add support for 402xx devices in qat - Add support for HiSTB TRNG - Fix hash concurrency issues in stm32 - Add OP-TEE firmware support in caam" * tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (139 commits) i2c: designware: Add doorbell support for Mendocino i2c: designware: Use PCI PSP driver for communication powerpc: Move Power10 feature PPC_MODULE_FEATURE_P10 crypto: p10-aes-gcm - Remove POWER10_CPU dependency crypto: testmgr - Add some test vectors for cmac(camellia) crypto: cryptd - Add support for cloning hashes crypto: cryptd - Convert hash to use modern init_tfm/exit_tfm crypto: hmac - Add support for cloning crypto: hash - Add crypto_clone_ahash/shash crypto: api - Add crypto_clone_tfm crypto: api - Add crypto_tfm_get crypto: x86/sha - Use local .L symbols for code crypto: x86/crc32 - Use local .L symbols for code crypto: x86/aesni - Use local .L symbols for code crypto: x86/sha256 - Use RIP-relative addressing crypto: x86/ghash - Use RIP-relative addressing crypto: x86/des3 - Use RIP-relative addressing crypto: x86/crc32c - Use RIP-relative addressing crypto: x86/cast6 - Use RIP-relative addressing crypto: x86/cast5 - Use RIP-relative addressing ...
2023-04-24Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fget updates from Al Viro: "fget() to fdget() conversions" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fuse_dev_ioctl(): switch to fdget() cgroup_get_from_fd(): switch to fdget_raw() bpf: switch to fdget_raw() build_mount_idmapped(): switch to fdget() kill the last remaining user of proc_ns_fget() SVM-SEV: convert the rest of fget() uses to fdget() in there convert sgx_set_attribute() to fdget()/fdput() convert setns(2) to fdget()/fdput()
2023-04-24Merge tag 'rcu.6.4.april5.2023.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux Pull RCU updates from Joel Fernandes: - Updates and additions to MAINTAINERS files, with Boqun being added to the RCU entry and Zqiang being added as an RCU reviewer. I have also transitioned from reviewer to maintainer; however, Paul will be taking over sending RCU pull-requests for the next merge window. - Resolution of hotplug warning in nohz code, achieved by fixing cpu_is_hotpluggable() through interaction with the nohz subsystem. Tick dependency modifications by Zqiang, focusing on fixing usage of the TICK_DEP_BIT_RCU_EXP bitmask. - Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n kernels, fixed by Zqiang. - Improvements to rcu-tasks stall reporting by Neeraj. - Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for increased robustness, affecting several components like mac802154, drbd, vmw_vmci, tracing, and more. A report by Eric Dumazet showed that the API could be unknowingly used in an atomic context, so we'd rather make sure they know what they're asking for by being explicit: https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/ - Documentation updates, including corrections to spelling, clarifications in comments, and improvements to the srcu_size_state comments. - Better srcu_struct cache locality for readers, by adjusting the size of srcu_struct in support of SRCU usage by Christoph Hellwig. - Teach lockdep to detect deadlocks between srcu_read_lock() vs synchronize_srcu() contributed by Boqun. Previously lockdep could not detect such deadlocks, now it can. - Integration of rcutorture and rcu-related tools, targeted for v6.4 from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis module parameter, and more - Miscellaneous changes, various code cleanups and comment improvements * tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits) checkpatch: Error out if deprecated RCU API used mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep() rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep() ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep() net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep() net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep() lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep() tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep() misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep() drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep() rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan() rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early rcu: Remove never-set needwake assignment from rcu_report_qs_rdp() rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race rcu/trace: use strscpy() to instead of strncpy() tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem ...
2023-04-20SVM-SEV: convert the rest of fget() uses to fdget() in thereAl Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-04-11KVM: x86: Filter out XTILE_CFG if XTILE_DATA isn't permittedSean Christopherson
Filter out XTILE_CFG from the supported XCR0 reported to userspace if the current process doesn't have access to XTILE_DATA. Attempting to set XTILE_CFG in XCR0 will #GP if XTILE_DATA is also not set, and so keeping XTILE_CFG as supported results in explosions if userspace feeds KVM_GET_SUPPORTED_CPUID back into KVM and the guest doesn't sanity check CPUID. Fixes: 445ecdf79be0 ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID") Reported-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Aaron Lewis <aaronlewis@google.com> Tested-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230405004520.421768-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-11KVM: x86: Add a helper to handle filtering of unpermitted XCR0 featuresAaron Lewis
Add a helper, kvm_get_filtered_xcr0(), to dedup code that needs to account for XCR0 features that require explicit opt-in on a per-process basis. In addition to documenting when KVM should/shouldn't consult xstate_get_guest_group_perm(), the helper will also allow sanitizing the filtered XCR0 to avoid enumerating architecturally illegal XCR0 values, e.g. XTILE_CFG without XTILE_DATA. No functional changes intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Mingwei Zhang <mizhang@google.com> [sean: rename helper, move to x86.h, massage changelog] Reviewed-by: Aaron Lewis <aaronlewis@google.com> Tested-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20230405004520.421768-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-11KVM: nVMX: Emulate NOPs in L2, and PAUSE if it's not interceptedSean Christopherson
Extend VMX's nested intercept logic for emulated instructions to handle "pause" interception, in quotes because KVM's emulator doesn't filter out NOPs when checking for nested intercepts. Failure to allow emulation of NOPs results in KVM injecting a #UD into L2 on any NOP that collides with the emulator's definition of PAUSE, i.e. on all single-byte NOPs. For PAUSE itself, honor L1's PAUSE-exiting control, but ignore PLE to avoid unnecessarily injecting a #UD into L2. Per the SDM, the first execution of PAUSE after VM-Entry is treated as the beginning of a new loop, i.e. will never trigger a PLE VM-Exit, and so L1 can't expect any given execution of PAUSE to deterministically exit. ... the processor considers this execution to be the first execution of PAUSE in a loop. (It also does so for the first execution of PAUSE at CPL 0 after VM entry.) All that said, the PLE side of things is currently a moot point, as KVM doesn't expose PLE to L1. Note, vmx_check_intercept() is still wildly broken when L1 wants to intercept an instruction, as KVM injects a #UD instead of synthesizing a nested VM-Exit. That issue extends far beyond NOP/PAUSE and needs far more effort to fix, i.e. is a problem for the future. Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Cc: Mathias Krause <minipli@grsecurity.net> Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230405002359.418138-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10KVM: x86/mmu: Refresh CR0.WP prior to checking for emulated permission faultsSean Christopherson
Refresh the MMU's snapshot of the vCPU's CR0.WP prior to checking for permission faults when emulating a guest memory access and CR0.WP may be guest owned. If the guest toggles only CR0.WP and triggers emulation of a supervisor write, e.g. when KVM is emulating UMIP, KVM may consume a stale CR0.WP, i.e. use stale protection bits metadata. Note, KVM passes through CR0.WP if and only if EPT is enabled as CR0.WP is part of the MMU role for legacy shadow paging, and SVM (NPT) doesn't support per-bit interception controls for CR0. Don't bother checking for EPT vs. NPT as the "old == new" check will always be true under NPT, i.e. the only cost is the read of vcpu->arch.cr4 (SVM unconditionally grabs CR0 from the VMCB on VM-Exit). Reported-by: Mathias Krause <minipli@grsecurity.net> Link: https://lkml.kernel.org/r/677169b4-051f-fcae-756b-9a3e1bb9f8fe%40grsecurity.net Fixes: fb509f76acc8 ("KVM: VMX: Make CR0.WP a guest owned bit") Tested-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20230405002608.418442-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10KVM: x86/mmu: Move filling of Hyper-V's TLB range struct into Hyper-V codeSean Christopherson
Refactor Hyper-V's range-based TLB flushing API to take a gfn+nr_pages pair instead of a struct, and bury said struct in Hyper-V specific code. Passing along two params generates much better code for the common case where KVM is _not_ running on Hyper-V, as forwarding the flush on to Hyper-V's hv_flush_remote_tlbs_range() from kvm_flush_remote_tlbs_range() becomes a tail call. Cc: David Matlack <dmatlack@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230405003133.419177-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10KVM: x86: Rename Hyper-V remote TLB hooks to match established schemeSean Christopherson
Rename the Hyper-V hooks for TLB flushing to match the naming scheme used by all the other TLB flushing hooks, e.g. in kvm_x86_ops, vendor code, arch hooks from common code, etc. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230405003133.419177-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07KVM: x86/pmu: Prevent the PMU from counting disallowed eventsAaron Lewis
When counting "Instructions Retired" (0xc0) in a guest, KVM will occasionally increment the PMU counter regardless of if that event is being filtered. This is because some PMU events are incremented via kvm_pmu_trigger_event(), which doesn't know about the event filter. Add the event filter to kvm_pmu_trigger_event(), so events that are disallowed do not increment their counters. Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230307141400.1486314-2-aaronlewis@google.com [sean: prepend "pmc" to the new function] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07KVM: x86/pmu: Fix a typo in kvm_pmu_request_counter_reprogam()Like Xu
Fix a "reprogam" => "reprogram" typo in kvm_pmu_request_counter_reprogam(). Fixes: 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()") Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230310113349.31799-1-likexu@tencent.com [sean: trim the changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: Rewrite reprogram_counters() to improve performanceLike Xu
A valid pmc is always tested before using pmu->reprogram_pmi. Eliminate this part of the redundancy by setting the counter's bitmask directly, and in addition, trigger KVM_REQ_PMU only once to save more cpu cycles. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230214050757.9623-4-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpersSean Christopherson
Invert the flows in intel_pmu_{g,s}et_msr()'s case statements so that they follow the kernel's preferred style of: if (<not valid>) return <error> <commit change> return <success> which is also the style used by every other {g,s}et_msr() helper (except AMD's PMU variant, which doesn't use a switch statement). Modify the "set" paths with costly side effects, i.e. that reprogram counters, to skip only the side effects, i.e. to perform reserved bits checks even if the value is unchanged. None of the reserved bits checks are expensive, so there's no strong justification for skipping them, and guarding only the side effect makes it slightly more obvious what is being skipped and why. No functional change intended (assuming no reserved bit bugs). Link: https://lkml.kernel.org/r/Y%2B6cfen%2FCpO3%2FdLO%40google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()Like Xu
The name of function pmc_is_enabled() is a bit misleading. A PMC can be disabled either by PERF_CLOBAL_CTRL or by its corresponding EVTSEL. Append global semantics to its name. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230214050757.9623-2-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: Zero out LBR capabilities during PMU refreshSean Christopherson
Zero out the LBR capabilities during PMU refresh to avoid exposing LBRs to the guest against userspace's wishes. If userspace modifies the guest's CPUID model or invokes KVM_CAP_PMU_CAPABILITY to disable vPMU after an initial KVM_SET_CPUID2, but before the first KVM_RUN, KVM will retain the previous LBR info due to bailing before refreshing the LBR descriptor. Note, this is a very theoretical bug, there is no known use case where a VMM would deliberately enable the vPMU via KVM_SET_CPUID2, and then later disable the vPMU. Link: https://lore.kernel.org/r/20230311004618.920745-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has runSean Christopherson
Now that KVM disallows changing feature MSRs, i.e. PERF_CAPABILITIES, after running a vCPU, WARN and bug the VM if the PMU is refreshed after the vCPU has run. Note, KVM has disallowed CPUID updates after running a vCPU since commit feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN"), i.e. PERF_CAPABILITIES was the only remaining way to trigger a PMU refresh after KVM_RUN. Cc: Like Xu <like.xu.linux@gmail.com> Link: https://lore.kernel.org/r/20230311004618.920745-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Disallow writes to immutable feature MSRs after KVM_RUNSean Christopherson
Disallow writes to feature MSRs after KVM_RUN to prevent userspace from changing the vCPU model after running the vCPU. Similar to guest CPUID, KVM uses feature MSRs to configure intercepts, determine what operations are/aren't allowed, etc. Changing the capabilities while the vCPU is active will at best yield unpredictable guest behavior, and at worst could be dangerous to KVM. Allow writing the current value, e.g. so that userspace can blindly set all MSRs when emulating RESET, and unconditionally allow writes to MSR_IA32_UCODE_REV so that userspace can emulate patch loads. Special case the VMX MSRs to keep the generic list small, i.e. so that KVM can do a linear walk of the generic list without incurring meaningful overhead. Cc: Like Xu <like.xu.linux@gmail.com> Cc: Yu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Generate set of VMX feature MSRs using first/last definitionsSean Christopherson
Add VMX MSRs to the runtime list of feature MSRs by iterating over the range of emulated MSRs instead of manually defining each MSR in the "all" list. Using the range definition reduces the cost of emulating a new VMX MSR, e.g. prevents forgetting to add an MSR to the list. Extracting the VMX MSRs from the "all" list, which is a compile-time constant, also shrinks the list to the point where the compiler can heavily optimize code that iterates over the list. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Add macros to track first...last VMX feature MSRsSean Christopherson
Add macros to track the range of VMX feature MSRs that are emulated by KVM to reduce the maintenance cost of extending the set of emulated MSRs. Note, KVM doesn't necessarily emulate all known/consumed VMX MSRs, e.g. PROCBASED_CTLS3 is consumed by KVM to enable IPI virtualization, but is not emulated as KVM doesn't emulate/virtualize IPI virtualization for nested guests. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Add a helper to query whether or not a vCPU has ever runSean Christopherson
Add a helper to query if a vCPU has run so that KVM doesn't have to open code the check on last_vmentry_cpu being set to a magic value. No functional change intended. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Cc: Like Xu <like.xu.linux@gmail.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: x86: Rename kvm_init_msr_list() to clarify it inits multiple listsSean Christopherson
Rename kvm_init_msr_list() to kvm_init_msr_lists() to clarify that it initializes multiple lists: MSRs to save, emulated MSRs, and feature MSRs. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20230311004618.920745-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06KVM: SVM: Return the local "r" variable from svm_set_msr()Sean Christopherson
Rename "r" to "ret" and actually return it from svm_set_msr() to reduce the probability of repeating the mistake of commit 723d5fb0ffe4 ("kvm: svm: Add IA32_FLUSH_CMD guest support"), which set "r" thinking that it would be propagated to the caller. Alternatively, the declaration of "r" could be moved into the handling of MSR_TSC_AUX, but that risks variable shadowing in the future. A wrapper for kvm_set_user_return_msr() would allow eliding a local variable, but that feels like delaying the inevitable. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Virtualize FLUSH_L1D and passthrough MSR_IA32_FLUSH_CMDSean Christopherson
Virtualize FLUSH_L1D so that the guest can use the performant L1D flush if one of the many mitigations might require a flush in the guest, e.g. Linux provides an option to flush the L1D when switching mms. Passthrough MSR_IA32_FLUSH_CMD for write when it's supported in hardware and exposed to the guest, i.e. always let the guest write it directly if FLUSH_L1D is fully supported. Forward writes to hardware in host context on the off chance that KVM ends up emulating a WRMSR, or in the really unlikely scenario where userspace wants to force a flush. Restrict these forwarded WRMSRs to the known command out of an abundance of caution. Passing through the MSR means the guest can throw any and all values at hardware, but doing so in host context is arguably a bit more dangerous. Link: https://lkml.kernel.org/r/CALMp9eTt3xzAEoQ038bJQ9LN0ZOXrSWsN7xnNUD%2B0SS%3DWwF7Pg%40mail.gmail.com Link: https://lore.kernel.org/all/20230201132905.549148-2-eesposit@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Move MSR_IA32_PRED_CMD WRMSR emulation to common codeSean Christopherson
Dedup the handling of MSR_IA32_PRED_CMD across VMX and SVM by moving the logic to kvm_set_msr_common(). Now that the MSR interception toggling is handled as part of setting guest CPUID, the VMX and SVM paths are identical. Opportunistically massage the code to make it a wee bit denser. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20230322011440.2195485-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: SVM: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUIDSean Christopherson
Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is supported and enabled, i.e. don't wait until the first write. There's no benefit to deferred passthrough, and the extra logic only adds complexity. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230322011440.2195485-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: VMX: Passthrough MSR_IA32_PRED_CMD based purely on host+guest CPUIDSean Christopherson
Passthrough MSR_IA32_PRED_CMD based purely on whether or not the MSR is supported and enabled, i.e. don't wait until the first write. There's no benefit to deferred passthrough, and the extra logic only adds complexity. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20230322011440.2195485-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-06KVM: x86: Revert MSR_IA32_FLUSH_CMD.FLUSH_L1D enablingSean Christopherson
Revert the recently added virtualizing of MSR_IA32_FLUSH_CMD, as both the VMX and SVM are fatally buggy to guests that use MSR_IA32_FLUSH_CMD or MSR_IA32_PRED_CMD, and because the entire foundation of the logic is flawed. The most immediate problem is an inverted check on @cmd that results in rejecting legal values. SVM doubles down on bugs and drops the error, i.e. silently breaks all guest mitigations based on the command MSRs. The next issue is that neither VMX nor SVM was updated to mark MSR_IA32_FLUSH_CMD as being a possible passthrough MSR, which isn't hugely problematic, but does break MSR filtering and triggers a WARN on VMX designed to catch this exact bug. The foundational issues stem from the MSR_IA32_FLUSH_CMD code reusing logic from MSR_IA32_PRED_CMD, which in turn was likely copied from KVM's support for MSR_IA32_SPEC_CTRL. The copy+paste from MSR_IA32_SPEC_CTRL was misguided as MSR_IA32_PRED_CMD (and MSR_IA32_FLUSH_CMD) is a write-only MSR, i.e. doesn't need the same "deferred passthrough" shenanigans as MSR_IA32_SPEC_CTRL. Revert all MSR_IA32_FLUSH_CMD enabling in one fell swoop so that there is no point where KVM advertises, but does not support, L1D_FLUSH. This reverts commits 45cf86f26148e549c5ba4a8ab32a390e4bde216e, 723d5fb0ffe4c02bd4edf47ea02c02e454719f28, and a807b78ad04b2eaa348f52f5cc7702385b6de1ee. Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lkml.kernel.org/r/20230317190432.GA863767%40dev-arch.thelio-3990X Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Message-Id: <20230322011440.2195485-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-05KVM: x86/pmu: Disallow legacy LBRs if architectural LBRs are availableSean Christopherson
Disallow enabling LBR support if the CPU supports architectural LBRs. Traditional LBR support is absent on CPU models that have architectural LBRs, and KVM doesn't yet support arch LBRs, i.e. KVM will pass through non-existent MSRs if userspace enables LBRs for the guest. Cc: stable@vger.kernel.org Cc: Yang Weijiang <weijiang.yang@intel.com> Cc: Like Xu <like.xu.linux@gmail.com> Reported-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES") Tested-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230128001427.2548858-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05KVM: x86: set "mitigate_smt_rsb" storage-class-specifier to staticTom Rix
smatch reports arch/x86/kvm/x86.c:199:20: warning: symbol 'mitigate_smt_rsb' was not declared. Should it be static? This variable is only used in one file so it should be static. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230404010141.1913667-1-trix@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05KVM: x86/pmu: Zero out pmu->all_valid_pmc_idx each time it's refreshedLike Xu
The kvm_pmu_refresh() may be called repeatedly (e.g. configure guest CPUID repeatedly or update MSR_IA32_PERF_CAPABILITIES) and each call will use the last pmu->all_valid_pmc_idx value, with the residual bits introducing additional overhead later in the vPMU emulation. Fixes: b35e5548b411 ("KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230404071759.75376-1-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05KVM: VMX: Use is_64_bit_mode() to check 64-bit mode in SGX handlerBinbin Wu
sgx_get_encls_gva() uses is_long_mode() to check 64-bit mode, however, SGX system leaf instructions are valid in compatibility mode, should use is_64_bit_mode() instead. Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions") Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230404032502.27798-1-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-05kvm: Remove "select SRCU"Paul E. McKenney
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements from the various KVM Kconfig files. Acked-by: Sean Christopherson <seanjc@google.com> (x86) Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <kvm@vger.kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> (arm64) Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Anup Patel <anup@brainfault.org> (riscv) Acked-by: Heiko Carstens <hca@linux.ibm.com> (s390) Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-04-05KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALLOliver Upton
The 'longmode' field is a bit annoying as it blows an entire __u32 to represent a boolean value. Since other architectures are looking to add support for KVM_EXIT_HYPERCALL, now is probably a good time to clean it up. Redefine the field (and the remaining padding) as a set of flags. Preserve the existing ABI by using bit 0 to indicate if the guest was in long mode and requiring that the remaining 31 bits must be zero. Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-2-oliver.upton@linux.dev
2023-04-04KVM: x86/mmu: Merge all handle_changed_pte*() functionsVipin Sharma
Merge __handle_changed_pte() and handle_changed_spte_acc_track() into a single function, handle_changed_pte(), as the two are always used together. Remove the existing handle_changed_pte(), as it's just a wrapper that calls __handle_changed_pte() and handle_changed_spte_acc_track(). Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: David Matlack <dmatlack@google.com> [sean: massage changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Remove handle_changed_spte_dirty_log()Vipin Sharma
Remove handle_changed_spte_dirty_log() as there is no code flow which sets 4KiB SPTE writable and hit this path. This function marks the page dirty in a memslot only if new SPTE is 4KiB in size and writable. Current users of handle_changed_spte_dirty_log() are: 1. set_spte_gfn() - Create only non writable SPTEs. 2. write_protect_gfn() - Change an SPTE to non writable. 3. zap leaf and roots APIs - Everything is 0. 4. handle_removed_pt() - Sets SPTEs to REMOVED_SPTE 5. tdp_mmu_link_sp() - Makes non leaf SPTEs. There is also no path which creates a writable 4KiB without going through make_spte() and this functions takes care of marking SPTE dirty in the memslot if it is PT_WRITABLE. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> [sean: add blurb to __handle_changed_spte()'s comment] Link: https://lore.kernel.org/r/20230321220021.2119033-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Remove "record_acc_track" in __tdp_mmu_set_spte()Vipin Sharma
Remove bool parameter "record_acc_track" from __tdp_mmu_set_spte() and refactor the code. This variable is always set to true by its caller. Remove single and double underscore prefix from tdp_mmu_set_spte() related APIs: 1. Change __tdp_mmu_set_spte() to tdp_mmu_set_spte() 2. Change _tdp_mmu_set_spte() to tdp_mmu_iter_set_spte() Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20230321220021.2119033-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Bypass __handle_changed_spte() when aging TDP MMU SPTEsVipin Sharma
Drop everything except the "tdp_mmu_spte_changed" tracepoint part of __handle_changed_spte() when aging SPTEs in the TDP MMU, as clearing the accessed status doesn't affect the SPTE's shadow-present status, whether or not the SPTE is a leaf, or change the PFN. I.e. none of the functional updates handled by __handle_changed_spte() are relevant. Losing __handle_changed_spte()'s sanity checks does mean that a bug could theoretical go unnoticed, but that scenario is extremely unlikely, e.g. would effectively require a misconfigured MMU or a locking bug elsewhere. Link: https://lore.kernel.org/all/Y9HcHRBShQgjxsQb@google.com Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> [sean: massage changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-04KVM: x86/mmu: Drop unnecessary dirty log checks when aging TDP MMU SPTEsVipin Sharma
Drop the unnecessary call to handle dirty log updates when aging TDP MMU SPTEs, as neither clearing the Accessed bit nor marking a SPTE for access tracking can _set_ the Writable bit, i.e. can't trigger marking a gfn dirty in its memslot. The access tracking path can _clear_ the Writable bit, e.g. if the XCHG races with fast_page_fault() and writes the stale value without the Writable bit set, but clearing the Writable bit outside of mmu_lock is not allowed, i.e. access tracking can't spuriously set the Writable bit. Signed-off-by: Vipin Sharma <vipinsh@google.com> [sean: split to separate patch, apply to dirty path, write changelog] Link: https://lore.kernel.org/r/20230321220021.2119033-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>