path: root/arch/x86/kvm/vmx/nested.c
Age | Commit message | Author
2025-02-26 | KVM: nVMX: Process events on nested VM-Exit if injectable IRQ or NMI is pending | Sean Christopherson
Process pending events on nested VM-Exit if the vCPU has an injectable IRQ or NMI, as the event may have become pending while L2 was active, i.e. may not be tracked in the context of vmcs01. E.g. if L1 has passed its APIC through to L2 and an IRQ arrives while L2 is active, then KVM needs to request an IRQ window prior to running L1, otherwise delivery of the IRQ will be delayed until KVM happens to process events for some other reason. The missed failure is detected by vmx_apic_passthrough_tpr_threshold_test in KVM-Unit-Tests, but has effectively been masked due to a flaw in KVM's PIC emulation that causes KVM to make spurious KVM_REQ_EVENT requests (and apparently no one ever ran the test with split IRQ chips). Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250224235542.2562848-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-20 | Merge tag 'kvm-x86-misc-6.14' of https://github.com/kvm-x86/linux into HEAD | Paolo Bonzini
KVM x86 misc changes for 6.14:
- Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities instead of just those where KVM needs to manage state and/or explicitly enable the feature in hardware. Along the way, refactor the code to make it easier to add features, and to make it more self-documenting how KVM is handling each feature.
- Rework KVM's handling of VM-Exits during event vectoring; this plugs holes where KVM unintentionally puts the vCPU into infinite loops in some scenarios (e.g. if emulation is triggered by the exit), and brings parity between VMX and SVM.
- Add pending request and interrupt injection information to the kvm_exit and kvm_entry tracepoints respectively.
- Fix a relatively benign flaw where KVM would end up redoing RDPKRU when loading guest/host PKRU, due to a refactoring of the kernel helpers that didn't account for KVM's pre-checking of the need to do WRPKRU.
2025-01-08 | KVM: VMX: refactor PML terminology | Maxim Levitsky
Rename PML_ENTITY_NUM to PML_LOG_NR_ENTRIES. Add PML_HEAD_INDEX to specify the first entry that the CPU writes. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241219221034.903927-2-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19 | KVM: nVMX: Honor event priority when emulating PI delivery during VM-Enter | Sean Christopherson
Move the handling of a nested posted interrupt notification that is unblocked by nested VM-Enter (unblocks L1 IRQs when ack-on-exit is enabled by L1) from VM-Enter emulation to vmx_check_nested_events(). To avoid a pointless forced immediate exit, i.e. to not regress IRQ delivery latency when a nested posted interrupt is pending at VM-Enter, block processing of the notification IRQ if and only if KVM must block _all_ events. Unlike injected events, KVM doesn't need to actually enter L2 before updating the vIRR and vmcs02.GUEST_INTR_STATUS, as the resulting L2 IRQ will be blocked by hardware itself, until VM-Enter to L2 completes. Note, very strictly speaking, moving the IRQ from L2's PIR to IRR before entering L2 is still technically wrong. But, practically speaking, only an L1 hypervisor or an L0 userspace that is deliberately checking event priority against PIR=>IRR processing can even notice; L2 will see architecturally correct behavior, as KVM ensures the VM-Enter is finished before doing anything that would effectively preempt the PIR=>IRR movement. Reported-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241101191447.1807602-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19 | KVM: nVMX: Use vmcs01's controls shadow to check for IRQ/NMI windows at VM-Enter | Sean Christopherson
Use vmcs01's execution controls shadow to check for IRQ/NMI windows after a successful nested VM-Enter, instead of snapshotting the information prior to emulating VM-Enter. It's quite difficult to see that the entire reason the controls are snapshotted prior to nested VM-Enter is to read them from vmcs01 (vmcs02 is loaded if nested VM-Enter is successful). That could be solved with a comment, but explicitly using vmcs01's shadow makes the code self-documenting to a certain extent. No functional change intended (vmcs01's execution controls must not be modified during emulation of nested VM-Enter). Link: https://lore.kernel.org/r/20241101191447.1807602-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19 | KVM: nVMX: Drop manual vmcs01.GUEST_INTERRUPT_STATUS.RVI check at VM-Enter | Sean Christopherson
Drop the manual check for a pending IRQ in vmcs01's RVI field during nested VM-Enter, as the recently added call to kvm_apic_has_interrupt() when checking for pending events after successful VM-Enter is a superset of the RVI check (IRQs that are pending in RVI are also pending in L1's IRR). Link: https://lore.kernel.org/r/20241101191447.1807602-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19 | KVM: nVMX: Check for pending INIT/SIPI after entering non-root mode | Sean Christopherson
Explicitly check for a pending INIT or SIPI after entering non-root mode during nested VM-Enter emulation, as no VMCS information is queried as part of the check, i.e. there is no need to check for INIT/SIPI while vmcs01 is still loaded. Link: https://lore.kernel.org/r/20241101191447.1807602-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19 | KVM: nVMX: Explicitly update vPPR on successful nested VM-Enter | Sean Christopherson
Always request pending event evaluation after successful nested VM-Enter if L1 has a pending IRQ, as KVM will effectively do so anyways when APICv is enabled, by way of vmx_has_apicv_interrupt(). This will allow dropping the aforementioned APICv check, and will also allow handling nested Posted Interrupt processing entirely within vmx_check_nested_events(), which is necessary to honor priority between concurrent events. Note, checking for pending IRQs has a subtle side effect, as it results in a PPR update for L1's vAPIC (PPR virtualization does happen at VM-Enter, but for nested VM-Enter that affects L2's vAPIC, not L1's vAPIC). However, KVM updates PPR _constantly_, even when PPR technically shouldn't be refreshed, e.g. kvm_vcpu_has_events() re-evaluates PPR if IRQs are unblocked, by way of the same kvm_apic_has_interrupt() check. Ditto for nested VM-Enter itself, when nested posted interrupts are enabled. Thus, trying to avoid a PPR update on VM-Enter just to be pedantically accurate is ridiculous, given the behavior elsewhere in KVM. Link: https://lore.kernel.org/kvm/20230312180048.1778187-1-jason.cj.chen@intel.com Closes: https://lore.kernel.org/all/20240920080012.74405-1-mankku@gmail.com Signed-off-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241101191447.1807602-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19 | KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VID | Chao Gao
If KVM emulates an EOI for L1's virtual APIC while L2 is active, defer updating GUEST_INTERRUPT_STATUS.SVI, i.e. the VMCS's cache of the highest in-service IRQ, until L1 is active, as vmcs01, not vmcs02, needs to track vISR. The missed SVI update for vmcs01 can result in L1 interrupts being incorrectly blocked, e.g. if there is a pending interrupt with lower priority than the interrupt that was EOI'd. This bug only affects use cases where L1's vAPIC is effectively passed through to L2, e.g. in a pKVM scenario where L2 is L1's deprivileged host, as KVM will only emulate an EOI for L1's vAPIC if Virtual Interrupt Delivery (VID) is disabled in vmcs12, and L1 isn't intercepting L2 accesses to its (virtual) APIC page (or if x2APIC is enabled, the EOI MSR). WARN() if KVM updates L1's ISR while L2 is active with VID enabled, as an EOI from L2 is supposed to affect L2's vAPIC, but still defer the update, to try to keep L1 alive. Specifically, KVM forwards all APICv-related VM-Exits to L1 via nested_vmx_l1_wants_exit():

	case EXIT_REASON_APIC_ACCESS:
	case EXIT_REASON_APIC_WRITE:
	case EXIT_REASON_EOI_INDUCED:
		/*
		 * The controls for "virtualize APIC accesses," "APIC-
		 * register virtualization," and "virtual-interrupt
		 * delivery" only come from vmcs12.
		 */
		return true;

Fixes: c7c9c56ca26f ("x86, apicv: add virtual interrupt delivery support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/kvm/20230312180048.1778187-1-jason.cj.chen@intel.com Reported-by: Markku Ahvenjärvi <mankku@gmail.com> Closes: https://lore.kernel.org/all/20240920080012.74405-1-mankku@gmail.com Cc: Janne Karhunen <janne.karhunen@gmail.com> Signed-off-by: Chao Gao <chao.gao@intel.com> [sean: drop request, handle in VMX, write changelog] Tested-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241128000010.4051275-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
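The deferral described above can be illustrated with a small sketch. It is not KVM's code: the struct and both function names are hypothetical, and the "VMCS" is reduced to a single integer field. The point is only that an SVI update requested while L2 is active gets recorded and then applied once vmcs01 is current again.

```c
#include <stdbool.h>

/* Hypothetical, pared-down state; the field names are illustrative. */
struct svi_sketch {
	bool l2_active;			/* vmcs02 is loaded, i.e. L2 is running */
	bool svi_update_pending;	/* an SVI update for vmcs01 was deferred */
	int vmcs01_svi;			/* cached highest in-service vector for L1 */
};

/* Called when L1's highest in-service vector changes, e.g. after an EOI. */
static void update_l1_svi(struct svi_sketch *s, int max_isr)
{
	if (s->l2_active) {
		/* vmcs02 is current; writing SVI now would hit the wrong VMCS. */
		s->svi_update_pending = true;
		return;
	}
	s->vmcs01_svi = max_isr;
}

/* Called once vmcs01 is loaded again; max_isr is recomputed by the caller. */
static void on_nested_vmexit(struct svi_sketch *s, int max_isr)
{
	s->l2_active = false;
	if (s->svi_update_pending) {
		s->svi_update_pending = false;
		s->vmcs01_svi = max_isr;
	}
}
```

Recomputing the value at exit time, rather than caching the value seen at EOI time, is a choice made here for simplicity.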
2024-12-18 | KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps | Sean Christopherson
Switch all queries (except XSAVES) of guest features from guest CPUID to guest capabilities, i.e. replace all calls to guest_cpuid_has() with calls to guest_cpu_cap_has(). Keep guest_cpuid_has() around for XSAVES, but subsume its helper guest_cpuid_get_register() and add a compile-time assertion to prevent using guest_cpuid_has() for any other feature. Add yet another comment for XSAVE to explain why KVM is allowed to query its raw guest CPUID. Opportunistically drop the unused guest_cpuid_clear(), as there should be no circumstance in which KVM needs to _clear_ a guest CPUID feature now that everything is tracked via cpu_caps. E.g. KVM may need to _change_ a feature to emulate dynamic CPUID flags, but KVM should never need to clear a feature in guest CPUID to prevent it from being used by the guest. Delete the last remnants of the governed features framework, as the lone holdout was vmx_adjust_secondary_exec_control()'s divergent behavior for governed vs. ungoverned features. Note, replacing guest_cpuid_has() checks with guest_cpu_cap_has() when computing reserved CR4 bits is a nop when viewed as a whole, as KVM's capabilities are already incorporated into the calculation, i.e. if a feature is present in guest CPUID but unsupported by KVM, its CR4 bit was already being marked as reserved, checking guest_cpu_cap_has() simply double-stamps that it's a reserved bit. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-51-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18 | KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap" | Sean Christopherson
As the first step toward replacing KVM's so-called "governed features" framework with a more comprehensive, less poorly named implementation, replace the "kvm_governed_feature" function prefix with "guest_cpu_cap" and rename guest_can_use() to guest_cpu_cap_has(). The "guest_cpu_cap" naming scheme mirrors that of "kvm_cpu_cap", and provides a more clear distinction between guest capabilities, which are KVM controlled (heh, or one might say "governed"), and guest CPUID, which with few exceptions is fully userspace controlled. Opportunistically rewrite the comment about XSS passthrough for SEV-ES guests to avoid referencing so many functions, as such comments are prone to becoming stale (case in point...). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-40-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-13 | Merge branch 'kvm-docs-6.13' into HEAD | Paolo Bonzini
- Drop obsolete references to PPC970 KVM, which was removed 10 years ago.
- Fix incorrect references to non-existing ioctls
- List registers supported by KVM_GET/SET_ONE_REG on s390
- Use rST internal links
- Reorganize the introduction to the API document
2024-11-13 | Merge tag 'kvm-x86-misc-6.13' of https://github.com/kvm-x86/linux into HEAD | Paolo Bonzini
KVM x86 misc changes for 6.13:
- Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE.
- Quirk KVM's misguided behavior of initializing certain feature MSRs to their maximum supported feature set, which can result in KVM creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero value results in the vCPU having invalid state if userspace hides PDCM from the guest, which can lead to save/restore failures.
- Fix KVM's handling of non-canonical checks for vCPUs that support LA57 to better follow the "architecture", in quotes because the actual behavior is poorly documented. E.g. most MSR writes and descriptor table loads ignore CR4.LA57 and operate purely on whether the CPU supports LA57.
- Bypass the register cache when querying CPL from kvm_sched_out(), as filling the cache from IRQ context is generally unsafe, and harden the cache accessors to try to prevent similar issues from occurring in the future.
- Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM over-advertises SPEC_CTRL when trying to support cross-vendor VMs.
- Minor cleanups
2024-11-04 | KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled | Sean Christopherson
When getting the current VPID, e.g. to emulate a guest TLB flush, return vpid01 if L2 is running but with VPID disabled, i.e. if VPID is disabled in vmcs12. Architecturally, if VPID is disabled, then the guest and host effectively share VPID=0. KVM emulates this behavior by using vpid01 when running an L2 with VPID disabled (see prepare_vmcs02_early_rare()), and so KVM must also treat vpid01 as the current VPID while L2 is active. Unconditionally treating vpid02 as the current VPID when L2 is active causes KVM to flush TLB entries for vpid02 instead of vpid01, which results in TLB entries from L1 being incorrectly preserved across nested VM-Enter to L2 (L2=>L1 isn't problematic, because the TLB flush after nested VM-Exit flushes vpid01). The bug manifests as failures in the vmx_apicv_test KVM-Unit-Test, as KVM incorrectly retains TLB entries for the APIC-access page across a nested VM-Enter. Opportunistically add comments at various touchpoints to explain the architectural requirements, and also why KVM uses vpid01 instead of vpid02. All credit goes to Chao, who root caused the issue and identified the fix. Link: https://lore.kernel.org/all/ZwzczkIlYGX+QXJz@intel.com Fixes: 2b4a5a5d5688 ("KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST") Cc: stable@vger.kernel.org Cc: Like Xu <like.xu.linux@gmail.com> Debugged-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241031202011.1580522-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
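The selection rule described in this commit message can be written down as a tiny sketch. The struct and function names below are hypothetical and are not KVM's; only the decision itself (use vpid02 only when L2 is active and VPID is enabled in vmcs12, otherwise vpid01) comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, pared-down view of the state the commit message describes. */
struct vpid_state {
	bool l2_active;			/* vCPU is currently running L2 */
	bool vmcs12_vpid_enabled;	/* L1 enabled VPID for L2 in vmcs12 */
	uint16_t vpid01;		/* VPID used while running L1 */
	uint16_t vpid02;		/* VPID used for L2 when VPID is enabled */
};

/* Return the VPID whose TLB entries a guest-initiated flush must target. */
static uint16_t current_vpid(const struct vpid_state *s)
{
	if (s->l2_active && s->vmcs12_vpid_enabled)
		return s->vpid02;

	/* L1 is running, or L2 runs with VPID disabled and so shares vpid01. */
	return s->vpid01;
}
```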
2024-11-01 | KVM: nVMX: fix canonical check of vmcs12 HOST_RIP | Maxim Levitsky
The HOST_RIP canonical check should use L1's CR4.LA57 value stored in vmcs12 rather than L1's current value, because it is legal to change the CR4.LA57 value during VM exit from L2 to L1. This is a theoretical bug though, because it is highly unlikely that a VM exit will change CR4.LA57 from the value it had on VM entry. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-5-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-01 | KVM: x86: model canonical checks more precisely | Maxim Levitsky
As a result of a recent investigation, it was determined that x86 CPUs which support 5-level paging don't always respect CR4.LA57 when doing canonical checks. In particular:
1. MSRs which contain a linear address allow a full 57-bit canonical address regardless of CR4.LA57 state. For example: MSR_KERNEL_GS_BASE.
2. All hidden segment bases and GDT/IDT bases also behave like MSRs. This means that a full 57-bit canonical address can be loaded into them regardless of CR4.LA57, both using MSRs (e.g. GS_BASE) and instructions (e.g. LGDT).
3. TLB invalidation instructions also allow the user to use a full 57-bit address regardless of CR4.LA57.
Finally, it must be noted that the CPU doesn't prevent the user from disabling 5-level paging, even when a full 57-bit canonical address is present in one of the registers mentioned above (e.g. GDT base). In fact, this can happen without any userspace help, when the CPU enters SMM mode - some MSRs, for example MSR_KERNEL_GS_BASE, are left containing an address that is non-canonical with regard to the new mode. Since most of the affected MSRs and all segment bases can be read and written freely by the guest without any KVM intervention, this patch makes the emulator closely follow hardware behavior, which means that the emulator doesn't take into account the guest CPUID support for 5-level paging, and only takes into account the host CPU support. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-4-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
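To make the distinction concrete, here is a minimal sketch of a canonical check parameterized by address width. The function names are hypothetical. The second helper models the behavior listed above, where MSR and descriptor-table-base checks depend on whether the CPU supports LA57 rather than on the current CR4.LA57 value. The sign-extension trick assumes the compiler implements right shift of a negative value as an arithmetic shift, which is true of the mainstream C compilers.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if la is canonical for a linear-address width of vaddr_bits (48 or 57). */
static bool is_canonical(uint64_t la, unsigned int vaddr_bits)
{
	unsigned int shift = 64 - vaddr_bits;

	/* Sign-extend from bit (vaddr_bits - 1) and compare with the original. */
	return (uint64_t)((int64_t)(la << shift) >> shift) == la;
}

/*
 * Hypothetical helper: the width used for MSR-style checks follows CPU
 * support for LA57, not the current paging mode.
 */
static bool msr_addr_is_canonical(uint64_t la, bool cpu_has_la57)
{
	return is_canonical(la, cpu_has_la57 ? 57 : 48);
}
```

An address with bit 47 set and all higher bits clear is therefore rejected under the 48-bit rule but accepted under the 57-bit rule.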
2024-11-01 | KVM: x86: drop x86.h include from cpuid.h | Maxim Levitsky
Drop x86.h include from cpuid.h to allow the x86.h to include the cpuid.h instead. Also fix various places where x86.h was implicitly included via cpuid.h. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240906221824.491834-2-mlevitsk@redhat.com [sean: fixup a missed include in mtrr.c] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-25 | KVM: Pass in write/dirty to kvm_vcpu_map(), not kvm_vcpu_unmap() | Sean Christopherson
Now that all kvm_vcpu_{,un}map() users pass "true" for @dirty, have them pass "true" as a @writable param to kvm_vcpu_map(), and thus create a read-only mapping when possible. Note, creating read-only mappings can be theoretically slower, as they don't play nice with fast GUP due to the need to break CoW before mapping the underlying PFN. But practically speaking, creating a mapping isn't a super hot path, and getting a writable mapping for reading is weird and confusing. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-34-seanjc@google.com>
2024-10-25 | KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping | Sean Christopherson
Mark the APIC access page as dirty when unmapping it from KVM. The fact that the page _shouldn't_ be written doesn't guarantee the page _won't_ be written. And while the contents are likely irrelevant, the values _are_ visible to the guest, i.e. dropping writes would be visible to the guest (though obviously highly unlikely to be problematic in practice). Marking the map dirty will allow specifying the write vs. read-only when *mapping* the memory, which in turn will allow creating read-only maps. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-33-seanjc@google.com>
2024-10-25 | KVM: nVMX: Add helper to put (unmap) vmcs12 pages | Sean Christopherson
Add a helper to dedup unmapping the vmcs12 pages. This will reduce the amount of churn when a future patch refactors the kvm_vcpu_unmap() API. No functional change intended. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-26-seanjc@google.com>
2024-10-25 | KVM: nVMX: Drop pointless msr_bitmap_map field from struct nested_vmx | Sean Christopherson
Remove vcpu_vmx.msr_bitmap_map and instead use an on-stack structure in the one function that uses the map, nested_vmx_prepare_msr_bitmap(). Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-25-seanjc@google.com>
2024-10-25 | KVM: nVMX: Rely on kvm_vcpu_unmap() to track validity of eVMCS mapping | Sean Christopherson
Remove the explicit evmptr12 validity check when deciding whether or not to unmap the eVMCS pointer, and instead rely on kvm_vcpu_unmap() to play nice with a NULL map->hva, i.e. to do nothing if the map is invalid. Note, vmx->nested.hv_evmcs_map is zero-allocated along with the rest of vcpu_vmx, i.e. the map starts out invalid/NULL. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-24-seanjc@google.com>
2024-09-17 | Merge tag 'kvm-x86-vmx-6.12' of https://github.com/kvm-x86/linux into HEAD | Paolo Bonzini
KVM VMX changes for 6.12:
- Set FINAL/PAGE in the page fault error code for EPT Violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical.
- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2.
- Add a lockdep assertion to detect unsafe accesses of vmcs12 structures.
- Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible).
- Minor SGX fix and a cleanup.
2024-09-09 | KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at VM-Enter | Sean Christopherson
Explicitly invalidate posted_intr_nv when emulating nested VM-Enter and posted interrupts are disabled to make it clear that posted_intr_nv is valid if and only if nested posted interrupts are enabled, and as a cheap way to harden against KVM bugs. KVM initializes posted_intr_nv to -1 at vCPU creation and resets it to -1 when unloading vmcs12 and/or leaving nested mode, i.e. this is not a bug fix (or at least, it's not intended to be a bug fix). Note, tracking nested.posted_intr_nv as a u16 subtly adds a measure of safety, as it prevents unintentionally matching KVM's informal "no IRQ" vector of -1, stored as a signed int. Because a u16 can always be represented as a signed int, the effective "invalid" value of posted_intr_nv, 65535, will be preserved as-is when comparing against an int, i.e. will be zero-extended, not sign-extended, and thus won't get a false positive if KVM is buggy and compares posted_intr_nv against -1. Opportunistically add a comment in vmx_deliver_nested_posted_interrupt() to call out that it must check vmx->nested.posted_intr_nv, not the vector in vmcs12, which is presumably the _entire_ reason nested.posted_intr_nv exists. E.g. vmcs12 is a KVM-controlled snapshot, so there are no TOCTOU races to worry about, the only potential badness is if the vCPU leaves nested and frees vmcs12 between the sender checking is_guest_mode() and dereferencing the vmcs12 pointer. Link: https://lore.kernel.org/r/20240906043413.1049633-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
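The u16-versus-int point above is ordinary C integer promotion and can be checked in isolation. The sketch below is illustrative only: NO_IRQ_VECTOR and nv_matches_vector() are hypothetical names, and it assumes int is wider than 16 bits, as on every platform Linux supports.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an informal "no IRQ" vector stored as an int. */
#define NO_IRQ_VECTOR (-1)

/*
 * posted_intr_nv is modeled as a 16-bit unsigned value. For the comparison
 * it is promoted to int by zero-extension, so an "invalid" value of 0xffff
 * becomes 65535 and can never compare equal to -1.
 */
static bool nv_matches_vector(uint16_t posted_intr_nv, int vector)
{
	return posted_intr_nv == vector;
}
```

Had the field been a signed 16-bit type, storing -1 and comparing against an int -1 would sign-extend and match, which is exactly the false positive the u16 type avoids.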
2024-09-09 | KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection | Sean Christopherson
When synthesizing a nested VM-Exit due to an external interrupt, pend a nested posted interrupt if the external interrupt vector matches L2's PI notification vector, i.e. if the interrupt is a PI notification for L2. This fixes a bug where KVM will incorrectly inject VM-Exit instead of processing the nested posted interrupt when IPI virtualization is enabled. Per the SDM, detection of the notification vector doesn't occur until the interrupt is acknowledged and delivered to the CPU core.

  If the external-interrupt exiting VM-execution control is 1, any unmasked external interrupt causes a VM exit (see Section 26.2). If the "process posted interrupts" VM-execution control is also 1, this behavior is changed and the processor handles an external interrupt as follows:
  1. The local APIC is acknowledged; this provides the processor core with an interrupt vector, called here the physical vector.
  2. If the physical vector equals the posted-interrupt notification vector, the logical processor continues to the next step. Otherwise, a VM exit occurs as it would normally due to an external interrupt; the vector is saved in the VM-exit interruption-information field.

For the most part, KVM has avoided problems because a PI NV for L2 that arrives while L2 is active will be processed by hardware, and KVM checks for a pending notification vector during nested VM-Enter. Thus, to hit the bug, the PI NV interrupt needs to sneak its way into L1's vIRR while L2 is active. Without IPI virtualization, the scenario is practically impossible to hit, modulo L1 doing weird things (see below), as the ordering between vmx_deliver_posted_interrupt() and nested VM-Enter effectively guarantees that either the sender will see the vCPU as being in_guest_mode(), or the receiver will see the interrupt in its vIRR.
With IPI virtualization, introduced by commit d588bb9be1da ("KVM: VMX: enable IPI virtualization"), the sending CPU effectively implements a rough equivalent of vmx_deliver_posted_interrupt(), sans the nested PI NV check. If the target vCPU has a valid PID, the CPU will send a PI NV interrupt based on _L1's_ PID, as the sender's IPIv table points at L1 PIDs.

  PIR := 32 bytes at PID_ADDR; // under lock
  PIR[V] := 1;
  store PIR at PID_ADDR; // release lock
  NotifyInfo := 8 bytes at PID_ADDR + 32; // under lock
  IF NotifyInfo.ON = 0 AND NotifyInfo.SN = 0; THEN
    NotifyInfo.ON := 1;
    SendNotify := 1;
  ELSE
    SendNotify := 0;
  FI;
  store NotifyInfo at PID_ADDR + 32; // release lock
  IF SendNotify = 1; THEN
    send an IPI specified by NotifyInfo.NDST and NotifyInfo.NV;
  FI;

As a result, the target vCPU ends up receiving an interrupt on KVM's POSTED_INTR_VECTOR while L2 is running, with an interrupt in L1's PIR for L2's nested PI NV. The POSTED_INTR_VECTOR interrupt triggers a VM-Exit from L2 to L0, KVM moves the interrupt from L1's PIR to vIRR, triggers a KVM_REQ_EVENT prior to re-entry to L2, and calls vmx_check_nested_events(), effectively bypassing all of KVM's "early" checks on nested PI NV. Without IPI virtualization, the bug can likely be hit only if L1 programs an assigned device to _post_ an interrupt to L2's notification vector, by way of L1's PID.PIR. Doing so would allow the interrupt to get into L1's vIRR without KVM checking vmcs12's NV. Which is architecturally allowed, but unlikely behavior for a hypervisor. Cc: Zeng Guang <guang.zeng@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20240906043413.1049633-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
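The SDM pseudocode quoted in this commit message maps onto a few lines of C. The sketch below is a simplified model, not KVM's struct pi_desc: the struct layout and the names pi_desc_sketch and pi_post_vector are assumptions made for illustration, and the locked (atomic) updates that hardware performs are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified posted-interrupt descriptor; the layout here is illustrative. */
struct pi_desc_sketch {
	uint32_t pir[8];	/* 256 posted-interrupt request bits */
	bool on;		/* outstanding notification */
	bool sn;		/* suppress notification */
	uint8_t nv;		/* notification vector */
	uint32_t ndst;		/* notification destination */
};

/*
 * Post vector v and report whether a notification IPI should be sent.
 * Real hardware performs these updates atomically; this sketch does not.
 */
static bool pi_post_vector(struct pi_desc_sketch *pid, uint8_t v)
{
	/* PIR[V] := 1 */
	pid->pir[v / 32] |= UINT32_C(1) << (v % 32);

	/* Notify only if no notification is outstanding or suppressed. */
	if (!pid->on && !pid->sn) {
		pid->on = true;
		return true;	/* caller sends an IPI to ndst using vector nv */
	}
	return false;
}
```

Note that nothing in this sequence consults vmcs12's notification vector, which is the gap the commit message describes for the IPI virtualization case.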
2024-09-09 | KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ | Sean Christopherson
In the should-be-impossible scenario that kvm_cpu_get_interrupt() doesn't return a valid vector after checking kvm_cpu_has_interrupt(), skip VM-Exit injection to reduce the probability of crashing/confusing L1. Now that KVM gets the IRQ _before_ calling nested_vmx_vmexit(), squashing the VM-Exit injection is trivial since there are no actions that need to be undone. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20240906043413.1049633-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09 | KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site | Sean Christopherson
Move the logic to get the to-be-acknowledged IRQ for a nested VM-Exit from nested_vmx_vmexit() to vmx_check_nested_events(), which is subtly the one and only path where KVM invokes nested_vmx_vmexit() with EXIT_REASON_EXTERNAL_INTERRUPT. A future fix will perform a last-minute check on L2's nested posted interrupt notification vector, just before injecting a nested VM-Exit. To handle that scenario correctly, KVM needs to get the interrupt _before_ injecting VM-Exit, as simply querying the highest priority interrupt, via kvm_cpu_has_interrupt(), would result in a TOCTOU bug, as a new, higher priority interrupt could arrive between kvm_cpu_has_interrupt() and kvm_cpu_get_interrupt(). Unfortunately, simply moving the call to kvm_cpu_get_interrupt() doesn't suffice, as a VMWRITE to GUEST_INTERRUPT_STATUS.SVI is hiding in kvm_get_apic_interrupt(), and acknowledging the interrupt before nested VM-Exit would cause the VMWRITE to hit vmcs02 instead of vmcs01. Open code a rough equivalent to kvm_cpu_get_interrupt() so that the IRQ is acknowledged after emulating VM-Exit, taking care to avoid the TOCTOU issue described above. Opportunistically convert the WARN_ON() to a WARN_ON_ONCE(). If KVM has a bug that results in a false positive from kvm_cpu_has_interrupt(), spamming dmesg won't help the situation. Note, nested_vmx_reflect_vmexit() can never reflect external interrupts as they are always "wanted" by L0. Link: https://lore.kernel.org/r/20240906043413.1049633-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
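The TOCTOU hazard and its fix can be sketched with a toy interrupt model. Everything here is hypothetical (a 64-vector bitmap and two helper functions), not KVM's APIC code. The idea it demonstrates is to select the vector once, make the decision on that vector, and later acknowledge that same vector instead of re-querying for "the highest pending" one.

```c
#include <stdint.h>

/* Hypothetical pending-interrupt set: one bit per vector, 0..63 for brevity. */
struct irq_state {
	uint64_t pending;
};

/* Return the highest pending vector, or -1 if nothing is pending. */
static int highest_pending(const struct irq_state *s)
{
	for (int v = 63; v >= 0; v--)
		if (s->pending & (UINT64_C(1) << v))
			return v;
	return -1;
}

/* Acknowledge a specific, previously selected vector. */
static void ack_vector(struct irq_state *s, int v)
{
	s->pending &= ~(UINT64_C(1) << v);
}
```

If a higher-priority vector arrives between selection and acknowledgment, acknowledging by the saved vector leaves the newcomer pending, whereas a second "get highest" query would silently consume a different interrupt than the one the decision was based on.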
2024-08-22 | KVM: nVMX: Use vmx_segment_cache_clear() instead of open coded equivalent | Maxim Levitsky
In prepare_vmcs02_rare(), call vmx_segment_cache_clear() instead of setting segment_cache.bitmask directly. Using the helper minimizes the chances of prepare_vmcs02_rare() doing the wrong thing in the future, e.g. if KVM ends up doing more than just zero the bitmask when purging the cache. No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240725175232.337266-2-mlevitsk@redhat.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-22 | KVM: nVMX: Honor userspace MSR filter lists for nested VM-Enter/VM-Exit | Sean Christopherson
Synthesize a consistency check VM-Exit (VM-Enter) or VM-Abort (VM-Exit) if L1 attempts to load/store an MSR via the VMCS MSR lists that userspace has disallowed access to via an MSR filter. Intel already disallows including a handful of "special" MSRs in the VMCS lists, so denying access isn't completely without precedent. More importantly, the behavior is well-defined _and_ can be communicated to the end user, e.g. to the customer that owns a VM running as L1 on top of KVM. On the other hand, ignoring userspace MSR filters is all but guaranteed to result in unexpected behavior as the access will hit KVM's internal state, which is likely not up-to-date. Unlike KVM-internal accesses, instruction emulation, and dedicated VMCS fields, the MSRs in the VMCS load/store lists are 100% guest controlled, thus making it all but impossible to reason about the correctness of ignoring the MSR filter. And if userspace *really* wants to deny access to MSRs via the aforementioned scenarios, userspace can hide the associated feature from the guest, e.g. by disabling the PMU to prevent accessing PERF_GLOBAL_CTRL via its VMCS field. But for the MSR lists, KVM is blindly processing MSRs; the MSR filters are the _only_ way for userspace to deny access. This partially reverts commit ac8d6cad3c7b ("KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr"). Cc: Hou Wenlong <houwenlong.hwl@antgroup.com> Cc: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20240722235922.3351122-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
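As a rough illustration of what honoring a filter on an MSR list means, the sketch below walks a guest-provided list and stops at the first entry the filter rejects. All names are hypothetical, the filter is a plain callback rather than KVM's MSR filter machinery, and the convention of returning the 1-based index of the failing entry is an assumption made so that a caller could report which entry failed.

```c
#include <stdbool.h>
#include <stdint.h>

/* One entry of a guest-controlled MSR load/store list (simplified). */
struct msr_list_entry {
	uint32_t index;
	uint64_t value;
};

/* Example filter, purely for illustration: deny IA32_PERF_GLOBAL_CTRL (0x38f). */
static bool example_filter_allows(uint32_t index)
{
	return index != 0x38f;
}

/*
 * Return 0 if every entry passes the filter, else the 1-based position of
 * the first rejected entry. Applying the values is omitted from the sketch.
 */
static uint32_t check_msr_load_list(const struct msr_list_entry *list,
				    uint32_t count,
				    bool (*allowed)(uint32_t index))
{
	for (uint32_t i = 0; i < count; i++)
		if (!allowed(list[i].index))
			return i + 1;
	return 0;
}
```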
2024-08-22 | KVM: nVMX: Use macros and #defines in vmx_restore_vmx_misc() | Xin Li
Use macros in vmx_restore_vmx_misc() instead of open coding everything using BIT_ULL() and GENMASK_ULL(). Opportunistically split feature bits and reserved bits into separate variables, and add a comment explaining the subset logic (it's not immediately obvious that the set of feature bits is NOT the set of _supported_ feature bits). Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog, drop #defines] Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
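The "subset logic" mentioned above can be sketched as a masked bitwise-subset test: a value supplied by userspace is acceptable only if, within the feature bits, it sets nothing that the supported value does not also set. The function names below are chosen for illustration, and restore_allowed() is a hypothetical wrapper, not the body of vmx_restore_vmx_misc().

```c
#include <stdbool.h>
#include <stdint.h>

/* True if, within mask, every bit set in subset is also set in superset. */
static bool is_bitwise_subset(uint64_t superset, uint64_t subset, uint64_t mask)
{
	superset &= mask;
	subset &= mask;

	return (superset | subset) == superset;
}

/*
 * Hypothetical wrapper: userspace may clear feature bits that are
 * supported, but may not set feature bits that are not.
 */
static bool restore_allowed(uint64_t supported, uint64_t requested,
			    uint64_t feature_bits)
{
	return is_bitwise_subset(supported, requested, feature_bits);
}
```

The mask matters because the set of bits treated as features is not the same as the set of supported features: bits outside the mask are ignored by this particular check and would need their own handling.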
2024-08-22KVM VMX: Move MSR_IA32_VMX_MISC bit defines to asm/vmx.hSean Christopherson
Move the handful of MSR_IA32_VMX_MISC bit defines that are currently in msr-index.h to vmx.h so that all of the VMX_MISC defines and wrappers can be found in a single location. Opportunistically use BIT_ULL() instead of open coding hex values, add defines for feature bits that are architecturally defined, and move the defines down in the file so that they are colocated with the helpers for getting fields from VMX_MISC. No functional change intended. Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog] Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-22KVM: nVMX: Add a helper to encode VMCS info in MSR_IA32_VMX_BASICSean Christopherson
Add a helper to encode the VMCS revision, size, and supported memory types in MSR_IA32_VMX_BASIC, i.e. when synthesizing KVM's supported BASIC MSR value, and delete the now unused VMCS size and memtype shift macros. For a variety of reasons, KVM has shifted (pun intended) to using helpers to *get* information from the VMX MSRs, as opposed to defined MASK and SHIFT macros for direct use. Provide a similar helper for the nested VMX code, which needs to *set* information, so that KVM isn't left with a mix of SHIFT macros and dedicated helpers. Reported-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
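A "set" helper of the kind described above might look like the sketch below. It is a hedged illustration only: the `sketch_` names are hypothetical and not the kernel's identifiers. The field positions follow the IA32_VMX_BASIC layout in the Intel SDM (revision in bits 30:0, VMCS region size in bits 44:32, memory type in bits 53:50), and the two getters mirror the "helpers to *get* information" style mentioned in the changelog.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_VMX_BASIC field layout per the Intel SDM; names are illustrative. */
#define SKETCH_VMX_BASIC_REVISION_MASK	0x7fffffffULL	/* bits 30:0  */
#define SKETCH_VMX_BASIC_SIZE_SHIFT	32		/* bits 44:32 */
#define SKETCH_VMX_BASIC_SIZE_MASK	0x1fffULL
#define SKETCH_VMX_BASIC_MEMTYPE_SHIFT	50		/* bits 53:50 */
#define SKETCH_VMX_BASIC_MEMTYPE_MASK	0xfULL

/* Bit 31 of the MSR is always zero, so the revision must fit in 31 bits. */
static_assert(SKETCH_VMX_BASIC_REVISION_MASK == (1ULL << 31) - 1,
	      "revision field is bits 30:0");

/* Pack revision, VMCS size, and memory type into one 64-bit MSR value. */
static inline uint64_t sketch_vmx_basic_encode(uint32_t revision, uint32_t size,
					       uint32_t memtype)
{
	return ((uint64_t)revision & SKETCH_VMX_BASIC_REVISION_MASK) |
	       (((uint64_t)size & SKETCH_VMX_BASIC_SIZE_MASK) <<
		SKETCH_VMX_BASIC_SIZE_SHIFT) |
	       (((uint64_t)memtype & SKETCH_VMX_BASIC_MEMTYPE_MASK) <<
		SKETCH_VMX_BASIC_MEMTYPE_SHIFT);
}

/* Matching getters, so callers never touch the shifts directly. */
static inline uint32_t sketch_vmx_basic_size(uint64_t basic)
{
	return (basic >> SKETCH_VMX_BASIC_SIZE_SHIFT) & SKETCH_VMX_BASIC_SIZE_MASK;
}

static inline uint32_t sketch_vmx_basic_memtype(uint64_t basic)
{
	return (basic >> SKETCH_VMX_BASIC_MEMTYPE_SHIFT) &
	       SKETCH_VMX_BASIC_MEMTYPE_MASK;
}
```

With a single encode helper, the shift and mask macros stay private to one place, which is the mix of "SHIFT macros and dedicated helpers" the changelog wants to avoid.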
2024-08-22KVM: nVMX: Use macros and #defines in vmx_restore_vmx_basic()Xin Li
Use macros in vmx_restore_vmx_basic() instead of open coding everything using BIT_ULL() and GENMASK_ULL(). Opportunistically split feature bits and reserved bits into separate variables, and add a comment explaining the subset logic (it's not immediately obvious that the set of feature bits is NOT the set of _supported_ feature bits). Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog, drop #defines] Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-22x86/cpu: KVM: Add common defines for architectural memory types (PAT, MTRRs, etc.)Sean Christopherson
Add defines for the architectural memory types that can be shoved into various MSRs and registers, e.g. MTRRs, PAT, VMX capabilities MSRs, EPTPs, etc. While most MSRs/registers support only a subset of all memory types, the values themselves are architectural and identical across all users. Leave the goofy MTRR_TYPE_* definitions as-is since they are in a uapi header, but add compile-time assertions to connect the dots (and sanity check that the msr-index.h values didn't get fat-fingered). Keep the VMX_EPTP_MT_* defines so that it's slightly more obvious that the EPTP holds a single memory type in 3 of its 64 bits; those bits just happen to be 2:0, i.e. don't need to be shifted. Opportunistically use X86_MEMTYPE_WB instead of an open coded '6' in setup_vmcs_config(). No functional change intended. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
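As a rough illustration of the approach described above, the sketch below defines one set of architectural memory type values and ties a second, legacy-style set to it with compile-time assertions. The numeric encodings are the architectural ones (UC=0, WC=1, WT=4, WP=5, WB=6, UC-=7); all `SKETCH_` names are hypothetical stand-ins and are not the identifiers used in the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 memory type encodings (illustrative names). */
#define SKETCH_MEMTYPE_UC	0
#define SKETCH_MEMTYPE_WC	1
#define SKETCH_MEMTYPE_WT	4
#define SKETCH_MEMTYPE_WP	5
#define SKETCH_MEMTYPE_WB	6
#define SKETCH_MEMTYPE_UC_MINUS	7

/* Stand-ins for a legacy set of defines that cannot be renamed or removed. */
#define SKETCH_MTRR_TYPE_UNCACHABLE	0
#define SKETCH_MTRR_TYPE_WRCOMB		1
#define SKETCH_MTRR_TYPE_WRTHROUGH	4
#define SKETCH_MTRR_TYPE_WRPROT		5
#define SKETCH_MTRR_TYPE_WRBACK		6

/* Connect the dots at build time instead of trusting both lists to agree. */
static_assert(SKETCH_MEMTYPE_UC == SKETCH_MTRR_TYPE_UNCACHABLE, "UC mismatch");
static_assert(SKETCH_MEMTYPE_WC == SKETCH_MTRR_TYPE_WRCOMB, "WC mismatch");
static_assert(SKETCH_MEMTYPE_WT == SKETCH_MTRR_TYPE_WRTHROUGH, "WT mismatch");
static_assert(SKETCH_MEMTYPE_WP == SKETCH_MTRR_TYPE_WRPROT, "WP mismatch");
static_assert(SKETCH_MEMTYPE_WB == SKETCH_MTRR_TYPE_WRBACK, "WB mismatch");

/* The EPTP keeps its memory type in bits 2:0, so no shift is needed. */
#define SKETCH_EPTP_MT_MASK	0x7ULL

static inline unsigned int sketch_eptp_memtype(uint64_t eptp)
{
	return (unsigned int)(eptp & SKETCH_EPTP_MT_MASK);
}
```

If one of the two lists were mistyped, the build would fail at the corresponding assertion rather than silently programming the wrong memory type.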
2024-07-16Merge tag 'kvm-x86-vmx-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM VMX changes for 6.11 - Remove an unnecessary EPT TLB flush when enabling hardware. - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake events for a vCPU executing HLT in L2 (with HLT-exiting disabled by L1). - Misc cleanups
2024-07-16Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 misc changes for 6.11 - Add a global struct to consolidate tracking of host values, e.g. EFER, and move "shadow_phys_bits" into the structure as "maxphyaddr". - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX. - Print the name of the APICv/AVIC inhibits in the relevant tracepoint. - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor. - Misc cleanups
2024-06-28KVM: nVMX: Fold requested virtual interrupt check into has_nested_events()Sean Christopherson
Check for a Requested Virtual Interrupt, i.e. a virtual interrupt that is pending delivery, in vmx_has_nested_events() and drop the one-off kvm_x86_ops.guest_apic_has_interrupt() hook. In addition to dropping a superfluous hook, this fixes a bug where KVM would incorrectly treat virtual interrupts _for L2_ as always enabled due to kvm_arch_interrupt_allowed(), by way of vmx_interrupt_blocked(), treating IRQs as enabled if L2 is active and vmcs12 is configured to exit on IRQs, i.e. KVM would treat a virtual interrupt for L2 as a valid wake event based on L1's IRQ blocking status. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-28KVM: nVMX: Check for pending posted interrupts when looking for nested eventsSean Christopherson
Check for pending (and notified!) posted interrupts when checking if L2 has a pending wake event, as a fully posted/notified virtual interrupt is a valid wake event for HLT. Note that KVM must check vmx->nested.pi_pending to avoid prematurely waking L2, e.g. even if KVM sees a non-zero PID.PIR and PID.ON=1, the virtual interrupt won't actually be recognized until a notification IRQ is received by the vCPU or the vCPU does (nested) VM-Enter. Fixes: 26844fee6ade ("KVM: x86: never write to memory from kvm_vcpu_check_block()") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Reported-by: Jim Mattson <jmattson@google.com> Closes: https://lore.kernel.org/all/20231207010302.2240506-1-jmattson@google.com Link: https://lore.kernel.org/r/20240607172609.3205077-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-28KVM: nVMX: Request immediate exit iff pending nested event needs injectionSean Christopherson
When requesting an immediate exit from L2 in order to inject a pending event, do so only if the pending event actually requires manual injection, i.e. if and only if KVM actually needs to regain control in order to deliver the event. Avoiding the "immediate exit" isn't simply an optimization, it's necessary to make forward progress, as the "already expired" VMX preemption timer trick that KVM uses to force a VM-Exit has higher priority than events that aren't directly injected. At present time, this is a glorified nop as all events processed by vmx_has_nested_events() require injection, but that will not hold true in the future, e.g. if there's a pending virtual interrupt in vmcs02.RVI. I.e. if KVM is trying to deliver a virtual interrupt to L2, the expired VMX preemption timer will trigger VM-Exit before the virtual interrupt is delivered, and KVM will effectively hang the vCPU in an endless loop of forced immediate VM-Exits (because the pending virtual interrupt never goes away). Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-28KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vectorSean Christopherson
Add a helper to retrieve the highest pending vector given a Posted Interrupt descriptor. While the actual operation is straightforward, it's surprisingly easy to mess up, e.g. if one tries to reuse lapic.c's find_highest_vector(), which doesn't work with PID.PIR due to the APIC's IRR and ISR component registers being physically discontiguous (they're 4-byte registers aligned at 16-byte intervals). To make PIR handling more consistent with respect to IRR and ISR handling, return -1 to indicate "no interrupt pending". Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
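The operation described above can be sketched as follows, assuming the PIR is modeled as eight physically contiguous 32-bit words covering vectors 0 through 255. The function name and the array representation are hypothetical, not the kernel's. The contiguity is the point: a scan written for the APIC's IRR/ISR layout, where each 32-bit chunk sits at a 16-byte stride, would read the wrong words here.

```c
#include <stdint.h>

#define SKETCH_PIR_WORDS	8	/* 8 * 32 bits = 256 vectors */

/*
 * Return the highest vector whose bit is set in the (contiguous) PIR,
 * or -1 if no interrupt is pending, matching the "-1 means none"
 * convention mentioned in the changelog.
 */
static inline int sketch_pir_highest_vector(const uint32_t pir[SKETCH_PIR_WORDS])
{
	for (int word = SKETCH_PIR_WORDS - 1; word >= 0; word--) {
		if (!pir[word])
			continue;

		/* Find the most significant set bit in this word. */
		int bit = 31;
		while (!(pir[word] & (UINT32_C(1) << bit)))
			bit--;

		return word * 32 + bit;
	}

	return -1;
}
```

For example, with only bit 3 of word 1 set, the function returns vector 35 (1 * 32 + 3).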
2024-06-03KVM: x86: Add a struct to consolidate host values, e.g. EFER, XCR0, etc...Sean Christopherson
Add "struct kvm_host_values kvm_host" to hold the various host values that KVM snapshots during initialization. Bundling the host values into a single struct simplifies adding new MSRs and other features with host state/values that KVM cares about, and provides a one-stop shop. E.g. adding a new value requires one line, whereas tracking each value individually often requires three: declaration, definition, and export. No functional change intended. Link: https://lore.kernel.org/r/20240423221521.2923759-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-05-23KVM: nVMX: Always handle #VEs in L0 (never forward #VEs from L2 to L1)Sean Christopherson
Always handle #VEs, e.g. due to prove EPT Violation #VE failures, in L0, as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs are KVM's responsibility. Fixes: 8131cf5b4fd8 ("KVM: VMX: Introduce test mode related to EPT violation VE") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-23KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE supportSean Christopherson
Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE enabled and a VE info address pointing at pfn 0. Fixes: 8131cf5b4fd8 ("KVM: VMX: Introduce test mode related to EPT violation VE") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-09KVM: nVMX: Add a sanity check that nested PML Full stems from EPT ViolationsSean Christopherson
Add a WARN_ON_ONCE() sanity check to verify that a nested PML Full VM-Exit is only synthesized when the original VM-Exit from L2 was an EPT Violation. While KVM can fallthrough to kvm_mmu_do_page_fault() if an EPT Misconfig occurs on a stale MMIO SPTE, KVM should not treat the access as a write (there isn't enough information to know *what* the access was), i.e. KVM should never try to insert a PML entry in that case. Link: https://lore.kernel.org/r/20240209221700.393189-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: x86: Move nEPT exit_qualification field from kvm_vcpu_arch to x86_exceptionSean Christopherson
Move the exit_qualification field that is used to track information about in-flight nEPT violations from "struct kvm_vcpu_arch" to "x86_exception", i.e. associate the information with the actual nEPT violation instead of the vCPU. To handle bits that are pulled from vmcs.EXIT_QUALIFICATION, i.e. that are propagated from the "original" EPT violation VM-Exit, simply grab them from the VMCS on-demand when injecting a nEPT Violation or a PML Full VM-exit. Aside from being ugly, having an exit_qualification field in kvm_vcpu_arch is outright dangerous, e.g. see commit d7f0a00e438d ("KVM: VMX: Report up-to-date exit qualification to userspace"). Opportunistically add a comment to call out that PML Full and EPT Violation VM-Exits use the same bit to report NMI blocking information. Link: https://lore.kernel.org/r/20240209221700.393189-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09KVM: nVMX: Clear EXIT_QUALIFICATION when injecting an EPT MisconfigSean Christopherson
Explicitly clear the EXIT_QUALIFICATION field when injecting an EPT misconfig into L1, as required by the VMX architecture. Per the SDM: This field is saved for VM exits due to the following causes: debug exceptions; page-fault exceptions; start-up IPIs (SIPIs); system-management interrupts (SMIs) that arrive immediately after the execution of I/O instructions; task switches; INVEPT; INVLPG; INVPCID; INVVPID; LGDT; LIDT; LLDT; LTR; SGDT; SIDT; SLDT; STR; VMCLEAR; VMPTRLD; VMPTRST; VMREAD; VMWRITE; VMXON; WBINVD; WBNOINVD; XRSTORS; XSAVES; control-register accesses; MOV DR; I/O instructions; MWAIT; accesses to the APIC-access page; EPT violations; EOI virtualization; APIC-write emulation; page-modification log full; SPP-related events; and instruction timeout. For all other VM exits, this field is cleared. Generating EXIT_QUALIFICATION from vcpu->arch.exit_qualification is wrong for all (two) paths that lead to nested_ept_inject_page_fault(). For EPT violations (the common case), vcpu->arch.exit_qualification will have been set by handle_ept_violation() to vmcs02.EXIT_QUALIFICATION, i.e. contains the information of an EPT violation and thus is likely non-zero. For an EPT misconfig, which can reach FNAME(walk_addr_generic) and thus inject a nEPT misconfig if KVM created an MMIO SPTE that became stale, vcpu->arch.exit_qualification will hold the information from the last EPT violation VM-Exit, as vcpu->arch.exit_qualification is _only_ written by handle_ept_violation(). Fixes: 4704d0befb07 ("KVM: nVMX: Exiting from L2 to L1") Link: https://lore.kernel.org/r/20240209221700.393189-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-03-11Merge tag 'kvm-x86-pmu-6.9' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 PMU changes for 6.9: - Fix several bugs where KVM speciously prevents the guest from utilizing fixed counters and architectural event encodings based on whether or not guest CPUID reports support for the _architectural_ encoding. - Fix a variety of bugs in KVM's emulation of RDPMC, e.g. for "fast" reads, priority of VMX interception vs #GP, PMC types in architectural PMUs, etc. - Add a selftest to verify KVM correctly emulates RDPMC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID, i.e. are difficult to validate via KVM-Unit-Tests. - Zero out PMU metadata on AMD if the virtual PMU is disabled to avoid wasting cycles, e.g. when checking if a PMC event needs to be synthesized when skipping an instruction. - Optimize triggering of emulated events, e.g. for "count instructions" events when skipping an instruction, which yields a ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest. - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit.
2024-02-22KVM: x86: Open code all direct reads to guest DR6 and DR7Sean Christopherson
Bite the bullet, and open code all direct reads of DR6 and DR7. KVM currently has a mix of open coded accesses and calls to kvm_get_dr(), which is confusing and ugly because there's no rhyme or reason as to why any particular chunk of code uses kvm_get_dr(). The obvious alternative is to force all accesses through kvm_get_dr(), but it's not at all clear that doing so would be a net positive, e.g. even if KVM ends up wanting/needing to force all reads through a common helper, e.g. to play caching games, the cost of reverting this change is likely lower than the ongoing cost of maintaining weird, arbitrary code. No functional change intended. Cc: Mathias Krause <minipli@grsecurity.net> Reviewed-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20240209220752.388160-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-22KVM: x86: Make kvm_get_dr() return a value, not use an out parameterSean Christopherson
Convert kvm_get_dr()'s output parameter to a return value, and clean up most of the mess that was created by forcing callers to provide a pointer. No functional change intended. Acked-by: Mathias Krause <minipli@grsecurity.net> Reviewed-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20240209220752.388160-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01KVM: x86/pmu: Snapshot event selectors that KVM emulates in softwareSean Christopherson
Snapshot the event selectors for the events that KVM emulates in software, which is currently instructions retired and branch instructions retired. The event selectors are tied to the underlying CPU, i.e. are constant for a given platform even though perf doesn't manage the mappings as such. Getting the event selectors from perf isn't exactly cheap, especially if mitigations are enabled, as at least one indirect call is involved. Snapshot the values in KVM instead of optimizing perf as working with the raw event selectors will be required if KVM ever wants to emulate events that aren't part of perf's uABI, i.e. that don't have an "enum perf_hw_id" entry. Link: https://lore.kernel.org/r/20231110022857.1273836-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>