summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/lapic.h
AgeCommit message (Collapse)Author
6 daysMerge tag 'kvm-x86-apic-6.17' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM local APIC changes for 6.17 Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC.
2025-07-10x86/apic: KVM: Move lapic set/clear_vector() helpers to common codeNeeraj Upadhyay
Move apic_clear_vector() and apic_set_vector() helper functions to apic.h in order to reuse them in the Secure AVIC guest APIC driver in later patches to atomically set/clear vectors in the APIC backing page. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-13-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10x86/apic: KVM: Move lapic get/set helpers to common codeNeeraj Upadhyay
Move the apic_get_reg(), apic_set_reg(), apic_get_reg64() and apic_set_reg64() helper functions to apic.h in order to reuse them in the Secure AVIC guest APIC driver in later patches to read/write registers from/to the APIC backing page. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-12-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Rename lapic set/clear vector helpersNeeraj Upadhyay
In preparation for moving kvm-internal kvm_lapic_set_vector(), kvm_lapic_clear_vector() to apic.h for use in Secure AVIC APIC driver, rename them as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-10-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Rename lapic get/set_reg() helpersNeeraj Upadhyay
In preparation for moving kvm-internal __kvm_lapic_set_reg(), __kvm_lapic_get_reg() to apic.h for use in Secure AVIC APIC driver, rename them as part of the APIC API. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-8-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Change lapic regs base address to void pointerNeeraj Upadhyay
Change APIC base address from "char *" to "void *" in KVM lapic's set/get helper functions. Pointer arithmetic for "void *" and "char *" operate identically. With "void *" there is less of a chance of doing the wrong thing, e.g. neglecting to cast and reading a byte instead of the desired APIC register size. No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-6-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Rename VEC_POS/REG_POS macro usagesNeeraj Upadhyay
In preparation for moving most of the KVM's lapic helpers which use VEC_POS/REG_POS macros to common APIC header for use in Secure AVIC APIC driver, rename all VEC_POS/REG_POS macro usages to APIC_VECTOR_TO_BIT_NUMBER/APIC_VECTOR_TO_REG_OFFSET and remove VEC_POS/REG_POS. While at it, clean up line wrap in find_highest_vector(). No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250709033242.267892-5-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10x86/apic: KVM: Deduplicate APIC vector => register+bit mathSean Christopherson
Consolidate KVM's {REG,VEC}_POS() macros and lapic_vector_set_in_irr()'s open coded equivalent logic in anticipation of the kernel gaining more usage of vector => reg+bit lookups. Use lapic_vector_set_in_irr()'s math as using divides for both the bit number and register offset makes it easier to connect the dots, and for at least one user, fixup_irqs(), "/ 32 * 0x10" generates ever so slightly better code with gcc-14 (shaves a whole 3 bytes from the code stream): ((v) >> 5) << 4: c1 ef 05 shr $0x5,%edi c1 e7 04 shl $0x4,%edi 81 c7 00 02 00 00 add $0x200,%edi (v) / 32 * 0x10: c1 ef 05 shr $0x5,%edi 83 c7 20 add $0x20,%edi c1 e7 04 shl $0x4,%edi Keep KVM's tersely named macros as "wrappers" to avoid unnecessary churn in KVM, and because the shorter names yield more readable code overall in KVM. The new macros type cast the vector parameter to "unsigned int". This is required from better code generation for cases where an "int" is passed to these macros in KVM code. int v; ((v) >> 5) << 4: c1 f8 05 sar $0x5,%eax c1 e0 04 shl $0x4,%eax ((v) / 32 * 0x10): 85 ff test %edi,%edi 8d 47 1f lea 0x1f(%rdi),%eax 0f 49 c7 cmovns %edi,%eax c1 f8 05 sar $0x5,%eax c1 e0 04 shl $0x4,%eax ((unsigned int)(v) / 32 * 0x10): c1 f8 05 sar $0x5,%eax c1 e0 04 shl $0x4,%eax (v) & (32 - 1): 89 f8 mov %edi,%eax 83 e0 1f and $0x1f,%eax (v) % 32 89 fa mov %edi,%edx c1 fa 1f sar $0x1f,%edx c1 ea 1b shr $0x1b,%edx 8d 04 17 lea (%rdi,%rdx,1),%eax 83 e0 1f and $0x1f,%eax 29 d0 sub %edx,%eax (unsigned int)(v) % 32: 89 f8 mov %edi,%eax 83 e0 1f and $0x1f,%eax Overall kvm.ko text size is impacted if "unsigned int" is not used. Bin Orig New (w/o unsigned int) New (w/ unsigned int) lapic.o 28580 28772 28580 kvm.o 670810 671002 670810 kvm.ko 708079 708271 708079 No functional change intended. [Neeraj: Type cast vec macro param to "unsigned int", provide data in commit log on "unsigned int" requirement] Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-4-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-10KVM: x86: Remove redundant parentheses around 'bitmap'Neeraj Upadhyay
When doing pointer arithmetic in apic_test_vector() and kvm_lapic_{set|clear}_vector(), remove the unnecessary parentheses surrounding the 'bitmap' parameter. No functional change intended. Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Link: https://lore.kernel.org/r/20250709033242.267892-3-Neeraj.Upadhyay@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Move definition of X2APIC_MSR() to lapic.hSean Christopherson
Dedup the definition of X2APIC_MSR and put it in the local APIC code where it belongs. No functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250610225737.156318-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24x86/irq: KVM: Track PIR bitmap as an "unsigned long" arraySean Christopherson
Track the PIR bitmap in posted interrupt descriptor structures as an array of unsigned longs instead of using unionized arrays for KVM (u32s) versus IRQ management (u64s). In practice, because the non-KVM usage is (sanely) restricted to 64-bit kernels, all existing usage of the u64 variant is already working with unsigned longs. Using "unsigned long" for the array will allow reworking KVM's processing of the bitmap to read/write in 64-bit chunks on 64-bit kernels, i.e. will allow optimizing KVM by reducing the number of atomic accesses to PIR. Opportunstically replace the open coded literals in the posted MSIs code with the appropriate macro. Deliberately don't use ARRAY_SIZE() in the for-loops, even though it would be cleaner from a certain perspective, in anticipation of decoupling the processing from the array declaration. No functional change intended. Link: https://lore.kernel.org/r/20250401163447.846608-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-03-14KVM: TDX: Add support for find pending IRQ in a protected local APICSean Christopherson
Add flag and hook to KVM's local APIC management to support determining whether or not a TDX guest has a pending IRQ. For TDX vCPUs, the virtual APIC page is owned by the TDX module and cannot be accessed by KVM. As a result, registers that are virtualized by the CPU, e.g. PPR, cannot be read or written by KVM. To deliver interrupts for TDX guests, KVM must send an IRQ to the CPU on the posted interrupt notification vector. And to determine if TDX vCPU has a pending interrupt, KVM must check if there is an outstanding notification. Return "no interrupt" in kvm_apic_has_interrupt() if the guest APIC is protected to short-circuit the various other flows that try to pull an IRQ out of the vAPIC, the only valid operation is querying _if_ an IRQ is pending, KVM can't do anything based on _which_ IRQ is pending. Intentionally omit sanity checks from other flows, e.g. PPR update, so as not to degrade non-TDX guests with unnecessary checks. A well-behaved KVM and userspace will never reach those flows for TDX guests, but reaching them is not fatal if something does go awry. For the TD exits not due to HLT TDCALL, skip checking RVI pending in tdx_protected_apic_has_interrupt(). Except for the guest being stupid (e.g., non-HLT TDCALL in an interrupt shadow), it's not even possible to have an interrupt in RVI that is fully unmasked. There is no any CPU flows that modify RVI in the middle of instruction execution. I.e. if RVI is non-zero, then either the interrupt has been pending since before the TD exit, or the instruction caused the TD exit is in an STI/SS shadow. KVM doesn't care about STI/SS shadows outside of the HALTED case. And if the interrupt was pending before TD exit, then it _must_ be blocked, otherwise the interrupt would have been serviced at the instruction boundary. For the HLT TDCALL case, it will be handled in a future patch when HLT TDCALL is supported. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Message-ID: <20250222014757.897978-2-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-19KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VIDChao Gao
If KVM emulates an EOI for L1's virtual APIC while L2 is active, defer updating GUEST_INTERUPT_STATUS.SVI, i.e. the VMCS's cache of the highest in-service IRQ, until L1 is active, as vmcs01, not vmcs02, needs to track vISR. The missed SVI update for vmcs01 can result in L1 interrupts being incorrectly blocked, e.g. if there is a pending interrupt with lower priority than the interrupt that was EOI'd. This bug only affects use cases where L1's vAPIC is effectively passed through to L2, e.g. in a pKVM scenario where L2 is L1's depriveleged host, as KVM will only emulate an EOI for L1's vAPIC if Virtual Interrupt Delivery (VID) is disabled in vmc12, and L1 isn't intercepting L2 accesses to its (virtual) APIC page (or if x2APIC is enabled, the EOI MSR). WARN() if KVM updates L1's ISR while L2 is active with VID enabled, as an EOI from L2 is supposed to affect L2's vAPIC, but still defer the update, to try to keep L1 alive. Specifically, KVM forwards all APICv-related VM-Exits to L1 via nested_vmx_l1_wants_exit(): case EXIT_REASON_APIC_ACCESS: case EXIT_REASON_APIC_WRITE: case EXIT_REASON_EOI_INDUCED: /* * The controls for "virtualize APIC accesses," "APIC- * register virtualization," and "virtual-interrupt * delivery" only come from vmcs12. */ return true; Fixes: c7c9c56ca26f ("x86, apicv: add virtual interrupt delivery support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/kvm/20230312180048.1778187-1-jason.cj.chen@intel.com Reported-by: Markku Ahvenjärvi <mankku@gmail.com> Closes: https://lore.kernel.org/all/20240920080012.74405-1-mankku@gmail.com Cc: Janne Karhunen <janne.karhunen@gmail.com> Signed-off-by: Chao Gao <chao.gao@intel.com> [sean: drop request, handle in VMX, write changelog] Tested-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241128000010.4051275-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Unpack msr_data structure prior to calling kvm_apic_set_base()Sean Christopherson
Pass in the new value and "host initiated" as separate parameters to kvm_apic_set_base(), as forcing the KVM_SET_SREGS path to declare and fill an msr_data structure is awkward and kludgy, e.g. __set_sregs_common() doesn't even bother to set the proper MSR index. No functional change intended. Suggested-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20241101183555.1794700-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Make kvm_recalculate_apic_map() local to lapic.cSean Christopherson
Make kvm_recalculate_apic_map() local to lapic.c now that all external callers are gone. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-8-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Rename APIC base setters to better capture their relationshipSean Christopherson
Rename kvm_set_apic_base() and kvm_lapic_set_base() to kvm_apic_set_base() and __kvm_apic_set_base() respectively to capture that the underscores version is a "special" variant (it exists purely to avoid recalculating the optimized map multiple times when stuffing the RESET value). Opportunistically add a comment explaining why kvm_lapic_reset() uses the inner helper. Note, KVM deliberately invokes kvm_arch_vcpu_create() while kvm->lock is NOT held so that vCPU setup isn't serialized if userspace is creating multiple/all vCPUs in parallel. I.e. triggering an extra recalculation is not limited to theoretical/rare edge cases, and so is worth avoiding. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-7-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Inline kvm_get_apic_mode() in lapic.hSean Christopherson
Inline kvm_get_apic_mode() in lapic.h to avoid a CALL+RET as well as an export. The underlying kvm_apic_mode() helper is public information, i.e. there is no state/information that needs to be hidden from vendor modules. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-5-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Get vcpu->arch.apic_base directly and drop kvm_get_apic_base()Sean Christopherson
Access KVM's emulated APIC base MSR value directly instead of bouncing through a helper, as there is no reason to add a layer of indirection, and there are other MSRs with a "set" but no "get", e.g. EFER. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-4-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-17Merge tag 'kvm-x86-vmx-6.12' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM VMX changes for 6.12: - Set FINAL/PAGE in the page fault error code for EPT Violations if and only if the GVA is valid. If the GVA is NOT valid, there is no guest-side page table walk and so stuffing paging related metadata is nonsensical. - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of emulating posted interrupt delivery to L2. - Add a lockdep assertion to detect unsafe accesses of vmcs12 structures. - Harden eVMCS loading against an impossible NULL pointer deref (really truly should be impossible). - Minor SGX fix and a cleanup.
2024-09-09KVM: x86: Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt()Sean Christopherson
Fold kvm_get_apic_interrupt() into kvm_cpu_get_interrupt() now that nVMX essentially open codes kvm_get_apic_interrupt() in order to correctly emulate nested posted interrupts. Opportunistically stop exporting kvm_cpu_get_interrupt(), as the aforementioned nVMX flow was the only user in vendor code. Link: https://lore.kernel.org/r/20240906043413.1049633-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09KVM: x86: Move "ack" phase of local APIC IRQ delivery to separate APISean Christopherson
Split the "ack" phase, i.e. the movement of an interrupt from IRR=>ISR, out of kvm_get_apic_interrupt() and into a separate API so that nested VMX can acknowledge a specific interrupt _after_ emulating a VM-Exit from L2 to L1. To correctly emulate nested posted interrupts while APICv is active, KVM must: 1. find the highest pending interrupt. 2. check if that IRQ is L2's notification vector 3. emulate VM-Exit if the IRQ is NOT the notification vector 4. ACK the IRQ in L1 _after_ VM-Exit When APICv is active, the process of moving the IRQ from the IRR to the ISR also requires a VMWRITE to update vmcs01.GUEST_INTERRUPT_STATUS.SVI, and so acknowledging the interrupt before switching to vmcs01 would result in marking the IRQ as in-service in the wrong VMCS. KVM currently fudges around this issue by doing kvm_get_apic_interrupt() smack dab in the middle of emulating VM-Exit, but that hack doesn't play nice with nested posted interrupts, as notification vector IRQs don't trigger a VM-Exit in the first place. Cc: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240906043413.1049633-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-09-09KVM: x86: Remove some unused declarationsYue Haibing
Commit 238adc77051a ("KVM: Cleanup LAPIC interface") removed kvm_lapic_get_base() but leave declaration. And other two declarations were never implenmented since introduction. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20240830022537.2403873-1-yuehaibing@huawei.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-07-16KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_opsWei Wang
Introduces kvm_x86_call(), to streamline the usage of static calls of kvm_x86_ops. The current implementation of these calls is verbose and could lead to alignment challenges. This makes the code susceptible to exceeding the "80 columns per single line of code" limit as defined in the coding-style document. Another issue with the existing implementation is that the addition of kvm_x86_ prefix to hooks at the static_call sites hinders code readability and navigation. kvm_x86_call() is added to improve code readability and maintainability, while adhering to the coding style guidelines. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Link: https://lore.kernel.org/r/20240507133103.15052-3-wei.w.wang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 misc changes for 6.11 - Add a global struct to consolidate tracking of host values, e.g. EFER, and move "shadow_phys_bits" into the structure as "maxphyaddr". - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX. - Print the name of the APICv/AVIC inhibits in the relevant tracepoint. - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor. - Misc cleanups
2024-06-05KVM: x86: Make nanoseconds per APIC bus cycle a VM variableIsaku Yamahata
Introduce the VM variable "nanoseconds per APIC bus cycle" in preparation to make the APIC bus frequency configurable. The TDX architecture hard-codes the core crystal clock frequency to 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture does not allow the VMM to override the value. In addition, per Intel SDM: "The APIC timer frequency will be the processor’s bus clock or core crystal clock frequency (when TSC/core crystal clock ratio is enumerated in CPUID leaf 0x15) divided by the value specified in the divide configuration register." The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded APIC bus frequency of 1GHz. Introduce the VM variable "nanoseconds per APIC bus cycle" to prepare for allowing userspace to tell KVM to use the frequency that TDX mandates instead of the default 1Ghz. Doing so ensures that the guest doesn't have a conflicting view of the APIC bus frequency. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> [reinette: rework changelog] Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/ae75ce37c6c38bb4efd10a0a41932984c40b24ac.1714081726.git.reinette.chatre@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-05KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-VIsaku Yamahata
Remove APIC_BUS_FREQUENCY and calculate it based on nanoseconds per APIC bus cycle. APIC_BUS_FREQUENCY is used only for HV_X64_MSR_APIC_FREQUENCY. The MSR is not frequently read, calculate it every time. There are two constants related to the APIC bus frequency: APIC_BUS_FREQUENCY and APIC_BUS_CYCLE_NS. Only one value is required because one can be calculated from the other: APIC_BUS_CYCLES_NS = 1000 * 1000 * 1000 / APIC_BUS_FREQUENCY. Remove APIC_BUS_FREQUENCY and instead calculate it when needed. This prepares for support of configurable APIC bus frequency by requiring to change only a single variable. Suggested-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> [reinette: rework changelog] Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/76a659d0898e87ebd73ee7c922f984a87a6ab370.1714081726.git.reinette.chatre@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-06-03KVM: x86: Drop support for hand tuning APIC timer advancement from userspaceSean Christopherson
Remove support for specifying a static local APIC timer advancement value, and instead present a read-only boolean parameter to let userspace enable or disable KVM's dynamic APIC timer advancement. Realistically, it's all but impossible for userspace to specify an advancement that is more precise than what KVM's adaptive tuning can provide. E.g. a static value needs to be tuned for the exact hardware and kernel, and if KVM is using hrtimers, likely requires additional tuning for the exact configuration of the entire system. Dropping support for a userspace provided value also fixes several flaws in the interface. E.g. KVM interprets a negative value other than -1 as a large advancement, toggling between a negative and positive value yields unpredictable behavior as vCPUs will switch from dynamic to static advancement, changing the advancement in the middle of VM creation can result in different values for vCPUs within a VM, etc. Those flaws are mostly fixable, but there's almost no justification for taking on yet more complexity (it's minimal complexity, but still non-zero). The only arguments against using KVM's adaptive tuning is if a setup needs a higher maximum, or if the adjustments are too reactive, but those are arguments for letting userspace control the absolute max advancement and the granularity of each adjustment, e.g. similar to how KVM provides knobs for halt polling. Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com Cc: Shuling Zhou <zhoushuling@huawei.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240522010304.1650603-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-24KVM: x86: Split out logic to generate "readable" APIC regs mask to helperSean Christopherson
Move the generation of the readable APIC regs bitmask to a standalone helper so that VMX can use the mask for its MSR interception bitmaps. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20230107011025.565472-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-13KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabledSean Christopherson
Free the APIC access page memslot if any vCPU enables x2APIC and SVM's AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with x2APIC enabled. On AMD, if its "hybrid" mode is enabled (AVIC is enabled when x2APIC is enabled even without x2AVIC support), keeping the APIC access page memslot results in the guest being able to access the virtual APIC page as x2APIC is fully emulated by KVM. I.e. hardware isn't aware that the guest is operating in x2APIC mode. Exempt nested SVM's update of APICv state from the new logic as x2APIC can't be toggled on VM-Exit. In practice, invoking the x2APIC logic should be harmless precisely because it should be a glorified nop, but play it safe to avoid latent bugs, e.g. with dropping the vCPU's SRCU lock. Intel doesn't suffer from the same issue as APICv has fully independent VMCS controls for xAPIC vs. x2APIC virtualization. Technically, KVM should provide bus error semantics and not memory semantics for the APIC page when x2APIC is enabled, but KVM already provides memory semantics in other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware disabled (via APIC_BASE MSR). Note, checking apic_access_memslot_enabled without taking locks relies it being set during vCPU creation (before kvm_vcpu_reset()). vCPUs can race to set the inhibit and delete the memslot, i.e. can get false positives, but can't get false negatives as apic_access_memslot_enabled can't be toggled "on" once any vCPU reaches KVM_RUN. Opportunistically drop the "can" while updating avic_activate_vmcb()'s comment, i.e. to state that KVM _does_ support the hybrid mode. Move the "Note:" down a line to conform to preferred kernel/KVM multi-line comment style. Opportunistically update the apicv_update_lock comment, as it isn't actually used to protect apic_access_memslot_enabled (which is protected by slots_lock). Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13KVM: x86: Move APIC access page helper to common x86 codeSean Christopherson
Move the APIC access page allocation helper function to common x86 code, the allocation routine is virtually identical between APICv (VMX) and AVIC (SVM). Keep APICv's gfn_to_page() + put_page() sequence, which verifies that a backing page can be allocated, i.e. that the system isn't under heavy memory pressure. Forcing the backing page to be populated isn't strictly necessary, but skipping the effective prefetch only delays the inevitable. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-23KVM: x86: Simplify kvm_apic_hw_enabledPeng Hao
kvm_apic_hw_enabled() only needs to return bool, there is no place to use the return value of MSR_IA32_APICBASE_ENABLE. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Message-Id: <CAPm50aJ=BLXNWT11+j36Dd6d7nz2JmOBk4u7o_NPQ0N61ODu1g@mail.gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: x86: start moving SMM-related functions to new filesPaolo Bonzini
Create a new header and source with code related to system management mode emulation. Entry and exit will move there too; for now, opportunistically rename put_smstate to PUT_SMSTATE while moving it to smm.h, and adjust the SMM state saving code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specificSean Christopherson
Rename kvm_apic_has_events() to kvm_apic_has_pending_init_or_sipi() so that it's more obvious that "events" really just means "INIT or SIPI". Opportunistically clean up a weirdly worded comment that referenced kvm_apic_has_events() instead of kvm_apic_accept_events(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowedSean Christopherson
Rename and invert kvm_vcpu_latch_init() to kvm_apic_init_sipi_allowed() so as to match the behavior of {interrupt,nmi,smi}_allowed(), and expose the helper so that it can be used by kvm_vcpu_has_events() to determine whether or not an INIT or SIPI is pending _and_ can be taken immediately. Opportunistically replaced usage of the "latch" terminology with "blocked" and/or "allowed", again to align with KVM's terminology used for all other event types. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-08KVM: x86: Fix handling of APIC LVT updates when userspace changes MCG_CAPSean Christopherson
Add a helper to update KVM's in-kernel local APIC in response to MCG_CAP being changed by userspace to fix multiple bugs. First and foremost, KVM needs to check that there's an in-kernel APIC prior to dereferencing vcpu->arch.apic. Beyond that, any "new" LVT entries need to be masked, and the APIC version register needs to be updated as it reports out the number of LVT entries. Fixes: 4b903561ec49 ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.") Reported-by: syzbot+8cdad6430c24f396f158@syzkaller.appspotmail.com Cc: Siddh Raman Pant <code@siddh.me> Cc: Jue Wang <juew@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-06-24KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.Jue Wang
This patch calculates the number of lvt entries as part of KVM_X86_MCE_SETUP conditioned on the presence of MCG_CMCI_P bit in MCG_CAP and stores result in kvm_lapic. It translats from APIC_LVTx register to index in lapic_lvt_entry enum. It extends the APIC_LVTx macro as well as other lapic write/reset handling etc to support Corrected Machine Check Interrupt. Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-5-juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: Add APIC_LVTx() macro.Jue Wang
An APIC_LVTx macro is introduced to calcualte the APIC_LVTx register offset based on the index in the lapic_lvt_entry enum. Later patches will extend the APIC_LVTx macro to support the APIC_LVTCMCI register in order to implement Corrected Machine Check Interrupt signaling. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-4-juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24KVM: x86: Fill apic_lvt_mask with enums / explicit entries.Jue Wang
This patch defines a lapic_lvt_entry enum used as explicit indices to the apic_lvt_mask array. In later patches a LVT_CMCI will be added to implement the Corrected Machine Check Interrupt signaling. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-3-juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: x86: Use lapic_in_kernel() to query in-kernel APIC in APICv helperSean Christopherson
Use lapic_in_kernel() in kvm_vcpu_apicv_active() to take advantage of the kvm_has_noapic_vcpu static branch. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614230548.3852141-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20KVM: x86: Move "apicv_active" into "struct kvm_lapic"Sean Christopherson
Move the per-vCPU apicv_active flag into KVM's local APIC instance. APICv is fully dependent on an in-kernel local APIC, but that's not at all clear when reading the current code due to the flag being stored in the generic kvm_vcpu_arch struct. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614230548.3852141-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25KVM: LAPIC: Trace LAPIC timer expiration on every vmentryWanpeng Li
In commit ec0671d5684a ("KVM: LAPIC: Delay trace_kvm_wait_lapic_expire tracepoint to after vmexit", 2019-06-04), trace_kvm_wait_lapic_expire was moved after guest_exit_irqoff() because invoking tracepoints within kvm_guest_enter/kvm_guest_exit caused a lockdep splat. These days this is not necessary, because commit 87fa7f3e98a1 ("x86/kvm: Move context tracking where it belongs", 2020-07-09) restricted the RCU extended quiescent state to be closer to vmentry/vmexit. Moving the tracepoint back to __kvm_wait_lapic_expire is more accurate, because it will be reported even if vcpu_enter_guest causes multiple vmentries via the IPI/Timer fast paths, and it allows the removal of advance_expire_delta. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1650961551-38390-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: Make kvm_lapic_set_reg() a "private" xAPIC helperSean Christopherson
Hide the lapic's "raw" write helper inside lapic.c to force non-APIC code to go through proper helpers when modification the vAPIC state. Keep the read helper visible to outsiders for now, refactoring KVM to hide it too is possible, it will just take more work to do so. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-01KVM: x86: Make kvm_lapic_reg_{read,write}() staticSean Christopherson
Make the low level read/write lapic helpers static, any accesses to the local APIC from vendor code or non-APIC code should be routed through proper helpers. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220204214205.3306634-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-11KVM: x86: Rename kvm_lapic_enable_pv_eoi()Vitaly Kuznetsov
kvm_lapic_enable_pv_eoi() is a misnomer as the function is also used to disable PV EOI. Rename it to kvm_lapic_set_pv_eoi(). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211108152819.12485-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Add a return code to kvm_apic_accept_eventsJim Mattson
No functional change intended. At present, the only negative value returned by kvm_check_nested_events is -EBUSY. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210604172611.281819-6-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09KVM: x86: hyper-v: Always use to_hv_vcpu() accessor to get to 'struct ↵Vitaly Kuznetsov
kvm_vcpu_hv' As a preparation to allocating Hyper-V context dynamically, make it clear who's the user of the said context. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-11-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09KVM: x86: hyper-v: Drop unused kvm_hv_vapic_assist_page_enabled()Vitaly Kuznetsov
kvm_hv_vapic_assist_page_enabled() seems to be unused since its introduction in commit 10388a07164c1 ("KVM: Add HYPER-V apic access MSRs"), drop it. Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: Stop using deprecated jump label APIsCun Li
The use of 'struct static_key' and 'static_key_false' is deprecated. Use the new API. Signed-off-by: Cun Li <cun.jia.li@gmail.com> Message-Id: <20210111152435.50275-1-cun.jia.li@gmail.com> [Make it compile. While at it, rename kvm_no_apic_vcpu to kvm_has_noapic_vcpu; the former reads too much like "true if no vCPU has an APIC". - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: nVMX: Morph notification vector IRQ on nested VM-Enter to pending PISean Christopherson
On successful nested VM-Enter, check for pending interrupts and convert the highest priority interrupt to a pending posted interrupt if it matches L2's notification vector. If the vCPU receives a notification interrupt before nested VM-Enter (assuming L1 disables IRQs before doing VM-Enter), the pending interrupt (for L1) should be recognized and processed as a posted interrupt when interrupts become unblocked after VM-Enter to L2. This fixes a bug where L1/L2 will get stuck in an infinite loop if L1 is trying to inject an interrupt into L2 by setting the appropriate bit in L2's PIR and sending a self-IPI prior to VM-Enter (as opposed to KVM's method of manually moving the vector from PIR->vIRR/RVI). KVM will observe the IPI while the vCPU is in L1 context and so won't immediately morph it to a posted interrupt for L2. The pending interrupt will be seen by vmx_check_nested_events(), cause KVM to force an immediate exit after nested VM-Enter, and eventually be reflected to L1 as a VM-Exit. After handling the VM-Exit, L1 will see that L2 has a pending interrupt in PIR, send another IPI, and repeat until L2 is killed. Note, posted interrupts require virtual interrupt deliveriy, and virtual interrupt delivery requires exit-on-interrupt, ergo interrupts will be unconditionally unmasked on VM-Enter if posted interrupts are enabled. Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200812175129.12172-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15KVM: x86: introduce kvm_can_use_hv_timerPaolo Bonzini
Replace the ad hoc test in vmx_set_hv_timer with a test in the caller, start_hv_timer. This test is not Intel-specific and would be duplicated when introducing the fast path for the TSC deadline MSR. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>