summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-06-23KVM: x86: Track irq_bypass_vcpu in common x86 codeSean Christopherson
Track the vCPU that is being targeted for IRQ bypass, a.k.a. for a posted IRQ, in common x86 code. This will allow for additional consolidation of the SVM and VMX code. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-36-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: Fold kvm_arch_irqfd_route_changed() into kvm_arch_update_irqfd_routing()Sean Christopherson
Fold kvm_arch_irqfd_route_changed() into kvm_arch_update_irqfd_routing(). Calling arch code to know whether or not to call arch code is absurd. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250611224604.313496-35-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: Don't WARN if updating IRQ bypass route failsSean Christopherson
Don't bother WARNing if updating an IRTE route fails now that vendor code provides much more precise WARNs. The generic WARN doesn't provide enough information to actually debug the problem, and has obviously done nothing to surface the myriad bugs in KVM x86's implementation. Drop all of the associated return code plumbing that existed just so that common KVM could WARN. Link: https://lore.kernel.org/r/20250611224604.313496-34-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23iommu: KVM: Split "struct vcpu_data" into separate AMD vs. Intel structsSean Christopherson
Split the vcpu_data structure that serves as a handoff from KVM to IOMMU drivers into vendor specific structures. Overloading a single structure makes the code hard to read and maintain, is *very* misleading as it suggests that mixing vendors is actually supported, and bastardizing Intel's posted interrupt descriptor address when AMD's IOMMU already has its own structure is quite unnecessary. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-33-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: SVM: Clean up return handling in avic_pi_update_irte()Sean Christopherson
Clean up the return paths for avic_pi_update_irte() now that the refactoring dust has settled. Opportunistically drop the pr_err() on IRTE update failures. Logging that a failure occurred without _any_ context is quite useless. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-32-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: x86: Move posted interrupt tracepoint to common codeSean Christopherson
Move the pi_irte_update tracepoint to common x86, and call it whenever the IRTE is modified. Tracing only the modifications that result in an IRQ being posted to a vCPU makes the tracepoint useless for debugging. Drop the vendor specific address; plumbing that into common code isn't worth the trouble, as the address is meaningless without a whole pile of other information that isn't provided in any tracepoint. Link: https://lore.kernel.org/r/20250611224604.313496-31-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: x86: Dedup AVIC vs. PI code for identifying target vCPUSean Christopherson
Hoist the logic for identifying the target vCPU for a posted interrupt into common x86. The code is functionally identical between Intel and AMD. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-30-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: x86: Nullify irqfd->producer after updating IRTEsSean Christopherson
Nullify irqfd->producer (when it's going away) _after_ updating IRTEs so that the producer can be queried during the update. Link: https://lore.kernel.org/r/20250611224604.313496-29-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: x86: Move IRQ routing/delivery APIs from x86.c => irq.cSean Christopherson
Move a bunch of IRQ routing and delivery APIs from x86.c to irq.c. x86.c has grown quite fat, and irq.c is the perfect landing spot. Opportunistically rewrite kvm_arch_irq_bypass_del_producer()'s comment, as the existing comment has several typos and is rather confusing. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20250611224604.313496-28-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: SVM: Extract SVM specific code out of get_pi_vcpu_info()Sean Christopherson
Genericize SVM's get_pi_vcpu_info() so that it can be shared with VMX. The only SVM specific information it provides is the AVIC back page, and that can be trivially retrieved by its sole caller. No functional change intended. Cc: Francesco Lavra <francescolavra.fl@gmail.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-27-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: VMX: Stop walking list of routing table entries when updating IRTESean Christopherson
Now that KVM provides the to-be-updated routing entry, stop walking the routing table to find that entry. KVM, via setup_routing_entry() and sanity checked by kvm_get_msi_route(), disallows having a GSI configured to trigger multiple MSIs, i.e. the for-loop can only process one entry. Link: https://lore.kernel.org/r/20250611224604.313496-26-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: SVM: Stop walking list of routing table entries when updating IRTESean Christopherson
Now that KVM explicitly passes the new/current GSI routing to pi_update_irte(), simply use the provided routing entry and stop walking the routing table to find that entry. KVM, via setup_routing_entry() and sanity checked by kvm_get_msi_route(), disallows having a GSI configured to trigger multiple MSIs. I.e. this is subtly a glorified nop, as KVM allows at most one MSI per GSI, the for-loop can only ever process one entry, and that entry is the new/current entry (see the WARN_ON_ONCE() added by "KVM: x86: Pass new routing entries and irqfd when updating IRTEs" to ensure @new matches the entry found in the routing table). Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-25-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23iommu/amd: KVM: SVM: Pass NULL @vcpu_info to indicate "not guest mode"Sean Christopherson
Pass NULL to amd_ir_set_vcpu_affinity() to communicate "don't post to a vCPU" now that there's no need to communicate information back to KVM about the previous vCPU (KVM does its own tracking). Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-24-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23iommu/amd: KVM: SVM: Use pi_desc_addr to derive ga_root_ptrSean Christopherson
Use vcpu_data.pi_desc_addr instead of amd_iommu_pi_data.base to get the GA root pointer. KVM is the only source of amd_iommu_pi_data.base, and KVM's one and only path for writing amd_iommu_pi_data.base computes the exact same value for vcpu_data.pi_desc_addr and amd_iommu_pi_data.base, and fills amd_iommu_pi_data.base if and only if vcpu_data.pi_desc_addr is valid, i.e. amd_iommu_pi_data.base is fully redundant. Cc: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-23-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: SVM: Add a comment to explain why avic_vcpu_blocking() ignores IRQ blockingSean Christopherson
Add a comment to explain why KVM clears IsRunning when putting a vCPU, even though leaving IsRunning=1 would be ok from a functional perspective. Per Maxim's experiments, a misbehaving VM could spam the AVIC doorbell so fast as to induce a 50%+ loss in performance. Link: https://lore.kernel.org/all/8d7e0d0391df4efc7cb28557297eb2ec9904f1e5.camel@redhat.com Cc: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-22-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: VMX: Suppress PI notifications whenever the vCPU is putSean Christopherson
Suppress posted interrupt notifications (set PID.SN=1) whenever the vCPU is put, i.e. unloaded, not just when the vCPU is preempted, as KVM doesn't do anything in response to a notification IRQ that arrives in the host, nor does KVM rely on the Outstanding Notification (PID.ON) flag when the vCPU is unloaded. And, the cost of scanning the PIR to manually set PID.ON when loading the vCPU is quite small, especially relative to the cost of loading (and unloading) a vCPU. On the flip side, leaving SN clear means a notification for the vCPU will result in a spurious IRQ for the pCPU, even if vCPU task is scheduled out, running in userspace, etc. Even worse, if the pCPU is running a different vCPU, the spurious IRQ could trigger posted interrupt processing for the wrong vCPU, which is technically a violation of the architecture, as setting bits in PIR aren't supposed to be propagated to the vIRR until a notification IRQ is received. The saving grace of the current behavior is that hardware sends notification interrupts if and only if PID.ON=0, i.e. only the first posted interrupt for a vCPU will trigger a spurious IRQ (for each window where the vCPU is unloaded). Ideally, KVM would suppress notifications before enabling IRQs in the VM-Exit, but KVM relies on PID.ON as an indicator that there is a posted interrupt pending in PIR, e.g. in vmx_sync_pir_to_irr(), and sadly there is no way to ask hardware to set PID.ON, but not generate an interrupt. That could be solved by using pi_has_pending_interrupt() instead of checking only PID.ON, but it's not at all clear that would be a performance win, as KVM would end up scanning the entire PIR whenever an interrupt isn't pending. And long term, the spurious IRQ window, i.e. where a vCPU is loaded with IRQs enabled, can effectively be made smaller for hot paths by moving performance critical VM-Exit handlers into the fastpath, i.e. by never enabling IRQs for hot path VM-Exits. Link: https://lore.kernel.org/r/20250611224604.313496-21-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23KVM: SVM: Disable (x2)AVIC IPI virtualization if CPU has erratum #1235Maxim Levitsky
Disable IPI virtualization on AMD Family 17h CPUs (Zen2 and Zen1), as hardware doesn't reliably detect changes to the 'IsRunning' bit during ICR write emulation, and might fail to VM-Exit on the sending vCPU, if IsRunning was recently cleared. The absence of the VM-Exit leads to KVM not waking (or triggering nested VM-Exit of) the target vCPU(s) of the IPI, which can lead to hung vCPUs, unbounded delays in L2 execution, etc. To workaround the erratum, simply disable IPI virtualization, which prevents KVM from setting IsRunning and thus eliminates the race where hardware sees a stale IsRunning=1. As a result, all ICR writes (except when "Self" shorthand is used) will VM-Exit and therefore be correctly emulated by KVM. Disabling IPI virtualization does carry a performance penalty, but benchmarkng shows that enabling AVIC without IPI virtualization is still much better than not using AVIC at all, because AVIC still accelerates posted interrupts and the receiving end of the IPIs. Note, when virtualizing Self-IPIs, the CPU skips reading the physical ID table and updates the vIRR directly (because the vCPU is by definition actively running), i.e. Self-IPI isn't susceptible to the erratum *and* is still accelerated by hardware. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [sean: rebase, massage changelog, disallow user override] Acked-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Add enable_ipiv param, never set IsRunning if disabledMaxim Levitsky
Let userspace "disable" IPI virtualization for AVIC via the enable_ipiv module param, by never setting IsRunning. SVM doesn't provide a way to disable IPI virtualization in hardware, but by ensuring CPUs never see IsRunning=1, every IPI in the guest (except for self-IPIs) will generate a VM-Exit. To avoid setting the real IsRunning bit, while still allowing KVM to use each vCPU's entry to update GA log entries, simply maintain a shadow of the entry, without propagating IsRunning updates to the real table when IPI virtualization is disabled. Providing a way to effectively disable IPI virtualization will allow KVM to safely enable AVIC on hardware that is susceptible to erratum #1235, which causes hardware to sometimes fail to detect that the IsRunning bit has been cleared by software. Note, the table _must_ be fully populated, as broadcast IPIs skip invalid entries, i.e. won't generate VM-Exit if every entry is invalid, and so simply pointing the VMCB at a common dummy table won't work. Alternatively, KVM could allocate a shadow of the entire table, but that'd be a waste of 4KiB since the per-vCPU entry doesn't actually consume an additional 8 bytes of memory (vCPU structures are large enough that they are backed by order-N pages). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [sean: keep "entry" variables, reuse enable_ipiv, split from erratum] Link: https://lore.kernel.org/r/20250611224604.313496-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: VMX: Move enable_ipiv knob to common x86Sean Christopherson
Move enable_ipiv to common x86 so that it can be reused by SVM to control IPI virtualization when AVIC is enabled. SVM doesn't actually provide a way to truly disable IPI virtualization, but KVM can get close enough by skipping the necessary table programming. Link: https://lore.kernel.org/r/20250611224604.313496-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Drop superfluous "cache" of AVIC Physical ID entry pointerSean Christopherson
Drop the vCPU's pointer to its AVIC Physical ID entry, and simply index the table directly. Caching a pointer address is completely unnecessary for performance, and while the field technically caches the result of the pointer calculation, it's all too easy to misinterpret the name and think that the field somehow caches the _data_ in the table. No functional change intended. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Track AVIC tables as natively sized pointers, not "struct pages"Sean Christopherson
Allocate and track AVIC's logical and physical tables as u32 and u64 pointers respectively, as managing the pages as "struct page" pointers adds an almost absurd amount of boilerplate and complexity. E.g. with page_address() out of the way, svm->avic_physical_id_cache becomes completely superfluous, and will be removed in a future cleanup. No functional change intended. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Acked-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Drop redundant check in AVIC code on ID during vCPU creationSean Christopherson
Drop avic_get_physical_id_entry()'s compatibility check on the incoming ID, as its sole caller, avic_init_backing_page(), performs the exact same check. Drop avic_get_physical_id_entry() entirely as the only remaining functionality is getting the address of the Physical ID table, and accessing the array without an immediate bounds check is kludgy. Opportunistically add a compile-time assertion to ensure the vcpu_id can't result in a bounds overflow, e.g. if KVM (really) messed up a maximum physical ID #define, as well as run-time assertions so that a NULL pointer dereference is morphed into a safer WARN(). No functional change intended. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Inhibit AVIC if ID is too big instead of rejecting vCPU creationSean Christopherson
Inhibit AVIC with a new "ID too big" flag if userspace creates a vCPU with an ID that is too big, but otherwise allow vCPU creation to succeed. Rejecting KVM_CREATE_VCPU with EINVAL violates KVM's ABI as KVM advertises that the max vCPU ID is 4095, but disallows creating vCPUs with IDs bigger than 254 (AVIC) or 511 (x2AVIC). Alternatively, KVM could advertise an accurate value depending on which AVIC mode is in use, but that wouldn't really solve the underlying problem, e.g. would be a breaking change if KVM were to ever try and enable AVIC or x2AVIC by default. Cc: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Drop vcpu_svm's pointless avic_backing_page fieldSean Christopherson
Drop vcpu_svm's avic_backing_page pointer and instead grab the physical address of KVM's vAPIC page directly from the source. Getting a physical address from a kernel virtual address is not an expensive operation, and getting the physical address from a struct page is *more* expensive for CONFIG_SPARSEMEM=y kernels. Regardless, none of the paths that consume the address are hot paths, i.e. shaving cycles is not a priority. Eliminating the "cache" means KVM doesn't have to worry about the cache being invalid, which will simplify a future fix when dealing with vCPU IDs that are too big. WARN if KVM attempts to allocate a vCPU's AVIC backing page without an in-kernel local APIC. avic_init_vcpu() bails early if the APIC is not in-kernel, and KVM disallows enabling an in-kernel APIC after vCPUs have been created, i.e. it should be impossible to reach avic_init_backing_page() without the vAPIC being allocated. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Add helper to deduplicate code for getting AVIC backing pageSean Christopherson
Add a helper to get the physical address of the AVIC backing page, both to deduplicate code and to prepare for getting the address directly from apic->regs, at which point it won't be all that obvious that the address in question is what SVM calls the AVIC backing page. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Drop pointless masking of kernel page pa's with AVIC HPA masksSean Christopherson
Drop AVIC_HPA_MASK and all its users, the mask is just the 4KiB-aligned maximum theoretical physical address for x86-64 CPUs, as x86-64 is currently defined (going beyond PA52 would require an entirely new paging mode, which would arguably create a new, different architecture). All usage in KVM masks the result of page_to_phys(), which on x86-64 is guaranteed to be 4KiB aligned and a legal physical address; if either of those requirements doesn't hold true, KVM has far bigger problems. Drop masking the avic_backing_page with AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK for all the same reasons, but keep the macro even though it's unused in functional code. It's a distinct architectural define, and having the definition in software helps visualize the layout of an entry. And to be hyper-paranoid about MAXPA going beyond 52, add a compile-time assert to ensure the kernel's maximum supported physical address stays in bounds. The unnecessary masking in avic_init_vmcb() also incorrectly assumes that SME's C-bit resides between bits 51:11; that holds true for current CPUs, but isn't required by AMD's architecture: In some implementations, the bit used may be a physical address bit Key word being "may". Opportunistically use the GENMASK_ULL() version for AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK, which is far more readable than a set of repeating Fs. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Drop pointless masking of default APIC base when setting V_APIC_BARSean Christopherson
Drop VMCB_AVIC_APIC_BAR_MASK, it's just a regurgitation of the maximum theoretical 4KiB-aligned physical address, i.e. is not novel in any way, and its only usage is to mask the default APIC base, which is 4KiB aligned and (obviously) a legal physical address. No functional change intended. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://lore.kernel.org/r/20250611224604.313496-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Delete IRTE link from previous vCPU irrespective of new routingSean Christopherson
Delete the IRTE link from the previous vCPU irrespective of the new routing state, i.e. even if the IRTE won't be configured to post IRQs to a vCPU. Whether or not the new route is postable as no bearing on the *old* route. Failure to delete the link can result in KVM incorrectly updating the IRTE, e.g. if the "old" vCPU is scheduled in/out. Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt") Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20iommu/amd: KVM: SVM: Delete now-unused cached/previous GA tag fieldsSean Christopherson
Delete the amd_ir_data.prev_ga_tag field now that all usage is superfluous. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Delete IRTE link from previous vCPU before setting new IRTESean Christopherson
Delete the previous per-vCPU IRTE link prior to modifying the IRTE. If forcing the IRTE back to remapped mode fails, the IRQ is already broken; keeping stale metadata won't change that, and the IOMMU should be sufficiently paranoid to sanitize the IRTE when the IRQ is freed and reallocated. This will allow hoisting the vCPU tracking to common x86, which in turn will allow most of the IRTE update code to be deduplicated. Tested-by: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: SVM: Track per-vCPU IRTEs using kvm_kernel_irqfd structureSean Christopherson
Track the IRTEs that are posting to an SVM vCPU via the associated irqfd structure and GSI routing instead of dynamically allocating a separate data structure. In addition to eliminating an atomic allocation, this will allow hoisting much of the IRTE update logic to common x86. Cc: Sairaj Kodilkar <sarunkod@amd.com> Link: https://lore.kernel.org/r/20250611224604.313496-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: Pass new routing entries and irqfd when updating IRTEsSean Christopherson
When updating IRTEs in response to a GSI routing or IRQ bypass change, pass the new/current routing information along with the associated irqfd. This will allow KVM x86 to harden, simplify, and deduplicate its code. Since adding/removing a bypass producer is now conveniently protected with irqfds.lock, i.e. can't run concurrently with kvm_irq_routing_update(), use the routing information cached in the irqfd instead of looking up the information in the current GSI routing tables. Opportunistically convert an existing printk() to pr_info() and put its string onto a single line (old code that strictly adhered to 80 chars). Link: https://lore.kernel.org/r/20250611224604.313496-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Fold irq_comm.c into irq.cSean Christopherson
Drop irq_comm.c, a.k.a. common IRQ APIs, as there has been no non-x86 user since commit 003f7de62589 ("KVM: ia64: remove") (at the time, irq_comm.c lived in virt/kvm, not arch/x86/kvm). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-19-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Move IRQ mask notifier infrastructure to I/O APIC emulationSean Christopherson
Move the IRQ mask logic to ioapic.c as KVM's only user is its in-kernel I/O APIC emulation. In addition to encapsulating more I/O APIC specific code, trimming down irq_comm.c helps pave the way for removing it entirely. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: selftests: Fall back to split IRQ chip if full in-kernel chip is ↵Sean Christopherson
unsupported Now that KVM x86 allows compiling out support for in-kernel I/O APIC (and PIC and PIT) emulation, i.e. allows disabling KVM_CREATE_IRQCHIP for all intents and purposes, fall back to a split IRQ chip for x86 if creating the full in-kernel version fails with ENOTTY. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-17-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: Squash two CONFIG_HAVE_KVM_IRQCHIP #ifdefs into oneSean Christopherson
Squash two #idef CONFIG_HAVE_KVM_IRQCHIP regions in KVM's trace events, as the only code outside of the #idefs depends on CONFIG_KVM_IOAPIC, and that Kconfig only exists for x86, which unconditionally selects HAVE_KVM_IRQCHIP. No functional change intended. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APICSean Christopherson
Add a Kconfig to allow building KVM without support for emulating a I/O APIC, PIC, and PIT, which is desirable for deployments that effectively don't support a fully in-kernel IRQ chip, i.e. never expect any VMM to create an in-kernel I/O APIC. E.g. compiling out support eliminates a few thousand lines of guest-facing code and gives security folks warm fuzzies. As a bonus, wrapping relevant paths with CONFIG_KVM_IOAPIC #ifdefs makes it much easier for readers to understand which bits and pieces exist specifically for fully in-kernel IRQ chips. Opportunistically convert all two in-kernel uses of __KVM_HAVE_IOAPIC to CONFIG_KVM_IOAPIC, e.g. rather than add a second #ifdef to generate a stub for kvm_arch_post_irq_routing_update(). Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-15-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: Move x86-only tracepoints to x86's trace.hSean Christopherson
Move the I/O APIC tracepoints and trace_kvm_msi_set_irq() to x86, as __KVM_HAVE_IOAPIC is just code for "x86", and trace_kvm_msi_set_irq() isn't unique to I/O APIC emulation. Opportunistically clean up the absurdly messy #includes in ioapic.c. No functional change intended. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Explicitly check for in-kernel PIC when getting ExtINTSean Christopherson
Explicitly check for an in-kernel PIC when checking for a pending ExtINT in the PIC. Effectively swapping the split vs. full irqchip logic will allow guarding the in-kernel I/O APIC (and PIC) emulation with a Kconfig, and also makes it more obvious that kvm_pic_read_irq() won't result in a NULL pointer dereference. Opportunistically add WARNs in the fallthrough path, mostly to document that the userspace ExtINT logic is only relevant to split IRQ chips. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Don't clear PIT's IRQ line status when destroying PITSean Christopherson
Don't bother clearing the PIT's IRQ line status when destroying the PIT, as userspace can't possibly rely on KVM to lower the IRQ line in any sane use case, and it's not at all obvious that clearing the PIT's IRQ line is correct/desirable in kvm_create_pit()'s error path. When called from kvm_arch_pre_destroy_vm(), the entire VM is being torn down and thus {kvm_pic,kvm_ioapic}.irq_states are unreachable. As for the error path in kvm_create_pit(), the only way the PIT's bit in irq_states can be set is if userspace raises the associated IRQ before KVM_CREATE_PIT{2} completes. Forcefully clearing the bit would clobber userspace's input, nonsensical though that input may be. Not to mention that no known VMM will continue on if PIT creation fails. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Hardcode the PIT IRQ source ID to '2'Sean Christopherson
Hardcode the PIT's source IRQ ID to '2' instead of "finding" that bit 2 is always the first available bit in irq_sources_bitmap. Bits 0 and 1 are set/reserved by kvm_arch_init_vm(), i.e. long before kvm_create_pit() can be invoked, and KVM allows at most one in-kernel PIT instance, i.e. it's impossible for the PIT to find a different free bit (there are no other users of kvm_request_irq_source_id(). Delete the now-defunct irq_sources_bitmap and all its associated code. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Move kvm_{request,free}_irq_source_id() to i8254.c (PIT)Sean Christopherson
Move kvm_{request,free}_irq_source_id() to i8254.c, i.e. the dedicated PIT emulation file, in anticipation of removing them entirely in favor of hardcoding the PIT's "requested" source ID (the source ID can only ever be '2', and the request can never fail). No functional change intended. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Move kvm_setup_default_irq_routing() into irq.cSean Christopherson
Move the default IRQ routing table used for in-kernel I/O APIC and PIC routing to irq.c, and tweak the name to make it explicitly clear what routing is being initialized. In addition to making it more obvious that the so called "default" routing only applies to an in-kernel I/O APIC, getting it out of irq_comm.c will allow removing irq_comm.c entirely. And placing the function alongside other I/O APIC and PIC code will allow for guarding KVM's in-kernel I/O APIC and PIC emulation with a Kconfig with minimal #ifdefs. No functional change intended. Cc: Kai Huang <kai.huang@intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Rename irqchip_kernel() to irqchip_full()Sean Christopherson
Rename irqchip_kernel() to irqchip_full(), as "kernel" is very ambiguous due to the existence of split IRQ chip support, where only some of the "irqchip" is in emulated by the kernel/KVM. E.g. irqchip_kernel() often gets confused with irqchip_in_kernel(). Opportunistically hoist the definition up in irq.h so that it's co-located with other "full" irqchip code in anticipation of wrapping it all with a Kconfig/#ifdef. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Move KVM_{GET,SET}_IRQCHIP ioctl helpers to irq.cSean Christopherson
Move the ioctl helpers for getting/setting fully in-kernel IRQ chip state to irq.c, partly to trim down x86.c, but mostly in preparation for adding a Kconfig to control support for in-kernel I/O APIC, PIC, and PIT emulation. No functional change intended. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Move PIT ioctl helpers to i8254.cSean Christopherson
Move the PIT ioctl helpers to i8254.c, i.e. to the file that implements PIT emulation. Eliminating PIT code in x86.c will allow adding a Kconfig to control support for in-kernel I/O APIC, PIC, and PIT emulation with minimal #ifdefs. Opportunistically make kvm_pit_set_reinject() and kvm_pit_load_count() local to i8254.c as they were only publicly visible to make them available to the ioctl helpers. No functional change intended. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapperSean Christopherson
Drop the superfluous kvm_hv_set_sint() and instead wire up ->set() directly to its final destination, kvm_hv_synic_set_irq(). Keep hv_synic_set_irq() instead of kvm_hv_set_sint() to provide some amount of consistency in the ->set() helpers, e.g. to match kvm_pic_set_irq() and kvm_ioapic_set_irq(). kvm_set_msi() is arguably the oddball, e.g. kvm_set_msi_irq() should be something like kvm_msi_to_lapic_irq() so that kvm_set_msi() can instead be kvm_set_msi_irq(), but that's a future problem to solve. No functional change intended. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Kai Huang <kai.huang@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Drop superfluous kvm_set_ioapic_irq() => kvm_ioapic_set_irq() wrapperSean Christopherson
Drop the superfluous and confusing kvm_set_ioapic_irq() and instead wire up ->set() directly to its final destination. No functional change intended. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Drop superfluous kvm_set_pic_irq() => kvm_pic_set_irq() wrapperSean Christopherson
Drop the superfluous and confusing kvm_set_pic_irq() => kvm_pic_set_irq() wrapper, and instead wire up ->set() directly to its final destination. Opportunistically move the declaration kvm_pic_set_irq() to irq.h to start gathering more of the in-kernel APIC/IO-APIC logic in irq.{c,h}. No functional change intended. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20KVM: x86: Trigger I/O APIC route rescan in kvm_arch_irq_routing_update()Sean Christopherson
Trigger the I/O APIC route rescan that's performed for a split IRQ chip after userspace updates IRQ routes in kvm_arch_irq_routing_update(), i.e. before dropping kvm->irq_lock. Calling kvm_make_all_cpus_request() under a mutex is perfectly safe, and the smp_wmb()+smp_mb__after_atomic() pair in __kvm_make_request()+kvm_check_request() ensures the new routing is visible to vCPUs prior to the request being visible to vCPUs. In all likelihood, commit b053b2aef25d ("KVM: x86: Add EOI exit bitmap inference") somewhat arbitrarily made the request outside of irq_lock to avoid holding irq_lock any longer than is strictly necessary. And then commit abdb080f7ac8 ("kvm/irqchip: kvm_arch_irq_routing_update renaming split") took the easy route of adding another arch hook instead of risking a functional change. Note, the call to synchronize_srcu_expedited() does NOT provide ordering guarantees with respect to vCPUs scanning the new routing; as above, the request infrastructure provides the necessary ordering. I.e. there's no need to wait for kvm_scan_ioapic_routes() to complete if it's actively running, because regardless of whether it grabs the old or new table, the vCPU will have another KVM_REQ_SCAN_IOAPIC pending, i.e. will rescan again and see the new mappings. Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250611213557.294358-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>