linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2025-04-24	drm/xe: Abort printing coredump in VM printer output if full	Matthew Brost
	Abort printing coredump in VM printer output if full. Helps speedup large coredumps which need to walked multiple times in xe_devcoredump_read. v2: - s/drm_printer_is_full/drm_coredump_printer_is_full (Jani) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250423171725.597955-5-matthew.brost@intel.com
2025-04-24	drm/print: Add drm_coredump_printer_is_full	Matthew Brost
	Add drm_coredump_printer_is_full which indicates if a drm printer's output is full. Useful to short circuit coredump printing once printer's output is full. v2: - s/drm_printer_is_full/drm_coredump_printer_is_full (Jani) v3: - Bail if not a coredump printer (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250423171725.597955-4-matthew.brost@intel.com
2025-04-24	drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access	Matthew Brost
	Add migrate layer functions to access VRAM and update xe_ttm_access_memory to use for non-visible access and large (more than 16k) BO access. 8G devcoreump on BMG observed 3 minute CPU copy time vs. 3s GPU copy time. v4: - Fix non-page aligned accesses - Add support for small / unaligned access - Update commit message indicating migrate used for large accesses (Auld) - Fix warning in xe_res_cursor for non-zero offset v5: - Fix 32 bit build (CI) v6: - Rebase and use SVM migration copy functions v7: - Fix build error (CI) v8: - Remove ifdef around VRAM copy functions (CI) - Use break statement in dma unmmaping (Jonathan) - Use if/else rather than goto (Jonathan) - Use single return point (Jonathan) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250423171725.597955-3-matthew.brost@intel.com
2025-04-24	drm/xe: Add devcoredump chunking	Matthew Brost
	Chunk devcoredump into 1.5G pieces to avoid hitting the kvmalloc limit of 2G. Simple algorithm reads 1.5G at time in xe_devcoredump_read callback as needed. Some memory allocations are changed to GFP_ATOMIC as they done in xe_devcoredump_read which holds lock in the path of reclaim. The allocations are small, so in practice should never fail. v2: - Update commit message wrt gfp atomic (John H) v6: - Drop GFP_ATOMIC change for hwconfig (John H) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250423171725.597955-2-matthew.brost@intel.com
2025-04-24	clk: socfpga: stratix10: Optimize local variables	Thorsten Blum
	Since readl() returns a u32, the local variable reg can also have the data type u32. Furthermore, mdiv and refdiv are derived from reg and can also be a u32. Since do_div() casts the divisor to u32 anyway, changing the data type of refdiv to u32 removes the following Coccinelle/coccicheck warning reported by do_div.cocci: WARNING: do_div() does a 64-by-32 division, please consider using div64_ul instead Compile-tested only. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-04-24	clk: socfpga: clk-pll: Optimize local variables	Thorsten Blum
	Since readl() returns a u32, the local variables reg and bypass can also have the data type u32. Furthermore, divf and divq are derived from reg and can also be a u32. Since do_div() casts the divisor to u32 anyway, changing the data type of divq to u32 removes the following Coccinelle/coccicheck warning reported by do_div.cocci: WARNING: do_div() does a 64-by-32 division, please consider using div64_ul instead Compile-tested only. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-04-24	nios2: Replace strcpy() with strscpy() and simplify setup_cpuinfo()	Thorsten Blum
	strcpy() is deprecated; use strscpy() instead. Since the destination buffer has a fixed length, strscpy() automatically determines its size using sizeof() when the size argument is omitted. This makes the explicit size argument unnecessary - remove it. Now, combine both if-else branches using strscpy() and the same buffer into a single statement to simplify the code. No functional changes intended. Link: https://github.com/KSPP/linux/issues/88 Cc: linux-hardening@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-04-24	nios2: do not introduce conflicting mappings when flushing tlb entries	Simon Schuster
	The NIOS2 hardware does not support conflicting mappings for the same virtual address (see Nios II Processor Reference Guide from 2023.08.28): "The operating system software is responsible for guaranteeing that multiple TLB entries do not map the same virtual address. The hardware behavior is undefined when multiple entries map the same virtual address." When flushing tlb-entries, the kernel may violate this invariant for virtual addresses related to PID 0 as flushing is currently implemented by assigning physical address, pid and flags to 0. A small example: Before flushing TLB mappings for pid:0x42: dump tlb-entries for line=0xd (addr 000d0000): -- way:09 vpn:0x0006d000 phys:0x01145000 pid:0x00 flags:rw--c ... -- way:0d vpn:0x0006d000 phys:0x02020000 pid:0x42 flags:rw--c After flushing TLB mappings for pid:0x42: dump tlb-entries for line=0xd (addr 000d0000): -- way:09 vpn:0x0006d000 phys:0x01145000 pid:0x00 flags:rw--c ... -- way:0d vpn:0x0006d000 phys:0x00000000 pid:0x00 flags:----- As functions such as replace_tlb_one_pid operate on the assumption of unique mappings, this can cause repeated pagefaults for a single address that are not covered by the existing spurious pagefault handling. This commit fixes this issue by keeping the pid field of the entries when flushing. That way, no conflicting mappings are introduced as the pair of <address,pid> is now kept unique: Fixed example after flushing TLB mappings for pid:0x42: dump tlb-entries for line=0xd (addr 000d0000): -- way:09 vpn:0x0006d000 phys:0x01145000 pid:0x00 flags:rw--c ... -- way:0d vpn:0x0006d000 phys:0x00000000 pid:0x42 flags:----- When flushing the complete tlb/initialising all entries, the way is used as a substitute mmu pid value for the "invalid" entries. Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com> Signed-off-by: Andreas Oetken <andreas.oetken@siemens-energy.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-04-24	nios2: force update_mmu_cache on spurious tlb-permission--related pagefaults	Simon Schuster
	NIOS2 uses a software-managed TLB for virtual address translation. To flush a cache line, the original mapping is replaced by one to physical address 0x0 with no permissions (rwx mapped to 0) set. This can lead to TLB-permission--related traps when such a nominally flushed entry is encountered as a mapping for an otherwise valid virtual address within a process (e.g. due to an MMU-PID-namespace rollover that previously flushed the complete TLB including entries of existing, running processes). The default ptep_set_access_flags implementation from mm/pgtable-generic.c only forces a TLB-update when the page-table entry has changed within the page table: /* * [...] We return whether the PTE actually changed, which in turn * instructs the caller to do things like update__mmu_cache. [...] / int ptep_set_access_flags(struct vm_area_struct vma, unsigned long address, pte_t ptep, pte_t entry, int dirty) { int changed = !pte_same(ptep, entry); if (changed) { set_pte_at(vma->vm_mm, address, ptep, entry); flush_tlb_fix_spurious_fault(vma, address); } return changed; } However, no cross-referencing with the TLB-state occurs, so the flushing-induced pseudo entries that are responsible for the pagefault in the first place are never pre-empted from TLB on this code path. This commit fixes this behaviour by always requesting a TLB-update in this part of the pagefault handling, fixing spurious page-faults on the way. The handling is a straightforward port of the logic from the MIPS architecture via an arch-specific ptep_set_access_flags function ported from arch/mips/include/asm/pgtable.h. Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com> Signed-off-by: Andreas Oetken <andreas.oetken@siemens-energy.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2025-04-24	gfs2: Fix usage of bio->bi_status in gfs2_end_log_write	Andrew Price
	bio->bi_status is an index into the blk_errors array, not an errno. Its __bitwise tag is cast away here, resulting in a sparse warning: fs/gfs2/lops.c:207:22: warning: cast from restricted blk_status_t We could either add __force to the cast and continue logging bi_status in the error message, or we could look up the errno in the array and log that. As sdp->sd_log_error is used as an errno in all other cases, look up the errno here for consistency. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-24	gfs2: deallocate inodes in gfs2_create_inode	Andreas Gruenbacher
	When creating and destroying inodes, we are relying on the inode hash table to make sure that for a given inode number, only a single inode will exist. We then link that inode to its inode and iopen glock and let those glocks point back at the inode. However, when iget_failed() is called, the inode is removed from the inode hash table before gfs_evict_inode() is called, and uniqueness is no longer guaranteed. Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work around that problem by detaching the inode glock from the inode before calling iget_failed(), but that broke the inode deallocation code in gfs_evict_inode(). To fix that, deallocate partially created inodes in gfs2_create_inode() instead of relying on gfs_evict_inode() for doing that. This means that gfs2_evict_inode() and its helper functions will no longer see partially created inodes, and so some simplifications are possible there. Fixes: 9ffa18884cce ("gfs2: gl_object races fix") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-24	riscv: uprobes: Add missing fence.i after building the XOL buffer	Björn Töpel
	The XOL (execute out-of-line) buffer is used to single-step the replaced instruction(s) for uprobes. The RISC-V port was missing a proper fence.i (i$ flushing) after constructing the XOL buffer, which can result in incorrect execution of stale/broken instructions. This was found running the BPF selftests "test_progs: uprobe_autoattach, attach_probe" on the Spacemit K1/X60, where the uprobes tests randomly blew up. Reviewed-by: Guo Ren <guoren@kernel.org> Fixes: 74784081aac8 ("riscv: Add uprobes supported") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250419111402.1660267-2-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-04-24	riscv: Replace function-like macro by static inline function	Björn Töpel
	The flush_icache_range() function is implemented as a "function-like macro with unused parameters", which can result in "unused variables" warnings. Replace the macro with a static inline function, as advised by Documentation/process/coding-style.rst. Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250419111402.1660267-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-04-24	Merge tag 'scsi-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "The single core change is an obvious bug fix (and falls within the LF guidelines for patches from sanctioned entities). The other driver changes are a bit larger but likewise pretty obvious" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpi3mr: Add level check to control event logging scsi: ufs: core: Add NULL check in ufshcd_mcq_compl_pending_transfer() scsi: core: Clear flags for scsi_cmnd that did not complete scsi: ufs: Introduce quirk to extend PA_HIBERN8TIME for UFS devices scsi: ufs: qcom: Add quirks for Samsung UFS devices scsi: target: iscsi: Fix timeout on deleted connection scsi: mpi3mr: Reset the pending interrupt flag scsi: mpi3mr: Fix pending I/O counter scsi: ufs: mcq: Add NULL check in ufshcd_mcq_abort()
2025-04-24	ACPICA: Add support for printing AML arguments when trace point enabled	Mario Limonciello
	When debug level is set to `ACPI_LV_TRACE_POINT` method start and exit are emitted into the debug logs. This can be useful to understand call paths, however none of the arguments for the method calls are populated even when turning up other debug levels. This can be useful for BIOSes that contain debug strings to see those strings. When `ACPI_LV_TRACE_POINT` is set also output all of the arguments for a given method call. This enables this type of debugging: ``` extrace-0138 ex_trace_point : Method Begin [0x0000000096b240c4:\M460] execution. extrace-0173 ex_trace_args : " POST CODE: %X ACPI TIMER: %X TIME: %d.%d ms\n", b0003f53, 1a26a8b2, 0, 15e, 0, 0 extrace-0138 ex_trace_point : Method End [0x0000000096b240c4:\M460] execution. ``` Link: https://github.com/acpica/acpica/commit/08219d91b5678ae2fae6e4f208df790a4e108c1c Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/20250417031040.514460-1-superm1@kernel.org Link: https://github.com/acpica/acpica/pull/1012 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-04-24	Merge tag 'landlock-6.15-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock fixes from Mickaël Salaün: "Fix some Landlock audit issues, add related tests, and updates documentation" * tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Update log documentation landlock: Fix documentation for landlock_restrict_self(2) landlock: Fix documentation for landlock_create_ruleset(2) selftests/landlock: Add PID tests for audit records selftests/landlock: Factor out audit fixture in audit_test landlock: Log the TGID of the domain creator landlock: Remove incorrect warning
2025-04-24	cgroup: fix goto ordering in cgroup_init()	JP Kobryn
	Go to the appropriate section labels when css_rstat_init() or psi_cgroup_alloc() fails. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Fixes: a97915559f5c ("cgroup: change rstat function signatures from cgroup-based to css-based") Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-24	KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff()	Uros Bizjak
	Micro-optimize vmx_do_interrupt_irqoff() by substituting MOV %RBP,%RSP; POP %RBP instruction sequence with equivalent LEAVE instruction. GCC compiler does this by default for a generic tuning and for all modern processors: DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave", m_386 \| m_CORE_ALL \| m_K6_GEODE \| m_AMD_MULTIPLE \| m_ZHAOXIN \| m_TREMONT \| m_CORE_HYBRID \| m_CORE_ATOM \| m_GENERIC) The new code also saves a couple of bytes, from: 27: 48 89 ec mov %rbp,%rsp 2a: 5d pop %rbp to: 27: c9 leave No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250414081131.97374-2-ubizjak@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: nVMX: Check MSR load/store list counts during VM-Enter consistency checks	Sean Christopherson
	Explicitly verify the MSR load/store list counts are below the advertised limit as part of the initial consistency checks on the lists, so that code that consumes the count doesn't need to worry about extreme edge cases. Enforcing the limit during the initial checks fixes a flaw on 32-bit KVM where a sufficiently high @count could lead to overflow: arch/x86/kvm/vmx/nested.c:834 nested_vmx_check_msr_switch() warn: potential user controlled sizeof overflow 'addr + count * 16' '0-u64max + 16-68719476720' arch/x86/kvm/vmx/nested.c 827 static int nested_vmx_check_msr_switch(struct kvm_vcpu vcpu, 828 u32 count, u64 addr) 829 { 830 if (count == 0) 831 return 0; 832 833 if (!kvm_vcpu_is_legal_aligned_gpa(vcpu, addr, 16) \|\| --> 834 !kvm_vcpu_is_legal_gpa(vcpu, (addr + count sizeof(struct vmx_msr_entry) - 1))) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ While the SDM doesn't explicitly state an illegal count results in VM-Fail, the SDM states that exceeding the limit may result in undefined behavior. I.e. the SDM gives hardware, and thus KVM, carte blanche to do literally anything in response to a count that exceeds the "recommended" limit. If the limit is exceeded, undefined processor behavior may result (including a machine check during the VMX transition). KVM already enforces the limit when processing the MSRs, i.e. already signals a late VM-Exit Consistency Check for VM-Enter, and generates a VMX Abort for VM-Exit. I.e. explicitly checking the limits simply means KVM will signal VM-Fail instead of VM-Exit or VMX Abort. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/44961459-2759-4164-b604-f6bd43da8ce9@stanley.mountain Link: https://lore.kernel.org/r/20250315024402.2363098-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-6.15-rc4). This pull includes wireless and a fix to vxlan which isn't in Linus's tree just yet. The latter creates with a silent conflict / build breakage, so merging it now to avoid causing problems. drivers/net/vxlan/vxlan_vnifilter.c 094adad91310 ("vxlan: Use a single lock to protect the FDB table") 087a9eb9e597 ("vxlan: vnifilter: Fix unlocked deletion of default FDB entry") https://lore.kernel.org/20250423145131.513029-1-idosch@nvidia.com No "normal" conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-24	KVM: SVM: Fix SNP AP destroy race with VMRUN	Tom Lendacky
	An AP destroy request for a target vCPU is typically followed by an RMPADJUST to remove the VMSA attribute from the page currently being used as the VMSA for the target vCPU. This can result in a vCPU that is about to VMRUN to exit with #VMEXIT_INVALID. This usually does not happen as APs are typically sitting in HLT when being destroyed and therefore the vCPU thread is not running at the time. However, if HLT is allowed inside the VM, then the vCPU could be about to VMRUN when the VMSA attribute is removed from the VMSA page, resulting in a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the guest to crash. An RMPADJUST against an in-use (already running) VMSA results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA attribute cannot be changed until the VMRUN for target vCPU exits. The Qemu command line option '-overcommit cpu-pm=on' is an example of allowing HLT inside the guest. Update the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event to include the KVM_REQUEST_WAIT flag. The kvm_vcpu_kick() function will not wait for requests to be honored, so create kvm_make_request_and_kick() that will add a new event request and honor the KVM_REQUEST_WAIT flag. This will ensure that the target vCPU sees the AP destroy request before returning to the initiating vCPU should the target vCPU be in guest mode. Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/fe2c885bf35643dd224e91294edb6777d5df23a4.1743097196.git.thomas.lendacky@amd.com [sean: add a comment explaining the use of smp_send_reschedule()] Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	x86/irq: KVM: Add helper for harvesting PIR to deduplicate KVM and posted MSIs	Sean Christopherson
	Now that posted MSI and KVM harvesting of PIR is identical, extract the code (and posted MSI's wonderful comment) to a common helper. No functional change intended. Link: https://lore.kernel.org/r/20250401163447.846608-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: VMX: Use arch_xchg() when processing PIR to avoid instrumentation	Sean Christopherson
	Use arch_xchg() when moving IRQs from the PIR to the vIRR, purely to avoid instrumentation so that KVM is compatible with the needs of posted MSI. This will allow extracting the core PIR logic to common code and sharing it between KVM and posted MSI handling. Link: https://lore.kernel.org/r/20250401163447.846608-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: VMX: Isolate pure loads from atomic XCHG when processing PIR	Sean Christopherson
	Rework KVM's processing of the PIR to use the same algorithm as posted MSIs, i.e. to do READ(x4) => XCHG(x4) instead of (READ+XCHG)(x4). Given KVM's long-standing, sub-optimal use of 32-bit accesses to the PIR, it's safe to say far more thought and investigation was put into handling the PIR for posted MSIs, i.e. there's no reason to assume KVM's existing logic is meaningful, let alone superior. Matching the processing done by posted MSIs will also allow deduplicating the code between KVM and posted MSIs. See the comment for handle_pending_pir() added by commit 1b03d82ba15e ("x86/irq: Install posted MSI notification handler") for details on why isolating loads from XCHG is desirable. Suggested-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250401163447.846608-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: VMX: Process PIR using 64-bit accesses on 64-bit kernels	Sean Christopherson
	Process the PIR at the natural kernel width, i.e. in 64-bit chunks on 64-bit kernels, so that the worst case of having a posted IRQ in each chunk of the vIRR only requires 4 loads and xchgs from/to the PIR, not 8. Deliberately use a "continue" to skip empty entries so that the code is a carbon copy of handle_pending_pir(), in anticipation of deduplicating KVM and posted MSI logic. Suggested-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250401163447.846608-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	x86/irq: KVM: Track PIR bitmap as an "unsigned long" array	Sean Christopherson
	Track the PIR bitmap in posted interrupt descriptor structures as an array of unsigned longs instead of using unionized arrays for KVM (u32s) versus IRQ management (u64s). In practice, because the non-KVM usage is (sanely) restricted to 64-bit kernels, all existing usage of the u64 variant is already working with unsigned longs. Using "unsigned long" for the array will allow reworking KVM's processing of the bitmap to read/write in 64-bit chunks on 64-bit kernels, i.e. will allow optimizing KVM by reducing the number of atomic accesses to PIR. Opportunstically replace the open coded literals in the posted MSIs code with the appropriate macro. Deliberately don't use ARRAY_SIZE() in the for-loops, even though it would be cleaner from a certain perspective, in anticipation of decoupling the processing from the array declaration. No functional change intended. Link: https://lore.kernel.org/r/20250401163447.846608-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: VMX: Ensure vIRR isn't reloaded at odd times when sync'ing PIR	Sean Christopherson
	Read each vIRR exactly once when shuffling IRQs from the PIR to the vAPIC to ensure getting the highest priority IRQ from the chunk doesn't reload from the vIRR. In practice, a reload is functionally benign as vcpu->mutex is held and so IRQs can be consumed, i.e. new IRQs can appear, but existing IRQs can't disappear. Link: https://lore.kernel.org/r/20250401163447.846608-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	x86/irq: Track if IRQ was found in PIR during initial loop (to load PIR vals)	Sean Christopherson
	Track whether or not at least one IRQ was found in PIR during the initial loop to load PIR chunks from memory. Doing so generates slightly better code (arguably) for processing the for-loop of XCHGs, especially for the case where there are no pending IRQs. Note, while PIR can be modified between the initial load and the XCHG, it can only _gain_ new IRQs, i.e. there is no danger of a false positive due to the final version of pir_copy[] being empty. Opportunistically convert the boolean to an "unsigned long" and compute the effective boolean result via bitwise-OR. Some compilers, e.g. clang-14, need the extra "hint" to elide conditional branches. Opportunistically rename the variable in anticipation of moving the PIR accesses to a common helper that can be shared by posted MSIs and KVM. Old: <+74>: test %rdx,%rdx <+77>: je 0xffffffff812bbeb0 <handle_pending_pir+144> <pir[0]> <+88>: mov $0x1,%dl> <+90>: test %rsi,%rsi <+93>: je 0xffffffff812bbe8c <handle_pending_pir+108> <pir[1]> <+106>: mov $0x1,%dl <+108>: test %rcx,%rcx <+111>: je 0xffffffff812bbe9e <handle_pending_pir+126> <pir[2]> <+124>: mov $0x1,%dl <+126>: test %rax,%rax <+129>: je 0xffffffff812bbeb9 <handle_pending_pir+153> <pir[3]> <+142>: jmp 0xffffffff812bbec1 <handle_pending_pir+161> <+144>: xor %edx,%edx <+146>: test %rsi,%rsi <+149>: jne 0xffffffff812bbe7f <handle_pending_pir+95> <+151>: jmp 0xffffffff812bbe8c <handle_pending_pir+108> <+153>: test %dl,%dl <+155>: je 0xffffffff812bbf8e <handle_pending_pir+366> New: <+74>: mov %rax,%r8 <+77>: or %rcx,%r8 <+80>: or %rdx,%r8 <+83>: or %rsi,%r8 <+86>: setne %bl <+89>: je 0xffffffff812bbf88 <handle_pending_pir+360> <+95>: test %rsi,%rsi <+98>: je 0xffffffff812bbe8d <handle_pending_pir+109> <pir[0]> <+109>: test %rdx,%rdx <+112>: je 0xffffffff812bbe9d <handle_pending_pir+125> <pir[1]> <+125>: test %rcx,%rcx <+128>: je 0xffffffff812bbead <handle_pending_pir+141> <pir[2]> <+141>: test %rax,%rax <+144>: je 0xffffffff812bbebd <handle_pending_pir+157> <pir[3]> Link: https://lore.kernel.org/r/20250401163447.846608-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	x86/irq: Ensure initial PIR loads are performed exactly once	Sean Christopherson
	Ensure the PIR is read exactly once at the start of handle_pending_pir(), to guarantee that checking for an outstanding posted interrupt in a given chuck doesn't reload the chunk from the "real" PIR. Functionally, a reload is benign, but it would defeat the purpose of pre-loading into a copy. Fixes: 1b03d82ba15e ("x86/irq: Install posted MSI notification handler") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250401163447.846608-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	x86/insn: Fix CTEST instruction decoding	Kirill A. Shutemov
	insn_decoder_test found a problem with decoding APX CTEST instructions: Found an x86 instruction decoder bug, please report this. ffffffff810021df 62 54 94 05 85 ff ctestneq objdump says 6 bytes, but insn_get_length() says 5 It happens because x86-opcode-map.txt doesn't specify arguments for the instruction and the decoder doesn't expect to see ModRM byte. Fixes: 690ca3a3067f ("x86/insn: Add support for APX EVEX instructions to the opcode map") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org # v6.10+ Link: https://lore.kernel.org/r/20250423065815.2003231-1-kirill.shutemov@linux.intel.com
2025-04-24	KVM: x86: Add module param to control and enumerate device posted IRQs	Sean Christopherson
	Add a module param to each KVM vendor module to allow disabling device posted interrupts without having to sacrifice all of APICv/AVIC, and to also effectively enumerate to userspace whether or not KVM may be utilizing device posted IRQs. Disabling device posted interrupts is very desirable for testing, and can even be desirable for production environments, e.g. if the host kernel wants to interpose on device interrupts. Put the module param in kvm-{amd,intel}.ko instead of kvm.ko to match the overall APICv/AVIC controls, and to avoid complications with said controls. E.g. if the param is in kvm.ko, KVM needs to be snapshot the original user-defined value to play nice with a vendor module being reloaded with different enable_apicv settings. Link: https://lore.kernel.org/r/20250401161804.842968-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: VMX: Don't send UNBLOCK when starting device assignment without APICv	Sean Christopherson
	When starting device assignment, i.e. potential IRQ bypass, don't blast KVM_REQ_UNBLOCK if APICv is disabled/unsupported. There is no need to wake vCPUs if they can never use VT-d posted IRQs (sending UNBLOCK guards against races being vCPUs blocking and devices starting IRQ bypass). Opportunistically use kvm_arch_has_irq_bypass() for all relevant checks in the VMX Posted Interrupt code so that all checks in KVM x86 incorporate the same information (once AMD/AVIC is given similar treatment). Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://lore.kernel.org/r/20250401161804.842968-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: Rescan I/O APIC routes after EOI interception for old routing	weizijie
	Rescan I/O APIC routes for a vCPU after handling an intercepted I/O APIC EOI for an IRQ that is not targeting said vCPU, i.e. after handling what's effectively a stale EOI VM-Exit. If a level-triggered IRQ is in-flight when IRQ routing changes, e.g. because the guest changes routing from its IRQ handler, then KVM intercepts EOIs on both the new and old target vCPUs, so that the in-flight IRQ can be de-asserted when it's EOI'd. However, only the EOI for the in-flight IRQ needs to be intercepted, as IRQs on the same vector with the new routing are coincidental, i.e. occur only if the guest is reusing the vector for multiple interrupt sources. If the I/O APIC routes aren't rescanned, KVM will unnecessarily intercept EOIs for the vector and negative impact the vCPU's interrupt performance. Note, both commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race") and commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race") mentioned this issue, but it was considered a "rare" occurrence thus was not addressed. However in real environments, this issue can happen even in a well-behaved guest. Cc: Kai Huang <kai.huang@intel.com> Co-developed-by: xuyun <xuyun_xy.xy@linux.alibaba.com> Signed-off-by: xuyun <xuyun_xy.xy@linux.alibaba.com> Signed-off-by: weizijie <zijie.wei@linux.alibaba.com> [sean: massage changelog and comments, use int/-1, reset at scan] Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250304013335.4155703-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: Add a helper to deduplicate I/O APIC EOI interception logic	Sean Christopherson
	Extract the vCPU specific EOI interception logic for I/O APIC emulation into a common helper for userspace and in-kernel emulation in anticipation of optimizing the "pending EOI" case. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250304013335.4155703-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: Isolate edge vs. level check in userspace I/O APIC route scanning	Sean Christopherson
	Extract and isolate the trigger mode check in kvm_scan_ioapic_routes() in anticipation of moving destination matching logic to a common helper (for userspace vs. in-kernel I/O APIC emulation). No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250304013335.4155703-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: Advertise support for AMD's PREFETCHI	Babu Moger
	The latest AMD platform has introduced a new instruction called PREFETCHI. This instruction loads a cache line from a specified memory address into the indicated data or instruction cache level, based on locality reference hints. Feature bit definition: CPUID_Fn80000021_EAX [bit 20] - Indicates support for IC prefetch. This feature is analogous to Intel's PREFETCHITI (CPUID.(EAX=7,ECX=1):EDX), though the CPUID bit definitions differ between AMD and Intel. Advertise support to userspace, as no additional enabling is necessary (PREFETCHI can't be intercepted as there's no instruction specific behavior that needs to be virtualize). The feature is documented in Processor Programming Reference (PPR) for AMD Family 1Ah Model 02h, Revision C1 (Link below). Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <babu.moger@amd.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/ee1c08fc400bb574a2b8f2c6a0bd9def10a29d35.1744130533.git.babu.moger@amd.com [sean: rewrite shortlog to highlight the KVM functionality] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: Sort CPUID_8000_0021_EAX leaf bits properly	Borislav Petkov
	WRMSR_XX_BASE_NS is bit 1 so put it there, add some new bits as comments only. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250324160617.15379-1-bp@kernel.org [sean: skip the FSRS/FSRC placeholders to avoid confusion] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: clean up a return	Dan Carpenter
	Returning a literal X86EMUL_CONTINUE is slightly clearer than returning rc. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/7604cbbf-15e6-45a8-afec-cf5be46c2924@stanley.mountain Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: Advertise support for WRMSRNS	Sean Christopherson
	Advertise support for WRMSRNS (WRMSR non-serializing) to userspace if the instruction is supported by the underlying CPU. From a virtualization perspective, the only difference between WRMSRNS and WRMSR is that VM-Exits due to WRMSRNS set EXIT_QUALIFICATION to '1'. WRMSRNS doesn't require a new enabling control, shares the same basic exit reason, and behaves the same as WRMSR with respect to MSR interception. WRMSR and WRMSRNS use the same basic exit reason (see Appendix C). For WRMSR, the exit qualification is 0, while for WRMSRNS it is 1. Don't do anything different when emulating WRMSRNS vs. WRMSR, as KVM can't do anything less, i.e. can't make emulation non-serializing. The motivation for the guest to use WRMSRNS instead of WRMSR is to avoid immediately serializing the CPU when the necessary serialization is guaranteed by some other mechanism, i.e. WRMSRNS being fully serializing isn't guest-visible, just less performant. Suggested-by: Xin Li (Intel) <xin@zytor.com> Link: https://lore.kernel.org/r/20250227010111.3222742-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	x86/msr: Rename the WRMSRNS opcode macro to ASM_WRMSRNS (for KVM)	Sean Christopherson
	Rename the WRMSRNS instruction opcode macro so that it doesn't collide with X86_FEATURE_WRMSRNS when using token pasting to generate references to X86_FEATURE_WRMSRNS. KVM heavily uses token pasting to generate KVM's set of support feature bits, and adding WRMSRNS support in KVM will run will run afoul of the opcode macro. arch/x86/kvm/cpuid.c:719:37: error: pasting "X86_FEATURE_" and "" "" does not give a valid preprocessing token 719 \| u32 __leaf = __feature_leaf(X86_FEATURE_##name); \ \| ^~~~~~~~~~~~ KVM has worked around one such collision in the past by #undef'ing the problematic macro in order to avoid blocking a KVM rework, but such games are generally undesirable, e.g. requires bleeding macro details into KVM, risks weird behavior if what KVM is #undef'ing changes, etc. Reviewed-by: Xin Li (Intel) <xin@zytor.com> Link: https://lore.kernel.org/r/20250227010111.3222742-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: Generalize IBRS virtualization on emulated VM-exit	Yosry Ahmed
	Commit 2e7eab81425a ("KVM: VMX: Execute IBPB on emulated VM-exit when guest has IBRS") added an IBPB in the emulated VM-exit path on Intel to properly virtualize IBRS by providing separate predictor modes for L1 and L2. AMD requires similar handling, except when IbrsSameMode is enumerated by the host CPU (which is the case on most/all AMD CPUs). With IbrsSameMode, hardware IBRS is sufficient and no extra handling is needed from KVM. Generalize the handling in nested_vmx_vmexit() by moving it into a generic function, add the AMD handling, and use it in nested_svm_vmexit() too. The main reason for using a generic function is to have a single place to park the huge comment about virtualizing IBRS. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250221163352.3818347-4-yosry.ahmed@linux.dev [sean: use kvm_nested_vmexit_handle_spec_ctrl() for the helper] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: Propagate AMD's IbrsSameMode to the guest	Yosry Ahmed
	If IBRS provides same mode (kernel/user or host/guest) protection on the host, then by definition it also provides same mode protection in the guest. In fact, all different modes from the guest's perspective are the same mode from the host's perspective anyway. Propagate IbrsSameMode to the guests. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250221163352.3818347-3-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	x86/cpufeatures: Define X86_FEATURE_AMD_IBRS_SAME_MODE	Yosry Ahmed
	Per the APM [1]: Some processors, identified by CPUID Fn8000_0008_EBX[IbrsSameMode] (bit 19) = 1, provide additional speculation limits. For these processors, when IBRS is set, indirect branch predictions are not influenced by any prior indirect branches, regardless of mode (CPL and guest/host) and regardless of whether the prior indirect branches occurred before or after the setting of IBRS. This is referred to as Same Mode IBRS. Define this feature bit, which will be used by KVM to determine if an IBPB is required on nested VM-exits in SVM. [1] AMD64 Architecture Programmer's Manual Pub. 40332, Rev 4.08 - April 2024, Volume 2, 3.2.9 Speculation Control MSRs Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20250221163352.3818347-2-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run()	Dan Carpenter
	The "kvm_run->kvm_valid_regs" and "kvm_run->kvm_dirty_regs" variables are u64 type. We are only using the lowest 3 bits but we want to ensure that the users are not passing invalid bits so that we can use the remaining bits in the future. However "sync_valid_fields" and kvm_sync_valid_fields() are u32 type so the check only ensures that the lower 32 bits are clear. Fix this by changing the types to u64. Fixes: 74c1807f6c4f ("KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/ec25aad1-113e-4c6e-8941-43d432251398@stanley.mountain Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception	Mikhail Lobanov
	Previously, commit ed129ec9057f ("KVM: x86: forcibly leave nested mode on vCPU reset") addressed an issue where a triple fault occurring in nested mode could lead to use-after-free scenarios. However, the commit did not handle the analogous situation for System Management Mode (SMM). This omission results in triggering a WARN when KVM forces a vCPU INIT after SHUTDOWN interception while the vCPU is in SMM. This situation was reprodused using Syzkaller by: 1) Creating a KVM VM and vCPU 2) Sending a KVM_SMI ioctl to explicitly enter SMM 3) Executing invalid instructions causing consecutive exceptions and eventually a triple fault The issue manifests as follows: WARNING: CPU: 0 PID: 25506 at arch/x86/kvm/x86.c:12112 kvm_vcpu_reset+0x1d2/0x1530 arch/x86/kvm/x86.c:12112 Modules linked in: CPU: 0 PID: 25506 Comm: syz-executor.0 Not tainted 6.1.130-syzkaller-00157-g164fe5dde9b6 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:kvm_vcpu_reset+0x1d2/0x1530 arch/x86/kvm/x86.c:12112 Call Trace: <TASK> shutdown_interception+0x66/0xb0 arch/x86/kvm/svm/svm.c:2136 svm_invoke_exit_handler+0x110/0x530 arch/x86/kvm/svm/svm.c:3395 svm_handle_exit+0x424/0x920 arch/x86/kvm/svm/svm.c:3457 vcpu_enter_guest arch/x86/kvm/x86.c:10959 [inline] vcpu_run+0x2c43/0x5a90 arch/x86/kvm/x86.c:11062 kvm_arch_vcpu_ioctl_run+0x50f/0x1cf0 arch/x86/kvm/x86.c:11283 kvm_vcpu_ioctl+0x570/0xf00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4122 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x19a/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Architecturally, INIT is blocked when the CPU is in SMM, hence KVM's WARN() in kvm_vcpu_reset() to guard against KVM bugs, e.g. to detect improper emulation of INIT. SHUTDOWN on SVM is a weird edge case where KVM needs to do _something_ sane with the VMCB, since it's technically undefined, and INIT is the least awful choice given KVM's ABI. So, double down on stuffing INIT on SHUTDOWN, and force the vCPU out of SMM to avoid any weirdness (and the WARN). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: ed129ec9057f ("KVM: x86: forcibly leave nested mode on vCPU reset") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru> Link: https://lore.kernel.org/r/20250414171207.155121-1-m.lobanov@rosa.ru [sean: massage changelog, make it clear this isn't architectural behavior] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-24	perf/x86: Fix non-sampling (counting) events on certain x86 platforms	Luo Gengkun
	Perf doesn't work at perf stat for hardware events on certain x86 platforms: $perf stat -- sleep 1 Performance counter stats for 'sleep 1': 16.44 msec task-clock # 0.016 CPUs utilized 2 context-switches # 121.691 /sec 0 cpu-migrations # 0.000 /sec 54 page-faults # 3.286 K/sec <not supported> cycles <not supported> instructions <not supported> branches <not supported> branch-misses The reason is that the check in x86_pmu_hw_config() for sampling events is unexpectedly applied to counting events as well. It should only impact x86 platforms with limit_period used for non-PEBS events. For Intel platforms, it should only impact some older platforms, e.g., HSW, BDW and NHM. Fixes: 88ec7eedbbd2 ("perf/x86: Fix low freqency setting issue") Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250423064724.3716211-1-luogengkun@huaweicloud.com
2025-04-24	spi: meson-spicc: add DMA support	Xianwei Zhao
	Add DMA support for spicc driver. DMA works if the transfer meets the following conditions: 1. 64 bits per word; 2. The transfer length must be multiples of the dma_burst_len, and the dma_burst_len should be one of 8,7...2, otherwise, it will be split into several SPI bursts. Signed-off-by: Sunny Luo <sunny.luo@amlogic.com> Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Link: https://patch.msgid.link/20250414-spi-dma-v2-1-84bbd92fa469@amlogic.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-24	vxlan: vnifilter: Fix unlocked deletion of default FDB entry	Ido Schimmel
	When a VNI is deleted from a VXLAN device in 'vnifilter' mode, the FDB entry associated with the default remote (assuming one was configured) is deleted without holding the hash lock. This is wrong and will result in a warning [1] being generated by the lockdep annotation that was added by commit ebe642067455 ("vxlan: Create wrappers for FDB lookup"). Reproducer: # ip link add vx0 up type vxlan dstport 4789 external vnifilter local 192.0.2.1 # bridge vni add vni 10010 remote 198.51.100.1 dev vx0 # bridge vni del vni 10010 dev vx0 Fix by acquiring the hash lock before the deletion and releasing it afterwards. Blame the original commit that introduced the issue rather than the one that exposed it. [1] WARNING: CPU: 3 PID: 392 at drivers/net/vxlan/vxlan_core.c:417 vxlan_find_mac+0x17f/0x1a0 [...] RIP: 0010:vxlan_find_mac+0x17f/0x1a0 [...] Call Trace: <TASK> __vxlan_fdb_delete+0xbe/0x560 vxlan_vni_delete_group+0x2ba/0x940 vxlan_vni_del.isra.0+0x15f/0x580 vxlan_process_vni_filter+0x38b/0x7b0 vxlan_vnifilter_process+0x3bb/0x510 rtnetlink_rcv_msg+0x2f7/0xb70 netlink_rcv_skb+0x131/0x360 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 __sys_sendmsg+0x121/0x1b0 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250423145131.513029-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-24	Merge tag 'wireless-2025-04-24' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Some more fixes, notably: * iwlwifi: various regression and iwlmld fixes * mac80211: fix TX frames in monitor mode * brcmfmac: error handling for firmware load * tag 'wireless-2025-04-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: restore missing initialization of async_handlers_list wifi: brcm80211: fmac: Add error handling for brcmf_usb_dl_writeimage() wifi: plfxlc: Remove erroneous assert in plfxlc_mac_release wifi: iwlwifi: fix the check for the SCRATCH register upon resume wifi: iwlwifi: don't warn if the NIC is gone in resume wifi: iwlwifi: mld: fix BAID validity check wifi: iwlwifi: back off on continuous errors wifi: iwlwifi: mld: only create debugfs symlink if it does not exist wifi: iwlwifi: mld: inform trans on init failure wifi: iwlwifi: mld: properly handle async notification in op mode start Revert "wifi: iwlwifi: make no_160 more generic" Revert "wifi: iwlwifi: add support for BE213" wifi: mac80211: restore monitor for outgoing frames ==================== Link: https://patch.msgid.link/20250424120535.56499-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-24	Merge branch 'kvm-fixes-6.15-rc4' into HEAD	Paolo Bonzini
	* Single fix for broken usage of 'multi-MIDR' infrastructure in PI code, adding an open-coded erratum check for Cavium ThunderX * Bugfixes from a planned posted interrupt rework * Do not use kvm_rip_read() unconditionally to cater for guests with inaccessible register state.