summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-27bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variableYonghong Song
Marc Suñé (Isovalent, part of Cisco) reported an issue where an uninitialized variable caused generating bpf prog binary code not working as expected. The reproducer is in [1] where the flags “-Wall -Werror” are enabled, but there is no warning as the compiler takes advantage of uninitialized variable to do aggressive optimization. The optimized code looks like below: ; { 0: bf 16 00 00 00 00 00 00 r6 = r1 ; bpf_printk("Start"); 1: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll 0000000000000008: R_BPF_64_64 .rodata 3: b4 02 00 00 06 00 00 00 w2 = 0x6 4: 85 00 00 00 06 00 00 00 call 0x6 ; DEFINE_FUNC_CTX_POINTER(data) 5: 61 61 4c 00 00 00 00 00 w1 = *(u32 *)(r6 + 0x4c) ; bpf_printk("pre ipv6_hdrlen_offset"); 6: 18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll 0000000000000030: R_BPF_64_64 .rodata 8: b4 02 00 00 17 00 00 00 w2 = 0x17 9: 85 00 00 00 06 00 00 00 call 0x6 <END> The verifier will report the following failure: 9: (85) call bpf_trace_printk#6 last insn is not an exit or jmp The above verifier log does not give a clear hint about how to fix the problem and user may take quite some time to figure out that the issue is due to compiler taking advantage of uninitialized variable. In llvm internals, uninitialized variable usage may generate 'unreachable' IR insn and these 'unreachable' IR insns may indicate uninitialized variable impact on code optimization. So far, llvm BPF backend ignores 'unreachable' IR hence the above code is generated. With clang21 patch [2], those 'unreachable' IR insn are converted to func __bpf_trap(). In order to maintain proper control flow graph for bpf progs, [2] also adds an 'exit' insn after bpf_trap() if __bpf_trap() is the last insn in the function. The new code looks like: ; { 0: bf 16 00 00 00 00 00 00 r6 = r1 ; bpf_printk("Start"); 1: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll 0000000000000008: R_BPF_64_64 .rodata 3: b4 02 00 00 06 00 00 00 w2 = 0x6 4: 85 00 00 00 06 00 00 00 call 0x6 ; DEFINE_FUNC_CTX_POINTER(data) 5: 61 61 4c 00 00 00 00 00 w1 = *(u32 *)(r6 + 0x4c) ; bpf_printk("pre ipv6_hdrlen_offset"); 6: 18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll 0000000000000030: R_BPF_64_64 .rodata 8: b4 02 00 00 17 00 00 00 w2 = 0x17 9: 85 00 00 00 06 00 00 00 call 0x6 10: 85 10 00 00 ff ff ff ff call -0x1 0000000000000050: R_BPF_64_32 __bpf_trap 11: 95 00 00 00 00 00 00 00 exit <END> In kernel, a new kfunc __bpf_trap() is added. During insn verification, any hit with __bpf_trap() will result in verification failure. The kernel is able to provide better log message for debugging. With llvm patch [2] and without this patch (no __bpf_trap() kfunc for existing kernel), e.g., for old kernels, the verifier outputs 10: <invalid kfunc call> kfunc '__bpf_trap' is referenced but wasn't resolved Basically, kernel does not support __bpf_trap() kfunc. This still didn't give clear signals about possible reason. With llvm patch [2] and with this patch, the verifier outputs 10: (85) call __bpf_trap#74479 unexpected __bpf_trap() due to uninitialized variable? It gives much better hints for verification failure. [1] https://github.com/msune/clang_bpf/blob/main/Makefile#L3 [2] https://github.com/llvm/llvm-project/pull/131731 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205326.1291640-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge tag 'x86_cache_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Carve out the resctrl filesystem-related code into fs/resctrl/ so that multiple architectures can share the fs API for manipulating their respective hw resource control implementation. This is the second step in the work towards sharing the resctrl filesystem interface, the next one being plugging ARM's MPAM into the aforementioned fs API" * tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) MAINTAINERS: Add reviewers for fs/resctrl x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl x86/resctrl: Always initialise rid field in rdt_resources_all[] x86/resctrl: Relax some asm #includes x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() x86/resctrl: Squelch whitespace anomalies in resctrl core code x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs x86/resctrl: Move enum resctrl_event_id to resctrl.h x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl fs/resctrl: Add boiler plate for external resctrl code x86/resctrl: Add 'resctrl' to the title of the resctrl documentation x86/resctrl: Split trace.h x86/resctrl: Expand the width of domid by replacing mon_data_bits x86/resctrl: Add end-marker to the resctrl_event_id enum x86/resctrl: Move is_mba_sc() out of core.c x86/resctrl: Drop __init/__exit on assorted symbols x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point x86/resctrl: Check all domains are offline in resctrl_exit() x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" ...
2025-05-27bpf: Remove special_kfunc_set from verifierYonghong Song
Currently, the verifier has both special_kfunc_set and special_kfunc_list. When adding a new kfunc usage to the verifier, it is often confusing about whether special_kfunc_set or special_kfunc_list or both should add that kfunc. For example, some kfuncs, e.g., bpf_dynptr_from_skb, bpf_dynptr_clone, bpf_wq_set_callback_impl, does not need to be in special_kfunc_set. To avoid potential future confusion, special_kfunc_set is deleted and btf_id_set_contains(&special_kfunc_set, ...) is removed. The code is refactored with a new func check_special_kfunc(), which contains all codes covered by original branch meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id) There is no functionality change. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205321.1291431-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge branch 'replace-config_dmabuf_sysfs_stats-with-bpf'Alexei Starovoitov
T.J. Mercier says: ==================== Replace CONFIG_DMABUF_SYSFS_STATS with BPF Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to perform per-buffer accounting with debugfs which is not suitable for production environments. Eventually we discovered the overhead with per-buffer sysfs file creation/removal was significantly impacting allocation and free times, and exacerbated kernfs lock contention. [2] dma_buf_stats_setup() is responsible for 39% of single-page buffer creation duration, or 74% of single-page dma_buf_export() duration when stressing dmabuf allocations and frees. I prototyped a change from per-buffer to per-exporter statistics with a RCU protected list of exporter allocations that accommodates most (but not all) of our use-cases and avoids almost all of the sysfs overhead. While that adds less overhead than per-buffer sysfs, and less even than the maintenance of the dmabuf debugfs_list, it's still *additional* overhead on top of the debugfs_list and doesn't give us per-buffer info. This series uses the existing dmabuf debugfs_list to implement a BPF dmabuf iterator, which adds no overhead to buffer allocation/free and provides per-buffer info. The list has been moved outside of CONFIG_DEBUG_FS scope so that it is always populated. The BPF program loaded by userspace that extracts per-buffer information gets to define its own interface which avoids the lack of ABI stability with debugfs. This will allow us to replace our use of CONFIG_DMABUF_SYSFS_STATS, and the plan is to remove it from the kernel after the next longterm stable release. [1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.com [2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com v1: https://lore.kernel.org/all/20250414225227.3642618-1-tjmercier@google.com v1 -> v2: Make the DMA buffer list independent of CONFIG_DEBUG_FS per Christian König Add CONFIG_DMA_SHARED_BUFFER check to kernel/bpf/Makefile per kernel test robot Use BTF_ID_LIST_SINGLE instead of BTF_ID_LIST_GLOBAL_SINGLE per Song Liu Fixup comment style, mixing code/declarations, and use ASSERT_OK_FD in selftest per Song Liu Add BPF_ITER_RESCHED feature to bpf_dmabuf_reg_info per Alexei Starovoitov Add open-coded iterator and selftest per Alexei Starovoitov Add a second test buffer from the system dmabuf heap to selftests Use the BPF program we'll use in production for selftest per Alexei Starovoitov https://r.android.com/c/platform/system/bpfprogs/+/3616123/2/dmabufIter.c https://r.android.com/c/platform/system/memory/libmeminfo/+/3614259/1/libdmabufinfo/dmabuf_bpf_stats.cpp v2: https://lore.kernel.org/all/20250504224149.1033867-1-tjmercier@google.com v2 -> v3: Rebase onto bpf-next/master Move get_next_dmabuf() into drivers/dma-buf/dma-buf.c, along with the new get_first_dmabuf(). This avoids having to expose the dmabuf list and mutex to the rest of the kernel, and keeps the dmabuf mutex operations near each other in the same file. (Christian König) Add Christian's RB to dma-buf: Rename debugfs symbols Drop RFC: dma-buf: Remove DMA-BUF statistics v3: https://lore.kernel.org/all/20250507001036.2278781-1-tjmercier@google.com v3 -> v4: Fix selftest BPF program comment style (not kdoc) per Alexei Starovoitov Fix dma-buf.c kdoc comment style per Alexei Starovoitov Rename get_first_dmabuf / get_next_dmabuf to dma_buf_iter_begin / dma_buf_iter_next per Christian König Add Christian's RB to bpf: Add dmabuf iterator v4: https://lore.kernel.org/all/20250508182025.2961555-1-tjmercier@google.com v4 -> v5: Add Christian's Acks to all patches Add Song Liu's Acks Move BTF_ID_LIST_SINGLE and DEFINE_BPF_ITER_FUNC closer to usage per Song Liu Fix open-coded iterator comment style per Song Liu Move iterator termination check to its own subtest per Song Liu Rework selftest buffer creation per Song Liu Fix spacing in sanitize_string per BPF CI v5: https://lore.kernel.org/all/20250512174036.266796-1-tjmercier@google.com v5 -> v6: Song Liu: Init test buffer FDs to -1 Zero-init udmabuf_create for future proofing Bail early for iterator fd/FILE creation failure Dereference char ptr to check for NUL in sanitize_string() Move map insertion from create_test_buffers() to test_dmabuf_iter() Add ACK to selftests/bpf: Add test for open coded dmabuf_iter v6: https://lore.kernel.org/all/20250513163601.812317-1-tjmercier@google.com v6 -> v7: Zero uninitialized name bytes following the end of name strings per s390x BPF CI Reorder sanitize_string bounds checks per Song Liu Add Song's Ack to: selftests/bpf: Add test for dmabuf_iter Rebase onto bpf-next/master per BPF CI ==================== Link: https://patch.msgid.link/20250522230429.941193-1-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for open coded dmabuf_iterT.J. Mercier
Use the same test buffers as the traditional iterator and a new BPF map to verify the test buffers can be found with the open coded dmabuf iterator. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-6-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for dmabuf_iterT.J. Mercier
This test creates a udmabuf, and a dmabuf from the system dmabuf heap, and uses a BPF program that prints dmabuf metadata with the new dmabuf_iter to verify they can be found. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-5-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpf: Add open coded dmabuf iteratorT.J. Mercier
This open coded iterator allows for more flexibility when creating BPF programs. It can support output in formats other than text. With an open coded iterator, a single BPF program can traverse multiple kernel data structures (now including dmabufs), allowing for more efficient analysis of kernel data compared to multiple reads from procfs, sysfs, or multiple traditional BPF iterator invocations. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-4-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpf: Add dmabuf iteratorT.J. Mercier
The dmabuf iterator traverses the list of all DMA buffers. DMA buffers are refcounted through their associated struct file. A reference is taken on each buffer as the list is iterated to ensure each buffer persists for the duration of the bpf program execution without holding the list mutex. Signed-off-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-3-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27dma-buf: Rename debugfs symbolsT.J. Mercier
Rename the debugfs list and mutex so it's clear they are now usable without the need for CONFIG_DEBUG_FS. The list will always be populated to support the creation of a BPF iterator for dmabufs. Signed-off-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-2-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge tag 'timers-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: "Updates for the time/timer core code: - Rework the initialization of the posix-timer kmem_cache and move the cache pointer into the timer_data structure to prevent false sharing - Switch the alarmtimer code to lock guards - Improve the CPU selection criteria in the per CPU validation of the clocksource watchdog to avoid arbitrary selections (or omissions) on systems with a small number of CPUs - The usual cleanups and improvements" * tag 'timers-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/nohz: Remove unused tick_nohz_full_add_cpus_to() clocksource: Fix the CPUs' choice in the watchdog per CPU verification alarmtimer: Switch spin_{lock,unlock}_irqsave() to guards alarmtimer: Remove dead return value in clock2alarm() time/jiffies: Change register_refined_jiffies() to void __init timers: Remove unused __round_jiffies(_up) posix-timers: Initialize cache early and move pointer into __timer_data
2025-05-27Merge tag 'timers-clocksource-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull clocksource updates from Thomas Gleixner: "Updates for clocksource/clockevent drivers: - The final conversion of text formatted device tree binding to schemas - A new driver fot the System Timer Module on S32G NXP SoCs - A new driver fot the Econet HPT timer - The usual improvements and device tree binding updates" * tag 'timers-clocksource-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) clocksource/drivers/renesas-ostm: Unconditionally enable reprobe support dt-bindings: timer: renesas,ostm: Document RZ/V2N (R9A09G056) support dt-bindings: timer: Convert marvell,armada-370-timer to DT schema dt-bindings: timer: Convert ti,keystone-timer to DT schema dt-bindings: timer: Convert st,spear-timer to DT schema dt-bindings: timer: Convert socionext,milbeaut-timer to DT schema dt-bindings: timer: Convert snps,arc-timer to DT schema dt-bindings: timer: Convert snps,archs-rtc to DT schema dt-bindings: timer: Convert snps,archs-gfrc to DT schema dt-bindings: timer: Convert lsi,zevio-timer to DT schema dt-bindings: timer: Convert jcore,pit to DT schema dt-bindings: timer: Convert img,pistachio-gptimer to DT schema dt-bindings: timer: Convert ezchip,nps400-timer to DT schema dt-bindings: timer: Convert cirrus,clps711x-timer to DT schema dt-bindings: timer: Convert altr,timer-1.0 to DT schema dt-bindings: timer: Add ESWIN EIC7700 CLINT clocksource/drivers: Add EcoNet Timer HPT driver dt-bindings: timer: Add EcoNet EN751221 "HPT" CPU Timer dt-bindings: timer: Convert arm,mps2-timer to DT schema dt-bindings: timer: Add Sophgo SG2044 ACLINT timer ...
2025-05-27Merge tag 'timers-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "Another set of timer API cleanups: - Convert init_timer*(), try_to_del_timer_sync() and destroy_timer_on_stack() over to the canonical timer_*() namespace convention. There is another large conversion pending, which has not been included because it would have caused a gazillion of merge conflicts in next. The conversion scripts will be run towards the end of the merge window and a pull request sent once all conflict dependencies have been merged" * tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack() treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try() timers: Rename init_timers() as timers_init() timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA timers: Rename __init_timer_on_stack() as __timer_init_on_stack() timers: Rename __init_timer() as __timer_init() timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack() timers: Rename init_timer_key() as timer_init_key()
2025-05-27Merge tag 'irq-msi-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MSI updates from Thomas Gleixner: "Updates for the MSI subsystem (core code and PCI): - Switch the MSI descriptor locking to lock guards - Replace a broken and naive implementation of PCI/MSI-X control word updates in the PCI/TPH driver with a properly serialized variant in the PCI/MSI core code. - Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by replacing the direct access to the MSI descriptors with the proper API function calls. People will never understand that APIs exist for a reason... - Provide core infrastructre for the upcoming PCI endpoint library extensions. Currently limited to ARM GICv3+, but in theory extensible to other architectures. - Provide a MSI domain::teardown() callback, which allows drivers to undo the effects of the prepare() callback. - Move the MSI domain::prepare() callback invocation to domain creation time to avoid redundant (and in case of ARM/GIC-V3-ITS confusing) invocations on every allocation. In combination with the new teardown callback this removes some ugly hacks in the GIC-V3-ITS driver, which pretended to work around the short comings of the core code so far. With this update the code is correct by design and implementation. - Make the irqchip MSI library globally available, provide a MSI parent domain creation helper and convert a bunch of (PCI/)MSI drivers over to the modern MSI parent mechanism. This is the first step to get rid of at least one incarnation of the three PCI/MSI management schemes. - The usual small cleanups and improvements" * tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) PCI/MSI: Use bool for MSI enable state tracking PCI: tegra: Convert to MSI parent infrastructure PCI: xgene: Convert to MSI parent infrastructure PCI: apple: Convert to MSI parent infrastructure irqchip/msi-lib: Honour the MSI_FLAG_NO_AFFINITY flag irqchip/mvebu: Convert to msi_create_parent_irq_domain() helper irqchip/gic: Convert to msi_create_parent_irq_domain() helper genirq/msi: Add helper for creating MSI-parent irq domains irqchip: Make irq-msi-lib.h globally available irqchip/gic-v3-its: Use allocation size from the prepare call genirq/msi: Engage the .msi_teardown() callback on domain removal genirq/msi: Move prepare() call to per-device allocation irqchip/gic-v3-its: Implement .msi_teardown() callback genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask dt-bindings: PCI: pci-ep: Add support for iommu-map and msi-map irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable() platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() genirq/msi: Rename msi_[un]lock_descs() ...
2025-05-27Merge tag 'irq-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A set of cleanups for the generic interrupt subsystem: - Consolidate on one set of functions for the interrupt domain code to get rid of pointlessly duplicated code with only marginal different semantics. - Update the documentation accordingly and consolidate the coding style of the irqdomain header" * tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) irqdomain: Consolidate coding style irqdomain: Fix kernel-doc and add it to Documentation Documentation: irqdomain: Update it Documentation: irq-domain.rst: Simple improvements Documentation: irq/concepts: Minor improvements Documentation: irq/concepts: Add commas and reflow irqdomain: Improve kernel-docs of functions irqdomain: Make struct irq_domain_info variables const irqdomain: Use irq_domain_instantiate()'s return value as initializers irqdomain: Drop irq_linear_revmap() pinctrl: keembay: Switch to irq_find_mapping() irqchip/armada-370-xp: Switch to irq_find_mapping() gpu: ipu-v3: Switch to irq_find_mapping() gpio: idt3243x: Switch to irq_find_mapping() sh: Switch to irq_find_mapping() powerpc: Switch to irq_find_mapping() irqdomain: Drop irq_domain_add_*() functions powerpc: Switch irq_domain_add_nomap() to use fwnode thermal: Switch to irq_domain_create_linear() soc: Switch to irq_domain_create_*() ...
2025-05-27Merge tag 'irq-drivers-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq controller updates from Thomas Gleixner: "Update for interrupt chip drivers: - Convert the generic interrupt chip to lock guards to remove copy & pasta boilerplate code and gotos. - A new driver fot the interrupt controller in the EcoNet EN751221 MIPS SoC. - Extend the SG2042-MSI driver to support the new SG2044 SoC - Updates and cleanups for the (ancient) VT8500 driver - Improve the scalability of the ARM GICV4.1 ITS driver by utilizing node local copies a VM's interrupt translation table when possible. This results in a 12% reduction of VM IPI latency in certain workloads. - The usual cleanups and improvements all over the place" * tag 'irq-drivers-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) irqchip/irq-pruss-intc: Simplify chained interrupt handler setup irqchip/gic-v4.1: Use local 4_1 ITS to generate VSGI irqchip/econet-en751221: Switch to of_fwnode_handle() irqchip/irq-vt8500: Switch to irq_domain_create_*() irqchip/econet-en751221: Switch to irq_domain_create_linear() irqchip/irq-vt8500: Use fewer global variables and add error handling irqchip/irq-vt8500: Use a dedicated chained handler function irqchip/irq-vt8500: Don't require 8 interrupts from a chained controller irqchip/irq-vt8500: Drop redundant copy of the device node pointer irqchip/irq-vt8500: Split up ack/mask functions irqchip/sg2042-msi: Fix wrong type cast in sg2044_msi_irq_ack() irqchip/sg2042-msi: Add the Sophgo SG2044 MSI interrupt controller irqchip/sg2042-msi: Introduce configurable chipinfo for SG2042 irqchip/sg2042-msi: Rename functions and data structures to be SG2042 agnostic dt-bindings: interrupt-controller: Add Sophgo SG2044 MSI controller genirq/generic-chip: Fix incorrect lock guard conversions genirq/generic-chip: Remove unused lock wrappers irqchip: Convert generic irqchip locking to guards gpio: mvebu: Convert generic irqchip locking to guard() ARM: orion/gpio:: Convert generic irqchip locking to guard() ...
2025-05-27Merge tag 'irq-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core updates from Thomas Gleixner: "Updates for the generic interrupt subsystem core code: - Address a long standing subtle problem in the CPU hotplug code for affinity-managed interrupts. Affinity-managed interrupts are shut down by the core code when the last CPU in the affinity set goes offline and started up again when the first CPU in the affinity set becomes online again. This unfortunately does not take into account whether an interrupt has been disabled before the last CPU goes offline and starts up the interrupt unconditionally when the first CPU becomes online again. That's obviously not what drivers expect. Address this by preserving the disabled state for affinity-managed interrupts accross these CPU hotplug operations. All non-managed interrupts are not affected by this because startup/shutdown is coupled to request/free_irq() which obviously has to reset state. - Support three-cell scheme interrupts to allow GPIO drivers to specify interrupts from an already existing scheme - Switch the interrupt subsystem core to lock guards. This gets rid of quite some copy & pasta boilerplate code all over the place. - The usual small cleanups and improvements all over the place" * tag 'irq-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits) genirq/irqdesc: Remove double locking in hwirq_show() genirq: Retain disable depth for managed interrupts across CPU hotplug genirq: Bump the size of the local variable for sprintf() genirq/manage: Use the correct lock guard in irq_set_irq_wake() genirq: Consistently use '%u' format specifier for unsigned int variables genirq: Ensure flags in lock guard is consistently initialized genirq: Fix inverted condition in handle_nested_irq() genirq/cpuhotplug: Fix up lock guards conversion brainf..t genirq: Use scoped_guard() to shut clang up genirq: Remove unused remove_percpu_irq() genirq: Remove irq_[get|put]_desc*() genirq/manage: Rework irq_set_irqchip_state() genirq/manage: Rework irq_get_irqchip_state() genirq/manage: Rework teardown_percpu_nmi() genirq/manage: Rework prepare_percpu_nmi() genirq/manage: Rework disable_percpu_irq() genirq/manage: Rework irq_percpu_is_enabled() genirq/manage: Rework enable_percpu_irq() genirq/manage: Rework irq_set_parent() genirq/manage: Rework can_request_irq() ...
2025-05-27Merge tag 'core-entry-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core entry code updates from Thomas Gleixner: "Updates for the generic and architecture entry code: - Move LoongArch and RISC-V ret_from_fork() implementations to C code so that syscall_exit_user_mode() can be inlined - Split the RISC-V ret_from_fork() implementation into return to user and return to kernel, which gives a measurable performance improvement - Inline syscall_exit_user_mode() which benefits all architectures by avoiding a function call and letting the compiler do better optimizations" * tag 'core-entry-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: LoongArch: entry: Fix include order entry: Inline syscall_exit_to_user_mode() LoongArch: entry: Migrate ret_from_fork() to C riscv: entry: Split ret_from_fork() into user and kernel riscv: entry: Convert ret_from_fork() to C
2025-05-27btrfs: don't drop a reference if btrfs_check_write_meta_pointer() failsJosef Bacik
In the zoned mode there's a bug in the extent buffer tree conversion to xarray. The reference for eb is dropped and code continues but the references get dropped by releasing the batch. Reported-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/linux-btrfs/202505191521.435b97ac-lkp@intel.com/ Fixes: 19d7f65f032f ("btrfs: convert the buffer_radix to an xarray") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-27Merge branch 'for-next/vdso' into for-next/coreWill Deacon
* for-next/vdso: arm64: vdso: Use __arch_counter_get_cntvct()
2025-05-27Merge branch 'for-next/sme-fixes' into for-next/coreWill Deacon
* for-next/sme-fixes: (35 commits) arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected arm64/fpsimd: ptrace: Gracefully handle errors arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state arm64/fpsimd: ptrace: Do not present register data for inactive mode arm64/fpsimd: ptrace: Save task state before generating SVE header arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data arm64/fpsimd: Make clone() compatible with ZA lazy saving arm64/fpsimd: Clear PSTATE.SM during clone() arm64/fpsimd: Consistently preserve FPSIMD state during clone() arm64/fpsimd: Remove redundant task->mm check arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return() arm64/fpsimd: Add task_smstop_sm() arm64/fpsimd: Factor out {sve,sme}_state_size() helpers arm64/fpsimd: Clarify sve_sync_*() functions arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE arm64/fpsimd: signal: Consistently read FPSIMD context arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only arm64/fpsimd: Do not discard modified SVE state ...
2025-05-27Merge branch 'for-next/selftests' into for-next/coreWill Deacon
* for-next/selftests: kselftest/arm64: Set default OUTPUT path when undefined kselftest/arm64: fp-ptrace: Adjust to new inactive mode behaviour kselftest/arm64: fp-ptrace: Adjust to new VL change behaviour kselftest/arm64: tpidr2: Adjust to new clone() behaviour kselftest/arm64: fp-ptrace: Fix expected FPMR value when PSTATE.SM is changed
2025-05-27Merge branch 'for-next/psci' into for-next/coreWill Deacon
* for-next/psci: firmware: psci: Fix refcount leak in psci_dt_init
2025-05-27Merge branch 'for-next/perf' into for-next/coreWill Deacon
* for-next/perf: perf/arm-cmn: Add CMN S3 ACPI binding perf/arm-cmn: Initialise cmn->cpu earlier perf/amlogic: Replace smp_processor_id() with raw_smp_processor_id() in meson_ddr_pmu_create() perf/arm-cmn: Fix REQ2/SNP2 mixup perf: Do not enable by default during compile testing perf: arm-ni: Fix missing platform_set_drvdata() perf: arm-ni: Unregister PMUs on probe failure perf/arm-cmn: Remove CMN-600 DTC domain special case
2025-05-27Merge branch 'for-next/mm' into for-next/coreWill Deacon
* for-next/mm: arm64/boot: Disallow BSS exports to startup code arm64/boot: Move global CPU override variables out of BSS arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace arm64: mm: Drop redundant check in pmd_trans_huge() arm64/mm: Permit lazy_mmu_mode to be nested arm64/mm: Disable barrier batching in interrupt contexts arm64/mm: Batch barriers when updating kernel mappings mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes arm64/mm: Support huge pte-mapped pages in vmap mm/vmalloc: Gracefully unmap huge ptes mm/vmalloc: Warn on improper use of vunmap_range() arm64/mm: Hoist barriers out of set_ptes_anysz() loop arm64: hugetlb: Use __set_ptes_anysz() and __ptep_get_and_clear_anysz() arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear() mm/page_table_check: Batch-check pmds/puds just like ptes arm64: hugetlb: Refine tlb maintenance scope arm64: hugetlb: Cleanup huge_pte size discovery mechanisms arm64: pageattr: Explicitly bail out when changing permissions for vmalloc_huge mappings arm64: Support ARM64_VA_BITS=52 when setting ARCH_MMAP_RND_BITS_MAX arm64/mm: Remove randomization of the linear map
2025-05-27Merge branch 'for-next/misc' into for-next/coreWill Deacon
* for-next/misc: arm64/cpuinfo: only show one cpu's info in c_show() arm64: Extend pr_crit message on invalid FDT arm64: Kconfig: remove unnecessary selection of CRC32 arm64: Add missing includes for mem_encrypt
2025-05-27Merge branch 'for-next/fixes' into for-next/coreWill Deacon
Merge in for-next/fixes, as subsequent improvements to our early PI code that disallow BSS exports depend on the 'arm64_use_ng_mappings' fix here. * for-next/fixes: arm64: cpufeature: Move arm64_use_ng_mappings to the .data section to prevent wrong idmap generation arm64: errata: Add missing sentinels to Spectre-BHB MIDR arrays
2025-05-27Merge branch 'for-next/entry' into for-next/coreWill Deacon
* for-next/entry: arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again arm64: Update comment regarding values in __boot_cpu_mode arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1 arm64: enable PREEMPT_LAZY
2025-05-27Merge branch 'for-next/efi' into for-next/coreWill Deacon
* for-next/efi: arm64/fpsimd: Avoid unnecessary per-CPU buffers for EFI runtime calls
2025-05-27Merge branch 'for-next/cpufeature' into for-next/coreWill Deacon
* for-next/cpufeature: arm64: cputype: Add cputype definition for HIP12 arm64: Expose AIDR_EL1 via sysfs arm64/cpufeature: Add missing id_aa64mmfr4 feature reg update
2025-05-27Merge branch 'for-next/acpi' into for-next/coreWill Deacon
* for-next/acpi: firmware: SDEI: Allow sdei initialization without ACPI_APEI_GHES
2025-05-27Doc: networking: Fix various typos in rds.rstAlok Tiwari
Corrected "sages" to "messages" in the bitmap allocation description. Fixed "competed" to "completed" in the recv path datagram handling section. Corrected "privatee" to "private" in the multipath RDS section. Fixed "mutlipath" to "multipath" in the transport capabilities description. These changes improve documentation clarity and maintain consistency. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20250522074413.3634446-1-alok.a.tiwari@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27net/mlx5e: Allow setting MAC address of representorsMark Bloch
A representor netdev does not correspond to real hardware that needs to be updated when setting the MAC address. The default eth_mac_addr() is sufficient for simply updating the netdev's MAC address with validation. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747898036-1121904-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27Merge branch 'octeontx2-pf-do-not-detect-macsec-block-based-on-silicon'Paolo Abeni
Subbaraya Sundeep says: ==================== octeontx2-pf: Do not detect MACSEC block based on silicon Out of various silicon variants of CN10K series some have hardware MACSEC block for offloading MACSEC operations and some do not. AF driver already has the information of whether MACSEC is present or not on running silicon. Hence fetch that information from AF via mailbox message. ==================== Link: https://patch.msgid.link/1747894516-4565-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27octeontx2-pf: macsec: Get MACSEC capability flag from AFSubbaraya Sundeep
The presence of MACSEC block is currently figured out based on the running silicon variant. This may not be correct all the times since the MACSEC block can be fused out. Hence get the macsec info from AF via mailbox. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747894548-4657-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27octeontx2-af: Add MACSEC capability flagSubbaraya Sundeep
MACSEC block may be fused out on some silicons hence modify get_hw_cap mailbox message to set a capability flag in its response message based on MACSEC block availability. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747894528-4611-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27xsk: add missing virtual address conversion for pageBui Quang Minh
In commit 7ead4405e06f ("xsk: convert xdp_copy_frags_from_zc() to use page_pool_dev_alloc()"), when converting from netmem to page, I missed a call to page_address() around skb_frag_page(frag) to get the virtual address of the page. This commit uses skb_frag_address() helper to fix the issue. Fixes: 7ead4405e06f ("xsk: convert xdp_copy_frags_from_zc() to use page_pool_dev_alloc()") Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250522040115.5057-1-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27af_packet: move notifier's packet_dev_mc out of rcu critical sectionStanislav Fomichev
Syzkaller reports the following issue: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578 __mutex_lock+0x106/0xe80 kernel/locking/mutex.c:746 team_change_rx_flags+0x38/0x220 drivers/net/team/team_core.c:1781 dev_change_rx_flags net/core/dev.c:9145 [inline] __dev_set_promiscuity+0x3f8/0x590 net/core/dev.c:9189 netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201 dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286 packet_dev_mc net/packet/af_packet.c:3698 [inline] packet_dev_mclist_delete net/packet/af_packet.c:3722 [inline] packet_notifier+0x292/0xa60 net/packet/af_packet.c:4247 notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85 call_netdevice_notifiers_extack net/core/dev.c:2214 [inline] call_netdevice_notifiers net/core/dev.c:2228 [inline] unregister_netdevice_many_notify+0x15d8/0x2330 net/core/dev.c:11972 rtnl_delete_link net/core/rtnetlink.c:3522 [inline] rtnl_dellink+0x488/0x710 net/core/rtnetlink.c:3564 rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6955 netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534 Calling `PACKET_ADD_MEMBERSHIP` on an ops-locked device can trigger the `NETDEV_UNREGISTER` notifier, which may require disabling promiscuous and/or allmulti mode. Both of these operations require acquiring the netdev instance lock. Move the call to `packet_dev_mc` outside of the RCU critical section. The `mclist` modifications (add, del, flush, unregister) are protected by the RTNL, not the RCU. The RCU only protects the `sklist` and its associated `sks`. The delayed operation on the `mclist` entry remains within the RTNL. Reported-by: syzbot+b191b5ccad8d7a986286@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b191b5ccad8d7a986286 Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250522031129.3247266-1-stfomichev@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27Merge branch 'vsock-sock_linger-rework'Paolo Abeni
Michal Luczaj says: ==================== vsock: SOCK_LINGER rework Change vsock's lingerning to wait on close() until all data is sent, i.e. until workers picked all the packets for processing. v5: https://lore.kernel.org/r/20250521-vsock-linger-v5-0-94827860d1d6@rbox.co v4: https://lore.kernel.org/r/20250501-vsock-linger-v4-0-beabbd8a0847@rbox.co v3: https://lore.kernel.org/r/20250430-vsock-linger-v3-0-ddbe73b53457@rbox.co v2: https://lore.kernel.org/r/20250421-vsock-linger-v2-0-fe9febd64668@rbox.co v1: https://lore.kernel.org/r/20250407-vsock-linger-v1-0-1458038e3492@rbox.co Signed-off-by: Michal Luczaj <mhal@rbox.co> ==================== Link: https://patch.msgid.link/20250522-vsock-linger-v6-0-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27vsock/test: Add test for an unexpectedly lingering close()Michal Luczaj
There was an issue with SO_LINGER: instead of blocking until all queued messages for the socket have been successfully sent (or the linger timeout has been reached), close() would block until packets were handled by the peer. Add a test to alert on close() lingering when it should not. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250522-vsock-linger-v6-5-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27vsock/test: Introduce enable_so_linger() helperMichal Luczaj
Add a helper function that sets SO_LINGER. Adapt the caller. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250522-vsock-linger-v6-4-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27vsock/test: Introduce vsock_wait_sent() helperMichal Luczaj
Distill the virtio_vsock_sock::bytes_unsent checking loop (ioctl SIOCOUTQ) and move it to utils. Tweak the comment. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250522-vsock-linger-v6-3-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27vsock: Move lingering logic to af_vsock coreMichal Luczaj
Lingering should be transport-independent in the long run. In preparation for supporting other transports, as well as the linger on shutdown(), move code to core. Generalize by querying vsock_transport::unsent_bytes(), guard against the callback being unimplemented. Do not pass sk_lingertime explicitly. Pull SOCK_LINGER check into vsock_linger(). Flatten the function. Remove the nested block by inverting the condition: return early on !timeout. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250522-vsock-linger-v6-2-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27vsock/virtio: Linger on unsent dataMichal Luczaj
Currently vsock's lingering effectively boils down to waiting (or timing out) until packets are consumed or dropped by the peer; be it by receiving the data, closing or shutting down the connection. To align with the semantics described in the SO_LINGER section of man socket(7) and to mimic AF_INET's behaviour more closely, change the logic of a lingering close(): instead of waiting for all data to be handled, block until data is considered sent from the vsock's transport point of view. That is until worker picks the packets for processing and decrements virtio_vsock_sock::bytes_unsent down to 0. Note that (some interpretation of) lingering was always limited to transports that called virtio_transport_wait_close() on transport release. This does not change, i.e. under Hyper-V and VMCI no lingering would be observed. The implementation does not adhere strictly to man page's interpretation of SO_LINGER: shutdown() will not trigger the lingering. This follows AF_INET. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250522-vsock-linger-v6-1-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27net: phy: add driver for MaxLinear MxL86110 PHYStefano Radaelli
Add support for the MaxLinear MxL86110 Gigabit Ethernet PHY, a low-power, cost-optimized transceiver supporting 10/100/1000 Mbps over twisted-pair copper, compliant with IEEE 802.3. The driver implements basic features such as: - Device initialization - RGMII interface timing configuration - Wake-on-LAN support - LED initialization and control via /sys/class/leds This driver has been tested on multiple Variscite boards, including: - VAR-SOM-MX93 (i.MX93) - VAR-SOM-MX8M-PLUS (i.MX8MP) Example boot log showing driver probe: [ 7.692101] imx-dwmac 428a0000.ethernet eth0: PHY [stmmac-0:00] driver [MXL86110 Gigabit Ethernet] (irq=POLL) Signed-off-by: Stefano Radaelli <stefano.radaelli21@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250521212821.593057-1-stefano.radaelli21@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27Merge branch 'wireguard-updates-for-6-16'Paolo Abeni
Jason A. Donenfeld says: ==================== wireguard updates for 6.16 This small series contains mostly cleanups and one new feature: 1) Kees' __nonstring annotation comes to wireguard. 2) Two selftest fixes, one to help with compilation on gcc 15, and one removing stale config options. 3) Adoption of NLA_POLICY_MASK. 4) Jordan has added the ability to run: # wg set ... peer ... allowed-ips -192.168.1.0/24 Which will remove the allowed IP for that peer. Previously you had to replace all the IPs non-atomically, or move it to a dummy peer atomically, which wasn't very clean. ==================== Link: https://patch.msgid.link/20250521212707.1767879-1-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27wireguard: selftests: specify -std=gnu17 for bashJason A. Donenfeld
GCC 15 defaults to C23, which bash can't compile under, so specify gnu17 explicitly. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-6-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27wireguard: allowedips: add WGALLOWEDIP_F_REMOVE_ME flagJordan Rife
The current netlink API for WireGuard does not directly support removal of allowed ips from a peer. A user can remove an allowed ip from a peer in one of two ways: 1. By using the WGPEER_F_REPLACE_ALLOWEDIPS flag and providing a new list of allowed ips which omits the allowed ip that is to be removed. 2. By reassigning an allowed ip to a "dummy" peer then removing that peer with WGPEER_F_REMOVE_ME. With the first approach, the driver completely rebuilds the allowed ip list for a peer. If my current configuration is such that a peer has allowed ips 192.168.0.2 and 192.168.0.3 and I want to remove 192.168.0.2 the actual transition looks like this. [192.168.0.2, 192.168.0.3] <-- Initial state [] <-- Step 1: Allowed ips removed for peer [192.168.0.3] <-- Step 2: Allowed ips added back for peer This is true even if the allowed ip list is small and the update does not need to be batched into multiple WG_CMD_SET_DEVICE requests, as the removal and subsequent addition of ips is non-atomic within a single request. Consequently, wg_allowedips_lookup_dst and wg_allowedips_lookup_src may return NULL while reconfiguring a peer even for packets bound for ips a user did not intend to remove leading to unintended interruptions in connectivity. This presents in userspace as failed calls to sendto and sendmsg for UDP sockets. In my case, I ran netperf while repeatedly reconfiguring the allowed ips for a peer with wg. /usr/local/bin/netperf -H 10.102.73.72 -l 10m -t UDP_STREAM -- -R 1 -m 1024 send_data: data send error: No route to host (errno 113) netperf: send_omni: send_data failed: No route to host While this may not be of particular concern for environments where peers and allowed ips are mostly static, systems like Cilium manage peers and allowed ips in a dynamic environment where peers (i.e. Kubernetes nodes) and allowed ips (i.e. pods running on those nodes) can frequently change making WGPEER_F_REPLACE_ALLOWEDIPS problematic. The second approach avoids any possible connectivity interruptions but is hacky and less direct, requiring the creation of a temporary peer just to dispose of an allowed ip. Introduce a new flag called WGALLOWEDIP_F_REMOVE_ME which in the same way that WGPEER_F_REMOVE_ME allows a user to remove a single peer from a WireGuard device's configuration allows a user to remove an ip from a peer's set of allowed ips. This enables incremental updates to a device's configuration without any connectivity blips or messy workarounds. A corresponding patch for wg extends the existing `wg set` interface to leverage this feature. $ wg set wg0 peer <PUBKEY> allowed-ips +192.168.88.0/24,-192.168.0.1/32 When '+' or '-' is prepended to any ip in the list, wg clears WGPEER_F_REPLACE_ALLOWEDIPS and sets the WGALLOWEDIP_F_REMOVE_ME flag on any ip prefixed with '-'. Signed-off-by: Jordan Rife <jordan@jrife.io> [Jason: minor style nits, fixes to selftest, bump of wireguard-tools version] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-5-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27wireguard: netlink: use NLA_POLICY_MASK where possibleJason A. Donenfeld
Rather than manually validating flags against the various __ALL_* constants, put this in the netlink policy description and have the upper layer machinery check it for us. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-4-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27wireguard: global: add __nonstring annotations for unterminated stringsKees Cook
When a character array without a terminating NUL character has a static initializer, GCC 15's -Wunterminated-string-initialization will only warn if the array lacks the "nonstring" attribute[1]. Mark the arrays with __nonstring to correctly identify the char array as "not a C string" and thereby eliminate the warning: ../drivers/net/wireguard/cookie.c:29:56: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Wunterminated-string-initialization] 29 | static const u8 mac1_key_label[COOKIE_KEY_LABEL_LEN] = "mac1----"; | ^~~~~~~~~~ ../drivers/net/wireguard/cookie.c:30:58: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (9 chars into 8 available) [-Wunterminated-string-initialization] 30 | static const u8 cookie_key_label[COOKIE_KEY_LABEL_LEN] = "cookie--"; | ^~~~~~~~~~ ../drivers/net/wireguard/noise.c:28:38: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (38 chars into 37 available) [-Wunterminated-string-initialization] 28 | static const u8 handshake_name[37] = "Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s"; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/net/wireguard/noise.c:29:39: warning: initializer-string for array of 'unsigned char' truncates NUL terminator but destination lacks 'nonstring' attribute (35 chars into 34 available) [-Wunterminated-string-initialization] 29 | static const u8 identifier_name[34] = "WireGuard v1 zx2c4 Jason@zx2c4.com"; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The arrays are always used with their fixed size, so use __nonstring. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1] Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-3-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27wireguard: selftests: cleanup CONFIG_UBSAN_SANITIZE_ALLWangYuli
Commit 918327e9b7ff ("ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL") removed the CONFIG_UBSAN_SANITIZE_ALL configuration option. Eliminate invalid configurations to improve code readability. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-2-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>