summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-27selftests/bpf: Add unit tests with __bpf_trap() kfuncYonghong Song
Add some inline-asm tests and C tests where __bpf_trap() or __builtin_trap() is used in the code. The __builtin_trap() test is guarded with llvm21 ([1]) since otherwise the compilation failure will happen. [1] https://github.com/llvm/llvm-project/pull/131731 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205331.1291734-1-yonghong.song@linux.dev Tested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge tag 'x86_sev_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull AMD SEV update from Borislav Petkov: "Add a virtual TPM driver glue which allows a guest kernel to talk to a TPM device emulated by a Secure VM Service Module (SVSM) - a helper module of sorts which runs at a different privilege level in the SEV-SNP VM stack. The intent being that a TPM device is emulated by a trusted entity and not by the untrusted host which is the default assumption in the confidential computing scenarios" * tag 'x86_sev_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Register tpm-svsm platform device tpm: Add SNP SVSM vTPM driver svsm: Add header with SVSM_VTPM_CMD helpers x86/sev: Add SVSM vTPM probe/send_command functions
2025-05-27Merge tag 'x86_mtrr_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull mtrr update from Borislav Petkov: "A single change to verify the presence of fixed MTRR ranges before accessing the respective MSRs" * tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mtrr: Check if fixed-range MTRRs exist in mtrr_save_fixed_ranges()
2025-05-27Merge tag 'edac_updates_for_v6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - ie31200: Add support for Raptor Lake-S and Alder Lake-S compute dies - Rework how RRL registers per channel tracking is done in order to support newer hardware with different RRL configurations and refactor that code. Add support for Granite Rapids server - i10nm: explicitly set RRL modes to fix any wrong BIOS programming - Properly save and restore Retry Read error Log channel configuration info on Intel drivers - igen6: Handle correctly the case of fused off memory controllers on Arizona Beach and Amston Lake SoCs before adding support for them - the usual set of fixes and cleanups * tag 'edac_updates_for_v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/bluefield: Don't use bluefield_edac_readl() result on error EDAC/i10nm: Fix the bitwise operation between variables of different sizes EDAC/ie31200: Add two Intel SoCs for EDAC support EDAC/{skx_common,i10nm}: Add RRL support for Intel Granite Rapids server EDAC/{skx_common,i10nm}: Refactor show_retry_rd_err_log() EDAC/{skx_common,i10nm}: Refactor enable_retry_rd_err_log() EDAC/{skx_common,i10nm}: Structure the per-channel RRL registers EDAC/i10nm: Explicitly set the modes of the RRL register sets EDAC/{skx_common,i10nm}: Fix the loss of saved RRL for HBM pseudo channel 0 EDAC/skx_common: Fix general protection fault EDAC/igen6: Add Intel Amston Lake SoCs support EDAC/igen6: Add Intel Arizona Beach SoCs support EDAC/igen6: Skip absent memory controllers
2025-05-27bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variableYonghong Song
Marc Suñé (Isovalent, part of Cisco) reported an issue where an uninitialized variable caused generating bpf prog binary code not working as expected. The reproducer is in [1] where the flags “-Wall -Werror” are enabled, but there is no warning as the compiler takes advantage of uninitialized variable to do aggressive optimization. The optimized code looks like below: ; { 0: bf 16 00 00 00 00 00 00 r6 = r1 ; bpf_printk("Start"); 1: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll 0000000000000008: R_BPF_64_64 .rodata 3: b4 02 00 00 06 00 00 00 w2 = 0x6 4: 85 00 00 00 06 00 00 00 call 0x6 ; DEFINE_FUNC_CTX_POINTER(data) 5: 61 61 4c 00 00 00 00 00 w1 = *(u32 *)(r6 + 0x4c) ; bpf_printk("pre ipv6_hdrlen_offset"); 6: 18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll 0000000000000030: R_BPF_64_64 .rodata 8: b4 02 00 00 17 00 00 00 w2 = 0x17 9: 85 00 00 00 06 00 00 00 call 0x6 <END> The verifier will report the following failure: 9: (85) call bpf_trace_printk#6 last insn is not an exit or jmp The above verifier log does not give a clear hint about how to fix the problem and user may take quite some time to figure out that the issue is due to compiler taking advantage of uninitialized variable. In llvm internals, uninitialized variable usage may generate 'unreachable' IR insn and these 'unreachable' IR insns may indicate uninitialized variable impact on code optimization. So far, llvm BPF backend ignores 'unreachable' IR hence the above code is generated. With clang21 patch [2], those 'unreachable' IR insn are converted to func __bpf_trap(). In order to maintain proper control flow graph for bpf progs, [2] also adds an 'exit' insn after bpf_trap() if __bpf_trap() is the last insn in the function. The new code looks like: ; { 0: bf 16 00 00 00 00 00 00 r6 = r1 ; bpf_printk("Start"); 1: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll 0000000000000008: R_BPF_64_64 .rodata 3: b4 02 00 00 06 00 00 00 w2 = 0x6 4: 85 00 00 00 06 00 00 00 call 0x6 ; DEFINE_FUNC_CTX_POINTER(data) 5: 61 61 4c 00 00 00 00 00 w1 = *(u32 *)(r6 + 0x4c) ; bpf_printk("pre ipv6_hdrlen_offset"); 6: 18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll 0000000000000030: R_BPF_64_64 .rodata 8: b4 02 00 00 17 00 00 00 w2 = 0x17 9: 85 00 00 00 06 00 00 00 call 0x6 10: 85 10 00 00 ff ff ff ff call -0x1 0000000000000050: R_BPF_64_32 __bpf_trap 11: 95 00 00 00 00 00 00 00 exit <END> In kernel, a new kfunc __bpf_trap() is added. During insn verification, any hit with __bpf_trap() will result in verification failure. The kernel is able to provide better log message for debugging. With llvm patch [2] and without this patch (no __bpf_trap() kfunc for existing kernel), e.g., for old kernels, the verifier outputs 10: <invalid kfunc call> kfunc '__bpf_trap' is referenced but wasn't resolved Basically, kernel does not support __bpf_trap() kfunc. This still didn't give clear signals about possible reason. With llvm patch [2] and with this patch, the verifier outputs 10: (85) call __bpf_trap#74479 unexpected __bpf_trap() due to uninitialized variable? It gives much better hints for verification failure. [1] https://github.com/msune/clang_bpf/blob/main/Makefile#L3 [2] https://github.com/llvm/llvm-project/pull/131731 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205326.1291640-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge tag 'x86_cache_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Carve out the resctrl filesystem-related code into fs/resctrl/ so that multiple architectures can share the fs API for manipulating their respective hw resource control implementation. This is the second step in the work towards sharing the resctrl filesystem interface, the next one being plugging ARM's MPAM into the aforementioned fs API" * tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) MAINTAINERS: Add reviewers for fs/resctrl x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl x86/resctrl: Always initialise rid field in rdt_resources_all[] x86/resctrl: Relax some asm #includes x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() x86/resctrl: Squelch whitespace anomalies in resctrl core code x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs x86/resctrl: Move enum resctrl_event_id to resctrl.h x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl fs/resctrl: Add boiler plate for external resctrl code x86/resctrl: Add 'resctrl' to the title of the resctrl documentation x86/resctrl: Split trace.h x86/resctrl: Expand the width of domid by replacing mon_data_bits x86/resctrl: Add end-marker to the resctrl_event_id enum x86/resctrl: Move is_mba_sc() out of core.c x86/resctrl: Drop __init/__exit on assorted symbols x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point x86/resctrl: Check all domains are offline in resctrl_exit() x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" ...
2025-05-27bpf: Remove special_kfunc_set from verifierYonghong Song
Currently, the verifier has both special_kfunc_set and special_kfunc_list. When adding a new kfunc usage to the verifier, it is often confusing about whether special_kfunc_set or special_kfunc_list or both should add that kfunc. For example, some kfuncs, e.g., bpf_dynptr_from_skb, bpf_dynptr_clone, bpf_wq_set_callback_impl, does not need to be in special_kfunc_set. To avoid potential future confusion, special_kfunc_set is deleted and btf_id_set_contains(&special_kfunc_set, ...) is removed. The code is refactored with a new func check_special_kfunc(), which contains all codes covered by original branch meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id) There is no functionality change. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205321.1291431-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge branch 'replace-config_dmabuf_sysfs_stats-with-bpf'Alexei Starovoitov
T.J. Mercier says: ==================== Replace CONFIG_DMABUF_SYSFS_STATS with BPF Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to perform per-buffer accounting with debugfs which is not suitable for production environments. Eventually we discovered the overhead with per-buffer sysfs file creation/removal was significantly impacting allocation and free times, and exacerbated kernfs lock contention. [2] dma_buf_stats_setup() is responsible for 39% of single-page buffer creation duration, or 74% of single-page dma_buf_export() duration when stressing dmabuf allocations and frees. I prototyped a change from per-buffer to per-exporter statistics with a RCU protected list of exporter allocations that accommodates most (but not all) of our use-cases and avoids almost all of the sysfs overhead. While that adds less overhead than per-buffer sysfs, and less even than the maintenance of the dmabuf debugfs_list, it's still *additional* overhead on top of the debugfs_list and doesn't give us per-buffer info. This series uses the existing dmabuf debugfs_list to implement a BPF dmabuf iterator, which adds no overhead to buffer allocation/free and provides per-buffer info. The list has been moved outside of CONFIG_DEBUG_FS scope so that it is always populated. The BPF program loaded by userspace that extracts per-buffer information gets to define its own interface which avoids the lack of ABI stability with debugfs. This will allow us to replace our use of CONFIG_DMABUF_SYSFS_STATS, and the plan is to remove it from the kernel after the next longterm stable release. [1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.com [2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com v1: https://lore.kernel.org/all/20250414225227.3642618-1-tjmercier@google.com v1 -> v2: Make the DMA buffer list independent of CONFIG_DEBUG_FS per Christian König Add CONFIG_DMA_SHARED_BUFFER check to kernel/bpf/Makefile per kernel test robot Use BTF_ID_LIST_SINGLE instead of BTF_ID_LIST_GLOBAL_SINGLE per Song Liu Fixup comment style, mixing code/declarations, and use ASSERT_OK_FD in selftest per Song Liu Add BPF_ITER_RESCHED feature to bpf_dmabuf_reg_info per Alexei Starovoitov Add open-coded iterator and selftest per Alexei Starovoitov Add a second test buffer from the system dmabuf heap to selftests Use the BPF program we'll use in production for selftest per Alexei Starovoitov https://r.android.com/c/platform/system/bpfprogs/+/3616123/2/dmabufIter.c https://r.android.com/c/platform/system/memory/libmeminfo/+/3614259/1/libdmabufinfo/dmabuf_bpf_stats.cpp v2: https://lore.kernel.org/all/20250504224149.1033867-1-tjmercier@google.com v2 -> v3: Rebase onto bpf-next/master Move get_next_dmabuf() into drivers/dma-buf/dma-buf.c, along with the new get_first_dmabuf(). This avoids having to expose the dmabuf list and mutex to the rest of the kernel, and keeps the dmabuf mutex operations near each other in the same file. (Christian König) Add Christian's RB to dma-buf: Rename debugfs symbols Drop RFC: dma-buf: Remove DMA-BUF statistics v3: https://lore.kernel.org/all/20250507001036.2278781-1-tjmercier@google.com v3 -> v4: Fix selftest BPF program comment style (not kdoc) per Alexei Starovoitov Fix dma-buf.c kdoc comment style per Alexei Starovoitov Rename get_first_dmabuf / get_next_dmabuf to dma_buf_iter_begin / dma_buf_iter_next per Christian König Add Christian's RB to bpf: Add dmabuf iterator v4: https://lore.kernel.org/all/20250508182025.2961555-1-tjmercier@google.com v4 -> v5: Add Christian's Acks to all patches Add Song Liu's Acks Move BTF_ID_LIST_SINGLE and DEFINE_BPF_ITER_FUNC closer to usage per Song Liu Fix open-coded iterator comment style per Song Liu Move iterator termination check to its own subtest per Song Liu Rework selftest buffer creation per Song Liu Fix spacing in sanitize_string per BPF CI v5: https://lore.kernel.org/all/20250512174036.266796-1-tjmercier@google.com v5 -> v6: Song Liu: Init test buffer FDs to -1 Zero-init udmabuf_create for future proofing Bail early for iterator fd/FILE creation failure Dereference char ptr to check for NUL in sanitize_string() Move map insertion from create_test_buffers() to test_dmabuf_iter() Add ACK to selftests/bpf: Add test for open coded dmabuf_iter v6: https://lore.kernel.org/all/20250513163601.812317-1-tjmercier@google.com v6 -> v7: Zero uninitialized name bytes following the end of name strings per s390x BPF CI Reorder sanitize_string bounds checks per Song Liu Add Song's Ack to: selftests/bpf: Add test for dmabuf_iter Rebase onto bpf-next/master per BPF CI ==================== Link: https://patch.msgid.link/20250522230429.941193-1-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for open coded dmabuf_iterT.J. Mercier
Use the same test buffers as the traditional iterator and a new BPF map to verify the test buffers can be found with the open coded dmabuf iterator. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-6-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for dmabuf_iterT.J. Mercier
This test creates a udmabuf, and a dmabuf from the system dmabuf heap, and uses a BPF program that prints dmabuf metadata with the new dmabuf_iter to verify they can be found. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-5-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpf: Add open coded dmabuf iteratorT.J. Mercier
This open coded iterator allows for more flexibility when creating BPF programs. It can support output in formats other than text. With an open coded iterator, a single BPF program can traverse multiple kernel data structures (now including dmabufs), allowing for more efficient analysis of kernel data compared to multiple reads from procfs, sysfs, or multiple traditional BPF iterator invocations. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-4-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpf: Add dmabuf iteratorT.J. Mercier
The dmabuf iterator traverses the list of all DMA buffers. DMA buffers are refcounted through their associated struct file. A reference is taken on each buffer as the list is iterated to ensure each buffer persists for the duration of the bpf program execution without holding the list mutex. Signed-off-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-3-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27dma-buf: Rename debugfs symbolsT.J. Mercier
Rename the debugfs list and mutex so it's clear they are now usable without the need for CONFIG_DEBUG_FS. The list will always be populated to support the creation of a BPF iterator for dmabufs. Signed-off-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-2-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge tag 'timers-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: "Updates for the time/timer core code: - Rework the initialization of the posix-timer kmem_cache and move the cache pointer into the timer_data structure to prevent false sharing - Switch the alarmtimer code to lock guards - Improve the CPU selection criteria in the per CPU validation of the clocksource watchdog to avoid arbitrary selections (or omissions) on systems with a small number of CPUs - The usual cleanups and improvements" * tag 'timers-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/nohz: Remove unused tick_nohz_full_add_cpus_to() clocksource: Fix the CPUs' choice in the watchdog per CPU verification alarmtimer: Switch spin_{lock,unlock}_irqsave() to guards alarmtimer: Remove dead return value in clock2alarm() time/jiffies: Change register_refined_jiffies() to void __init timers: Remove unused __round_jiffies(_up) posix-timers: Initialize cache early and move pointer into __timer_data
2025-05-27Merge tag 'timers-clocksource-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull clocksource updates from Thomas Gleixner: "Updates for clocksource/clockevent drivers: - The final conversion of text formatted device tree binding to schemas - A new driver fot the System Timer Module on S32G NXP SoCs - A new driver fot the Econet HPT timer - The usual improvements and device tree binding updates" * tag 'timers-clocksource-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) clocksource/drivers/renesas-ostm: Unconditionally enable reprobe support dt-bindings: timer: renesas,ostm: Document RZ/V2N (R9A09G056) support dt-bindings: timer: Convert marvell,armada-370-timer to DT schema dt-bindings: timer: Convert ti,keystone-timer to DT schema dt-bindings: timer: Convert st,spear-timer to DT schema dt-bindings: timer: Convert socionext,milbeaut-timer to DT schema dt-bindings: timer: Convert snps,arc-timer to DT schema dt-bindings: timer: Convert snps,archs-rtc to DT schema dt-bindings: timer: Convert snps,archs-gfrc to DT schema dt-bindings: timer: Convert lsi,zevio-timer to DT schema dt-bindings: timer: Convert jcore,pit to DT schema dt-bindings: timer: Convert img,pistachio-gptimer to DT schema dt-bindings: timer: Convert ezchip,nps400-timer to DT schema dt-bindings: timer: Convert cirrus,clps711x-timer to DT schema dt-bindings: timer: Convert altr,timer-1.0 to DT schema dt-bindings: timer: Add ESWIN EIC7700 CLINT clocksource/drivers: Add EcoNet Timer HPT driver dt-bindings: timer: Add EcoNet EN751221 "HPT" CPU Timer dt-bindings: timer: Convert arm,mps2-timer to DT schema dt-bindings: timer: Add Sophgo SG2044 ACLINT timer ...
2025-05-27Merge tag 'timers-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "Another set of timer API cleanups: - Convert init_timer*(), try_to_del_timer_sync() and destroy_timer_on_stack() over to the canonical timer_*() namespace convention. There is another large conversion pending, which has not been included because it would have caused a gazillion of merge conflicts in next. The conversion scripts will be run towards the end of the merge window and a pull request sent once all conflict dependencies have been merged" * tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack() treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try() timers: Rename init_timers() as timers_init() timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA timers: Rename __init_timer_on_stack() as __timer_init_on_stack() timers: Rename __init_timer() as __timer_init() timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack() timers: Rename init_timer_key() as timer_init_key()
2025-05-27Merge tag 'irq-msi-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MSI updates from Thomas Gleixner: "Updates for the MSI subsystem (core code and PCI): - Switch the MSI descriptor locking to lock guards - Replace a broken and naive implementation of PCI/MSI-X control word updates in the PCI/TPH driver with a properly serialized variant in the PCI/MSI core code. - Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by replacing the direct access to the MSI descriptors with the proper API function calls. People will never understand that APIs exist for a reason... - Provide core infrastructre for the upcoming PCI endpoint library extensions. Currently limited to ARM GICv3+, but in theory extensible to other architectures. - Provide a MSI domain::teardown() callback, which allows drivers to undo the effects of the prepare() callback. - Move the MSI domain::prepare() callback invocation to domain creation time to avoid redundant (and in case of ARM/GIC-V3-ITS confusing) invocations on every allocation. In combination with the new teardown callback this removes some ugly hacks in the GIC-V3-ITS driver, which pretended to work around the short comings of the core code so far. With this update the code is correct by design and implementation. - Make the irqchip MSI library globally available, provide a MSI parent domain creation helper and convert a bunch of (PCI/)MSI drivers over to the modern MSI parent mechanism. This is the first step to get rid of at least one incarnation of the three PCI/MSI management schemes. - The usual small cleanups and improvements" * tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) PCI/MSI: Use bool for MSI enable state tracking PCI: tegra: Convert to MSI parent infrastructure PCI: xgene: Convert to MSI parent infrastructure PCI: apple: Convert to MSI parent infrastructure irqchip/msi-lib: Honour the MSI_FLAG_NO_AFFINITY flag irqchip/mvebu: Convert to msi_create_parent_irq_domain() helper irqchip/gic: Convert to msi_create_parent_irq_domain() helper genirq/msi: Add helper for creating MSI-parent irq domains irqchip: Make irq-msi-lib.h globally available irqchip/gic-v3-its: Use allocation size from the prepare call genirq/msi: Engage the .msi_teardown() callback on domain removal genirq/msi: Move prepare() call to per-device allocation irqchip/gic-v3-its: Implement .msi_teardown() callback genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask dt-bindings: PCI: pci-ep: Add support for iommu-map and msi-map irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable() platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() genirq/msi: Rename msi_[un]lock_descs() ...
2025-05-27Merge tag 'irq-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A set of cleanups for the generic interrupt subsystem: - Consolidate on one set of functions for the interrupt domain code to get rid of pointlessly duplicated code with only marginal different semantics. - Update the documentation accordingly and consolidate the coding style of the irqdomain header" * tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) irqdomain: Consolidate coding style irqdomain: Fix kernel-doc and add it to Documentation Documentation: irqdomain: Update it Documentation: irq-domain.rst: Simple improvements Documentation: irq/concepts: Minor improvements Documentation: irq/concepts: Add commas and reflow irqdomain: Improve kernel-docs of functions irqdomain: Make struct irq_domain_info variables const irqdomain: Use irq_domain_instantiate()'s return value as initializers irqdomain: Drop irq_linear_revmap() pinctrl: keembay: Switch to irq_find_mapping() irqchip/armada-370-xp: Switch to irq_find_mapping() gpu: ipu-v3: Switch to irq_find_mapping() gpio: idt3243x: Switch to irq_find_mapping() sh: Switch to irq_find_mapping() powerpc: Switch to irq_find_mapping() irqdomain: Drop irq_domain_add_*() functions powerpc: Switch irq_domain_add_nomap() to use fwnode thermal: Switch to irq_domain_create_linear() soc: Switch to irq_domain_create_*() ...
2025-05-27Merge tag 'irq-drivers-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq controller updates from Thomas Gleixner: "Update for interrupt chip drivers: - Convert the generic interrupt chip to lock guards to remove copy & pasta boilerplate code and gotos. - A new driver fot the interrupt controller in the EcoNet EN751221 MIPS SoC. - Extend the SG2042-MSI driver to support the new SG2044 SoC - Updates and cleanups for the (ancient) VT8500 driver - Improve the scalability of the ARM GICV4.1 ITS driver by utilizing node local copies a VM's interrupt translation table when possible. This results in a 12% reduction of VM IPI latency in certain workloads. - The usual cleanups and improvements all over the place" * tag 'irq-drivers-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) irqchip/irq-pruss-intc: Simplify chained interrupt handler setup irqchip/gic-v4.1: Use local 4_1 ITS to generate VSGI irqchip/econet-en751221: Switch to of_fwnode_handle() irqchip/irq-vt8500: Switch to irq_domain_create_*() irqchip/econet-en751221: Switch to irq_domain_create_linear() irqchip/irq-vt8500: Use fewer global variables and add error handling irqchip/irq-vt8500: Use a dedicated chained handler function irqchip/irq-vt8500: Don't require 8 interrupts from a chained controller irqchip/irq-vt8500: Drop redundant copy of the device node pointer irqchip/irq-vt8500: Split up ack/mask functions irqchip/sg2042-msi: Fix wrong type cast in sg2044_msi_irq_ack() irqchip/sg2042-msi: Add the Sophgo SG2044 MSI interrupt controller irqchip/sg2042-msi: Introduce configurable chipinfo for SG2042 irqchip/sg2042-msi: Rename functions and data structures to be SG2042 agnostic dt-bindings: interrupt-controller: Add Sophgo SG2044 MSI controller genirq/generic-chip: Fix incorrect lock guard conversions genirq/generic-chip: Remove unused lock wrappers irqchip: Convert generic irqchip locking to guards gpio: mvebu: Convert generic irqchip locking to guard() ARM: orion/gpio:: Convert generic irqchip locking to guard() ...
2025-05-27Merge tag 'irq-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core updates from Thomas Gleixner: "Updates for the generic interrupt subsystem core code: - Address a long standing subtle problem in the CPU hotplug code for affinity-managed interrupts. Affinity-managed interrupts are shut down by the core code when the last CPU in the affinity set goes offline and started up again when the first CPU in the affinity set becomes online again. This unfortunately does not take into account whether an interrupt has been disabled before the last CPU goes offline and starts up the interrupt unconditionally when the first CPU becomes online again. That's obviously not what drivers expect. Address this by preserving the disabled state for affinity-managed interrupts accross these CPU hotplug operations. All non-managed interrupts are not affected by this because startup/shutdown is coupled to request/free_irq() which obviously has to reset state. - Support three-cell scheme interrupts to allow GPIO drivers to specify interrupts from an already existing scheme - Switch the interrupt subsystem core to lock guards. This gets rid of quite some copy & pasta boilerplate code all over the place. - The usual small cleanups and improvements all over the place" * tag 'irq-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits) genirq/irqdesc: Remove double locking in hwirq_show() genirq: Retain disable depth for managed interrupts across CPU hotplug genirq: Bump the size of the local variable for sprintf() genirq/manage: Use the correct lock guard in irq_set_irq_wake() genirq: Consistently use '%u' format specifier for unsigned int variables genirq: Ensure flags in lock guard is consistently initialized genirq: Fix inverted condition in handle_nested_irq() genirq/cpuhotplug: Fix up lock guards conversion brainf..t genirq: Use scoped_guard() to shut clang up genirq: Remove unused remove_percpu_irq() genirq: Remove irq_[get|put]_desc*() genirq/manage: Rework irq_set_irqchip_state() genirq/manage: Rework irq_get_irqchip_state() genirq/manage: Rework teardown_percpu_nmi() genirq/manage: Rework prepare_percpu_nmi() genirq/manage: Rework disable_percpu_irq() genirq/manage: Rework irq_percpu_is_enabled() genirq/manage: Rework enable_percpu_irq() genirq/manage: Rework irq_set_parent() genirq/manage: Rework can_request_irq() ...
2025-05-27Merge tag 'core-entry-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core entry code updates from Thomas Gleixner: "Updates for the generic and architecture entry code: - Move LoongArch and RISC-V ret_from_fork() implementations to C code so that syscall_exit_user_mode() can be inlined - Split the RISC-V ret_from_fork() implementation into return to user and return to kernel, which gives a measurable performance improvement - Inline syscall_exit_user_mode() which benefits all architectures by avoiding a function call and letting the compiler do better optimizations" * tag 'core-entry-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: LoongArch: entry: Fix include order entry: Inline syscall_exit_to_user_mode() LoongArch: entry: Migrate ret_from_fork() to C riscv: entry: Split ret_from_fork() into user and kernel riscv: entry: Convert ret_from_fork() to C
2025-05-27virtio_rtc: Add RTC class driverPeter Hilber
Expose the virtio-rtc UTC-like clock as an RTC clock to userspace - if it is present, and if it does not step on leap seconds. The RTC class enables the virtio-rtc device to resume the system from sleep states on RTC alarm. Support RTC alarm if the virtio-rtc alarm feature is present. The virtio-rtc device signals an alarm by marking an alarmq buffer as used. Peculiarities ------------- A virtio-rtc clock is a bit special for an RTC clock in that - the clock may step (also backwards) autonomously at any time and - the device, and its notification mechanism, will be reset during boot or resume from sleep. The virtio-rtc device avoids that the driver might miss an alarm. The device signals an alarm whenever the clock has reached or passed the alarm time, and also when the device is reset (on boot or resume from sleep), if the alarm time is in the past. Open Issue ---------- The CLOCK_BOOTTIME_ALARM will use the RTC clock to wake up from sleep, and implicitly assumes that no RTC clock steps will occur during sleep. The RTC class driver does not know whether the current alarm is a real-time alarm or a boot-time alarm. Perhaps this might be handled by the driver also setting a virtio-rtc monotonic alarm (which uses a clock similar to CLOCK_BOOTTIME_ALARM). The virtio-rtc monotonic alarm would just be used to wake up in case it was a CLOCK_BOOTTIME_ALARM alarm. Otherwise, the behavior should not differ from other RTC class drivers. Signed-off-by: Peter Hilber <quic_philber@quicinc.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Message-Id: <20250509160734.1772-5-quic_philber@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-05-27virtio_rtc: Add Arm Generic Timer cross-timestampingPeter Hilber
For platforms using the Arm Generic Timer, add precise cross-timestamping support to virtio_rtc. Always report the CP15 virtual counter as the HW counter in use by arm_arch_timer, since the Linux kernel's usage of the Arm Generic Timer should always be compatible with this. Signed-off-by: Peter Hilber <quic_philber@quicinc.com> Message-Id: <20250509160734.1772-4-quic_philber@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-05-27virtio_rtc: Add PTP clocksPeter Hilber
Expose the virtio_rtc clocks as PTP clocks to userspace, similar to ptp_kvm. virtio_rtc can expose multiple clocks, e.g. a UTC clock and a monotonic clock. Userspace should distinguish different clocks through the name assigned by the driver. In particular, UTC-like clocks can also be distinguished by if and how leap seconds are smeared. udev rules such as the following can be used to get different symlinks for different clock types: SUBSYSTEM=="ptp", ATTR{clock_name}=="Virtio PTP type 0/variant 0", SYMLINK += "ptp_virtio" SUBSYSTEM=="ptp", ATTR{clock_name}=="Virtio PTP type 1/variant 0", SYMLINK += "ptp_virtio_tai" SUBSYSTEM=="ptp", ATTR{clock_name}=="Virtio PTP type 2/variant 0", SYMLINK += "ptp_virtio_monotonic" SUBSYSTEM=="ptp", ATTR{clock_name}=="Virtio PTP type 3/variant 0", SYMLINK += "ptp_virtio_smear_unspecified" SUBSYSTEM=="ptp", ATTR{clock_name}=="Virtio PTP type 3/variant 1", SYMLINK += "ptp_virtio_smear_noon_linear" SUBSYSTEM=="ptp", ATTR{clock_name}=="Virtio PTP type 3/variant 2", SYMLINK += "ptp_virtio_smear_sls" SUBSYSTEM=="ptp", ATTR{clock_name}=="Virtio PTP type 4/variant 0", SYMLINK += "ptp_virtio_maybe_smeared" The preferred PTP clock reading method is ioctl PTP_SYS_OFFSET_PRECISE2, through the ptp_clock_info.getcrosststamp() op. For now, PTP_SYS_OFFSET_PRECISE2 will return -EOPNOTSUPP through a weak function. PTP_SYS_OFFSET_PRECISE2 requires cross-timestamping support for specific clocksources, which will be added in the following. If the clocksource specific code is enabled, check that the Virtio RTC device supports the respective HW counter before obtaining an actual cross-timestamp from the Virtio device. The Virtio RTC device response time may be higher than the timekeeper seqcount increment interval. Therefore, obtain the cross-timestamp before calling get_device_system_crosststamp(). As a fallback, support the ioctl PTP_SYS_OFFSET_EXTENDED2 for all platforms. Assume that concurrency issues during PTP clock removal are avoided by the posix_clock framework. Kconfig recursive dependencies prevent virtio_rtc from implicitly enabling PTP_1588_CLOCK, therefore just warn the user if PTP_1588_CLOCK is not available. Since virtio_rtc should in the future also expose clocks as RTC class devices, do not depend VIRTIO_RTC on PTP_1588_CLOCK. Signed-off-by: Peter Hilber <quic_philber@quicinc.com> Message-Id: <20250509160734.1772-3-quic_philber@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-05-27virtio_rtc: Add module and driver corePeter Hilber
Add the virtio_rtc module and driver core. The virtio_rtc module implements a driver compatible with the proposed Virtio RTC device specification. The Virtio RTC (Real Time Clock) device provides information about current time. The device can provide different clocks, e.g. for the UTC or TAI time standards, or for physical time elapsed since some past epoch. The driver can read the clocks with simple or more accurate methods. Implement the core, which interacts with the Virtio RTC device. Apart from this, the core does not expose functionality outside of the virtio_rtc module. Follow-up patches will expose PTP clocks and an RTC Class device. Provide synchronous messaging, which is enough for the expected time synchronization use cases through PTP clocks (similar to ptp_kvm) or RTC Class device. Signed-off-by: Peter Hilber <quic_philber@quicinc.com> Message-Id: <20250509160734.1772-2-quic_philber@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-05-27vringh: use bvec_kmap_localChristoph Hellwig
Use the bvec_kmap_local helper rather than digging into the bvec internals. Signed-off-by: Christoph Hellwig <hch@lst.de> Message-Id: <20250501142244.2888227-1-hch@lst.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2025-05-27vhost: vringh: Use matching allocation type in resize_iovec()Kees Cook
In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type is "struct kvec *", but the returned type will be "struct iovec *". These have the same allocation size, so there is no bug: struct kvec { void *iov_base; /* and that should *never* hold a userland pointer */ size_t iov_len; }; struct iovec { void __user *iov_base; /* BSD uses caddr_t (1003.1g requires void *) */ __kernel_size_t iov_len; /* Must be size_t (1003.1g) */ }; Adjust the allocation type to match the assignment. Signed-off-by: Kees Cook <kees@kernel.org> Message-Id: <20250426062214.work.334-kees@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-05-27virtio-pci: Fix result size returned for the admin command completionIsrael Rukshin
The result size returned by virtio_pci_admin_dev_parts_get() is 8 bytes larger than the actual result data size. This occurs because the result_sg_size field of the command is filled with the result length from virtqueue_get_buf(), which includes both the data size and an additional 8 bytes of status. This oversized result size causes two issues: 1. The state transferred to the destination includes 8 bytes of extra data at the end. 2. The allocated buffer in the kernel may be smaller than the returned size, leading to failures when reading beyond the allocated size. The commit fixes this by subtracting the status size from the result of virtqueue_get_buf(). This fix has been tested through live migrations with virtio-net, virtio-net-transitional, and virtio-blk devices. Fixes: 704806ca400e ("virtio: Extend the admin command to include the result size") Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <1745318025-23103-1-git-send-email-israelr@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-05-27vdpa/octeon_ep: Control PCI dev enabling manuallyPhilipp Stanner
PCI region request functions such as pci_request_region() currently have the problem of becoming sometimes managed functions, if pcim_enable_device() instead of pci_enable_device() was called. The PCI subsystem wants to remove this deprecated behavior from its interfaces. octeopn_ep enables its device with pcim_enable_device() (for VF. PF uses manual management), but does so only to get automatic disablement. The driver wants to manage its PCI resources for VF manually, without devres. The easiest way not to use automatic resource management at all is by also handling device enable- and disablement manually. Replace pcim_enable_device() with pci_enable_device(). Add the necessary calls to pci_disable_device(). Signed-off-by: Philipp Stanner <phasta@kernel.org> Acked-by: Vamsi Attunuru <vattunuru@marvell.com> Message-Id: <20250508085134.24084-2-phasta@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Philipp Stanner &lt;<a href="mailto:phasta@kernel.org" target="_blank">phasta@kernel.org</a>&gt;<br> Acked-by: Vamsi Attunuru &lt;<a href="mailto:vattunuru@marvell.com" target="_blank">vattunuru@marvell.com</a>&gt;<br>
2025-05-27btrfs: don't drop a reference if btrfs_check_write_meta_pointer() failsJosef Bacik
In the zoned mode there's a bug in the extent buffer tree conversion to xarray. The reference for eb is dropped and code continues but the references get dropped by releasing the batch. Reported-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/linux-btrfs/202505191521.435b97ac-lkp@intel.com/ Fixes: 19d7f65f032f ("btrfs: convert the buffer_radix to an xarray") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-27Merge branch 'for-next/vdso' into for-next/coreWill Deacon
* for-next/vdso: arm64: vdso: Use __arch_counter_get_cntvct()
2025-05-27Merge branch 'for-next/sme-fixes' into for-next/coreWill Deacon
* for-next/sme-fixes: (35 commits) arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected arm64/fpsimd: ptrace: Gracefully handle errors arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state arm64/fpsimd: ptrace: Do not present register data for inactive mode arm64/fpsimd: ptrace: Save task state before generating SVE header arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data arm64/fpsimd: Make clone() compatible with ZA lazy saving arm64/fpsimd: Clear PSTATE.SM during clone() arm64/fpsimd: Consistently preserve FPSIMD state during clone() arm64/fpsimd: Remove redundant task->mm check arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return() arm64/fpsimd: Add task_smstop_sm() arm64/fpsimd: Factor out {sve,sme}_state_size() helpers arm64/fpsimd: Clarify sve_sync_*() functions arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE arm64/fpsimd: signal: Consistently read FPSIMD context arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only arm64/fpsimd: Do not discard modified SVE state ...
2025-05-27Merge branch 'for-next/selftests' into for-next/coreWill Deacon
* for-next/selftests: kselftest/arm64: Set default OUTPUT path when undefined kselftest/arm64: fp-ptrace: Adjust to new inactive mode behaviour kselftest/arm64: fp-ptrace: Adjust to new VL change behaviour kselftest/arm64: tpidr2: Adjust to new clone() behaviour kselftest/arm64: fp-ptrace: Fix expected FPMR value when PSTATE.SM is changed
2025-05-27Merge branch 'for-next/psci' into for-next/coreWill Deacon
* for-next/psci: firmware: psci: Fix refcount leak in psci_dt_init
2025-05-27Merge branch 'for-next/perf' into for-next/coreWill Deacon
* for-next/perf: perf/arm-cmn: Add CMN S3 ACPI binding perf/arm-cmn: Initialise cmn->cpu earlier perf/amlogic: Replace smp_processor_id() with raw_smp_processor_id() in meson_ddr_pmu_create() perf/arm-cmn: Fix REQ2/SNP2 mixup perf: Do not enable by default during compile testing perf: arm-ni: Fix missing platform_set_drvdata() perf: arm-ni: Unregister PMUs on probe failure perf/arm-cmn: Remove CMN-600 DTC domain special case
2025-05-27Merge branch 'for-next/mm' into for-next/coreWill Deacon
* for-next/mm: arm64/boot: Disallow BSS exports to startup code arm64/boot: Move global CPU override variables out of BSS arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace arm64: mm: Drop redundant check in pmd_trans_huge() arm64/mm: Permit lazy_mmu_mode to be nested arm64/mm: Disable barrier batching in interrupt contexts arm64/mm: Batch barriers when updating kernel mappings mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes arm64/mm: Support huge pte-mapped pages in vmap mm/vmalloc: Gracefully unmap huge ptes mm/vmalloc: Warn on improper use of vunmap_range() arm64/mm: Hoist barriers out of set_ptes_anysz() loop arm64: hugetlb: Use __set_ptes_anysz() and __ptep_get_and_clear_anysz() arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear() mm/page_table_check: Batch-check pmds/puds just like ptes arm64: hugetlb: Refine tlb maintenance scope arm64: hugetlb: Cleanup huge_pte size discovery mechanisms arm64: pageattr: Explicitly bail out when changing permissions for vmalloc_huge mappings arm64: Support ARM64_VA_BITS=52 when setting ARCH_MMAP_RND_BITS_MAX arm64/mm: Remove randomization of the linear map
2025-05-27Merge branch 'for-next/misc' into for-next/coreWill Deacon
* for-next/misc: arm64/cpuinfo: only show one cpu's info in c_show() arm64: Extend pr_crit message on invalid FDT arm64: Kconfig: remove unnecessary selection of CRC32 arm64: Add missing includes for mem_encrypt
2025-05-27Merge branch 'for-next/fixes' into for-next/coreWill Deacon
Merge in for-next/fixes, as subsequent improvements to our early PI code that disallow BSS exports depend on the 'arm64_use_ng_mappings' fix here. * for-next/fixes: arm64: cpufeature: Move arm64_use_ng_mappings to the .data section to prevent wrong idmap generation arm64: errata: Add missing sentinels to Spectre-BHB MIDR arrays
2025-05-27Merge branch 'for-next/entry' into for-next/coreWill Deacon
* for-next/entry: arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again arm64: Update comment regarding values in __boot_cpu_mode arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1 arm64: enable PREEMPT_LAZY
2025-05-27Merge branch 'for-next/efi' into for-next/coreWill Deacon
* for-next/efi: arm64/fpsimd: Avoid unnecessary per-CPU buffers for EFI runtime calls
2025-05-27Merge branch 'for-next/cpufeature' into for-next/coreWill Deacon
* for-next/cpufeature: arm64: cputype: Add cputype definition for HIP12 arm64: Expose AIDR_EL1 via sysfs arm64/cpufeature: Add missing id_aa64mmfr4 feature reg update
2025-05-27Merge branch 'for-next/acpi' into for-next/coreWill Deacon
* for-next/acpi: firmware: SDEI: Allow sdei initialization without ACPI_APEI_GHES
2025-05-27Doc: networking: Fix various typos in rds.rstAlok Tiwari
Corrected "sages" to "messages" in the bitmap allocation description. Fixed "competed" to "completed" in the recv path datagram handling section. Corrected "privatee" to "private" in the multipath RDS section. Fixed "mutlipath" to "multipath" in the transport capabilities description. These changes improve documentation clarity and maintain consistency. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20250522074413.3634446-1-alok.a.tiwari@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27net/mlx5e: Allow setting MAC address of representorsMark Bloch
A representor netdev does not correspond to real hardware that needs to be updated when setting the MAC address. The default eth_mac_addr() is sufficient for simply updating the netdev's MAC address with validation. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747898036-1121904-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27Merge branch 'octeontx2-pf-do-not-detect-macsec-block-based-on-silicon'Paolo Abeni
Subbaraya Sundeep says: ==================== octeontx2-pf: Do not detect MACSEC block based on silicon Out of various silicon variants of CN10K series some have hardware MACSEC block for offloading MACSEC operations and some do not. AF driver already has the information of whether MACSEC is present or not on running silicon. Hence fetch that information from AF via mailbox message. ==================== Link: https://patch.msgid.link/1747894516-4565-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27octeontx2-pf: macsec: Get MACSEC capability flag from AFSubbaraya Sundeep
The presence of MACSEC block is currently figured out based on the running silicon variant. This may not be correct all the times since the MACSEC block can be fused out. Hence get the macsec info from AF via mailbox. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747894548-4657-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27octeontx2-af: Add MACSEC capability flagSubbaraya Sundeep
MACSEC block may be fused out on some silicons hence modify get_hw_cap mailbox message to set a capability flag in its response message based on MACSEC block availability. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747894528-4611-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27xsk: add missing virtual address conversion for pageBui Quang Minh
In commit 7ead4405e06f ("xsk: convert xdp_copy_frags_from_zc() to use page_pool_dev_alloc()"), when converting from netmem to page, I missed a call to page_address() around skb_frag_page(frag) to get the virtual address of the page. This commit uses skb_frag_address() helper to fix the issue. Fixes: 7ead4405e06f ("xsk: convert xdp_copy_frags_from_zc() to use page_pool_dev_alloc()") Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250522040115.5057-1-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27af_packet: move notifier's packet_dev_mc out of rcu critical sectionStanislav Fomichev
Syzkaller reports the following issue: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578 __mutex_lock+0x106/0xe80 kernel/locking/mutex.c:746 team_change_rx_flags+0x38/0x220 drivers/net/team/team_core.c:1781 dev_change_rx_flags net/core/dev.c:9145 [inline] __dev_set_promiscuity+0x3f8/0x590 net/core/dev.c:9189 netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201 dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286 packet_dev_mc net/packet/af_packet.c:3698 [inline] packet_dev_mclist_delete net/packet/af_packet.c:3722 [inline] packet_notifier+0x292/0xa60 net/packet/af_packet.c:4247 notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85 call_netdevice_notifiers_extack net/core/dev.c:2214 [inline] call_netdevice_notifiers net/core/dev.c:2228 [inline] unregister_netdevice_many_notify+0x15d8/0x2330 net/core/dev.c:11972 rtnl_delete_link net/core/rtnetlink.c:3522 [inline] rtnl_dellink+0x488/0x710 net/core/rtnetlink.c:3564 rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6955 netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534 Calling `PACKET_ADD_MEMBERSHIP` on an ops-locked device can trigger the `NETDEV_UNREGISTER` notifier, which may require disabling promiscuous and/or allmulti mode. Both of these operations require acquiring the netdev instance lock. Move the call to `packet_dev_mc` outside of the RCU critical section. The `mclist` modifications (add, del, flush, unregister) are protected by the RTNL, not the RCU. The RCU only protects the `sklist` and its associated `sks`. The delayed operation on the `mclist` entry remains within the RTNL. Reported-by: syzbot+b191b5ccad8d7a986286@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b191b5ccad8d7a986286 Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250522031129.3247266-1-stfomichev@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27Merge branch 'vsock-sock_linger-rework'Paolo Abeni
Michal Luczaj says: ==================== vsock: SOCK_LINGER rework Change vsock's lingerning to wait on close() until all data is sent, i.e. until workers picked all the packets for processing. v5: https://lore.kernel.org/r/20250521-vsock-linger-v5-0-94827860d1d6@rbox.co v4: https://lore.kernel.org/r/20250501-vsock-linger-v4-0-beabbd8a0847@rbox.co v3: https://lore.kernel.org/r/20250430-vsock-linger-v3-0-ddbe73b53457@rbox.co v2: https://lore.kernel.org/r/20250421-vsock-linger-v2-0-fe9febd64668@rbox.co v1: https://lore.kernel.org/r/20250407-vsock-linger-v1-0-1458038e3492@rbox.co Signed-off-by: Michal Luczaj <mhal@rbox.co> ==================== Link: https://patch.msgid.link/20250522-vsock-linger-v6-0-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>