summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2025-09-10net: ethtool: fix wrong type used in struct kernel_ethtool_ts_infoRussell King (Oracle)
In C, enumerated types do not have a defined size, apart from being compatible with one of the standard types. This allows an ABI / compiler to choose the type of an enum depending on the values it needs to store, and storing larger values in it can lead to undefined behaviour. The tx_type and rx_filters members of struct kernel_ethtool_ts_info are defined as enumerated types, but are bit arrays, where each bit is defined by the enumerated type. This means they typically store values in excess of the maximum value of the enumerated type, in fact (1 << max_value) and thus must not be declared using the enumated type. Fix both of these to use u32, as per the corresponding __u32 UAPI type. Fixes: 2111375b85ad ("net: Add struct kernel_ethtool_ts_info") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/E1uvMEK-00000003Amd-2pWR@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-10netfilter: nf_tables: make nft_set_do_lookup available unconditionallyFlorian Westphal
This function was added for retpoline mitigation and is replaced by a static inline helper if mitigations are not enabled. Enable this helper function unconditionally so next patch can add a lookup restart mechanism to fix possible false negatives while transactions are in progress. Adding lookup restarts in nft_lookup_eval doesn't work as nft_objref would then need the same copypaste loop. This patch is separate to ease review of the actual bug fix. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-10netfilter: nf_tables: place base_seq in struct netFlorian Westphal
This will soon be read from packet path around same time as the gencursor. Both gencursor and base_seq get incremented almost at the same time, so it makes sense to place them in the same structure. This doesn't increase struct net size on 64bit due to padding. Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-10cgroup: replace global percpu_rwsem with per threadgroup resem when writing ↵Yi Tao
to cgroup.procs The static usage pattern of creating a cgroup, enabling controllers, and then seeding it with CLONE_INTO_CGROUP doesn't require write locking cgroup_threadgroup_rwsem and thus doesn't benefit from this patch. To avoid affecting other users, the per threadgroup rwsem is only used when the favordynmods is enabled. As computer hardware advances, modern systems are typically equipped with many CPU cores and large amounts of memory, enabling the deployment of numerous applications. On such systems, container creation and deletion become frequent operations, making cgroup process migration no longer a cold path. This leads to noticeable contention with common process operations such as fork, exec, and exit. To alleviate the contention between cgroup process migration and operations like process fork, this patch modifies lock to take the write lock on signal_struct->group_rwsem when writing pid to cgroup.procs/threads instead of holding a global write lock. Cgroup process migration has historically relied on signal_struct->group_rwsem to protect thread group integrity. In commit <1ed1328792ff> ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem"), this was changed to a global cgroup_threadgroup_rwsem. The advantage of using a global lock was simplified handling of process group migrations. This patch retains the use of the global lock for protecting process group migration, while reducing contention by using per thread group lock during cgroup.procs/threads writes. The locking behavior is as follows: write cgroup.procs/threads | process fork,exec,exit | process group migration ------------------------------------------------------------------------------ cgroup_lock() | down_read(&g_rwsem) | cgroup_lock() down_write(&p_rwsem) | down_read(&p_rwsem) | down_write(&g_rwsem) critical section | critical section | critical section up_write(&p_rwsem) | up_read(&p_rwsem) | up_write(&g_rwsem) cgroup_unlock() | up_read(&g_rwsem) | cgroup_unlock() g_rwsem denotes cgroup_threadgroup_rwsem, p_rwsem denotes signal_struct->group_rwsem. This patch eliminates contention between cgroup migration and fork operations for threads that belong to different thread groups, thereby reducing the long-tail latency of cgroup migrations and lowering system load. With this patch, under heavy fork and exec interference, the long-tail latency of cgroup migration has been reduced from milliseconds to microseconds. Under heavy cgroup migration interference, the multi-CPU score of the spawn test case in UnixBench increased by 9%. tj: Update comment in cgroup_favor_dynmods() and switch WARN_ONCE() to pr_warn_once(). Signed-off-by: Yi Tao <escape@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-09-10cgroup: refactor the cgroup_attach_lock code to make it clearerYi Tao
Dynamic cgroup migration involving threadgroup locks can be in one of two states: no lock held, or holding the global lock. Explicitly declaring the different lock modes to make the code easier to understand and facilitates future extensions of the lock modes. Signed-off-by: Yi Tao <escape@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-09-10PM: EM: Add function for registering a PD without capacity updateRafael J. Wysocki
The intel_pstate driver manages CPU capacity changes itself and it does not need an update of the capacity of all CPUs in the system to be carried out after registering a PD. Moreover, in some configurations (for instance, an SMT-capable hybrid x86 system booted with nosmt in the kernel command line) the em_check_capacity_update() call at the end of em_dev_register_perf_domain() always fails and reschedules itself to run once again in 1 s, so effectively it runs in vain every 1 s forever. To address this, introduce a new variant of em_dev_register_perf_domain(), called em_dev_register_pd_no_update(), that does not invoke em_check_capacity_update(), and make intel_pstate use it instead of the original. Fixes: 7b010f9b9061 ("cpufreq: intel_pstate: EAS support for hybrid platforms") Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/ Reported-by: Kenneth R. Crudup <kenny@panix.com> Tested-by: Kenneth R. Crudup <kenny@panix.com> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-10KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocksOliver Upton
syzkaller has caught us red-handed once more, this time nesting regular spinlocks behind raw spinlocks: ============================= [ BUG: Invalid wait context ] 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 Not tainted ----------------------------- syz.0.29/3743 is trying to lock: a3ff80008e2e9e18 (&xa->xa_lock#20){....}-{3:3}, at: vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137 other info that might help us debug this: context-{5:5} 3 locks held by syz.0.29/3743: #0: a3ff80008e2e90a8 (&kvm->slots_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x50/0x624 arch/arm64/kvm/vgic/vgic-init.c:499 #1: a3ff80008e2e9fa0 (&kvm->arch.config_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x5c/0x624 arch/arm64/kvm/vgic/vgic-init.c:500 #2: 58f0000021be1428 (&vgic_cpu->ap_list_lock){....}-{2:2}, at: vgic_flush_pending_lpis+0x3c/0x31c arch/arm64/kvm/vgic/vgic.c:150 stack backtrace: CPU: 0 UID: 0 PID: 3743 Comm: syz.0.29 Not tainted 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 PREEMPT Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C) __dump_stack+0x30/0x40 lib/dump_stack.c:94 dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120 dump_stack+0x1c/0x28 lib/dump_stack.c:129 print_lock_invalid_wait_context kernel/locking/lockdep.c:4833 [inline] check_wait_context kernel/locking/lockdep.c:4905 [inline] __lock_acquire+0x978/0x299c kernel/locking/lockdep.c:5190 lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5871 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162 vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137 vgic_flush_pending_lpis+0x24c/0x31c arch/arm64/kvm/vgic/vgic.c:158 __kvm_vgic_vcpu_destroy+0x44/0x500 arch/arm64/kvm/vgic/vgic-init.c:455 kvm_vgic_destroy+0x100/0x624 arch/arm64/kvm/vgic/vgic-init.c:505 kvm_arch_destroy_vm+0x80/0x138 arch/arm64/kvm/arm.c:244 kvm_destroy_vm virt/kvm/kvm_main.c:1308 [inline] kvm_put_kvm+0x800/0xff8 virt/kvm/kvm_main.c:1344 kvm_vm_release+0x58/0x78 virt/kvm/kvm_main.c:1367 __fput+0x4ac/0x980 fs/file_table.c:465 ____fput+0x20/0x58 fs/file_table.c:493 task_work_run+0x1bc/0x254 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] do_notify_resume+0x1b4/0x270 arch/arm64/kernel/entry-common.c:151 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_svc+0xb4/0x160 arch/arm64/kernel/entry-common.c:768 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600 This is of course no good, but is at odds with how LPI refcounts are managed. Solve the locking mess by deferring the release of unreferenced LPIs after the ap_list_lock is released. Mark these to-be-released LPIs specially to avoid racing with vgic_put_irq() and causing a double-free. Since references can only be taken on LPIs with a nonzero refcount, extending the lifetime of freed LPIs is still safe. Reviewed-by: Marc Zyngier <maz@kernel.org> Reported-by: syzbot+cef594105ac7e60c6d93@syzkaller.appspotmail.com Closes: https://lore.kernel.org/kvmarm/68acd0d9.a00a0220.33401d.048b.GAE@google.com/ Link: https://lore.kernel.org/r/20250905100531.282980-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-09-10KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIsOliver Upton
KVM's use of krefs to manage LPIs isn't adding much, especially considering vgic_irq_release() is a noop due to the lack of sufficient context. Switch to using a regular refcount in anticipation of adding a meaningful release concept for LPIs. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250905100531.282980-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-09-10KVM: arm64: vgic: Drop stale comment on IRQ active stateOliver Upton
While LPIs lack an active state, KVM unconditionally folds the active state from the LR into the vgic_irq struct meaning this field cannot be 'creatively' reused for something else. Drop the misleading comment to reflect this. Link: https://lore.kernel.org/r/20250905100531.282980-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-09-08compiler-clang.h: define __SANITIZE_*__ macros only when undefinedNathan Chancellor
Clang 22 recently added support for defining __SANITIZE__ macros similar to GCC [1], which causes warnings (or errors with CONFIG_WERROR=y or W=e) with the existing defines that the kernel creates to emulate this behavior with existing clang versions. In file included from <built-in>:3: In file included from include/linux/compiler_types.h:171: include/linux/compiler-clang.h:37:9: error: '__SANITIZE_THREAD__' macro redefined [-Werror,-Wmacro-redefined] 37 | #define __SANITIZE_THREAD__ | ^ <built-in>:352:9: note: previous definition is here 352 | #define __SANITIZE_THREAD__ 1 | ^ Refactor compiler-clang.h to only define the sanitizer macros when they are undefined and adjust the rest of the code to use these macros for checking if the sanitizers are enabled, clearing up the warnings and allowing the kernel to easily drop these defines when the minimum supported version of LLVM for building the kernel becomes 22.0.0 or newer. Link: https://lkml.kernel.org/r/20250902-clang-update-sanitize-defines-v1-1-cf3702ca3d92@kernel.org Link: https://github.com/llvm/llvm-project/commit/568c23bbd3303518c5056d7f03444dae4fdc8a9c [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Bill Wendling <morbo@google.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-08mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()Uladzislau Rezki (Sony)
kasan_populate_vmalloc() and its helpers ignore the caller's gfp_mask and always allocate memory using the hardcoded GFP_KERNEL flag. This makes them inconsistent with vmalloc(), which was recently extended to support GFP_NOFS and GFP_NOIO allocations. Page table allocations performed during shadow population also ignore the external gfp_mask. To preserve the intended semantics of GFP_NOFS and GFP_NOIO, wrap the apply_to_page_range() calls into the appropriate memalloc scope. xfs calls vmalloc with GFP_NOFS, so this bug could lead to deadlock. There was a report here https://lkml.kernel.org/r/686ea951.050a0220.385921.0016.GAE@google.com This patch: - Extends kasan_populate_vmalloc() and helpers to take gfp_mask; - Passes gfp_mask down to alloc_pages_bulk() and __get_free_page(); - Enforces GFP_NOFS/NOIO semantics with memalloc_*_save()/restore() around apply_to_page_range(); - Updates vmalloc.c and percpu allocator call sites accordingly. Link: https://lkml.kernel.org/r/20250831121058.92971-1-urezki@gmail.com Fixes: 451769ebb7e7 ("mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reported-by: syzbot+3470c9ffee63e4abafeb@syzkaller.appspotmail.com Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-08bitops: Add __attribute_const__ to generic ffs()-family implementationsKees Cook
While tracking down a problem where constant expressions used by BUILD_BUG_ON() suddenly stopped working[1], we found that an added static initializer was convincing the compiler that it couldn't track the state of the prior statically initialized value. Tracing this down found that ffs() was used in the initializer macro, but since it wasn't marked with __attribute__const__, the compiler had to assume the function might change variable states as a side-effect (which is not true for ffs(), which provides deterministic math results). Add missing __attribute_const__ annotations to generic implementations of ffs(), __ffs(), fls(), and __fls() functions. These are pure mathematical functions that always return the same result for the same input with no side effects, making them eligible for compiler optimization. Build tested with x86_64 defconfig using GCC 14.2.0, which should validate the implementations when used by ARM, ARM64, LoongArch, Microblaze, NIOS2, and SPARC32 architectures. Link: https://github.com/KSPP/linux/issues/364 [1] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250804164417.1612371-2-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-08xen: replace XENFEAT_auto_translated_physmap with xen_pv_domain()Juergen Gross
Instead of testing the XENFEAT_auto_translated_physmap feature, just use !xen_pv_domain() which is equivalent. This has the advantage that a kernel not built with CONFIG_XEN_PV will be smaller due to dead code elimination. Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250826145608.10352-3-jgross@suse.com>
2025-09-08xen: rework xen_pv_domain()Juergen Gross
Rework xen_pv_domain() to no longer use the xen_domain_type variable, but the artificial X86_FEATURE_XENPV cpu feature. On non-x86 architectures xen_pv_domain() can be defined as "0". This has the advantage that a kernel not built with CONFIG_XEN_PV will be smaller due to dead code elimination. Set the X86_FEATURE_XENPV feature very early, as xen_pv_domain() is used rather early, too. Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20250826145608.10352-2-jgross@suse.com>
2025-09-08Merge tag 'vfs-6.17-rc6.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "fuse: - Prevent opening of non-regular backing files. Fuse doesn't support non-regular files anyway. - Check whether copy_file_range() returns a larger size than requested. - Prevent overflow in copy_file_range() as fuse currently only supports 32-bit sized copies. - Cache the blocksize value if the server returned a new value as inode->i_blkbits isn't modified directly anymore. - Fix i_blkbits handling for iomap partial writes. By default i_blkbits is set to PAGE_SIZE which causes iomap to mark the whole folio as uptodate even on a partial write. But fuseblk filesystems support choosing a blocksize smaller than PAGE_SIZE risking data corruption. Simply enforce PAGE_SIZE as blocksize for fuseblk's internal inode for now. - Prevent out-of-bounds acces in fuse_dev_write() when the number of bytes to be retrieved is truncated to the fc->max_pages limit. virtiofs: - Fix page faults for DAX page addresses. Misc: - Tighten file handle decoding from userns. Check that the decoded dentry itself has a valid idmapping in the user namespace. - Fix mount-notify selftests. - Fix some indentation errors. - Add an FMODE_ flag to indicate IOCB_HAS_METADATA availability. This will be moved to an FOP_* flag with a bit more rework needed for that to happen not suitable for a fix. - Don't silently ignore metadata for sync read/write. - Don't pointlessly log warning when reading coredump sysctls" * tag 'vfs-6.17-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fuse: virtio_fs: fix page fault for DAX page address selftests/fs/mount-notify: Fix compilation failure. fhandle: use more consistent rules for decoding file handle from userns fuse: Block access to folio overlimit fuse: fix fuseblk i_blkbits for iomap partial writes fuse: reflect cached blocksize if blocksize was changed fuse: prevent overflow in copy_file_range return value fuse: check if copy_file_range() returns larger than requested size fuse: do not allow mapping a non-regular backing file coredump: don't pointlessly check and spew warnings fs: fix indentation style block: don't silently ignore metadata for sync read/write fs: add a FMODE_ flag to indicate IOCB_HAS_METADATA availability Please enter a commit message to explain why this merge is necessary, especially if it merges an updated upstream into a topic branch.
2025-09-07Merge tag 'timers-urgent-2025-09-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a severe slowdown regression in the timer vDSO code related to the while() loop in __iter_div_u64_rem(), when the AUX-clock is enabled" * tag 'timers-urgent-2025-09-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: vdso/vsyscall: Avoid slow division loop in auxiliary clock update
2025-09-06lib/crypto: curve25519: Consolidate into single moduleEric Biggers
Reorganize the Curve25519 library code: - Build a single libcurve25519 module, instead of up to three modules: libcurve25519, libcurve25519-generic, and an arch-specific module. - Move the arch-specific Curve25519 code from arch/$(SRCARCH)/crypto/ to lib/crypto/$(SRCARCH)/. Centralize the build rules into lib/crypto/Makefile and lib/crypto/Kconfig. - Include the arch-specific code directly in lib/crypto/curve25519.c via a header, rather than using a separate .c file. - Eliminate the entanglement with CRYPTO. CRYPTO_LIB_CURVE25519 no longer selects CRYPTO, and the arch-specific Curve25519 code no longer depends on CRYPTO. This brings Curve25519 in line with the latest conventions for lib/crypto/, used by other algorithms. The exception is that I kept the generic code in separate translation units for now. (Some of the function names collide between the x86 and generic Curve25519 code. And the Curve25519 functions are very long anyway, so inlining doesn't matter as much for Curve25519 as it does for some other algorithms.) Link: https://lore.kernel.org/r/20250906213523.84915-11-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06lib/crypto: curve25519: Move a couple functions out-of-lineEric Biggers
Move curve25519() and curve25519_generate_public() from curve25519.h to curve25519.c. There's no good reason for them to be inline. Link: https://lore.kernel.org/r/20250906213523.84915-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06lib/crypto: tests: Migrate Curve25519 self-test to KUnitEric Biggers
Move the Curve25519 test from an ad-hoc self-test to a KUnit test. Generally keep the same test logic for now, just translated to KUnit. There's one exception, which is that I dropped the incomplete test of curve25519_generic(). The approach I'm taking to cover the different implementations with the KUnit tests is to just rely on booting kernels in QEMU with different '-cpu' options, rather than try to make the tests (incompletely) test multiple implementations on one CPU. This way, both the test and the library API are simpler. This commit makes the file lib/crypto/curve25519.c no longer needed, as its only purpose was to call the self-test. However, keep it for now, since a later commit will add code to it again. Temporarily omit the default value of CRYPTO_SELFTESTS that the other lib/crypto/ KUnit tests have. It would cause a recursive kconfig dependency, since the Curve25519 code is still entangled with CRYPTO. A later commit will fix that. Link: https://lore.kernel.org/r/20250906213523.84915-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-05workqueue: replace use of system_wq with system_percpu_wqMarco Crivellari
Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. system_wq is a per-CPU worqueue, yet nothing in its name tells about that CPU affinity constraint, which is very often not required by users. Make it clear by adding a system_percpu_wq. queue_work() / queue_delayed_work() mod_delayed_work() will now use the new per-cpu wq: whether the user still stick on the old name a warn will be printed along a wq redirect to the new one. This patch add the new system_percpu_wq except for mm, fs and net subsystem, whom are handled in separated patches. The old wq will be kept for a few release cylces. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-09-05workqueue: replace use of system_unbound_wq with system_dfl_wqMarco Crivellari
Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Adding system_dfl_wq to encourage its use when unbound work should be used. queue_work() / queue_delayed_work() / mod_delayed_work() will now use the new unbound wq: whether the user still use the old wq a warn will be printed along with a wq redirect to the new one. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-09-05Merge tag 'pcmcia-6.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull PCMCIA fixes and cleanups from Dominik Brodowski: "A number of minor PCMCIA bugfixes and cleanups, including the removal of unused code paths" [ Dominik suggested this might be 6.18 material, but having looked through this, it looks appropriate early: minor trivial fixes and then one slightly bigger patch that removes dead code - Linus ] * tag 'pcmcia-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: pcmcia: Add error handling for add_interval() in do_validate_mem() pcmcia: cs: Remove unused pcmcia_get_socket_by_nr pcmcia: omap: Add missing check for platform_get_resource pcmcia: Use str_off_on() and str_yes_no() helpers pcmcia: remove PCCARD_IODYN pcmcia: ds: Emphasize "really" epizeuxis pcmcia: Fix a NULL pointer dereference in __iodyn_find_io_region() pcmcia: omap_cf: Mark driver struct with __refdata to prevent section mismatch
2025-09-05filelock: add FL_RECLAIM to show_fl_flags() macroJeff Layton
Show the FL_RECLAIM flag symbolically in tracepoints. Fixes: bb0a55bb7148 ("nfs: don't allow reexport reclaims") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/20250903-filelock-v1-1-f2926902962d@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-04More minor SDCA bug fixesMark Brown
Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>: Just some minor SDCA bug fixes and some minor structure reordering to improve the padding.
2025-09-04cgroup: Remove unused cgroup_subsys::post_attachChuyi Zhou
cgroup_subsys::post_attach callback was introduced in commit 5cf1cacb49ae ("cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback") and only cpuset would use this callback to wait for the mm migration to complete at the end of __cgroup_procs_write(). Since the previous patch defer the flush operation until returning to userspace, no one use this callback now. Remove this callback from cgroup_subsys. Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-09-04Merge tag 'net-6.17-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wireless and Bluetooth. We're reverting the removal of a Sundance driver, a user has appeared. This makes the PR rather large in terms of LoC. There's a conspicuous absence of real, user-reported 6.17 issues. Slightly worried that the summer distracted people from testing. Previous releases - regressions: - ax25: properly unshare skbs in ax25_kiss_rcv() Previous releases - always broken: - phylink: disable autoneg for interfaces that have no inband, fix regression on pcs-lynx (NXP LS1088) - vxlan: fix null-deref when using nexthop objects - batman-adv: fix OOB read/write in network-coding decode - icmp: icmp_ndo_send: fix reversing address translation for replies - tcp: fix socket ref leak in TCP-AO failure handling for IPv6 - mctp: - mctp_fraq_queue should take ownership of passed skb - usb: initialise mac header in RX path, avoid WARN - wifi: mac80211: do not permit 40 MHz EHT operation on 5/6 GHz, respect device limitations - wifi: wilc1000: avoid buffer overflow in WID string configuration - wifi: mt76: - fix regressions from mt7996 MLO support rework - fix offchannel handling issues on mt7996 - fix multiple wcid linked list corruption issues - mt7921: don't disconnect when AP requests switch to a channel which requires radar detection - mt7925u: use connac3 tx aggr check in tx complete - wifi: intel: - improve validation of ACPI DSM data - cfg: restore some 1000 series configs - wifi: ath: - ath11k: a fix for GTK rekeying - ath12k: a missed WiFi7 capability (multi-link EMLSR) - eth: intel: - ice: fix races in "low latency" firmware interface for Tx timestamps - idpf: set mac type when adding and removing MAC filters - i40e: remove racy read access to some debugfs files Misc: - Revert "eth: remove the DLink/Sundance (ST201) driver" - netfilter: conntrack: helper: Replace -EEXIST by -EBUSY, avoid confusing modprobe" * tag 'net-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) phy: mscc: Stop taking ts_lock for tx_queue and use its own lock selftest: net: Fix weird setsockopt() in bind_bhash.c. MAINTAINERS: add Sabrina to TLS maintainers gve: update MAINTAINERS ppp: fix memory leak in pad_compress_skb net: xilinx: axienet: Add error handling for RX metadata pointer retrieval net: atm: fix memory leak in atm_register_sysfs when device_register fail netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX selftests: netfilter: fix udpclash tool hang ax25: properly unshare skbs in ax25_kiss_rcv() mctp: return -ENOPROTOOPT for unknown getsockopt options net/smc: Remove validation of reserved bits in CLC Decline message ipv4: Fix NULL vs error pointer check in inet_blackhole_dev_init() net: thunder_bgx: decrement cleanup index before use net: thunder_bgx: add a missing of_node_put net: phylink: move PHY interrupt request to non-fail path net: lockless sock_i_ino() tools: ynl-gen: fix nested array counting wifi: wilc1000: avoid buffer overflow in WID string configuration wifi: cfg80211: sme: cap SSID length in __cfg80211_connect_result() ...
2025-09-04ASoC: SDCA: Reorder members of hide struct to remove holesCharles Keepax
Remove some padding holes in the sdca_entity_hide struct by reordering the members. Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Message-ID: <20250820163717.1095846-4-ckeepax@opensource.cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2025-09-04netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIXPhil Sutter
This new attribute is supposed to be used instead of NFTA_DEVICE_NAME for simple wildcard interface specs. It holds a NUL-terminated string representing an interface name prefix to match on. While kernel code to distinguish full names from prefixes in NFTA_DEVICE_NAME is simpler than this solution, reusing the existing attribute with different semantics leads to confusion between different versions of kernel and user space though: * With old kernels, wildcards submitted by user space are accepted yet silently treated as regular names. * With old user space, wildcards submitted by kernel may cause crashes since libnftnl expects NUL-termination when there is none. Using a distinct attribute type sanitizes these situations as the receiving part detects and rejects the unexpected attribute nested in *_HOOK_DEVS attributes. Fixes: 6d07a289504a ("netfilter: nf_tables: Support wildcard netdev hook specs") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-03binfmt_elf: preserve original ELF e_flags for core dumpsSvetlana Parfenova
Some architectures, such as RISC-V, use the ELF e_flags field to encode ABI-specific information (e.g., ISA extensions, fpu support). Debuggers like GDB rely on these flags in core dumps to correctly interpret optional register sets. If the flags are missing or incorrect, GDB may warn and ignore valid data, for example: warning: Unexpected size of section '.reg2/213' in core file. This can prevent access to fpu or other architecture-specific registers even when they were dumped. Save the e_flags field during ELF binary loading (in load_elf_binary()) into the mm_struct, and later retrieve it during core dump generation (in fill_note_info()). Kconfig option CONFIG_ARCH_HAS_ELF_CORE_EFLAGS is introduced for architectures that require this behaviour. Signed-off-by: Svetlana Parfenova <svetlana.parfenova@syntacore.com> Link: https://lore.kernel.org/r/20250901135350.619485-1-svetlana.parfenova@syntacore.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-03net: lockless sock_i_ino()Eric Dumazet
Followup of commit c51da3f7a161 ("net: remove sock_i_uid()") A recent syzbot report was the trigger for this change. Over the years, we had many problems caused by the read_lock[_bh](&sk->sk_callback_lock) in sock_i_uid(). We could fix smc_diag_dump_proto() or make a more radical move: Instead of waiting for new syzbot reports, cache the socket inode number in sk->sk_ino, so that we no longer need to acquire sk->sk_callback_lock in sock_i_ino(). This makes socket dumps faster (one less cache line miss, and two atomic ops avoided). Prior art: commit 25a9c8a4431c ("netlink: Add __sock_i_ino() for __netlink_diag_dump().") commit 4f9bf2a2f5aa ("tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.") commit efc3dbc37412 ("rds: Make rds_sock_lock BH rather than IRQ safe.") Fixes: d2d6422f8bd1 ("x86: Allow to enable PREEMPT_RT.") Reported-by: syzbot+50603c05bbdf4dfdaffa@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68b73804.050a0220.3db4df.01d8.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250902183603.740428-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-03vdso/vsyscall: Avoid slow division loop in auxiliary clock updateThomas Weißschuh
The call to __iter_div_u64_rem() in vdso_time_update_aux() is a wrapper around subtraction. It cannot be used to divide large numbers, as that introduces long, computationally expensive delays. A regular u64 division is also not possible in the timekeeper update path as it can be too slow. Instead of splitting the ktime_t offset into into second and subsecond components during the timekeeper update fast-path, do it together with the adjustment of tk->offs_aux in the slow-path. Equivalent to the handling of offs_boot and monotonic_to_boot. Reuse the storage of monotonic_to_boot for the new field, as it is not used by auxiliary timekeepers. Fixes: 380b84e168e5 ("vdso/vsyscall: Update auxiliary clock data in the datapage") Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250825-vdso-auxclock-division-v1-1-a1d32a16a313@linutronix.de Closes: https://lore.kernel.org/lkml/aKwsNNWsHJg8IKzj@localhost/
2025-09-03sched/fair: Add related data structure for task based throttleValentin Schneider
Add related data structures for this new throttle functionality. Tesed-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Aaron Lu <ziqianlu@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Tested-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Matteo Martelli <matteo.martelli@codethink.co.uk> Link: https://lore.kernel.org/r/20250829081120.806-2-ziqianlu@bytedance.com
2025-09-03sched: Move STDL_INIT() functions out-of-linePeter Zijlstra
Since all these functions are address-taken in SDTL_INIT() and called indirectly, it doesn't really make sense for them to be inline. Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-09-03sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()Peter Zijlstra
Leon [1] and Vinicius [2] noted a topology_span_sane() warning during their testing starting from v6.16-rc1. Debug that followed pointed to the tl->mask() for the NODE domain being incorrectly resolved to that of the highest NUMA domain. tl->mask() for NODE is set to the sd_numa_mask() which depends on the global "sched_domains_curr_level" hack. "sched_domains_curr_level" is set to the "tl->numa_level" during tl traversal in build_sched_domains() calling sd_init() but was not reset before topology_span_sane(). Since "tl->numa_level" still reflected the old value from build_sched_domains(), topology_span_sane() for the NODE domain trips when the span of the last NUMA domain overlaps. Instead of replicating the "sched_domains_curr_level" hack, get rid of it entirely and instead, pass the entire "sched_domain_topology_level" object to tl->cpumask() function to prevent such mishap in the future. sd_numa_mask() now directly references "tl->numa_level" instead of relying on the global "sched_domains_curr_level" hack to index into sched_domains_numa_masks[]. The original warning was reproducible on the following NUMA topology reported by Leon: $ sudo numactl -H available: 5 nodes (0-4) node 0 cpus: 0 1 node 0 size: 2927 MB node 0 free: 1603 MB node 1 cpus: 2 3 node 1 size: 3023 MB node 1 free: 3008 MB node 2 cpus: 4 5 node 2 size: 3023 MB node 2 free: 3007 MB node 3 cpus: 6 7 node 3 size: 3023 MB node 3 free: 3002 MB node 4 cpus: 8 9 node 4 size: 3022 MB node 4 free: 2718 MB node distances: node 0 1 2 3 4 0: 10 39 38 37 36 1: 39 10 38 37 36 2: 38 38 10 37 36 3: 37 37 37 10 36 4: 36 36 36 36 10 The above topology can be mimicked using the following QEMU cmd that was used to reproduce the warning and test the fix: sudo qemu-system-x86_64 -enable-kvm -cpu host \ -m 20G -smp cpus=10,sockets=10 -machine q35 \ -object memory-backend-ram,size=4G,id=m0 \ -object memory-backend-ram,size=4G,id=m1 \ -object memory-backend-ram,size=4G,id=m2 \ -object memory-backend-ram,size=4G,id=m3 \ -object memory-backend-ram,size=4G,id=m4 \ -numa node,cpus=0-1,memdev=m0,nodeid=0 \ -numa node,cpus=2-3,memdev=m1,nodeid=1 \ -numa node,cpus=4-5,memdev=m2,nodeid=2 \ -numa node,cpus=6-7,memdev=m3,nodeid=3 \ -numa node,cpus=8-9,memdev=m4,nodeid=4 \ -numa dist,src=0,dst=1,val=39 \ -numa dist,src=0,dst=2,val=38 \ -numa dist,src=0,dst=3,val=37 \ -numa dist,src=0,dst=4,val=36 \ -numa dist,src=1,dst=0,val=39 \ -numa dist,src=1,dst=2,val=38 \ -numa dist,src=1,dst=3,val=37 \ -numa dist,src=1,dst=4,val=36 \ -numa dist,src=2,dst=0,val=38 \ -numa dist,src=2,dst=1,val=38 \ -numa dist,src=2,dst=3,val=37 \ -numa dist,src=2,dst=4,val=36 \ -numa dist,src=3,dst=0,val=37 \ -numa dist,src=3,dst=1,val=37 \ -numa dist,src=3,dst=2,val=37 \ -numa dist,src=3,dst=4,val=36 \ -numa dist,src=4,dst=0,val=36 \ -numa dist,src=4,dst=1,val=36 \ -numa dist,src=4,dst=2,val=36 \ -numa dist,src=4,dst=3,val=36 \ ... [ prateek: Moved common functions to include/linux/sched/topology.h, reuse the common bits for s390 and ppc, commit message ] Closes: https://lore.kernel.org/lkml/20250610110701.GA256154@unreal/ [1] Fixes: ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap") # ce29a7da84cd, f55dac1dafb3 Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Tested-by: Valentin Schneider <vschneid@redhat.com> # x86 Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> # powerpc Link: https://lore.kernel.org/lkml/a3de98387abad28592e6ab591f3ff6107fe01dc1.1755893468.git.tim.c.chen@linux.intel.com/ [2]
2025-09-02net: phy: add phy_interface_weight()Russell King (Oracle)
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uslwn-00000001SOx-0a7H@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02Merge tag 'mm-hotfixes-stable-2025-09-01-17-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. 13 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 11 of these fixes are for MM. This includes a three-patch series from Harry Yoo which fixes an intermittent boot failure which can occur on x86 systems. And a two-patch series from Alexander Gordeev which fixes a KASAN crash on S390 systems" * tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: fix possible deadlock in kmemleak x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() mm: introduce and use {pgd,p4d}_populate_kernel() mm: move page table sync declarations to linux/pgtable.h proc: fix missing pde_set_flags() for net proc files mm: fix accounting of memmap pages mm/damon/core: prevent unnecessary overflow in damos_set_effective_quota() kexec: add KEXEC_FILE_NO_CMA as a legal flag kasan: fix GCC mem-intrinsic prefix with sw tags mm/kasan: avoid lazy MMU mode hazards mm/kasan: fix vmalloc shadow memory (de-)population races kunit: kasan_test: disable fortify string checker on kasan_strings() test selftests/mm: fix FORCE_READ to read input value correctly mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE ocfs2: prevent release journal inode after journal shutdown rust: mm: mark VmaNew as transparent of_numa: fix uninitialized memory nodes causing kernel panic
2025-09-02Minor bug fixes for some older Wolfson devicesMark Brown
Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>: Minor bug fixes for a couple of older devices reported by some users. Mostly this centers around the automatic PLL configuration getting the wrong values due to rounding.
2025-09-02ASoC: SDCA: Add quirk for incorrect function types for 3 systemsMaciej Strozek
Certain systems have CS42L43 DisCo that claims to conform to version 0.6.28 but uses the function types from the 1.0 spec. Add a quirk as a workaround. Closes: https://github.com/thesofproject/linux/issues/5515 Cc: stable@vger.kernel.org Signed-off-by: Maciej Strozek <mstrozek@opensource.cirrus.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev> Link: https://patch.msgid.link/20250901151518.3197941-1-mstrozek@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-09-02pidns: move is-ancestor logic to helperAleksa Sarai
This check will be needed in later patches, and there's no point open-coding it each time. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/20250805-procfs-pidns-api-v4-1-705f984940e7@cyphar.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01copy_process: pass clone_flags as u64 across calltreeSimon Schuster
With the introduction of clone3 in commit 7f192e3cd316 ("fork: add clone3") the effective bit width of clone_flags on all architectures was increased from 32-bit to 64-bit, with a new type of u64 for the flags. However, for most consumers of clone_flags the interface was not changed from the previous type of unsigned long. While this works fine as long as none of the new 64-bit flag bits (CLONE_CLEAR_SIGHAND and CLONE_INTO_CGROUP) are evaluated, this is still undesirable in terms of the principle of least surprise. Thus, this commit fixes all relevant interfaces of callees to sys_clone3/copy_process (excluding the architecture-specific copy_thread) to consistently pass clone_flags as u64, so that no truncation to 32-bit integers occurs on 32-bit architectures. Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com> Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-2-53fcf5577d57@siemens-energy.com Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01fs: remove vfs_ioctl exportGreg Kroah-Hartman
vfs_ioctl() is no longer called by anything outside of fs/ioctl.c, so remove the global symbol and export as it is not needed. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/2025083038-carving-amuck-a4ae@gregkh Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01Merge tag 'fuse-fixes-6.17-rc5' of ↵Christian Brauner
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into vfs.fixes fuse fixes for 6.17-rc5 * tag 'fuse-fixes-6.17-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (6 commits) fuse: Block access to folio overlimit fuse: fix fuseblk i_blkbits for iomap partial writes fuse: reflect cached blocksize if blocksize was changed fuse: prevent overflow in copy_file_range return value fuse: check if copy_file_range() returns larger than requested size fuse: do not allow mapping a non-regular backing file Link: https://lore.kernel.org/CAJfpeguEVMMyw_zCb+hbOuSxdE2Z3Raw=SJsq=Y56Ae6dn2W3g@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-01fs: add an icount_read helperJosef Bacik
Instead of doing direct access to ->i_count, add a helper to handle this. This will make it easier to convert i_count to a refcount later. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/9bc62a84c6b9d6337781203f60837bd98fbc4a96.1756222464.git.josef@toxicpanda.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-30audit: add record for multiple object contextsCasey Schaufler
Create a new audit record AUDIT_MAC_OBJ_CONTEXTS. An example of the MAC_OBJ_CONTEXTS record is: type=MAC_OBJ_CONTEXTS msg=audit(1601152467.009:1050): obj_selinux=unconfined_u:object_r:user_home_t:s0 When an audit event includes a AUDIT_MAC_OBJ_CONTEXTS record the "obj=" field in other records in the event will be "obj=?". An AUDIT_MAC_OBJ_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on an object security context. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj tweak, audit example readability indents] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-30audit: add record for multiple task security contextsCasey Schaufler
Replace the single skb pointer in an audit_buffer with a list of skb pointers. Add the audit_stamp information to the audit_buffer as there's no guarantee that there will be an audit_context containing the stamp associated with the event. At audit_log_end() time create auxiliary records as have been added to the list. Functions are created to manage the skb list in the audit_buffer. Create a new audit record AUDIT_MAC_TASK_CONTEXTS. An example of the MAC_TASK_CONTEXTS record is: type=MAC_TASK_CONTEXTS msg=audit(1600880931.832:113) subj_apparmor=unconfined subj_smack=_ When an audit event includes a AUDIT_MAC_TASK_CONTEXTS record the "subj=" field in other records in the event will be "subj=?". An AUDIT_MAC_TASK_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on a subject security context. Refactor audit_log_task_context(), creating a new audit_log_subj_ctx(). This is used in netlabel auditing to provide multiple subject security contexts as necessary. Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj tweak, audit example readability indents] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-30lsm: security_lsmblob_to_secctx module selectionCasey Schaufler
Add a parameter lsmid to security_lsmblob_to_secctx() to identify which of the security modules that may be active should provide the security context. If the value of lsmid is LSM_ID_UNDEF the first LSM providing a hook is used. security_secid_to_secctx() is unchanged, and will always report the first LSM providing a hook. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-29lib/crypto: blake2s: Consolidate into single C translation unitEric Biggers
As was done with the other algorithms, reorganize the BLAKE2s code so that the generic implementation and the arch-specific "glue" code is consolidated into a single translation unit, so that the compiler will inline the functions and automatically decide whether to include the generic code in the resulting binary or not. Similarly, also consolidate the build rules into lib/crypto/{Makefile,Kconfig}. This removes the last uses of lib/crypto/{arm,x86}/{Makefile,Kconfig}, so remove those too. Don't keep the !KMSAN dependency. It was needed only for other algorithms such as ChaCha that initialize memory from assembly code. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: blake2s: Remove obsolete self-testEric Biggers
Remove the original BLAKE2s self-test, since it will be superseded by blake2s_kunit. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: chacha: Consolidate into single moduleEric Biggers
Consolidate the ChaCha code into a single module (excluding chacha-block-generic.c which remains always built-in for random.c), similar to various other algorithms: - Each arch now provides a header file lib/crypto/$(SRCARCH)/chacha.h, replacing lib/crypto/$(SRCARCH)/chacha*.c. The header defines chacha_crypt_arch() and hchacha_block_arch(). It is included by lib/crypto/chacha.c, and thus the code gets built into the single libchacha module, with improved inlining in some cases. - Whether arch-optimized ChaCha is buildable is now controlled centrally by lib/crypto/Kconfig instead of by lib/crypto/$(SRCARCH)/Kconfig. The conditions for enabling it remain the same as before, and it remains enabled by default. - Any additional arch-specific translation units for the optimized ChaCha code, such as assembly files, are now compiled by lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile. This removes the last use for the Makefile and Kconfig files in the arm64, mips, powerpc, riscv, and s390 subdirectories of lib/crypto/. So also remove those files and the references to them. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: chacha: Remove unused function chacha_is_arch_optimized()Eric Biggers
chacha_is_arch_optimized() is no longer used, so remove it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>