summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2025-08-24Merge tag 'perf_urgent_for_v6.17_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Fix a case where the events throttling logic operates on inactive events * tag 'perf_urgent_for_v6.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Avoid undefined behavior from stopping/starting inactive events
2025-08-24Merge tag 'modules-6.17-rc3.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull modules fix from Daniel Gomez: "This includes a fix part of the KSPP (Kernel Self Protection Project) to replace the deprecated and unsafe strcpy() calls in the kernel parameter string handler and sysfs parameters for built-in modules. Single commit, no functional changes" * tag 'modules-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: params: Replace deprecated strcpy() with strscpy() and memcpy()
2025-08-23Merge tag 'trace-v6.17-rc2-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix rtla and latency tooling pkg-config errors If libtraceevent and libtracefs is installed, but their corresponding '.pc' files are not installed, it reports that the libraries are missing and confuses the developer. Instead, report that the pkg-config files are missing and should be installed. - Fix overflow bug of the parser in trace_get_user() trace_get_user() uses the parsing functions to parse the user space strings. If the parser fails due to incorrect processing, it doesn't terminate the buffer with a nul byte. Add a "failed" flag to the parser that gets set when parsing fails and is used to know if the buffer is fine to use or not. - Remove a semicolon that was at an end of a comment line - Fix register_ftrace_graph() to unregister the pm notifier on error The register_ftrace_graph() registers a pm notifier but there's an error path that can exit the function without unregistering it. Since the function returns an error, it will never be unregistered. - Allocate and copy ftrace hash for reader of ftrace filter files When the set_ftrace_filter or set_ftrace_notrace files are open for read, an iterator is created and sets its hash pointer to the associated hash that represents filtering or notrace filtering to it. The issue is that the hash it points to can change while the iteration is happening. All the locking used to access the tracer's hashes are released which means those hashes can change or even be freed. Using the hash pointed to by the iterator can cause UAF bugs or similar. Have the read of these files allocate and copy the corresponding hashes and use that as that will keep them the same while the iterator is open. This also simplifies the code as opening it for write already does an allocate and copy, and now that the read is doing the same, there's no need to check which way it was opened on the release of the file, and the iterator hash can always be freed. - Fix function graph to copy args into temp storage The output of the function graph tracer shows both the entry and the exit of a function. When the exit is right after the entry, it combines the two events into one with the output of "function();", instead of showing: function() { } In order to do this, the iterator descriptor that reads the events includes storage that saves the entry event while it peaks at the next event in the ring buffer. The peek can free the entry event so the iterator must store the information to use it after the peek. With the addition of function graph tracer recording the args, where the args are a dynamic array in the entry event, the temp storage does not save them. This causes the args to be corrupted or even cause a read of unsafe memory. Add space to save the args in the temp storage of the iterator. - Fix race between ftrace_dump and reading trace_pipe ftrace_dump() is used when a crash occurs where the ftrace buffer will be printed to the console. But it can also be triggered by sysrq-z. If a sysrq-z is triggered while a task is reading trace_pipe it can cause a race in the ftrace_dump() where it checks if the buffer has content, then it checks if the next event is available, and then prints the output (regardless if the next event was available or not). Reading trace_pipe at the same time can cause it to not be available, and this triggers a WARN_ON in the print. Move the printing into the check if the next event exists or not * tag 'trace-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Also allocate and copy hash for reading of filter files ftrace: Fix potential warning in trace_printk_seq during ftrace_dump fgraph: Copy args in intermediate storage with entry trace/fgraph: Fix the warning caused by missing unregister notifier ring-buffer: Remove redundant semicolons tracing: Limit access to parser->buffer when trace_get_user failed rtla: Check pkg-config install tools/latency-collector: Check pkg-config install
2025-08-22ftrace: Also allocate and copy hash for reading of filter filesSteven Rostedt
Currently the reader of set_ftrace_filter and set_ftrace_notrace just adds the pointer to the global tracer hash to its iterator. Unlike the writer that allocates a copy of the hash, the reader keeps the pointer to the filter hashes. This is problematic because this pointer is static across function calls that release the locks that can update the global tracer hashes. This can cause UAF and similar bugs. Allocate and copy the hash for reading the filter files like it is done for the writers. This not only fixes UAF bugs, but also makes the code a bit simpler as it doesn't have to differentiate when to free the iterator's hash between writers and readers. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250822183606.12962cc3@batman.local.home Fixes: c20489dad156 ("ftrace: Assign iter->hash to filter or notrace hashes on seq read") Closes: https://lore.kernel.org/all/20250813023044.2121943-1-wutengda@huaweicloud.com/ Closes: https://lore.kernel.org/all/20250822192437.GA458494@ax162/ Reported-by: Tengda Wu <wutengda@huaweicloud.com> Tested-by: Tengda Wu <wutengda@huaweicloud.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-22ftrace: Fix potential warning in trace_printk_seq during ftrace_dumpTengda Wu
When calling ftrace_dump_one() concurrently with reading trace_pipe, a WARN_ON_ONCE() in trace_printk_seq() can be triggered due to a race condition. The issue occurs because: CPU0 (ftrace_dump) CPU1 (reader) echo z > /proc/sysrq-trigger !trace_empty(&iter) trace_iterator_reset(&iter) <- len = size = 0 cat /sys/kernel/tracing/trace_pipe trace_find_next_entry_inc(&iter) __find_next_entry ring_buffer_empty_cpu <- all empty return NULL trace_printk_seq(&iter.seq) WARN_ON_ONCE(s->seq.len >= s->seq.size) In the context between trace_empty() and trace_find_next_entry_inc() during ftrace_dump, the ring buffer data was consumed by other readers. This caused trace_find_next_entry_inc to return NULL, failing to populate `iter.seq`. At this point, due to the prior trace_iterator_reset, both `iter.seq.len` and `iter.seq.size` were set to 0. Since they are equal, the WARN_ON_ONCE condition is triggered. Move the trace_printk_seq() into the if block that checks to make sure the return value of trace_find_next_entry_inc() is non-NULL in ftrace_dump_one(), ensuring the 'iter.seq' is properly populated before subsequent operations. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ingo Molnar <mingo@elte.hu> Link: https://lore.kernel.org/20250822033343.3000289-1-wutengda@huaweicloud.com Fixes: d769041f8653 ("ring_buffer: implement new locking") Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-22fgraph: Copy args in intermediate storage with entrySteven Rostedt
The output of the function graph tracer has two ways to display its entries. One way for leaf functions with no events recorded within them, and the other is for functions with events recorded inside it. As function graph has an entry and exit event, to simplify the output of leaf functions it combines the two, where as non leaf functions are separate: 2) | invoke_rcu_core() { 2) | raise_softirq() { 2) 0.391 us | __raise_softirq_irqoff(); 2) 1.191 us | } 2) 2.086 us | } The __raise_softirq_irqoff() function above is really two events that were merged into one. Otherwise it would have looked like: 2) | invoke_rcu_core() { 2) | raise_softirq() { 2) | __raise_softirq_irqoff() { 2) 0.391 us | } 2) 1.191 us | } 2) 2.086 us | } In order to do this merge, the reading of the trace output file needs to look at the next event before printing. But since the pointer to the event is on the ring buffer, it needs to save the entry event before it looks at the next event as the next event goes out of focus as soon as a new event is read from the ring buffer. After it reads the next event, it will print the entry event with either the '{' (non leaf) or ';' and timestamps (leaf). The iterator used to read the trace file has storage for this event. The problem happens when the function graph tracer has arguments attached to the entry event as the entry now has a variable length "args" field. This field only gets set when funcargs option is used. But the args are not recorded in this temp data and garbage could be printed. The entry field is copied via: data->ent = *curr; Where "curr" is the entry field. But this method only saves the non variable length fields from the structure. Add a helper structure to the iterator data that adds the max args size to the data storage in the iterator. Then simply copy the entire entry into this storage (with size protection). Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/20250820195522.51d4a268@gandalf.local.home Reported-by: Sasha Levin <sashal@kernel.org> Tested-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/all/aJaxRVKverIjF4a6@lappy/ Fixes: ff5c9c576e75 ("ftrace: Add support for function argument to graph tracer") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-22Merge tag 'mm-hotfixes-stable-2025-08-21-18-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "20 hotfixes. 10 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 17 of these fixes are for MM. As usual, singletons all over the place, apart from a three-patch series of KHO followup work from Pasha which is actually also a bunch of singletons" * tag 'mm-hotfixes-stable-2025-08-21-18-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/mremap: fix WARN with uffd that has remap events disabled mm/damon/sysfs-schemes: put damos dests dir after removing its files mm/migrate: fix NULL movable_ops if CONFIG_ZSMALLOC=m mm/damon/core: fix damos_commit_filter not changing allow mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn MAINTAINERS: mark MGLRU as maintained mm: rust: add page.rs to MEMORY MANAGEMENT - RUST iov_iter: iterate_folioq: fix handling of offset >= folio size selftests/damon: fix selftests by installing drgn related script .mailmap: add entry for Easwar Hariharan selftests/mm: add test for invalid multi VMA operations mm/mremap: catch invalid multi VMA moves earlier mm/mremap: allow multi-VMA move when filesystem uses thp_get_unmapped_area mm/damon/core: fix commit_ops_filters by using correct nth function tools/testing: add linux/args.h header and fix radix, VMA tests mm/debug_vm_pgtable: clear page table entries at destroy_args() squashfs: fix memory leak in squashfs_fill_super kho: warn if KHO is disabled due to an error kho: mm: don't allow deferred struct page with KHO kho: init new_physxa->phys_bits to fix lockdep
2025-08-21Merge tag 'cgroup-for-6.17-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix NULL de-ref in css_rstat_exit() which could happen after allocation failure - Fix a cpuset partition handling bug and a couple other misc issues - Doc spelling fix * tag 'cgroup-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: docs: cgroup: fixed spelling mistakes in documentation cgroup: avoid null de-ref in css_rstat_exit() cgroup/cpuset: Remove the unnecessary css_get/put() in cpuset_partition_write() cgroup/cpuset: Fix a partition error with CPU hotplug cgroup/cpuset: Use static_branch_enable_cpuslocked() on cpusets_insane_config_key
2025-08-21Merge tag 'sched_ext-for-6.17-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix a subtle bug during SCX enabling where a dead task skips init but doesn't skip sched class switch leading to invalid task state transition warning - Cosmetic fix in selftests * tag 'sched_ext-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: selftests/sched_ext: Remove duplicate sched.h header sched/ext: Fix invalid task state transitions on class switch
2025-08-20Merge tag 'probes-fixes-v6.17-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: "Sanitize wildcard for fprobe event name Fprobe event accepts wildcards for the target functions, but unless the user specifies its event name, it makes an event with the wildcards. Replace the wildcard '*' with the underscore '_'" * tag 'probes-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: fprobe-event: Sanitize wildcard for fprobe event name
2025-08-20tracing: fprobe-event: Sanitize wildcard for fprobe event nameMasami Hiramatsu (Google)
Fprobe event accepts wildcards for the target functions, but unless user specifies its event name, it makes an event with the wildcards. /sys/kernel/tracing # echo 'f mutex*' >> dynamic_events /sys/kernel/tracing # cat dynamic_events f:fprobes/mutex*__entry mutex* /sys/kernel/tracing # ls events/fprobes/ enable filter mutex*__entry To fix this, replace the wildcard ('*') with an underscore. Link: https://lore.kernel.org/all/175535345114.282990.12294108192847938710.stgit@devnote2/ Fixes: 334e5519c375 ("tracing/probes: Add fprobe events for tracing function entry and exit.") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org
2025-08-20trace/fgraph: Fix the warning caused by missing unregister notifierYe Weihua
This warning was triggered during testing on v6.16: notifier callback ftrace_suspend_notifier_call already registered WARNING: CPU: 2 PID: 86 at kernel/notifier.c:23 notifier_chain_register+0x44/0xb0 ... Call Trace: <TASK> blocking_notifier_chain_register+0x34/0x60 register_ftrace_graph+0x330/0x410 ftrace_profile_write+0x1e9/0x340 vfs_write+0xf8/0x420 ? filp_flush+0x8a/0xa0 ? filp_close+0x1f/0x30 ? do_dup2+0xaf/0x160 ksys_write+0x65/0xe0 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x77/0x7f When writing to the function_profile_enabled interface, the notifier was not unregistered after start_graph_tracing failed, causing a warning the next time function_profile_enabled was written. Fixed by adding unregister_pm_notifier in the exception path. Link: https://lore.kernel.org/20250818073332.3890629-1-yeweihua4@huawei.com Fixes: 4a2b8dda3f870 ("tracing/function-graph-tracer: fix a regression while suspend to disk") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ye Weihua <yeweihua4@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-20ring-buffer: Remove redundant semicolonsLiao Yuanhong
Remove unnecessary semicolons. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250813095114.559530-1-liaoyuanhong@vivo.com Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-20tracing: Limit access to parser->buffer when trace_get_user failedPu Lehui
When the length of the string written to set_ftrace_filter exceeds FTRACE_BUFF_MAX, the following KASAN alarm will be triggered: BUG: KASAN: slab-out-of-bounds in strsep+0x18c/0x1b0 Read of size 1 at addr ffff0000d00bd5ba by task ash/165 CPU: 1 UID: 0 PID: 165 Comm: ash Not tainted 6.16.0-g6bcdbd62bd56-dirty Hardware name: linux,dummy-virt (DT) Call trace: show_stack+0x34/0x50 (C) dump_stack_lvl+0xa0/0x158 print_address_description.constprop.0+0x88/0x398 print_report+0xb0/0x280 kasan_report+0xa4/0xf0 __asan_report_load1_noabort+0x20/0x30 strsep+0x18c/0x1b0 ftrace_process_regex.isra.0+0x100/0x2d8 ftrace_regex_release+0x484/0x618 __fput+0x364/0xa58 ____fput+0x28/0x40 task_work_run+0x154/0x278 do_notify_resume+0x1f0/0x220 el0_svc+0xec/0xf0 el0t_64_sync_handler+0xa0/0xe8 el0t_64_sync+0x1ac/0x1b0 The reason is that trace_get_user will fail when processing a string longer than FTRACE_BUFF_MAX, but not set the end of parser->buffer to 0. Then an OOB access will be triggered in ftrace_regex_release-> ftrace_process_regex->strsep->strpbrk. We can solve this problem by limiting access to parser->buffer when trace_get_user failed. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250813040232.1344527-1-pulehui@huaweicloud.com Fixes: 8c9af478c06b ("ftrace: Handle commands when closing set_ftrace_filter file") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-19kho: warn if KHO is disabled due to an errorPasha Tatashin
During boot scratch area is allocated based on command line parameters or auto calculated. However, scratch area may fail to allocate, and in that case KHO is disabled. Currently, no warning is printed that KHO is disabled, which makes it confusing for the end user to figure out why KHO is not available. Add the missing warning message. Link: https://lkml.kernel.org/r/20250808201804.772010-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19kho: mm: don't allow deferred struct page with KHOPasha Tatashin
KHO uses struct pages for the preserved memory early in boot, however, with deferred struct page initialization, only a small portion of memory has properly initialized struct pages. This problem was detected where vmemmap is poisoned, and illegal flag combinations are detected. Don't allow them to be enabled together, and later we will have to teach KHO to work properly with deferred struct page init kernel feature. Link: https://lkml.kernel.org/r/20250808201804.772010-3-pasha.tatashin@soleen.com Fixes: 4e1d010e3bda ("kexec: add config option for KHO") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19kho: init new_physxa->phys_bits to fix lockdepPasha Tatashin
Patch series "Several KHO Hotfixes". Three unrelated fixes for Kexec Handover. This patch (of 3): Lockdep shows the following warning: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. [<ffffffff810133a6>] dump_stack_lvl+0x66/0xa0 [<ffffffff8136012c>] assign_lock_key+0x10c/0x120 [<ffffffff81358bb4>] register_lock_class+0xf4/0x2f0 [<ffffffff813597ff>] __lock_acquire+0x7f/0x2c40 [<ffffffff81360cb0>] ? __pfx_hlock_conflict+0x10/0x10 [<ffffffff811707be>] ? native_flush_tlb_global+0x8e/0xa0 [<ffffffff8117096e>] ? __flush_tlb_all+0x4e/0xa0 [<ffffffff81172fc2>] ? __kernel_map_pages+0x112/0x140 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff81359556>] lock_acquire+0xe6/0x280 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff8100b9e0>] _raw_spin_lock+0x30/0x40 [<ffffffff813ec327>] ? xa_load_or_alloc+0x67/0xe0 [<ffffffff813ec327>] xa_load_or_alloc+0x67/0xe0 [<ffffffff813eb4c0>] kho_preserve_folio+0x90/0x100 [<ffffffff813ebb7f>] __kho_finalize+0xcf/0x400 [<ffffffff813ebef4>] kho_finalize+0x34/0x70 This is becase xa has its own lock, that is not initialized in xa_load_or_alloc. Modifiy __kho_preserve_order(), to properly call xa_init(&new_physxa->phys_bits); Link: https://lkml.kernel.org/r/20250808201804.772010-2-pasha.tatashin@soleen.com Fixes: fc33e4b44b27 ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-19Merge tag 'vfs-6.17-rc3.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix two memory leaks in pidfs - Prevent changing the idmapping of an already idmapped mount without OPEN_TREE_CLONE through open_tree_attr() - Don't fail listing extended attributes in kernfs when no extended attributes are set - Fix the return value in coredump_parse() - Fix the error handling for unbuffered writes in netfs - Fix broken data integrity guarantees for O_SYNC writes via iomap - Fix UAF in __mark_inode_dirty() - Keep inode->i_blkbits constant in fuse - Fix coredump selftests - Fix get_unused_fd_flags() usage in do_handle_open() - Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES - Fix use-after-free in bh_read() - Fix incorrect lflags value in the move_mount() syscall * tag 'vfs-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: signal: Fix memory leak for PIDFD_SELF* sentinels kernfs: don't fail listing extended attributes coredump: Fix return value in coredump_parse() fs/buffer: fix use-after-free when call bh_read() helper pidfs: Fix memory leak in pidfd_info() netfs: Fix unbuffered write error handling fhandle: do_handle_open() should get FD with user flags module: Rename EXPORT_SYMBOL_GPL_FOR_MODULES to EXPORT_SYMBOL_FOR_MODULES fs: fix incorrect lflags value in the move_mount syscall selftests/coredump: Remove the read() that fails the test fuse: keep inode->i_blkbits constant iomap: Fix broken data integrity guarantees for O_SYNC writes selftests/mount_setattr: add smoke tests for open_tree_attr(2) bug open_tree_attr: do not allow id-mapping changes without OPEN_TREE_CLONE fs: writeback: fix use-after-free in __mark_inode_dirty()
2025-08-19signal: Fix memory leak for PIDFD_SELF* sentinelsAdrian Huang (Lenovo)
Commit f08d0c3a7111 ("pidfd: add PIDFD_SELF* sentinels to refer to own thread/process") introduced a leak by acquiring a pid reference through get_task_pid(), which increments pid->count but never drops it with put_pid(). As a result, kmemleak reports unreferenced pid objects after running tools/testing/selftests/pidfd/pidfd_test, for example: unreferenced object 0xff1100206757a940 (size 160): comm "pidfd_test", pid 16965, jiffies 4294853028 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 fd 57 50 04 .............WP. 5e 44 00 00 00 00 00 00 18 de 34 17 01 00 11 ff ^D........4..... backtrace (crc cd8844d4): kmem_cache_alloc_noprof+0x2f4/0x3f0 alloc_pid+0x54/0x3d0 copy_process+0xd58/0x1740 kernel_clone+0x99/0x3b0 __do_sys_clone3+0xbe/0x100 do_syscall_64+0x7b/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fix this by calling put_pid() after do_pidfd_send_signal() returns. Fixes: f08d0c3a7111 ("pidfd: add PIDFD_SELF* sentinels to refer to own thread/process") Signed-off-by: Adrian Huang (Lenovo) <adrianhuang0701@gmail.com> Link: https://lore.kernel.org/20250818134310.12273-1-adrianhuang0701@gmail.com Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-17Merge tag 'locking_urgent_for_v6.17_rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Make sure sanity checks down in the mutex lock path happen on the correct type of task so that they don't trigger falsely - Use the write unsafe user access pairs when writing a futex value to prevent an error on PowerPC which does user read and write accesses differently * tag 'locking_urgent_for_v6.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking: Fix __clear_task_blocked_on() warning from __ww_mutex_wound() path futex: Use user_write_access_begin/_end() in futex_put_value()
2025-08-16params: Replace deprecated strcpy() with strscpy() and memcpy()Thorsten Blum
strcpy() is deprecated; use strscpy() and memcpy() instead. In param_set_copystring(), we can safely use memcpy() because we already know the length of the source string 'val' and that it is guaranteed to be NUL-terminated within the first 'kps->maxlen' bytes. Link: https://github.com/KSPP/linux/issues/88 Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Link: https://lore.kernel.org/r/20250813132200.184064-2-thorsten.blum@linux.dev Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
2025-08-15perf: Avoid undefined behavior from stopping/starting inactive eventsYunseong Kim
Calling pmu->start()/stop() on perf events in PERF_EVENT_STATE_OFF can leave event->hw.idx at -1. When PMU drivers later attempt to use this negative index as a shift exponent in bitwise operations, it leads to UBSAN shift-out-of-bounds reports. The issue is a logical flaw in how event groups handle throttling when some members are intentionally disabled. Based on the analysis and the reproducer provided by Mark Rutland (this issue on both arm64 and x86-64). The scenario unfolds as follows: 1. A group leader event is configured with a very aggressive sampling period (e.g., sample_period = 1). This causes frequent interrupts and triggers the throttling mechanism. 2. A child event in the same group is created in a disabled state (.disabled = 1). This event remains in PERF_EVENT_STATE_OFF. Since it hasn't been scheduled onto the PMU, its event->hw.idx remains initialized at -1. 3. When throttling occurs, perf_event_throttle_group() and later perf_event_unthrottle_group() iterate through all siblings, including the disabled child event. 4. perf_event_throttle()/unthrottle() are called on this inactive child event, which then call event->pmu->start()/stop(). 5. The PMU driver receives the event with hw.idx == -1 and attempts to use it as a shift exponent. e.g., in macros like PMCNTENSET(idx), leading to the UBSAN report. The throttling mechanism attempts to start/stop events that are not actively scheduled on the hardware. Move the state check into perf_event_throttle()/perf_event_unthrottle() so that inactive events are skipped entirely. This ensures only active events with a valid hw.idx are processed, preventing undefined behavior and silencing UBSAN warnings. The corrected check ensures true before proceeding with PMU operations. The problem can be reproduced with the syzkaller reproducer: Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group") Signed-off-by: Yunseong Kim <ysk@kzalloc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250812181046.292382-2-ysk@kzalloc.com
2025-08-14Merge tag 'net-6.17-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Netfilter and IPsec. Current release - regressions: - netfilter: nft_set_pipapo: - don't return bogus extension pointer - fix null deref for empty set Current release - new code bugs: - core: prevent deadlocks when enabling NAPIs with mixed kthread config - eth: netdevsim: Fix wild pointer access in nsim_queue_free(). Previous releases - regressions: - page_pool: allow enabling recycling late, fix false positive warning - sched: ets: use old 'nbands' while purging unused classes - xfrm: - restore GSO for SW crypto - bring back device check in validate_xmit_xfrm - tls: handle data disappearing from under the TLS ULP - ptp: prevent possible ABBA deadlock in ptp_clock_freerun() - eth: - bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE - hv_netvsc: fix panic during namespace deletion with VF Previous releases - always broken: - netfilter: fix refcount leak on table dump - vsock: do not allow binding to VMADDR_PORT_ANY - sctp: linearize cloned gso packets in sctp_rcv - eth: - hibmcge: fix the division by zero issue - microchip: fix KSZ8863 reset problem" * tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits) net: usb: asix_devices: add phy_mask for ax88772 mdio bus net: kcm: Fix race condition in kcm_unattach() selftests: net/forwarding: test purge of active DWRR classes net/sched: ets: use old 'nbands' while purging unused classes bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE netdevsim: Fix wild pointer access in nsim_queue_free(). net: mctp: Fix bad kfree_skb in bind lookup test netfilter: nf_tables: reject duplicate device on updates ipvs: Fix estimator kthreads preferred affinity netfilter: nft_set_pipapo: fix null deref for empty set selftests: tls: test TCP stealing data from under the TLS socket tls: handle data disappearing from under the TLS ULP ptp: prevent possible ABBA deadlock in ptp_clock_freerun() ixgbe: prevent from unwanted interface name changes devlink: let driver opt out of automatic phys_port_name generation net: prevent deadlocks when enabling NAPIs with mixed kthread config net: update NAPI threaded config even for disabled NAPIs selftests: drv-net: don't assume device has only 2 queues docs: Fix name for net.ipv4.udp_child_hash_entries riscv: dts: thead: Add APB clocks for TH1520 GMACs ...
2025-08-13locking: Fix __clear_task_blocked_on() warning from __ww_mutex_wound() pathJohn Stultz
The __clear_task_blocked_on() helper added a number of sanity checks ensuring we hold the mutex wait lock and that the task we are clearing blocked_on pointer (if set) matches the mutex. However, there is an edge case in the _ww_mutex_wound() logic where we need to clear the blocked_on pointer for the task that owns the mutex, not the task that is waiting on the mutex. For this case the sanity checks aren't valid, so handle this by allowing a NULL lock to skip the additional checks. K Prateek Nayak and Maarten Lankhorst also pointed out that in this case where we don't hold the owner's mutex wait_lock, we need to be a bit more careful using READ_ONCE/WRITE_ONCE in both the __clear_task_blocked_on() and __set_task_blocked_on() implementations to avoid accidentally tripping WARN_ONs if two instances race. So do that here as well. This issue was easier to miss, I realized, as the test-ww_mutex driver only exercises the wait-die class of ww_mutexes. I've sent a patch[1] to address this so the logic will be easier to test. [1]: https://lore.kernel.org/lkml/20250801023358.562525-2-jstultz@google.com/ Fixes: a4f0b6fef4b0 ("locking/mutex: Add p->blocked_on wrappers for correctness checks") Closes: https://lore.kernel.org/lkml/68894443.a00a0220.26d0e1.0015.GAE@google.com/ Reported-by: syzbot+602c4720aed62576cd79@syzkaller.appspotmail.com Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20250805001026.2247040-1-jstultz@google.com
2025-08-13ipvs: Fix estimator kthreads preferred affinityFrederic Weisbecker
The estimator kthreads' affinity are defined by sysctl overwritten preferences and applied through a plain call to the scheduler's affinity API. However since the introduction of managed kthreads preferred affinity, such a practice shortcuts the kthreads core code which eventually overwrites the target to the default unbound affinity. Fix this with using the appropriate kthread's API. Fixes: d1a89197589c ("kthread: Default affine kthread to its preferred NUMA node") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-08-11sched/ext: Fix invalid task state transitions on class switchAndrea Righi
When enabling a sched_ext scheduler, we may trigger invalid task state transitions, resulting in warnings like the following (which can be easily reproduced by running the hotplug selftest in a loop): sched_ext: Invalid task state transition 0 -> 3 for fish[770] WARNING: CPU: 18 PID: 787 at kernel/sched/ext.c:3862 scx_set_task_state+0x7c/0xc0 ... RIP: 0010:scx_set_task_state+0x7c/0xc0 ... Call Trace: <TASK> scx_enable_task+0x11f/0x2e0 switching_to_scx+0x24/0x110 scx_enable.isra.0+0xd14/0x13d0 bpf_struct_ops_link_create+0x136/0x1a0 __sys_bpf+0x1edd/0x2c30 __x64_sys_bpf+0x21/0x30 do_syscall_64+0xbb/0x370 entry_SYSCALL_64_after_hwframe+0x77/0x7f This happens because we skip initialization for tasks that are already dead (with their usage counter set to zero), but we don't exclude them during the scheduling class transition phase. Fix this by also skipping dead tasks during class swiching, preventing invalid task state transitions. Fixes: a8532fac7b5d2 ("sched_ext: TASK_DEAD tasks must be switched into SCX on ops_enable") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-08-11futex: Use user_write_access_begin/_end() in futex_put_value()Waiman Long
Commit cec199c5e39b ("futex: Implement FUTEX2_NUMA") introduced the futex_put_value() helper to write a value to the given user address. However, it uses user_read_access_begin() before the write. For architectures that differentiate between read and write accesses, like PowerPC, futex_put_value() fails with -EFAULT. Fix that by using the user_write_access_begin/user_write_access_end() pair instead. Fixes: cec199c5e39b ("futex: Implement FUTEX2_NUMA") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250811141147.322261-1-longman@redhat.com
2025-08-11rcu: Fix racy re-initialization of irq_work causing hangsFrederic Weisbecker
RCU re-initializes the deferred QS irq work everytime before attempting to queue it. However there are situations where the irq work is attempted to be queued even though it is already queued. In that case re-initializing messes-up with the irq work queue that is about to be handled. The chances for that to happen are higher when the architecture doesn't support self-IPIs and irq work are then all lazy, such as with the following sequence: 1) rcu_read_unlock() is called when IRQs are disabled and there is a grace period involving blocked tasks on the node. The irq work is then initialized and queued. 2) The related tasks are unblocked and the CPU quiescent state is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE, allowing the irq work to be requeued in the future (note the previous one hasn't fired yet). 3) A new grace period starts and the node has blocked tasks. 4) rcu_read_unlock() is called when IRQs are disabled again. The irq work is re-initialized (but it's queued! and its node is cleared) and requeued. Which means it's requeued to itself. 5) The irq work finally fires with the tick. But since it was requeued to itself, it loops and hangs. Fix this with initializing the irq work only once before the CPU boots. Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
2025-08-10Merge tag 'smp_urgent_for_v6.17_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp fixes from Borislav Petkov: - Remove an obsolete comment and fix spelling * tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu: Remove obsolete comment from takedown_cpu() smp: Fix spelling in on_each_cpu_cond_mask()'s doc-comment
2025-08-10Merge tag 'irq_urgent_for_v6.17_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix a wrong ioremap size in mvebu-gicp - Remove yet another compile-test case for a driver which needs an additional dependency - Fix a lock inversion scenario in the IRQ unit test suite - Remove an impossible flag situation in gic-v5 - Do not iounmap resources in gic-v5 which are managed by devm - Make sure stale, left-over interrupts in mvebu-gicp are cleared on driver init - Fix a reference counting mishap in msi-lib - Fix a dereference-before-null-ptr-check case in the riscv-imsic irqchip driver * tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/mvebu-gicp: Use resource_size() for ioremap() irqchip: Build IMX_MU_MSI only on ARM genirq/test: Resolve irq lock inversion warnings irqchip/gic-v5: Remove IRQD_RESEND_WHEN_IN_PROGRESS for ITS IRQs irqchip/gic-v5: iwb: Fix iounmap probe failure path irqchip/mvebu-gicp: Clear pending interrupts on init irqchip/msi-lib: Fix fwnode refcount in msi_lib_irq_domain_select() irqchip/riscv-imsic: Don't dereference before NULL pointer check
2025-08-10Merge tag 'locking_urgent_for_v6.17_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Prevent a futex hash leak due to different mm lifetimes * tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Move futex cleanup to __mmdrop()
2025-08-09cgroup: avoid null de-ref in css_rstat_exit()JP Kobryn
css_rstat_exit() may be called asynchronously in scenarios where preceding calls to css_rstat_init() have not completed. One such example is this sequence below: css_create(...) { ... init_and_link_css(css, ...); err = percpu_ref_init(...); if (err) goto err_free_css; err = cgroup_idr_alloc(...); if (err) goto err_free_css; err = css_rstat_init(css, ...); if (err) goto err_free_css; ... err_free_css: INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn); queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork); return ERR_PTR(err); } If any of the three goto jumps are taken, async cleanup will begin and css_rstat_exit() will be invoked on an uninitialized css->rstat_cpu. Avoid accessing the unitialized field by returning early in css_rstat_exit() if this is the case. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Suggested-by: Michal Koutný <mkoutny@suse.com> Fixes: 5da3bfa029d68 ("cgroup: use separate rstat trees for each subsystem") Cc: stable@vger.kernel.org # v6.16 Reported-by: syzbot+8d052e8b99e40bc625ed@syzkaller.appspotmail.com Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-08-09cgroup/cpuset: Remove the unnecessary css_get/put() in cpuset_partition_write()Waiman Long
The css_get/put() calls in cpuset_partition_write() are unnecessary as an active reference of the kernfs node will be taken which will prevent its removal and guarantee the existence of the css. Only the online check is needed. Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-08-09cgroup/cpuset: Fix a partition error with CPU hotplugWaiman Long
It was found during testing that an invalid leaf partition with an empty effective exclusive CPU list can become a valid empty partition with no CPU afer an offline/online operation of an unrelated CPU. An empty partition root is allowed in the special case that it has no task in its cgroup and has distributed out all its CPUs to its child partitions. That is certainly not the case here. The problem is in the cpumask_subsets() test in the hotplug case (update with no new mask) of update_parent_effective_cpumask() as it also returns true if the effective exclusive CPU list is empty. Fix that by addding the cpumask_empty() test to root out this exception case. Also add the cpumask_empty() test in cpuset_hotplug_update_tasks() to avoid calling update_parent_effective_cpumask() for this special case. Fixes: 0c7f293efc87 ("cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-08-09cgroup/cpuset: Use static_branch_enable_cpuslocked() on ↵Waiman Long
cpusets_insane_config_key The following lockdep splat was observed. [ 812.359086] ============================================ [ 812.359089] WARNING: possible recursive locking detected [ 812.359097] -------------------------------------------- [ 812.359100] runtest.sh/30042 is trying to acquire lock: [ 812.359105] ffffffffa7f27420 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0xe/0x20 [ 812.359131] [ 812.359131] but task is already holding lock: [ 812.359134] ffffffffa7f27420 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_write_resmask+0x98/0xa70 : [ 812.359267] Call Trace: [ 812.359272] <TASK> [ 812.359367] cpus_read_lock+0x3c/0xe0 [ 812.359382] static_key_enable+0xe/0x20 [ 812.359389] check_insane_mems_config.part.0+0x11/0x30 [ 812.359398] cpuset_write_resmask+0x9f2/0xa70 [ 812.359411] cgroup_file_write+0x1c7/0x660 [ 812.359467] kernfs_fop_write_iter+0x358/0x530 [ 812.359479] vfs_write+0xabe/0x1250 [ 812.359529] ksys_write+0xf9/0x1d0 [ 812.359558] do_syscall_64+0x5f/0xe0 Since commit d74b27d63a8b ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock order"), the ordering of cpu hotplug lock and cpuset_mutex had been reversed. That patch correctly used the cpuslocked version of the static branch API to enable cpusets_pre_enable_key and cpusets_enabled_key, but it didn't do the same for cpusets_insane_config_key. The cpusets_insane_config_key can be enabled in the check_insane_mems_config() which is called from update_nodemask() or cpuset_hotplug_update_tasks() with both cpu hotplug lock and cpuset_mutex held. Deadlock can happen with a pending hotplug event that tries to acquire the cpu hotplug write lock which will block further cpus_read_lock() attempt from check_insane_mems_config(). Fix that by switching to use static_branch_enable_cpuslocked(). Fixes: d74b27d63a8b ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock order") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-08-09Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Alexei Starovoitov: - Fix memory leak of bpf_scc_info objects (Eduard Zingerman) - Fix a regression in the 'perf' tool caused by moving UID filtering to BPF (Ilya Leoshkevich) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: perf bpf-filter: Enable events manually libbpf: Add the ability to suppress perf event enablement bpf: Fix memory leak of bpf_scc_info objects
2025-08-06cpu: Remove obsolete comment from takedown_cpu()Waiman Long
takedown_cpu() has a comment about "all preempt/rcu users must observe !cpu_active()" which is kind of meaningless in this function. This comment was originally introduced by commit 6acce3ef8452 ("sched: Remove get_online_cpus() usage") when _cpu_down() was setting cpu_active_mask and synchronize_rcu()/synchronize_sched() were added after that. Later commit 40190a78f85f ("sched/hotplug: Convert cpu_[in]active notifiers to state machine") added a new CPUHP_AP_ACTIVE hotplug state to set/clear cpu_active_mask. The following commit b2454caa8977 ("sched/hotplug: Move sync_rcu to be with set_cpu_active(false)") move the synchronize_*() calls to sched_cpu_deactivate() associated with the new hotplug state, but left the comment behind. Remove this comment as it is no longer relevant in takedown_cpu(). Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250729191232.664931-1-longman@redhat.com
2025-08-06genirq/test: Resolve irq lock inversion warningsBrian Norris
irq_shutdown_and_deactivate() is normally called with the descriptor lock held, and interrupts disabled. Nested a few levels down, it grabs the global irq_resend_lock. Lockdep rightfully complains when interrupts are not disabled: CPU0 CPU1 ---- ---- lock(irq_resend_lock); local_irq_disable(); lock(&irq_desc_lock_class); lock(irq_resend_lock); <Interrupt> lock(&irq_desc_lock_class); ... _raw_spin_lock+0x2b/0x40 clear_irq_resend+0x14/0x70 irq_shutdown_and_deactivate+0x29/0x80 irq_shutdown_depth_test+0x1ce/0x600 kunit_try_run_case+0x90/0x120 Grab the descriptor lock and disable interrupts, to resolve the problem. Fixes: 66067c3c8a1e ("genirq: Add kunit tests for depth counts") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/all/aJJONEIoIiTSDMqc@google.com Closes: https://lore.kernel.org/lkml/31a761e4-8f81-40cf-aaf5-d220ba11911c@roeck-us.net/
2025-08-06Merge tag 'kbuild-v6.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: "This is the last pull request from me. I'm grateful to have been able to continue as a maintainer for eight years. From the next cycle, Nathan and Nicolas will maintain Kbuild. - Fix a shortcut key issue in menuconfig - Fix missing rebuild of kheaders - Sort the symbol dump generated by gendwarfsyms - Support zboot extraction in scripts/extract-vmlinux - Migrate gconfig to GTK 3 - Add TAR variable to allow overriding the default tar command - Hand over Kbuild maintainership" * tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (92 commits) MAINTAINERS: hand over Kbuild maintenance kheaders: make it possible to override TAR kbuild: userprogs: use correct linker when mixing clang and GNU ld kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c kconfig: lxdialog: replace strcpy with snprintf in print_autowrap kconfig: gconf: refactor text_insert_help() kconfig: gconf: remove unneeded variable in text_insert_msg kconfig: gconf: use hyphens in signals kconfig: gconf: replace GtkImageMenuItem with GtkMenuItem kconfig: gconf: Fix Back button behavior kconfig: gconf: fix single view to display dependent symbols correctly scripts: add zboot support to extract-vmlinux gendwarfksyms: order -T symtypes output by name gendwarfksyms: use preferred form of sizeof for allocation kconfig: qconf: confine {begin,end}Group to constructor and destructor kconfig: qconf: fix ConfigList::updateListAllforAll() kconfig: add a function to dump all menu entries in a tree-like format kconfig: gconf: show GTK version in About dialog kconfig: gconf: replace GtkHPaned and GtkVPaned with GtkPaned kconfig: gconf: replace GdkColor with GdkRGBA ...
2025-08-06Merge tag 'perf-fixes-27504' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git Pull perf fixes from Thomas Gleixner: "Perf fixes for perf_mmap() reference counting to prevent potential reference count leaks which are caused by: - VMA splits, which change the offset or size of a mapping, which causes perf_mmap_close() to ignore the unmap or unmap the wrong buffer. - Several internal issues of perf_mmap(), which can cause reference count leaks in the perf mmap, corrupt accounting or cause leaks in perf drivers. The main fix is to prevent VMA splits by implementing the [may_]split() callback for vm operations. The other issues are addressed by rearranging code, early returns on failure and invocation of cleanups. Also provide a selftest to validate the fixes. The reference counting should be converted to refcount_t, but that requires larger refactoring of the code and will be done once these fixes are upstream" * tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git: selftests/perf_events: Add a mmap() correctness test perf/core: Prevent VMA split of buffer mappings perf/core: Handle buffer mapping fail correctly in perf_mmap() perf/core: Exit early on perf_mmap() fail perf/core: Don't leak AUX buffer refcount on allocation failure perf/core: Preserve AUX buffer allocation failure result
2025-08-06kheaders: make it possible to override TARMichał Górny
Commit 86cdd2fdc4e3 ("kheaders: make headers archive reproducible") introduced a number of options specific to GNU tar to the `tar` invocation in `gen_kheaders.sh` script. This causes the script to fail to work on systems where `tar` is not GNU tar. This can occur e.g. on recent Gentoo Linux installations that support using bsdtar from libarchive instead. Add a `TAR` make variable to make it possible to override the tar executable used, e.g. by specifying: make TAR=gtar Link: https://bugs.gentoo.org/884061 Reported-by: Sam James <sam@gentoo.org> Tested-by: Sam James <sam@gentoo.org> Co-developed-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Michał Górny <mgorny@gentoo.org> Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-08-05perf/core: Prevent VMA split of buffer mappingsThomas Gleixner
The perf mmap code is careful about mmap()'ing the user page with the ringbuffer and additionally the auxiliary buffer, when the event supports it. Once the first mapping is established, subsequent mapping have to use the same offset and the same size in both cases. The reference counting for the ringbuffer and the auxiliary buffer depends on this being correct. Though perf does not prevent that a related mapping is split via mmap(2), munmap(2) or mremap(2). A split of a VMA results in perf_mmap_open() calls, which take reference counts, but then the subsequent perf_mmap_close() calls are not longer fulfilling the offset and size checks. This leads to reference count leaks. As perf already has the requirement for subsequent mappings to match the initial mapping, the obvious consequence is that VMA splits, caused by resizing of a mapping or partial unmapping, have to be prevented. Implement the vm_operations_struct::may_split() callback and return unconditionally -EINVAL. That ensures that the mapping offsets and sizes cannot be changed after the fact. Remapping to a different fixed address with the same size is still possible as it takes the references for the new mapping and drops those of the old mapping. Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27504 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: stable@vger.kernel.org
2025-08-05perf/core: Handle buffer mapping fail correctly in perf_mmap()Thomas Gleixner
After successful allocation of a buffer or a successful attachment to an existing buffer perf_mmap() tries to map the buffer read only into the page table. If that fails, the already set up page table entries are zapped, but the other perf specific side effects of that failure are not handled. The calling code just cleans up the VMA and does not invoke perf_mmap_close(). This leaks reference counts, corrupts user->vm accounting and also results in an unbalanced invocation of event::event_mapped(). Cure this by moving the event::event_mapped() invocation before the map_range() call so that on map_range() failure perf_mmap_close() can be invoked without causing an unbalanced event::event_unmapped() call. perf_mmap_close() undoes the reference counts and eventually frees buffers. Fixes: b709eb872e19 ("perf: map pages in advance") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: stable@vger.kernel.org
2025-08-05perf/core: Exit early on perf_mmap() failThomas Gleixner
When perf_mmap() fails to allocate a buffer, it still invokes the event_mapped() callback of the related event. On X86 this might increase the perf_rdpmc_allowed reference counter. But nothing undoes this as perf_mmap_close() is never called in this case, which causes another reference count leak. Return early on failure to prevent that. Fixes: 1e0fb9ec679c ("perf: Add pmu callbacks to track event mapping and unmapping") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: stable@vger.kernel.org
2025-08-05perf/core: Don't leak AUX buffer refcount on allocation failureThomas Gleixner
Failure of the AUX buffer allocation leaks the reference count. Set the reference count to 1 only when the allocation succeeds. Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: stable@vger.kernel.org
2025-08-05perf/core: Preserve AUX buffer allocation failure resultThomas Gleixner
A recent overhaul sets the return value to 0 unconditionally after the allocations, which causes reference count leaks and corrupts the user->vm accounting. Preserve the AUX buffer allocation failure return value, so that the subsequent code works correctly. Fixes: 0983593f32c4 ("perf/core: Lift event->mmap_mutex in perf_mmap()") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: stable@vger.kernel.org
2025-08-05Merge tag 'mm-stable-2025-08-03-12-35' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: "Significant patch series in this pull request: - "mseal cleanups" (Lorenzo Stoakes) Some mseal cleaning with no intended functional change. - "Optimizations for khugepaged" (David Hildenbrand) Improve khugepaged throughput by batching PTE operations for large folios. This gain is mainly for arm64. - "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport) A bugfix, additional debug code and cleanups to the execmem code. - "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song) Bugfixes, cleanups and performance improvememnts to the mTHP swapin code" * tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits) mm: mempool: fix crash in mempool_free() for zero-minimum pools mm: correct type for vmalloc vm_flags fields mm/shmem, swap: fix major fault counting mm/shmem, swap: rework swap entry and index calculation for large swapin mm/shmem, swap: simplify swapin path and result handling mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO mm/shmem, swap: tidy up swap entry splitting mm/shmem, swap: tidy up THP swapin checks mm/shmem, swap: avoid redundant Xarray lookup during swapin x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations execmem: drop writable parameter from execmem_fill_trapping_insns() execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP) execmem: move execmem_force_rw() and execmem_restore_rox() before use execmem: rework execmem_cache_free() execmem: introduce execmem_alloc_rw() execmem: drop unused execmem_update_copy() mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped mm/rmap: add anon_vma lifetime debug check mm: remove mm/io-mapping.c ...
2025-08-04Merge tag 'printk-for-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add new "hash_pointers=[auto|always|never]" boot parameter to force the hashing even with "slab_debug" enabled - Allow to stop CPU, after losing nbcon console ownership during panic(), even without proper NMI - Allow to use the printk kthread immediately even for the 1st registered nbcon - Compiler warning removal * tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: nbcon: Allow reacquire during panic printk: Allow to use the printk kthread immediately even for 1st nbcon slab: Decouple slab_debug and no_hash_pointers vsprintf: Use __diag macros to disable '-Wsuggest-attribute=format' compiler-gcc.h: Introduce __diag_GCC_all
2025-08-04sched/psi: Fix psi_seq initializationPeter Zijlstra
With the seqcount moved out of the group into a global psi_seq, re-initializing the seqcount on group creation is causing seqcount corruption. Fixes: 570c8efd5eb7 ("sched/psi: Optimize psi_group_change() cpu_clock() usage") Reported-by: Chris Mason <clm@meta.com> Suggested-by: Beata Michalska <beata.michalska@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-08-04Merge branch 'rework/fixes' into for-linusPetr Mladek