path: root/kernel
Age | Commit message | Author
2025-07-24 | tracing: Call trace_ftrace_test_filter() for the event | Steven Rostedt
The trace event filter bootup self test tests a bunch of filter logic against the ftrace_test_filter event, but does not actually call the event. Work is being done to cause a warning if an event is defined but not used. To quiet the warning call the trace event under an if statement where it is disabled so it doesn't get optimized out. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas.schier@linux.dev> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250723194212.274458858@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 | Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | Jakub Kicinski
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-07-24
We've added 3 non-merge commits during the last 3 day(s) which contain a total of 4 files changed, 40 insertions(+), 15 deletions(-).
The main changes are:
1) Improved verifier error message for incorrect narrower load from pointer field in ctx, from Paul Chaignon.
2) Disabled migration in nf_hook_run_bpf to address a syzbot report, from Kuniyuki Iwashima.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
    selftests/bpf: Test invalid narrower ctx load
    bpf: Reject narrower access to pointer ctx fields
    bpf: Disable migration in nf_hook_run_bpf().
====================
Link: https://patch.msgid.link/20250724173306.3578483-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 | resource: fix false warning in __request_region() | Akinobu Mita
A warning is raised when __request_region() detects a conflict with a resource whose resource.desc is IORES_DESC_DEVICE_PRIVATE_MEMORY. But this warning is only valid for iomem_resource. The hmem device resource uses resource.desc as the NUMA node ID, which can cause spurious warnings. This warning appeared on a machine with multiple CXL memory expanders. One of the NUMA node IDs is 6, which is the same as the value of IORES_DESC_DEVICE_PRIVATE_MEMORY. In this environment it was just a spurious warning, but when I saw the warning I suspected a real problem, so it's better to fix it. This change fixes this by restricting the warning to only iomem_resource. This also adds a missing newline to the warning message. Link: https://lkml.kernel.org/r/20250719112604.25500-1-akinobu.mita@gmail.com Fixes: 7dab174e2e27 ("dax/hmem: Move hmem device registration to dax_hmem.ko") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
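The collision described in this commit can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel patch: `struct res`, `iomem_root`, `other_root` and `conflict_warrants_warning()` are hypothetical names, and only the value 6 for IORES_DESC_DEVICE_PRIVATE_MEMORY is taken from the commit message.

```c
#include <stdbool.h>

/* Value taken from the commit message: it equals NUMA node ID 6. */
#define IORES_DESC_DEVICE_PRIVATE_MEMORY 6

/* Hypothetical stand-in for struct resource; only desc matters here. */
struct res {
	unsigned long desc;
};

/* Hypothetical roots: the iomem tree and some other resource tree. */
static struct res iomem_root;
static struct res other_root;

/*
 * Warn only when the conflict was found in the iomem tree. In other
 * trees, desc may carry unrelated data such as a NUMA node ID.
 */
static bool conflict_warrants_warning(const struct res *root,
				      const struct res *conflict)
{
	return root == &iomem_root &&
	       conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY;
}
```

With a conflicting resource whose desc is 6, the check fires for `iomem_root` and stays quiet for `other_root`, which is the behaviour the commit message asks for.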
2025-07-24 | x86: Handle KCOV __init vs inline mismatches | Kees Cook
GCC appears to have kind of fragile inlining heuristics, in the sense that it can change whether or not it inlines something based on optimizations. It looks like the kcov instrumentation being added (or in this case, removed) from a function changes the optimization results, and some functions marked "inline" are _not_ inlined. In that case, we end up with __init code calling a function not marked __init, and we get the build warnings I'm trying to eliminate in the coming patch that adds __no_sanitize_coverage to __init functions:
    WARNING: modpost: vmlinux: section mismatch in reference: xbc_exit+0x8 (section: .text.unlikely) -> _xbc_exit (section: .init.text)
    WARNING: modpost: vmlinux: section mismatch in reference: real_mode_size_needed+0x15 (section: .text.unlikely) -> real_mode_blob_end (section: .init.data)
    WARNING: modpost: vmlinux: section mismatch in reference: __set_percpu_decrypted+0x16 (section: .text.unlikely) -> early_set_memory_decrypted (section: .init.text)
    WARNING: modpost: vmlinux: section mismatch in reference: memblock_alloc_from+0x26 (section: .text.unlikely) -> memblock_alloc_try_nid (section: .init.text)
    WARNING: modpost: vmlinux: section mismatch in reference: acpi_arch_set_root_pointer+0xc (section: .text.unlikely) -> x86_init (section: .init.data)
    WARNING: modpost: vmlinux: section mismatch in reference: acpi_arch_get_root_pointer+0x8 (section: .text.unlikely) -> x86_init (section: .init.data)
    WARNING: modpost: vmlinux: section mismatch in reference: efi_config_table_is_usable+0x16 (section: .text.unlikely) -> xen_efi_config_table_is_usable (section: .init.text)
This problem is somewhat fragile (though using either __always_inline or __init will deterministically solve it), but we've tripped over this before with GCC and the solution has usually been to just use __always_inline and move on. For x86 this means forcing several functions to be inline with __always_inline.
Link: https://lore.kernel.org/r/20250724055029.3623499-2-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
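The difference between a plain "inline" hint and a forced inline can be sketched in userspace C. This assumes GCC or Clang; `sketch_always_inline`, `hinted()`, `forced()` and `init_caller()` are hypothetical names, and the macro mirrors how the kernel's __always_inline is commonly defined rather than quoting this patch.

```c
/*
 * Assumption: GCC or Clang. The macro name is local to this sketch;
 * the kernel's __always_inline expands to roughly the same thing.
 */
#define sketch_always_inline inline __attribute__((__always_inline__))

/* Plain "inline" is only a hint; the compiler may emit a real call. */
static inline int hinted(int x)
{
	return x + 1;
}

/* always_inline removes that freedom: the body is merged into callers. */
static sketch_always_inline int forced(int x)
{
	return x + 1;
}

/* Hypothetical caller standing in for an __init function. */
int init_caller(int x)
{
	return hinted(x) + forced(x);
}
```

If `init_caller()` lived in an init section, only `forced()` would be guaranteed to end up in that section with it; `hinted()` could be emitted out of line elsewhere, which is the kind of mismatch modpost reports.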
2025-07-24 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc8).
Conflicts:
drivers/net/ethernet/microsoft/mana/gdma_main.c
    9669ddda18fb ("net: mana: Fix warnings for missing export.h header inclusion")
    755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au
Adjacent changes:
drivers/net/ethernet/ti/icssg/icssg_prueth.h
    6e86fb73de0f ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG")
    ffe8a4909176 ("net: ti: icssg-prueth: Read firmware-names from device tree")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 | rv: Return init error when registering monitors | Gabriele Monaco
Monitors generated with dot2k have their registration function (the one called during monitor initialisation) always return 0, even if the registration failed on the RV side. This can hide potential errors. Return the value returned by the RV register function. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250723161240.194860-6-gmonaco@redhat.com Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
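A minimal before/after sketch of the change described here, in userspace C. All three functions are hypothetical stand-ins: `sketch_rv_register()` plays the role of the RV-side registration call, and the generated monitor code is not reproduced.

```c
#include <errno.h>

/* Hypothetical stand-in for the RV-side registration; fails on demand. */
static int sketch_rv_register(int should_fail)
{
	return should_fail ? -EINVAL : 0;
}

/* Before: the result is dropped and the caller always sees success. */
static int register_monitor_old(int should_fail)
{
	sketch_rv_register(should_fail);
	return 0;
}

/* After: the registration result is passed straight through. */
static int register_monitor_new(int should_fail)
{
	return sketch_rv_register(should_fail);
}
```

The old form reports success even when registration failed, so an init path built on it could not notice the error.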
2025-07-24 | verification/rvgen: Organise Kconfig entries for nested monitors | Gabriele Monaco
The current behaviour of rvgen when running with the -a option is to append the necessary lines at the end of the configuration for Kconfig, Makefile and tracepoints. This is not always the desired behaviour in case of nested monitors: while tracepoints are not affected by nesting and the Makefile's only requirement is that the parent monitor is built before its children, in the Kconfig it is better to have children defined right after their parent, otherwise the result has wrong indentation:
    [*] foo_parent monitor
      [*] foo_child1 monitor
      [*] foo_child2 monitor
    [*] bar_parent monitor
      [*] bar_child1 monitor
      [*] bar_child2 monitor
      [*] foo_child3 monitor
      [*] foo_child4 monitor
Adapt rvgen to look for a different marker for nested monitors in the Kconfig file and append the line right after the last sibling, instead of the last monitor. Also add the marker when creating a new parent monitor. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250723161240.194860-5-gmonaco@redhat.com Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 | tools/dot2c: Fix generated files going over 100 column limit | Gabriele Monaco
The dot2c.py script generates all states in a single line. This breaks the 100 column limit when the state machines are non-trivial. Change dot2c.py to generate the states in separate lines in case the generated line is going to be too long. Also adapt existing monitors with line length over the limit. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250723161240.194860-4-gmonaco@redhat.com Suggested-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
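The wrapping rule described here (one line when it fits, one state per line otherwise) can be sketched as follows. dot2c.py itself is a Python script and is not reproduced; `format_states()` and `MAX_COLS` are hypothetical names, and the width estimate is deliberately rough.

```c
#include <stdio.h>
#include <string.h>

#define MAX_COLS 100	/* column limit named in the commit message */

/*
 * Write the state names into out: all on one line when the estimate
 * fits in MAX_COLS, otherwise one name per line. Returns the number of
 * lines written, or -1 if out is too small.
 */
static int format_states(const char *const *names, size_t n,
			 char *out, size_t outsz)
{
	size_t width = 0, used = 0, i;
	int wrap;

	if (outsz == 0)
		return -1;
	out[0] = '\0';

	/* Rough width of the single-line form: each name plus ", ". */
	for (i = 0; i < n; i++)
		width += strlen(names[i]) + 2;
	wrap = width > MAX_COLS;

	for (i = 0; i < n; i++) {
		const char *sep = (wrap || i + 1 == n) ? ",\n" : ", ";
		int len = snprintf(out + used, outsz - used, "%s%s",
				   names[i], sep);

		if (len < 0 || (size_t)len >= outsz - used)
			return -1;
		used += (size_t)len;
	}
	if (n == 0)
		return 0;
	return wrap ? (int)n : 1;
}
```

Short state lists keep the compact single-line form, so existing small monitors would not change under this rule; only lists whose estimated width exceeds the limit are split.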
2025-07-24 | tracing: Have eprobes handle arrays | Steven Rostedt
eprobes are dynamic events that can read other events using their fields to create new events. Currently it doesn't work with arrays. When the new event field is attached to the old event field, it looks at the size of the field to determine what type of field the new field should be. For 1 byte fields it's a char, for 2 bytes, it's a short and for 4 bytes it's an integer. For all other sizes it just defaults to "long". This also reads the contents of the field for such cases.
For arrays that are bigger than the size of long, return the value of the address of the content itself. This will allow eprobes to read other values in the array of the old event.
This is useful when raw_syscalls is enabled but the syscall events are not. The syscall events are created from the raw_syscalls as they have an array of "args" that holds the 6 long words passed to the syscall entry point. To read the value of "filename" from sys_openat, the eprobe could attach to the raw_syscall and read the second value. It can then even be passed to a synthetic event and converted back to another eprobe to get the value of "filename" after it has been read by the kernel during the system call:
[ Create an eprobe called "sys" and attach it to sys_enter. Read the id of the system call and the second argument ]
    # echo 'e:sys raw_syscalls.sys_enter nr=$id:u32 arg2=+8($args):u64' >> /sys/kernel/tracing/dynamic_events
[ Create a synthetic event "path" that will hold the address of the sys_openat filename. This is on a 64bit machine, so make it 64 bits ]
    # echo 's:path u64 file;' >> /sys/kernel/tracing/dynamic_events
[ Add a histogram to the eprobe/sys which triggers if the "nr" field is 257 (sys_openat), and save the filename in the "file" variable. ]
    # echo 'hist:keys=common_pid:file=arg2 if nr == 257' > /sys/kernel/tracing/events/eprobes/sys/trigger
[ Attach a histogram to sys_exit event that triggers the "path" synthetic event and records the "filename" that was passed from the sys eprobe. ]
    # echo 'hist:keys=common_pid:f=$file:onmatch(eprobes.sys).trace(path,$f)' >> /sys/kernel/tracing/events/raw_syscalls/sys_exit/trigger
[ Create another eprobe that dereferences the "file" field as a user space string and displays it. ]
    # echo 'e:open synthetic.path file=+0($file):ustring' >> /sys/kernel/tracing/dynamic_events
    # echo 1 > /sys/kernel/tracing/events/eprobes/open/enable
    # cat trace_pipe
    cat-1142 [003] ...5. 799.521912: open: (synthetic.path) file="/etc/ld.so.cache"
    cat-1142 [003] ...5. 799.521934: open: (synthetic.path) file="/etc/ld.so.cache"
    cat-1142 [003] ...5. 799.522065: open: (synthetic.path) file="/etc/ld.so.cache"
    cat-1142 [003] ...5. 799.522080: open: (synthetic.path) file="/etc/ld.so.cache"
    cat-1142 [003] ...5. 799.522296: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
    cat-1142 [003] ...5. 799.522319: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
    less-1143 [005] ...5. 799.522327: open: (synthetic.path) file="/etc/ld.so.cache"
    cat-1142 [003] ...5. 799.522333: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
    cat-1142 [003] ...5. 799.522348: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
    less-1143 [005] ...5. 799.522349: open: (synthetic.path) file="/etc/ld.so.cache"
    cat-1142 [003] ...5. 799.522363: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
    less-1143 [005] ...5. 799.522477: open: (synthetic.path) file="/etc/ld.so.cache"
    cat-1142 [003] ...5. 799.522489: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
    less-1143 [005] ...5. 799.522492: open: (synthetic.path) file="/etc/ld.so.cache"
    less-1143 [005] ...5. 799.522720: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libtinfo.so.6"
    less-1143 [005] ...5. 799.522744: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libtinfo.so.6"
    less-1143 [005] ...5. 799.522759: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libtinfo.so.6"
    cat-1142 [003] ...5. 799.522850: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
Link: https://lore.kernel.org/all/20250723124202.4f7475be@batman.local.home/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
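The size-to-type rule and the "+8($args)" offset read from this commit message can be modelled in userspace C. This is an illustrative sketch only: `type_for_size()` and `read_at_offset()` are hypothetical helpers, the returned strings are placeholders rather than the kernel's type names, and the eprobe implementation is not reproduced.

```c
#include <stddef.h>

/*
 * Pick a type from a field's size, following the rule quoted in the
 * commit message: 1 byte is a char, 2 a short, 4 an integer, and
 * every other size defaults to long.
 */
static const char *type_for_size(size_t size)
{
	switch (size) {
	case 1:
		return "char";
	case 2:
		return "short";
	case 4:
		return "int";
	default:
		return "long";
	}
}

/*
 * Model of "+8($args)": for an array field bigger than a long, $args
 * yields the address of the array, and the offset is applied to that
 * address before dereferencing. With 8-byte longs, offset 8 is
 * args[1], the second syscall argument.
 */
static unsigned long read_at_offset(const unsigned long *args, size_t off)
{
	return *(const unsigned long *)((const char *)args + off);
}
```

On a machine with 8-byte longs, `read_at_offset(args, 8)` returns the second element of a six-entry args array, which matches the arg2 field in the example commands.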
2025-07-23 | bpf: Reject narrower access to pointer ctx fields | Paul Chaignon
The following BPF program, simplified from a syzkaller repro, causes a kernel warning:
    r0 = *(u8 *)(r1 + 169);
    exit;
With pointer field sk being at offset 168 in __sk_buff. This access is detected as a narrower read in bpf_skb_is_valid_access because it doesn't match offsetof(struct __sk_buff, sk). It is therefore allowed and later proceeds to bpf_convert_ctx_access. Note that for the "is_narrower_load" case in the convert_ctx_accesses(), the insn->off is aligned, so the cnt may not be 0 because it matches the offsetof(struct __sk_buff, sk) in the bpf_convert_ctx_access. However, the target_size stays 0 and the verifier errors with a kernel warning:
    verifier bug: error during ctx access conversion(1)
This patch fixes that to return a proper "invalid bpf_context access off=X size=Y" error on the load instruction. The same issue affects multiple other fields in context structures that allow narrow access. Some other non-affected fields (for sk_msg, sk_lookup, and sockopt) were also changed to use bpf_ctx_range_ptr for consistency.
Note this syzkaller crash was reported in the "Closes" link below, which used to be about a different bug, fixed in commit fce7bd8e385a ("bpf/verifier: Handle BPF_LOAD_ACQ instructions in insn_def_regno()"). Because syzbot somehow confused the two bugs, the new crash and repro didn't get reported to the mailing list.
Fixes: f96da09473b52 ("bpf: simplify narrower ctx access") Fixes: 0df1a55afa832 ("bpf: Warn on internal verifier errors") Reported-by: syzbot+0ef84a7bdf5301d4cbec@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0ef84a7bdf5301d4cbec Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/3b8dcee67ff4296903351a974ddd9c4dca768b64.1753194596.git.paul.chaignon@gmail.com
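The rule this commit enforces can be sketched as a range check in userspace C. Offset 168 comes from the commit message; the 8-byte field width and the helper names `touches_ptr_field()` and `ptr_field_access_ok()` are assumptions made for this sketch, which does not reproduce the verifier's code or the bpf_ctx_range_ptr macro.

```c
#include <stdbool.h>
#include <stddef.h>

/* From the commit message: sk sits at offset 168 in __sk_buff. */
#define SK_OFF   168
/* Assumption for this sketch: the ctx pointer field occupies 8 bytes. */
#define PTR_SIZE 8

/* True if [off, off + size) overlaps the pointer field. */
static bool touches_ptr_field(size_t off, size_t size)
{
	return off < SK_OFF + PTR_SIZE && off + size > SK_OFF;
}

/*
 * An access that touches the pointer field is valid only as a
 * full-width load at the field's exact offset. Anything narrower or
 * shifted is rejected up front instead of reaching the conversion step.
 */
static bool ptr_field_access_ok(size_t off, size_t size)
{
	if (!touches_ptr_field(off, size))
		return true;	/* other fields are out of scope here */
	return off == SK_OFF && size == PTR_SIZE;
}
```

The repro's one-byte load at offset 169 falls inside the field without matching its start or width, so this check rejects it, while a full 8-byte load at 168 passes.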
2025-07-23 | tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit | Steven Rostedt
The ipi tracepoints are mostly generic, but the tracepoints ipi_raise, ipi_entry and ipi_exit are only used by arm and arm64. This means these trace events are wasting memory in all the other architectures that do not use them. Add CONFIG_HAVE_EXTRA_IPI_TRACEPOINTS and have arm and arm64 select it to enable these trace events. The config makes it easy if other architectures decide to trace these as well. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Will Deacon <will@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Nicolas Pitre <nico@fluxnic.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20250722103714.64eba013@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-23 | Merge branches 'rcu-exp.23.07.2025', 'rcu.22.07.2025', 'torture-scripts.16.07.2025', 'srcu.19.07.2025', 'rcu.nocb.18.07.2025' and 'refscale.07.07.2025' into rcu.merge.23.07.2025 | Neeraj Upadhyay (AMD)
2025-07-24 | tracing: probes: Add a kerneldoc for traceprobe_parse_event_name() | Masami Hiramatsu (Google)
Since traceprobe_parse_event_name() is a bit complicated, add a kerneldoc for explaining the behavior. Link: https://lore.kernel.org/all/175323430565.57270.2602609519355112748.stgit@devnote2/ Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 | tracing: uprobe-event: Allocate string buffers from heap | Masami Hiramatsu (Google)
Allocate temporary string buffers for parsing uprobe-events from heap instead of stack. Link: https://lore.kernel.org/all/175323429593.57270.12369235525923902341.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 | tracing: eprobe-event: Allocate string buffers from heap | Masami Hiramatsu (Google)
Allocate temporary string buffers for parsing eprobe-events from heap instead of stack. Link: https://lore.kernel.org/all/175323428599.57270.988038042425748956.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 | tracing: kprobe-event: Allocate string buffers from heap | Masami Hiramatsu (Google)
Allocate temporary string buffers for parsing kprobe-events from heap instead of stack. Link: https://lore.kernel.org/all/175323427627.57270.5105357260879695051.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 | tracing: fprobe-event: Allocate string buffers from heap | Masami Hiramatsu (Google)
Allocate temporary string buffers for fprobe-event from heap instead of stack. This fixes the stack frame exceed limit error. Link: https://lore.kernel.org/all/175323426643.57270.6657152008331160704.stgit@devnote2/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506240416.nZIhDXoO-lkp@intel.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-24 | tracing: probe: Allocate traceprobe_parse_context from heap | Masami Hiramatsu (Google)
Instead of allocating traceprobe_parse_context on stack, allocate it dynamically from heap (slab). This change is likely intended to prevent potential stack overflow issues, which can be a concern in the kernel environment where stack space is limited. Link: https://lore.kernel.org/all/175323425650.57270.280750740753792504.stgit@devnote2/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506240416.nZIhDXoO-lkp@intel.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
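The stack-to-heap pattern used across this series of probe commits can be illustrated with a userspace analogue. This is a sketch, not the kernel code: `parse_name_len()` and `SKETCH_BUF_LEN` are hypothetical, and malloc()/free() stand in for the kernel's slab allocation.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_BUF_LEN 256	/* hypothetical size, not the kernel's */

/*
 * Copy the event name (the part before ':' if any) into a temporary
 * buffer taken from the heap instead of a large on-stack array.
 * Returns the name length, or -1 on allocation failure or overflow.
 */
static int parse_name_len(const char *spec)
{
	char *buf = malloc(SKETCH_BUF_LEN);	/* userspace stand-in */
	const char *colon;
	size_t len;
	int ret = -1;

	if (!buf)
		return -1;

	colon = strchr(spec, ':');
	len = colon ? (size_t)(colon - spec) : strlen(spec);
	if (len < SKETCH_BUF_LEN) {
		memcpy(buf, spec, len);
		buf[len] = '\0';
		ret = (int)strlen(buf);
	}

	free(buf);	/* every exit path after allocation releases it */
	return ret;
}
```

The trade-off is one extra failure mode (allocation can fail) and the need to free on every path, in exchange for a stack frame that no longer grows with the buffer size.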
2025-07-24 | tracing: probes: Sort #include alphabetically | Masami Hiramatsu (Google)
Sort the #include directives in trace_probe* files alphabetically for easier maintenance and avoid double includes. This also groups headers as linux-generic, asm-generic, and local headers. Link: https://lore.kernel.org/all/175323424678.57270.11975372127870059007.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-07-23 | tracing: Deprecate auto-mounting tracefs in debugfs | Steven Rostedt
In January 2015, tracefs was created to allow access to the tracing infrastructure without needing to compile in debugfs. When tracefs is configured, the directory /sys/kernel/tracing will exist and tooling is expected to use that path to access the tracing infrastructure. To allow backward compatibility, when debugfs is mounted, it would automount tracefs in its "tracing" directory so that tooling that had hard coded /sys/kernel/debug/tracing would still work. It has been over 10 years since the new interface was introduced, and all tooling should now be using it. Start the process of deprecating the old path so that it doesn't need to be maintained anymore. A new config is added to allow distributions to disable automounting of tracefs on debugfs. If /sys/kernel/debug/tracing is accessed, a pr_warn() will trigger stating: "NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030" Expect to remove this feature in 5 years (2030). Cc: <linux-trace-users@vger.kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/20250722170806.40c068c6@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
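For tooling, the practical consequence of this commit is a path preference. A minimal sketch, assuming the tool has already probed which directories exist; `pick_tracing_dir()` and the two macro names are hypothetical, while the two paths are the ones named in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define TRACEFS_PATH     "/sys/kernel/tracing"
#define DEBUGFS_AUTOPATH "/sys/kernel/debug/tracing"

/*
 * Pick the tracing directory a tool should use. The tracefs path is
 * preferred; the debugfs automount is only a fallback, and the commit
 * message says it is expected to be removed in 2030.
 */
static const char *pick_tracing_dir(bool have_tracefs, bool have_automount)
{
	if (have_tracefs)
		return TRACEFS_PATH;
	if (have_automount)
		return DEBUGFS_AUTOPATH;
	return NULL;
}
```

A caller could derive the two flags from an existence check such as access(2). Probing the tracefs path first also avoids touching /sys/kernel/debug/tracing at all when it is not needed, and touching that path is what triggers the deprecation notice.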
2025-07-23 | sysctl: rename kern_table -> sysctl_subsys_table | Joel Granados
Renamed sysctl table from kern_table to sysctl_subsys_table and grouped the two arch specific ctls to the end of the array. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c | Joel Granados
Moved ctl_tables elements for overflowuid and overflowgid into kernel/sys.c. Create a register function that keeps them under "kernel" and run it after core with postcore_initcall. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | uevent: mv uevent_helper into kobject_uevent.c | Joel Granados
Move the uevent_helper table into lib/kobject_uevent.c. Place the registration early in the initcall order with postcore_initcall. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | sysctl: Remove superfluous includes from kernel/sysctl.c | Joel Granados
Remove the following headers from the include list in sysctl.c.
* These are removed as the related variables are no longer there.
    ====================  ===============================
    Include               Related Var
    ====================  ===============================
    linux/kmod.h          usermodehelper
    asm/nmi.h             nmi_watchdoc_enabled
    asm/io.h              io_delay_type
    linux/pid.h           pid_max_{,min,max}
    linux/sched/sysctl.h  sysctl_{sched_*,numa_*,timer_*}
    linux/mount.h         sysctl_mount_max
    linux/reboot.h        poweroff_cmd
    linux/ratelimit.h     {,printk_}ratelimit_state
    linux/printk.h        kptr_restrict
    linux/security.h      CONFIG_SECURITY_CAPABILITIES
    linux/net.h           net_table
    linux/key.h           key_sysctls
    linux/nvs_fs.h        acpi_video_flags
    linux/acpi.h          acpi_video_flags
    linux/fs.h            proc_nr_files
    ====================  ===============================
* These are no longer needed as intermediate includes
    ===============
    Include
    ===============
    linux/filter.h
    linux/binfmts.h
    ===============
Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | sysctl: Remove (very) old file changelog | Joel Granados
These comments are older than 2003 and therefore do not bear any relevance to the current state of the sysctl.c file. Remove them as they confuse more than clarify. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c | Joel Granados
This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | sysctl: move cad_pid into kernel/pid.c | Joel Granados
Move cad_pid as well as supporting function proc_do_cad_pid into kernel/pid.c. Replaced call to __do_proc_dointvec with proc_dointvec inside proc_do_cad_pid which requires the copy of the ctl_table to handle the temp value. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | sysctl: Move tainted ctl_table into kernel/panic.c | Joel Granados
Move the ctl_table with the "tainted" proc_name into kernel/panic.c. With it moves the proc_tainted helper function. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | Input: sysrq: mv sysrq into drivers/tty/sysrq.c | Joel Granados
Move both the sysrq ctl_table and the supporting sysrq_sysctl_handler helper function into drivers/tty/sysrq.c. Replaced the __do_proc_dointvec in the helper function with do_proc_dointvec_minmax as the former is local to kernel/sysctl.c. Here we use the minmax version of do_proc_dointvec because do_proc_dointvec is static and calling do_proc_dointvec_minmax with a NULL min and max is the same as calling do_proc_dointvec. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
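The equivalence this commit relies on (a min/max helper called with NULL bounds behaves like the unbounded one) can be shown with a small userspace model. `store_int_minmax()` is a hypothetical function written for this sketch; it is not the kernel's do_proc_dointvec_minmax and does no parsing.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Store val into *dst if it lies within the optional bounds. A NULL
 * bound means "no limit on that side", so passing NULL for both makes
 * this behave like an unchecked store.
 */
static int store_int_minmax(int *dst, int val, const int *min, const int *max)
{
	if ((min && val < *min) || (max && val > *max))
		return -EINVAL;
	*dst = val;
	return 0;
}
```

With both bounds NULL, every value is accepted, which is why a caller that cannot reach the unbounded helper can use the bounded one without changing behaviour.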
2025-07-23 | fork: mv threads-max into kernel/fork.c | Joel Granados
Make sysctl_max_threads static as it no longer needs to be exported into sysctl.c. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | parisc/power: Move soft-power into power.c | Joel Granados
Move the soft-power ctl table into parisc/power.c. As a consequence the pwrsw_enabled var is made static. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | mm: move randomize_va_space into memory.c | Joel Granados
Move the randomize_va_space variable together with all its sysctl table elements into memory.c. Register it to the "kernel" directory by adding it to the subsys initialization calls. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | rcu: Move rcu_stall related sysctls into rcu/tree_stall.h | Joel Granados
Move sysctl_panic_on_rcu_stall and sysctl_max_rcu_stall_to_panic into the kernel/rcu subdirectory. Make these static in tree_stall.h and remove them as extern from panic.h as their scope is now confined to one file. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | locking/rtmutex: Move max_lock_depth into rtmutex.c | Joel Granados
Move the max_lock_depth sysctl table element into rtmutex_api.c. Removed the rtmutex.h include from sysctl.c. Chose to move into rtmutex_api.c to avoid multiple registrations every time rtmutex.c is included in other files. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | module: Move modprobe_path and modules_disabled ctl_tables into the module subsys | Joel Granados
Move module sysctl (modprobe_path and modules_disabled) out of sysctl.c and into the modules subsystem. Make modules_disabled static as it no longer needs to be exported. Remove module.h from the includes in sysctl as it no longer uses any module exported variables. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-07-23 | kcsan: test: Initialize dummy variable | Marco Elver
Newer compiler versions rightfully point out:
    kernel/kcsan/kcsan_test.c:591:41: error: variable 'dummy' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
      591 |         KCSAN_EXPECT_READ_BARRIER(atomic_read(&dummy), false);
          |                                                ^~~~~
    1 error generated.
Although this particular test does not care about the value stored in the dummy atomic variable, let's silence the warning. Link: https://lkml.kernel.org/r/CA+G9fYu8JY=k-r0hnBRSkQQrFJ1Bz+ShdXNwC1TNeMt0eXaxeA@mail.gmail.com Fixes: 8bc32b348178 ("kcsan: test: Add test cases for memory barrier instrumentation") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reviewed-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com>
2025-07-22 | tracing: Fix comment in trace_module_remove_events() | Steven Rostedt
Fix typo "allocade" -> "allocated". Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250710095628.42ed6b06@batman.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22 | tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD | Steven Rostedt
Ftrace is tightly coupled with architecture specific code because it requires the use of trampolines written in assembly. This means that when a new feature or optimization is made, it must be done for all architectures. To simplify the approach, CONFIG_HAVE_FTRACE_* configs are added to denote which architecture has the new enhancement so that other architectures can still function until they too have been updated. The CONFIG_HAVE_FTRACE_MCOUNT_RECORD was added to help simplify the DYNAMIC_FTRACE work, but now every architecture that implements DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT_RECORD set too, making it redundant with the HAVE_DYNAMIC_FTRACE. Remove the HAVE_FTRACE_MCOUNT_RECORD config and use DYNAMIC_FTRACE directly where applicable. Link: https://lore.kernel.org/all/20250703154916.48e3ada7@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250704104838.27a18690@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22tracing: Remove EVENT_FILE_FL_SOFT_MODE flagSteven Rostedt
When soft disabling of trace events was first created, it needed to have a way to know if a file had a user that was using it with soft disabled (for triggers that need to enable or disable events from a context that can not really enable or disable the event, it would set SOFT_DISABLED to state it is disabled). The flag SOFT_MODE was used to denote that an event had a user that would enable or disable it via the SOFT_DISABLED flag. Commit 1cf4c0732db3c ("tracing: Modify soft-mode only if there's no other referrer") fixed a bug where if two users were using the SOFT_DISABLED flag the accounting would get messed up as the SOFT_MODE flag could only handle one user. That commit added the sm_ref counter which kept track of how many users were using the event in "soft mode". This made the SOFT_MODE flag redundant as it should only be set if the sm_ref counter is non zero. Remove the SOFT_MODE flag and just use the sm_ref counter to know the event is in soft mode or not. This makes the code a bit simpler. Link: https://lore.kernel.org/all/20250702111908.03759998@batman.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Link: https://lore.kernel.org/20250702143657.18dd1882@batman.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
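The idea of deriving "soft mode" from the reference counter instead of a separate flag can be sketched as follows. This is a simplified illustration, not the kernel code: the struct and function names are hypothetical, and a plain int is used where the kernel would need an atomic counter.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a trace event file. */
struct event_file_sketch {
	int sm_ref;	/* number of users relying on soft mode */
};

/* Soft mode is derived from the counter; there is no flag to keep in sync. */
static bool in_soft_mode(const struct event_file_sketch *file)
{
	return file->sm_ref > 0;
}

/* A new user (for example a trigger) starts using soft mode. */
static void soft_mode_get(struct event_file_sketch *file)
{
	file->sm_ref++;
}

/* A user stops using soft mode; never drop below zero. */
static void soft_mode_put(struct event_file_sketch *file)
{
	if (file->sm_ref > 0)
		file->sm_ref--;
}
```

With two users, the event stays in soft mode until both have called soft_mode_put(), which is the accounting a single flag could not express.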
2025-07-22tracing: Remove pointless memory barriersNam Cao
Memory barriers are useful to ensure memory accesses from one CPU appear in the original order as seen by other CPUs. Some smp_rmb() and smp_wmb() are used, but they are not ordering multiple memory accesses. Remove them. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626151940.1756398-1-namcao@linutronix.de Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
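For contrast, here is a sketch of a case where a barrier does order multiple accesses. It assumes C11 fences as a rough userspace analogue of smp_wmb() and smp_rmb(); the names payload, ready, publish() and consume() are hypothetical.

```c
#include <stdatomic.h>

static int payload;		/* data being handed over */
static atomic_int ready;	/* flag announcing that payload is valid */

/* Writer: two stores whose order matters to the reader. */
static void publish(int value)
{
	payload = value;
	/* Roughly smp_wmb(): the payload store is ordered before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out once the payload has been published. */
static int consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return 0;
	/* Roughly smp_rmb(): the flag load is ordered before the payload load. */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return 1;
}
```

Each fence sits between two accesses. A barrier with only one access on either side has nothing to order, which is the situation the commit removes.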
2025-07-22ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support itSteven Rostedt
ftrace has two flavors:

 1) static: Where every function always calls the ftrace trampoline
 2) dynamic: Where each function has nops that can be changed on demand to jump to the ftrace trampoline when needed.

The static flavor has very high performance overhead and was only created to make it easier for architectures to implement the dynamic flavor. An architecture developer can first implement the static ftrace to make sure the trampolines work before working on the more complicated dynamic aspect of ftrace. Once the architecture can support dynamic ftrace, there's no reason to continue to support the static flavor. In fact, the static flavor tends to bitrot and bugs start to appear in it. Remove the prompt to pick DYNAMIC_FTRACE and simply enable it if the architecture supports it.

Link: https://lore.kernel.org/all/f7e12c6d-892e-4ca3-9ef0-fbb524d04a48@ghiti.fr/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: ChenMiao <chenmiao.ku@gmail.com> Link: https://lore.kernel.org/20250703115222.2d7c8cd5@batman.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22fgraph: Keep track of when fgraph_ops are registered or notSteven Rostedt
Add a warning if unregister_ftrace_graph() is called without ever registering it, or if register_ftrace_graph() is called twice. This can detect errors when they happen and not later when there's a side effect: Link: https://lore.kernel.org/all/20250617120830.24fbdd62@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/20250701194451.22e34724@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
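The registration tracking described above can be illustrated with a small sketch. All names here are hypothetical, the warning is a plain fprintf() rather than a kernel WARN, and the error return value is an assumption.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an ops structure that remembers its state. */
struct graph_ops_sketch {
	bool registered;
};

static int warn_count;	/* counts warnings so callers can observe them */

/* Warn and fail if the same ops is registered twice. */
static int sketch_register(struct graph_ops_sketch *ops)
{
	if (ops->registered) {
		fprintf(stderr, "warning: ops registered twice\n");
		warn_count++;
		return -1;
	}
	ops->registered = true;
	return 0;
}

/* Warn if the ops was never registered; otherwise clear the state. */
static void sketch_unregister(struct graph_ops_sketch *ops)
{
	if (!ops->registered) {
		fprintf(stderr, "warning: unregister without register\n");
		warn_count++;
		return;
	}
	ops->registered = false;
}
```

Checking the state at the call site reports the misuse immediately, instead of leaving it to surface later as an unrelated side effect.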
2025-07-22ring-buffer: Remove ring_buffer_read_prepare_sync()Steven Rostedt
When the ring buffer was first introduced, reading the non-consuming "trace" file required disabling the writing of the ring buffer. To make sure the writing was fully disabled before iterating the buffer with a non-consuming read, it would set the disable flag of the buffer and then call an RCU synchronization to make sure all the buffers were synchronized. The function ring_buffer_read_start() originally would initialize the iterator and call an RCU synchronization, but this was done for each individual per CPU buffer, so it would get called many times on a machine with many CPUs before the trace file could be read. The commit 72c9ddfd4c5bf ("ring-buffer: Make non-consuming read less expensive with lots of cpus.") separated ring_buffer_read_start() into ring_buffer_read_prepare(), ring_buffer_read_prepare_sync() and then ring_buffer_read_start() to allow each of the per CPU buffers to be prepared, call ring_buffer_read_prepare_sync() once, and then call ring_buffer_read_start() for each of the CPUs, which made things much faster. The commit 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") removed the requirement of disabling the recording of the ring buffer in order to iterate it, but it did not remove the synchronization that was required to wait for all the buffers to have no more writers. It's now OK for the buffers to have writers and no synchronization is needed. Remove the synchronization and restore the ring buffer iterator interface to what it was before commit 72c9ddfd4c5bf was applied. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250630180440.3eabb514@batman.local.home Reported-by: David Howells <dhowells@redhat.com> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-22genirq: Teach handle_simple_irq() to resend an in-progress interruptMarc Zyngier
It appears that the defect outlined in 9c15eeb5362c4 ("genirq: Allow fasteoi handler to resend interrupts on concurrent handling") also affects some other less stellar MSI controllers, this time using the handle_simple_irq() flow. Teach this flow about irqd_needs_resend_when_in_progress(). Given the invasive nature of this workaround, only this flow is updated. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250708173404.1278635-2-maz@kernel.org
2025-07-22genirq: Prevent migration live lock in handle_edge_irq()Thomas Gleixner
Yicong reported and Liangyan debugged a live lock in handle_edge_irq() related to interrupt migration. If the interrupt affinity is moved to a new target CPU and the interrupt is currently handled on the previous target CPU for edge type interrupts the handler might get stuck on the previous target:

  CPU 0 (previous target)        CPU 1 (new target)

  handle_edge_irq()
 repeat:
  handle_event()                 handle_edge_irq()
                                   if (INPROGRESS) {
                                     set(PENDING);
                                     mask();
                                     return;
                                   }
  if (PENDING) {
    clear(PENDING);
    unmask();
    goto repeat;
  }

The migration in software never completes and CPU0 continues to handle the pending events forever. This happens when the device raises interrupts at a high rate, always before handle_event() completes and before the CPU0 handler can clear INPROGRESS, so that CPU1 sets the PENDING flag over and over. This has been observed in virtual machines.

Prevent this by checking whether the CPU which observes the INPROGRESS flag is the new affinity target. If that's the case, do not set the PENDING flag and wait for the INPROGRESS flag to be cleared instead, so that the new interrupt is handled on the new target CPU and the previous CPU is released from the action. This is restricted to the edge type handler and only utilized on systems which use single CPU targets for interrupt affinity.

Reported-by: Yicong Shen <shenyicong.1023@bytedance.com> Reported-by: Liangyan <liangyan.peng@bytedance.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liangyan <liangyan.peng@bytedance.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/all/20250701163558.2588435-1-liangyan.peng@bytedance.com Link: https://lore.kernel.org/all/20250718185312.076515034@linutronix.de
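The decision described above can be sketched as a small pure function. This is an illustration only: the enum, the function name edge_decide() and its parameters are hypothetical, and the sketch leaves out locking, masking and the restriction to single CPU affinity targets.

```c
#include <stdbool.h>

/* What the CPU that received the edge interrupt should do next. */
enum edge_action {
	EDGE_HANDLE,		/* nobody else is handling it: run the handler here */
	EDGE_MARK_PENDING,	/* leave the event to the CPU already handling it */
	EDGE_WAIT_INPROGRESS,	/* new target: wait for the old CPU, then handle here */
};

/*
 * inprogress: another CPU is currently running the handler
 * this_cpu:   the CPU that observed the interrupt
 * target_cpu: the (single) CPU the affinity now points to
 */
static enum edge_action edge_decide(bool inprogress, int this_cpu, int target_cpu)
{
	if (!inprogress)
		return EDGE_HANDLE;
	/* The new target must not push work back to the previous CPU. */
	if (this_cpu == target_cpu)
		return EDGE_WAIT_INPROGRESS;
	return EDGE_MARK_PENDING;
}
```

In the scenario from the commit message, CPU 1 is the new target and sees INPROGRESS, so it waits instead of setting PENDING, and CPU 0 is no longer fed new events in its repeat loop.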
2025-07-22genirq: Split up irq_pm_check_wakeup()Thomas Gleixner
Let the calling code check for the IRQD_WAKEUP_ARMED flag to prepare for a live lock mitigation in the edge type handler. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liangyan <liangyan.peng@bytedance.com> Link: https://lore.kernel.org/all/20250718185312.012392426@linutronix.de
2025-07-22genirq: Move irq_wait_for_poll() to call siteThomas Gleixner
Move it to the call site so that the waiting for the INPROGRESS flag can be reused by an upcoming mitigation for a potential live lock in the edge type handler. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liangyan <liangyan.peng@bytedance.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/all/20250718185311.948555026@linutronix.de
2025-07-22genirq: Remove pointless local variableThomas Gleixner
The variable is only used at one place, which can simply take the constant as function argument. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liangyan <liangyan.peng@bytedance.com> Link: https://lore.kernel.org/all/20250718185311.884314473@linutronix.de
2025-07-22timekeeping: Zero initialize system_counterval when querying time from phc driversMarkus Blöchl
Most drivers only populate the fields cycles and cs_id of system_counterval in their get_time_fn() callback for get_device_system_crosststamp(), unless they explicitly provide nanosecond values. When the use_nsecs field was added to struct system_counterval, most drivers did not care. Clock sources other than CSID_GENERIC could then get converted in convert_base_to_cs() based on an uninitialized use_nsecs field, which usually results in -EINVAL during the following range check. Pass in a fully zero initialized system_counterval_t to cure that. Fixes: 6b2e29977518 ("timekeeping: Provide infrastructure for converting to/from a base clock") Signed-off-by: Markus Blöchl <markus@blochl.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250720-timekeeping_uninit_crossts-v2-1-f513c885b7c2@blochl.de
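A minimal sketch of why zero initialization on the caller side helps. The struct is a simplified stand-in whose field names follow the commit message; its layout, the callback sketch_get_time() and the values it writes are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the counter value passed to driver callbacks. */
struct counterval_sketch {
	uint64_t cycles;
	int cs_id;
	bool use_nsecs;
};

/* Driver-style callback that, like most drivers, fills only two fields. */
static int sketch_get_time(struct counterval_sketch *out)
{
	out->cycles = 1234;
	out->cs_id = 1;
	return 0;
}

/* Caller: zero initialize first, so untouched fields have a defined value. */
static struct counterval_sketch query_counter(void)
{
	struct counterval_sketch val = { 0 };	/* use_nsecs starts out false */

	sketch_get_time(&val);
	return val;
}
```

Without the initializer, use_nsecs would hold whatever happened to be on the stack, and any later code branching on it would behave unpredictably.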
2025-07-22rcu: Document concurrent quiescent state reporting for offline CPUsJoel Fernandes
The synchronization of CPU offlining with GP initialization is confusing to put it mildly (rightfully so as the issue it deals with is complex). Recent discussions brought up a question -- what prevents the rcu_implicit_dyntick_qs() from warning about QS reports for offline CPUs (missing QS reports for offline CPUs causing indefinite hangs).

QS reporting for now-offline CPUs should only happen from:
 - gp_init()
 - rcutree_cpu_report_dead()

Add some documentation on this and refer to it from comments in the code explaining how QS reporting is not missed when these functions are concurrently running. I referred heavily to this post [1] about the need for the ofl_lock.

[1] https://lore.kernel.org/all/20180924164443.GF4222@linux.ibm.com/

[ Applied paulmck feedback on moving documentation to Requirements.rst ]

Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/ Co-developed-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>