Age | Commit message (Collapse) | Author |
|
Print lazy preemption model in ftrace header when latency-format=1.
# cat /sys/kernel/debug/sched/preempt
none voluntary full (lazy)
Without patch:
latency: 0 us, #232946/232946, CPU#40 | (M:unknown VP:0, KP:0, SP:0 HP:0 #P:80)
^^^^^^^
With Patch:
latency: 0 us, #1897938/25566788, CPU#16 | (M:lazy VP:0, KP:0, SP:0 HP:0 #P:80)
^^^^
Now that lazy preemption is part of the kernel, make sure the tracing
infrastructure reflects that.
Link: https://lore.kernel.org/20250103093647.575919-1-sshegde@linux.ibm.com
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The function graph tracer has become generic so that kretprobes and BPF
can use it along with function graph tracing itself. Some of the
infrastructure was specific for function graph tracing such as recording
the calltime and return time of the functions. Calling the clock code on a
high volume function does add overhead. The calculation of the calltime
was removed from the generic code and placed into the function graph
tracer itself so that the other users did not incur this overhead as they
did not need that timestamp.
The calltime field was still kept in the generic return entry structure
and the function graph return entry callback filled it as that structure
was passed to other code.
But this broke both irqsoff and wakeup latency tracer as they still
depended on the trace structure containing the calltime when the option
display-graph is set as it used some of those same functions that the
function graph tracer used. But now the calltime was not set and was just
zero. This caused the calculation of the function time to be the absolute
value of the return timestamp and not the length of the function.
# cd /sys/kernel/tracing
# echo 1 > options/display-graph
# echo irqsoff > current_tracer
The tracers went from:
# REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
# | | | | |||| | | | | | |
0 us | 4) <idle>-0 | d..1. | 0.000 us | irqentry_enter();
3 us | 4) <idle>-0 | d..2. | | irq_enter_rcu() {
4 us | 4) <idle>-0 | d..2. | 0.431 us | preempt_count_add();
5 us | 4) <idle>-0 | d.h2. | | tick_irq_enter() {
5 us | 4) <idle>-0 | d.h2. | 0.433 us | tick_check_oneshot_broadcast_this_cpu();
6 us | 4) <idle>-0 | d.h2. | 2.426 us | ktime_get();
9 us | 4) <idle>-0 | d.h2. | | tick_nohz_stop_idle() {
10 us | 4) <idle>-0 | d.h2. | 0.398 us | nr_iowait_cpu();
11 us | 4) <idle>-0 | d.h1. | 1.903 us | }
11 us | 4) <idle>-0 | d.h2. | | tick_do_update_jiffies64() {
12 us | 4) <idle>-0 | d.h2. | | _raw_spin_lock() {
12 us | 4) <idle>-0 | d.h2. | 0.360 us | preempt_count_add();
13 us | 4) <idle>-0 | d.h3. | 0.354 us | do_raw_spin_lock();
14 us | 4) <idle>-0 | d.h2. | 2.207 us | }
15 us | 4) <idle>-0 | d.h3. | 0.428 us | calc_global_load();
16 us | 4) <idle>-0 | d.h3. | | _raw_spin_unlock() {
16 us | 4) <idle>-0 | d.h3. | 0.380 us | do_raw_spin_unlock();
17 us | 4) <idle>-0 | d.h3. | 0.334 us | preempt_count_sub();
18 us | 4) <idle>-0 | d.h1. | 1.768 us | }
18 us | 4) <idle>-0 | d.h2. | | update_wall_time() {
[..]
To:
# REL TIME CPU TASK/PID |||| DURATION FUNCTION CALLS
# | | | | |||| | | | | | |
0 us | 5) <idle>-0 | d.s2. | 0.000 us | _raw_spin_lock_irqsave();
0 us | 5) <idle>-0 | d.s3. | 312159583 us | preempt_count_add();
2 us | 5) <idle>-0 | d.s4. | 312159585 us | do_raw_spin_lock();
3 us | 5) <idle>-0 | d.s4. | | _raw_spin_unlock() {
3 us | 5) <idle>-0 | d.s4. | 312159586 us | do_raw_spin_unlock();
4 us | 5) <idle>-0 | d.s4. | 312159587 us | preempt_count_sub();
4 us | 5) <idle>-0 | d.s2. | 312159587 us | }
5 us | 5) <idle>-0 | d.s3. | | _raw_spin_lock() {
5 us | 5) <idle>-0 | d.s3. | 312159588 us | preempt_count_add();
6 us | 5) <idle>-0 | d.s4. | 312159589 us | do_raw_spin_lock();
7 us | 5) <idle>-0 | d.s3. | 312159590 us | }
8 us | 5) <idle>-0 | d.s4. | 312159591 us | calc_wheel_index();
9 us | 5) <idle>-0 | d.s4. | | enqueue_timer() {
9 us | 5) <idle>-0 | d.s4. | | wake_up_nohz_cpu() {
11 us | 5) <idle>-0 | d.s4. | | native_smp_send_reschedule() {
11 us | 5) <idle>-0 | d.s4. | 312171987 us | default_send_IPI_single_phys();
12408 us | 5) <idle>-0 | d.s3. | 312171990 us | }
12408 us | 5) <idle>-0 | d.s3. | 312171991 us | }
12409 us | 5) <idle>-0 | d.s3. | 312171991 us | }
Where the calculation of the time for each function was the return time
minus zero and not the time of when the function returned.
Have these tracers also save the calltime in the fgraph data section and
retrieve it again on the return to get the correct timings again.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250113183124.61767419@gandalf.local.home
Fixes: f1f36e22bee9 ("ftrace: Have calltime be saved in the fgraph storage")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In __trace_kprobe_create(), if something fails it must goto error block
to free objects. But when strdup() a symbol, it returns without that.
Fix it to goto the error block to free objects correctly.
Link: https://lore.kernel.org/all/173643297743.1514810.2408159540454241947.stgit@devnote2/
Fixes: 6212dd29683e ("tracing/kprobes: Use dyn_event framework for kprobe events")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:
- Add needed READ_ONCE() around access to the fgraph array element
The updates to the fgraph array can happen when callbacks are
registered and unregistered. The __ftrace_return_to_handler() can
handle reading either the old value or the new value. But once it
reads that value it must stay consistent otherwise the check that
looks to see if the value is a stub may show false, but if the
compiler decides to re-read after that check, it can be true which
can cause the code to crash later on.
- Make function profiler use the top level ops for filtering again
When function graph became available for instances, its filter ops
became independent from the top level set_ftrace_filter. In the
process the function profiler received its own filter ops as well.
But the function profiler uses the top level set_ftrace_filter file
and does not have one of its own. In giving it its own filter ops, it
lost any user interface it once had. Make it use the top level
set_ftrace_filter file again. This fixes a regression.
* tag 'ftrace-v6.13-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Fix function profiler's filtering functionality
fgraph: Add READ_ONCE() when accessing fgraph_array[]
|
|
Commit c132be2c4fcc ("function_graph: Have the instances use their own
ftrace_ops for filtering"), function profiler (enabled via
function_profile_enabled) has been showing statistics for all functions,
ignoring set_ftrace_filter settings.
While tracers are instantiated, the function profiler is not. Therefore, it
should use the global set_ftrace_filter for consistency. This patch
modifies the function profiler to use the global filter, fixing the
filtering functionality.
Before (filtering not working):
```
root@localhost:~# echo 'vfs*' > /sys/kernel/tracing/set_ftrace_filter
root@localhost:~# echo 1 > /sys/kernel/tracing/function_profile_enabled
root@localhost:~# sleep 1
root@localhost:~# echo 0 > /sys/kernel/tracing/function_profile_enabled
root@localhost:~# head /sys/kernel/tracing/trace_stat/*
Function Hit Time Avg
s^2
-------- --- ---- ---
---
schedule 314 22290594 us 70989.15 us
40372231 us
x64_sys_call 1527 8762510 us 5738.382 us
3414354 us
schedule_hrtimeout_range 176 8665356 us 49234.98 us
405618876 us
__x64_sys_ppoll 324 5656635 us 17458.75 us
19203976 us
do_sys_poll 324 5653747 us 17449.83 us
19214945 us
schedule_timeout 67 5531396 us 82558.15 us
2136740827 us
__x64_sys_pselect6 12 3029540 us 252461.7 us
63296940171 us
do_pselect.constprop.0 12 3029532 us 252461.0 us
63296952931 us
```
After (filtering working):
```
root@localhost:~# echo 'vfs*' > /sys/kernel/tracing/set_ftrace_filter
root@localhost:~# echo 1 > /sys/kernel/tracing/function_profile_enabled
root@localhost:~# sleep 1
root@localhost:~# echo 0 > /sys/kernel/tracing/function_profile_enabled
root@localhost:~# head /sys/kernel/tracing/trace_stat/*
Function Hit Time Avg
s^2
-------- --- ---- ---
---
vfs_write 462 68476.43 us 148.217 us
25874.48 us
vfs_read 641 9611.356 us 14.994 us
28868.07 us
vfs_fstat 890 878.094 us 0.986 us
1.667 us
vfs_fstatat 227 757.176 us 3.335 us
18.928 us
vfs_statx 226 610.610 us 2.701 us
17.749 us
vfs_getattr_nosec 1187 460.919 us 0.388 us
0.326 us
vfs_statx_path 297 343.287 us 1.155 us
11.116 us
vfs_rename 6 291.575 us 48.595 us
9889.236 us
```
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250101190820.72534-1-enjuk@amazon.com
Fixes: c132be2c4fcc ("function_graph: Have the instances use their own ftrace_ops for filtering")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In __ftrace_return_to_handler(), a loop iterates over the fgraph_array[]
elements, which are fgraph_ops. The loop checks if an element is a
fgraph_stub to prevent using a fgraph_stub afterward.
However, if the compiler reloads fgraph_array[] after this check, it might
race with an update to fgraph_array[] that introduces a fgraph_stub. This
could result in the stub being processed, but the stub contains a null
"func_hash" field, leading to a NULL pointer dereference.
To ensure that the gops compared against the fgraph_stub matches the gops
processed later, add a READ_ONCE(). A similar patch appears in commit
63a8dfb ("function_graph: Add READ_ONCE() when accessing fgraph_array[]").
Cc: stable@vger.kernel.org
Fixes: 37238abe3cb47 ("ftrace/function_graph: Pass fgraph_ops to function graph callbacks")
Link: https://lore.kernel.org/20241231113731.277668-1-zilin@seu.edu.cn
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In order to catch a common bug where a TRACE_EVENT() TP_fast_assign()
assigns an address of an allocated string to the ring buffer and then
references it in TP_printk(), which can be executed hours later when the
string is free, the function test_event_printk() runs on all events as
they are registered to make sure there's no unwanted dereferencing.
It calls process_string() to handle cases in TP_printk() format that has
"%s". It returns whether or not the string is safe. But it can have some
false positives.
For instance, xe_bo_move() has:
TP_printk("move_lacks_source:%s, migrate object %p [size %zu] from %s to %s device_id:%s",
__entry->move_lacks_source ? "yes" : "no", __entry->bo, __entry->size,
xe_mem_type_to_name[__entry->old_placement],
xe_mem_type_to_name[__entry->new_placement], __get_str(device_id))
Where the "%s" references into xe_mem_type_to_name[]. This is an array of
pointers that should be safe for the event to access. Instead of flagging
this as a bad reference, if a reference points to an array, where the
record field is the index, consider it safe.
Link: https://lore.kernel.org/all/9dee19b6185d325d0e6fa5f7cbba81d007d99166.camel@sapience.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241231000646.324fb5f7@gandalf.local.home
Fixes: 65a25d9f7ac02 ("tracing: Add "%s" check in test_event_printk()")
Reported-by: Genes Lists <lists@sapience.com>
Tested-by: Gene C <arch@sapience.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
"Change the priority of the module callback of kprobe events so that it
is called after the jump label list on the module is updated.
This ensures the kprobe can check whether it is not on the jump label
address correctly"
* tag 'probes-fixes-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/kprobe: Make trace_kprobe's module callback called after jump_label update
|
|
If a large count is provided, it will trigger a warning in bitmap_parse_user.
Also check zero for it.
Cc: stable@vger.kernel.org
Fixes: 9e01c1b74c953 ("cpumask: convert kernel trace functions")
Link: https://lore.kernel.org/20241216073238.2573704-1-lizhi.xu@windriver.com
Reported-by: syzbot+0aecfd34fb878546f3fd@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0aecfd34fb878546f3fd
Tested-by: syzbot+0aecfd34fb878546f3fd@syzkaller.appspotmail.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
update
Make sure the trace_kprobe's module notifer callback function is called
after jump_label's callback is called. Since the trace_kprobe's callback
eventually checks jump_label address during registering new kprobe on
the loading module, jump_label must be updated before this registration
happens.
Link: https://lore.kernel.org/all/173387585556.995044.3157941002975446119.stgit@devnote2/
Fixes: 614243181050 ("tracing/kprobes: Support module init function probing")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer fixes from Steven Rostedt:
- Fix possible overflow of mmapped ring buffer with bad offset
If the mmap() to the ring buffer passes in a start address that is
passed the end of the mmapped file, it is not caught and a
slab-out-of-bounds is triggered.
Add a check to make sure the start address is within the bounds
- Do not use TP_printk() to boot mapped ring buffers
As a boot mapped ring buffer's data may have pointers that map to the
previous boot's memory map, it is unsafe to allow the TP_printk() to
be used to read the boot mapped buffer's events. If a TP_printk()
points to a static string from within the kernel it will not match
the current kernel mapping if KASLR is active, and it can fault.
Have it simply print out the raw fields.
* tag 'trace-ringbuffer-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
trace/ring-buffer: Do not use TP_printk() formatting for boot mapped buffers
ring-buffer: Fix overflow in __rb_map_vma
|
|
The TP_printk() of a TRACE_EVENT() is a generic printf format that any
developer can create for their event. It may include pointers to strings
and such. A boot mapped buffer may contain data from a previous kernel
where the strings addresses are different.
One solution is to copy the event content and update the pointers by the
recorded delta, but a simpler solution (for now) is to just use the
print_fields() function to print these events. The print_fields() function
just iterates the fields and prints them according to what type they are,
and ignores the TP_printk() format from the event itself.
To understand the difference, when printing via TP_printk() the output
looks like this:
4582.696626: kmem_cache_alloc: call_site=getname_flags+0x47/0x1f0 ptr=00000000e70e10e0 bytes_req=4096 bytes_alloc=4096 gfp_flags=GFP_KERNEL node=-1 accounted=false
4582.696629: kmem_cache_alloc: call_site=alloc_empty_file+0x6b/0x110 ptr=0000000095808002 bytes_req=360 bytes_alloc=384 gfp_flags=GFP_KERNEL node=-1 accounted=false
4582.696630: kmem_cache_alloc: call_site=security_file_alloc+0x24/0x100 ptr=00000000576339c3 bytes_req=16 bytes_alloc=16 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false
4582.696653: kmem_cache_free: call_site=do_sys_openat2+0xa7/0xd0 ptr=00000000e70e10e0 name=names_cache
But when printing via print_fields() (echo 1 > /sys/kernel/tracing/options/fields)
the same event output looks like this:
4582.696626: kmem_cache_alloc: call_site=0xffffffff92d10d97 (-1831793257) ptr=0xffff9e0e8571e000 (-107689771147264) bytes_req=0x1000 (4096) bytes_alloc=0x1000 (4096) gfp_flags=0xcc0 (3264) node=0xffffffff (-1) accounted=(0)
4582.696629: kmem_cache_alloc: call_site=0xffffffff92d0250b (-1831852789) ptr=0xffff9e0e8577f800 (-107689770747904) bytes_req=0x168 (360) bytes_alloc=0x180 (384) gfp_flags=0xcc0 (3264) node=0xffffffff (-1) accounted=(0)
4582.696630: kmem_cache_alloc: call_site=0xffffffff92efca74 (-1829778828) ptr=0xffff9e0e8d35d3b0 (-107689640864848) bytes_req=0x10 (16) bytes_alloc=0x10 (16) gfp_flags=0xdc0 (3520) node=0xffffffff (-1) accounted=(0)
4582.696653: kmem_cache_free: call_site=0xffffffff92cfbea7 (-1831879001) ptr=0xffff9e0e8571e000 (-107689771147264) name=names_cache
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241218141507.28389a1d@gandalf.local.home
Fixes: 07714b4bb3f98 ("tracing: Handle old buffer mappings for event strings and functions")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
An overflow occurred when performing the following calculation:
nr_pages = ((nr_subbufs + 1) << subbuf_order) - pgoff;
Add a check before the calculation to avoid this problem.
syzbot reported this as a slab-out-of-bounds in __rb_map_vma:
BUG: KASAN: slab-out-of-bounds in __rb_map_vma+0x9ab/0xae0 kernel/trace/ring_buffer.c:7058
Read of size 8 at addr ffff8880767dd2b8 by task syz-executor187/5836
CPU: 0 UID: 0 PID: 5836 Comm: syz-executor187 Not tainted 6.13.0-rc2-syzkaller-00159-gf932fb9b4074 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/25/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:489
kasan_report+0xd9/0x110 mm/kasan/report.c:602
__rb_map_vma+0x9ab/0xae0 kernel/trace/ring_buffer.c:7058
ring_buffer_map+0x56e/0x9b0 kernel/trace/ring_buffer.c:7138
tracing_buffers_mmap+0xa6/0x120 kernel/trace/trace.c:8482
call_mmap include/linux/fs.h:2183 [inline]
mmap_file mm/internal.h:124 [inline]
__mmap_new_file_vma mm/vma.c:2291 [inline]
__mmap_new_vma mm/vma.c:2355 [inline]
__mmap_region+0x1786/0x2670 mm/vma.c:2456
mmap_region+0x127/0x320 mm/mmap.c:1348
do_mmap+0xc00/0xfc0 mm/mmap.c:496
vm_mmap_pgoff+0x1ba/0x360 mm/util.c:580
ksys_mmap_pgoff+0x32c/0x5c0 mm/mmap.c:542
__do_sys_mmap arch/x86/kernel/sys_x86_64.c:89 [inline]
__se_sys_mmap arch/x86/kernel/sys_x86_64.c:82 [inline]
__x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:82
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The reproducer for this bug is:
------------------------8<-------------------------
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <asm/types.h>
#include <sys/mman.h>
int main(int argc, char **argv)
{
int page_size = getpagesize();
int fd;
void *meta;
system("echo 1 > /sys/kernel/tracing/buffer_size_kb");
fd = open("/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw", O_RDONLY);
meta = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, page_size * 5);
}
------------------------>8-------------------------
Cc: stable@vger.kernel.org
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions")
Link: https://lore.kernel.org/tencent_06924B6674ED771167C23CC336C097223609@qq.com
Reported-by: syzbot+345e4443a21200874b18@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=345e4443a21200874b18
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Replace trace_check_vprintf() with test_event_printk() and
ignore_event()
The function test_event_printk() checks on boot up if the trace event
printf() formats dereference any pointers, and if they do, it then
looks at the arguments to make sure that the pointers they dereference
will exist in the event on the ring buffer. If they do not, it issues
a WARN_ON() as it is a likely bug.
But this isn't the case for the strings that can be dereferenced with
"%s", as some trace events (notably RCU and some IPI events) save a
pointer to a static string in the ring buffer. As the string it points
to lives as long as the kernel is running, it is not a bug to
reference it, as it is guaranteed to be there when the event is read.
But it is also possible (and a common bug) to point to some allocated
string that could be freed before the trace event is read and the
dereference is to bad memory. This case requires a run time check.
The previous way to handle this was with trace_check_vprintf() that
would process the printf format piece by piece and send what it didn't
care about to vsnprintf() to handle arguments that were not strings.
This kept it from having to reimplement vsnprintf(). But it relied on
va_list implementation and for architectures that copied the va_list
and did not pass it by reference, it wasn't even possible to do this
check and it would be skipped. As 64bit x86 passed va_list by
reference, most events were tested and this kept out bugs where
strings would have been dereferenced after being freed.
Instead of relying on the implementation of va_list, extend the boot
up test_event_printk() function to validate all the "%s" strings that
can be validated at boot, and for the few events that point to strings
outside the ring buffer, flag both the event and the field that is
dereferenced as "needs_test". Then before the event is printed, a call
to ignore_event() is made, and if the event has the flag set, it
iterates all its fields and for every field that is to be tested, it
will read the pointer directly from the event in the ring buffer and
make sure that it is valid. If the pointer is not valid, it will print
a WARN_ON(), print out to the trace that the event has unsafe memory
and ignore the print format.
With this new update, the trace_check_vprintf() can be safely removed
and now all events can be verified regardless of architecture"
* tag 'trace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Check "%s" dereference via the field and not the TP_printk format
tracing: Add "%s" check in test_event_printk()
tracing: Add missing helper functions in event pointer dereference check
tracing: Fix test_event_printk() to process entire print argument
|
|
The TP_printk() portion of a trace event is executed at the time a event
is read from the trace. This can happen seconds, minutes, hours, days,
months, years possibly later since the event was recorded. If the print
format contains a dereference to a string via "%s", and that string was
allocated, there's a chance that string could be freed before it is read
by the trace file.
To protect against such bugs, there are two functions that verify the
event. The first one is test_event_printk(), which is called when the
event is created. It reads the TP_printk() format as well as its arguments
to make sure nothing may be dereferencing a pointer that was not copied
into the ring buffer along with the event. If it is, it will trigger a
WARN_ON().
For strings that use "%s", it is not so easy. The string may not reside in
the ring buffer but may still be valid. Strings that are static and part
of the kernel proper which will not be freed for the life of the running
system, are safe to dereference. But to know if it is a pointer to a
static string or to something on the heap can not be determined until the
event is triggered.
This brings us to the second function that tests for the bad dereferencing
of strings, trace_check_vprintf(). It would walk through the printf format
looking for "%s", and when it finds it, it would validate that the pointer
is safe to read. If not, it would produces a WARN_ON() as well and write
into the ring buffer "[UNSAFE-MEMORY]".
The problem with this is how it used va_list to have vsnprintf() handle
all the cases that it didn't need to check. Instead of re-implementing
vsnprintf(), it would make a copy of the format up to the %s part, and
call vsnprintf() with the current va_list ap variable, where the ap would
then be ready to point at the string in question.
For architectures that passed va_list by reference this was possible. For
architectures that passed it by copy it was not. A test_can_verify()
function was used to differentiate between the two, and if it wasn't
possible, it would disable it.
Even for architectures where this was feasible, it was a stretch to rely
on such a method that is undocumented, and could cause issues later on
with new optimizations of the compiler.
Instead, the first function test_event_printk() was updated to look at
"%s" as well. If the "%s" argument is a pointer outside the event in the
ring buffer, it would find the field type of the event that is the problem
and mark the structure with a new flag called "needs_test". The event
itself will be marked by TRACE_EVENT_FL_TEST_STR to let it be known that
this event has a field that needs to be verified before the event can be
printed using the printf format.
When the event fields are created from the field type structure, the
fields would copy the field type's "needs_test" value.
Finally, before being printed, a new function ignore_event() is called
which will check if the event has the TEST_STR flag set (if not, it
returns false). If the flag is set, it then iterates through the events
fields looking for the ones that have the "needs_test" flag set.
Then it uses the offset field from the field structure to find the pointer
in the ring buffer event. It runs the tests to make sure that pointer is
safe to print and if not, it triggers the WARN_ON() and also adds to the
trace output that the event in question has an unsafe memory access.
The ignore_event() makes the trace_check_vprintf() obsolete so it is
removed.
Link: https://lore.kernel.org/all/CAHk-=wh3uOnqnZPpR0PeLZZtyWbZLboZ7cHLCKRWsocvs9Y7hQ@mail.gmail.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.848621576@goodmis.org
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The test_event_printk() code makes sure that when a trace event is
registered, any dereferenced pointers in from the event's TP_printk() are
pointing to content in the ring buffer. But currently it does not handle
"%s", as there's cases where the string pointer saved in the ring buffer
points to a static string in the kernel that will never be freed. As that
is a valid case, the pointer needs to be checked at runtime.
Currently the runtime check is done via trace_check_vprintf(), but to not
have to replicate everything in vsnprintf() it does some logic with the
va_list that may not be reliable across architectures. In order to get rid
of that logic, more work in the test_event_printk() needs to be done. Some
of the strings can be validated at this time when it is obvious the string
is valid because the string will be saved in the ring buffer content.
Do all the validation of strings in the ring buffer at boot in
test_event_printk(), and make sure that the field of the strings that
point into the kernel are accessible. This will allow adding checks at
runtime that will validate the fields themselves and not rely on paring
the TP_printk() format at runtime.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.685917008@goodmis.org
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The process_pointer() helper function looks to see if various trace event
macros are used. These macros are for storing data in the event. This
makes it safe to dereference as the dereference will then point into the
event on the ring buffer where the content of the data stays with the
event itself.
A few helper functions were missing. Those were:
__get_rel_dynamic_array()
__get_dynamic_array_len()
__get_rel_dynamic_array_len()
__get_rel_sockaddr()
Also add a helper function find_print_string() to not need to use a middle
man variable to test if the string exists.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.521836792@goodmis.org
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The test_event_printk() analyzes print formats of trace events looking for
cases where it may dereference a pointer that is not in the ring buffer
which can possibly be a bug when the trace event is read from the ring
buffer and the content of that pointer no longer exists.
The function needs to accurately go from one print format argument to the
next. It handles quotes and parenthesis that may be included in an
argument. When it finds the start of the next argument, it uses a simple
"c = strstr(fmt + i, ',')" to find the end of that argument!
In order to include "%s" dereferencing, it needs to process the entire
content of the print format argument and not just the content of the first
',' it finds. As there may be content like:
({ const char *saved_ptr = trace_seq_buffer_ptr(p); static const char
*access_str[] = { "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux"
}; union kvm_mmu_page_role role; role.word = REC->role;
trace_seq_printf(p, "sp gen %u gfn %llx l%u %u-byte q%u%s %s%s" " %snxe
%sad root %u %s%c", REC->mmu_valid_gen, REC->gfn, role.level,
role.has_4_byte_gpte ? 4 : 8, role.quadrant, role.direct ? " direct" : "",
access_str[role.access], role.invalid ? " invalid" : "", role.efer_nx ? ""
: "!", role.ad_disabled ? "!" : "", REC->root_count, REC->unsync ?
"unsync" : "sync", 0); saved_ptr; })
Which is an example of a full argument of an existing event. As the code
already handles finding the next print format argument, process the
argument at the end of it and not the start of it. This way it has both
the start of the argument as well as the end of it.
Add a helper function "process_pointer()" that will do the processing during
the loop as well as at the end. It also makes the code cleaner and easier
to read.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.362271189@goodmis.org
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When function tracing and function graph tracing are both enabled (in
different instances) the "parent" of some of the function tracing events
is "return_to_handler" which is the trampoline used by function graph
tracing. To fix this, ftrace_get_true_parent_ip() was introduced that
returns the "true" parent ip instead of the trampoline.
To do this, the ftrace_regs_get_stack_pointer() is used, which uses
kernel_stack_pointer(). The problem is that microblaze does not implement
kerenl_stack_pointer() so when function graph tracing is enabled, the
build fails. But microblaze also does not enabled HAVE_DYNAMIC_FTRACE_WITH_ARGS.
That option has to be enabled by the architecture to reliably get the
values from the fregs parameter passed in. When that config is not set,
the architecture can also pass in NULL, which is not tested for in that
function and could cause the kernel to crash.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jeff Xie <jeff.xie@linux.dev>
Link: https://lore.kernel.org/20241216164633.6df18e87@gandalf.local.home
Fixes: 60b1f578b578 ("ftrace: Get the true parent ip for function tracer")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
A bug was discovered where the idle shadow stacks were not initialized
for offline CPUs when starting function graph tracer, and when they came
online they were not traced due to the missing shadow stack. To fix
this, the idle task shadow stack initialization was moved to using the
CPU hotplug callbacks. But it removed the initialization when the
function graph was enabled. The problem here is that the hotplug
callbacks are called when the CPUs come online, but the idle shadow
stack initialization only happens if function graph is currently
active. This caused the online CPUs to not get their shadow stack
initialized.
The idle shadow stack initialization still needs to be done when the
function graph is registered, as they will not be allocated if function
graph is not registered.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241211135335.094ba282@batman.local.home
Fixes: 2c02f7375e65 ("fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks")
Reported-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Closes: https://lore.kernel.org/all/CACRpkdaTBrHwRbbrphVy-=SeDz6MSsXhTKypOtLrTQ+DgGAOcQ@mail.gmail.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Pull bpf fixes from Daniel Borkmann:
- Fix a bug in the BPF verifier to track changes to packet data
property for global functions (Eduard Zingerman)
- Fix a theoretical BPF prog_array use-after-free in RCU handling of
__uprobe_perf_func (Jann Horn)
- Fix BPF tracing to have an explicit list of tracepoints and their
arguments which need to be annotated as PTR_MAYBE_NULL (Kumar
Kartikeya Dwivedi)
- Fix a logic bug in the bpf_remove_insns code where a potential error
would have been wrongly propagated (Anton Protopopov)
- Avoid deadlock scenarios caused by nested kprobe and fentry BPF
programs (Priya Bala Govindasamy)
- Fix a bug in BPF verifier which was missing a size check for
BTF-based context access (Kumar Kartikeya Dwivedi)
- Fix a crash found by syzbot through an invalid BPF prog_array access
in perf_event_detach_bpf_prog (Jiri Olsa)
- Fix several BPF sockmap bugs including a race causing a refcount
imbalance upon element replace (Michal Luczaj)
- Fix a use-after-free from mismatching BPF program/attachment RCU
flavors (Jann Horn)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
selftests/bpf: Add tests for raw_tp NULL args
bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
selftests/bpf: Add test for narrow ctx load for pointer args
bpf: Check size for BTF-based ctx access of pointer members
selftests/bpf: extend changes_pkt_data with cases w/o subprograms
bpf: fix null dereference when computing changes_pkt_data of prog w/o subprogs
bpf: Fix theoretical prog_array UAF in __uprobe_perf_func()
bpf: fix potential error return
selftests/bpf: validate that tail call invalidates packet pointers
bpf: consider that tail calls invalidate packet pointers
selftests/bpf: freplace tests for tracking of changes_packet_data
bpf: check changes_pkt_data property for extension programs
selftests/bpf: test for changing packet data from global functions
bpf: track changes_pkt_data property for global functions
bpf: refactor bpf_helper_changes_pkt_data to use helper number
bpf: add find_containing_subprog() utility function
bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog
bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull eprobes fix from Masami Hiramatsu:
- release eprobe when failing to add dyn_event.
This unregisters event call and release eprobe when it fails to add a
dynamic event. Found in cleaning up.
* tag 'probes-fixes-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/eprobe: Fix to release eprobe when failed to add dyn_event
|
|
Currently, the pointer stored in call->prog_array is loaded in
__uprobe_perf_func(), with no RCU annotation and no immediately visible
RCU protection, so it looks as if the loaded pointer can immediately be
dangling.
Later, bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical
section, but this is too late. It then uses rcu_dereference_check(), but
this use of rcu_dereference_check() does not actually dereference anything.
Fix it by aligning the semantics to bpf_prog_run_array(): Let the caller
provide rcu_read_lock_trace() protection and then load call->prog_array
with rcu_dereference_check().
This issue seems to be theoretical: I don't know of any way to reach this
code without having handle_swbp() further up the stack, which is already
holding a rcu_read_lock_trace() lock, so where we take
rcu_read_lock_trace() in __uprobe_perf_func()/bpf_prog_run_array_uprobe()
doesn't actually have any effect.
Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241210-bpf-fix-uprobe-uaf-v4-1-5fc8959b2b74@google.com
|
|
Syzbot reported [1] crash that happens for following tracing scenario:
- create tracepoint perf event with attr.inherit=1, attach it to the
process and set bpf program to it
- attached process forks -> chid creates inherited event
the new child event shares the parent's bpf program and tp_event
(hence prog_array) which is global for tracepoint
- exit both process and its child -> release both events
- first perf_event_detach_bpf_prog call will release tp_event->prog_array
and second perf_event_detach_bpf_prog will crash, because
tp_event->prog_array is NULL
The fix makes sure the perf_event_detach_bpf_prog checks prog_array
is valid before it tries to remove the bpf program from it.
[1] https://lore.kernel.org/bpf/Z1MR6dCIKajNS6nU@krava/T/#m91dbf0688221ec7a7fc95e896a7ef9ff93b0b8ad
Fixes: 0ee288e69d03 ("bpf,perf: Fix perf_event_detach_bpf_prog error handling")
Reported-by: syzbot+2e0d2840414ce817aaac@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241208142507.1207698-1-jolsa@kernel.org
|
|
Uprobes always use bpf_prog_run_array_uprobe() under tasks-trace-RCU
protection. But it is possible to attach a non-sleepable BPF program to a
uprobe, and non-sleepable BPF programs are freed via normal RCU (see
__bpf_prog_put_noref()). This leads to UAF of the bpf_prog because a normal
RCU grace period does not imply a tasks-trace-RCU grace period.
Fix it by explicitly waiting for a tasks-trace-RCU grace period after
removing the attachment of a bpf_prog to a perf_event.
Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20241210-bpf-fix-actual-uprobe-uaf-v1-1-19439849dd44@google.com
|
|
Fix eprobe event to unregister event call and release eprobe when it fails
to add dynamic event correctly.
Link: https://lore.kernel.org/all/173289886698.73724.1959899350183686006.stgit@devnote2/
Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Tracepoints require having RCU "watching" as it uses RCU to do updates to
the tracepoints. There are some cases that would call a tracepoint when
RCU was not "watching". This was usually in the idle path where RCU has
"shutdown". For the few locations that had tracepoints without RCU
watching, there was an trace_*_rcuidle() variant that could be used. This
used SRCU for protection.
There are tracepoints that trace when interrupts and preemption are
enabled and disabled. In some architectures, these tracepoints are called
in a path where RCU is not watching. When x86 and arm64 removed these
locations, it was incorrectly assumed that it would be safe to remove the
trace_*_rcuidle() variant and also remove the SRCU logic, as it made the
code more complex and harder to implement new tracepoint features (like
faultable tracepoints and tracepoints in rust).
Instead of bringing back the trace_*_rcuidle(), as it will not be trivial
to do as new code has already been added depending on its removal, add a
workaround to the one file that still requires it (trace_preemptirq.c). If
the architecture does not define CONFIG_ARCH_WANTS_NO_INSTR, then check if
the code is in the idle path, and if so, call ct_irq_enter/exit() which
will enable RCU around the tracepoint.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20241204100414.4d3e06d0@gandalf.local.home
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: 48bcda684823 ("tracing: Remove definition of trace_*_rcuidle()")
Closes: https://lore.kernel.org/all/bddb02de-957a-4df5-8e77-829f55728ea2@roeck-us.net/
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The cmp_entries_dup() function used as the comparator for sort()
violated the symmetry and transitivity properties required by the
sorting algorithm. Specifically, it returned 1 whenever memcmp() was
non-zero, which broke the following expectations:
* Symmetry: If x < y, then y > x.
* Transitivity: If x < y and y < z, then x < z.
These violations could lead to incorrect sorting and failure to
correctly identify duplicate elements.
Fix the issue by directly returning the result of memcmp(), which
adheres to the required comparison properties.
Cc: stable@vger.kernel.org
Fixes: 08d43a5fa063 ("tracing: Add lock-free tracing_map")
Link: https://lore.kernel.org/20241203202228.1274403-1-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The scheduler added NEED_RESCHED_LAZY scheduling. Record this state as
part of trace flags and expose it in the need_resched field.
Record and expose NEED_RESCHED_LAZY.
[bigeasy: Commit description, documentation bits.]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241122202849.7DfYpJR0@linutronix.de
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Addition of faultable tracepoints
There's a tracepoint attached to both a system call entry and exit.
This location is known to allow page faults. The tracepoints are
called under an rcu_read_lock() which does not allow faults that can
sleep. This limits the ability of tracepoint handlers to page fault
in user space system call parameters. Now these tracepoints have been
made "faultable", allowing the callbacks to fault in user space
parameters and record them.
Note, only the infrastructure has been implemented. The consumers
(perf, ftrace, BPF) now need to have their code modified to allow
faults.
- Fix up of BPF code for the tracepoint faultable logic
- Update tracepoints to use the new static branch API
- Remove trace_*_rcuidle() variants and the SRCU protection they used
- Remove unused TRACE_EVENT_FL_FILTERED logic
- Replace strncpy() with strscpy() and memcpy()
- Use replace per_cpu_ptr(smp_processor_id()) with this_cpu_ptr()
- Fix perf events to not duplicate samples when tracing is enabled
- Replace atomic64_add_return(1, counter) with
atomic64_inc_return(counter)
- Make stack trace buffer 4K instead of PAGE_SIZE
- Remove TRACE_FLAG_IRQS_NOSUPPORT flag as it was never used
- Get the true return address for function tracer when function graph
tracer is also running.
When function_graph trace is running along with function tracer, the
parent function of the function tracer sometimes is
"return_to_handler", which is the function graph trampoline to record
the exit of the function. Use existing logic that calls into the
fgraph infrastructure to find the real return address.
- Remove (un)regfunc pointers out of tracepoint structure
- Added last minute bug fix for setting pending modules in stack
function filter.
echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter
Would cause a kernel NULL dereference.
- Minor clean ups
* tag 'trace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
ftrace: Fix regression with module command in stack_trace_filter
tracing: Fix function name for trampoline
ftrace: Get the true parent ip for function tracer
tracing: Remove redundant check on field->field in histograms
bpf: ensure RCU Tasks Trace GP for sleepable raw tracepoint BPF links
bpf: decouple BPF link/attach hook and BPF program sleepable semantics
bpf: put bpf_link's program when link is safe to be deallocated
tracing: Replace strncpy() with strscpy() when copying comm
tracing: Add might_fault() check in __DECLARE_TRACE_SYSCALL
tracing: Fix syscall tracepoint use-after-free
tracing: Introduce tracepoint_is_faultable()
tracing: Introduce tracepoint extended structure
tracing: Remove TRACE_FLAG_IRQS_NOSUPPORT
tracing: Replace multiple deprecated strncpy with memcpy
tracing: Make percpu stack trace buffer invariant to PAGE_SIZE
tracing: Use atomic64_inc_return() in trace_clock_counter()
trace/trace_event_perf: remove duplicate samples on the first tracepoint event
tracing/bpf: Add might_fault check to syscall probes
tracing/perf: Add might_fault check to syscall probes
tracing/ftrace: Add might_fault check to syscall probes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing tools updates from Steven Rostedt:
- Add ':' to getopt option 'trace-buffer-size' in timerlat_hist for
consistency
- Remove unused sched_getattr define
- Rename sched_setattr() helper to syscall_sched_setattr() to avoid
conflicts
- Update counters to long from int to avoid overflow
- Add libcpupower dependency detection
- Add --deepest-idle-state to timerlat to limit deep idle sleeps
- Other minor clean ups and documentation changes
* tag 'trace-tools-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
verification/dot2: Improve dot parser robustness
tools/rtla: Improve exception handling in timerlat_load.py
tools/rtla: Enhance argument parsing in timerlat_load.py
tools/rtla: Improve code readability in timerlat_load.py
rtla/timerlat: Do not set params->user_workload with -U
rtla: Documentation: Mention --deepest-idle-state
rtla/timerlat: Add --deepest-idle-state for hist
rtla/timerlat: Add --deepest-idle-state for top
rtla/utils: Add idle state disabling via libcpupower
rtla: Add optional dependency on libcpupower
tools/build: Add libcpupower dependency detection
rtla/timerlat: Make timerlat_hist_cpu->*_count unsigned long long
rtla/timerlat: Make timerlat_top_cpu->*_count unsigned long long
tools/rtla: fix collision with glibc sched_attr/sched_set_attr
tools/rtla: drop __NR_sched_getattr
rtla: Fix consistency in getopt_long for timerlat_hist
rv: Fix a typo
tools/rv: Correct the grammatical errors in the comments
tools/rv: Correct the grammatical errors in the comments
rtla: use the definition for stdout fd when calling isatty()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace ring-buffer updates from Steven Rostedt:
- Limit time interrupts are disabled in rb_check_pages()
rb_check_pages() is called after the ring buffer size is updated to
make sure that the ring buffer has not been corrupted. Commit
c2274b908db0 ("ring-buffer: Fix a race between readers and resize
checks") fixed a race with the check pages and simultaneous resizes
to the ring buffer by adding a raw_spin_lock_irqsave() around the
check operation. Although this was a simple fix, it would hold
interrupts disabled for non determinative amount of time. This could
harm PREEMPT_RT operations.
Instead, modify the logic by adding a counter when the buffer is
modified and to release the raw_spin_lock() at each iteration. It
checks the counter under the lock to see if a modification happened
during the loop, and if it did, it would restart the loop up to 3
times. After 3 times, it will simply exit the check, as it is
unlikely that would ever happen as buffer resizes are rare
occurrences.
- Replace some open coded str_low_high() with the helper
- Fix some documentation/comments
* tag 'trace-ring-buffer-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Correct a grammatical error in a comment
ring-buffer: Use str_low_high() helper in ring_buffer_producer()
ring-buffer: Reorganize kerneldoc parameter names
ring-buffer: Limit time with disabled interrupts in rb_check_pages()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Add BPF uprobe session support (Jiri Olsa)
- Optimize uprobe performance (Andrii Nakryiko)
- Add bpf_fastcall support to helpers and kfuncs (Eduard Zingerman)
- Avoid calling free_htab_elem() under hash map bucket lock (Hou Tao)
- Prevent tailcall infinite loop caused by freplace (Leon Hwang)
- Mark raw_tracepoint arguments as nullable (Kumar Kartikeya Dwivedi)
- Introduce uptr support in the task local storage map (Martin KaFai
Lau)
- Stringify errno log messages in libbpf (Mykyta Yatsenko)
- Add kmem_cache BPF iterator for perf's lock profiling (Namhyung Kim)
- Support BPF objects of either endianness in libbpf (Tony Ambardar)
- Add ksym to struct_ops trampoline to fix stack trace (Xu Kuohai)
- Introduce private stack for eligible BPF programs (Yonghong Song)
- Migrate samples/bpf tests to selftests/bpf test_progs (Daniel T. Lee)
- Migrate test_sock to selftests/bpf test_progs (Jordan Rife)
* tag 'bpf-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (152 commits)
libbpf: Change hash_combine parameters from long to unsigned long
selftests/bpf: Fix build error with llvm 19
libbpf: Fix memory leak in bpf_program__attach_uprobe_multi
bpf: use common instruction history across all states
bpf: Add necessary migrate_disable to range_tree.
bpf: Do not alloc arena on unsupported arches
selftests/bpf: Set test path for token/obj_priv_implicit_token_envvar
selftests/bpf: Add a test for arena range tree algorithm
bpf: Introduce range_tree data structure and use it in bpf arena
samples/bpf: Remove unused variable in xdp2skb_meta_kern.c
samples/bpf: Remove unused variables in tc_l2_redirect_kern.c
bpftool: Cast variable `var` to long long
bpf, x86: Propagate tailcall info only for subprogs
bpf: Add kernel symbol for struct_ops trampoline
bpf: Use function pointers count as struct_ops links count
bpf: Remove unused member rcu from bpf_struct_ops_map
selftests/bpf: Add struct_ops prog private stack tests
bpf: Support private stack for struct_ops progs
selftests/bpf: Add tracing prog private stack tests
bpf, x86: Support private stack in jit
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"A relatively modest collection of changes:
- Adopt kstrtoint() and kstrtol() instead of the simple_strtoXX
family for better error checking of user input.
- Align the print behavour when breakpoints are enabled and disabled
by adopting the current behaviour of breakpoint disable for both.
- Remove some of the (rather odd and user hostile) hex fallbacks and
require kdb users to prefix with 0x instead.
- Tidy up (and fix) control code handling in kdb's keyboard code.
This makes the control code handling at the keyboard behave the
same way as it does via the UART.
- Switch my own entry in MAINTAINERS to my @kernel.org address"
* tag 'kgdb-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: fix ctrl+e/a/f/b/d/p/n broken in keyboard mode
MAINTAINERS: Use Daniel Thompson's korg address for kgdb work
kdb: Fix breakpoint enable to be silent if already enabled
kdb: Remove fallback interpretation of arbitrary numbers as hex
trace: kdb: Replace simple_strtoul with kstrtoul in kdb_ftdump
kdb: Replace the use of simple_strto with safer kstrto in kdb_main
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Restructure the function graph shadow stack to prepare it for use
with kretprobes
With the goal of merging the shadow stack logic of function graph and
kretprobes, some more restructuring of the function shadow stack is
required.
Move out function graph specific fields from the fgraph
infrastructure and store it on the new stack variables that can pass
data from the entry callback to the exit callback.
Hopefully, with this change, the merge of kretprobes to use fgraph
shadow stacks will be ready by the next merge window.
- Make shadow stack 4k instead of using PAGE_SIZE.
Some architectures have very large PAGE_SIZE values which make its
use for shadow stacks waste a lot of memory.
- Give shadow stacks its own kmem cache.
When function graph is started, every task on the system gets a
shadow stack. In the future, shadow stacks may not be 4K in size.
Have it have its own kmem cache so that whatever size it becomes will
still be efficient in allocations.
- Initialize profiler graph ops as it will be needed for new updates to
fgraph
- Convert to use guard(mutex) for several ftrace and fgraph functions
- Add more comments and documentation
- Show function return address in function graph tracer
Add an option to show the caller of a function at each entry of the
function graph tracer, similar to what the function tracer does.
- Abstract out ftrace_regs from being used directly like pt_regs
ftrace_regs was created to store a partial pt_regs. It holds only the
registers and stack information to get to the function arguments and
return values. On several archs, it is simply a wrapper around
pt_regs. But some users would access ftrace_regs directly to get the
pt_regs which will not work on all archs. Make ftrace_regs an
abstract structure that requires all access to its fields be through
accessor functions.
- Show how long it takes to do function code modifications
When code modification for function hooks happen, it always had the
time recorded in how long it took to do the conversion. But this
value was never exported. Recently the code was touched due to new
ROX modification handling that caused a large slow down in doing the
modifications and had a significant impact on boot times.
Expose the timings in the dyn_ftrace_total_info file. This file was
created a while ago to show information about memory usage and such
to implement dynamic function tracing. It's also an appropriate file
to store the timings of this modification as well. This will make it
easier to see the impact of changes to code modification on boot up
timings.
- Other clean ups and small fixes
* tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits)
ftrace: Show timings of how long nop patching took
ftrace: Use guard to take ftrace_lock in ftrace_graph_set_hash()
ftrace: Use guard to take the ftrace_lock in release_probe()
ftrace: Use guard to lock ftrace_lock in cache_mod()
ftrace: Use guard for match_records()
fgraph: Use guard(mutex)(&ftrace_lock) for unregister_ftrace_graph()
fgraph: Give ret_stack its own kmem cache
fgraph: Separate size of ret_stack from PAGE_SIZE
ftrace: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
selftests/ftrace: Fix check of return value in fgraph-retval.tc test
ftrace: Use arch_ftrace_regs() for ftrace_regs_*() macros
ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regs
ftrace: Make ftrace_regs abstract from direct use
fgragh: No need to invoke the function call_filter_check_discard()
fgraph: Simplify return address printing in function graph tracer
function_graph: Remove unnecessary initialization in ftrace_graph_ret_addr()
function_graph: Support recording and printing the function return address
ftrace: Have calltime be saved in the fgraph storage
ftrace: Use a running sleeptime instead of saving on shadow stack
fgraph: Use fgraph data to store subtime for profiler
...
|
|
When executing the following command:
# echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter
The current mod command causes a null pointer dereference. While commit
0f17976568b3f ("ftrace: Fix regression with module command in stack_trace_filter")
has addressed part of the issue, it left a corner case unhandled, which still
results in a kernel crash.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241120052750.275463-1-guoweikang.kernel@gmail.com
Fixes: 04ec7bb642b77 ("tracing: Have the trace_array hold the list of registered func probes");
Signed-off-by: guoweikang <guoweikang.kernel@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Uprobes:
- Add BPF session support (Jiri Olsa)
- Switch to RCU Tasks Trace flavor for better performance (Andrii
Nakryiko)
- Massively increase uretprobe SMP scalability by SRCU-protecting
the uretprobe lifetime (Andrii Nakryiko)
- Kill xol_area->slot_count (Oleg Nesterov)
Core facilities:
- Implement targeted high-frequency profiling by adding the ability
for an event to "pause" or "resume" AUX area tracing (Adrian
Hunter)
VM profiling/sampling:
- Correct perf sampling with guest VMs (Colton Lewis)
New hardware support:
- x86/intel: Add PMU support for Intel ArrowLake-H CPUs (Dapeng Mi)
Misc fixes and enhancements:
- x86/intel/pt: Fix buffer full but size is 0 case (Adrian Hunter)
- x86/amd: Warn only on new bits set (Breno Leitao)
- x86/amd/uncore: Avoid a false positive warning about snprintf
truncation in amd_uncore_umc_ctx_init (Jean Delvare)
- uprobes: Re-order struct uprobe_task to save some space
(Christophe JAILLET)
- x86/rapl: Move the pmu allocation out of CPU hotplug (Kan Liang)
- x86/rapl: Clean up cpumask and hotplug (Kan Liang)
- uprobes: Deuglify xol_get_insn_slot/xol_free_insn_slot paths (Oleg
Nesterov)"
* tag 'perf-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
perf/core: Correct perf sampling with guest VMs
perf/x86: Refactor misc flag assignments
perf/powerpc: Use perf_arch_instruction_pointer()
perf/core: Hoist perf_instruction_pointer() and perf_misc_flags()
perf/arm: Drop unused functions
uprobes: Re-order struct uprobe_task to save some space
perf/x86/amd/uncore: Avoid a false positive warning about snprintf truncation in amd_uncore_umc_ctx_init
perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
perf/x86/intel/pt: Add support for pause / resume
perf/core: Add aux_pause, aux_resume, aux_start_paused
perf/x86/intel/pt: Fix buffer full but size is 0 case
uprobes: SRCU-protect uretprobe lifetime (with timeout)
uprobes: allow put_uprobe() from non-sleepable softirq context
perf/x86/rapl: Clean up cpumask and hotplug
perf/x86/rapl: Move the pmu allocation out of CPU hotplug
uprobe: Add support for session consumer
uprobe: Add data pointer to consumer handlers
perf/x86/amd: Warn only on new bits set
uprobes: fold xol_take_insn_slot() into xol_get_insn_slot()
uprobes: kill xol_area->slot_count
...
|
|
The issue that unrelated function name is shown on stack trace like
following even though it should be trampoline code address is caused by
the creation of trampoline code in the area where .init.text section
of module was freed after module is loaded.
bash-1344 [002] ..... 43.644608: <stack trace>
=> (MODULE INIT FUNCTION)
=> vfs_write
=> ksys_write
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
To resolve this, when function address of stack trace entry is in
trampoline, output without looking up symbol name.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241021071454.34610-2-tatsuya.s2862@gmail.com
Signed-off-by: Tatsuya S <tatsuya.s2862@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When using both function tracer and function graph simultaneously,
it is found that function tracer sometimes captures a fake parent ip
(return_to_handler) instead of the true parent ip.
This issue is easy to reproduce. Below are my reproduction steps:
jeff-labs:~/bin # ./trace-net.sh
jeff-labs:~/bin # cat /sys/kernel/debug/tracing/instances/foo/trace | grep return_to_handler
trace-net.sh-405 [001] ...2. 31.859501: avc_has_perm+0x4/0x190 <-return_to_handler+0x0/0x40
trace-net.sh-405 [001] ...2. 31.859503: simple_setattr+0x4/0x70 <-return_to_handler+0x0/0x40
trace-net.sh-405 [001] ...2. 31.859503: truncate_pagecache+0x4/0x60 <-return_to_handler+0x0/0x40
trace-net.sh-405 [001] ...2. 31.859505: unmap_mapping_range+0x4/0x140 <-return_to_handler+0x0/0x40
trace-net.sh-405 [001] ...3. 31.859508: _raw_spin_unlock+0x4/0x30 <-return_to_handler+0x0/0x40
[...]
The following is my simple trace script:
<snip>
jeff-labs:~/bin # cat ./trace-net.sh
TRACE_PATH="/sys/kernel/tracing"
set_events() {
echo 1 > $1/events/net/enable
echo 1 > $1/events/tcp/enable
echo 1 > $1/events/sock/enable
echo 1 > $1/events/napi/enable
echo 1 > $1/events/fib/enable
echo 1 > $1/events/neigh/enable
}
set_events ${TRACE_PATH}
echo 1 > ${TRACE_PATH}/options/sym-offset
echo 1 > ${TRACE_PATH}/options/funcgraph-tail
echo 1 > ${TRACE_PATH}/options/funcgraph-proc
echo 1 > ${TRACE_PATH}/options/funcgraph-abstime
echo 'tcp_orphan*' > ${TRACE_PATH}/set_ftrace_notrace
echo function_graph > ${TRACE_PATH}/current_tracer
INSTANCE_FOO=${TRACE_PATH}/instances/foo
if [ ! -e $INSTANCE_FOO ]; then
mkdir ${INSTANCE_FOO}
fi
set_events ${INSTANCE_FOO}
echo 1 > ${INSTANCE_FOO}/options/sym-offset
echo 'tcp_orphan*' > ${INSTANCE_FOO}/set_ftrace_notrace
echo function > ${INSTANCE_FOO}/current_tracer
echo 1 > ${TRACE_PATH}/tracing_on
echo 1 > ${INSTANCE_FOO}/tracing_on
echo > ${TRACE_PATH}/trace
echo > ${INSTANCE_FOO}/trace
</snip>
Link: https://lore.kernel.org/20241008033159.22459-1-jeff.xie@linux.dev
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Jeff Xie <jeff.xie@linux.dev>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The word "trace" begins with a consonant sound,
so "a" should be used instead of "an".
Link: https://lore.kernel.org/20241107095327.6390-1-liujing@cmss.chinamobile.com
Signed-off-by: liujing <liujing@cmss.chinamobile.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring buffer fixes from Steven Rostedt:
- Revert: "ring-buffer: Do not have boot mapped buffers hook to CPU
hotplug"
A crash that happened on cpu hotplug was actually caused by the
incorrect ref counting that was fixed by commit 2cf9733891a4
("ring-buffer: Fix refcount setting of boot mapped buffers"). The
removal of calling cpu hotplug callbacks on memory mapped buffers was
not an issue even though the tests at the time pointed toward it. But
in fact, there's a check in that code that tests to see if the
buffers are already allocated or not, and will not allocate them
again if they are. Not calling the cpu hotplug callbacks ended up not
initializing the non boot CPU buffers.
Simply remove that change.
- Clear all CPU buffers when starting tracing in a boot mapped buffer
To properly process events from a previous boot, the address space
needs to be accounted for due to KASLR and the events in the buffer
are updated accordingly when read. This also requires that when the
buffer has tracing enabled again in the current boot that the buffers
are reset so that events from the previous boot do not interact with
the events of the current boot and cause confusing due to not having
the proper meta data.
It was found that if a CPU is taken offline, that its per CPU buffer
is not reset when tracing starts. This allows for events to be from
both the previous boot and the current boot to be in the buffer at
the same time. Clear all CPU buffers when tracing is started in a
boot mapped buffer.
* tag 'trace-ringbuffer-v6.12-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/ring-buffer: Clear all memory mapped CPU ring buffers on first recording
Revert: "ring-buffer: Do not have boot mapped buffers hook to CPU hotplug"
|
|
The events of a memory mapped ring buffer from the previous boot should
not be mixed in with events from the current boot. There's meta data that
is used to handle KASLR so that function names can be shown properly.
Also, since the timestamps of the previous boot have no meaning to the
timestamps of the current boot, having them intermingled in a buffer can
also cause confusion because there could possibly be events in the future.
When a trace is activated the meta data is reset so that the pointers of
are now processed for the new address space. The trace buffers are reset
when tracing starts for the first time. The problem here is that the reset
only happens on online CPUs. If a CPU is offline, it does not get reset.
To demonstrate the issue, a previous boot had tracing enabled in the boot
mapped ring buffer on reboot. On the following boot, tracing has not been
started yet so the function trace from the previous boot is still visible.
# trace-cmd show -B boot_mapped -c 3 | tail
<idle>-0 [003] d.h2. 156.462395: __rcu_read_lock <-cpu_emergency_disable_virtualization
<idle>-0 [003] d.h2. 156.462396: vmx_emergency_disable_virtualization_cpu <-cpu_emergency_disable_virtualization
<idle>-0 [003] d.h2. 156.462396: __rcu_read_unlock <-__sysvec_reboot
<idle>-0 [003] d.h2. 156.462397: stop_this_cpu <-__sysvec_reboot
<idle>-0 [003] d.h2. 156.462397: set_cpu_online <-stop_this_cpu
<idle>-0 [003] d.h2. 156.462397: disable_local_APIC <-stop_this_cpu
<idle>-0 [003] d.h2. 156.462398: clear_local_APIC <-disable_local_APIC
<idle>-0 [003] d.h2. 156.462574: mcheck_cpu_clear <-stop_this_cpu
<idle>-0 [003] d.h2. 156.462575: mce_intel_feature_clear <-stop_this_cpu
<idle>-0 [003] d.h2. 156.462575: lmce_supported <-mce_intel_feature_clear
Now, if CPU 3 is taken offline, and tracing is started on the memory
mapped ring buffer, the events from the previous boot in the CPU 3 ring
buffer is not reset. Now those events are using the meta data from the
current boot and produces just hex values.
# echo 0 > /sys/devices/system/cpu/cpu3/online
# trace-cmd start -B boot_mapped -p function
# trace-cmd show -B boot_mapped -c 3 | tail
<idle>-0 [003] d.h2. 156.462395: 0xffffffff9a1e3194 <-0xffffffff9a0f655e
<idle>-0 [003] d.h2. 156.462396: 0xffffffff9a0a1d24 <-0xffffffff9a0f656f
<idle>-0 [003] d.h2. 156.462396: 0xffffffff9a1e6bc4 <-0xffffffff9a0f7323
<idle>-0 [003] d.h2. 156.462397: 0xffffffff9a0d12b4 <-0xffffffff9a0f732a
<idle>-0 [003] d.h2. 156.462397: 0xffffffff9a1458d4 <-0xffffffff9a0d12e2
<idle>-0 [003] d.h2. 156.462397: 0xffffffff9a0faed4 <-0xffffffff9a0d12e7
<idle>-0 [003] d.h2. 156.462398: 0xffffffff9a0faaf4 <-0xffffffff9a0faef2
<idle>-0 [003] d.h2. 156.462574: 0xffffffff9a0e3444 <-0xffffffff9a0d12ef
<idle>-0 [003] d.h2. 156.462575: 0xffffffff9a0e4964 <-0xffffffff9a0d12ef
<idle>-0 [003] d.h2. 156.462575: 0xffffffff9a0e3fb0 <-0xffffffff9a0e496f
Reset all CPUs when starting a boot mapped ring buffer for the first time,
and not just the online CPUs.
Fixes: 7a1d1e4b9639f ("tracing/ring-buffer: Add last_boot_info file to boot instance")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
A crash happened when testing cpu hotplug with respect to the memory
mapped ring buffers. It was assumed that the hot plug code was adding a
per CPU buffer that was already created that caused the crash. The real
problem was due to ref counting and was fixed by commit 2cf9733891a4
("ring-buffer: Fix refcount setting of boot mapped buffers").
When a per CPU buffer is created, it will not be created again even with
CPU hotplug, so the fix to not use CPU hotplug was a red herring. In fact,
it caused only the boot CPU buffer to be created, leaving the other CPU
per CPU buffers disabled.
Revert that change as it was not the culprit of the fix it was intended to
be.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241113230839.6c03640f@gandalf.local.home
Fixes: 912da2c384d5 ("ring-buffer: Do not have boot mapped buffers hook to CPU hotplug")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Cross-merge bpf fixes after downstream PR.
In particular to bring the fix in
commit aa30eb3260b2 ("bpf: Force checkpoint when jmp history is too long").
The follow up verifier work depends on it.
And the fix in
commit 6801cf7890f2 ("selftests/bpf: Use -4095 as the bad address for bits iterator").
It's fixing instability of BPF CI on s390 arch.
No conflicts.
Adjacent changes in:
Auto-merging arch/Kconfig
Auto-merging kernel/bpf/helpers.c
Auto-merging kernel/bpf/memalloc.c
Auto-merging kernel/bpf/verifier.c
Auto-merging mm/slab_common.c
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The check on field->field being true is handled as the first check
on the cascaded if statement, so the later checks on field->field
are redundant because this clause has already been handled. Since
this later check is redundant, just remove it.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241107120530.18728-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Placing bpf_session_run_ctx layer in between bpf_run_ctx and
bpf_uprobe_multi_run_ctx, so the session data can be retrieved
from uprobe_multi link.
Plus granting session kfuncs access to uprobe session programs.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-5-jolsa@kernel.org
|
|
Adding support to attach BPF program for entry and return probe
of the same function. This is common use case which at the moment
requires to create two uprobe multi links.
Adding new BPF_TRACE_UPROBE_SESSION attach type that instructs
kernel to attach single link program to both entry and exit probe.
It's possible to control execution of the BPF program on return
probe simply by returning zero or non zero from the entry BPF
program execution to execute or not the BPF program on return
probe respectively.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-4-jolsa@kernel.org
|
|
As suggested by Andrii make uprobe multi bpf programs to always return 0,
so they can't force uprobe removal.
Keeping the int return type for uprobe_prog_run, because it will be used
in following session changes.
Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-3-jolsa@kernel.org
|
|
Stable tag for bpf-next's uprobe work.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
The function simple_strtoul performs no error checking in scenarios
where the input value overflows the intended output variable.
This results in this function successfully returning, even when the
output does not match the input string (aka the function returns
successfully even when the result is wrong).
Or as it was mentioned [1], "...simple_strtol(), simple_strtoll(),
simple_strtoul(), and simple_strtoull() functions explicitly ignore
overflows, which may lead to unexpected results in callers."
Hence, the use of those functions is discouraged.
This patch replaces all uses of the simple_strtoul with the safer
alternatives kstrtoint and kstrtol.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#simple-strtol-simple-strtoll-simple-strtoul-simple-strtoull
Signed-off-by: Yuran Pereira <yuran.pereira@hotmail.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
[nir: style fixes]
Signed-off-by: Nir Lichtman <nir@lichtman.org>
Link: https://lore.kernel.org/r/20241028192100.GB918454@lichtman.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
|