This is to slow down lock acquisition (on contended locks) deliberately.
A possible use case is to estimate the impact on application performance
of optimizing kernel locking behavior. By delaying the lock, it can
simulate the worse condition as a control group, and then compare it with
the current behavior as an optimized condition.
The syntax is 'time@function' and the time can have unit suffix like
"us" and "ms". For example, I ran a simple test like below.
$ sudo perf lock con -abl -L tasklist_lock -- \
sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
contended total wait max wait avg wait address symbol
92 1.18 ms 199.54 us 12.79 us ffffffff8a806080 tasklist_lock (rwlock)
The contention count was 92 and the average wait time was around 10 us.
But if I add 100 usec of delay to the tasklist_lock,
$ sudo perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- \
sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
contended total wait max wait avg wait address symbol
190 15.67 ms 230.10 us 82.46 us ffffffff8a806080 tasklist_lock (rwlock)
The contention count increased and the average wait time rose close
to 100 usec. If I increase the delay even more,
$ sudo perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- \
sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
contended total wait max wait avg wait address symbol
1002 2.80 s 3.01 ms 2.80 ms ffffffff8a806080 tasklist_lock (rwlock)
Now every sleep process had contention and the wait time was more than 1
msec. This is on my 4 CPU laptop, so I guess one CPU holds the lock while
the other 3 are mostly waiting for it.
For simplicity, it only supports global locks for now.
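To make the 'time@function' syntax concrete, here is a minimal userspace sketch of parsing such a spec into a delay in nanoseconds and a symbol name. The function name parse_delay_spec() is hypothetical and this is not perf's own parser; the "ns" unit and treating a bare number as nanoseconds are assumptions of the sketch, while "us" and "ms" come from the text above.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for a delay spec like "100us@tasklist_lock".
 * Accepted units: "ns", "us", "ms"; a bare number is taken as nanoseconds
 * (an assumption of this sketch). Returns 0 on success, -1 on error.
 */
int parse_delay_spec(const char *spec, uint64_t *delay_ns,
		     char *sym, size_t symlen)
{
	const char *at = strchr(spec, '@');
	uint64_t mult = 1;
	unsigned long long val;
	size_t unit_len;
	char *end;

	if (at == NULL || at[1] == '\0')
		return -1;	/* missing "@function" part */
	if (!isdigit((unsigned char)spec[0]))
		return -1;	/* must start with the time value */

	val = strtoull(spec, &end, 10);

	/* Whatever sits between the number and '@' is the unit. */
	unit_len = (size_t)(at - end);
	if (unit_len == 2 && strncmp(end, "ns", 2) == 0)
		mult = 1;
	else if (unit_len == 2 && strncmp(end, "us", 2) == 0)
		mult = 1000;
	else if (unit_len == 2 && strncmp(end, "ms", 2) == 0)
		mult = 1000 * 1000;
	else if (unit_len != 0)
		return -1;	/* unknown unit */

	if (strlen(at + 1) >= symlen)
		return -1;	/* symbol name does not fit the buffer */

	*delay_ns = (uint64_t)val * mult;
	strcpy(sym, at + 1);
	return 0;
}
```

For example, "100us@tasklist_lock" yields a delay of 100000 ns and the symbol "tasklist_lock", while a spec without '@' is rejected, matching the note that only global (named) locks are supported.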
Committer testing:
root@number:~# grep -m1 'model name' /proc/cpuinfo
model name : AMD Ryzen 9 9950X3D 16-Core Processor
root@number:~# perf lock con -abl -L tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
contended total wait max wait avg wait address symbol
142 453.85 us 25.39 us 3.20 us ffffffffae808080 tasklist_lock (rwlock)
root@number:~# perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
contended total wait max wait avg wait address symbol
1040 2.39 s 3.11 ms 2.30 ms ffffffffae808080 tasklist_lock (rwlock)
root@number:~# perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait'
contended total wait max wait avg wait address symbol
1025 24.72 s 31.01 ms 24.12 ms ffffffffae808080 tasklist_lock (rwlock)
root@number:~#
Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250509171950.183591-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The struct zone is embedded in struct pglist_data, which can be allocated
for each NUMA node early in the boot process. As it's neither a slab object
nor a global lock, it was not symbolized.
Since zone->lock is often contended, it'd be nice if we could
symbolize it. On NUMA systems, the node_data array holds pointers to
struct pglist_data. By following the pointer, it can calculate the
address of each zone and its lock using BTF. On UMA, it can just use
contig_page_data and its zones.
The following example shows the zone lock contention in the last line.
$ sudo ./perf lock con -abl -E 5 -- ./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.038 [sec]
contended total wait max wait avg wait address symbol
5167 18.17 ms 10.27 us 3.52 us ffff953340052d00 &kmem_cache_node (spinlock)
38 11.75 ms 465.49 us 309.13 us ffff95334060c480 &sock_inode_cache (spinlock)
3916 10.13 ms 10.43 us 2.59 us ffff953342aecb40 &kmem_cache_node (spinlock)
2963 10.02 ms 13.75 us 3.38 us ffff9533d2344098 &kmalloc-rnd-08-2k (spinlock)
216 5.05 ms 99.49 us 23.39 us ffff9542bf7d65d0 zone_lock (spinlock)
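The address arithmetic described above can be sketched in plain C: walk the per-node pointers, add the offset of the zones array and of the lock inside each zone, and compare with the contended lock address. The sketch_* structs below are simplified stand-ins with made-up layouts; perf reads the real offsets from BTF, and on UMA a single contig_page_data would take the place of the node_data array.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; real kernel layouts differ and come from BTF. */
#define SKETCH_MAX_NR_ZONES	4
#define SKETCH_MAX_NODES	2

struct sketch_spinlock { int val; };

struct sketch_zone {
	unsigned long watermark[3];
	struct sketch_spinlock lock;	/* the lock we want to symbolize */
	unsigned long pad[5];
};

struct sketch_pglist_data {
	struct sketch_zone node_zones[SKETCH_MAX_NR_ZONES];
	int node_id;
};

/* NUMA case: one pglist_data pointer per node, like node_data[]. */
struct sketch_pglist_data *node_data[SKETCH_MAX_NODES];

/* Return 1 and fill nid/zid if addr is some zone's lock, else 0. */
int find_zone_lock(uintptr_t addr, int *nid, int *zid)
{
	size_t zone_size = sizeof(struct sketch_zone);
	size_t lock_off = offsetof(struct sketch_zone, lock);
	size_t zones_off = offsetof(struct sketch_pglist_data, node_zones);

	for (int n = 0; n < SKETCH_MAX_NODES; n++) {
		if (node_data[n] == NULL)
			continue;	/* node not present */
		uintptr_t base = (uintptr_t)node_data[n] + zones_off;

		for (int z = 0; z < SKETCH_MAX_NR_ZONES; z++) {
			if (base + z * zone_size + lock_off == addr) {
				*nid = n;
				*zid = z;
				return 1;
			}
		}
	}
	return 0;
}
```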
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20250401063055.7431-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This implements per-callstack aggregation of lock owners in addition to
per-thread. The owner callstack is captured using `bpf_get_task_stack()`
at `contention_begin()` and it also adds a custom stackid function for the
owner stacks to be compared easily.
The owner info is kept in a hash map using lock addr as a key to handle
multiple waiters for the same lock. At `contention_end()`, it updates the
owner lock stat based on the info that was saved at `contention_begin()`.
If there are more waiters, it'd update the owner pid to itself, as
`contention_end()` means it gets the lock now. But it also needs to check
the return value of the lock function in case the task was killed by a
signal or something similar.
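The per-lock owner bookkeeping described above can be modeled in userspace with a small table keyed by lock address. This is a sketch under stated assumptions: the table, the owner_wait_ns[] array indexed by stack id, and the function names are hypothetical, and the real BPF program's accounting details may differ. In this model each interval is charged once to whoever owned the lock, and the owner only changes when the lock function succeeded (ret == 0).

```c
#include <stddef.h>
#include <stdint.h>

#define NR_OWNER_SLOTS	16
#define NR_STACK_IDS	8

/* Hypothetical per-lock owner record (stand-in for a hash map entry). */
struct owner_rec {
	uint64_t lock;		/* key; 0 means the slot is free */
	int pid;		/* current owner */
	int stack_id;		/* id of the owner's callstack */
	uint64_t since;		/* start of the interval being charged */
	int waiters;		/* tasks waiting on this lock */
};

static struct owner_rec owners[NR_OWNER_SLOTS];
uint64_t owner_wait_ns[NR_STACK_IDS];	/* wait time charged per owner stack */

/* Look up by lock address; find_owner(0) returns a free slot. */
static struct owner_rec *find_owner(uint64_t lock)
{
	for (int i = 0; i < NR_OWNER_SLOTS; i++)
		if (owners[i].lock == lock)
			return &owners[i];
	return NULL;
}

void sketch_contention_begin(uint64_t lock, int owner_pid, int owner_stack,
			     uint64_t now)
{
	struct owner_rec *r = find_owner(lock);

	if (r == NULL) {
		r = find_owner(0);
		if (r == NULL)
			return;		/* table full: drop, like a full map */
		r->lock = lock;
		r->pid = owner_pid;
		r->stack_id = owner_stack;
		r->since = now;
		r->waiters = 0;
	}
	r->waiters++;
}

void sketch_contention_end(uint64_t lock, int my_pid, int my_stack,
			   uint64_t now, int ret)
{
	struct owner_rec *r = find_owner(lock);

	if (r == NULL)
		return;
	if (r->stack_id >= 0 && r->stack_id < NR_STACK_IDS)
		owner_wait_ns[r->stack_id] += now - r->since;
	r->since = now;		/* never charge the same interval twice */

	if (--r->waiters == 0) {
		r->lock = 0;	/* nobody waits anymore: forget the lock */
		return;
	}
	if (ret == 0) {
		/* We got the lock: remaining waiters now wait on us. */
		r->pid = my_pid;
		r->stack_id = my_stack;
	}
}
```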
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-3-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a struct and a few BPF maps in order to trace owner stacks.
`struct owner_tracing_data`: Contains the owner's pid, stack id, the
timestamp when the owner acquired the lock, and the count of lock waiters.
`stack_buf`: Percpu buffer for retrieving the owner stacktrace.
`owner_stacks`: Maps an owner stacktrace to a customized owner stack id.
`owner_data`: Maps a lock address to `struct owner_tracing_data` in the
BPF program.
`owner_stat`: For reporting owner stacktraces in usermode.
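The idea behind a "customized owner stack id" is interning: identical stacktraces get the same small id, so owner stacks can be compared by id instead of by content. Below is a userspace sketch using a linear table; owner_stack_id() and the fixed depth are hypothetical, and a BPF map lookup would replace the linear scan.

```c
#include <stdint.h>
#include <string.h>

#define MAX_STACK_DEPTH		8
#define MAX_OWNER_STACKS	32

/* Hypothetical stand-in for the owner_stacks map: stack -> id (index). */
struct stack_key { uint64_t ips[MAX_STACK_DEPTH]; };

static struct stack_key known_stacks[MAX_OWNER_STACKS];
static int nr_known_stacks;

/* Return the id of an identical known stack, or assign the next free id.
 * Returns -1 when the table is full. */
int owner_stack_id(const uint64_t *ips, int depth)
{
	struct stack_key key;

	if (depth < 0)
		depth = 0;
	if (depth > MAX_STACK_DEPTH)
		depth = MAX_STACK_DEPTH;
	memset(&key, 0, sizeof(key));	/* unused entries must compare equal */
	memcpy(key.ips, ips, (size_t)depth * sizeof(ips[0]));

	for (int i = 0; i < nr_known_stacks; i++)
		if (memcmp(&known_stacks[i], &key, sizeof(key)) == 0)
			return i;
	if (nr_known_stacks == MAX_OWNER_STACKS)
		return -1;
	known_stacks[nr_known_stacks] = key;
	return nr_known_stacks++;
}
```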
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-2-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This is to filter lock contention from specific slab objects only.
As in the lock symbol output, we can use the '&' prefix to filter slab
object names.
root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1
contended total wait max wait avg wait address symbol
3 14.99 us 14.44 us 5.00 us ffffffff851c0940 pack_mutex (mutex)
2 2.75 us 2.56 us 1.38 us ffff98d7031fb498 &task_struct (mutex)
4 1.42 us 557 ns 355 ns ffff98d706311400 &kmalloc-cg-512 (mutex)
2 953 ns 714 ns 476 ns ffffffff851c3620 delayed_uprobe_lock (mutex)
1 929 ns 929 ns 929 ns ffff98d7031fb538 &task_struct (mutex)
3 561 ns 210 ns 187 ns ffffffff84a8b3a0 text_mutex (mutex)
1 479 ns 479 ns 479 ns ffffffff851b4cf8 tracepoint_srcu_srcu_usage (mutex)
2 320 ns 195 ns 160 ns ffffffff851cf840 pcpu_alloc_mutex (mutex)
1 212 ns 212 ns 212 ns ffff98d7031784d8 &signal_cache (mutex)
1 177 ns 177 ns 177 ns ffffffff851b4c28 tracepoint_srcu_srcu_usage (mutex)
With the filter, it can show contentions from the task_struct only.
root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl -L '&task_struct' sleep 1
contended total wait max wait avg wait address symbol
2 1.97 us 1.71 us 987 ns ffff98d7032fd658 &task_struct (mutex)
1 1.20 us 1.20 us 1.20 us ffff98d7032fd6f8 &task_struct (mutex)
It can work with other aggregation modes:
root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -ab -L '&task_struct' sleep 1
contended total wait max wait avg wait type caller
1 25.10 us 25.10 us 25.10 us mutex perf_event_exit_task+0x39
1 21.60 us 21.60 us 21.60 us mutex futex_exit_release+0x21
1 5.56 us 5.56 us 5.56 us mutex futex_exec_release+0x21
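The matching rule for a -L entry can be illustrated with a small C function: an entry starting with '&' names a slab cache, anything else names a global lock symbol. lock_filter_match() is a hypothetical helper working on names; perf's real filter may work on slab cache ids inside the BPF program rather than on strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical -L matcher. sym is the global lock symbol (NULL if none);
 * slab_name is the slab cache of the object holding the lock (NULL if the
 * lock is not in a known slab object).
 */
bool lock_filter_match(const char *filter, const char *sym,
		       const char *slab_name)
{
	if (filter[0] == '&')	/* '&name' selects a slab cache */
		return slab_name != NULL && strcmp(filter + 1, slab_name) == 0;
	return sym != NULL && strcmp(filter, sym) == 0;
}
```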
Committer testing:
root@number:~# perf lock con -abl sleep 1
contended total wait max wait avg wait address symbol
1 20.80 us 20.80 us 20.80 us ffff9d417fbd65d0 (spinlock)
8 12.85 us 2.41 us 1.61 us ffff9d415eeb6a40 rq_lock (spinlock)
1 2.55 us 2.55 us 2.55 us ffff9d415f636a40 rq_lock (spinlock)
7 1.92 us 840 ns 274 ns ffff9d39c2cbc8c4 (spinlock)
1 1.23 us 1.23 us 1.23 us ffff9d415fb36a40 rq_lock (spinlock)
2 928 ns 738 ns 464 ns ffff9d39c1fa6660 &kmalloc-rnd-14-192 (rwlock)
4 788 ns 252 ns 197 ns ffffffffb8608a80 jiffies_lock (spinlock)
1 304 ns 304 ns 304 ns ffff9d39c2c979c4 (spinlock)
1 216 ns 216 ns 216 ns ffff9d3a0225c660 &kmalloc-rnd-14-192 (rwlock)
1 89 ns 89 ns 89 ns ffff9d3a0adbf3e0 &kmalloc-rnd-14-192 (rwlock)
1 61 ns 61 ns 61 ns ffff9d415f9b6a40 rq_lock (spinlock)
root@number:~# uname -r
6.13.0-rc2
root@number:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20241220060009.507297-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The bpf_get_kmem_cache() kfunc can return an address of the slab cache
(kmem_cache). As it has the name of the slab cache from the iterator,
we can use it to symbolize some dynamic kernel locks in a slab.
Before:
root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1
contended total wait max wait avg wait address symbol
2 3.34 us 2.87 us 1.67 us ffff9d7800ad9600 (mutex)
2 2.16 us 1.93 us 1.08 us ffff9d7804b992d8 (mutex)
4 1.37 us 517 ns 343 ns ffff9d78036e6e00 (mutex)
1 1.27 us 1.27 us 1.27 us ffff9d7804b99378 (mutex)
2 845 ns 599 ns 422 ns ffffffff9e1c3620 delayed_uprobe_lock (mutex)
1 845 ns 845 ns 845 ns ffffffff9da0b280 jiffies_lock (spinlock)
2 377 ns 259 ns 188 ns ffffffff9e1cf840 pcpu_alloc_mutex (mutex)
1 305 ns 305 ns 305 ns ffffffff9e1b4cf8 tracepoint_srcu_srcu_usage (mutex)
1 295 ns 295 ns 295 ns ffffffff9e1c0940 pack_mutex (mutex)
1 232 ns 232 ns 232 ns ffff9d7804b7d8d8 (mutex)
1 180 ns 180 ns 180 ns ffffffff9e1b4c28 tracepoint_srcu_srcu_usage (mutex)
1 165 ns 165 ns 165 ns ffffffff9da8b3a0 text_mutex (mutex)
After:
root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1
contended total wait max wait avg wait address symbol
2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex)
1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex)
4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex)
2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex)
3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex)
3 421 ns 164 ns 140 ns ffffffffa3a8b3a0 text_mutex (mutex)
1 409 ns 409 ns 409 ns ffffffffa41b4cf8 tracepoint_srcu_srcu_usage (mutex)
2 362 ns 239 ns 181 ns ffffffffa41cf840 pcpu_alloc_mutex (mutex)
1 220 ns 220 ns 220 ns ffff9d5e82b534d8 &signal_cache (mutex)
1 215 ns 215 ns 215 ns ffffffffa41b4c28 tracepoint_srcu_srcu_usage (mutex)
Note that the name starts with an '&' sign for slab objects to indicate
that they are dynamic locks. It won't give the accurate lock or type
names, but it's still useful. We may add type info to the slab cache
later to get the exact name of the lock in the type.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20241220060009.507297-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Recently the kernel got the kmem_cache iterator to traverse metadata of
slab objects. This can be used to symbolize dynamic locks in a slab.
The new slab_caches hash map will have the pointer of the kmem_cache as
a key and save the name and an id. The id will be saved in the flags
part of the lock.
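A sketch of the idea of keeping a small slab cache id in the lock's flags word, plus a table from kmem_cache pointer to (id, name). The bit layout (type bits in the low 16 bits, id in the upper 16) and all names here are hypothetical; the commit does not state which bits perf actually uses.

```c
#include <stdint.h>
#include <string.h>

#define SLAB_ID_SHIFT	16	/* hypothetical: id lives in the upper bits */
#define LOCK_TYPE_MASK	0xffffu
#define MAX_SLAB_CACHES	64

/* Hypothetical counterpart of the slab_caches map. */
struct slab_cache_info {
	uintptr_t kmem_cache;	/* key: address of the struct kmem_cache */
	uint16_t id;		/* small id that fits into the flags word */
	char name[32];
};

static struct slab_cache_info slab_caches[MAX_SLAB_CACHES];
static int nr_slab_caches;

/* Register a cache seen by the iterator; ids start at 1, 0 means "none". */
uint16_t slab_cache_add(uintptr_t kmem_cache, const char *name)
{
	struct slab_cache_info *info;

	if (nr_slab_caches == MAX_SLAB_CACHES)
		return 0;
	info = &slab_caches[nr_slab_caches];
	info->kmem_cache = kmem_cache;
	info->id = (uint16_t)(nr_slab_caches + 1);
	strncpy(info->name, name, sizeof(info->name) - 1);	/* stays NUL-terminated */
	nr_slab_caches++;
	return info->id;
}

uint32_t flags_set_slab_id(uint32_t flags, uint16_t id)
{
	return (flags & LOCK_TYPE_MASK) | ((uint32_t)id << SLAB_ID_SHIFT);
}

uint16_t flags_get_slab_id(uint32_t flags)
{
	return (uint16_t)(flags >> SLAB_ID_SHIFT);
}

/* Map an id from the flags back to the cache name for the '&name' output. */
const char *slab_cache_name(uint16_t id)
{
	if (id == 0 || id > nr_slab_caches)
		return NULL;
	return slab_caches[id - 1].name;
}
```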
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20241220060009.507297-3-namhyung@kernel.org
[ Added change from Namhyung addressing review from Alexei: ]
Link: https://lore.kernel.org/r/Z2dVdH3o5iF-KrWj@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The control knobs set before loading BPF programs should be declared as
'const volatile' so that they can be optimized by the BPF core.
Committer testing:
root@x1:~# perf lock contention --use-bpf
contended total wait max wait avg wait type caller
5 31.57 us 14.93 us 6.31 us mutex btrfs_delayed_update_inode+0x43
1 16.91 us 16.91 us 16.91 us rwsem:R btrfs_tree_read_lock_nested+0x1b
1 15.13 us 15.13 us 15.13 us spinlock btrfs_getattr+0xd1
1 6.65 us 6.65 us 6.65 us rwsem:R btrfs_tree_read_lock_nested+0x1b
1 4.34 us 4.34 us 4.34 us spinlock process_one_work+0x1a9
root@x1:~#
root@x1:~# perf trace -e bpf --max-events 10 perf lock contention --use-bpf
0.000 ( 0.013 ms): :2948281/2948281 bpf(cmd: 36, uattr: 0x7ffd5f12d730, size: 8) = -1 EOPNOTSUPP (Operation not supported)
0.024 ( 0.120 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d460, size: 148) = 16
0.158 ( 0.034 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d520, size: 148) = 16
26.653 ( 0.154 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d3d0, size: 148) = 16
26.825 ( 0.014 ms): perf/2948281 bpf(uattr: 0x7ffd5f12d580, size: 80) = 16
87.924 ( 0.038 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d400, size: 40) = 16
87.988 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d470, size: 40) = 16
88.019 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d250, size: 40) = 16
88.029 ( 0.172 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d320, size: 148) = 17
88.217 ( 0.005 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d4d0, size: 40) = 16
root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240902200515.2103769-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When it updates the lock stat for the first time, it needs to create an
element in the BPF hash map.
But if there's a concurrent thread waiting for the same lock (like for
rwsem or rwlock), it might race with the thread and possibly fail to
update with -EEXIST.
In that case, it can look up the map again and put the data there instead
of failing.
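The "create, and on -EEXIST look up again" pattern can be shown with a tiny single-threaded stand-in for a BPF hash map. map_insert_noexist() mimics an update with BPF_NOEXIST; the map, the names and the lock_stat fields are hypothetical. The race itself cannot occur in this single-threaded sketch, and a BPF version would also need atomic updates of the counters.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct lock_stat { uint64_t count, total_ns, max_ns; };

#define NR_SLOTS 8
static uint64_t keys[NR_SLOTS];		/* lock address; 0 marks an empty slot */
static struct lock_stat vals[NR_SLOTS];

static struct lock_stat *map_lookup(uint64_t key)
{
	for (int i = 0; i < NR_SLOTS; i++)
		if (keys[i] == key)
			return &vals[i];
	return NULL;
}

/* Insert only if absent, like bpf_map_update_elem(..., BPF_NOEXIST). */
static int map_insert_noexist(uint64_t key, const struct lock_stat *val)
{
	if (map_lookup(key) != NULL)
		return -EEXIST;
	for (int i = 0; i < NR_SLOTS; i++) {
		if (keys[i] == 0) {
			keys[i] = key;
			vals[i] = *val;
			return 0;
		}
	}
	return -E2BIG;
}

static void stat_add(struct lock_stat *s, uint64_t duration)
{
	s->count++;
	s->total_ns += duration;
	if (duration > s->max_ns)
		s->max_ns = duration;
}

int update_lock_stat(uint64_t lock, uint64_t duration)
{
	struct lock_stat *data = map_lookup(lock);

	if (data == NULL) {
		struct lock_stat first = { 1, duration, duration };
		int err = map_insert_noexist(lock, &first);

		if (err != -EEXIST)
			return err;	/* 0: created with this sample; else a real error */
		/* Another waiter created the element after our lookup: reuse it. */
		data = map_lookup(lock);
		if (data == NULL)
			return -ENOENT;
	}
	stat_add(data, duration);
	return 0;
}
```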
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240830065150.1758962-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The LCB_F_SPIN bit is used for spinlock, rwlock and optimistic spinning
in mutex. In get_tstamp_elem() it needs to check spinlock and rwlock
only. As mutex sets the LCB_F_MUTEX, it can check those two bits and
reduce the number of operations.
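The two-bit check can be written as a single mask-and-compare. The LCB_F_* values below are quoted from memory of the kernel's include/trace/events/lock.h and should be treated as assumptions to verify against your kernel headers; is_spin_or_rwlock() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as recalled from include/trace/events/lock.h (verify locally). */
#define LCB_F_SPIN	(1U << 0)
#define LCB_F_READ	(1U << 1)
#define LCB_F_WRITE	(1U << 2)
#define LCB_F_RT	(1U << 3)
#define LCB_F_PERCPU	(1U << 4)
#define LCB_F_MUTEX	(1U << 5)

/*
 * Spinlocks and rwlocks set LCB_F_SPIN without LCB_F_MUTEX, while a mutex
 * doing optimistic spinning sets both, so one comparison separates them.
 */
static inline bool is_spin_or_rwlock(uint32_t flags)
{
	return (flags & (LCB_F_SPIN | LCB_F_MUTEX)) == LCB_F_SPIN;
}
```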
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240830065150.1758962-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It has some duplicate code to do the same job. Let's add a label and
goto there to handle errors in a single place.
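For readers unfamiliar with the idiom, here is a generic sketch of routing every failure through one label. struct pair and pair_init() are hypothetical and unrelated to the actual perf code; they only show the shape of the pattern.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical example: two allocations, one shared error path. */
struct pair { int *a; int *b; };

int pair_init(struct pair *p, size_t n)
{
	int err = -ENOMEM;

	p->a = NULL;
	p->b = NULL;

	p->a = calloc(n, sizeof(*p->a));
	if (p->a == NULL)
		goto out_err;
	p->b = calloc(n, sizeof(*p->b));
	if (p->b == NULL)
		goto out_err;
	return 0;

out_err:
	/* free(NULL) is a no-op, so this covers both failure points. */
	free(p->a);
	free(p->b);
	p->a = NULL;
	p->b = NULL;
	return err;
}
```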
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240830065150.1758962-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
I got a report for a failure in BPF verifier on a recent kernel with
perf lock contention command. It checks task->sighand->siglock without
checking if sighand is NULL or not. Let's add one.
; if (&curr->sighand->siglock == (void *)lock)
265: (79) r1 = *(u64 *)(r0 +2624) ; frame1: R0_w=trusted_ptr_task_struct(off=0,imm=0)
; R1_w=rcu_ptr_or_null_sighand_struct(off=0,imm=0)
266: (b7) r2 = 0 ; frame1: R2_w=0
267: (0f) r1 += r2
R1 pointer arithmetic on rcu_ptr_or_null_ prohibited, null-check it first
processed 164 insns (limit 1000000) max_states_per_insn 1 total_states 15 peak_states 15 mark_read 5
-- END PROG LOAD LOG --
libbpf: prog 'contention_end': failed to load: -13
libbpf: failed to load object 'lock_contention_bpf'
libbpf: failed to load BPF skeleton 'lock_contention_bpf': -13
Failed to load lock-contention BPF skeleton
lock contention BPF setup failed
lock contention did not detect any lock contention
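The fix amounts to checking sighand before taking the address of its siglock, along the lines of the sketch below. The structs are minimal userspace stand-ins for the kernel types and is_task_siglock() is a hypothetical helper; in the BPF program the same check is what satisfies the verifier's "null-check it first" requirement.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures involved. */
struct sighand_struct { int siglock; };
struct task_struct { struct sighand_struct *sighand; };

/* A task may have no sighand (e.g. while exiting), so check it first. */
static bool is_task_siglock(const struct task_struct *curr, const void *lock)
{
	const struct sighand_struct *sighand = curr->sighand;

	if (sighand == NULL)
		return false;
	return (const void *)&sighand->siglock == lock;
}
```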
Fixes: 1811e82767dcc ("perf lock contention: Track and show siglock with address")
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240409225542.1870999-1-namhyung@kernel.org
|
|
Currently it accounts the contention using the delta between timestamps in
lock:contention_begin and lock:contention_end tracepoints. But it means
both events for the lock must be seen during the monitoring period.
Actually there are 4 cases that happen with the monitoring:
                 monitoring period
            /                         \
            |                         |
1:  B-------+-------------------------+--------E
2:     B----+-------------E           |
3:          |            B------------+----E
4:          |     B-------------E     |
            |                         |
            t0                        t1
where B and E mean contention BEGIN and END, respectively. So it only
accounts for case 4 for now. It seems there's no way to handle case 1.
Case 2 might be handled if it saved the timestamp (t0), but it lacks
the information from B, notably the flags, which show the lock types.
Also, it could be a nested lock, which it currently ignores. So I think
we should ignore case 2.
However, we can handle case 3 if we save the timestamp (t1) at the
end of the period. Then it can iterate the map entries in userspace
and update the lock stat accordingly.
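The four cases reduce to a small decision on which events were observed, plus a userspace pass over entries still left in the timestamp map at t1. This is a sketch with hypothetical names, where a timestamp of 0 stands for "event not observed"; it illustrates the accounting rule rather than perf's code.

```c
#include <stdint.h>

/* begin/end are 0 when the event was not observed inside [t0, t1]. */
uint64_t accounted_wait(uint64_t begin, uint64_t end, uint64_t t1)
{
	if (begin == 0)
		return 0;		/* cases 1 and 2: no BEGIN, so no flags */
	if (end == 0)
		return t1 - begin;	/* case 3: still waiting at t1 */
	return end - begin;		/* case 4: both events in the window */
}

struct tstamp_entry {
	uint64_t lock;		/* 0 means the entry is unused */
	uint64_t begin;		/* contention_begin timestamp */
};

/* After t1, entries left in the map never saw END: account them as case 3. */
uint64_t account_pending(const struct tstamp_entry *ents, int n, uint64_t t1)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++)
		if (ents[i].lock != 0)
			total += accounted_wait(ents[i].begin, 0, t1);
	return total;
}
```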
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240228053335.312776-1-namhyung@kernel.org
|
|
Currently the lock contention timestamp is maintained in a hash map keyed
by pid. That means it needs to get and release a map element (which is
protected by a spinlock!) on each contention begin and end pair. This
can impact performance if there is a lot of contention (usually from
spinlocks).
It used to go with task local storage, but that had an issue with memory
allocation in some critical paths. Although it's addressed in recent
kernels IIUC, the tool should support old kernels too. So it cannot
simply switch to task local storage, at least for now.
As spinlocks create lots of contention and they disable preemption
during spinning, it can use a per-cpu array to keep the timestamp to
avoid the overhead of hashmap update and delete.
In contention_begin, it's easy to check the lock types since it can see
the flags. But contention_end cannot see them. So let's try the per-cpu
array first (unconditionally) and see if it has an active element
(lock != 0). If so, it should be used, and the per-task tstamp map should
not be used until the per-cpu array element is cleared, which means the
nested spinlock contention (if any) has finished and it now sees the
(outer) lock.
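The lookup order can be modeled in userspace with one timestamp slot per CPU and one per task. The sketch assumes small valid cpu and pid values used as array indexes, and all names are hypothetical; it only shows the rule "per-CPU slot first at contention_end, fall back to the per-task entry when the slot is idle, and ignore a nested begin while a slot is active".

```c
#include <stdint.h>

#define NR_CPUS_SKETCH	4
#define NR_TASKS_SKETCH	16

struct tstamp { uint64_t lock; uint64_t ts; };	/* lock == 0: idle */

/* Spinning tasks cannot be preempted, so one slot per CPU suffices. */
static struct tstamp percpu_ts[NR_CPUS_SKETCH];
/* Sleeping locks fall back to a per-task entry (pid used as an index). */
static struct tstamp task_ts[NR_TASKS_SKETCH];

void ts_contention_begin(int cpu, int pid, uint64_t lock, int is_spin,
			 uint64_t now)
{
	struct tstamp *t = is_spin ? &percpu_ts[cpu] : &task_ts[pid];

	if (t->lock != 0)
		return;		/* nested contention: keep the outer one */
	t->lock = lock;
	t->ts = now;
}

/* contention_end sees no flags, so check the per-CPU slot first. */
uint64_t ts_contention_end(int cpu, int pid, uint64_t lock, uint64_t now)
{
	struct tstamp *t = &percpu_ts[cpu];
	uint64_t delta;

	if (t->lock == 0)
		t = &task_ts[pid];
	if (t->lock != lock)
		return 0;	/* not the contention being tracked */
	delta = now - t->ts;
	t->lock = 0;
	return delta;
}
```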
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org
|
|
When pelem is NULL, it'd create a new entry with zero data. But it
might be preempted by IRQ/NMI just before calling bpf_map_update_elem(),
and then there's a chance to call it twice for the same pid. So it'd be
better to use the BPF_NOEXIST flag and check the return value to prevent
the race.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-2-namhyung@kernel.org
|
|
It checks the current lock to calculate the delta of contention time.
The address is saved in the tstamp map, which is allocated at the
beginning of contention and released at the end of contention.
But it's possible for bpf_map_delete_elem() to fail. In that case, the
element in the tstamp map is kept for the current lock, and it makes the
next contention for the same lock tracked incorrectly. Specifically,
the next contention begin will see the existing element for the task and
it'd just return. Then the next contention end will see the element and
calculate the time using the timestamp from the previous begin.
This can result in a large value for two small contentions that happen
far apart in time. Let's clear the lock address so that it can be
updated next time even if bpf_map_delete_elem() failed.
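The effect of clearing the lock address before a possibly failing delete can be reproduced with a single hypothetical per-task element and a delete stand-in that can be forced to fail. With the clear in place, a stale element no longer makes the next begin return early, so the second contention is measured from its own timestamp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task tstamp element; in_use models "entry exists". */
struct tstamp_data { uint64_t lock; uint64_t ts; bool in_use; };

static struct tstamp_data elem;
static bool delete_fails;	/* force the delete stand-in to fail */

static int delete_elem(struct tstamp_data *e)
{
	if (delete_fails)
		return -1;
	e->in_use = false;
	return 0;
}

void on_begin(uint64_t lock, uint64_t now)
{
	if (elem.in_use && elem.lock != 0)
		return;		/* already tracking a contention: keep it */
	elem.in_use = true;
	elem.lock = lock;
	elem.ts = now;
}

/* Returns the measured wait, or 0 if this END does not match. */
uint64_t on_end(uint64_t lock, uint64_t now)
{
	uint64_t delta;

	if (!elem.in_use || elem.lock != lock)
		return 0;
	delta = now - elem.ts;
	elem.lock = 0;		/* clear first, so a failed delete is harmless */
	delete_elem(&elem);
	return delta;
}
```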
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231020204741.1869520-1-namhyung@kernel.org
|
|
The -G/--cgroup-filter is to limit lock contention collection on the
tasks in the specific cgroups only.
$ sudo ./perf lock con -abt -G /user.slice/.../vte-spawn-52221fb8-b33f-4a52-b5c3-e35d1e6fc0e0.scope \
./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.174 [sec]
contended total wait max wait avg wait pid comm
4 114.45 us 60.06 us 28.61 us 214847 sched-messaging
2 111.40 us 60.84 us 55.70 us 214848 sched-messaging
2 106.09 us 59.42 us 53.04 us 214837 sched-messaging
1 81.70 us 81.70 us 81.70 us 214709 sched-messaging
68 78.44 us 6.83 us 1.15 us 214633 sched-messaging
69 73.71 us 2.69 us 1.07 us 214632 sched-messaging
4 72.62 us 60.83 us 18.15 us 214850 sched-messaging
2 71.75 us 67.60 us 35.88 us 214840 sched-messaging
2 69.29 us 67.53 us 34.65 us 214804 sched-messaging
2 69.00 us 68.23 us 34.50 us 214826 sched-messaging
...
Export cgroup__new() function as it's needed from outside.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The --lock-cgroup option shows lock contention stats broken down by
cgroup.
Add LOCK_AGGR_CGROUP mode and use it instead of the use_cgroup field.
$ sudo ./perf lock con -ab --lock-cgroup sleep 1
contended total wait max wait avg wait cgroup
8 15.70 us 6.34 us 1.96 us /
2 1.48 us 747 ns 738 ns /user.slice/.../app.slice/app-gnome-google\x2dchrome-6442.scope
1 848 ns 848 ns 848 ns /user.slice/.../session.slice/org.gnome.Shell@x11.service
1 220 ns 220 ns 220 ns /user.slice/.../session.slice/pipewire-pulse.service
For now, the cgroup mode only works with BPF (-b).
Committer notes:
Remove -g as it is used in the other tools with the clear meaning of
collecting/showing callchains. As agreed with Namhyung off list.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
struct rq is defined in vmlinux.h when vmlinux.h is generated, so
declaring it in lock_contention.bpf.c causes a redefinition failure.
Move the definition to vmlinux.h for consistency with the generated
version.
Fixes: 760ebc45746b ("perf lock contention: Add empty 'struct rq' to satisfy libbpf 'runqueue' type verification")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230623041405.4039475-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
If 'struct rq' isn't defined in lock_contention.bpf.c then the type for
the 'runqueue' variable ends up being a forward declaration
(BTF_KIND_FWD) while the kernel has it defined (BTF_KIND_STRUCT).
This makes libbpf decide it has incompatible types and then fails to
load the BPF skeleton:
# perf lock con -ab sleep 1
libbpf: extern (var ksym) 'runqueues': incompatible types, expected [95] fwd rq, but kernel has [55509] struct rq
libbpf: failed to load object 'lock_contention_bpf'
libbpf: failed to load BPF skeleton 'lock_contention_bpf': -22
Failed to load lock-contention BPF skeleton
lock contention BPF setup failed
#
Add it as an empty struct to satisfy that type verification:
# perf lock con -ab sleep 1
contended total wait max wait avg wait type caller
2 50.64 us 25.38 us 25.32 us spinlock tick_do_update_jiffies64+0x25
1 26.18 us 26.18 us 26.18 us spinlock tick_do_update_jiffies64+0x25
#
Committer notes:
Extracted from a larger patch as Namhyung had already fixed the other
issues in e53de7b65a3ca59a ("perf lock contention: Fix struct rq lock
access").
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/lkml/ZFVqeKLssg7uzxzI@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It seems BPF CO-RE reloc doesn't work well with the pattern that gets
the field-offset only. Use offsetof() to make it explicit so that
the compiler would generate the correct code.
Fixes: 0c1228486befa3d6 ("perf lock contention: Support pre-5.14 kernels")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Co-developed-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Link: https://lore.kernel.org/r/20230427234833.1576130-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The BPF CO-RE's ignore suffix rule requires three underscores.
Otherwise it'd fail like below:
$ sudo perf lock contention -ab
libbpf: prog 'collect_lock_syms': BPF program load failed: Invalid argument
libbpf: prog 'collect_lock_syms': -- BEGIN PROG LOAD LOG --
reg type unsupported for arg#0 function collect_lock_syms#380
; int BPF_PROG(collect_lock_syms)
0: (b7) r6 = 0 ; R6_w=0
1: (b7) r7 = 0 ; R7_w=0
2: (b7) r9 = 1 ; R9_w=1
3: <invalid CO-RE relocation>
failed to resolve CO-RE relocation <byte_off> [381] struct rq__new.__lock (0:0 @ offset 0)
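The three-underscore rule can be illustrated with a function modeled on how libbpf, as I understand it, computes the "essential" part of a flavored name: it looks for a separator of the form X___Y (X and Y not underscores) and ignores everything after it. core_essential_name_len() is a hypothetical re-implementation for illustration, not libbpf's code. With it, "rq___new" reduces to "rq", while "rq__new" has no separator and must match a kernel type literally, which is why the relocation above failed.

```c
#include <stddef.h>
#include <string.h>

/* Length of the name with any trailing "___flavor" suffix ignored. */
size_t core_essential_name_len(const char *name)
{
	size_t n = strlen(name);

	/* Scan backwards for the last X___Y separator. */
	for (long i = (long)n - 5; i >= 0; i--) {
		const char *s = name + i;

		if (s[0] != '_' && s[1] == '_' && s[2] == '_' &&
		    s[3] == '_' && s[4] != '_')
			return (size_t)i + 1;	/* keep X, drop "___Y..." */
	}
	return n;	/* no flavor suffix */
}
```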
Fixes: 0c1228486befa3d6 ("perf lock contention: Support pre-5.14 kernels")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230427234833.1576130-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'struct rq's member '__lock' was renamed from 'lock' in 5.14.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230408055208.1283832-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It doesn't delete data in the task_data and lock_stat maps; the data is
kept there until userspace consumes it at the end. But it calls
bpf_map_update_elem() again and again, and the new data is discarded
once the map is full. This is not good.
Worse, bpf_map_update_elem() keeps trying to get a new node even when
the map is full. That makes sense if some nodes get deleted, as in the
tstamp map (which is why I didn't make the change there). In a
pre-allocated hash map, though, it means iterating over all CPUs to check
the freelist, which has a bad performance impact on large machines.
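A minimal userspace sketch of the fail-early idea, assuming (as with BPF hash maps) that inserting into a full map returns -E2BIG. The toy_map type, the function names and the data_map_full/data_fail variables are hypothetical stand-ins. Existing entries are still updated in place; only new keys are dropped once the map is known to be full.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAP_CAPACITY 4	/* tiny on purpose; real maps hold thousands of entries */

/* Hypothetical stand-in for a pre-allocated BPF hash map keyed by int. */
struct toy_map {
	int keys[MAP_CAPACITY];
	long values[MAP_CAPACITY];
	int nr;
	long insert_attempts;	/* each attempt models the costly free-node search */
};

static long *toy_map_lookup(struct toy_map *m, int key)
{
	for (int i = 0; i < m->nr; i++)
		if (m->keys[i] == key)
			return &m->values[i];
	return NULL;
}

static int toy_map_insert(struct toy_map *m, int key, long value)
{
	m->insert_attempts++;
	if (m->nr == MAP_CAPACITY)
		return -E2BIG;
	m->keys[m->nr] = key;
	m->values[m->nr] = value;
	m->nr++;
	return 0;
}

static bool data_map_full;	/* entries are never deleted, so this never resets */
static long data_fail;		/* would be reported as "data:" in the debug output */

static void record_contention(struct toy_map *m, int key, long wait)
{
	long *val = toy_map_lookup(m, key);

	if (val) {			/* existing entry: update in place */
		*val += wait;
		return;
	}
	if (data_map_full) {		/* fail early, skip the expensive insert */
		data_fail++;
		return;
	}
	if (toy_map_insert(m, key, wait) == -E2BIG) {
		data_map_full = true;
		data_fail++;
	}
}
```

Since nothing is deleted from these maps, the flag never needs to be cleared. A map with deletions, like tstamp, cannot take this shortcut.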
I've checked it on my 64 CPU machine with this.
$ perf bench sched messaging -g 1000
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1000 groups == 40000 processes run
Total time: 2.825 [sec]
I used the task mode so that the map is guaranteed to be full: the
default number of map entries is 16K and this workload has 40K tasks.
Before:
$ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1000 groups == 40000 processes run
Total time: 11.299 [sec]
contended total wait max wait avg wait pid comm
19284 3.51 s 3.70 ms 181.91 us 1305863 sched-messaging
243 84.09 ms 466.67 us 346.04 us 1336608 sched-messaging
177 66.35 ms 12.08 ms 374.88 us 1220416 node
For some reason, it didn't report the data failures. But you can see the
total time of the workload increased a lot (2.8 -> 11.3 sec). If it fails
early when the map is full, it goes back to normal.
After:
$ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1000 groups == 40000 processes run
Total time: 3.044 [sec]
contended total wait max wait avg wait pid comm
18743 591.92 ms 442.96 us 31.58 us 1431454 sched-messaging
51 210.64 ms 207.45 ms 4.13 ms 1468724 sched-messaging
81 68.61 ms 65.79 ms 847.07 us 1463183 sched-messaging
=== output for debug ===
bad: 1164137, total: 2253341
bad rate: 51.66 %
histogram of failure reasons
task: 0
stack: 0
time: 0
data: 1164137
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Updating the data can fail when the lock_stat map is full. Check for
that case and show the number of failures at the end.
$ sudo ./perf lock con -ablv -E3 -- ./perf bench sched messaging
...
contended total wait max wait avg wait address symbol
6157 208.48 ms 69.29 us 33.86 us ffff934c001c1f00 (spinlock)
4030 72.04 ms 61.84 us 17.88 us ffff934c000415c0 (spinlock)
3201 50.30 ms 47.73 us 15.71 us ffff934c2eead850 (spinlock)
=== output for debug ===
bad: 0, total: 13388
bad rate: 0.00 %
histogram of failure reasons
task: 0
stack: 0
time: 0
data: 0 <----- added
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The BPF hash map rounds the map size up to a power of 2, so 10k would
be 16k anyway. Use the actual size to avoid confusion.
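The 10k -> 16k arithmetic, as a sketch: the usual bit-smearing round-up to the next power of two, which gives the same result as the kernel's roundup_pow_of_two() for 32-bit values. Whether '10k' means 10000 or 10240, the result is 16384. The function name is ours.

```c
#include <stdint.h>

/* Round v (v >= 1) up to the next power of two. */
static uint32_t round_up_pow2(uint32_t v)
{
	v--;			/* so exact powers of two map to themselves */
	v |= v >> 1;		/* smear the highest set bit into all lower bits */
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}
```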
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Collecting lock stats from BPF can fail for various reasons. For
example, I got a report that the time calculation sometimes seems wrong
for contended spinlocks. I suspect the time delta went negative
for some reason.
Count the failures separately by reason and show them in the output like below:
$ sudo perf lock contention -abE5 sleep 10
contended total wait max wait avg wait type caller
13 785.61 us 79.36 us 60.43 us spinlock remove_wait_queue+0x14
10 469.02 us 87.51 us 46.90 us spinlock prepare_to_wait+0x27
9 289.09 us 69.08 us 32.12 us spinlock finish_wait+0x36
114 251.05 us 8.56 us 2.20 us spinlock try_to_wake_up+0x1f5
132 188.63 us 5.01 us 1.43 us spinlock __wake_up_common_lock+0x62
=== output for debug ===
bad: 1, total: 279
bad rate: 0.36 %
histogram of failure reasons
task: 1
stack: 0
time: 0
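A sketch of per-reason failure accounting that produces numbers of the shape shown above. The struct, enum and function names are hypothetical. The reasons are modeled on the histogram: 'task' and 'stack' when the task data or stack id could not be obtained, and 'time' when the end timestamp is earlier than the begin timestamp, i.e. the delta would be negative.

```c
#include <stdbool.h>
#include <stdint.h>

enum fail_reason { FAIL_TASK, FAIL_STACK, FAIL_TIME, NR_FAIL_REASONS };

struct contention_stats {
	long total;			/* all contention events seen */
	long fail[NR_FAIL_REASONS];	/* events dropped, per reason */
	uint64_t wait_ns;		/* wait time of accepted events */
};

/* task_ok/stack_ok say whether task data and a stack id were obtained. */
static void account_event(struct contention_stats *st, bool task_ok,
			  bool stack_ok, uint64_t begin_ns, uint64_t end_ns)
{
	st->total++;
	if (!task_ok) {
		st->fail[FAIL_TASK]++;
		return;
	}
	if (!stack_ok) {
		st->fail[FAIL_STACK]++;
		return;
	}
	if (end_ns < begin_ns) {	/* delta would be negative */
		st->fail[FAIL_TIME]++;
		return;
	}
	st->wait_ns += end_ns - begin_ns;
}

static long bad_count(const struct contention_stats *st)
{
	long bad = 0;

	for (int i = 0; i < NR_FAIL_REASONS; i++)
		bad += st->fail[i];
	return bad;
}

/* Percentage of events that were dropped. */
static double bad_rate(const struct contention_stats *st)
{
	return st->total ? 100.0 * bad_count(st) / st->total : 0.0;
}
```

With 1 bad event out of 279, bad_rate() gives about 0.36 %, matching the output above.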
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230327225711.245738-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Using the BPF_PROG_RUN mechanism, we can run a raw_tp BPF program to
collect semi-global locks such as per-cpu locks. Add the runqueue
locks using the bpf_per_cpu_ptr() helper.
$ sudo ./perf lock con -abl -- sleep 1
contended total wait max wait avg wait address symbol
248 3.25 ms 32.23 us 13.10 us ffff8cc75cfd2940 siglock
60 217.91 us 9.69 us 3.63 us ffff8cc700061c00
8 70.23 us 13.86 us 8.78 us ffff8cc703629484
4 56.32 us 35.81 us 14.08 us ffff8cc78b66f778 mmap_lock
4 16.70 us 5.18 us 4.18 us ffff8cc7036a0684
3 4.99 us 2.65 us 1.66 us ffff8d053da30c80 rq_lock
2 3.44 us 2.28 us 1.72 us ffff8d053dcf0c80 rq_lock
9 2.51 us 371 ns 278 ns ffff8ccb92479440
2 2.11 us 1.24 us 1.06 us ffff8d053db30c80 rq_lock
2 2.06 us 1.69 us 1.03 us ffff8d053d970c80 rq_lock
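A userspace model of the idea, under stated assumptions: the per-cpu runqueues (the kernel's per-cpu variable is named runqueues) are faked with a plain array, and the address of each CPU's lock is recorded once, the way the raw_tp program run via BPF_PROG_RUN would do with bpf_per_cpu_ptr(). A contended address can then be named 'rq_lock'. All type and function names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NR_FAKE_CPUS 4

/* Hypothetical per-CPU runqueue; only the embedded lock matters here. */
struct fake_rq {
	long nr_running;
	int __lock;
};

static struct fake_rq runqueues[NR_FAKE_CPUS];	/* stands in for the per-cpu variable */

struct lock_sym {
	uintptr_t addr;
	const char *name;
};

static struct lock_sym lock_syms[NR_FAKE_CPUS];
static int nr_lock_syms;

/* Run once to record the address of every CPU's runqueue lock. */
static void collect_lock_syms(void)
{
	nr_lock_syms = 0;
	for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++) {
		/* In BPF this would come from bpf_per_cpu_ptr(&runqueues, cpu). */
		lock_syms[nr_lock_syms].addr = (uintptr_t)&runqueues[cpu].__lock;
		lock_syms[nr_lock_syms].name = "rq_lock";
		nr_lock_syms++;
	}
}

/* Name for a contended lock address, or "" if it is not a known lock. */
static const char *lock_sym_name(uintptr_t addr)
{
	for (int i = 0; i < nr_lock_syms; i++)
		if (lock_syms[i].addr == addr)
			return lock_syms[i].name;
	return "";
}
```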
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Likewise, we can display siglock by following the pointer like
current->sighand->siglock.
$ sudo ./perf lock con -abl -- sleep 1
contended total wait max wait avg wait address symbol
16 2.18 ms 305.35 us 136.34 us ffffffff92e06080 tasklist_lock
28 521.78 us 31.16 us 18.63 us ffff8cc703783ec4
7 119.03 us 23.55 us 17.00 us ffff8ccb92479440
15 88.29 us 10.06 us 5.89 us ffff8cd560b5f380 siglock
7 37.67 us 9.16 us 5.38 us ffff8d053daf0c80
5 8.81 us 4.92 us 1.76 us ffff8d053d6b0c80
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sometimes there is severe contention on the mmap_lock and we want to
see it in the -l/--lock-addr output. However, it cannot symbolize
the mmap_lock because it's allocated dynamically without symbols.
Stephane and Hao each gave me the idea of displaying mmap_lock by
following the current->mm pointer. I added a flag that marks a lock as
mmap_lock when its address matches, so that it can be shown differently.
With this change it can show mmap_lock like below:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80
Note that mmap_lock was renamed some time ago, so old kernels, where it
is called 'mmap_sem', need to be supported as well.
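A sketch of the address comparison, with trimmed, hypothetical structures and flag name. At contention time the waiter is the current task, so if the lock address equals the address of current->mm's mmap_lock, the entry is flagged. Kernel threads have no mm, hence the NULL check.

```c
#include <stddef.h>
#include <stdint.h>

#define LOCK_F_MMAP_LOCK 0x1	/* hypothetical flag bit */

/* Hypothetical, heavily trimmed kernel structures. */
struct fake_mm {
	unsigned long flags;
	long mmap_lock;
};

struct fake_task {
	struct fake_mm *mm;	/* NULL for kernel threads */
};

/* Flags for a contended lock, computed while curr is the waiting task. */
static unsigned int classify_lock(const struct fake_task *curr,
				  uintptr_t lock_addr)
{
	unsigned int flags = 0;

	if (curr->mm && (uintptr_t)&curr->mm->mmap_lock == lock_addr)
		flags |= LOCK_F_MMAP_LOCK;
	return flags;
}
```

This only recognizes the waiter's own mmap_lock; a lock inside some other process's mm stays unnamed.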
Suggested-by: Hao Luo <haoluo@google.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
__has_builtin was passed the macro rather than the actual builtin
feature. The builtin test isn't sufficient and a clang version test
also needs to be performed.
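A sketch of the corrected feature check, assuming, as in libbpf's bpf_core_read.h, that bpf_core_type_matches() is a macro wrapping __builtin_preserve_type_info(). The builtin itself is tested, combined with a clang version check. The threshold of 15 and the HAVE_CORE_TYPE_MATCHES name are assumptions for illustration. The #ifndef fallback keeps compilers without __has_builtin from failing to parse the #if.

```c
/* Without this, "__has_builtin(x)" would be a syntax error in #if. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/*
 * Test the real builtin behind the bpf_core_type_matches() macro, not the
 * macro itself, and also require a new enough clang (assumed threshold).
 */
#if defined(__clang__) && __clang_major__ >= 15 && \
	__has_builtin(__builtin_preserve_type_info)
#define HAVE_CORE_TYPE_MATCHES 1
#else
#define HAVE_CORE_TYPE_MATCHES 0
#endif

static int have_core_type_matches(void)
{
	return HAVE_CORE_TYPE_MATCHES;
}
```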
Fixes: 1bece1351c653c3d ("perf lock contention: Support old rw_semaphore type")
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230308003020.3653271-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Old kernels have a different type for the owner field in rwsem. We can
check it using the bpf_core_type_matches() builtin in clang, but it also
needs its own version check since it's only available in recent versions.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230207002403.63590-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When there are many lock contentions in the system, people sometimes want
to know who caused the contention, IOW who owns the locks.
The -o/--lock-owner option tries to follow the lock owners for the
contended mutexes and rwsems from BPF, and then attributes the
contention time to the owner instead of the waiter. It's a best-effort
approach to get the owner info at the time of the contention and doesn't
guarantee precise tracking of owners if they change over time.
Currently it only handles mutexes and rwsems, which have an owner field in
their struct that basically points to the task_struct owning the lock at
the moment.
Technically its type is atomic_long_t and some of its LSB bits are
used for other meanings, so they need to be cleared when casting it to a
pointer to task_struct.
Also, atomic_long_t is a typedef of the 32 or 64 bit atomic type,
depending on the arch, which is a wrapper struct for the counter value. I'm
not aware of a proper way to access those kernel atomic types from BPF, so
I just read the internal counter value directly. Please let me know if
there's a better way.
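A userspace sketch of the owner decoding, with hypothetical types: a struct wrapping a long counter stands in for atomic_long_t, and the low 3 bits are assumed to be flags (as with the kernel's MUTEX_FLAGS, 0x07). The counter is read directly and the flag bits are masked off before the value is used as a task pointer.

```c
#include <stddef.h>

#define OWNER_FLAGS_MASK 0x7UL	/* assumption: low 3 bits hold flags */

/* Same shape as the kernel's atomic_long_t: a struct wrapping a counter. */
typedef struct {
	long counter;
} fake_atomic_long_t;

struct fake_task {
	_Alignas(8) int pid;	/* 8-byte alignment keeps the low 3 bits free */
};

struct fake_mutex {
	fake_atomic_long_t owner;
};

/* Read the wrapped counter directly and strip the flag bits. */
static struct fake_task *mutex_owner(const struct fake_mutex *lock)
{
	unsigned long owner = (unsigned long)lock->owner.counter;

	return (struct fake_task *)(owner & ~OWNER_FLAGS_MASK);
}
```

A zero owner decodes to NULL, which a caller could report as 'Unknown'.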
When the -o/--lock-owner option is used, it switches to the task aggregation
mode like the -t/--threads option does. However, it cannot get the owner for
other lock types like spinlocks, and sometimes not even for mutexes.
$ sudo ./perf lock con -abo -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.766 [sec]
4.766540 usecs/op
209795 ops/sec
contended total wait max wait avg wait pid owner
403 565.32 us 26.81 us 1.40 us -1 Unknown
4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe
1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe
1 2.03 us 2.03 us 2.03 us 5068 chrome
As you can see, the owner is unknown in most cases. But if we
filter only the mutex locks, it's more likely to get the owners.
$ sudo ./perf lock con -abo -Y mutex -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.910 [sec]
4.910435 usecs/op
203647 ops/sec
contended total wait max wait avg wait pid owner
2 15.50 us 8.29 us 7.75 us 1582852 sched-pipe
7 7.20 us 2.47 us 1.03 us -1 Unknown
1 6.74 us 6.74 us 6.74 us 1582851 sched-pipe
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230207002403.63590-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It'd be useful to filter on something other than the current aggregation
mode. For example, users may want to see callstacks for specific locks
only, or tasks from a certain callstack.
The tracepoints already collect the information, but the condition needs
to be checked again when processing the event. And BPF needs to be changed
to allow the key combinations.
The lock contentions on 'rcu_state' spinlock can be monitored:
$ sudo perf lock con -abv -L rcu_state sleep 1
...
contended total wait max wait avg wait type caller
4 151.39 us 62.57 us 37.85 us spinlock rcu_core+0xcb
0xffffffff81fd1666 _raw_spin_lock_irqsave+0x46
0xffffffff8172d76b rcu_core+0xcb
0xffffffff822000eb __softirqentry_text_start+0xeb
0xffffffff816a0ba9 __irq_exit_rcu+0xc9
0xffffffff81fc0112 sysvec_apic_timer_interrupt+0xa2
0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16
0xffffffff81d49f78 cpuidle_enter_state+0xd8
0xffffffff81d4a259 cpuidle_enter+0x29
1 30.21 us 30.21 us 30.21 us spinlock rcu_core+0xcb
0xffffffff81fd1666 _raw_spin_lock_irqsave+0x46
0xffffffff8172d76b rcu_core+0xcb
0xffffffff822000eb __softirqentry_text_start+0xeb
0xffffffff816a0ba9 __irq_exit_rcu+0xc9
0xffffffff81fc00c4 sysvec_apic_timer_interrupt+0x54
0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16
1 28.84 us 28.84 us 28.84 us spinlock rcu_accelerate_cbs_unlocked+0x40
0xffffffff81fd1c60 _raw_spin_lock+0x30
0xffffffff81728cf0 rcu_accelerate_cbs_unlocked+0x40
0xffffffff8172da82 rcu_core+0x3e2
0xffffffff822000eb __softirqentry_text_start+0xeb
0xffffffff816a0ba9 __irq_exit_rcu+0xc9
0xffffffff81fc0112 sysvec_apic_timer_interrupt+0xa2
0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16
0xffffffff81d49f78 cpuidle_enter_state+0xd8
...
To see tasks calling 'rcu_core' function:
$ sudo perf lock con -abt -S rcu_core sleep 1
contended total wait max wait avg wait pid comm
19 23.46 us 2.21 us 1.23 us 0 swapper
2 18.37 us 17.01 us 9.19 us 2061859 ThreadPoolForeg
3 5.76 us 1.97 us 1.92 us 3909 pipewire-pulse
1 2.26 us 2.26 us 2.26 us 1809271 MediaSu~isor #2
1 1.97 us 1.97 us 1.97 us 1514882 Chrome_ChildIOT
1 987 ns 987 ns 987 ns 3740 pipewire-pulse
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230203021324.143540-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Likewise, add an addr_filter BPF hash map and check the lock
address against it.
$ sudo ./perf lock con -ab -L tasklist_lock -- ./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.169 [sec]
contended total wait max wait avg wait type caller
18 174.09 us 25.31 us 9.67 us rwlock:W do_exit+0x36d
5 32.34 us 10.87 us 6.47 us rwlock:R do_wait+0x8b
4 15.41 us 4.73 us 3.85 us rwlock:W release_task+0x6e
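A userspace sketch of the filter check, with a small array standing in for the addr_filter hash map (names and the size limit are hypothetical). An empty filter accepts everything; otherwise only listed lock addresses are counted. Here the map would hold the address resolved from the -L argument, such as tasklist_lock.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ADDR_FILTER 8	/* hypothetical limit */

/* Stand-in for the addr_filter hash map, used as a set of addresses. */
struct addr_filter {
	uint64_t addrs[MAX_ADDR_FILTER];
	int nr;
};

static bool addr_filter_add(struct addr_filter *f, uint64_t addr)
{
	if (f->nr == MAX_ADDR_FILTER)
		return false;
	f->addrs[f->nr++] = addr;
	return true;
}

/* An empty filter accepts every lock; otherwise the address must be listed. */
static bool lock_addr_allowed(const struct addr_filter *f, uint64_t addr)
{
	if (f->nr == 0)
		return true;
	for (int i = 0; i < f->nr; i++)
		if (f->addrs[i] == addr)
			return true;
	return false;
}
```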
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221219201732.460111-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Likewise, add a type_filter BPF hash map and check it when the user gives a
lock type filter.
$ sudo ./perf lock con -ab -Y rwlock -- ./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.203 [sec]
contended total wait max wait avg wait type caller
15 156.19 us 19.45 us 10.41 us rwlock:W do_exit+0x36d
1 11.12 us 11.12 us 11.12 us rwlock:R do_wait+0x8b
1 5.09 us 5.09 us 5.09 us rwlock:W release_task+0x6e
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221219201732.460111-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The -l/--lock-addr option implements per-lock-instance contention
stats using LOCK_AGGR_ADDR. It displays the lock address and, if it
exists, the symbol name.
$ sudo ./perf lock con -abl sleep 1
contended total wait max wait avg wait address symbol
1 36.28 us 36.28 us 36.28 us ffff92615d6448b8
9 10.91 us 1.84 us 1.21 us ffffffffbaed50c0 rcu_state
1 10.49 us 10.49 us 10.49 us ffff9262ac4f0c80
8 4.68 us 1.67 us 585 ns ffffffffbae07a40 jiffies_lock
3 3.03 us 1.45 us 1.01 us ffff9262277861e0
1 924 ns 924 ns 924 ns ffff926095ba9d20
1 436 ns 436 ns 436 ns ffff9260bfda4f60
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
BPF mode didn't show the per-thread stats properly. Use the task's thread id (PID)
as the key instead of stack_id, and add a task_data map to save the task comm names.
$ sudo ./perf lock con -abt -E 5 sleep 1
contended total wait max wait avg wait pid comm
1 740.66 ms 740.66 ms 740.66 ms 1950 nv_queue
3 305.50 ms 298.19 ms 101.83 ms 1884 nvidia-modeset/
1 25.14 us 25.14 us 25.14 us 2725038 EventManager_De
12 23.09 us 9.30 us 1.92 us 0 swapper
1 20.18 us 20.18 us 20.18 us 2725033 EventManager_De
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The BPF program and userspace should use the same data types to access the
BPF maps. Add bpf_skel/lock_data.h to define the common data structures.
No functional changes.
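A sketch of the kind of shared definition such a header holds: one value struct, built only from fixed-width types, that both the BPF program and perf use for the map value, plus the per-event update behind the total/max/avg wait columns. The field and function names are assumptions, not the actual contents of lock_data.h.

```c
#include <stdint.h>

/* Fixed-width fields give the same layout on the BPF and the perf side. */
struct contention_data {
	uint64_t total_time;
	uint64_t min_time;
	uint64_t max_time;
	uint32_t count;
	uint32_t flags;
};

_Static_assert(sizeof(struct contention_data) == 32,
	       "layout must not depend on the host ABI");

static void contention_data_add(struct contention_data *d, uint64_t duration)
{
	if (d->count == 0 || duration < d->min_time)
		d->min_time = duration;
	if (duration > d->max_time)
		d->max_time = duration;
	d->total_time += duration;
	d->count++;
}

/* "avg wait" as shown in the output: total wait divided by contentions. */
static uint64_t contention_avg(const struct contention_data *d)
{
	return d->count ? d->total_time / d->count : 0;
}
```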
Committer notes:
Fixed contention_key.stack_id missing rename to contention_key.stack_or_task_id.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Task local storage caused trouble when a lock inside kmalloc was contended,
because task local storage allocates memory using kmalloc.
That creates a recursion, which even crashed my system.
There could be a couple of workarounds, but I think the simplest
one is to use a pre-allocated hash map. We could fix the task
local storage to use the safe BPF allocator, but that takes time,
so let's use this until it actually happens.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Chris Li <chriscli@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221118190109.1512674-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It was reported that building the BPF lock contention skeleton failed
on 32-bit archs due to the size of long. The lost count is only used for
reporting errors due to lack of stackmap space through bad_hist, whose
type is 'int'. So let's use int type.
Fixes: 6d499a6b3d90277d ("perf lock: Print the number of lost entries for BPF")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20220926215638.3931222-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently it collects stack traces up to the max size and then skips entries,
because we don't have control over how to skip perf callchains. But BPF can
do the skipping in bpf_get_stackid() with a flag.
Say we have max-stack=4 and stack-skip=2; we get these stack traces:
Before: After:
.---> +---+ <--. .---> +---+ <--.
| | | | | | | |
| +---+ usable | +---+ |
max | | | max | | |
stack +---+ <--' stack +---+ usable
| | X | | | | |
| +---+ skip | +---+ |
| | X | | | | |
`---> +---+ `---> +---+ <--' <=== collection
| X |
+---+ skip
| X |
+---+
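A userspace sketch of the two orders in the diagram, with made-up frame values. SKIP_FIELD_MASK mirrors BPF_F_SKIP_FIELD_MASK (0xff): bpf_get_stackid() takes the number of frames to skip in the low 8 bits of its flags argument. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SKIP_FIELD_MASK 0xffULL	/* same value as BPF_F_SKIP_FIELD_MASK */

/* Before: capture up to max_stack frames, then drop the first skip. */
static int collect_then_skip(const uint64_t *stack, int depth, int max_stack,
			     int skip, uint64_t *out)
{
	int captured = depth < max_stack ? depth : max_stack;
	int n = captured > skip ? captured - skip : 0;

	if (n > 0)
		memcpy(out, stack + skip, (size_t)n * sizeof(*out));
	return n;
}

/* After: skip first (done by the kernel via the flag), then capture. */
static int skip_then_collect(const uint64_t *stack, int depth, int max_stack,
			     int skip, uint64_t *out)
{
	int avail = depth > skip ? depth - skip : 0;
	int n = avail < max_stack ? avail : max_stack;

	if (n > 0)
		memcpy(out, stack + skip, (size_t)n * sizeof(*out));
	return n;
}

/* Skip count encoded in the low 8 bits of the flags argument. */
static uint64_t stackid_skip_flags(int skip)
{
	return (uint64_t)skip & SKIP_FIELD_MASK;
}
```

With max-stack=4 and stack-skip=2, the first approach leaves 2 usable frames and the second leaves 4.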
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220912055314.744552-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Like the normal 'perf lock contention' output, print the number of
lost entries for BPF if there are any or if the -v option is passed.
Currently it uses the BROKEN_CONTENDED stat for the lost count (due to full
stack maps).
$ sudo perf lock con -a -b --map-nr-entries 128 sleep 5
...
=== output for debug===
bad: 43, total: 14903
bad rate: 0.29 %
histogram of events caused bad sequence
acquire: 0
acquired: 0
contended: 43
release: 0
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Blake Jones <blakejones@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220802191004.347740-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add -a/--all-cpus and -C/--cpu options for cpu filtering. Also add -p/--pid
and --tid options for task filtering; the short -t option is
already taken by --threads. Tracking the command line workload is
possible as well.
$ sudo perf lock contention -a -b sleep 1
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Blake Jones <blakejones@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220729200756.666106-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add -b/--use-bpf option to use BPF to collect lock contention stats.
For simplicity it now runs system-wide and requires C-c to stop.
Upcoming changes will add the usual filtering.
$ sudo perf lock con -b
^C
contended total wait max wait avg wait type caller
42 192.67 us 13.64 us 4.59 us spinlock queue_work_on+0x20
23 85.54 us 10.28 us 3.72 us spinlock worker_thread+0x14a
6 13.92 us 6.51 us 2.32 us mutex kernfs_iop_permission+0x30
3 11.59 us 10.04 us 3.86 us mutex kernfs_dop_revalidate+0x3c
1 7.52 us 7.52 us 7.52 us spinlock kthread+0x115
1 7.24 us 7.24 us 7.24 us rwlock:W sys_epoll_wait+0x148
2 7.08 us 3.99 us 3.54 us spinlock delayed_work_timer_fn+0x1b
1 6.41 us 6.41 us 6.41 us spinlock idle_balance+0xa06
2 2.50 us 1.83 us 1.25 us mutex kernfs_iop_lookup+0x2f
1 1.71 us 1.71 us 1.71 us mutex kernfs_iop_getattr+0x2c
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Blake Jones <blakejones@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220729200756.666106-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|