summaryrefslogtreecommitdiff
path: root/tools/perf/Documentation
AgeCommit message (Collapse)Author
2025-01-24Merge tag 'perf-tools-for-v6.14-2025-01-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf-tools updates from Namhyung Kim: "There are a lot of changes in the perf tools in this cycle. build: - Use generic syscall table to generate syscall numbers on supported archs - This also enables to get rid of libaudit which was used for syscall numbers - Remove python2 support as it's deprecated for years - Fix issues on static build with libzstd perf record: - Intel-PT supports "aux-action" config term to pause or resume tracing in the aux-buffer. Users can start the intel_pt event as "started-paused" and configure other events to control the Intel-PT tracing: # perf record --kcore -e intel_pt/aux-action=start-paused/ \ -e syscalls:sys_enter_newuname/aux-action=resume/ \ -e syscalls:sys_exit_newuname/aux-action=pause/ -- uname This requires kernel support (which was added in v6.13) perf lock: - 'perf lock contention' command has an ability to symbolize locks in dynamically allocated objects using slab cache name when it runs with BPF. Those dynamic locks would have "&" prefix in the name to distinguish them from ordinary (static) locks # perf lock con -abl -E 5 sleep 1 contended total wait max wait avg wait address symbol 2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex) 1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex) 4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex) 2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex) 3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex) This also requires kernel/BPF support (which was added in v6.13) perf ftrace: - 'perf ftrace latency' command gets a couple of options to support linear buckets instead of exponential. Also it's possible to specify max and min latency for the linear buckets: # perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \ --min-latency=200 --max-latency=800 -- sleep 1 # DURATION | COUNT | GRAPH | 0 - 200 ns | 186 | ### | 200 - 300 ns | 256 | ##### | 300 - 400 ns | 364 | ####### | 400 - 500 ns | 223 | #### | 500 - 600 ns | 111 | ## | 600 - 700 ns | 41 | | 700 - 800 ns | 141 | ## | 800 - ... ns | 169 | ### | # statistics (in nsec) total time: 2162212 avg time: 967 max time: 16817 min time: 132 count: 2236 - As you can see in the above example, it nows shows the statistics at the end so that users can see the avg/max/min latencies easily - 'perf ftrace profile' command has --graph-opts option like 'perf ftrace trace' so that it can control the tracing behaviors in the same way. For example, it can limit the function call depth or threshold perf script: - Improve physical memory resolution in 'mem-phys-addr' script by parsing /proc/iomem file # perf script mem-phys-addr -- find / ... Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-85f7fffff : System RAM 8929 69.7 547600000-54785d23f : Kernel data 1240 9.7 546a00000-5474bdfff : Kernel rodata 490 3.8 5480ce000-5485fffff : Kernel bss 121 0.9 0-fff : Reserved 3860 30.1 100000-89c01fff : System RAM 18 0.1 8a22c000-8df6efff : System RAM 5 0.0 Others: - 'perf test' gets --runs-per-test option to run the test cases repeatedly. This would be helpful to see if it's flaky - Add 'parse_events' method to Python perf extension module, so that users can use the same event parsing logic in the python code. One more step towards implementing perf tools in Python. :) - Support opening tracepoint events without libtraceevent. This will be helpful if it won't use the tracing data like in 'perf stat' - Update ARM Neoverse N2/V2 JSON events and metrics" * tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (176 commits) perf test: Update event_groups test to use instructions perf bench: Fix undefined behavior in cmpworker() perf annotate: Prefer passing evsel to evsel->core.idx perf lock: Rename fields in lock_type_table perf lock: Add percpu-rwsem for type filter perf lock: Fix parse_lock_type which only retrieve one lock flag perf lock: Fix return code for functions in __cmd_contention perf hist: Fix width calculation in hpp__fmt() perf hist: Fix bogus profiles when filters are enabled perf hist: Deduplicate cmp/sort/collapse code perf test: Improve verbose documentation perf test: Add a runs-per-test flag perf test: Fix parallel/sequential option documentation perf test: Send list output to stdout rather than stderr perf test: Rename functions and variables for better clarity perf tools: Expose quiet/verbose variables in Makefile.perf perf config: Add a function to set one variable in .perfconfig perf test perftool_testsuite: Return correct value for skipping perf test perftool_testsuite: Add missing description perf test record+probe_libc_inet_pton: Make test resilient ...
2025-01-17perf lock: Add percpu-rwsem for type filterChun-Tse Shao
percpu-rwsem was missing in man page. And for backward compatibility, replace `pcpu-sem` with `percpu-rwsem` before parsing lock name. Tested `./perf lock con -ab -Y pcpu-sem` and `./perf lock con -ab -Y percpu-rwsem` Fixes: 4f701063bfa2 ("perf lock contention: Show lock type with address") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-2-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-16perf test: Improve verbose documentationIan Rogers
Add a little more detail on the output expectations for each verbose level. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-16perf test: Add a runs-per-test flagIan Rogers
To detect flakes it is useful to run tests more than once. Add a runs-per-test flag that will run each test multiple times. Example output: ``` $ perf test -r 3 lbr -v 122: perf record LBR tests : Ok 122: perf record LBR tests : Ok 122: perf record LBR tests : Ok ``` Update the documentation for the runs-per-test option. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-16perf test: Fix parallel/sequential option documentationIan Rogers
The parallel option was removed in commit 94d1a913bdc4 ("perf test: Make parallel testing the default"). Update the sequential documentation to reflect it isn't the default except for "exclusive" tests. Fixes: 94d1a913bdc4 ("perf test: Make parallel testing the default") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250110045736.598281-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-10perf docs: arm_spe: Document new discard modeJames Clark
Document the flag along with PMU events to hint what it's used for and give an example with other useful options to get minimal output. Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250108142904.401139-3-james.clark@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2025-01-10perf tools: Remove dependency on libauditCharlie Jenkins
All architectures now support HAVE_SYSCALL_TABLE_SUPPORT, so the flag is no longer needed. With the removal of the flag, the related GENERIC_SYSCALL_TABLE can also be removed. libaudit was only used as a fallback for when HAVE_SYSCALL_TABLE_SUPPORT was not defined, so libaudit is also no longer needed for any architecture. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mickaël Salaün <mic@digikod.net> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250108-perf_syscalltbl-v6-16-7543b5293098@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-01-08perf ftrace profile: Add --graph-opts optionNamhyung Kim
Like trace subcommand, it should be able to pass some options to control the tracing behavior for the function graph tracer. But some options are limited in order to maintain the internal behavior. For example, it can limit the function call depth like below: # perf ftrace profile --graph-opts depth=5 -- myprog Committer testing: root@number:~# perf ftrace profile --graph-opts thresh=1000 -- sleep 1 # Total (us) Avg (us) Max (us) Count Function 1001419.301 500709.650 1000032.000 2 x64_sys_call 1000032.000 1000032.000 1000032.000 1 __x64_sys_clock_nanosleep 1000032.000 1000032.000 1000032.000 1 common_nsleep 1000031.000 1000031.000 1000031.000 1 do_nanosleep 1000031.000 1000031.000 1000031.000 1 hrtimer_nanosleep 1000024.000 1000024.000 1000024.000 1 schedule 1387.208 1387.208 1387.208 1 __x64_sys_execve 1386.691 1386.691 1386.691 1 do_execveat_common.isra.0 1334.170 1334.170 1334.170 1 bprm_execve 1258.413 1258.413 1258.413 1 load_elf_binary 1123.068 1123.068 1123.068 1 begin_new_exec 1113.550 1113.550 1113.550 1 mmput 1109.237 1109.237 1109.237 1 exit_mmap root@number:~# perf ftrace profile --graph-opts thresh=1200 -- sleep 1 # Total (us) Avg (us) Max (us) Count Function 1001448.204 500724.102 1000018.000 2 x64_sys_call 1000017.000 1000017.000 1000017.000 1 __x64_sys_clock_nanosleep 1000017.000 1000017.000 1000017.000 1 common_nsleep 1000017.000 1000017.000 1000017.000 1 hrtimer_nanosleep 1000016.000 1000016.000 1000016.000 1 do_nanosleep 1000012.000 1000012.000 1000012.000 1 schedule 1430.112 1430.112 1430.112 1 __x64_sys_execve 1429.581 1429.581 1429.581 1 do_execveat_common.isra.0 1376.289 1376.289 1376.289 1 bprm_execve 1301.743 1301.743 1301.743 1 load_elf_binary root@number:~# Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250107224352.1128669-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-26perf docs: Add documentation for --force-btf optionHoward Chu
The --force-btf option is intended for debugging purposes and is currently undocumented. Add documentation for it. Committer notes: We need a follow up patch expanding on what can be done via BTF and what isn't possible and thus needs further work to convert kernel C source code into tables that can then be associated with syscall integer args and struct members, as discussed in: https://lore.kernel.org/all/20241215190712.787847-3-howardchu95@gmail.com/T/#mcfbba653200775c59c730705229a49b34a153db7 Signed-off-by: Howard Chu <howardchu95@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20241215190712.787847-3-howardchu95@gmail.com Link: https://lore.kernel.org/all/20241215190712.787847-3-howardchu95@gmail.com/T/#mcfbba653200775c59c730705229a49b34a153db7 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf intel-pt: Add documentation for pause / resumeAdrian Hunter
Document the use of aux-action config term and provide a simple example. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf intel-pt: Improve man page formatAdrian Hunter
Improve format of config terms and section references. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf tools: Parse aux-actionAdrian Hunter
Add parsing for aux-action to accept "pause", "resume" or "start-paused" values. "start-paused" is valid only for AUX area events. "pause" and "resume" are valid only for events grouped with an AUX area event as the group leader. However, like with aux-output, the events will be automatically grouped if they are not currently in a group, and the AUX area event precedes the other events. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20241216070244.14450-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-10perf ftrace latency: Add --max-latency optionGabriele Monaco
This patch adds a max-latency option as discussed, in case the number of buckets is more than 22, we don't observe the setting (for now, let's say). By default or if 0 is passed, the value is automatically determined based on the number of buckets, range and minimum, so that we fill all available buffers (equivalent to the behaviour before this patch). We now get something like this: # perf ftrace latency --bucket-range=20 \ --min-latency 10 \ --max-latency=100 \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 10 us | 1731 | ################ | 10 - 30 us | 1 | | 30 - 50 us | 0 | | 50 - 70 us | 0 | | 70 - 90 us | 0 | | 90 - 100 us | 0 | | 100 - ... us | 0 | | Note the maximum is observed also if it doesn't cover completely a full range (the second to last range is 10us long to let the last start at 100 sharp), this looks to me more sensible and eases the computations, since we don't need to account for the range while filling the buckets. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-10perf ftrace latency: Introduce --min-latency to narrow down into a latency rangeArnaldo Carvalho de Melo
Things below and over will be in the first and last, outlier, buckets. Without it: # perf ftrace latency --use-nsec --use-bpf \ --bucket-range=200 \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 200 ns | 0 | | 200 - 400 ns | 44 | | 400 - 600 ns | 291 | # | 600 - 800 ns | 506 | ## | 800 - 1000 ns | 148 | | 1.00 - 1.20 us | 581 | ## | 1.20 - 1.40 us | 2199 | ########## | 1.40 - 1.60 us | 1048 | #### | 1.60 - 1.80 us | 1448 | ###### | 1.80 - 2.00 us | 1091 | ##### | 2.00 - 2.20 us | 517 | ## | 2.20 - 2.40 us | 318 | # | 2.40 - 2.60 us | 370 | # | 2.60 - 2.80 us | 271 | # | 2.80 - 3.00 us | 150 | | 3.00 - 3.20 us | 85 | | 3.20 - 3.40 us | 48 | | 3.40 - 3.60 us | 40 | | 3.60 - 3.80 us | 22 | | 3.80 - 4.00 us | 13 | | 4.00 - 4.20 us | 14 | | 4.20 - ... us | 626 | ## | # # perf ftrace latency --use-nsec --use-bpf \ --bucket-range=20 --min-latency=1200 \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 1200 ns | 1243 | ##### | 1.20 - 1.22 us | 141 | | 1.22 - 1.24 us | 202 | | 1.24 - 1.26 us | 209 | | 1.26 - 1.28 us | 219 | | 1.28 - 1.30 us | 208 | | 1.30 - 1.32 us | 245 | # | 1.32 - 1.34 us | 246 | # | 1.34 - 1.36 us | 224 | # | 1.36 - 1.38 us | 219 | | 1.38 - 1.40 us | 206 | | 1.40 - 1.42 us | 190 | | 1.42 - 1.44 us | 190 | | 1.44 - 1.46 us | 146 | | 1.46 - 1.48 us | 140 | | 1.48 - 1.50 us | 125 | | 1.50 - 1.52 us | 115 | | 1.52 - 1.54 us | 102 | | 1.54 - 1.56 us | 87 | | 1.56 - 1.58 us | 90 | | 1.58 - 1.60 us | 85 | | 1.60 - ... us | 5487 | ######################## | # Now we want focus on the latencies starting at 1.2us, with a finer grained range of 20ns: This is all on a live system, so statistically interesting, but not narrowing down on the same numbers, so a 'perf ftrace latency record' seems interesting to then use all on the same snapshot of latencies. A --max-latency counterpart should come next, at first limiting the max-latency to 20 * bucket-size, as we have a fixed buckets array with 20 + 2 entries (+ for the outliers) and thus would need to make it larger for higher latencies. We also may need a way to ask for not considering the out of range values (first and last buckets) when drawing the buckets bars. Co-developed-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-4-acme@kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-10perf ftrace latency: Introduce --bucket-range to ask for linear bucketingArnaldo Carvalho de Melo
In addition to showing it exponentially, using log2() to figure out the histogram index, allow for showing it linearly: The preexisting more, the default: # perf ftrace latency --use-nsec --use-bpf \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 1 ns | 0 | | 1 - 2 ns | 0 | | 2 - 4 ns | 0 | | 4 - 8 ns | 0 | | 8 - 16 ns | 0 | | 16 - 32 ns | 0 | | 32 - 64 ns | 0 | | 64 - 128 ns | 238 | # | 128 - 256 ns | 1704 | ########## | 256 - 512 ns | 672 | ### | 512 - 1024 ns | 4458 | ########################## | 1 - 2 us | 677 | #### | 2 - 4 us | 5 | | 4 - 8 us | 0 | | 8 - 16 us | 0 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - ... ms | 0 | | # The new histogram mode: # perf ftrace latency --bucket-range=150 --use-nsec --use-bpf \ -T switch_mm_irqs_off -a sleep 2 # DURATION | COUNT | GRAPH | 0 - 1 ns | 0 | | 1 - 151 ns | 265 | # | 151 - 301 ns | 1797 | ########### | 301 - 451 ns | 258 | # | 451 - 601 ns | 289 | # | 601 - 751 ns | 2049 | ############# | 751 - 901 ns | 967 | ###### | 901 - 1051 ns | 513 | ### | 1.05 - 1.20 us | 114 | | 1.20 - 1.35 us | 559 | ### | 1.35 - 1.50 us | 189 | # | 1.50 - 1.65 us | 137 | | 1.65 - 1.80 us | 32 | | 1.80 - 1.95 us | 2 | | 1.95 - 2.10 us | 0 | | 2.10 - 2.25 us | 1 | | 2.25 - 2.40 us | 1 | | 2.40 - 2.55 us | 0 | | 2.55 - 2.70 us | 0 | | 2.70 - 2.85 us | 0 | | 2.85 - 3.00 us | 1 | | 3.00 - ... us | 4 | | # Co-developed-by: Gabriele Monaco <gmonaco@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20241112181214.1171244-3-acme@kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf config: Fix trival typo 'an' -> 'can'Arnaldo Carvalho de Melo
Just a trivial typo, should be 'can', did a spell check on the rest of the file just in case, nothing more stood out. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-13perf disasm: Allow configuring what disassemblers to useArnaldo Carvalho de Melo
The perf tools annotation code used for a long time parsing the output of binutils's objdump (or its reimplementations, like llvm's) to then parse and augment it with samples, allow navigation, etc. More recently disassemblers from the capstone and llvm (libraries, not parsing the output of tools using those libraries to mimic binutils's objdump output) were introduced. So when all those methods are available, there is a static preference for a series of attempts of disassembling a binary, with the 'llvm, capstone, objdump' sequence being hard coded. This patch allows users to change that sequence, specifying via a 'perf config' 'annotate.disassemblers' entry which and in what order disassemblers should be attempted. As alluded to in the comments in the source code of this series, this flexibility is useful for users and developers alike, elliminating the requirement to rebuild the tool with some specific set of libraries to see how the output of disassembling would be for one of these methods. root@x1:~# rm -f ~/.perfconfig root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> symbol__disassemble: filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux, sym=update_load_avg, start=0xffffffffb6148fe0, en> annotating [0x6ff7170] /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux : [0x7407ca0] update_load_avg Disassembled with llvm annotate.disassemblers=llvm,capstone,objdump Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = capstone root@x1:~# root@x1:~# perf annotate -v --stdio2 update_load_avg <SNIP> Disassembled with capstone annotate.disassemblers=capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent 0xffffffff81148fe0 <update_load_avg>: 1.61 pushq %r15 pushq %r14 1.00 pushq %r13 movl %edx,%r13d 1.90 pushq %r12 pushq %rbp movq %rsi,%rbp pushq %rbx movq %rdi,%rbx subq $0x18,%rsp 15.14 movl 0x1a4(%rdi),%eax root@x1:~# perf config annotate.disassemblers=objdump,capstone root@x1:~# perf config annotate.disassemblers annotate.disassemblers=objdump,capstone root@x1:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = objdump,capstone root@x1:~# perf annotate -v --stdio2 update_load_avg Executing: objdump --start-address=0xffffffff81148fe0 \ --stop-address=0xffffffff811497aa \ -d --no-show-raw-insn -S -C "$1" Disassembled with objdump annotate.disassemblers=objdump,capstone Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, Event count (approx.): 5185444, [percent: local period] update_load_avg() /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux Percent Disassembly of section .text: ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 ffffffff81148fe0 <update_load_avg>: #define DO_ATTACH 0x4 #define DO_DETACH 0x8 /* Update task and its cfs_rq load average */ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) { 1.61 push %r15 push %r14 1.00 push %r13 mov %edx,%r13d 1.90 push %r12 push %rbp mov %rsi,%rbp push %rbx mov %rdi,%rbx sub $0x18,%rsp } /* rq->task_clock normalized against any time this cfs_rq has spent throttled */ static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq) { if (unlikely(cfs_rq->throttle_count)) 15.14 mov 0x1a4(%rdi),%eax root@x1:~# After adding a way to select the disassembler from the command line a 'perf test' comparing the output of the various diassemblers should be introduced, to test these codebases. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steinar H. Gunderson <sesse@google.com> Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-09perf docs: Document tool and hwmon eventsIan Rogers
Add a few paragraphs on tool and hwmon events. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20241109003759.473460-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-29perf arm-spe: Update --itrace help textGraham Woodward
The --itrace help now needs updating to reflect that the --itrace=b argument sythesises branches as well as branch misses. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-5-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-21perf test: Document the -w/--workload optionArnaldo Carvalho de Melo
Wasn't documented so far, mention that it is mostly used in the shell regression tests. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Clark Williams <williams@redhat.com> Link: https://lore.kernel.org/r/20241020021842.1752770-4-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-18perf build: Rename HAVE_DWARF_SUPPORT to HAVE_LIBDW_SUPPORTIan Rogers
In Makefile.config for unwinding the name dwarf implies either libunwind or libdw. Make it clearer that HAVE_DWARF_SUPPORT is really just defined when libdw is present by renaming to HAVE_LIBDW_SUPPORT. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-18perf libdw: Remove unnecessary definesIan Rogers
As HAVE_DWARF_GETLOCATIONS_SUPPORT and HAVE_DWARF_CFI_SUPPORT always match HAVE_DWARF_SUPPORT remove the macros and use HAVE_DWARF_SUPPORT. If building the file is guarded by CONFIG_DWARF then remove all ifs. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Anup Patel <anup@brainfault.org> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shenlin Liang <liangshenlin@eswincomputing.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Chen Pei <cp0613@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Aditya Gupta <adityag@linux.ibm.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: Bibo Mao <maobibo@loongson.cn> Cc: John Garry <john.g.garry@oracle.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-csky@vger.kernel.org Link: https://lore.kernel.org/r/20241017001354.56973-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-14perf sched timehist: Add pre-migration wait time optionMadadi Vineeth Reddy
pre-migration wait time is the time that a task unnecessarily spends on the runqueue of a CPU but doesn't get switched-in there. In terms of tracepoints, it is the time between sched:sched_wakeup and sched:sched_migrate_task. Let's say a task woke up on CPU2, then it got migrated to CPU4 and then it's switched-in to CPU4. So, here pre-migration wait time is time that it was waiting on runqueue of CPU2 after it is woken up. The general pattern for pre-migration to occur is: sched:sched_wakeup sched:sched_migrate_task sched:sched_switch The sched:sched_waking event is used to capture the wakeup time, as it aligns with the existing code and only introduces a negligible time difference. pre-migrations are generally not useful and it increases migrations. This metric would be helpful in testing patches mainly related to wakeup and load-balancer code paths as better wakeup logic would choose an optimal CPU where task would be switched-in and thereby reducing pre- migrations. The sample output(s) when -P or --pre-migrations is used: ================= time cpu task name wait time sch delay run time pre-mig time [tid/pid] (msec) (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- --------- 38456.720806 [0001] schbench[28634/28574] 4.917 4.768 1.004 0.000 38456.720810 [0001] rcu_preempt[18] 3.919 0.003 0.004 0.000 38456.721800 [0006] schbench[28779/28574] 23.465 23.465 1.999 0.000 38456.722800 [0002] schbench[28773/28574] 60.371 60.237 3.955 60.197 38456.722806 [0001] schbench[28634/28574] 0.004 0.004 1.996 0.000 38456.722811 [0001] rcu_preempt[18] 1.996 0.005 0.005 0.000 38456.723800 [0000] schbench[28833/28574] 4.000 4.000 3.999 0.000 38456.723800 [0004] schbench[28762/28574] 42.951 42.839 3.999 39.867 38456.723802 [0007] schbench[28812/28574] 43.947 43.817 3.999 40.866 38456.723804 [0001] schbench[28587/28574] 7.935 7.822 0.993 0.000 Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Link: https://lore.kernel.org/r/20241004170756.18064-1-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-10perf report: Display columns Predicted/Abort/Cycles in --branch-historyThomas Falcon
The original commit message: " Use current sort mechanism but the real .se_cmp() just returns 0 so that new columns "Predicted", "Abort" and "Cycles" are created in display but actually these keys are not the sort keys. For example: Overhead Source:Line Symbol Shared Object Predicted Abort Cycles ........ ............ ........ ............. ......... ..... ...... 38.25% div.c:45 [.] main div 97.6% 0 3 " Update missed commit from series "perf report: Show branch flags/cycles in --branch-history callgraph view" to apply to current repository so that new columns described above are visible. Link to original series: https://lore.kernel.org/lkml/1477876794-30749-1-git-send-email-yao.jin@linux.intel.com/ Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Suggested-by: Kan Liang <kan.liang@linux.intel.com> Co-developed-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20241010184046.203822-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-03perf list: update option desc in man pageYoshihiro Furudera
There is a difference between the SYNOPSIS section of the help message and the man page (tools/perf/Documentation/perf-list.txt) for the perf list command. After checking, we found that the help message reflected the latest specifications. Therefore, revised the SYNOPSIS section of the man page to match the help message. Signed-off-by: Yoshihiro Furudera <fj5100bi@fujitsu.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Liang Link: https://lore.kernel.org/r/20241003002404.2592094-1-fj5100bi@fujitsu.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-24perf scripting python: Add function to get a config valueJames Clark
This can be used to get config values like which objdump Perf uses for disassembly. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Ruidong Tian <tianruidong@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Cc: John Garry <john.g.garry@oracle.com> Cc: scclevenger@os.amperecomputing.com Link: https://lore.kernel.org/r/20240916135743.1490403-4-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-09-04perf check: Fix inconsistencies in feature namesAditya Gupta
Fix two inconsistencies in feature names as discussed in [1]: 1. Rename "dwarf-unwind-support" to "dwarf-unwind" 2. 'get_cpuid' feature and 'HAVE_AUXTRACE_SUPPORT' names don't look related, change the feature name to 'auxtrace' to match the macro name, as 'get_cpuid' string is not used anywhere to check the feature presence [1]: https://lore.kernel.org/linux-perf-users/ZoRw5we4HLSTZND6@x1/ Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-7-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04perf check: Introduce 'check' subcommandAditya Gupta
Currently the presence of a feature is checked with a combination of perf version --build-options and greps, such as: perf version --build-options | grep " on .* HAVE_FEATURE" Instead of this, introduce a subcommand "perf check feature", with which scripts can test for presence of a feature, such as: perf check feature HAVE_FEATURE 'perf check feature' command is expected to have exit status of 0 if feature is built-in, and 1 if it's not built-in or if feature is not known. Multiple features can also be passed as a comma-separated list, in which case the exit status will be 1 only if all of the passed features are built-in. For example, with below command, it will have exit status of 0 only if both libtraceevent and bpf are enabled, else 1 in all other cases perf check feature libtraceevent,bpf The arguments are case-insensitive. An array 'supported_features' has also been introduced that can be used by other commands like 'perf version --build-options', so that new features can be added in one place, with the array Committer testing: $ perf check feature libtraceevent,bpf libtraceevent: [ on ] # HAVE_LIBTRACEEVENT bpf: [ on ] # HAVE_LIBBPF_SUPPORT $ perf check feature libtraceevent libtraceevent: [ on ] # HAVE_LIBTRACEEVENT $ perf check feature bpf bpf: [ on ] # HAVE_LIBBPF_SUPPORT $ perf check -q feature bpf && echo "BPF support is present" BPF support is present $ perf check -q feature Bogus && echo "Bogus support is present" $ Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904061836.55873-3-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf sched timehist: Add --prio optionYang Jihong
The --prio option is used to only show events for the given task priority(ies). The default is to show events for all priority tasks, which is consistent with the previous behavior. Testcase: # perf sched record nice -n 9 perf bench sched messaging -l 10000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 3.435 [sec] [ perf record: Woken up 270 times to write data ] [ perf record: Captured and wrote 618.688 MB perf.data (5729036 samples) ] # perf sched timehist -h Usage: perf sched timehist [<options>] -C, --cpu <cpu> list of cpus to profile -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -g, --call-graph Display call chains if present (default on) -I, --idle-hist Show idle events only -i, --input <file> input file name -k, --vmlinux <file> vmlinux pathname -M, --migrations Show migration events -n, --next Show next task -p, --pid <pid[,pid...]> analyze events only for given process id(s) -s, --summary Show only syscall summary with statistics -S, --with-summary Show all syscalls and summary with statistics -t, --tid <tid[,tid...]> analyze events only for given thread id(s) -V, --cpu-visual Add CPU visual -v, --verbose be more verbose (show symbol address, etc) -w, --wakeups Show wakeup events --kallsyms <file> kallsyms pathname --max-stack <n> Maximum number of functions to display backtrace. --prio <prio> analyze events only for given task priority(ies) --show-prio Show task priority --state Show task state when sched-out --symfs <directory> Look for files with symbols relative to this directory --time <str> Time span for analysis (start,stop) # perf sched timehist --prio 140 Samples of sched_switch event do not have callchains. Invalid prio string # perf sched timehist --show-prio --prio 129 Samples of sched_switch event do not have callchains. time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 2090450.765421 [0002] sched-messaging[1229618] 129 0.000 0.000 0.029 2090450.765445 [0007] sched-messaging[1229616] 129 0.000 0.062 0.043 2090450.765448 [0014] sched-messaging[1229619] 129 0.000 0.000 0.032 2090450.765478 [0013] sched-messaging[1229617] 129 0.000 0.065 0.048 2090450.765503 [0014] sched-messaging[1229622] 129 0.000 0.000 0.017 2090450.765550 [0002] sched-messaging[1229624] 129 0.000 0.000 0.021 2090450.765562 [0007] sched-messaging[1229621] 129 0.000 0.071 0.028 2090450.765570 [0005] sched-messaging[1229620] 129 0.000 0.064 0.066 2090450.765583 [0001] sched-messaging[1229625] 129 0.000 0.001 0.031 2090450.765595 [0013] sched-messaging[1229623] 129 0.000 0.060 0.028 2090450.765637 [0014] sched-messaging[1229628] 129 0.000 0.000 0.019 2090450.765665 [0007] sched-messaging[1229627] 129 0.000 0.038 0.030 <SNIP> # perf sched timehist --show-prio --prio 0,120-129 Samples of sched_switch event do not have callchains. time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 2090450.763231 [0000] perf[1229608] 120 0.000 0.000 0.000 2090450.763235 [0000] migration/0[15] 0 0.000 0.001 0.003 2090450.763263 [0001] perf[1229608] 120 0.000 0.000 0.000 2090450.763268 [0001] migration/1[21] 0 0.000 0.001 0.004 2090450.763302 [0002] perf[1229608] 120 0.000 0.000 0.000 2090450.763309 [0002] migration/2[27] 0 0.000 0.001 0.007 2090450.763338 [0003] perf[1229608] 120 0.000 0.000 0.000 2090450.763343 [0003] migration/3[33] 0 0.000 0.001 0.004 2090450.763459 [0004] perf[1229608] 120 0.000 0.000 0.000 2090450.763469 [0004] migration/4[39] 0 0.000 0.002 0.010 2090450.763496 [0005] perf[1229608] 120 0.000 0.000 0.000 2090450.763501 [0005] migration/5[45] 0 0.000 0.001 0.004 2090450.763613 [0006] perf[1229608] 120 0.000 0.000 0.000 2090450.763622 [0006] migration/6[51] 0 0.000 0.001 0.008 2090450.763652 [0007] perf[1229608] 120 0.000 0.000 0.000 2090450.763660 [0007] migration/7[57] 0 0.000 0.001 0.008 <SNIP> 2090450.765665 [0001] <idle> 120 0.031 0.031 0.081 2090450.765665 [0007] sched-messaging[1229627] 129 0.000 0.038 0.030 2090450.765667 [0000] s1-perf[8235/7168] 120 0.008 0.000 0.004 2090450.765684 [0013] <idle> 120 0.028 0.028 0.088 2090450.765685 [0001] sched-messaging[1229630] 129 0.000 0.001 0.020 2090450.765688 [0000] <idle> 120 0.004 0.004 0.020 2090450.765689 [0002] <idle> 120 0.021 0.021 0.138 2090450.765691 [0005] sched-messaging[1229626] 129 0.000 0.085 0.029 Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819033016.2427235-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf sched timehist: Add --show-prio optionYang Jihong
The --show-prio option is used to display the priority of task. It is disabled by default, which is consistent with original behavior. The display format is xxx (priority does not change during task running) or xxx->yyy (priority changes during task running) Testcase: # perf sched record nice -n 9 true [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 0.497 MB perf.data ] # perf sched timehist -h Usage: perf sched timehist [<options>] -C, --cpu <cpu> list of cpus to profile -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -g, --call-graph Display call chains if present (default on) -I, --idle-hist Show idle events only -i, --input <file> input file name -k, --vmlinux <file> vmlinux pathname -M, --migrations Show migration events -n, --next Show next task -p, --pid <pid[,pid...]> analyze events only for given process id(s) -s, --summary Show only syscall summary with statistics -S, --with-summary Show all syscalls and summary with statistics -t, --tid <tid[,tid...]> analyze events only for given thread id(s) -V, --cpu-visual Add CPU visual -v, --verbose be more verbose (show symbol address, etc) -w, --wakeups Show wakeup events --kallsyms <file> kallsyms pathname --max-stack <n> Maximum number of functions to display backtrace. --show-prio Show task priority --state Show task state when sched-out --symfs <directory> Look for files with symbols relative to this directory --time <str> Time span for analysis (start,stop) # perf sched timehist Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 23952.006537 [0000] perf[534] 0.000 0.000 0.000 23952.006593 [0000] migration/0[19] 0.000 0.014 0.056 23952.006899 [0001] perf[534] 0.000 0.000 0.000 23952.006947 [0001] migration/1[22] 0.000 0.015 0.047 23952.007138 [0002] perf[534] 0.000 0.000 0.000 <SNIP> # perf sched timehist --show-prio Samples of sched_switch event do not have callchains. time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 23952.006537 [0000] perf[534] 120 0.000 0.000 0.000 23952.006593 [0000] migration/0[19] 0 0.000 0.014 0.056 23952.006899 [0001] perf[534] 120 0.000 0.000 0.000 <SNIP> 23952.034843 [0003] nice[535] 120->129 0.189 0.024 23.314 <SNIP> 23952.053838 [0005] rcu_preempt[16] 120 3.993 0.000 0.023 23952.053990 [0005] <idle> 120 0.023 0.023 0.152 23952.054137 [0006] <idle> 120 1.427 1.427 17.855 23952.054278 [0007] <idle> 120 0.506 0.506 1.650 Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819033016.2427235-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-14perf script: Add branch countersKan Liang
It's useful to print the branch counter information for each jump in the brstackinsn when it's available. Add a new field 'brcntr' to display the branch counter information. By default, the abbreviation will be used to indicate the branch counter. In the verbose mode, the real event name is shown. $ perf script -F +brstackinsn,+brcntr # Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (home/sdp/test/tchain_edit) f3+31: 0000000000401774 insn: eb 04 br_cntr: AA # PRED 5 cycles [5] 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 1 cycles [6] 2.00 IPC 0000000000401766 insn: 8b 45 fc 0000000000401769 insn: 83 e0 01 000000000040176c insn: 85 c0 000000000040176e insn: 74 06 br_cntr: A # PRED 1 cycles [7] 4.00 IPC 0000000000401776 insn: 83 45 fc 01 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 7 cycles [14] 0.43 IPC $ perf script -F +brstackinsn,+brcntr -v tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (/home/sdp/os.linux.perf.test-suite/kernels/lbr_kernel/tchain_edit) f3+31: 0000000000401774 insn: eb 04 br_cntr: branch-instructions:ppp 2 branch-misses 0 # PRED 5 cycles [5] 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [6] 2.00 IPC 0000000000401766 insn: 8b 45 fc 0000000000401769 insn: 83 e0 01 000000000040176c insn: 85 c0 000000000040176e insn: 74 06 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [7] 4.00 IPC 0000000000401776 insn: 83 45 fc 01 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 7 cycles [14] 0.43 IPC Originally-by: Tinghao Zhang <tinghao.zhang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-9-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-14perf report: Display the branch counter histogramKan Liang
Reusing the existing --total-cycles option to display the branch counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display the logged branch counter events. They are shown right after all the cycle-related annotations. Extend the 'struct block_info' to store and pass the branch counter related information. The annotation_br_cntr_entry() is to print the histogram of each branch counter event. If the number of logged events is less than 4, the exact number of the abbr name is printed. Otherwise, using '+' to stands for more than 3 events. Assume the number of logged events is less than 4. The annotation_br_cntr_abbr_list() prints the branch counter's abbreviation list. Press 'B' to display the list in the TUI mode. $ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter $ perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }' # Event count (approx.): 1610046 # # Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 |A |- | ... 25.27% 1.1M 0.00% 2 |AA |- | ... 15.61% 667.2K 0.00% 1 |A |- | ... 0.16% 6.9K 0.81% 575 |A |- | ... 0.16% 6.8K 1.38% 977 |AA |- | ... 0.16% 6.8K 0.04% 28 |AA |B | ... 0.15% 6.6K 1.33% 946 |A |- | ... 0.11% 4.5K 0.06% 46 |AAA+|- | ... 0.10% 4.4K 0.88% 624 |A |- | ... 0.09% 3.7K 0.74% 524 |AAA+|B | ... With -v applied, # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range] # ............... .............. ........... .......... .............. .................. # 57.55% 2.5M 0.00% 3 A=1 ,B=- ... 25.27% 1.1M 0.00% 2 A=2 ,B=- ... 15.61% 667.2K 0.00% 1 A=1 ,B=- ... 0.16% 6.9K 0.81% 575 A=1 ,B=- ... 0.16% 6.8K 1.38% 977 A=2 ,B=- ... 0.16% 6.8K 0.04% 28 A=2 ,B=1 ... 0.15% 6.6K 1.33% 946 A=1 ,B=- ... 0.11% 4.5K 0.06% 46 A=3+,B=- ... 0.10% 4.4K 0.88% 624 A=1 ,B=- ... 0.09% 3.7K 0.74% 524 A=3+,B=1 ... Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-7-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-13perf Document: Add TPEBS (Timed PEBS(Precise Event-Based Sampling)) to DocumentsWeilin Wang
TPEBS (Timed PEBS(Precise Event-Based Sampling)) is a new feature Intel PMU from Granite Rapids microarchitecture. It will be used in new TMA (Top-Down Microarchitecture Analysis) releases. Add related introduction to documents while adding new code to support it in 'perf stat'. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Link: https://lore.kernel.org/r/20240720062102.444578-8-weilin.wang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-13perf stat: Add command line option for enabling TPEBS recordingWeilin Wang
With this command line option, TPEBS recording is turned off in 'perf stat' on default. It will only be turned on when this option is given in 'perf stat' command. Example with --record-tpebs: perf stat -M tma_split_loads -C1-4 --record-tpebs sleep 1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.044 MB - ] Performance counter stats for 'CPU(s) 1-4': 53,259,156,071 cpu_core/TOPDOWN.SLOTS/ # 1.6 % tma_split_loads (50.00%) 15,867,565,250 cpu_core/topdown-retiring/ (50.00%) 15,655,580,731 cpu_core/topdown-mem-bound/ (50.00%) 11,738,022,218 cpu_core/topdown-bad-spec/ (50.00%) 6,151,265,424 cpu_core/topdown-fe-bound/ (50.00%) 20,445,917,581 cpu_core/topdown-be-bound/ (50.00%) 6,925,098,013 cpu_core/L1D_PEND_MISS.PENDING/ (50.00%) 3,838,653,421 cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/ (50.00%) 4,797,059,783 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (50.00%) 11,931,916,714 cpu_core/CPU_CLK_UNHALTED.THREAD/ (50.00%) 102,576,164 cpu_core/MEM_LOAD_COMPLETED.L1_MISS_ANY/ (50.00%) 64,071,854 cpu_core/MEM_INST_RETIRED.SPLIT_LOADS/ (50.00%) 3 cpu_core/MEM_INST_RETIRED.SPLIT_LOADS/R 1.003049679 seconds time elapsed Example without --record-tpebs: perf stat -M tma_contested_accesses -C1 sleep 1 Performance counter stats for 'CPU(s) 1': 50,203,891 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_contested_accesses (63.60%) 10,040,777 cpu_core/topdown-retiring/ (63.60%) 6,890,729 cpu_core/topdown-mem-bound/ (63.60%) 2,756,463 cpu_core/topdown-bad-spec/ (63.60%) 10,828,288 cpu_core/topdown-fe-bound/ (63.60%) 28,350,432 cpu_core/topdown-be-bound/ (63.60%) 98 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM/ (63.70%) 577,520 cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/ (54.62%) 313,339 cpu_core/MEMORY_ACTIVITY.STALLS_L3_MISS/ (54.62%) 14,155 cpu_core/MEM_LOAD_RETIRED.L1_MISS/ (45.54%) 0 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD/ (36.30%) 8,468,077 cpu_core/CPU_CLK_UNHALTED.THREAD/ (45.38%) 198 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/ (45.38%) 8,324 cpu_core/MEM_LOAD_RETIRED.FB_HIT/ (45.38%) 3,388,031,520 TSC 23,226,785 cpu_core/CPU_CLK_UNHALTED.REF_TSC/ (54.46%) 80 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/ (54.46%) 0 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/R 0 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/R 1,006,816,667 ns duration_time 1.002537737 seconds time elapsed Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Link: https://lore.kernel.org/r/20240720062102.444578-7-weilin.wang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf docs: Refine the description for the buffer sizeLeo Yan
Current description for the AUX trace buffer size is misleading. When a user specifies the option '-m,512M', it represents a size value in bytes (512MiB) but not 512M pages (512M x 4KiB regard to a page of 4KiB). Make the document clear that the normal buffer and the AUX tracing buffer share the same semantics. Syncs the documents for consistent text. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812093459.2575278-1-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf script: add --addr2line optionMartin Liška
Similarly to other subcommands (like report, top), it would be handy to provide a path for addr2line command. Signed-off-by: Martin Liska <martin.liska@hey.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/eadc3e36-029d-4848-9d69-272fe5a83a26@foxlink.cz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05perf annotate: Add --skip-empty optionNamhyung Kim
Like in 'perf report', we want to hide empty events in the 'perf annotate' output. This is consistent when the option is set in perf report. For example, the following command would use 3 events including dummy. $ perf mem record -a -- perf test -w noploop $ perf evlist cpu/mem-loads,ldlat=30/P cpu/mem-stores/P dummy:u Just using perf annotate with --group will show the all 3 events. $ perf annotate --group --stdio | head Percent | Source code & Disassembly of ... -------------------------------------------------------------- : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 0.00 : e060: pushq %rbp 0.00 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 0.00 : e064: pushq %r15 0.00 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 0.00 : e069: pushq %r14 0.00 0.00 0.00 : e06b: pushq %r13 0.00 0.00 0.00 : e06d: movl %edx, %r13d Now with --skip-empty, it'll hide the last dummy event. $ perf annotate --group --stdio --skip-empty | head Percent | Source code & Disassembly of ... ------------------------------------------------------ : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 : e060: pushq %rbp 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 : e064: pushq %r15 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 : e069: pushq %r14 0.00 0.00 : e06b: pushq %r13 0.00 0.00 : e06d: movl %edx, %r13d Committer testing: root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~# Before: root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 0.00 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~# After: root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05perf mem: Update documentation for new optionsNamhyung Kim
Add a common options section and move some items to the section. Also add description of new options to report options. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/lkml/20240802180913.1023886-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-01perf record: Add --setup-filter optionNamhyung Kim
To allow BPF filters for unprivileged users it needs to pin the BPF objects to BPF-fs first. Let's add a new option to pin and unpin the objects easily. I'm not sure 'perf record' is a right place to do this but I don't have a better idea right now. $ sudo perf record --setup-filter pin The above command would pin BPF program and maps for the filter when the system has BPF-fs (usually at /sys/fs/bpf/). To unpin the objects, users can run the following command (as root). $ sudo perf record --setup-filter unpin Committer testing: root@number:~# perf record --setup-filter pin root@number:~# ls -la /sys/fs/bpf/perf_filter/ total 0 drwxr-xr-x. 2 root root 0 Jul 31 10:43 . drwxr-xr-t. 3 root root 0 Jul 31 10:43 .. -rw-rw-rw-. 1 root root 0 Jul 31 10:43 dropped -rw-rw-rw-. 1 root root 0 Jul 31 10:43 filters -rwxrwxrwx. 1 root root 0 Jul 31 10:43 perf_sample_filter -rw-rw-rw-. 1 root root 0 Jul 31 10:43 pid_hash -rw-------. 1 root root 0 Jul 31 10:43 sample_f_rodata root@number:~# ls -la /sys/fs/bpf/perf_filter/perf_sample_filter -rwxrwxrwx. 1 root root 0 Jul 31 10:43 /sys/fs/bpf/perf_filter/perf_sample_filter root@number:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-07-31perf ftrace profile: Add -s/--sort optionNamhyung Kim
The -s/--sort option is to sort the output by given column. $ sudo perf ftrace profile -s max sync | head # Total (us) Avg (us) Max (us) Count Function 6301.811 6301.811 6301.811 1 __do_sys_sync 6301.328 6301.328 6301.328 1 ksys_sync 5320.300 1773.433 2858.819 3 iterate_supers 2755.875 17.012 2610.633 162 sync_fs_one_sb 2728.351 682.088 2610.413 4 ext4_sync_fs [ext4] 2603.654 2603.654 2603.654 1 jbd2_log_wait_commit [jbd2] 4750.615 593.827 2597.427 8 schedule 2164.986 26.728 2115.673 81 sync_inodes_one_sb 2143.842 26.467 2115.438 81 sync_inodes_sb Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/20240729004127.238611-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-07-31perf ftrace: Add 'profile' commandNamhyung Kim
The 'perf ftrace profile' command is to get function execution profiles using function-graph tracer so that users can see the total, average, max execution time as well as the number of invocations easily. The following is a profile for the perf_event_open syscall. $ sudo perf ftrace profile -G __x64_sys_perf_event_open -- \ perf stat -e cycles -C1 true 2> /dev/null | head # Total (us) Avg (us) Max (us) Count Function 65.611 65.611 65.611 1 __x64_sys_perf_event_open 30.527 30.527 30.527 1 anon_inode_getfile 30.260 30.260 30.260 1 __anon_inode_getfile 29.700 29.700 29.700 1 alloc_file_pseudo 17.578 17.578 17.578 1 d_alloc_pseudo 17.382 17.382 17.382 1 __d_alloc 16.738 16.738 16.738 1 kmem_cache_alloc_lru 15.686 15.686 15.686 1 perf_event_alloc 14.012 7.006 11.264 2 obj_cgroup_charge # Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/20240729004127.238611-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-07-31perf ftrace: Add 'tail' option to --graph-optsNamhyung Kim
The 'graph-tail' option is to print function name as a comment at the end. This is useful when a large function is mixed with other functions (possibly from different CPUs). For example, $ sudo perf ftrace -- perf stat true ... 1) | get_unused_fd_flags() { 1) | alloc_fd() { 1) 0.178 us | _raw_spin_lock(); 1) 0.187 us | expand_files(); 1) 0.169 us | _raw_spin_unlock(); 1) 1.211 us | } 1) 1.503 us | } $ sudo perf ftrace --graph-opts tail -- perf stat true ... 1) | get_unused_fd_flags() { 1) | alloc_fd() { 1) 0.099 us | _raw_spin_lock(); 1) 0.083 us | expand_files(); 1) 0.081 us | _raw_spin_unlock(); 1) 0.601 us | } /* alloc_fd */ 1) 0.751 us | } /* get_unused_fd_flags */ Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/20240729004127.238611-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-07-26perf docs: Document cross compilationLeo Yan
Records the commands for cross compilation with two methods. The first method relies on Multiarch. The second approach is to explicitly specify the PKG_CONFIG variables, which is widely used in build system (like Buildroot, Yocto, etc). Co-developed-by: James Clark <james.clark@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Tested-by: Ian Rogers <irogers@google.com> Cc: amadio@gentoo.org Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240717082211.524826-7-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-12perf sched map: Add --fuzzy-name option for fuzzy matching in task namesMadadi Vineeth Reddy
The --fuzzy-name option can be used if fuzzy name matching is required. For example, "taskname" can be matched to any string that contains "taskname" as its substring. Sample output for --task-name wdav --fuzzy-name ============= . *A0 . . . . - . 131040.641346 secs A0 => wdavdaemon:62509 . A0 *B0 . . . - . 131040.641378 secs B0 => wdavdaemon:62274 . *- B0 . . . - . 131040.641379 secs *C0 . B0 . . . . . 131040.641572 secs C0 => wdavdaemon:62283 C0 . B0 . *D0 . . . 131040.641572 secs D0 => wdavdaemon:62277 C0 . B0 . D0 . *E0 . 131040.641578 secs E0 => wdavdaemon:62270 *- . B0 . D0 . E0 . 131040.641581 secs Suggested-by: Chen Yu <yu.c.chen@intel.com> Reviewed-and-tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Link: https://lore.kernel.org/r/20240707182716.22054-4-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-12perf sched map: Add support for multiple task names using CSVMadadi Vineeth Reddy
To track the scheduling patterns of multiple tasks simultaneously, multiple task names can be specified using a comma separator without any whitespace. Sample output for --task-name perf,wdavdaemon ============= . *A0 . . . . - . 131040.641346 secs A0 => wdavdaemon:62509 . A0 *B0 . . . - . 131040.641378 secs B0 => wdavdaemon:62274 . *- B0 . . . - . 131040.641379 secs *C0 . B0 . . . . . 131040.641572 secs C0 => wdavdaemon:62283 ... . *- . . . . . . 131041.395649 secs . . . . . . . *X2 131041.403969 secs X2 => perf:70211 . . . . . . . *- 131041.404006 secs Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-and-tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Chen Yu <yu.c.chen@intel.com> Link: https://lore.kernel.org/r/20240707182716.22054-3-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-07-12perf sched map: Add task-name option to filter the output mapMadadi Vineeth Reddy
By default, perf sched map prints sched-in events for all the tasks which may not be required all the time as it prints lot of symbols and rows to the terminal. With --task-name option, one could specify the specific task name for which the map has to be shown. This would help in analyzing the CPU usage patterns easier for that specific task. Since multiple PID's might have the same task name, using task-name filter would be more useful for debugging. For other tasks, instead of printing the symbol, '-' is printed and the same '.' is used to represent idle. '-' is used instead of symbol for other tasks because it helps in clear visualization of task of interest and secondly the symbol itself doesn't mean anything because the sched-in of that symbol will not be printed(first sched-in contains pid and the corresponding symbol). When using the --task-name option, the sched-out time is represented by a '*-'. Since not all task sched-in events are printed, the sched-out time of the relevant task might be lost. This representation ensures that the sched-out time of the interested task is not overlooked. 6.10.0-rc1 ========== *A0 131040.639793 secs A0 => migration/0:19 *. 131040.639801 secs . => swapper:0 . *B0 131040.639830 secs B0 => migration/1:24 . *. 131040.639836 secs . . *C0 131040.640108 secs C0 => migration/2:30 . . *. 131040.640163 secs . . . *D0 131040.640386 secs D0 => migration/3:36 . . . *. 131040.640395 secs 6.10.0-rc1 + patch (--task-name wdavdaemon) ============= . *A0 . . . . - . 131040.641346 secs A0 => wdavdaemon:62509 . A0 *B0 . . . - . 131040.641378 secs B0 => wdavdaemon:62274 - *- B0 . . . - . 131040.641379 secs *C0 . B0 . . . . . 131040.641572 secs C0 => wdavdaemon:62283 C0 . B0 . *D0 . . . 131040.641572 secs D0 => wdavdaemon:62277 C0 . B0 . D0 . *E0 . 131040.641578 secs E0 => wdavdaemon:62270 *- . B0 . D0 . E0 . 131040.641581 secs . . B0 . D0 . *- . 131040.641583 secs Reviewed-and-tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Chen Yu <yu.c.chen@intel.com> Link: https://lore.kernel.org/r/20240707182716.22054-2-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-06-28perf sched replay: Fix -r/--repeat command line option for infinityMadadi Vineeth Reddy
Currently, the -r/--repeat option accepts values from 0 and complains for -1. The help section specifies: -r, --repeat <n> repeat the workload replay N times (-1: infinite) The -r -1 option raises an error because replay_repeat is defined as an unsigned int. In the current implementation, the workload is repeated n times when -r <n> is used, except when n is 0. When -r is set to 0, the workload is also repeated once. This happens because when -r=0, the run_one_test function is not called. (Note that mutex unlocking, which is essential for child threads spawned to emulate the workload, happens in run_one_test.) However, mutex unlocking is still performed in the destroy_tasks function. Thus, -r=0 results in the workload running once coincidentally. To clarify and maintain the existing logic for -r >= 1 (which runs the workload the specified number of times) and to fix the issue with infinite runs, make -r=0 perform an infinite run. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20240628071821.15264-1-vineethr@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-06-25perf: Timehist account sch delay for scheduled out runningFernand Sieber
When using perf timehist, sch delay is only computed for a waking task, not for a pre empted task. This patches changes sch delay to account for both. This makes sense as testing scheduling policy need to consider the effect of scheduling delay globally, not only for waking tasks. Example of `perf timehist` report before the patch for `stress` task competing with each other. First column is wait time, second column sch delay, third column runtime. 1.492060 [0000] s stress[81] 1.999 0.000 2.000 R next: stress[83] 1.494060 [0000] s stress[83] 2.000 0.000 2.000 R next: stress[81] 1.496060 [0000] s stress[81] 2.000 0.000 2.000 R next: stress[83] 1.498060 [0000] s stress[83] 2.000 0.000 1.999 R next: stress[81] After the patch, it looks like this (note that all wait time is not zero anymore): 1.492060 [0000] s stress[81] 1.999 1.999 2.000 R next: stress[83] 1.494060 [0000] s stress[83] 2.000 2.000 2.000 R next: stress[81] 1.496060 [0000] s stress[81] 2.000 2.000 2.000 R next: stress[83] 1.498060 [0000] s stress[83] 2.000 2.000 1.999 R next: stress[81] Signed-off-by: Fernand Sieber <sieberf@amazon.com> Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240618090339.87482-1-sieberf@amazon.com
2024-06-20perf doc: Add AMD IBS usage documentRavi Bangoria
Add a perf man page document that describes how to exploit AMD IBS with Linux perf. Brief intro about IBS and simple one-liner examples will help naive users to get started. This is not meant to be an exhaustive IBS guide. User should refer latest AMD64 Architecture Programmer's Manual for detailed description of IBS. Usage: $ man perf-amd-ibs Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: ananth.narayan@amd.com Cc: sandipan.das@amd.com Cc: santosh.shukla@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620054104.815-1-ravi.bangoria@amd.com
2024-06-03perf lock info: Display both map and thread by defaultNick Forrington
Change "perf lock info" argument handling to: Display both map and thread info (rather than an error) when neither are specified. Display both map and thread info (rather than just thread info) when both are requested. Signed-off-by: Nick Forrington <nick.forrington@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240513091413.738537-2-nick.forrington@arm.com