diff options
Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r-- | tools/perf/Documentation/perf-amd-ibs.txt | 59 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-check.txt | 2 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-ftrace.txt | 10 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-list.txt | 25 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-mem.txt | 50 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-record.txt | 4 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-stat.txt | 6 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-trace.txt | 8 |
8 files changed, 126 insertions, 38 deletions
diff --git a/tools/perf/Documentation/perf-amd-ibs.txt b/tools/perf/Documentation/perf-amd-ibs.txt index 55f80beae037..548549935760 100644 --- a/tools/perf/Documentation/perf-amd-ibs.txt +++ b/tools/perf/Documentation/perf-amd-ibs.txt @@ -171,23 +171,48 @@ Below is a simple example of the perf mem tool. # perf mem report A normal perf mem report output will provide detailed memory access profile. -However, it can also be aggregated based on output fields. For example: - - # perf mem report -F mem,sample,snoop - Samples: 3M of event 'ibs_op//', Event count (approx.): 23524876 - Memory access Samples Snoop - N/A 1903343 N/A - L1 hit 1056754 N/A - L2 hit 75231 N/A - L3 hit 9496 HitM - L3 hit 2270 N/A - RAM hit 8710 N/A - Remote node, same socket RAM hit 3241 N/A - Remote core, same node Any cache hit 1572 HitM - Remote core, same node Any cache hit 514 N/A - Remote node, same socket Any cache hit 1216 HitM - Remote node, same socket Any cache hit 350 N/A - Uncached hit 18 N/A +New output fields will show related access info together. For example: + + # perf mem report -F overhead,cache,snoop,comm + ... + # Samples: 92K of event 'ibs_op//' + # Total weight : 531104 + # + # ---------- Cache ----------- --- Snoop ---- + # Overhead L1 L2 L1-buf Other HitM Other Command + # ........ ............................ .............. .......... + # + 76.07% 5.8% 35.7% 0.0% 34.6% 23.3% 52.8% cc1 + 5.79% 0.2% 0.0% 0.0% 5.6% 0.1% 5.7% make + 5.78% 0.1% 4.4% 0.0% 1.2% 0.5% 5.3% gcc + 5.33% 0.3% 3.9% 0.0% 1.1% 0.2% 5.2% as + 5.00% 0.1% 3.8% 0.0% 1.0% 0.3% 4.7% sh + 1.56% 0.1% 0.1% 0.0% 1.4% 0.6% 0.9% ld + 0.28% 0.1% 0.0% 0.0% 0.2% 0.1% 0.2% pkg-config + 0.09% 0.0% 0.0% 0.0% 0.1% 0.0% 0.1% git + 0.03% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% rm + ... + +Also, it can be aggregated based on various memory access info using the +sort keys. For example: + + # perf mem report -s mem,snoop + ... + # Samples: 92K of event 'ibs_op//' + # Total weight : 531104 + # Sort order : mem,snoop + # + # Overhead Samples Memory access Snoop + # ........ ............ ....................................... ............ + # + 47.99% 1509 L2 hit N/A + 25.08% 338 core, same node Any cache hit HitM + 10.24% 54374 N/A N/A + 6.77% 35938 L1 hit N/A + 6.39% 101 core, same node Any cache hit N/A + 3.50% 69 RAM hit N/A + 0.03% 158 LFB/MAB hit N/A + 0.00% 2 Uncached hit N/A Please refer to their man page for more detail. diff --git a/tools/perf/Documentation/perf-check.txt b/tools/perf/Documentation/perf-check.txt index a764a4629220..ee92042082f7 100644 --- a/tools/perf/Documentation/perf-check.txt +++ b/tools/perf/Documentation/perf-check.txt @@ -52,8 +52,8 @@ feature:: dwarf-unwind / HAVE_DWARF_UNWIND_SUPPORT auxtrace / HAVE_AUXTRACE_SUPPORT libbfd / HAVE_LIBBFD_SUPPORT + libbpf-strings / HAVE_LIBBPF_STRINGS_SUPPORT libcapstone / HAVE_LIBCAPSTONE_SUPPORT - libcrypto / HAVE_LIBCRYPTO_SUPPORT libdw-dwarf-unwind / HAVE_LIBDW_SUPPORT libelf / HAVE_LIBELF_SUPPORT libnuma / HAVE_LIBNUMA_SUPPORT diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt index b77f58c4d2fd..3f3808e513fe 100644 --- a/tools/perf/Documentation/perf-ftrace.txt +++ b/tools/perf/Documentation/perf-ftrace.txt @@ -123,6 +123,10 @@ OPTIONS for 'perf ftrace trace' --graph-opts:: List of options allowed to set: + - args - Show function arguments. + - retval - Show function return value. + - retval-hex - Show function return value in hexadecimal format. + - retaddr - Show function return address. - nosleep-time - Measure on-CPU time only for function_graph tracer. - noirqs - Ignore functions that happen inside interrupt. - verbose - Show process names, PIDs, timestamps, etc. @@ -139,6 +143,12 @@ OPTIONS for 'perf ftrace latency' Set the function name to get the histogram. Unlike perf ftrace trace, it only allows single function to calculate the histogram. +-e:: +--events=:: + Set the pair of events to get the histogram. The histogram is calculated + by the time difference between the two events from the same thread. This + requires -b/--use-bpf option. + -b:: --use-bpf:: Use BPF to measure function latency instead of using the ftrace (it diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt index ce0735021473..28215306a78a 100644 --- a/tools/perf/Documentation/perf-list.txt +++ b/tools/perf/Documentation/perf-list.txt @@ -278,26 +278,33 @@ also be supplied. For example: perf stat -C 0 -e 'hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/' ... -EVENT QUALIFIERS: +EVENT QUALIFIERS +---------------- It is also possible to add extra qualifiers to an event: percore: -Sums up the event counts for all hardware threads in a core, e.g.: - - - perf stat -e cpu/event=0,umask=0x3,percore=1/ + Sums up the event counts for all hardware threads in a core, e.g.: + perf stat -e cpu/event=0,umask=0x3,percore=1/ cpu: -Specifies the CPU to open the event upon. The value may be repeated to -specify opening the event on multiple CPUs: + Specifies a CPU or a range of CPUs to open the event upon. It may + also reference a PMU to copy the CPU mask from. The value may be + repeated to specify opening the event on multiple CPUs. + Example 1: to open the instructions event on CPUs 0 and 2, the + cycles event on CPUs 1 and 2: + perf stat -e instructions/cpu=0,cpu=2/,cycles/cpu=1-2/ -a sleep 1 - perf stat -e instructions/cpu=0,cpu=2/,cycles/cpu=1,cpu=2/ -a sleep 1 - perf stat -e data_read/cpu=0/,data_write/cpu=1/ -a sleep 1 + Example 2: to open the data_read uncore event on CPU 0 and the + data_write uncore event on CPU 1: + perf stat -e data_read/cpu=0/,data_write/cpu=1/ -a sleep 1 + Example 3: to open the software msr/tsc/ event only on the CPUs + matching those from the cpu_core PMU: + perf stat -e msr/tsc,cpu=cpu_core/ -a sleep 1 EVENT GROUPS ------------ diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt index 965e73d37772..4d164836d094 100644 --- a/tools/perf/Documentation/perf-mem.txt +++ b/tools/perf/Documentation/perf-mem.txt @@ -119,6 +119,22 @@ REPORT OPTIONS And the default sort keys are changed to local_weight, mem, sym, dso, symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat. +-F:: +--fields=:: + Specify output field - multiple keys can be specified in CSV format. + Please see linkperf:perf-report[1] for details. + + In addition to the default fields, 'perf mem report' will provide the + following fields to break down sample periods. + + - op: operation in the sample instruction (load, store, prefetch, ...) + - cache: location in CPU cache (L1, L2, ...) where the sample hit + - mem: location in memory or other places the sample hit + - dtlb: location in Data TLB (L1, L2) where the sample hit + - snoop: snoop result for the sampled data access + + Please take a look at the OUTPUT FIELD SELECTION section for caveats. + -T:: --type-profile:: Show data-type profile result instead of code symbols. This requires @@ -156,6 +172,40 @@ but one sample with weight 180 and the other with weight 20: 90% [k] memcpy 10% [.] strcmp +OUTPUT FIELD SELECTION +---------------------- +"perf mem report" adds a number of new output fields specific to data source +information in the sample. Some of them have the same name with the existing +sort keys ("mem" and "snoop"). So unlike other fields and sort keys, they'll +behave differently when it's used by -F/--fields or -s/--sort. + +Using those two as output fields will aggregate samples altogether and show +breakdown. + + $ perf mem report -F mem,snoop + ... + # ------ Memory ------- --- Snoop ---- + # RAM Uncach Other HitM Other + # ..................... .............. + # + 3.5% 0.0% 96.5% 25.1% 74.9% + +But using the same name for sort keys will aggregate samples for each type +separately. + + $ perf mem report -s mem,snoop + # Overhead Samples Memory access Snoop + # ........ ............ ....................................... ............ + # + 47.99% 1509 L2 hit N/A + 25.08% 338 core, same node Any cache hit HitM + 10.24% 54374 N/A N/A + 6.77% 35938 L1 hit N/A + 6.39% 101 core, same node Any cache hit N/A + 3.50% 69 RAM hit N/A + 0.03% 158 LFB/MAB hit N/A + 0.00% 2 Uncached hit N/A + SEE ALSO -------- linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1] diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 612612fa2d80..067891bd7da6 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -563,7 +563,9 @@ Specify vmlinux path which has debuginfo. Record build-id of all DSOs regardless whether it's actually hit or not. --buildid-mmap:: -Record build ids in mmap2 events, disables build id cache (implies --no-buildid). +Legacy record build-id in map events option which is now the +default. Behaves indentically to --no-buildid. Disable with +--no-buildid-mmap. --aio[=n]:: Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default: 1, max: 4). diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 61d091670dee..1a766d4a2233 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -640,18 +640,20 @@ JSON FORMAT With -j, perf stat is able to print out a JSON format output that can be used for parsing. -- timestamp : optional usec time stamp in fractions of second (with -I) +- interval : optional timestamp in fractions of second (with -I) - optional aggregate options: - core : core identifier (with --per-core) - die : die identifier (with --per-die) - socket : socket identifier (with --per-socket) - node : node identifier (with --per-node) - thread : thread identifier (with --per-thread) +- counters : number of aggregated PMU counters - counter-value : counter value - unit : unit of the counter value or empty - event : event name - variance : optional variance if multiple values are collected (with -r) -- runtime : run time of counter +- event-runtime : run time of the event +- pcnt-running : percentage of time the event was running - metric-value : optional metric value - metric-unit : optional unit of metric diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt index c1fb6056a0d3..973fede403a0 100644 --- a/tools/perf/Documentation/perf-trace.txt +++ b/tools/perf/Documentation/perf-trace.txt @@ -238,14 +238,6 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs. the same beautifiers used in the strace-like enter+exit lines to augment the tracepoint arguments. ---map-dump:: - Dump BPF maps setup by events passed via -e, for instance the augmented_raw_syscalls - living in tools/perf/examples/bpf/augmented_raw_syscalls.c. For now this - dumps just boolean map values and integer keys, in time this will print in hex - by default and use BTF when available, as well as use functions to do pretty - printing using the existing 'perf trace' syscall arg beautifiers to map integer - arguments to strings (pid to comm, syscall id to syscall name, etc). - --force-btf:: Use btf_dump to pretty print syscall argument data, instead of using hand-crafted pretty printers. This option is intended for testing BTF integration in perf trace. btf_dump-based |