diff options
Diffstat (limited to 'tools/perf/Documentation/perf-report.txt')
| -rw-r--r-- | tools/perf/Documentation/perf-report.txt | 209 |
1 files changed, 182 insertions, 27 deletions
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 1a27bfe05039..acef3ff4178e 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -27,7 +27,7 @@ OPTIONS -q:: --quiet:: - Do not show any message. (Suppress -v) + Do not show any warnings or messages. (Suppress -v) -n:: --show-nr-samples:: @@ -44,7 +44,7 @@ OPTIONS --comms=:: Only consider symbols in these comms. CSV that understands file://filename entries. This option will affect the percentage of - the overhead column. See --percentage for more info. + the overhead and latency columns. See --percentage for more info. --pid=:: Only show events for given process ID (comma separated list). @@ -54,12 +54,12 @@ OPTIONS --dsos=:: Only consider symbols in these dsos. CSV that understands file://filename entries. This option will affect the percentage of - the overhead column. See --percentage for more info. + the overhead and latency columns. See --percentage for more info. -S:: --symbols=:: Only consider these symbols. CSV that understands file://filename entries. This option will affect the percentage of - the overhead column. See --percentage for more info. + the overhead and latency columns. See --percentage for more info. --symbol-filter=:: Only show symbols that match (partially) with this filter. @@ -68,17 +68,33 @@ OPTIONS --hide-unresolved:: Only display entries resolved to a symbol. +--parallelism:: + Only consider these parallelism levels. Parallelism level is the number + of threads that actively run on CPUs at the time of sample. The flag + accepts single number, comma-separated list, and ranges (for example: + "1", "7,8", "1,64-128"). This is useful in understanding what a program + is doing during sequential/low-parallelism phases as compared to + high-parallelism phases. This option will affect the percentage of + the overhead and latency columns. See --percentage for more info. + Also see the `CPU and latency overheads' section for more details. + +--latency:: + Show latency-centric profile rather than the default + CPU-consumption-centric profile + (requires perf record --latency flag). + -s:: --sort=:: Sort histogram entries by given key(s) - multiple keys can be specified in CSV format. Following sort keys are available: pid, comm, dso, symbol, parent, cpu, socket, srcline, weight, - local_weight, cgroup_id. + local_weight, cgroup_id, addr. Each key has following meaning: - comm: command (name) of the task which can be read via /proc/<pid>/comm - pid: command and tid of the task + - tgid: command and tgid of the task - dso: name of library or module executed at the time of sample - dso_size: size of library or module executed at the time of sample - symbol: name of function executed at the time of sample @@ -87,27 +103,49 @@ OPTIONS entries are displayed as "[other]". - cpu: cpu number the task ran at the time of sample - socket: processor socket number the task ran at the time of sample + - parallelism: number of running threads at the time of sample - srcline: filename and line number executed at the time of sample. The DWARF debugging info must be provided. - - srcfile: file name of the source file of the same. Requires dwarf + - srcfile: file name of the source file of the samples. Requires dwarf information. - weight: Event specific weight, e.g. memory latency or transaction abort cost. This is the global weight. - local_weight: Local weight version of the weight above. - cgroup_id: ID derived from cgroup namespace device and inode numbers. + - cgroup: cgroup pathname in the cgroupfs. - transaction: Transaction abort flags. - - overhead: Overhead percentage of sample - - overhead_sys: Overhead percentage of sample running in system mode - - overhead_us: Overhead percentage of sample running in user mode - - overhead_guest_sys: Overhead percentage of sample running in system mode + - overhead: CPU overhead percentage of sample. + - latency: latency (wall-clock) overhead percentage of sample. + See the `CPU and latency overheads' section for more details. + - overhead_sys: CPU overhead percentage of sample running in system mode + - overhead_us: CPU overhead percentage of sample running in user mode + - overhead_guest_sys: CPU overhead percentage of sample running in system mode on guest machine - - overhead_guest_us: Overhead percentage of sample running in user mode on + - overhead_guest_us: CPU overhead percentage of sample running in user mode on guest machine - sample: Number of sample - period: Raw number of event count of sample - - By default, comm, dso and symbol keys are used. - (i.e. --sort comm,dso,symbol) + - time: Separate the samples by time stamp with the resolution specified by + --time-quantum (default 100ms). Specify with overhead and before it. + - code_page_size: the code page size of sampled code address (ip) + - ins_lat: Instruction latency in core cycles. This is the global instruction + latency + - local_ins_lat: Local instruction latency version + - p_stage_cyc: On powerpc, this presents the number of cycles spent in a + pipeline stage. And currently supported only on powerpc. + - addr: (Full) virtual address of the sampled instruction + - retire_lat: On X86, this reports pipeline stall of this instruction compared + to the previous instruction in cycles. And currently supported only on X86 + - simd: Flags describing a SIMD operation. "e" for empty Arm SVE predicate. "p" for partial Arm SVE predicate + - type: Data type of sample memory access. + - typeoff: Offset in the data type of sample memory access. + - symoff: Offset in the symbol. + - weight1: Average value of event specific weight (1st field of weight_struct). + - weight2: Average value of event specific weight (2nd field of weight_struct). + - weight3: Average value of event specific weight (3rd field of weight_struct). + + By default, overhead, comm, dso and symbol keys are used. + (i.e. --sort overhead,comm,dso,symbol). If --branch-stack option is used, following sort keys are also available: @@ -136,7 +174,7 @@ OPTIONS If the --mem-mode option is used, the following sort keys are also available (incompatible with --branch-stack): - symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline. + symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline, blocked. - symbol_daddr: name of data symbol being executed on at the time of sample - dso_daddr: name of library or module containing the data being executed @@ -147,9 +185,12 @@ OPTIONS - snoop: type of snoop (if any) for the data at the time of the sample - dcacheline: the cacheline the data address is on at the time of the sample - phys_daddr: physical address of data being executed on at the time of sample + - data_page_size: the data page size of data being executed on at the time of sample + - blocked: reason of blocked load access for the data at the time of the sample And the default sort keys are changed to local_weight, mem, sym, dso, - symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'. + symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat, + see '--mem-mode'. If the data file has tracepoint event(s), following (dynamic) sort keys are also available: @@ -179,7 +220,11 @@ OPTIONS --fields=:: Specify output field - multiple keys can be specified in CSV format. Following fields are available: - overhead, overhead_sys, overhead_us, overhead_children, sample and period. + overhead, latency, overhead_sys, overhead_us, overhead_children, sample, + period, weight1, weight2, weight3, ins_lat, p_stage_cyc and retire_lat. + The last 3 names are alias for the corresponding weights. When the weight + fields are used, they will show the average value of the weight. + Also it can contain any sort key(s). By default, every sort keys not specified in -F will be appended @@ -214,6 +259,9 @@ OPTIONS --dump-raw-trace:: Dump raw trace in ASCII. +--disable-order:: + Disable raw trace ordering. + -g:: --call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>:: Display call chains using type, min percent threshold, print limit, @@ -260,7 +308,7 @@ OPTIONS Accumulate callchain of children to parent entry so that then can show up in the output. The output will have a new "Children" column and will be sorted on the data. It requires callchains are recorded. - See the `overhead calculation' section for more details. Enabled by + See the `Overhead calculation' section for more details. Enabled by default, disable with --no-children. --max-stack:: @@ -362,13 +410,35 @@ OPTIONS This allows to examine the path the program took to each sample. The data collection must have used -b (or -j) and -g. + Also show with some branch flags that can be: + - Predicted: display the average percentage of predicated branches. + (predicated number / total number) + - Abort: display the number of tsx aborted branches. + - Cycles: cycles in basic block. + + - iterations: display the average number of iterations in callchain list. + +--addr2line=<path>:: + Path to addr2line binary. + --objdump=<path>:: Path to objdump binary. +--prefix=PREFIX:: +--prefix-strip=N:: + Remove first N entries from source file path names in executables + and add PREFIX. This allows to display source code compiled on systems + with different file system layout. + --group:: Show event group information together. It forces group output also if there are no groups defined in data file. +--group-sort-idx:: + Sort the output by the event at the index n in group. If n is invalid, + sort by the first event. It can support multiple groups with different + amount of events. WARNING: This should be used on grouped events. + --demangle:: Demangle symbol names to human readable form. It's enabled by default, disable with --no-demangle. @@ -391,9 +461,9 @@ OPTIONS --call-graph option for details. --percentage:: - Determine how to display the overhead percentage of filtered entries. - Filters can be applied by --comms, --dsos and/or --symbols options and - Zoom operations on the TUI (thread, dso, etc). + Determine how to display the CPU and latency overhead percentage + of filtered entries. Filters can be applied by --comms, --dsos, --symbols + and/or --parallelism options and Zoom operations on the TUI (thread, dso, etc). "relative" means it's relative to filtered entries only so that the sum of shown entries will be always 100%. "absolute" means it retains @@ -410,12 +480,13 @@ OPTIONS --time:: Only analyze samples within given time window: <start>,<stop>. Times - have the format seconds.microseconds. If start is not given (i.e., time + have the format seconds.nanoseconds. If start is not given (i.e. time string is ',x.y') then analysis starts at the beginning of the file. If - stop time is not given (i.e, time string is 'x.y,') then analysis goes - to end of file. + stop time is not given (i.e. time string is 'x.y,') then analysis goes + to end of file. Multiple ranges can be separated by spaces, which + requires the argument to be quoted e.g. --time "1234.567,1234.789 1235," - Also support time percent with multiple time range. Time string is + Also support time percent with multiple time ranges. Time string is 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: @@ -435,6 +506,23 @@ OPTIONS perf report --time 0%-10%,30%-40% +--switch-on EVENT_NAME:: + Only consider events after this event is found. + + This may be interesting to measure a workload only after some initialization + phase is over, i.e. insert a perf probe at that point and then using this + option with that probe. + +--switch-off EVENT_NAME:: + Stop considering events after this event is found. + +--show-on-off-events:: + Show the --switch-on/off events too. This has no effect in 'perf report' now + but probably we'll make the default not to show the switch-on/off events + on the --group mode and if there is only one event besides the off/on ones, + go straight to the histogram browser, just like 'perf report' with no events + explicitly specified does. + --itrace:: Options for decoding instruction tracing data. The options are: @@ -456,14 +544,56 @@ include::itrace.txt[] This option extends the perf report to show reference callgraphs, which collected by reference event, in no callgraph event. +--stitch-lbr:: + Show callgraph with stitched LBRs, which may have more complete + callgraph. The perf.data file must have been obtained using + perf record --call-graph lbr. + Disabled by default. In common cases with call stack overflows, + it can recreate better call stacks than the default lbr call stack + output. But this approach is not foolproof. There can be cases + where it creates incorrect call stacks from incorrect matches. + The known limitations include exception handing such as + setjmp/longjmp will have calls/returns not match. + --socket-filter:: Only report the samples on the processor socket that match with this filter +--samples=N:: + Save N individual samples for each histogram entry to show context in perf + report tui browser. + --raw-trace:: When displaying traceevent output, do not use print fmt or plugins. +-H:: --hierarchy:: - Enable hierarchical output. + Enable hierarchical output. In the hierarchy mode, each sort key groups + samples based on the criteria and then sub-divide it using the lower + level sort key. + + For example: + In normal output: + + perf report -s dso,sym + # Overhead Shared Object Symbol + 50.00% [kernel.kallsyms] [k] kfunc1 + 20.00% perf [.] foo + 15.00% [kernel.kallsyms] [k] kfunc2 + 10.00% perf [.] bar + 5.00% libc.so [.] libcall + + In hierarchy output: + + perf report -s dso,sym --hierarchy + # Overhead Shared Object / Symbol + 65.00% [kernel.kallsyms] + 50.00% [k] kfunc1 + 15.00% [k] kfunc2 + 30.00% perf + 20.00% [.] foo + 10.00% [.] bar + 5.00% libc.so + 5.00% [.] libcall --inline:: If a callgraph address belongs to an inlined function, the inline stack @@ -477,6 +607,9 @@ include::itrace.txt[] Please note that not all mmaps are stored, options affecting which ones are include 'perf record --data', for instance. +--ns:: + Show time stamps in nanoseconds. + --stats:: Display overall events statistics without any further processing. (like the one at the end of the perf report -D command) @@ -494,8 +627,30 @@ include::itrace.txt[] The period/hits keywords set the base the percentage is computed on - the samples period or the number of samples (hits). +--time-quantum:: + Configure time quantum for time sort key. Default 100ms. + Accepts s, us, ms, ns units. + +--total-cycles:: + When --total-cycles is specified, it supports sorting for all blocks by + 'Sampled Cycles%'. This is useful to concentrate on the globally hottest + blocks. In output, there are some new columns: + + 'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles + 'Sampled Cycles' - block sampled cycles aggregation + 'Avg Cycles%' - block average sampled cycles / sum of total block average + sampled cycles + 'Avg Cycles' - block average sampled cycles + 'Branch Counter' - block branch counter histogram (with -v showing the number) + +--skip-empty:: + Do not print 0 results in the --stat output. + +include::cpu-and-latency-overheads.txt[] + include::callchain-overhead-calculation.txt[] SEE ALSO -------- -linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1] +linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1], +linkperf:perf-intel-pt[1] |
