diff options
Diffstat (limited to 'tools/perf/Documentation/perf-report.txt')
-rw-r--r-- | tools/perf/Documentation/perf-report.txt | 66 |
1 files changed, 51 insertions, 15 deletions
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index d8b863e01fe0..3376c4710575 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -44,7 +44,7 @@ OPTIONS --comms=:: Only consider symbols in these comms. CSV that understands file://filename entries. This option will affect the percentage of - the overhead column. See --percentage for more info. + the overhead and latency columns. See --percentage for more info. --pid=:: Only show events for given process ID (comma separated list). @@ -54,12 +54,12 @@ OPTIONS --dsos=:: Only consider symbols in these dsos. CSV that understands file://filename entries. This option will affect the percentage of - the overhead column. See --percentage for more info. + the overhead and latency columns. See --percentage for more info. -S:: --symbols=:: Only consider these symbols. CSV that understands file://filename entries. This option will affect the percentage of - the overhead column. See --percentage for more info. + the overhead and latency columns. See --percentage for more info. --symbol-filter=:: Only show symbols that match (partially) with this filter. @@ -68,6 +68,21 @@ OPTIONS --hide-unresolved:: Only display entries resolved to a symbol. +--parallelism:: + Only consider these parallelism levels. Parallelism level is the number + of threads that actively run on CPUs at the time of sample. The flag + accepts single number, comma-separated list, and ranges (for example: + "1", "7,8", "1,64-128"). This is useful in understanding what a program + is doing during sequential/low-parallelism phases as compared to + high-parallelism phases. This option will affect the percentage of + the overhead and latency columns. See --percentage for more info. + Also see the `CPU and latency overheads' section for more details. + +--latency:: + Show latency-centric profile rather than the default + CPU-consumption-centric profile + (requires perf record --latency flag). + -s:: --sort=:: Sort histogram entries by given key(s) - multiple keys can be specified @@ -87,6 +102,7 @@ OPTIONS entries are displayed as "[other]". - cpu: cpu number the task ran at the time of sample - socket: processor socket number the task ran at the time of sample + - parallelism: number of running threads at the time of sample - srcline: filename and line number executed at the time of sample. The DWARF debugging info must be provided. - srcfile: file name of the source file of the samples. Requires dwarf @@ -97,12 +113,14 @@ OPTIONS - cgroup_id: ID derived from cgroup namespace device and inode numbers. - cgroup: cgroup pathname in the cgroupfs. - transaction: Transaction abort flags. - - overhead: Overhead percentage of sample - - overhead_sys: Overhead percentage of sample running in system mode - - overhead_us: Overhead percentage of sample running in user mode - - overhead_guest_sys: Overhead percentage of sample running in system mode + - overhead: CPU overhead percentage of sample. + - latency: latency (wall-clock) overhead percentage of sample. + See the `CPU and latency overheads' section for more details. + - overhead_sys: CPU overhead percentage of sample running in system mode + - overhead_us: CPU overhead percentage of sample running in user mode + - overhead_guest_sys: CPU overhead percentage of sample running in system mode on guest machine - - overhead_guest_us: Overhead percentage of sample running in user mode on + - overhead_guest_us: CPU overhead percentage of sample running in user mode on guest machine - sample: Number of sample - period: Raw number of event count of sample @@ -121,9 +139,12 @@ OPTIONS - type: Data type of sample memory access. - typeoff: Offset in the data type of sample memory access. - symoff: Offset in the symbol. + - weight1: Average value of event specific weight (1st field of weight_struct). + - weight2: Average value of event specific weight (2nd field of weight_struct). + - weight3: Average value of event specific weight (3rd field of weight_struct). - By default, comm, dso and symbol keys are used. - (i.e. --sort comm,dso,symbol) + By default, overhead, comm, dso and symbol keys are used. + (i.e. --sort overhead,comm,dso,symbol). If --branch-stack option is used, following sort keys are also available: @@ -198,7 +219,11 @@ OPTIONS --fields=:: Specify output field - multiple keys can be specified in CSV format. Following fields are available: - overhead, overhead_sys, overhead_us, overhead_children, sample and period. + overhead, latency, overhead_sys, overhead_us, overhead_children, sample, + period, weight1, weight2, weight3, ins_lat, p_stage_cyc and retire_lat. + The last 3 names are alias for the corresponding weights. When the weight + fields are used, they will show the average value of the weight. + Also it can contain any sort key(s). By default, every sort keys not specified in -F will be appended @@ -282,7 +307,7 @@ OPTIONS Accumulate callchain of children to parent entry so that then can show up in the output. The output will have a new "Children" column and will be sorted on the data. It requires callchains are recorded. - See the `overhead calculation' section for more details. Enabled by + See the `Overhead calculation' section for more details. Enabled by default, disable with --no-children. --max-stack:: @@ -384,6 +409,14 @@ OPTIONS This allows to examine the path the program took to each sample. The data collection must have used -b (or -j) and -g. + Also show with some branch flags that can be: + - Predicted: display the average percentage of predicated branches. + (predicated number / total number) + - Abort: display the number of tsx aborted branches. + - Cycles: cycles in basic block. + + - iterations: display the average number of iterations in callchain list. + --addr2line=<path>:: Path to addr2line binary. @@ -427,9 +460,9 @@ OPTIONS --call-graph option for details. --percentage:: - Determine how to display the overhead percentage of filtered entries. - Filters can be applied by --comms, --dsos and/or --symbols options and - Zoom operations on the TUI (thread, dso, etc). + Determine how to display the CPU and latency overhead percentage + of filtered entries. Filters can be applied by --comms, --dsos, --symbols + and/or --parallelism options and Zoom operations on the TUI (thread, dso, etc). "relative" means it's relative to filtered entries only so that the sum of shown entries will be always 100%. "absolute" means it retains @@ -607,10 +640,13 @@ include::itrace.txt[] 'Avg Cycles%' - block average sampled cycles / sum of total block average sampled cycles 'Avg Cycles' - block average sampled cycles + 'Branch Counter' - block branch counter histogram (with -v showing the number) --skip-empty:: Do not print 0 results in the --stat output. +include::cpu-and-latency-overheads.txt[] + include::callchain-overhead-calculation.txt[] SEE ALSO |