summaryrefslogtreecommitdiff
path: root/tools/perf/Documentation/perf-report.txt
diff options
context:
space:
mode:
Diffstat (limited to 'tools/perf/Documentation/perf-report.txt')
-rw-r--r--tools/perf/Documentation/perf-report.txt209
1 files changed, 182 insertions, 27 deletions
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 1a27bfe05039..acef3ff4178e 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -27,7 +27,7 @@ OPTIONS
-q::
--quiet::
- Do not show any message. (Suppress -v)
+ Do not show any warnings or messages. (Suppress -v)
-n::
--show-nr-samples::
@@ -44,7 +44,7 @@ OPTIONS
--comms=::
Only consider symbols in these comms. CSV that understands
file://filename entries. This option will affect the percentage of
- the overhead column. See --percentage for more info.
+ the overhead and latency columns. See --percentage for more info.
--pid=::
Only show events for given process ID (comma separated list).
@@ -54,12 +54,12 @@ OPTIONS
--dsos=::
Only consider symbols in these dsos. CSV that understands
file://filename entries. This option will affect the percentage of
- the overhead column. See --percentage for more info.
+ the overhead and latency columns. See --percentage for more info.
-S::
--symbols=::
Only consider these symbols. CSV that understands
file://filename entries. This option will affect the percentage of
- the overhead column. See --percentage for more info.
+ the overhead and latency columns. See --percentage for more info.
--symbol-filter=::
Only show symbols that match (partially) with this filter.
@@ -68,17 +68,33 @@ OPTIONS
--hide-unresolved::
Only display entries resolved to a symbol.
+--parallelism::
+ Only consider these parallelism levels. Parallelism level is the number
+ of threads that actively run on CPUs at the time of sample. The flag
+ accepts single number, comma-separated list, and ranges (for example:
+ "1", "7,8", "1,64-128"). This is useful in understanding what a program
+ is doing during sequential/low-parallelism phases as compared to
+ high-parallelism phases. This option will affect the percentage of
+ the overhead and latency columns. See --percentage for more info.
+ Also see the `CPU and latency overheads' section for more details.
+
+--latency::
+ Show latency-centric profile rather than the default
+ CPU-consumption-centric profile
+ (requires perf record --latency flag).
+
-s::
--sort=::
Sort histogram entries by given key(s) - multiple keys can be specified
in CSV format. Following sort keys are available:
pid, comm, dso, symbol, parent, cpu, socket, srcline, weight,
- local_weight, cgroup_id.
+ local_weight, cgroup_id, addr.
Each key has following meaning:
- comm: command (name) of the task which can be read via /proc/<pid>/comm
- pid: command and tid of the task
+ - tgid: command and tgid of the task
- dso: name of library or module executed at the time of sample
- dso_size: size of library or module executed at the time of sample
- symbol: name of function executed at the time of sample
@@ -87,27 +103,49 @@ OPTIONS
entries are displayed as "[other]".
- cpu: cpu number the task ran at the time of sample
- socket: processor socket number the task ran at the time of sample
+ - parallelism: number of running threads at the time of sample
- srcline: filename and line number executed at the time of sample. The
DWARF debugging info must be provided.
- - srcfile: file name of the source file of the same. Requires dwarf
+ - srcfile: file name of the source file of the samples. Requires dwarf
information.
- weight: Event specific weight, e.g. memory latency or transaction
abort cost. This is the global weight.
- local_weight: Local weight version of the weight above.
- cgroup_id: ID derived from cgroup namespace device and inode numbers.
+ - cgroup: cgroup pathname in the cgroupfs.
- transaction: Transaction abort flags.
- - overhead: Overhead percentage of sample
- - overhead_sys: Overhead percentage of sample running in system mode
- - overhead_us: Overhead percentage of sample running in user mode
- - overhead_guest_sys: Overhead percentage of sample running in system mode
+ - overhead: CPU overhead percentage of sample.
+ - latency: latency (wall-clock) overhead percentage of sample.
+ See the `CPU and latency overheads' section for more details.
+ - overhead_sys: CPU overhead percentage of sample running in system mode
+ - overhead_us: CPU overhead percentage of sample running in user mode
+ - overhead_guest_sys: CPU overhead percentage of sample running in system mode
on guest machine
- - overhead_guest_us: Overhead percentage of sample running in user mode on
+ - overhead_guest_us: CPU overhead percentage of sample running in user mode on
guest machine
- sample: Number of sample
- period: Raw number of event count of sample
-
- By default, comm, dso and symbol keys are used.
- (i.e. --sort comm,dso,symbol)
+ - time: Separate the samples by time stamp with the resolution specified by
+ --time-quantum (default 100ms). Specify with overhead and before it.
+ - code_page_size: the code page size of sampled code address (ip)
+ - ins_lat: Instruction latency in core cycles. This is the global instruction
+ latency
+ - local_ins_lat: Local instruction latency version
+ - p_stage_cyc: On powerpc, this presents the number of cycles spent in a
+ pipeline stage. And currently supported only on powerpc.
+ - addr: (Full) virtual address of the sampled instruction
+ - retire_lat: On X86, this reports pipeline stall of this instruction compared
+ to the previous instruction in cycles. And currently supported only on X86
+ - simd: Flags describing a SIMD operation. "e" for empty Arm SVE predicate. "p" for partial Arm SVE predicate
+ - type: Data type of sample memory access.
+ - typeoff: Offset in the data type of sample memory access.
+ - symoff: Offset in the symbol.
+ - weight1: Average value of event specific weight (1st field of weight_struct).
+ - weight2: Average value of event specific weight (2nd field of weight_struct).
+ - weight3: Average value of event specific weight (3rd field of weight_struct).
+
+ By default, overhead, comm, dso and symbol keys are used.
+ (i.e. --sort overhead,comm,dso,symbol).
If --branch-stack option is used, following sort keys are also
available:
@@ -136,7 +174,7 @@ OPTIONS
If the --mem-mode option is used, the following sort keys are also available
(incompatible with --branch-stack):
- symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.
+ symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline, blocked.
- symbol_daddr: name of data symbol being executed on at the time of sample
- dso_daddr: name of library or module containing the data being executed
@@ -147,9 +185,12 @@ OPTIONS
- snoop: type of snoop (if any) for the data at the time of the sample
- dcacheline: the cacheline the data address is on at the time of the sample
- phys_daddr: physical address of data being executed on at the time of sample
+ - data_page_size: the data page size of data being executed on at the time of sample
+ - blocked: reason of blocked load access for the data at the time of the sample
And the default sort keys are changed to local_weight, mem, sym, dso,
- symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.
+ symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat,
+ see '--mem-mode'.
If the data file has tracepoint event(s), following (dynamic) sort keys
are also available:
@@ -179,7 +220,11 @@ OPTIONS
--fields=::
Specify output field - multiple keys can be specified in CSV format.
Following fields are available:
- overhead, overhead_sys, overhead_us, overhead_children, sample and period.
+ overhead, latency, overhead_sys, overhead_us, overhead_children, sample,
+ period, weight1, weight2, weight3, ins_lat, p_stage_cyc and retire_lat.
+ The last 3 names are alias for the corresponding weights. When the weight
+ fields are used, they will show the average value of the weight.
+
Also it can contain any sort key(s).
By default, every sort keys not specified in -F will be appended
@@ -214,6 +259,9 @@ OPTIONS
--dump-raw-trace::
Dump raw trace in ASCII.
+--disable-order::
+ Disable raw trace ordering.
+
-g::
--call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
Display call chains using type, min percent threshold, print limit,
@@ -260,7 +308,7 @@ OPTIONS
Accumulate callchain of children to parent entry so that then can
show up in the output. The output will have a new "Children" column
and will be sorted on the data. It requires callchains are recorded.
- See the `overhead calculation' section for more details. Enabled by
+ See the `Overhead calculation' section for more details. Enabled by
default, disable with --no-children.
--max-stack::
@@ -362,13 +410,35 @@ OPTIONS
This allows to examine the path the program took to each sample.
The data collection must have used -b (or -j) and -g.
+ Also show with some branch flags that can be:
+ - Predicted: display the average percentage of predicated branches.
+ (predicated number / total number)
+ - Abort: display the number of tsx aborted branches.
+ - Cycles: cycles in basic block.
+
+ - iterations: display the average number of iterations in callchain list.
+
+--addr2line=<path>::
+ Path to addr2line binary.
+
--objdump=<path>::
Path to objdump binary.
+--prefix=PREFIX::
+--prefix-strip=N::
+ Remove first N entries from source file path names in executables
+ and add PREFIX. This allows to display source code compiled on systems
+ with different file system layout.
+
--group::
Show event group information together. It forces group output also
if there are no groups defined in data file.
+--group-sort-idx::
+ Sort the output by the event at the index n in group. If n is invalid,
+ sort by the first event. It can support multiple groups with different
+ amount of events. WARNING: This should be used on grouped events.
+
--demangle::
Demangle symbol names to human readable form. It's enabled by default,
disable with --no-demangle.
@@ -391,9 +461,9 @@ OPTIONS
--call-graph option for details.
--percentage::
- Determine how to display the overhead percentage of filtered entries.
- Filters can be applied by --comms, --dsos and/or --symbols options and
- Zoom operations on the TUI (thread, dso, etc).
+ Determine how to display the CPU and latency overhead percentage
+ of filtered entries. Filters can be applied by --comms, --dsos, --symbols
+ and/or --parallelism options and Zoom operations on the TUI (thread, dso, etc).
"relative" means it's relative to filtered entries only so that the
sum of shown entries will be always 100%. "absolute" means it retains
@@ -410,12 +480,13 @@ OPTIONS
--time::
Only analyze samples within given time window: <start>,<stop>. Times
- have the format seconds.microseconds. If start is not given (i.e., time
+ have the format seconds.nanoseconds. If start is not given (i.e. time
string is ',x.y') then analysis starts at the beginning of the file. If
- stop time is not given (i.e, time string is 'x.y,') then analysis goes
- to end of file.
+ stop time is not given (i.e. time string is 'x.y,') then analysis goes
+ to end of file. Multiple ranges can be separated by spaces, which
+ requires the argument to be quoted e.g. --time "1234.567,1234.789 1235,"
- Also support time percent with multiple time range. Time string is
+ Also support time percent with multiple time ranges. Time string is
'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'.
For example:
@@ -435,6 +506,23 @@ OPTIONS
perf report --time 0%-10%,30%-40%
+--switch-on EVENT_NAME::
+ Only consider events after this event is found.
+
+ This may be interesting to measure a workload only after some initialization
+ phase is over, i.e. insert a perf probe at that point and then using this
+ option with that probe.
+
+--switch-off EVENT_NAME::
+ Stop considering events after this event is found.
+
+--show-on-off-events::
+ Show the --switch-on/off events too. This has no effect in 'perf report' now
+ but probably we'll make the default not to show the switch-on/off events
+ on the --group mode and if there is only one event besides the off/on ones,
+ go straight to the histogram browser, just like 'perf report' with no events
+ explicitly specified does.
+
--itrace::
Options for decoding instruction tracing data. The options are:
@@ -456,14 +544,56 @@ include::itrace.txt[]
This option extends the perf report to show reference callgraphs,
which collected by reference event, in no callgraph event.
+--stitch-lbr::
+ Show callgraph with stitched LBRs, which may have more complete
+ callgraph. The perf.data file must have been obtained using
+ perf record --call-graph lbr.
+ Disabled by default. In common cases with call stack overflows,
+ it can recreate better call stacks than the default lbr call stack
+ output. But this approach is not foolproof. There can be cases
+ where it creates incorrect call stacks from incorrect matches.
+ The known limitations include exception handing such as
+ setjmp/longjmp will have calls/returns not match.
+
--socket-filter::
Only report the samples on the processor socket that match with this filter
+--samples=N::
+ Save N individual samples for each histogram entry to show context in perf
+ report tui browser.
+
--raw-trace::
When displaying traceevent output, do not use print fmt or plugins.
+-H::
--hierarchy::
- Enable hierarchical output.
+ Enable hierarchical output. In the hierarchy mode, each sort key groups
+ samples based on the criteria and then sub-divide it using the lower
+ level sort key.
+
+ For example:
+ In normal output:
+
+ perf report -s dso,sym
+ # Overhead Shared Object Symbol
+ 50.00% [kernel.kallsyms] [k] kfunc1
+ 20.00% perf [.] foo
+ 15.00% [kernel.kallsyms] [k] kfunc2
+ 10.00% perf [.] bar
+ 5.00% libc.so [.] libcall
+
+ In hierarchy output:
+
+ perf report -s dso,sym --hierarchy
+ # Overhead Shared Object / Symbol
+ 65.00% [kernel.kallsyms]
+ 50.00% [k] kfunc1
+ 15.00% [k] kfunc2
+ 30.00% perf
+ 20.00% [.] foo
+ 10.00% [.] bar
+ 5.00% libc.so
+ 5.00% [.] libcall
--inline::
If a callgraph address belongs to an inlined function, the inline stack
@@ -477,6 +607,9 @@ include::itrace.txt[]
Please note that not all mmaps are stored, options affecting which ones
are include 'perf record --data', for instance.
+--ns::
+ Show time stamps in nanoseconds.
+
--stats::
Display overall events statistics without any further processing.
(like the one at the end of the perf report -D command)
@@ -494,8 +627,30 @@ include::itrace.txt[]
The period/hits keywords set the base the percentage is computed
on - the samples period or the number of samples (hits).
+--time-quantum::
+ Configure time quantum for time sort key. Default 100ms.
+ Accepts s, us, ms, ns units.
+
+--total-cycles::
+ When --total-cycles is specified, it supports sorting for all blocks by
+ 'Sampled Cycles%'. This is useful to concentrate on the globally hottest
+ blocks. In output, there are some new columns:
+
+ 'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles
+ 'Sampled Cycles' - block sampled cycles aggregation
+ 'Avg Cycles%' - block average sampled cycles / sum of total block average
+ sampled cycles
+ 'Avg Cycles' - block average sampled cycles
+ 'Branch Counter' - block branch counter histogram (with -v showing the number)
+
+--skip-empty::
+ Do not print 0 results in the --stat output.
+
+include::cpu-and-latency-overheads.txt[]
+
include::callchain-overhead-calculation.txt[]
SEE ALSO
--------
-linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1]
+linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1],
+linkperf:perf-intel-pt[1]