summaryrefslogtreecommitdiff
path: root/tools/perf/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r--tools/perf/Documentation/perf-intel-pt.txt14
-rw-r--r--tools/perf/Documentation/perf-report.txt29
-rw-r--r--tools/perf/Documentation/perf-script-python.txt4
-rw-r--r--tools/perf/Documentation/perf-script.txt20
-rw-r--r--tools/perf/Documentation/perf-stat.txt11
-rw-r--r--tools/perf/Documentation/perf-top.txt32
-rw-r--r--tools/perf/Documentation/perf.txt2
-rw-r--r--tools/perf/Documentation/tips.txt31
8 files changed, 120 insertions, 23 deletions
diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index 2109690b0d5f..59ab1ff9d75f 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -115,9 +115,13 @@ toggle respectively.
perf script also supports higher level ways to dump instruction traces:
+ perf script --insn-trace=disasm
+
+or to use the xed disassembler, which requires installing the xed tool
+(see XED below):
+
perf script --insn-trace --xed
-Dump all instructions. This requires installing the xed tool (see XED below)
Dumping all instructions in a long trace can be fairly slow. It is usually better
to start with higher level decoding, like
@@ -130,12 +134,12 @@ or
and then select a time range of interest. The time range can then be examined
in detail with
- perf script --time starttime,stoptime --insn-trace --xed
+ perf script --time starttime,stoptime --insn-trace=disasm
While examining the trace it's also useful to filter on specific CPUs using
the -C option
- perf script --time starttime,stoptime --insn-trace --xed -C 1
+ perf script --time starttime,stoptime --insn-trace=disasm -C 1
Dump all instructions in time range on CPU 1.
@@ -1306,7 +1310,7 @@ Without timestamps, --per-thread must be specified to distinguish threads.
perf script can be used to provide an instruction trace
- $ perf script --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmresume | head -21
+ $ perf script --guestkallsyms $KALLSYMS --insn-trace=disasm -F+ipc | grep -C10 vmresume | head -21
CPU 0/KVM 1440 ffffffff82133cdd __vmx_vcpu_run+0x3d ([kernel.kallsyms]) movq 0x48(%rax), %r9
CPU 0/KVM 1440 ffffffff82133ce1 __vmx_vcpu_run+0x41 ([kernel.kallsyms]) movq 0x50(%rax), %r10
CPU 0/KVM 1440 ffffffff82133ce5 __vmx_vcpu_run+0x45 ([kernel.kallsyms]) movq 0x58(%rax), %r11
@@ -1407,7 +1411,7 @@ There were none.
'perf script' can be used to provide an instruction trace showing timestamps
- $ perf script -i perf.data.kvm --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmresume | head -21
+ $ perf script -i perf.data.kvm --guestkallsyms $KALLSYMS --insn-trace=disasm -F+ipc | grep -C10 vmresume | head -21
CPU 1/KVM 17006 [001] 11500.262865593: ffffffff82133cdd __vmx_vcpu_run+0x3d ([kernel.kallsyms]) movq 0x48(%rax), %r9
CPU 1/KVM 17006 [001] 11500.262865593: ffffffff82133ce1 __vmx_vcpu_run+0x41 ([kernel.kallsyms]) movq 0x50(%rax), %r10
CPU 1/KVM 17006 [001] 11500.262865593: ffffffff82133ce5 __vmx_vcpu_run+0x45 ([kernel.kallsyms]) movq 0x58(%rax), %r11
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 38f59ac064f7..d8b863e01fe0 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -531,8 +531,35 @@ include::itrace.txt[]
--raw-trace::
When displaying traceevent output, do not use print fmt or plugins.
+-H::
--hierarchy::
- Enable hierarchical output.
+ Enable hierarchical output. In the hierarchy mode, each sort key groups
+ samples based on the criteria and then sub-divide it using the lower
+ level sort key.
+
+ For example:
+ In normal output:
+
+ perf report -s dso,sym
+ # Overhead Shared Object Symbol
+ 50.00% [kernel.kallsyms] [k] kfunc1
+ 20.00% perf [.] foo
+ 15.00% [kernel.kallsyms] [k] kfunc2
+ 10.00% perf [.] bar
+ 5.00% libc.so [.] libcall
+
+ In hierarchy output:
+
+ perf report -s dso,sym --hierarchy
+ # Overhead Shared Object / Symbol
+ 65.00% [kernel.kallsyms]
+ 50.00% [k] kfunc1
+ 15.00% [k] kfunc2
+ 30.00% perf
+ 20.00% [.] foo
+ 10.00% [.] bar
+ 5.00% libc.so
+ 5.00% [.] libcall
--inline::
If a callgraph address belongs to an inlined function, the inline stack
diff --git a/tools/perf/Documentation/perf-script-python.txt b/tools/perf/Documentation/perf-script-python.txt
index 6a8581012e16..13e37e9385ee 100644
--- a/tools/perf/Documentation/perf-script-python.txt
+++ b/tools/perf/Documentation/perf-script-python.txt
@@ -642,8 +642,8 @@ SUPPORTED FIELDS
Currently supported fields:
-ev_name, comm, pid, tid, cpu, ip, time, period, phys_addr, addr,
-symbol, symoff, dso, time_enabled, time_running, values, callchain,
+ev_name, comm, id, stream_id, pid, tid, cpu, ip, time, period, phys_addr,
+addr, symbol, symoff, dso, time_enabled, time_running, values, callchain,
brstack, brstacksym, datasrc, datasrc_decode, iregs, uregs,
weight, transaction, raw_buf, attr, cpumode.
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index ff9a52e44688..005e51df855e 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -132,9 +132,10 @@ OPTIONS
Comma separated list of fields to print. Options are:
comm, tid, pid, time, cpu, event, trace, ip, sym, dso, dsoff, addr, symoff,
srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
- brstackinsn, brstackinsnlen, brstackoff, callindent, insn, insnlen, synth,
- phys_addr, metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat,
- machine_pid, vcpu, cgroup, retire_lat.
+ brstackinsn, brstackinsnlen, brstackoff, callindent, insn, disasm,
+ insnlen, synth, phys_addr, metric, misc, srccode, ipc, data_page_size,
+ code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat.
+
Field list can be prepended with the type, trace, sw or hw,
to indicate to which event type the field list applies.
e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace
@@ -217,9 +218,9 @@ OPTIONS
Instruction Trace decoding. For calls and returns, it will display the
name of the symbol indented with spaces to reflect the stack depth.
- When doing instruction trace decoding insn and insnlen give the
- instruction bytes and the instruction length of the current
- instruction.
+ When doing instruction trace decoding, insn, disasm and insnlen give the
+ instruction bytes, disassembled instructions (requires libcapstone support)
+ and the instruction length of the current instruction respectively.
The synth field is used by synthesized events which may be created when
Instruction Trace decoding.
@@ -441,9 +442,10 @@ include::itrace.txt[]
will be printed. Each entry has function name and file/line. Enabled by
default, disable with --no-inline.
---insn-trace::
- Show instruction stream for intel_pt traces. Combine with --xed to
- show disassembly.
+--insn-trace[=<raw|disasm>]::
+ Show instruction stream in bytes (raw) or disassembled (disasm)
+ for intel_pt traces. The default is 'raw'. To use xed, combine
+ 'raw' with --xed to show disassembly done by xed.
--xed::
Run xed disassembler on output. Requires installing the xed disassembler.
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 5af2e432b54f..29756a87ab6f 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -308,6 +308,14 @@ use --per-die in addition to -a. (system-wide). The output includes the
die number and the number of online processors on that die. This is
useful to gauge the amount of aggregation.
+--per-cluster::
+Aggregate counts per processor cluster for system-wide mode measurement. This
+is a useful mode to detect imbalance between clusters. To enable this mode,
+use --per-cluster in addition to -a. (system-wide). The output includes the
+cluster number and the number of online processors on that cluster. This is
+useful to gauge the amount of aggregation. The information of cluster ID and
+related CPUs can be gotten from /sys/devices/system/cpu/cpuX/topology/cluster_{id, cpus}.
+
--per-cache::
Aggregate counts per cache instance for system-wide mode measurements. By
default, the aggregation happens for the cache level at the highest index
@@ -396,6 +404,9 @@ Aggregate counts per processor socket for system-wide mode measurements.
--per-die::
Aggregate counts per processor die for system-wide mode measurements.
+--per-cluster::
+Aggregate counts perf processor cluster for system-wide mode measurements.
+
--per-cache::
Aggregate counts per cache instance for system-wide mode measurements. By
default, the aggregation happens for the cache level at the highest index
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 3c202ec080ba..a754875fa5bb 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -261,8 +261,38 @@ Default is to monitor all CPUS.
--raw-trace::
When displaying traceevent output, do not use print fmt or plugins.
+-H::
--hierarchy::
- Enable hierarchy output.
+ Enable hierarchical output. In the hierarchy mode, each sort key groups
+ samples based on the criteria and then sub-divide it using the lower
+ level sort key.
+
+ For example, in normal output:
+
+ perf report -s dso,sym
+ #
+ # Overhead Shared Object Symbol
+ # ........ ................. ...........
+ 50.00% [kernel.kallsyms] [k] kfunc1
+ 20.00% perf [.] foo
+ 15.00% [kernel.kallsyms] [k] kfunc2
+ 10.00% perf [.] bar
+ 5.00% libc.so [.] libcall
+
+ In hierarchy output:
+
+ perf report -s dso,sym --hierarchy
+ #
+ # Overhead Shared Object / Symbol
+ # .......... ......................
+ 65.00% [kernel.kallsyms]
+ 50.00% [k] kfunc1
+ 15.00% [k] kfunc2
+ 30.00% perf
+ 20.00% [.] foo
+ 10.00% [.] bar
+ 5.00% libc.so
+ 5.00% [.] libcall
--overwrite::
Enable this to use just the most recent records, which helps in high core count
diff --git a/tools/perf/Documentation/perf.txt b/tools/perf/Documentation/perf.txt
index a7cf7bc2f968..09f516f3fdfb 100644
--- a/tools/perf/Documentation/perf.txt
+++ b/tools/perf/Documentation/perf.txt
@@ -63,6 +63,8 @@ OPTIONS
in browser mode
perf-event-open - Print perf_event_open() arguments and
return value
+ kmaps - Print kernel and module maps (perf script
+ and perf report without browser)
--debug-file::
Write debug output to a specified file.
diff --git a/tools/perf/Documentation/tips.txt b/tools/perf/Documentation/tips.txt
index 825745a645c1..67b326ba0040 100644
--- a/tools/perf/Documentation/tips.txt
+++ b/tools/perf/Documentation/tips.txt
@@ -2,6 +2,7 @@ For a higher level overview, try: perf report --sort comm,dso
Sample related events with: perf record -e '{cycles,instructions}:S'
Compare performance results with: perf diff [<old file> <new file>]
Boolean options have negative forms, e.g.: perf report --no-children
+To not accumulate CPU time of children symbols add --no-children
Customize output of perf script with: perf script -F event,ip,sym
Generate a script for your data: perf script -g <lang>
Save output of perf stat using: perf stat record <target workload>
@@ -12,32 +13,52 @@ List events using substring match: perf list <keyword>
To see list of saved events and attributes: perf evlist -v
Use --symfs <dir> if your symbol files are in non-standard locations
To see callchains in a more compact form: perf report -g folded
+To see call chains by final symbol taking CPU time (bottom up) use perf report -G
Show individual samples with: perf script
Limit to show entries above 5% only: perf report --percent-limit 5
Profiling branch (mis)predictions with: perf record -b / perf report
-To show assembler sample contexts use perf record -b / perf script -F +brstackinsn --xed
-Treat branches as callchains: perf report --branch-history
-To count events in every 1000 msec: perf stat -I 1000
-Print event counts in CSV format with: perf stat -x,
+To show assembler sample context control flow use perf record -b / perf report --samples 10 and then browse context
+To adjust path to source files to local file system use perf report --prefix=... --prefix-strip=...
+Treat branches as callchains: perf record -b ... ; perf report --branch-history
+Show estimate cycles per function and IPC in annotate use perf record -b ... ; perf report --total-cycles
+To count events every 1000 msec: perf stat -I 1000
+Print event counts in machine readable CSV format with: perf stat -x\;
If you have debuginfo enabled, try: perf report -s sym,srcline
For memory address profiling, try: perf mem record / perf mem report
For tracepoint events, try: perf report -s trace_fields
To record callchains for each sample: perf record -g
+If call chains don't work try perf record --call-graph dwarf or --call-graph lbr
To record every process run by a user: perf record -u <user>
+To show inline functions in call traces add --inline to perf report
+To not record events from perf itself add --exclude-perf
Skip collecting build-id when recording: perf record -B
To change sampling frequency to 100 Hz: perf record -F 100
+To show information about system the samples were collected on use perf report --header
+To only collect call graph on one event use perf record -e cpu/cpu-cycles,callgraph=1/,branches ; perf report --show-ref-call-graph
+To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ...
+To group events which need to be collected together for accuracy use {}: perf record -e {cycles,branches}' ...
+To compute metrics for samples use perf record -e '{cycles,instructions}' ... ; perf script -F +metric
See assembly instructions with percentage: perf annotate <symbol>
If you prefer Intel style assembly, try: perf annotate -M intel
+When collecting LBR backtraces use --stitch-lbr to handle more than 32 deep entries: perf record --call-graph lbr ; perf report --stitch-lbr
For hierarchical output, try: perf report --hierarchy
Order by the overhead of source file name and line number: perf report -s srcline
System-wide collection from all CPUs: perf record -a
Show current config key-value pairs: perf config --list
+To collect Processor Trace with samples use perf record -e '{intel_pt//,cycles}' ; perf script --call-trace or --insn-trace --xed -F +ipc (remove --xed if no xed)
+To trace calls using Processor Trace use perf record -e intel_pt// ... ; perf script --call-trace. Then use perf script --time A-B --insn-trace to look at region of interest.
+To measure approximate function latency with Processor Trace use perf record -e intel_pt// ... ; perf script --call-ret-trace
+To trace only single function with Processor Trace use perf record --filter 'filter func @ program' -e intel_pt//u ./program ; perf script --insn-trace
Show user configuration overrides: perf config --user --list
To add Node.js USDT(User-Level Statically Defined Tracing): perf buildid-cache --add `which node`
-To report cacheline events from previous recording: perf c2c report
+To analyze cache line scalability issues use perf c2c record ... ; perf c2c report
To browse sample contexts use perf report --sample 10 and select in context menu
To separate samples by time use perf report --sort time,overhead,sym
+To filter subset of samples with report or script add --time X-Y or --cpu A,B,C or --socket-filter ...
To set sample time separation other than 100ms with --sort time use --time-quantum
Add -I to perf record to sample register values, which will be visible in perf report sample context.
To show IPC for sampling periods use perf record -e '{cycles,instructions}:S' and then browse context
To show context switches in perf report sample context add --switch-events to perf record.
+To show time in nanoseconds in record/report add --ns
+To compare hot regions in two workloads use perf record -b -o file ... ; perf diff --stream file1 file2
+To compare scalability of two workload samples use perf diff -c ratio file1 file2