summaryrefslogtreecommitdiff
path: root/tools/perf/Documentation/perf-record.txt
diff options
context:
space:
mode:
Diffstat (limited to 'tools/perf/Documentation/perf-record.txt')
-rw-r--r--tools/perf/Documentation/perf-record.txt455
1 files changed, 407 insertions, 48 deletions
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d232b13ea713..e8b9aadbbfa5 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -9,7 +9,7 @@ SYNOPSIS
--------
[verse]
'perf record' [-e <EVENT> | --event=EVENT] [-a] <command>
-'perf record' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
+'perf record' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
DESCRIPTION
-----------
@@ -30,8 +30,14 @@ OPTIONS
- a symbolic event name (use 'perf list' to list all events)
- - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
- hexadecimal event descriptor.
+ - a raw PMU event in the form of rN where N is a hexadecimal value
+ that represents the raw register encoding with the layout of the
+ event control registers as described by entries in
+ /sys/bus/event_source/devices/cpu/format/*.
+
+ - a symbolic or raw PMU event followed by an optional colon
+ and a list of event modifiers, e.g., cpu-cycles:p. See the
+ linkperf:perf-list[1] man page for details on event modifiers.
- a symbolically formed PMU event like 'pmu/param1=0x3,param2/' where
'param1', 'param2', etc are defined as formats for the PMU in
@@ -60,6 +66,15 @@ OPTIONS
- 'name' : User defined event name. Single quotes (') may be used to
escape symbols in the name from parsing by shell and tool
like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.
+ - 'aux-output': Generate AUX records instead of events. This requires
+ that an AUX area event is also provided.
+ - 'aux-action': "pause" or "resume" to pause or resume an AUX
+ area event (the group leader) when this event occurs.
+ "start-paused" on an AUX area event itself, will
+ start in a paused state.
+ - 'aux-sample-size': Set sample size for AUX area sampling. If the
+ '--aux-sample' option has been used, set aux-sample-size=0 to disable
+ AUX area sampling for the event.
See the linkperf:perf-list[1] man page for more parameters.
@@ -94,9 +109,12 @@ OPTIONS
"perf report" to view group events together.
--filter=<filter>::
- Event filter. This option should follow an event selector (-e) which
- selects either tracepoint event(s) or a hardware trace PMU
- (e.g. Intel PT or CoreSight).
+ Event filter. This option should follow an event selector (-e).
+ If the event is a tracepoint, the filter string will be parsed by
+ the kernel. If the event is a hardware trace PMU (e.g. Intel PT
+ or CoreSight), it'll be processed as an address filter. Otherwise
+ it means a general filter using BPF which can be applied for any
+ kind of event.
- tracepoint filters
@@ -151,6 +169,57 @@ OPTIONS
Multiple filters can be separated with space or comma.
+ - bpf filters
+
+ A BPF filter can access the sample data and make a decision based on the
+ data. Users need to set an appropriate sample type to use the BPF
+ filter. BPF filters need root privilege.
+
+ The sample data field can be specified in lower case letter. Multiple
+ filters can be separated with comma. For example,
+
+ --filter 'period > 1000, cpu == 1'
+ or
+ --filter 'mem_op == load || mem_op == store, mem_lvl > l1'
+
+ The former filter only accept samples with period greater than 1000 AND
+ CPU number is 1. The latter one accepts either load and store memory
+ operations but it should have memory level above the L1. Since the
+ mem_op and mem_lvl fields come from the (memory) data_source, it'd only
+ work with some events which set the data_source field.
+
+ Also user should request to collect that information (with -d option in
+ the above case). Otherwise, the following message will be shown.
+
+ $ sudo perf record -e cycles --filter 'mem_op == load'
+ Error: cycles event does not have PERF_SAMPLE_DATA_SRC
+ Hint: please add -d option to perf record.
+ failed to set filter "BPF" on event cycles with 22 (Invalid argument)
+
+ Essentially the BPF filter expression is:
+
+ <term> <operator> <value> (("," | "||") <term> <operator> <value>)*
+
+ The <term> can be one of:
+ ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
+ code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
+ p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
+ mem_dtlb, mem_blk, mem_hops, uid, gid
+
+ The <operator> can be one of:
+ ==, !=, >, >=, <, <=, &
+
+ The <value> can be one of:
+ <number> (for any term)
+ na, load, store, pfetch, exec (for mem_op)
+ l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
+ na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
+ remote (for mem_remote)
+ na, locked (for mem_locked)
+ na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
+ na, by_data, by_addr (for mem_blk)
+ hops0, hops1, hops2, hops3 (for mem_hops)
+
--exclude-perf::
Don't record events issued by perf itself. This option should follow
an event selector (-e) which selects tracepoint event(s). It adds a
@@ -158,6 +227,10 @@ OPTIONS
'--filter' exists, the new filter expression will be combined with
them by '&&'.
+--latency::
+ Enable data collection for latency profiling.
+ Use perf report --latency for latency-centric profile.
+
-a::
--all-cpus::
System-wide collection from all CPUs (default if no target is specified).
@@ -208,26 +281,29 @@ OPTIONS
-m::
--mmap-pages=::
Number of mmap data pages (must be a power of two) or size
- specification with appended unit character - B/K/M/G. The
- size is rounded up to have nearest pages power of two value.
- Also, by adding a comma, the number of mmap pages for AUX
- area tracing can be specified.
-
---group::
- Put all events in a single event group. This precedes the --event
- option and remains only for backward compatibility. See --event.
+ specification in bytes with appended unit character - B/K/M/G.
+ The size is rounded up to the nearest power-of-two page value.
+ By adding a comma, an additional parameter with the same
+ semantics used for the normal mmap areas can be specified for
+ AUX tracing area.
-g::
- Enables call-graph (stack chain/backtrace) recording.
+ Enables call-graph (stack chain/backtrace) recording for both
+ kernel space and user space.
--call-graph::
Setup and enable call-graph (stack chain/backtrace) recording,
- implies -g. Default is "fp".
+ implies -g. Default is "fp" (for user space).
- Allows specifying "fp" (frame pointer) or "dwarf"
- (DWARF's CFI - Call Frame Information) or "lbr"
- (Hardware Last Branch Record facility) as the method to collect
- the information used to show the call graphs.
+ The unwinding method used for kernel space is dependent on the
+ unwinder used by the active kernel configuration, i.e
+ CONFIG_UNWINDER_FRAME_POINTER (fp) or CONFIG_UNWINDER_ORC (orc)
+
+ Any option specified here controls the method used for user space.
+
+ Valid options are "fp" (frame pointer), "dwarf" (DWARF's CFI -
+ Call Frame Information) or "lbr" (Hardware Last Branch Record
+ facility).
In some systems, where binaries are build with gcc
--fomit-frame-pointer, using the "fp" method will produce bogus
@@ -244,9 +320,18 @@ OPTIONS
User can change the size by passing the size after comma like
"--call-graph dwarf,4096".
+ When "fp" recording is used, perf tries to save stack entries
+ up to the number specified in sysctl.kernel.perf_event_max_stack
+ by default. User can change the number by passing it after comma
+ like "--call-graph fp,32".
+
+ Also "defer" can be used with "fp" (like "--call-graph fp,defer") to
+ enable deferred user callchain which will collect user-space callchains
+ when the thread returns to the user space.
+
-q::
--quiet::
- Don't print any message, useful for scripting.
+ Don't print any warnings or messages, useful for scripting.
-v::
--verbose::
@@ -259,11 +344,17 @@ OPTIONS
-d::
--data::
- Record the sample virtual addresses.
+ Record the sample virtual addresses. Implies --sample-mem-info.
--phys-data::
Record the sample physical addresses.
+--data-page-size::
+ Record the sampled data address data page size.
+
+--code-page-size::
+ Record the sampled code address (ip) page size
+
-T::
--timestamp::
Record the sample timestamps. Use it with 'perf report -D' to see the
@@ -276,6 +367,16 @@ OPTIONS
--sample-cpu::
Record the sample cpu.
+--sample-identifier::
+ Record the sample identifier i.e. PERF_SAMPLE_IDENTIFIER bit set in
+ the sample_type member of the struct perf_event_attr argument to the
+ perf_event_open system call.
+
+--sample-mem-info::
+ Record the sample data source information for memory operations.
+ It requires hardware supports and may work on specific events only.
+ Please consider using 'perf mem record' instead if you're not sure.
+
-n::
--no-samples::
Don't sample.
@@ -291,6 +392,9 @@ comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-
In per-thread mode with inheritance mode on (default), samples are captured only when
the thread executes on the designated CPUs. Default is to monitor all CPUs.
+User space tasks can migrate between CPUs, so when tracing selected CPUs,
+a dummy event is created to track sideband for all CPUs.
+
-B::
--no-buildid::
Do not save the build ids of binaries in the perf.data files. This skips
@@ -341,6 +445,7 @@ following filters are defined:
- any_call: any function call or system call
- any_ret: any function return or system call return
- ind_call: any indirect branch
+ - ind_jmp: any indirect jump
- call: direct calls, including far (to/from kernel) calls
- u: only when the branch target is at the user level
- k: only when the branch target is in the kernel
@@ -349,7 +454,19 @@ following filters are defined:
- no_tx: only when the target is not in a hardware transaction
- abort_tx: only when the target is a hardware transaction abort
- cond: conditional branches
+ - call_stack: save call stack
+ - no_flags: don't save branch flags e.g prediction, misprediction etc
+ - no_cycles: don't save branch cycles
+ - hw_index: save branch hardware index
- save_type: save branch type during sampling in case binary is not available later
+ For the platforms with Intel Arch LBR support (12th-Gen+ client or
+ 4th-Gen Xeon+ server), the save branch type is unconditionally enabled
+ when the taken branch stack sampling is enabled.
+ - priv: save privilege state during sampling in case binary is not available later
+ - counter: save occurrences of the event since the last branch entry. Currently, the
+ feature is only supported by a newer CPU, e.g., Intel Sierra Forest and
+ later platforms. An error out is expected if it's used on the unsupported
+ kernel or CPUs.
+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
@@ -360,13 +477,17 @@ is enabled for all the sampling events. The sampled branch type is the same for
The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
Note that this feature may not be available on all processors.
+-W::
--weight::
Enable weightened sampling. An additional weight is recorded per sample and can be
displayed with the weight and local_weight sort keys. This currently works for TSX
abort events and some memory events in precise mode on modern Intel CPUs.
--namespaces::
-Record events of type PERF_RECORD_NAMESPACES.
+Record events of type PERF_RECORD_NAMESPACES. This enables 'cgroup_id' sort key.
+
+--all-cgroups::
+Record events of type PERF_RECORD_CGROUP. This enables 'cgroup' sort key.
--transaction::
Record transaction flags for transaction related events.
@@ -379,8 +500,11 @@ if combined with -a or -C options.
-D::
--delay=::
-After starting the program, wait msecs before measuring. This is useful to
-filter out the startup phase of the program, which is often very different.
+After starting the program, wait msecs before measuring (-1: start with events
+disabled), or enable events only for specified ranges of msecs (e.g.
+-D 10-20,30-40 means wait 10 msecs, enable for 10 msecs, wait 10 msecs, enable
+for 10 msecs, then stop). Note, delaying enabling of events is useful to filter
+out the startup phase of the program, which is often very different.
-I::
--intr-regs::
@@ -392,7 +516,8 @@ symbolic names, e.g. on x86, ax, si. To list the available registers use
--intr-regs=ax,bx. The list of register is architecture dependent.
--user-regs::
-Capture user registers at sample time. Same arguments as -I.
+Similar to -I, but capture user registers at sample time. To list the available
+user registers use --user-regs=\?.
--running-time::
Record running and enabled time for read events (:S)
@@ -407,9 +532,21 @@ CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI.
-S::
--snapshot::
Select AUX area tracing Snapshot Mode. This option is valid only with an
-AUX area tracing event. Optionally the number of bytes to capture per
-snapshot can be specified. In Snapshot Mode, trace data is captured only when
-signal SIGUSR2 is received.
+AUX area tracing event. Optionally, certain snapshot capturing parameters
+can be specified in a string that follows this option:
+
+ - 'e': take one last snapshot on exit; guarantees that there is at least one
+ snapshot in the output file;
+ - <size>: if the PMU supports this, specify the desired snapshot size.
+
+In Snapshot Mode trace data is captured only when signal SIGUSR2 is received
+and on exit if the above 'e' option is given.
+
+--aux-sample[=OPTIONS]::
+Select AUX area sampling. At least one of the events selected by the -e option
+must be an AUX area event. Samples on other events will be created containing
+data from the AUX area. Optionally sample size may be specified, otherwise it
+defaults to 4KiB.
--proc-map-timeout::
When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
@@ -418,15 +555,9 @@ This option sets the time out limit. The default value is 500 ms.
--switch-events::
Record context switch events i.e. events of type PERF_RECORD_SWITCH or
-PERF_RECORD_SWITCH_CPU_WIDE.
-
---clang-path=PATH::
-Path to clang binary to use for compiling BPF scriptlets.
-(enabled when BPF support is on)
-
---clang-opt=OPTIONS::
-Options passed to clang when compiling BPF scriptlets.
-(enabled when BPF support is on)
+PERF_RECORD_SWITCH_CPU_WIDE. In some cases (e.g. Intel PT, CoreSight or Arm SPE)
+switch events will be enabled automatically, which can be suppressed by
+by the option --no-switch-events.
--vmlinux=PATH::
Specify vmlinux path which has debuginfo.
@@ -435,17 +566,63 @@ Specify vmlinux path which has debuginfo.
--buildid-all::
Record build-id of all DSOs regardless whether it's actually hit or not.
+--buildid-mmap::
+Legacy record build-id in map events option which is now the
+default. Behaves indentically to --no-buildid. Disable with
+--no-buildid-mmap.
+
--aio[=n]::
Use <n> control blocks in asynchronous (Posix AIO) trace writing mode (default: 1, max: 4).
Asynchronous mode is supported only when linking Perf tool with libc library
providing implementation for Posix AIO API.
+--affinity=mode::
+Set affinity mask of trace reading thread according to the policy defined by 'mode' value:
+
+ - node - thread affinity mask is set to NUMA node cpu mask of the processed mmap buffer
+ - cpu - thread affinity mask is set to cpu of the processed mmap buffer
+
+--mmap-flush=number::
+
+Specify minimal number of bytes that is extracted from mmap data pages and
+processed for output. One can specify the number using B/K/M/G suffixes.
+
+The maximal allowed value is a quarter of the size of mmaped data pages.
+
+The default option value is 1 byte which means that every time that the output
+writing thread finds some new data in the mmaped buffer the data is extracted,
+possibly compressed (-z) and written to the output, perf.data or pipe.
+
+Larger data chunks are compressed more effectively in comparison to smaller
+chunks so extraction of larger chunks from the mmap data pages is preferable
+from the perspective of output size reduction.
+
+Also at some cases executing less output write syscalls with bigger data size
+can take less time than executing more output write syscalls with smaller data
+size thus lowering runtime profiling overhead.
+
+-z::
+--compression-level[=n]::
+Produce compressed trace using specified level n (default: 1 - fastest compression,
+22 - smallest trace)
+
--all-kernel::
Configure all used events to run in kernel space.
--all-user::
Configure all used events to run in user space.
+--kernel-callchains::
+Collect callchains only from kernel space. I.e. this option sets
+perf_event_attr.exclude_callchain_user to 1.
+
+--user-callchains::
+Collect callchains only from user space. I.e. this option sets
+perf_event_attr.exclude_callchain_kernel to 1.
+
+Don't use both --kernel-callchains and --user-callchains at the same time or no
+callchains will be collected.
+
--timestamp-filename
Append timestamp to output file name.
@@ -455,16 +632,17 @@ Record timestamp boundary (time of first/last samples).
--switch-output[=mode]::
Generate multiple perf.data files, timestamp prefixed, switching to a new one
based on 'mode' value:
- "signal" - when receiving a SIGUSR2 (default value) or
- <size> - when reaching the size threshold, size is expected to
- be a number with appended unit character - B/K/M/G
- <time> - when reaching the time threshold, size is expected to
- be a number with appended unit character - s/m/h/d
- Note: the precision of the size threshold hugely depends
- on your configuration - the number and size of your ring
- buffers (-m). It is generally more precise for higher sizes
- (like >5M), for lower values expect different sizes.
+ - "signal" - when receiving a SIGUSR2 (default value) or
+ - <size> - when reaching the size threshold, size is expected to
+ be a number with appended unit character - B/K/M/G
+ - <time> - when reaching the time threshold, size is expected to
+ be a number with appended unit character - s/m/h/d
+
+ Note: the precision of the size threshold hugely depends
+ on your configuration - the number and size of your ring
+ buffers (-m). It is generally more precise for higher sizes
+ (like >5M), for lower values expect different sizes.
A possible use case is to, given an external event, slice the perf.data file
that gets then processed, possibly via a perf script, to decide if that
@@ -476,6 +654,23 @@ overhead. You can still switch them on with:
--switch-output --no-no-buildid --no-no-buildid-cache
+--switch-output-event::
+Events that will cause the switch of the perf.data file, auto-selecting
+--switch-output=signal, the results are similar as internally the side band
+thread will also send a SIGUSR2 to the main one.
+
+Uses the same syntax as --event, it will just not be recorded, serving only to
+switch the perf.data file as soon as the --switch-output event is processed by
+a separate sideband thread.
+
+This sideband thread is also used to other purposes, like processing the
+PERF_RECORD_BPF_EVENT records as they happen, asking the kernel for extra BPF
+information, etc.
+
+--switch-max-files=N::
+
+When rotating perf.data with --switch-output, only keep N files.
+
--dry-run::
Parse options then exit. --dry-run can be used to detect errors in cmdline
options.
@@ -483,6 +678,23 @@ options.
'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj
in config file is set to true.
+--synth=TYPE::
+Collect and synthesize given type of events (comma separated). Note that
+this option controls the synthesis from the /proc filesystem which represent
+task status for pre-existing threads.
+
+Kernel (and some other) events are recorded regardless of the
+choice in this option. For example, --synth=no would have MMAP events for
+kernel and modules.
+
+Available types are:
+
+ - 'task' - synthesize FORK and COMM events for each task
+ - 'mmap' - synthesize MMAP events for each process (implies 'task')
+ - 'cgroup' - synthesize CGROUP events for each cgroup
+ - 'all' - synthesize all events (default)
+ - 'no' - do not synthesize any of the above events
+
--tail-synthesize::
Instead of collecting non-sample events (for example, fork, comm, mmap) at
the beginning of record, collect them during finalizing an output file.
@@ -505,6 +717,153 @@ config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.
Implies --tail-synthesize.
+--kcore::
+Make a copy of /proc/kcore and place it into a directory with the perf data file.
+
+--max-size=<size>::
+Limit the sample data max size, <size> is expected to be a number with
+appended unit character - B/K/M/G
+
+--num-thread-synthesize::
+ The number of threads to run when synthesizing events for existing processes.
+ By default, the number of threads equals 1.
+
+ifdef::HAVE_LIBPFM[]
+--pfm-events events::
+Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
+including support for event filters. For example '--pfm-events
+inst_retired:any_p:u:c=1:i'. More than one event can be passed to the
+option using the comma separator. Hardware events and generic hardware
+events cannot be mixed together. The latter must be used with the -e
+option. The -e option and this one can be mixed and matched. Events
+can be grouped using the {} notation.
+endif::HAVE_LIBPFM[]
+
+--control=fifo:ctl-fifo[,ack-fifo]::
+--control=fd:ctl-fd[,ack-fd]::
+ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
+Listen on ctl-fd descriptor for command to control measurement.
+
+Available commands:
+
+ - 'enable' : enable events
+ - 'disable' : disable events
+ - 'enable name' : enable event 'name'
+ - 'disable name' : disable event 'name'
+ - 'snapshot' : AUX area tracing snapshot).
+ - 'stop' : stop perf record
+ - 'ping' : ping
+ - 'evlist [-v|-g|-F] : display all events
+
+ -F Show just the sample frequency used for each event.
+ -v Show all fields.
+ -g Show event group information.
+
+Measurements can be started with events disabled using --delay=-1 option. Optionally
+send control command completion ('ack\n') to ack-fd descriptor to synchronize with the
+controlling process. Example of bash shell script to enable and disable events during
+measurements:
+
+ #!/bin/bash
+
+ ctl_dir=/tmp/
+
+ ctl_fifo=${ctl_dir}perf_ctl.fifo
+ test -p ${ctl_fifo} && unlink ${ctl_fifo}
+ mkfifo ${ctl_fifo}
+ exec {ctl_fd}<>${ctl_fifo}
+
+ ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
+ test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
+ mkfifo ${ctl_ack_fifo}
+ exec {ctl_fd_ack}<>${ctl_ack_fifo}
+
+ perf record -D -1 -e cpu-cycles -a \
+ --control fd:${ctl_fd},${ctl_fd_ack} \
+ -- sleep 30 &
+ perf_pid=$!
+
+ sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
+ sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
+
+ exec {ctl_fd_ack}>&-
+ unlink ${ctl_ack_fifo}
+
+ exec {ctl_fd}>&-
+ unlink ${ctl_fifo}
+
+ wait -n ${perf_pid}
+ exit $?
+
+--threads=<spec>::
+Write collected trace data into several data files using parallel threads.
+<spec> value can be user defined list of masks. Masks separated by colon
+define CPUs to be monitored by a thread and affinity mask of that thread
+is separated by slash:
+
+ <cpus mask 1>/<affinity mask 1>:<cpus mask 2>/<affinity mask 2>:...
+
+CPUs or affinity masks must not overlap with other corresponding masks.
+Invalid CPUs are ignored, but masks containing only invalid CPUs are not
+allowed.
+
+For example user specification like the following:
+
+ 0,2-4/2-4:1,5-7/5-7
+
+specifies parallel threads layout that consists of two threads,
+the first thread monitors CPUs 0 and 2-4 with the affinity mask 2-4,
+the second monitors CPUs 1 and 5-7 with the affinity mask 5-7.
+
+<spec> value can also be a string meaning predefined parallel threads
+layout:
+
+ - cpu - create new data streaming thread for every monitored cpu
+ - core - create new thread to monitor CPUs grouped by a core
+ - package - create new thread to monitor CPUs grouped by a package
+ - numa - create new threed to monitor CPUs grouped by a NUMA domain
+
+Predefined layouts can be used on systems with large number of CPUs in
+order not to spawn multiple per-cpu streaming threads but still avoid LOST
+events in data directory files. Option specified with no or empty value
+defaults to CPU layout. Masks defined or provided by the option value are
+filtered through the mask provided by -C option.
+
+--debuginfod[=URLs]::
+ Specify debuginfod URL to be used when cacheing perf.data binaries,
+ it follows the same syntax as the DEBUGINFOD_URLS variable, like:
+
+ http://192.168.122.174:8002
+
+ If the URLs is not specified, the value of DEBUGINFOD_URLS
+ system environment variable is used.
+
+--off-cpu::
+ Enable off-cpu profiling with BPF. The BPF program will collect
+ task scheduling information with (user) stacktrace and save them
+ as sample data of a software event named "offcpu-time". The
+ sample period will have the time the task slept in nanoseconds.
+
+ Note that BPF can collect stack traces using frame pointer ("fp")
+ only, as of now. So the applications built without the frame
+ pointer might see bogus addresses.
+
+ off-cpu profiling consists two types of samples: direct samples, which
+ share the same behavior as regular samples, and the accumulated
+ samples, stored in BPF stack trace map, presented after all the regular
+ samples.
+
+--off-cpu-thresh::
+ Once a task's off-cpu time reaches this threshold (in milliseconds), it
+ generates a direct off-cpu sample. The default is 500ms.
+
+--setup-filter=<action>::
+ Prepare BPF filter to be used by regular users. The action should be
+ either "pin" or "unpin". The filter can be used after it's pinned.
+
+
+include::intel-hybrid.txt[]
+
SEE ALSO
--------
-linkperf:perf-stat[1], linkperf:perf-list[1]
+linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-intel-pt[1]