summaryrefslogtreecommitdiff
path: root/tools/perf/Documentation/perf-stat.txt
diff options
context:
space:
mode:
Diffstat (limited to 'tools/perf/Documentation/perf-stat.txt')
-rw-r--r--tools/perf/Documentation/perf-stat.txt84
1 files changed, 58 insertions, 26 deletions
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 8f789fa1242e..1a766d4a2233 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -308,6 +308,14 @@ use --per-die in addition to -a. (system-wide). The output includes the
die number and the number of online processors on that die. This is
useful to gauge the amount of aggregation.
+--per-cluster::
+Aggregate counts per processor cluster for system-wide mode measurement. This
+is a useful mode to detect imbalance between clusters. To enable this mode,
+use --per-cluster in addition to -a. (system-wide). The output includes the
+cluster number and the number of online processors on that cluster. This is
+useful to gauge the amount of aggregation. The information of cluster ID and
+related CPUs can be gotten from /sys/devices/system/cpu/cpuX/topology/cluster_{id, cpus}.
+
--per-cache::
Aggregate counts per cache instance for system-wide mode measurements. By
default, the aggregation happens for the cache level at the highest index
@@ -396,6 +404,9 @@ Aggregate counts per processor socket for system-wide mode measurements.
--per-die::
Aggregate counts per processor die for system-wide mode measurements.
+--per-cluster::
+Aggregate counts perf processor cluster for system-wide mode measurements.
+
--per-cache::
Aggregate counts per cache instance for system-wide mode measurements. By
default, the aggregation happens for the cache level at the highest index
@@ -422,7 +433,34 @@ See perf list output for the possible metrics and metricgroups.
-A::
--no-aggr::
-Do not aggregate counts across all monitored CPUs.
+--no-merge::
+Do not aggregate/merge counts across monitored CPUs or PMUs.
+
+When multiple events are created from a single event specification,
+stat will, by default, aggregate the event counts and show the result
+in a single row. This option disables that behavior and shows the
+individual events and counts.
+
+Multiple events are created from a single event specification when:
+
+1. PID monitoring isn't requested and the system has more than one
+ CPU. For example, a system with 8 SMT threads will have one event
+ opened on each thread and aggregation is performed across them.
+
+2. Prefix or glob wildcard matching is used for the PMU name. For
+ example, multiple memory controller PMUs may exist typically with a
+ suffix of _0, _1, etc. By default the event counts will all be
+ combined if the PMU is specified without the suffix such as
+ uncore_imc rather than uncore_imc_0.
+
+3. Aliases, which are listed immediately after the Kernel PMU events
+ by perf list, are used.
+
+--hybrid-merge::
+Merge core event counts from all core PMUs. In hybrid or big.LITTLE
+systems by default each core PMU will report its count
+separately. This option forces core PMU counts to be combined to give
+a behavior closer to having a single CPU type in the system.
--topdown::
Print top-down metrics supported by the CPU. This allows to determine
@@ -460,6 +498,21 @@ To interpret the results it is usually needed to know on which
CPUs the workload runs on. If needed the CPUs can be forced using
taskset.
+--record-tpebs::
+Enable automatic sampling on Intel TPEBS retire_latency events (event with :R
+modifier). Without this option, perf would not capture dynamic retire_latency
+at runtime. Currently, a zero value is assigned to the retire_latency event when
+this option is not set. The TPEBS hardware feature starts from Intel Granite
+Rapids microarchitecture. This option only exists in X86_64 and is meaningful on
+Intel platforms with TPEBS feature.
+
+--tpebs-mode=[mean|min|max|last]::
+Set how retirement latency events have their sample times
+combined. The default "mean" gives the average of retirement
+latency. "min" or "max" give the smallest or largest retirment latency
+times respectively. "last" uses the last retirment latency sample's
+time.
+
--td-level::
Print the top-down statistics that equal the input level. It allows
users to print the interested top-down metrics level instead of the
@@ -475,29 +528,6 @@ highlight 'tma_frontend_bound'. This metric may be drilled into with
Error out if the input is higher than the supported max level.
---no-merge::
-Do not merge results from same PMUs.
-
-When multiple events are created from a single event specification,
-stat will, by default, aggregate the event counts and show the result
-in a single row. This option disables that behavior and shows
-the individual events and counts.
-
-Multiple events are created from a single event specification when:
-1. Prefix or glob matching is used for the PMU name.
-2. Aliases, which are listed immediately after the Kernel PMU events
- by perf list, are used.
-
---hybrid-merge::
-Merge the hybrid event counts from all PMUs.
-
-For hybrid events, by default, the stat aggregates and reports the event
-counts per PMU. But sometimes, it's also useful to aggregate event counts
-from all PMUs. This option enables that behavior and reports the counts
-without PMUs.
-
-For non-hybrid events, it should be no effect.
-
--smi-cost::
Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.
@@ -610,18 +640,20 @@ JSON FORMAT
With -j, perf stat is able to print out a JSON format output
that can be used for parsing.
-- timestamp : optional usec time stamp in fractions of second (with -I)
+- interval : optional timestamp in fractions of second (with -I)
- optional aggregate options:
- core : core identifier (with --per-core)
- die : die identifier (with --per-die)
- socket : socket identifier (with --per-socket)
- node : node identifier (with --per-node)
- thread : thread identifier (with --per-thread)
+- counters : number of aggregated PMU counters
- counter-value : counter value
- unit : unit of the counter value or empty
- event : event name
- variance : optional variance if multiple values are collected (with -r)
-- runtime : run time of counter
+- event-runtime : run time of the event
+- pcnt-running : percentage of time the event was running
- metric-value : optional metric value
- metric-unit : optional unit of metric