diff options
Diffstat (limited to 'tools/perf/Documentation/perf-stat.txt')
| -rw-r--r-- | tools/perf/Documentation/perf-stat.txt | 84 |
1 files changed, 58 insertions, 26 deletions
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 8f789fa1242e..1a766d4a2233 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -308,6 +308,14 @@ use --per-die in addition to -a. (system-wide). The output includes the die number and the number of online processors on that die. This is useful to gauge the amount of aggregation. +--per-cluster:: +Aggregate counts per processor cluster for system-wide mode measurement. This +is a useful mode to detect imbalance between clusters. To enable this mode, +use --per-cluster in addition to -a. (system-wide). The output includes the +cluster number and the number of online processors on that cluster. This is +useful to gauge the amount of aggregation. The information of cluster ID and +related CPUs can be gotten from /sys/devices/system/cpu/cpuX/topology/cluster_{id, cpus}. + --per-cache:: Aggregate counts per cache instance for system-wide mode measurements. By default, the aggregation happens for the cache level at the highest index @@ -396,6 +404,9 @@ Aggregate counts per processor socket for system-wide mode measurements. --per-die:: Aggregate counts per processor die for system-wide mode measurements. +--per-cluster:: +Aggregate counts perf processor cluster for system-wide mode measurements. + --per-cache:: Aggregate counts per cache instance for system-wide mode measurements. By default, the aggregation happens for the cache level at the highest index @@ -422,7 +433,34 @@ See perf list output for the possible metrics and metricgroups. -A:: --no-aggr:: -Do not aggregate counts across all monitored CPUs. +--no-merge:: +Do not aggregate/merge counts across monitored CPUs or PMUs. + +When multiple events are created from a single event specification, +stat will, by default, aggregate the event counts and show the result +in a single row. This option disables that behavior and shows the +individual events and counts. + +Multiple events are created from a single event specification when: + +1. PID monitoring isn't requested and the system has more than one + CPU. For example, a system with 8 SMT threads will have one event + opened on each thread and aggregation is performed across them. + +2. Prefix or glob wildcard matching is used for the PMU name. For + example, multiple memory controller PMUs may exist typically with a + suffix of _0, _1, etc. By default the event counts will all be + combined if the PMU is specified without the suffix such as + uncore_imc rather than uncore_imc_0. + +3. Aliases, which are listed immediately after the Kernel PMU events + by perf list, are used. + +--hybrid-merge:: +Merge core event counts from all core PMUs. In hybrid or big.LITTLE +systems by default each core PMU will report its count +separately. This option forces core PMU counts to be combined to give +a behavior closer to having a single CPU type in the system. --topdown:: Print top-down metrics supported by the CPU. This allows to determine @@ -460,6 +498,21 @@ To interpret the results it is usually needed to know on which CPUs the workload runs on. If needed the CPUs can be forced using taskset. +--record-tpebs:: +Enable automatic sampling on Intel TPEBS retire_latency events (event with :R +modifier). Without this option, perf would not capture dynamic retire_latency +at runtime. Currently, a zero value is assigned to the retire_latency event when +this option is not set. The TPEBS hardware feature starts from Intel Granite +Rapids microarchitecture. This option only exists in X86_64 and is meaningful on +Intel platforms with TPEBS feature. + +--tpebs-mode=[mean|min|max|last]:: +Set how retirement latency events have their sample times +combined. The default "mean" gives the average of retirement +latency. "min" or "max" give the smallest or largest retirment latency +times respectively. "last" uses the last retirment latency sample's +time. + --td-level:: Print the top-down statistics that equal the input level. It allows users to print the interested top-down metrics level instead of the @@ -475,29 +528,6 @@ highlight 'tma_frontend_bound'. This metric may be drilled into with Error out if the input is higher than the supported max level. ---no-merge:: -Do not merge results from same PMUs. - -When multiple events are created from a single event specification, -stat will, by default, aggregate the event counts and show the result -in a single row. This option disables that behavior and shows -the individual events and counts. - -Multiple events are created from a single event specification when: -1. Prefix or glob matching is used for the PMU name. -2. Aliases, which are listed immediately after the Kernel PMU events - by perf list, are used. - ---hybrid-merge:: -Merge the hybrid event counts from all PMUs. - -For hybrid events, by default, the stat aggregates and reports the event -counts per PMU. But sometimes, it's also useful to aggregate event counts -from all PMUs. This option enables that behavior and reports the counts -without PMUs. - -For non-hybrid events, it should be no effect. - --smi-cost:: Measure SMI cost if msr/aperf/ and msr/smi/ events are supported. @@ -610,18 +640,20 @@ JSON FORMAT With -j, perf stat is able to print out a JSON format output that can be used for parsing. -- timestamp : optional usec time stamp in fractions of second (with -I) +- interval : optional timestamp in fractions of second (with -I) - optional aggregate options: - core : core identifier (with --per-core) - die : die identifier (with --per-die) - socket : socket identifier (with --per-socket) - node : node identifier (with --per-node) - thread : thread identifier (with --per-thread) +- counters : number of aggregated PMU counters - counter-value : counter value - unit : unit of the counter value or empty - event : event name - variance : optional variance if multiple values are collected (with -r) -- runtime : run time of counter +- event-runtime : run time of the event +- pcnt-running : percentage of time the event was running - metric-value : optional metric value - metric-unit : optional unit of metric |
