Age | Commit message (Collapse) | Author |
|
Add 'simd' sort field to visualize SIMD ops in 'perf report'.
Rows are labeled with the SIMD ISA, and the type of predicate (if any):
- [p] partial predicate
- [e] empty predicate (no elements in the vector being used)
Example with Arm SPE and SVE (Scalable Vector Extension):
#include <arm_sve.h>
double src[1025], dst[1025];
int main(void) {
svfloat64_t vc = svdup_f64(1);
for(;;)
for(int i = 0; i < 1025; i += svcntd())
{
svbool_t pg = svwhilelt_b64(i, 1025);
svfloat64_t vsrc = svld1(pg, &src[i]);
svfloat64_t vdst = svadd_x(pg, vsrc, vc);
svst1(pg, &dst[i], vdst);
}
return 0;
}
... compiled using "gcc-11 -march=armv8-a+sve -O3"
Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1:
$ perf record -e arm_spe_0// -- ./a.out
$ perf report --itrace=i1i -s overhead,pid,simd,sym
Overhead Pid:Command Simd Symbol
........ ................ ....... ......................
53.76% 10758:program [.] main
46.14% 10758:program [.] SVE [.] main
0.09% 10758:program [p] SVE [.] main
The report shows 0.09% of the sampled SVE operations use partial
predicates due to src and dst arrays not being multiples of the vector
register lengths.
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
__hists__add_entry() creates a temporary entry and compare it with
existed histograms entries, if any existed entry equals to the
temporary entry it skips to allocation to avoid duplication.
The problem for support KVM event in histograms is it doesn't contain
any info to identify KVM event and can be used for comparison entries.
This patch adds 'kvm_info' field in the histograms entry which contains
the KVM event's key, this identifier will be used for comparison
histograms entries in later change.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In __hists__insert_output_entry(), it calls fmt->sort() for dynamic
entries with NULL to update column width for tracepoint fields.
But it's a hacky abuse of the sort callback, better to have a proper
callback for that. I'll add more use cases later.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221215192817.2734573-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sometimes users want to see actual (virtual) address of sampled instructions.
Add a new 'addr' sort key to display the raw addresses.
$ perf record -o- true | perf report -i- -s addr
# To display the perf.data header info, please use --header/--header-only options.
#
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
#
# Total Lost Samples: 0
#
# Samples: 12 of event 'cycles:u'
# Event count (approx.): 252512
#
# Overhead Address
# ........ ..................
#
42.96% 0x7f96f08443d7
29.55% 0x7f96f0859b50
14.76% 0x7f96f0852e02
8.30% 0x7f96f0855028
4.43% 0xffffffff8de01087
Note that it just compares and displays the sample ip. Each process can
have a different memory layout and the ip will be different even if they run
the same binary. So this sort key is mostly meaningful for per-process
profile data.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220923173142.805896-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This is a preparation to display accurate lost sample counts for
each evsel.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220901195739.668604-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Switch to the use of mutex wrappers that provide better error checking.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dario Petrillo <dario.pk1@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fangrui Song <maskray@google.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pavithra Gurushankar <gpavithrasha@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Weiguo Li <liwg06@foxmail.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Zechuan Chen <chenzechuan1@huawei.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Cc: yaowenbin <yaowenbin1@huawei.com>
Link: https://lore.kernel.org/r/20220826164242.43412-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
With the existing symbol_from/symbol_to, branches captured in the same
function would be collapsed into a single function if the latencies
associated with the each branch (cycles) were all the same. That is the
case on Intel Broadwell, for instance. Since Intel Skylake, the latency
is captured by hardware and therefore is used to disambiguate branches.
Add addr_from/addr_to sort dimensions to sort branches based on their
addresses and not the function there are in. The output is still the
function name but the offset within the function is provided to uniquely
identify each branch. These new sort dimensions also help with annotate
because they create different entries in the histogram which, in turn,
generates proper branch annotations.
Here is an example using AMD's branch sampling:
$ perf record -a -b -c 1000037 -e cpu/branch-brs/ test_prg
$ perf report
Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycle
99.65% test_prg test_prg [.] test_thread [.] test_thread -
0.02% test_prg [kernel.vmlinux] [k] asm_sysvec_apic_timer_interrupt [k] error_entry -
$ perf report -F overhead,comm,dso,addr_from,addr_to
Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
Overhead Command Shared Object Source Address Target Address
4.22% test_prg test_prg [.] test_thread+0x3c [.] test_thread+0x4
4.13% test_prg test_prg [.] test_thread+0x4 [.] test_thread+0x3a
4.09% test_prg test_prg [.] test_thread+0x3a [.] test_thread+0x6
4.08% test_prg test_prg [.] test_thread+0x2 [.] test_thread+0x3c
4.06% test_prg test_prg [.] test_thread+0x3e [.] test_thread+0x2
3.87% test_prg test_prg [.] test_thread+0x6 [.] test_thread+0x38
3.84% test_prg test_prg [.] test_thread [.] test_thread+0x3e
3.76% test_prg test_prg [.] test_thread+0x1e [.] test_thread
3.76% test_prg test_prg [.] test_thread+0x38 [.] test_thread+0x8
3.56% test_prg test_prg [.] test_thread+0x22 [.] test_thread+0x1e
3.54% test_prg test_prg [.] test_thread+0x8 [.] test_thread+0x36
3.47% test_prg test_prg [.] test_thread+0x1c [.] test_thread+0x22
3.45% test_prg test_prg [.] test_thread+0x36 [.] test_thread+0xa
3.28% test_prg test_prg [.] test_thread+0x24 [.] test_thread+0x1c
3.25% test_prg test_prg [.] test_thread+0xa [.] test_thread+0x34
3.24% test_prg test_prg [.] test_thread+0x1a [.] test_thread+0x24
3.20% test_prg test_prg [.] test_thread+0x34 [.] test_thread+0xc
3.04% test_prg test_prg [.] test_thread+0x26 [.] test_thread+0x1a
3.01% test_prg test_prg [.] test_thread+0xc [.] test_thread+0x32
2.98% test_prg test_prg [.] test_thread+0x18 [.] test_thread+0x26
2.94% test_prg test_prg [.] test_thread+0x32 [.] test_thread+0xe
2.76% test_prg test_prg [.] test_thread+0x28 [.] test_thread+0x18
2.73% test_prg test_prg [.] test_thread+0xe [.] test_thread+0x30
2.67% test_prg test_prg [.] test_thread+0x30 [.] test_thread+0x10
2.67% test_prg test_prg [.] test_thread+0x16 [.] test_thread+0x28
2.46% test_prg test_prg [.] test_thread+0x10 [.] test_thread+0x2e
2.44% test_prg test_prg [.] test_thread+0x2a [.] test_thread+0x16
2.38% test_prg test_prg [.] test_thread+0x14 [.] test_thread+0x2a
2.32% test_prg test_prg [.] test_thread+0x2e [.] test_thread+0x12
2.28% test_prg test_prg [.] test_thread+0x12 [.] test_thread+0x2c
2.16% test_prg test_prg [.] test_thread+0x2c [.] test_thread+0x14
0.02% test_prg [kernel.vmlinux] [k] asm_sysvec_apic_ti+0x5 [k] error_entry
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20220208211637.2221872-13-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sort key 'p_stage_cyc' is used to present the latency cycles spent in
pipeline stages.
perf has local 'p_stage_cyc' sort key to display this info. There is no
global variant available for this sort key. The local variant shows
latency in a single sample, whereas the global value will be useful to
present the total latency (sum of latencies) in the hist entry. It
represents the latency number multiplied by the number of samples.
Add global ('p_stage_cyc') and local variant ('local_p_stage_cyc') for
this sort key. Use 'local_p_stage_cyc' as default option for "mem" sort
mode.
Also add this to the list of dynamic sort keys and made the
"dynamic_headers" and "arch_specific_sort_keys" as static.
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20211203022038.48240-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf_hpp__column_unregister() removes an entry from a list but doesn't
free the memory causing a memory leak spotted by leak sanitizer.
Add the free while at the same time reducing the scope of the function
to static.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118071247.2140392-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To make the output more readable, I think it's better to remove 0's in
the output. Also the dummy event has no event stats so it just wasts
the space. Let's use the --skip-empty option to suppress it.
$ perf report --stat --skip-empty
Aggregated stats:
TOTAL events: 16530
MMAP events: 226
COMM events: 1596
EXIT events: 2
THROTTLE events: 121
UNTHROTTLE events: 117
FORK events: 1595
SAMPLE events: 719
MMAP2 events: 12147
CGROUP events: 2
FINISHED_ROUND events: 2
THREAD_MAP events: 1
CPU_MAP events: 1
TIME_CONV events: 1
cycles stats:
SAMPLE events: 719
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427013717.1651674-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Each struct hists have events_stats but most of the fields were not
used. It's to count number of samples and periods whether filtered or
not. And other fields are used only by evlist.
So it'd be better to split hists_stats and events_stats to reduce
wasted memory in the struct hists. This makes the output of event
statistics in the perf report compact by skipping 0 events in each
evsel/hists.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427013717.1651674-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The pipeline stage cycles details can be recorded on powerpc from the
contents of Performance Monitor Unit (PMU) registers. On ISA v3.1
platform, sampling registers exposes the cycles spent in different
pipeline stages. Patch adds perf tools support to present two of the
cycle counter information along with memory latency (weight).
Re-use the field 'ins_lat' for storing the first pipeline stage cycle.
This is stored in 'var2_w' field of 'perf_sample_weight'.
Add a new field 'p_stage_cyc' to store the second pipeline stage cycle
which is stored in 'var3_w' field of perf_sample_weight.
Add new sort function 'Pipeline Stage Cycle' and include this in
default_mem_sort_order[]. This new sort function may be used to denote
some other pipeline stage in another architecture. So add this to list
of sort entries that can have dynamic header string.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The instruction latency information can be recorded on some platforms,
e.g., the Intel Sapphire Rapids server. With both memory latency
(weight) and the new instruction latency information, users can easily
locate the expensive load instructions, and also understand the time
spent in different stages. The users can optimize their applications in
different pipeline stages.
The 'weight' field is shared among different architectures. Reusing the
'weight' field may impacts other architectures. Add a new field to store
the instruction latency.
Like the 'weight' support, introduce a 'ins_lat' for the global
instruction latency, and a 'local_ins_lat' for the local instruction
latency version.
Add new sort functions, INSTR Latency and Local INSTR Latency,
accordingly.
Add local_ins_lat to the default_mem_sort_order[].
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Two new data source fields, to indicate the block reasons of a load
instruction, are introduced on the Intel Sapphire Rapids server. The
fields can be used by the memory profiling.
Add a new sort function, SORT_MEM_BLOCKED, for the two fields.
For the previous platforms or the block reason is unknown, print "N/A"
for the block reason.
Add blocked as a default mem sort key for perf report and perf mem
report.
Committer testing:
So in machines without this capability we get a "N/A" filling the new "Blocked"
column:
$ perf mem record ls
arch certs CREDITS Documentation include ipc Kconfig lib MAINTAINERS mm samples security usr block
COPYING crypto drivers fs init Kbuild kernel LICENSES Makefile net README scripts sound tools
virt
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]
$
$ perf mem report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 6 of event 'cpu/mem-loads,ldlat=30/Pu'
# Total weight : 1381
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
#
# Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked
# ........ ....... ............ .................... ....................... ............. ...................... ............ ..... ............ ...... .......
#
32.87% 1 454 Local RAM or RAM hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91cef3078 libc-2.31.so Hit L1 or L2 hit No N/A
25.56% 1 353 LFB or LFB hit [.] strcmp ld-2.31.so [.] 0x00005586973855ca ls None L1 or L2 hit No N/A
22.59% 1 312 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0e3b18 ld.so.cache None L1 or L2 hit No N/A
8.47% 1 117 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceee570 libc-2.31.so None L1 or L2 hit No N/A
6.88% 1 95 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceed490 libc-2.31.so None L1 or L2 hit No N/A
3.62% 1 50 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0ebe60 ld.so.cache None L1 or L2 hit No N/A
# Samples: 11 of event 'cpu/mem-stores/Pu'
# Total weight : 11
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
#
# Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked
# ........ ....... ............ ............. ....................... ............. ...................... ........... ..... .......... ...... .......
#
9.09% 1 0 L1 hit [.] __strcoll_l libc-2.31.so [.] 0x00007fffe5648fc8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] _dl_lookup_symbol_x ld-2.31.so [.] 0x00007fffe56490b8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] _dl_name_match_p ld-2.31.so [.] 0x00007fffe56487d8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] _dl_start ld-2.31.so [.] start_time+0x0 ld-2.31.so N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] _dl_sysdep_start ld-2.31.so [.] 0x00007fffe56494b8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5648ff8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649064 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649130 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xaf8 ld-2.31.so N/A N/A N/A N/A
9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xc28 ld-2.31.so N/A N/A N/A N/A
9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] 0x00007fffe56495b8 [stack] N/A N/A N/A N/A
# (Tip: Show user configuration overrides: perf config --user --list)
$
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a new sort dimension "code_page_size" for common sort.
With this option applied, perf can sort and report by sample's code page
size.
For example:
# perf report --stdio --sort=comm,symbol,code_page_size
# To display the perf.data header info, please use
# --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 3K of event 'mem-loads:uP'
# Event count (approx.): 1470769
#
# Overhead Command Symbol Code Page Size IPC [IPC Coverage]
# ........ ....... ............................ .............. ....................
#
69.56% dtlb [.] GetTickCount 4K - -
17.93% dtlb [.] Calibrate 4K - -
11.40% dtlb [.] __gettimeofday 4K - -
#
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-6-kan.liang@linux.intel.com
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a new sort option "data_page_size" for --mem-mode sort. With this
option applied, perf can sort and report by sample's data page size.
Here is an example:
perf report --stdio --mem-mode
--sort=comm,symbol,phys_daddr,data_page_size
# To display the perf.data header info, please use
# --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 9K of event 'mem-loads:uP'
# Total weight : 9028
# Sort order : comm,symbol,phys_daddr,data_page_size
#
# Overhead Command Symbol Data Physical
# Address
# Data Page Size
# ........ ....... ............................
# ...................... ......................
#
11.19% dtlb [.] touch_buffer [.] 0x00000003fec82ea8 4K
8.61% dtlb [.] GetTickCount [.] 0x00000003c4f2c8a8 4K
4.52% dtlb [.] GetTickCount [.] 0x00000003fec82f58 4K
4.33% dtlb [.] __gettimeofday [.] 0x00000003fec82f48 4K
4.32% dtlb [.] GetTickCount [.] 0x00000003fec82f78 4K
4.28% dtlb [.] GetTickCount [.] 0x00000003fec82f50 4K
4.23% dtlb [.] GetTickCount [.] 0x00000003fec82f70 4K
4.11% dtlb [.] GetTickCount [.] 0x00000003fec82f68 4K
4.00% dtlb [.] Calibrate [.] 0x00000003fec82f98 4K
3.91% dtlb [.] Calibrate [.] 0x00000003fec82f90 4K
3.43% dtlb [.] touch_buffer [.] 0x00000003fec82e98 4K
3.42% dtlb [.] touch_buffer [.] 0x00000003fec82e90 4K
0.09% dtlb [.] DoDependentLoads [.] 0x000000036ea084c0 2M
0.08% dtlb [.] DoDependentLoads [.] 0x000000032b010b80 2M
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20201216185805.9981-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The cgroup sort key is to show cgroup membership of each task.
Currently it shows full path in the cgroupfs (not relative to the root
of cgroup namespace) since it'd be more intuitive IMHO. Otherwise root
cgroup in different namespaces will all show same name - "/".
The cgroup sort key should come before cgroup_id otherwise
sort_dimension__add() will match it to cgroup_id as it only matches with
the given substring.
For example it will look like following. Note that record patch adding
--all-cgroups patch will come later.
$ perf record -a --namespace --all-cgroups cgtest
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ]
$ perf report -s cgroup_id,cgroup,pid
...
# Overhead cgroup id (dev/inode) Cgroup Pid:Command
# ........ ..................... .......... ...............
#
93.96% 0/0x0 / 0:swapper
1.25% 3/0xeffffffb / 278:looper0
0.86% 3/0xf000015f /sub/cgrp1 280:cgtest
0.37% 3/0xf0000160 /sub/cgrp2 281:cgtest
0.34% 3/0xf0000163 /sub/cgrp3 282:cgtest
0.22% 3/0xeffffffb /sub 278:looper0
0.20% 3/0xeffffffb / 280:cgtest
0.15% 3/0xf0000163 /sub/cgrp3 285:looper3
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sometimes we may need to reload the browser to update the output since
some options are changed.
This patch creates a new key K_RELOAD. Once the __cmd_report() returns
K_RELOAD, it would repeat the whole process, such as, read samples from
data file, sort the data and display in the browser.
v5:
---
1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h.
2. Skip setup_sorting() in repeat path if last key is K_RELOAD.
v4:
---
Need to quit in perf_evsel_menu__run if key is K_RELOAD.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Variable names are inconsistent in hists__for_each macro().
Due to this inconsistency, the macro replaces its second argument with
"fmt" regardless of its original name.
So far it works because only "fmt" is passed to the second argument.
However, this behavior is not expected and should be fixed.
Fixes: f0786af536bb ("perf hists: Introduce hists__for_each_format macro")
Fixes: aa6f50af822a ("perf hists: Introduce hists__for_each_sort_list macro")
Signed-off-by: Yuya Fujita <fujita.yuya@fujitsu.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/OSAPR01MB1588E1C47AC22043175DE1B2E8520@OSAPR01MB1588.jpnprd01.prod.outlook.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This patch supports jumping from tui total cycles view to symbol source
view.
For example,
perf record -b ./div
perf report --total-cycles
In total cycles view, we can select one entry and press 'a' or press
ENTER key to jump to symbol source view.
This patch also sets sort_order to NULL in cmd_report() which will use
the default branch sort order. The percent value in new annotate view
will be consistent with the percent in annotate view switched from perf
report (we observed the original percent gap with previous patches).
v2:
---
Fix the 'make NO_SLANG=1' error. (set __maybe_unused to
annotation_opts in block_hists_tui_browse()).
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191118140849.20714-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It would be nice if we could jump to the assembler/source view (like the
normal perf report) from total cycles view.
This patch moves the block_hists_tui_browse from block-info.c to
ui/browsers/hists.c in order to reuse some browser codes (i.e
do_annotate) for implementing new annotation view.
v2:
---
Fix the 'make NO_SLANG=1' error. (Change 'int block_hists_tui_browse()'
to 'static inline int block_hists_tui_browse()')
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191118140849.20714-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We can get the per sample cycles by hist__account_cycles(). It's also
useful to know the total cycles of all samples in order to get the
cycles coverage for a single program block in further. For example:
coverage = per block sampled cycles / total sampled cycles
This patch creates a new argument 'total_cycles' in hist__account_cycles(),
which will be added with the cycles of each sample.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Its needed, was being obtained indirectly, fix it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-srzphk0ehptfn3zqmpkgsi65@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This will allow us to untangle the header dependency a bit more, as some
places will not need event.h anymore.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-enqncj29ovzaat3cd9203rwl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We only need a forward declaration, add it and fixup all the files that
need ui_progress definitions but were wrongly getting it from hist.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-84a90o9jdxybffxo9jmouokw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The event group feature links relevant hist entries among events so that
they can be displayed together. During the link process, each hist
entry in non-leader events is connected to a hist entry in the leader
event. This is done in order of events specified in the command line so
it assumes that events are linked in the order.
But 'perf top' can break the assumption since it does the link process
multiple times. For example, a hist entry can be in the third event
only at first so it's linked after the leader. Some time later, second
event has a hist entry for it and it'll be linked after the entry of the
third event.
This makes the code compilicated to deal with such unordered entries.
This patch simply unlink all the entries after it's printed so that they
can assume the correct order after the repeated link process. Also it'd
be easy to deal with decaying old entries IMHO.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190827231555.121411-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Rename struct perf_evlist to struct evlist, so we don't have a name
clash when we add struct perf_evlist in libperf.
Committer notes:
Added fixes to build on arm64, from Jiri and from me
(tools/perf/util/cs-etm.c)
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Rename struct perf_evsel to struct evsel, so we don't have a name clash
when we add struct perf_evsel in libperf.
Committer notes:
Added fixes for arm64, provided by Jiri.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
$ perf record -b ./div
$ perf record -b ./div
Following is the default perf diff output
$ perf diff
# Event 'cycles'
#
# Baseline Delta Abs Shared Object Symbol
# ........ ......... ................ ..................................
#
48.75% +0.33% div [.] main
8.21% -0.20% div [.] compute_flag
19.02% -0.12% libc-2.23.so [.] __random_r
16.17% -0.09% libc-2.23.so [.] __random
2.27% -0.03% div [.] rand@plt
+0.02% [i915] [k] gen8_irq_handler
5.52% +0.02% libc-2.23.so [.] rand
This patch creates a new computation selection 'cycles'.
$ perf diff -c cycles
# Event 'cycles'
#
# Baseline [Program Block Range] Cycles Diff Shared Object Symbol
# ........ ....................................... .........................................
#
48.75% [div.c:42 -> div.c:45] 147 div [.] main
48.75% [div.c:31 -> div.c:40] 4 div [.] main
48.75% [div.c:40 -> div.c:40] 0 div [.] main
48.75% [div.c:42 -> div.c:42] 0 div [.] main
48.75% [div.c:42 -> div.c:44] 0 div [.] main
19.02% [random_r.c:357 -> random_r.c:360] 0 libc-2.23.so [.] __random_r
19.02% [random_r.c:357 -> random_r.c:373] 0 libc-2.23.so [.] __random_r
19.02% [random_r.c:357 -> random_r.c:376] 0 libc-2.23.so [.] __random_r
19.02% [random_r.c:357 -> random_r.c:380] 0 libc-2.23.so [.] __random_r
19.02% [random_r.c:357 -> random_r.c:392] 0 libc-2.23.so [.] __random_r
16.17% [random.c:288 -> random.c:291] 0 libc-2.23.so [.] __random
16.17% [random.c:288 -> random.c:291] 0 libc-2.23.so [.] __random
16.17% [random.c:288 -> random.c:295] 0 libc-2.23.so [.] __random
16.17% [random.c:288 -> random.c:297] 0 libc-2.23.so [.] __random
16.17% [random.c:291 -> random.c:291] 0 libc-2.23.so [.] __random
16.17% [random.c:293 -> random.c:293] 0 libc-2.23.so [.] __random
8.21% [div.c:22 -> div.c:22] 148 div [.] compute_flag
8.21% [div.c:22 -> div.c:25] 0 div [.] compute_flag
8.21% [div.c:27 -> div.c:28] 0 div [.] compute_flag
5.52% [rand.c:26 -> rand.c:27] 0 libc-2.23.so [.] rand
5.52% [rand.c:26 -> rand.c:28] 0 libc-2.23.so [.] rand
2.27% [rand@plt+0 -> rand@plt+0] 0 div [.] rand@plt
0.01% [entry_64.S:694 -> entry_64.S:694] 16 [vmlinux] [k] native_irq_return_iret
0.00% [fair.c:7676 -> fair.c:7665] 162 [vmlinux] [k] update_blocked_averages
"[Program Block Range]" indicates the range of program basic block
(start -> end). If we can find the source line it prints the source line
otherwise it prints the symbol+offset instead.
v4:
---
Use source lines or symbol+offset to indicate the basic block. It should
be easier to understand.
v3:
---
Cast 'struct hist_entry' to 'struct block_hist' in hist_entry__block_fprintf.
Use symbol_conf.report_block to check if executing hist_entry__block_fprintf.
v2:
---
Keep standard perf diff format and display the 'Baseline' and
'Shared Object'.
The output is sorted by "Baseline" and the basic blocks in the same
function are sorted by cycles diff.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1561713784-30533-7-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The block_info contains the program basic block information, i.e,
contains the start address and the end address of this basic block and
how much cycles it takes.
We need to compare, sort and even print out the basic block by some
orders, i.e. sort by cycles.
For this purpose, we add block_info field to hist_entry. In order not to
impact current interface, we creates a new function
hists__add_entry_block.
v6:
---
Remove the 'ops' argument in hists__add_entry_block
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1561713784-30533-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now 'perf report' can show whole time periods with 'perf script', but
the user still has to find individual samples of interest manually.
It would be expensive and complicated to search for the right samples in
the whole perf file. Typically users only need to look at a small number
of samples for useful analysis.
Also the full scripts tend to show samples of all CPUs and all threads
mixed up, which can be very confusing on larger systems.
Add a new --samples option to save a small random number of samples per
hist entry.
Use a reservoir sample technique to select a representatve number of
samples.
Then allow browsing the samples using 'perf script' as part of the hist
entry context menu. This automatically adds the right filters, so only
the thread or cpu of the sample is displayed. Then we use less' search
functionality to directly jump the to the time stamp of the selected
sample.
It uses different menus for assembler and source display. Assembler
needs xed installed and source needs debuginfo.
Currently it only supports as many samples as fit on the screen due to
some limitations in the slang ui code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311174605.GA29294@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The scripts menu traditionally only showed custom perf scripts.
Allow to run standard perf script with useful default options too.
- Normal perf script
- perf script with assembler (needs xed installed)
- perf script with source code output (needs debuginfo)
- perf script with custom arguments
Then we automatically select the right options to display the
information in the perf.data file.
For example with -b display branch contexts.
It's not easily possible to check for xed's existence in advance. perf
script usually gives sensible error messages when it's not available.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-7-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a time sort key to perf report to display samples for different time
quantums separately. This allows easier analysis of workloads that
change over time, and also will allow looking at the context of samples.
% perf record ...
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
0.67% 277061.87300 [.] _dl_start
0.50% 277061.87300 [.] f1
0.50% 277061.87300 [.] f2
0.33% 277061.87300 [.] main
0.29% 277061.87300 [.] _dl_lookup_symbol_x
0.29% 277061.87300 [.] dl_main
0.29% 277061.87300 [.] do_lookup_x
0.17% 277061.87300 [.] _dl_debug_initialize
0.17% 277061.87300 [.] _dl_init_paths
0.08% 277061.87300 [.] check_match
0.04% 277061.87300 [.] _dl_count_modids
1.33% 277061.87400 [.] f1
1.33% 277061.87400 [.] f2
1.33% 277061.87400 [.] main
1.17% 277061.87500 [.] main
1.08% 277061.87500 [.] f1
1.08% 277061.87500 [.] f2
1.00% 277061.87600 [.] main
0.83% 277061.87600 [.] f1
0.83% 277061.87600 [.] f2
1.00% 277061.87700 [.] main
Committer notes:
Rename 'time' argument to hist_time() to htime to overcome this in older
distros:
cc1: warnings being treated as errors
util/hist.c: In function 'hist_time':
util/hist.c:251: error: declaration of 'time' shadows a global declaration
/usr/include/time.h:186: error: shadowed declaration is here
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add perf_evsel__output_resort_cb() so we have an interface with a
callback for each hist entry. It will be used in the following patch.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190204141808.23031-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add argument to hists__resort_cb_t so that we can pass data from upper
layers to the callback function. It will be used in the following
patches.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190204141808.23031-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Nothing that is provided by callchain.h is used there, just things that
should've be directly included in hist.h, such as rbtree.h and a
map_symbol forward declaration.
Remove it so that we reduce the headers dependency tree.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-zivvqfx93w5zzur7hr7h0nlh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To reduce the includes dependencies.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-cmvg5ght75mmfg1efeyna9rn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To allow headers just wanting this definition to be able to get it
without all the things in symbol.h, to reduce the include dep tree.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-l32z2qyhs6fe8unf4gk2ead2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node), which is
something heavily required for histograms. Specifically, the following
are converted to rb_root_cached, and users accordingly:
hist::entries_in_array
hist::entries_in
hist::entries
hist::entries_collapsed
hist_entry::hroot_in
hist_entry::hroot_out
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-7-dave@stgolabs.net
[ Added some missing conversions to rb_first_cached() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Support displaying the average IPC and IPC coverage per symbol in 'perf
report' --tui and --stdio modes.
For example,
$ perf record -b ...
$ perf report -s symbol
Overhead Symbol IPC [IPC Coverage]
39.60% [.] __random 2.30 [ 54.8%]
18.02% [.] main 0.43 [ 54.3%]
14.21% [.] compute_flag 2.29 [100.0%]
14.16% [.] rand 0.36 [100.0%]
7.06% [.] __random_r 2.57 [ 70.5%]
6.85% [.] rand@plt 0.00 [ 0.0%]
Jiri Olsa <jolsa@redhat.com> provided the patch to support the --stdio
mode. I merged Jiri's code in this patch.
$ perf report -s symbol --stdio
# Overhead Symbol IPC [IPC Coverage]
# ........ ........................... ....................
#
39.60% [.] __random 2.30 [ 54.8%]
18.02% [.] main 0.43 [ 54.3%]
14.21% [.] compute_flag 2.29 [100.0%]
14.16% [.] rand 0.36 [100.0%]
7.06% [.] __random_r 2.57 [ 70.5%]
6.85% [.] rand@plt 0.00 [ 0.0%]
0.02% [k] run_timer_softirq 1.60 [ 57.2%]
The columns "IPC" and "[IPC Coverage]" are automatically enabled when
the sort-key "symbol" is specified. If the perf.data file doesn't
contain timed LBR information, columns are filled with "-".
For example,
# Overhead Symbol IPC [IPC Coverage]
# ........ ........................... ....................
#
46.57% [.] main - -
17.60% [.] rand - -
15.84% [.] __random_r - -
11.90% [.] __random - -
6.50% [.] compute_flag - -
1.59% [.] rand@plt - -
0.00% [.] _dl_relocate_object - -
0.00% [k] tlb_flush_mmu - -
0.00% [k] perf_event_mmap - -
0.00% [k] native_sched_clock - -
0.00% [k] intel_pmu_handle_irq_v4 - -
0.00% [k] native_write_msr - -
v3:
---
Removed the sortkey 'ipc' from command-line. The columns "IPC"
and "[IPC Coverage]" are automatically enabled when "symbol"
is specified.
v2:
---
Merge in Jiri's patch to support stdio mode
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We want to allow having mixed events with/without callchains, not
using a global flag to show callchains, but allowing supressing
callchains when they are present.
So invert the logic of the last parameter to hists__fprint() to
that effect.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ohqyisr6qge79qa95ojslptx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There are places where we have only access to struct hists and need to
know if any of its hist_entries has callchains, like when drawing
headers for the various output modes (stdio, TUI, etc), so, when adding
a new hist_entry, check if it has callchains, storing this info for
later use by hists__has_callchains().
This reimplementation is necessary because not always a 'struct hists'
is allocated together with a 'struct perf evsel', so we can't go from
'hists' to 'perf_event_attr.sample_type & PERF_SAMPLE_CALLCHAIN'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-hg5g7yddjio3ljwyqnnaj5dt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
So far if we use 'perf record -g' this will make
symbol_conf.use_callchain 'true' and logic will assume that all events
have callchains enabled, but ever since we added the possibility of
setting up callchains for some events (e.g.: -e
cycles/call-graph=dwarf/) while not for others, we limit usage scenarios
by looking at that symbol_conf.use_callchain global boolean, we better
look at each event attributes.
On the road to that we need to look if a hist_entry has callchains, that
is, to go from hist_entry->hists to the evsel that contains it, to then
look at evsel->sample_type for PERF_SAMPLE_CALLCHAIN.
The next step is to add a symbol_conf.ignore_callchains global, to use
in the places where what we really want to know is if callchains should
be ignored, even if present.
Then -g will mean just to select a callchain mode to be applied to all
events not explicitely setting some other callchain mode, i.e. a default
callchain mode, and --no-call-graph will set
symbol_conf.ignore_callchains with that clear intention.
That too will at some point become a per evsel thing, that tools can set
for all or just a few of its evsels.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-0sas5cm4dsw2obn75g7ruz69@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
So that things changed in the command line may percolate to the browser
code without using globals.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-5daawc40zhl6gcs600com1ua@git.kernel.org
[ Merged fix for NO_SLANG=1 build provided by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
That is not use any struct hists_browser internals, so that it can be
shared with the other UIs and tools.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196935
Link: https://lkml.kernel.org/n/tip-w8mczjnqnbcj9yzfkv9ja6ro@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add DSO size to perf report/top sort output list.
This includes adding a map__size fn to map.h, which is
approximately equal to the DSO data file_size:
DSO file size map (end-start) file / (end-start)
libwebkit2gtk-4.0.so.37.24.9 43260072 41295872 95%
libglib-2.0.so.0.5400.1 1125680 1118208 99%
libc-2.26.so 1960656 1925120 101%
libdbus-1.so.3.14.13 309456 303104 102%
Sample output:
$ ./perf report -s dso_size,dso
Samples: 2K of event 'cycles:uppp', Event count (approx.): 128373340
Overhead DSO size Shared Object
90.62% unknown [unknown]
2.87% 1118208 libglib-2.0.so.0.5400.1
1.92% 303104 libdbus-1.so.3.14.13
1.42% 1925120 libc-2.26.so
0.77% 41295872 libwebkit2gtk-4.0.so.37.24.9
0.61% 335872 libgobject-2.0.so.0.5400.1
0.41% 1052672 libgdk-3.so.0.2200.25
0.36% 106496 libpthread-2.26.so
0.29% 221184 dbus-daemon
0.17% 159744 ld-2.26.so
0.13% 49152 libwayland-client.so.0.3.0
0.12% 1642496 libgio-2.0.so.0.5400.1
0.09% 7327744 libgtk-3.so.0.2200.25
0.09% 12324864 libmozjs-52.so.0.0.0
0.05% 4796416 perf
0.04% 843776 libgjs.so.0.0.0
0.03% 1409024 libmutter-clutter-1.so
Committer testing:
To sort by DSO size, use:
# perf report -F dso_size,dso,overhead -s dso_size
<SNIP>
3465216 libdns-export.so.174.0.1 0.00%
3522560 libgc.so.1.0.3 0.00%
3538944 libbfd-2.29-13.fc27.so 0.59%
3670016 libunistring.so.2.1.0 0.00%
3723264 libguile-2.0.so.22.8.1 0.00%
3776512 libgio-2.0.so.0.5400.3 0.00%
3891200 libc-2.26.so 0.96%
3944448 libmozjs-17.0.so 0.00%
4218880 libperl.so.5.26.1 0.18%
4452352 libpython2.7.so.1.0 0.02%
4472832 perf 0.02%
4603904 git 0.01%
4751360 libcrypto.so.1.1.0g 0.00%
5005312 libslang.so.2.3.1 0.00%
7315456 libgtk-3.so.0.2200.26 0.09%
8818688 i965_dri.so 2.46%
8818688 i965_dri.so (deleted) 1.26%
12414976 libmozjs-52.so.0.0.0 0.03%
23642112 cc1 2.02%
27889664 [kernel.kallsyms] 25.41%
80834560 libxul.so (deleted) 15.68%
98078720 chrome 32.03%
1056964608 [kernel.kallsyms] 1.59%
#
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180327060956.1c01ebe67a2a941bb4468c6f@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Jin Yao reported memory corrupton in perf report with
branch info used for stack trace:
> Following command lines will cause perf crash.
> perf record -j call -g -a <application>
> perf report --branch-history
>
> *** Error in `perf': double free or corruption (!prev): 0x00000000104aa040 ***
> ======= Backtrace: =========
> /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725]
> /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a]
> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc]
> perf[0x51b914]
> perf(hist_entry_iter__add+0x1e5)[0x51f305]
> perf[0x43cf01]
> perf[0x4fa3bf]
> perf[0x4fa923]
> perf[0x4fd396]
> perf[0x4f9614]
> perf(perf_session__process_events+0x89e)[0x4fc38e]
> perf(cmd_report+0x15d2)[0x43f202]
> perf[0x4a059f]
> perf(main+0x631)[0x427b71]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830]
> perf(_start+0x29)[0x427d89]
For the cumulative output, we allocate the he_cache array based on the
--max-stack option value and populate it with data from 'callchain_cursor'.
The --max-stack option value does not ensure now the limit for number of
callchain_cursor nodes, so the cumulative iter code will allocate smaller array
than it's actually needed and cause above corruption.
I think the --max-stack limit does not apply here anyway, because we add
callchain data as normal hist entries, while the --max-stack control the limit
of single entry callchain depth.
Using the callchain_cursor.nr as he_cache array count to fix this. Also
removing struct hist_entry_iter::max_stack, because there's no longer any use
for it.
We need more fixes to ensure that the branch stack code follows properly the
logic of --max-stack, which is not the case at the moment.
Original-patch-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180216123619.GA9945@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|