Age | Commit message (Collapse) | Author |
|
And into a separate util/record.h, to better isolate things and make
sure that those who use record_opts and the other moved declarations
are explicitly including the necessary header.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-31q8mei1qkh74qvkl9nwidfq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Just a forward declaration for 'struct timespec' is needed, ditch the
rest.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-6shdqw801oqe7ax6r307k27r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
From a quick look this was never needed and just polluted the build,
needlessly making things including cpumap.h to be rebuild if perf.h or
anything it includes gets changed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-x10p8slllqkn3fc3bntjx3n0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Following the patch 'perf stat: Fix --no-scale', an alignment trap
happens in process_counter_values() on ARMv7 platforms due to the
attempt to copy non 64 bits aligned double words (pointed by 'count')
via a NEON vectored instruction ('vld1' with 64 bits alignment
constraint).
This patch sets a 64 bits alignment constraint on 'contents[]' field in
'struct xyarray' since the 'count' pointer used above points to such a
structure.
Signed-off-by: Gerald Baeza <gerald.baeza@st.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1566464769-16374-1-git-send-email-gerald.baeza@st.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If c2c is recorded on a machine where any cpus are offline, 'perf c2c
report' throws an error "node/cpu topology bugFailed setup nodes".
It fails because while preparing node-cpu mapping we don't consider
offline cpus.
Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Fixes: 1e181b92a2da ("perf c2c report: Add 'node' sort key")
Link: http://lkml.kernel.org/r/20190822085045.25108-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
So it's part of libperf library as basic functions operating on
perf_thread_map objects.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190822111141.25823-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The util/cpumap.h file doesn't use anything in refcount.h not in
debug.h, it needs just a forward reference to 'struct cpu_map_data',
that is defined in util/event.h and cpumap.h was getting indirectly via,
of all things, debug.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-mtjww98yptt4ppo6g2blavg5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We don't need what is in perf's util/cpumap.h, just the struct cpu_map
that is in libperf's internal/cpumap.h file to cover this one case:
tools/perf/util/evsel.h:215:27: error: dereferencing pointer to incomplete type ‘struct perf_cpu_map’
215 | return evsel__cpus(evsel)->nr;
So switch to libperf's cpumap.h and add some missing struct foward
declarations and include sys/types.h to get pid_t.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-ufjkpohijti05ggk69s91ktf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It uses strcmp(), strstr() and was getting the required string.h header
by luck, from evsel.h -> cpumap.h -> debug.h -> string.h, add the
missing header.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-qrz8hhvrhwnmt5ocfwk4br5d@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
And it was getting it by luck from util/cpumap.h that shouldn't be
included in util/evsel.h as it only needs what is in libperf, i.e.
struct cpu_map, that is in internal/cpumap.h, so add stdio.h before
we fix that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-2ywx5sl031tj3zske7c7edgv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We added it in 07ac002f2fcc ("perf evsel: Introduce is_group_member
method") but we already ditched that function, and there was nothing
else left that needed NULL nor anything else from stddef.h, ditch it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-1zy0xfsy61x81f3fpyx5znco@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We need only a struct forward declaration, so prune the header
dependency tree a bit more.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-oqvgf04w4ku8xasrz79zquim@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since util/evsel.h uses perf_evsel__cpus() that has its prototype in
libperf's perf/evsel.h file, we need it explicitely included.
This was working by luck as util/evsel.h includes counts.h, but that is
not necessary, just some forward declarations, so, before we remove
counts.h from util/evsel.h, add what is realli needed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-nfb9e0t4jm9zhvr0q86hc29d@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It is getting this via evsel.h, that don't strictly need counts.h, just
forward declarations for some structs, so add it here before we remove
it from there.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-6bxk3ltwkw91qcld2ot86bgg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It is getting this via evsel.h, that don't strictly need counts.h, just
forward declarations for some structs, so add it here before we remove
it from there.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-jwcbm9gv9llloe3he5qkdefs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Those are getting counts.h via evsel.h, that don't strictly need
counts.h, just forward declarations for some structs, so add it here
before we remove it from there.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-phldqlfxxu563txja7evd4zt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It is getting this via evsel.h, that don't strictly need counts.h, just
forward declarations for some structs, so add it here before we remove
it from there.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-q4shpvlxyjqz7val1hyrdak9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It gets it very indirectly, via evsel.h -> counts.h, and since counts.h
doesn't need xyarray.h at all, add it here before we remove it there.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-hkizv6gojwfklj9ezaiiztll@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This was being obtained indirectly via evsel.h -> counts.h, since we
don't need xyarray in counts.h, we need to add it here explicitely
before removing it from counts.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-jirmxg527i82yz31bwad9we7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We get these by sheer luck, since we're cleaning unneeded headers use,
this needs to be done first to avoid breakage down the line.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-p7bncbi53t4p2kobkbmu86a4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
All we need in util/evsel.h is the foward declaration of 'struct
xyarray', not the internal/xyarray.h, that can be moved to util/evsel.c
and then we reduce the header dependency tree.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-wwqce6ixwcyq6yzx3ljrdm80@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There we need just some struct forward declarations, do that instead and
add the includes needed by metricgroup.c.
That should help with needless rebuilds when changing the removed
headers from metricgroup.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-1fkskjws6imir2hhztqhdyb0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It uses strstr(), needs to include string.h or its not going to build
when we remove string.h from the place it is getting from indirectly, by
luck.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-72y0i0uiaqght5b83e3ae7p4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This file uses pr_debug() but isn't including debug.h, getting it by
luck, fix it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-t7pisnsdfh88kclpw52jcwl7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
As an internal function that will be used by both perf and libperf, but
is not exported at this point.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190822111141.25823-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
So it's part of the libperf library as one of basic functions operating
on the perf_cpu_map class.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190822111141.25823-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Switch the rest of the perf code to use libperf's perf_cpu_map__nr(),
which is the same as current cpu_map__nr() and remove the cpu_map__nr()
function.
Link: http://lkml.kernel.org/n/tip-6e0guy75clis7nm0xpuz9fga@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190822111141.25823-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Guenter Roeck reported problem with compilation when the ARCH is
specified:
$ make ARCH=x86_64
In file included from tools/include/asm/atomic.h:6:0,
from include/linux/atomic.h:5,
from tools/include/linux/refcount.h:41,
from cpumap.c:4: tools/include/asm/../../arch/x86/include/asm/atomic.h:11:10:
fatal error: asm/cmpxchg.h: No such file or directory
The problem is that we don't use SRCARCH (the sanitized ARCH version)
and we don't get the proper include path.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 314350491810 ("libperf: Make libperf.a part of the perf build")
Link: http://lkml.kernel.org/r/20190820124624.GG24105@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Give visual cue about what is happening while initially collecting the
minimal set of samples to collect/sort/display.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-xcui60p1v6ozijfam2o89ya8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
available to display
The 'perf top' tool will use that to avoid having a initial blank screen
while collecting the minimum number of samples to sort and display.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-89ciceg8cy4442he3t0jzo3f@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sometimes we want just to print a message on the center of the screen,
like in 'perf top' while we wait for the minimum amount of samples to be
collected before sorting and showing them.
Also expose __ui__info_window() as an optimization for cases where such
message is to be printed while holding the ui lock.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-uat0f89vfwl2w52kv9wzwd8a@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We will not need it when refactoring this function to be
non-interactive, so make it optional.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-pnx1dn17bsz7lqt9ty95nnjx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The synthetic branch and instruction samples are missed to set
instruction related info, thus the perf tool fails to display samples
with flags '-F,+insn,+insnlen'.
The CoreSight trace decoder provides sufficient information to decide
the instruction size based on the ISA type: A64/A32 instructions are
32-bit size, but one exception is the T32 instruction size, which might
be 32-bit or 16-bit.
This patch handles these cases and it reads the instruction values from
DSO file; thus can support the flags '-F,+insn,+insnlen'.
Before:
# perf script -F,insn,insnlen,ip,sym
0 [unknown] ilen: 0
ffff97174044 _start ilen: 0
ffff97174938 _dl_start ilen: 0
ffff97174938 _dl_start ilen: 0
ffff97174938 _dl_start ilen: 0
ffff97174938 _dl_start ilen: 0
ffff97174938 _dl_start ilen: 0
ffff97174938 _dl_start ilen: 0
ffff97174938 _dl_start ilen: 0
ffff97174938 _dl_start ilen: 0
[...]
After:
# perf script -F,insn,insnlen,ip,sym
0 [unknown] ilen: 0
ffff97174044 _start ilen: 4 insn: 2f 02 00 94
ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
ffff97174938 _dl_start ilen: 4 insn: c1 ff ff 54
[...]
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190815082854.18191-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Display DWARF based callchains when the perf.data file contains raw thread
stack data as LBR callstack data.
Commiter testing:
This changes the output from the branch stack based one, i.e. without
this patch, for the same file as in the previous csets:
# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 13 of event 'cycles'
# Event count (approx.): 13
#
# Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
# ........ ....... .................... ........................... ......................................... ..................
#
7.69% ls libpthread-2.29.so [.] _init [.] __pthread_initialize_minimal_internal 6827
7.69% ls ld-2.29.so [k] _start [k] _dl_start -
7.69% ls ld-2.29.so [.] _dl_start_user [.] _dl_init -24790
7.69% ls ld-2.29.so [k] _dl_start [k] _dl_sysdep_start 278
7.69% ls ld-2.29.so [k] dl_main [k] _dl_map_object_deps 15581
7.69% ls ld-2.29.so [k] open_verify.constprop.0 [k] lseek64 4228
7.69% ls ld-2.29.so [k] _dl_map_object [k] open_verify.constprop.0 55
7.69% ls ld-2.29.so [k] openaux [k] _dl_map_object 67
7.69% ls ld-2.29.so [k] _dl_map_object_deps [k] 0x00007f441b57c090 112
7.69% ls ld-2.29.so [.] call_init.part.0 [.] _init 334
7.69% ls ld-2.29.so [.] _dl_init [.] call_init.part.0 383
7.69% ls ld-2.29.so [k] _dl_sysdep_start [k] dl_main 45
7.69% ls ld-2.29.so [k] _dl_catch_exception [k] openaux 116
#
# (Tip: For memory address profiling, try: perf mem record / perf mem report)
#
To the one that shows call chains:
# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 10 of event 'cycles'
# Event count (approx.): 3204047
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... .................. .........................................
#
55.01% 0.00% ls [kernel.vmlinux] [k] entry_SYSCALL_64_after_hwframe
|
---entry_SYSCALL_64_after_hwframe
do_syscall_64
|
--16.01%--__x64_sys_execve
__do_execve_file.isra.0
search_binary_handler
load_elf_binary
elf_map
vm_mmap_pgoff
do_mmap
mmap_region
perf_event_mmap
perf_iterate_sb
perf_iterate_ctx
perf_event_mmap_output
perf_output_copy
memcpy_erms
55.01% 39.00% ls [kernel.vmlinux] [k] do_syscall_64
|
|--39.00%--0xffffffffffffffff
| _dl_map_object
| open_verify.constprop.0
| __lseek64 (inlined)
| entry_SYSCALL_64_after_hwframe
| do_syscall_64
|
--16.01%--do_syscall_64
__x64_sys_execve
__do_execve_file.isra.0
search_binary_handler
load_elf_binary
elf_map
vm_mmap_pgoff
do_mmap
mmap_region
perf_event_mmap
perf_iterate_sb
perf_iterate_ctx
perf_event_mmap_output
perf_output_copy
memcpy_erms
42.95% 42.95% ls libpthread-2.29.so [.] __pthread_initialize_minimal_internal
|
---_init
__pthread_initialize_minimal_internal
42.95% 0.00% ls libpthread-2.29.so [.] _init
|
---_init
__pthread_initialize_minimal_internal
<SNIP>
#
# (Tip: Profiling branch (mis)predictions with: perf record -b / perf report)
#
#
The branch stack view be explicitely selected using:
# perf report -h branch-stack
Usage: perf report [<options>]
-b, --branch-stack use branch records for per branch histogram filling
#
I.e. after this patch:
# perf report -b --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 13 of event 'cycles'
# Event count (approx.): 13
#
# Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
# ........ ....... .................... ........................... ......................................... ..................
#
7.69% ls libpthread-2.29.so [.] _init [.] __pthread_initialize_minimal_internal 6827
7.69% ls ld-2.29.so [k] _start [k] _dl_start -
7.69% ls ld-2.29.so [.] _dl_start_user [.] _dl_init -24790
7.69% ls ld-2.29.so [k] _dl_start [k] _dl_sysdep_start 278
7.69% ls ld-2.29.so [k] dl_main [k] _dl_map_object_deps 15581
7.69% ls ld-2.29.so [k] open_verify.constprop.0 [k] lseek64 4228
7.69% ls ld-2.29.so [k] _dl_map_object [k] open_verify.constprop.0 55
7.69% ls ld-2.29.so [k] openaux [k] _dl_map_object 67
7.69% ls ld-2.29.so [k] _dl_map_object_deps [k] 0x00007f441b57c090 112
7.69% ls ld-2.29.so [.] call_init.part.0 [.] _init 334
7.69% ls ld-2.29.so [.] _dl_init [.] call_init.part.0 383
7.69% ls ld-2.29.so [k] _dl_sysdep_start [k] dl_main 45
7.69% ls ld-2.29.so [k] _dl_catch_exception [k] openaux 116
#
# (Tip: Show current config key-value pairs: perf config --list)
#
#
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/ccbd9583-82f4-dec5-7e84-64bf56e351fb@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Make perf report -D command print captured LBR callstack chain when it is
collected together with raw thread stack data:
2752673087247083 0x5d10 [0x548]: PERF_RECORD_SAMPLE(IP, 0x4002): 5841/5841: 0x40121f period: 1543862 addr: 0
... FP chain: nr:0
... branch callstack: nr:3
..... 0: 00000000004011d0
..... 1: 00007f393c388411
..... 2: 0000000000401098
... user regs: mask 0xff0fff ABI 64-bit
.... AX 0x34e7
.... BX 0x7fff5f6dd3c0
.... CX 0xffffffff
.... DX 0x34e6
.... SI 0x7f393c5268d0
.... DI 0x0
.... BP 0x401260
.... SP 0x7fff5f6dd3c0
.... IP 0x40121f
.... FLAGS 0x29f
.... CS 0x33
.... SS 0x2b
.... R8 0x7f393c526800
.... R9 0x7f393c525da0
.... R10 0xfffffffffffff70a
.... R11 0x246
.... R12 0x401070
.... R13 0x7fff5f6ddcb0
.... R14 0x0
.... R15 0x0
... ustack: size 1024, offset 0x130
. data_src: 0x5080021
... thread: stack_test:5841
...... dso: /root/abudanko/stacks/stack_test
Committer testing:
# perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.042 MB perf.data (10 samples) ]
#
Before:
# perf report -D |& grep PERF_RECORD_SAMPLE -A28 | tail -29
67538909824483 0xa7a0 [0x560]: PERF_RECORD_SAMPLE(IP, 0x4002): 9721/9721: 0x7f441b2b1e20 period: 1376095 addr: 0
... FP chain: nr:0
... user regs: mask 0xff0fff ABI 64-bit
.... AX 0x7f441b2b1000
.... BX 0x7f441b55b970
.... CX 0x7fff6e2db218
.... DX 0x7fff6e2db218
.... SI 0x7fff6e2db208
.... DI 0x1
.... BP 0x1
.... SP 0x7fff6e2db178
.... IP 0x7f441b2b1e20
.... FLAGS 0x20a
.... CS 0x33
.... SS 0x2b
.... R8 0x1
.... R9 0x7f441b371c18
.... R10 0x7f441b5a5f10
.... R11 0x202
.... R12 0x7fff6e2db208
.... R13 0x7fff6e2db218
.... R14 0x7f441b5a7150
.... R15 0x0
... ustack: size 1024, offset 0x148
. data_src: 0x5080021
... thread: ls:9721
...... dso: /usr/lib64/libpthread-2.29.so
0xad00 [0x60]: event: 10
#
After:
# perf report -D |& grep PERF_RECORD_SAMPLE -A31 | tail -32
67538909824483 0xa7a0 [0x560]: PERF_RECORD_SAMPLE(IP, 0x4002): 9721/9721: 0x7f441b2b1e20 period: 1376095 addr: 0
... FP chain: nr:0
... branch callstack: nr:4
..... 0: 00007f441b2b1e20
..... 1: 00007f441b58af1a
..... 2: 00007f441b58b0e1
..... 3: 00007f441b57c145
... user regs: mask 0xff0fff ABI 64-bit
.... AX 0x7f441b2b1000
.... BX 0x7f441b55b970
.... CX 0x7fff6e2db218
.... DX 0x7fff6e2db218
.... SI 0x7fff6e2db208
.... DI 0x1
.... BP 0x1
.... SP 0x7fff6e2db178
.... IP 0x7f441b2b1e20
.... FLAGS 0x20a
.... CS 0x33
.... SS 0x2b
.... R8 0x1
.... R9 0x7f441b371c18
.... R10 0x7f441b5a5f10
.... R11 0x202
.... R12 0x7fff6e2db208
.... R13 0x7fff6e2db218
.... R14 0x7f441b5a7150
.... R15 0x0
... ustack: size 1024, offset 0x148
. data_src: 0x5080021
... thread: ls:9721
...... dso: /usr/lib64/libpthread-2.29.so
#
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/aa82e5dd-def2-0ca8-a064-db9e2e8ad076@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Enable '-j stack' applicability together with '--call-graph dwarf'
option so thread stack data and LBR call stack could be captured
jointly:
$ perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test
Collected LBR call stack can be used to augment DWARF call stack
calculated from the raw thread stack data and to provide more
comprehensive call stack information for cases when collected SIZE is
not enough to cover complete thread stack.
Such cases are typical for workloads that allocate large arrays of data
on its threads stacks or the possible SIZE to collect can't be large
enough due to workload nature or system configuration and this is where
hardware captured LBR call stacks can provide missing stack frames.
Possible DWARF plus LBR call stacks consolidation algorithm description
follows.
With this patch set perf report command UI currently ignores collected
LBR call stack data and still provides DWARF based call stacks
information.
===========================================================================
Overview:
Legend:
THS - thread stack
CTX - thread register context
SWS - software stack
SSF - skipped stack frames
PSS - Perf sample stack
ip,sp,bp - HW registers values
d - allocated stack regions
kip - ip address in the kernel space
K - captured thread stack size
THS
-----
| |<-stack bottom
...
|---|
|ip4|
|---| PSS = SWS(THS(K))
| |
--> | |
| |d3 | user/
| |---| user PSS kernel PSS
| |ip3| ------ ------
| |---| |SSF | |SSF |
| | | .... ....
| | | ------ ------
| |d2 | | -1 | | -1 |
|---| user ------ ------
K |ip2| CTX |ip3 | |ip3 |
|---| |----| |----|
| |d1 | ... |ip2 | , |ip2 |
| |---| |---| |----| |----|
| |ip1| |bp0| |ip1 | |ip1 |
| |---| |---| |----| |----|
| | | |ip0|->|ip0 | |ip0 |<-user stack top
| | | |---| ------ ------
| | |<-|sp0|<-stack |kip0|<-kernel stack bottom
--> ----- ----- top |----|
|kip1|
|----|
|kip2|
|----|
....
| |<-kernel stack top
------
Algorithm details:
Legend:
HWS - hardware stack
K-SWS - kernel software stack
BRANCH
TABLE
HWS ip ip
from to
------ -----------
|ip7`| |ip7`| |
|----| |----|----|
|ip6`| |ip6`| |
user PSS |----| |----|----|
|ip5`| |ip5`| |
------ |----| |----|----|
| -1 | |ip4`| |ip4`| |
------ |----| |----|----|
|ip3 |~~~|ip3`| |ip3`| |
|----| |----| |----|----|
|ip2 |~~~|ip2`| |ip2`| |
|----| |----| |----|----|
|ip1 |~~~|ip1`| |ip1`|ip0`|
|----| |----| -----------
|ip0 |~~~|ip0`|<---------'
------ ------
1. if (sym(ipj) == sym(ipj`)), j=0-3 ===> user PSS
2. ipj` , j=4-7 ===> user PSS
Augmented PSS = A_SWS(SWS(THS(K)), HWS):
user/
user PSS kernel PSS
------ ------
|ip7`| |ip7`|<-user PSS bottom
|----| |----|
|ip6`| |ip6`|
|----| |----|
HWS |ip5`| |ip5`|
|----| |----|
|ip4`| |ip4`|
------ ------
|ip3 | |ip3 |
|----| |----|
SWS |ip2 | |ip2 |
|----| |----|
|ip1 | |ip1 |
|----| |----|
|ip0 | |ip0 |<-user PSS top
------ ------
|kip0|<-kernel PSS bottom
|----|
|kip1|
K-SWS |----|
|kip2|
|----|
|kip3|<-kernel PSS top
------
APSS
Committer testing:
Before:
# perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null
unknown branch filter stack, check man page
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-j, --branch-filter <branch filter mask>
branch stack filter modes
# perf record -g --call-graph dwarf,1024 -j u ls > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.054 MB perf.data (12 samples) ]
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|BRANCH_STACK|REGS_USER|STACK_USER|DATA_SRC, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY, sample_regs_user: 0xff0fff, sample_stack_user: 1024
#
After:
# perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (11 samples) ]
[root@quaco ~]# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|BRANCH_STACK|REGS_USER|STACK_USER|DATA_SRC, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK, sample_regs_user: 0xff0fff, sample_stack_user: 1024
#
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/e9e00090-66fb-d2a4-c90f-1d12344f7788@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'idx' member was added as preparation for AUX area sampling. Add a
comment to describe why.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/83ff264f-84c3-5372-8976-dd9293d20c6f@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
So that can update the copy of linux/bits.h that now uses macros defined
in const.h and that are not available in older systems.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-c2qfcbl58hxyfb5u5xivp7is@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The next cset will grap const.h copies from the kernel to keep bits.h
in sync as it started to use linux/const.h, that in turn includes
uapi/linux/const.h.
So now we have a file with the same name in tools/include and
tools/uapi/include, and one includes the other, we need to have
tools/include/uapi/ after tools/include/ for this to work, fix it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-qzjqxa1wdrt51kwadyqawnuj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If dwarf_callchain_users is false, then unwind__prepare_access() will
not set unwind_libunwind_ops so the remaining test here is sufficient.
Signed-off-by: John Keeping <john@metanate.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: john keeping <john@metanate.com>
Link: http://lkml.kernel.org/r/20190815100146.28842-3-john@metanate.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit e5adfc3e7e77 ("perf map: Synthesize maps only for thread group
leader") changed the recording side so that we no longer get mmap events
for threads other than the thread group leader (when synthesising these
events for threads which exist before perf is started).
When a file recorded after this change is loaded, the lack of mmap
records mean that unwinding is not set up for any other threads.
This can be seen in a simple record/report scenario:
perf record --call-graph=dwarf -t $TID
perf report
If $TID is a process ID then the report will show call graphs, but if
$TID is a secondary thread the output is as if --call-graph=none was
specified.
Following the rationale in that commit, move the libunwind fields into
struct map_groups and update the libunwind functions to take this
instead of the struct thread. This is only required for
unwind__finish_access which must now be called from map_groups__delete
and the others are changed for symmetry.
Note that unwind__get_entries keeps the thread argument since it is
required for symbol lookup and the libdw unwind provider uses the thread
ID.
Signed-off-by: John Keeping <john@metanate.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e5adfc3e7e77 ("perf map: Synthesize maps only for thread group leader")
Link: http://lkml.kernel.org/r/20190815100146.28842-2-john@metanate.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In the next commit we will add new fields to map_groups and we need
these to be null if no value is assigned. The simplest way to achieve
this is to request zeroed memory from the allocator.
Signed-off-by: John Keeping <john@metanate.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: john keeping <john@metanate.com>
Link: http://lkml.kernel.org/r/20190815100146.28842-1-john@metanate.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since 'perf top' shares the histogram browser with 'perf report', then
the same explanation in the previous cset applies.
An additional example uses a pair of SDT events available for systemtap:
# perf probe --exec=/usr/bin/stap '%*:*'
Added new events:
sdt_stap:benchmark__thread__start (on %* in /usr/bin/stap)
sdt_stap:benchmark (on %* in /usr/bin/stap)
sdt_stap:benchmark__thread__end (on %* in /usr/bin/stap)
sdt_stap:pass6__start (on %* in /usr/bin/stap)
sdt_stap:pass6__end (on %* in /usr/bin/stap)
sdt_stap:pass5__start (on %* in /usr/bin/stap)
sdt_stap:pass5__end (on %* in /usr/bin/stap)
sdt_stap:pass0__start (on %* in /usr/bin/stap)
sdt_stap:pass0__end (on %* in /usr/bin/stap)
sdt_stap:pass1a__start (on %* in /usr/bin/stap)
sdt_stap:pass1b__start (on %* in /usr/bin/stap)
sdt_stap:pass1__end (on %* in /usr/bin/stap)
sdt_stap:pass2__start (on %* in /usr/bin/stap)
sdt_stap:pass2__end (on %* in /usr/bin/stap)
sdt_stap:pass3__start (on %* in /usr/bin/stap)
sdt_stap:pass3__end (on %* in /usr/bin/stap)
sdt_stap:pass4__start (on %* in /usr/bin/stap)
sdt_stap:pass4__end (on %* in /usr/bin/stap)
sdt_stap:benchmark__start (on %* in /usr/bin/stap)
sdt_stap:benchmark__end (on %* in /usr/bin/stap)
sdt_stap:cache__get (on %* in /usr/bin/stap)
sdt_stap:cache__clean (on %* in /usr/bin/stap)
sdt_stap:cache__add__module (on %* in /usr/bin/stap)
sdt_stap:cache__add__source (on %* in /usr/bin/stap)
sdt_stap:stap_system__complete (on %* in /usr/bin/stap)
sdt_stap:stap_system__start (on %* in /usr/bin/stap)
sdt_stap:stap_system__spawn (on %* in /usr/bin/stap)
sdt_stap:stap_system__fork (on %* in /usr/bin/stap)
sdt_stap:intern_string (on %* in /usr/bin/stap)
sdt_stap:client__start (on %* in /usr/bin/stap)
sdt_stap:client__end (on %* in /usr/bin/stap)
You can now use it in all perf tools, such as:
perf record -e sdt_stap:client__end -aR sleep 1
#
From these we're use the two below to run systemtap's test suite:
# perf record -e sdt_stap:pass2__*,cycles:P make installcheck > /dev/null
^C[ perf record: Woken up 8 times to write data ]
[ perf record: Captured and wrote 2.691 MB perf.data (39638 samples) ]
Terminated
# perf script | grep sdt_stap
stap 28979 [000] 19424.302660: sdt_stap:pass2__start: (561b9a537de3) arg1=140730364262544
stap 28979 [000] 19424.333083: sdt_stap:pass2__end: (561b9a53a9e1) arg1=140730364262544
stap 29045 [006] 19424.933460: sdt_stap:pass2__start: (563edddcede3) arg1=140722674883152
stap 29045 [006] 19424.963794: sdt_stap:pass2__end: (563edddd19e1) arg1=140722674883152
# perf script | grep cycles | wc -l
39634
#
Looking at the whole perf.data file:
[root@quaco testsuite]# perf report | grep cycles:P -A25
# Samples: 39K of event 'cycles:P'
# Event count (approx.): 34044267368
#
# Overhead Command Shared Object Symbol
# ........ ....... .................... ................................
#
3.50% cc1 cc1 [.] ht_lookup_with_hash
3.04% cc1 cc1 [.] _cpp_lex_token
2.11% cc1 cc1 [.] ggc_internal_alloc
1.83% cc1 cc1 [.] cpp_get_token_with_location
1.68% cc1 libc-2.29.so [.] _int_malloc
1.41% cc1 cc1 [.] linemap_position_for_column
1.25% cc1 cc1 [.] ggc_internal_cleared_alloc
1.20% cc1 cc1 [.] c_lex_with_flags
1.18% cc1 cc1 [.] get_combined_adhoc_loc
1.05% cc1 libc-2.29.so [.] malloc
1.01% cc1 libc-2.29.so [.] _int_free
0.96% stap stap [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, stringtable_hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, true> > > >
0.78% stap stap [.] lexer::scan
0.74% cc1 cc1 [.] _cpp_lex_direct
0.70% cc1 cc1 [.] pop_scope
0.70% cc1 cc1 [.] c_parser_declspecs
0.69% stap libc-2.29.so [.] _int_malloc
0.68% cc1 cc1 [.] htab_find_slot
0.68% cc1 [kernel.vmlinux] [k] prepare_exit_to_usermode
0.64% cc1 [kernel.vmlinux] [k] clear_page_erms
[root@quaco testsuite]#
And now only what happens in slices demarcated by those start/end SDT
events:
[root@quaco testsuite]# perf report --switch-on=sdt_stap:pass2__start --switch-off=sdt_stap:pass2__end | grep cycles:P -A100
# Samples: 240 of event 'cycles:P'
# Event count (approx.): 206491934
#
# Overhead Command Shared Object Symbol
# ........ ....... ................... ................................................
#
38.99% stap stap [.] systemtap_session::register_library_aliases
19.47% stap stap [.] match_key::operator<
15.01% stap libc-2.29.so [.] __memcmp_avx2_movbe
5.19% stap libc-2.29.so [.] _int_malloc
2.50% stap libstdc++.so.6.0.26 [.] std::_Rb_tree_insert_and_rebalance
2.30% stap stap [.] match_node::build_no_more
2.07% stap libc-2.29.so [.] malloc
1.66% stap stap [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::find
1.66% stap stap [.] match_node::bind
1.58% stap [kernel.vmlinux] [k] prepare_exit_to_usermode
1.17% stap [kernel.vmlinux] [k] native_irq_return_iret
0.87% stap stap [.] 0x0000000000032ec4
0.77% stap libstdc++.so.6.0.26 [.] std::_Rb_tree_increment
0.47% stap stap [.] std::vector<derived_probe_builder*, std::allocator<derived_probe_builder*> >::_M_realloc_insert<derived_probe_builder* const&>
0.47% stap [kernel.vmlinux] [k] get_page_from_freelist
0.47% stap [kernel.vmlinux] [k] swapgs_restore_regs_and_return_to_usermode
0.47% stap [kernel.vmlinux] [k] do_user_addr_fault
0.46% stap [kernel.vmlinux] [k] __pagevec_lru_add_fn
0.46% stap stap [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::_M_emplace_unique<std::pair<match_key, match_node*> >
0.42% stap libstdc++.so.6.0.26 [.] 0x00000000000c18fa
0.40% stap [kernel.vmlinux] [k] interrupt_entry
0.40% stap [kernel.vmlinux] [k] update_load_avg
0.40% stap [kernel.vmlinux] [k] __intel_pmu_disable_all
0.40% stap [kernel.vmlinux] [k] clear_page_erms
0.39% stap [kernel.vmlinux] [k] __mod_node_page_state
0.39% stap [kernel.vmlinux] [k] error_entry
0.39% stap [kernel.vmlinux] [k] sync_regs
0.38% stap [kernel.vmlinux] [k] __handle_mm_fault
0.38% stap stap [.] derive_probes
#
# (Tip: System-wide collection from all CPUs: perf record -a)
#
[root@quaco testsuite]#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-408hvumcnyn93a0auihnawew@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Just like 'perf trace' and 'perf script', should be useful for instance
to only consider samples after the initialization phase of some
workload.
The man page has some examples and considerations about its current
interface, that still doesn't handle the on/off events in a special way,
behaving just like when multiple events are specified, i.e.:
- In non-group mode (when the event list is not enclosed in {}) show a
a menu to allow choosing which event the user wants to see in the
histograms browser
- In group mode, be it using {} or asking for --group, show one column
per event.
Try for instance:
# perf top -e '{cycles,instructions,probe:icmp_rcv}' --switch-on=probe:icmp_rcv
Replace probe:icmp_rcv, that I put in place using:
# perf probe icmp_rcv:59
To hit when broadcast packets arrive, with a probe installed after an
initialization phase is over or after some other point of interest, some
garbage collection, etc, and also use --switch-off, for instance, on a
probe installed after said garbage collection is over.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-c7q7qjeqtyvc9mkeipxza6ne@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Just like with 'perf script':
# perf trace -e sched:*,syscalls:*sleep* sleep 1
0.000 :28345/28345 sched:sched_waking:comm=perf pid=28346 prio=120 target_cpu=005
0.005 :28345/28345 sched:sched_wakeup:perf:28346 [120] success=1 CPU:005
0.383 sleep/28346 sched:sched_process_exec:filename=/usr/bin/sleep pid=28346 old_pid=28346
0.613 sleep/28346 sched:sched_stat_runtime:comm=sleep pid=28346 runtime=607375 [ns] vruntime=23289041218 [ns]
0.689 sleep/28346 syscalls:sys_enter_nanosleep:rqtp: 0x7ffc491789b0
0.693 sleep/28346 sched:sched_stat_runtime:comm=sleep pid=28346 runtime=72021 [ns] vruntime=23289113239 [ns]
0.694 sleep/28346 sched:sched_switch:sleep:28346 [120] S ==> swapper/5:0 [120]
1000.787 :0/0 sched:sched_waking:comm=sleep pid=28346 prio=120 target_cpu=005
1000.824 :0/0 sched:sched_wakeup:sleep:28346 [120] success=1 CPU:005
1000.908 sleep/28346 syscalls:sys_exit_nanosleep:0x0
1001.218 sleep/28346 sched:sched_process_exit:comm=sleep pid=28346 prio=120
# perf trace -e sched:*,syscalls:*sleep* --switch-on=syscalls:sys_enter_nanosleep sleep 1
0.000 sleep/28349 sched:sched_stat_runtime:comm=sleep pid=28349 runtime=603036 [ns] vruntime=23873537697 [ns]
0.001 sleep/28349 sched:sched_switch:sleep:28349 [120] S ==> swapper/4:0 [120]
1000.392 :0/0 sched:sched_waking:comm=sleep pid=28349 prio=120 target_cpu=004
1000.443 :0/0 sched:sched_wakeup:sleep:28349 [120] success=1 CPU:004
1000.540 sleep/28349 syscalls:sys_exit_nanosleep:0x0
1000.852 sleep/28349 sched:sched_process_exit:comm=sleep pid=28349 prio=120
# perf trace -e sched:*,syscalls:*sleep* --switch-on=syscalls:sys_enter_nanosleep --switch-off=syscalls:sys_exit_nanosleep sleep 1
0.000 sleep/28352 sched:sched_stat_runtime:comm=sleep pid=28352 runtime=610543 [ns] vruntime=24811686681 [ns]
0.001 sleep/28352 sched:sched_switch:sleep:28352 [120] S ==> swapper/0:0 [120]
1000.397 :0/0 sched:sched_waking:comm=sleep pid=28352 prio=120 target_cpu=000
1000.440 :0/0 sched:sched_wakeup:sleep:28352 [120] success=1 CPU:000
#
# perf trace -e sched:*,syscalls:*sleep* --switch-on=syscalls:sys_enter_nanosleep --switch-off=syscalls:sys_exit_nanosleep --show-on-off sleep 1
0.000 sleep/28367 syscalls:sys_enter_nanosleep:rqtp: 0x7fffd1a25fc0
0.004 sleep/28367 sched:sched_stat_runtime:comm=sleep pid=28367 runtime=628760 [ns] vruntime=22170052672 [ns]
0.005 sleep/28367 sched:sched_switch:sleep:28367 [120] S ==> swapper/2:0 [120]
1000.367 :0/0 sched:sched_waking:comm=sleep pid=28367 prio=120 target_cpu=002
1000.412 :0/0 sched:sched_wakeup:sleep:28367 [120] success=1 CPU:002
1000.512 sleep/28367 syscalls:sys_exit_nanosleep:0x0
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-t3ngpt1brcc1fm9gep9gxm4q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If the user specifies a on or off switch event and it isn't in the
perf.data file, provide a hint about how to see the events in the
perf.data evlist:
# perf script --switch-on=syscall:sys_enter_nanosleep --switch-off=syscalls:sys_exit_nanosleep
ERROR: event_on event not found (syscall:sys_enter_nanosleep)
HINT: use 'perf evlist' to see the available event names
#
# perf evlist
sched:sched_kthread_stop
sched:sched_kthread_stop_ret
sched:sched_waking
sched:sched_wakeup
sched:sched_wakeup_new
sched:sched_switch
sched:sched_migrate_task
sched:sched_process_free
sched:sched_process_exit
sched:sched_wait_task
sched:sched_process_wait
sched:sched_process_fork
sched:sched_process_exec
sched:sched_stat_wait
sched:sched_stat_sleep
sched:sched_stat_iowait
sched:sched_stat_blocked
sched:sched_stat_runtime
sched:sched_pi_setprio
sched:sched_move_numa
sched:sched_stick_numa
sched:sched_swap_numa
sched:sched_wake_idle_without_ipi
syscalls:sys_enter_clock_nanosleep
syscalls:sys_exit_clock_nanosleep
syscalls:sys_enter_nanosleep
syscalls:sys_exit_nanosleep
# Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
#
# perf script --switch-on=syscalls:sys_enter_nanosleep --switch-off=syscalls:sys_exit_nanosleep
sleep 20919 [001] 109866.144411: sched:sched_stat_runtime: comm=sleep pid=20919 runtime=521249 [ns] vruntime=202919398131 [ns]
sleep 20919 [001] 109866.144412: sched:sched_switch: sleep:20919 [120] S ==> swapper/1:0 [120]
swapper 0 [001] 109867.144568: sched:sched_waking: comm=sleep pid=20919 prio=120 target_cpu=001
swapper 0 [001] 109867.144586: sched:sched_wakeup: sleep:20919 [120] success=1 CPU:001
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-iijjvdlyad973oskdq8gmi5w@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Allows adding hints there, will be done in followup patch.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-1kvrdi7weuz3hxycwvarcu6v@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
command line
Another step in having all the boilerplate in just one place to then use
in the other tools.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-snreb1wmwyjei3eefwotxp1l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
All tools will want those, so provide a convenient way to get them.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-v16pe3sbf3wjmn152u18f649@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
So that we can have macros for the OPT_ entries and also for finding
those in an evlist, this way other tools will use this very easily.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-q0og1xoqqi0w38ve5u0a43k2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|