summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2016-09-28perf trace: Beautify sched_[gs]et_attr return valueArnaldo Carvalho de Melo
Both return errno, show the string associated then. More work needed to capture the sched_attr arg to beautify it in turn, probably using BPF. Before: 0.210 ( 0.001 ms): sched_setattr(uattr: 0x7ffc684f02b0) = -22 After the patch, for this sched_attr, all other parms are zero, so not shown: struct sched_attr attr = { .size = sizeof(attr), .sched_policy = SCHED_DEADLINE, .sched_runtime = 10 * USECS_PER_SEC, .sched_period = 30 * USECS_PER_SEC, .sched_deadline = attr.sched_period, }; 0.321 ( 0.002 ms): sched_setattr(uattr: 0x7ffc44116da0) = -1 EINVAL Invalid argument [root@jouet c]# perf trace -e sched_setattr ./sched_deadline Couldn't negotiate deadline: Invalid argument 0.229 ( 0.003 ms): sched_setattr(uattr: 0x7ffd8dcd8df0) = -1 EINVAL Invalid argument [root@jouet c]# Now to figure out the reason for this EINVAL. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-tyot2n7e48zm8pdw8tbcm3sl@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-28perf data: Fix building in 32 bit platform with libbabeltraceWang Nan
On ARM32 building it report following error when we build with libbabeltrace: util/data-convert-bt.c: In function 'add_bpf_output_values': util/data-convert-bt.c:440:3: error: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Werror=format] cc1: all warnings being treated as errors Fix it by changing %lu to %zu. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Fixes: 6122d57e9f7c ("perf data: Support converting data from bpf_perf_event_output()") Link: http://lkml.kernel.org/r/1475035126-146587-1-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-28perf tools: Fix MMAP event synthesis broken by MAP_HUGETLB changeAdrian Hunter
Patch "perf record: Mark MAP_HUGETLB when synthesizing mmap events") breaks MMAP event synthesis. The executable name comparison will match any name if the length is zero, resulting in all the user space maps becoming anonymous. This is particularly noticeable with system-wide traces. Example: perf record -a sleep 1 perf script --show-mmap-events Committer note: That is not the case when, say, one has a qemu instance and libvirt actually mounts hugetlbfs. To test this I had to first umount it: [root@jouet ~]# mount | grep hugetlbfs hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) [root@jouet ~]# After unmount it the error fixed by this patch manifests itself: # perf record -a sleep 1 # perf script --show-mmap-events | grep PERF_RECORD_MMAP2 | head -5 systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x557d47ed8000(0x167000) @ 0 fd:00 3146896 7362875424355726126]: r-xp //anon systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x7f96c488d000(0x4000) @ 0 fd:00 3153214 7362875424355726126]: r-xp //anon systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x7f96c4a92000(0x3d000) @ 0 fd:00 3159276 7362875424355726126]: r-xp //anon systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x7f96c4cd5000(0x15000) @ 0 fd:00 3153725 7362875424355726126]: r-xp //anon systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x7f96c4eeb000(0x25000) @ 0 fd:00 3153260 7362875424355726126]: r-xp //anon # Fixed version: # perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.419 MB perf.data (182 samples) ] # perf script --show-mmap-events | grep PERF_RECORD_MMAP2 | head -5 systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x557d47ed8000(0x167000) @ 0 fd:00 3146896 7362875424355726126]: r-xp /usr/lib/systemd/systemd systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x7f96c488d000(0x4000) @ 0 fd:00 3153214 7362875424355726126]: r-xp /usr/lib64/libuuid.so.1.3.0 systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x7f96c4a92000(0x3d000) @ 0 fd:00 3159276 7362875424355726126]: r-xp /usr/lib64/libblkid.so.1.1.0 systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x7f96c4cd5000(0x15000) @ 0 fd:00 3153725 7362875424355726126]: r-xp /usr/lib64/libz.so.1.2.8 systemd 0 [000] 0.000000: PERF_RECORD_MMAP2 1/1: [0x7f96c4eeb000(0x25000) @ 0 fd:00 3153260 7362875424355726126]: r-xp /usr/lib64/liblzma.so.5.2.2 [root@jouet ~]# Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Link: http://lkml.kernel.org/r/1474641528-18776-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-27perf record: Fix documentation 'event_sources' -> 'event_source'Adrian Hunter
Change '/sys/bus/event_sources' to the correct path which is '/sys/bus/event_source'. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Link: http://lkml.kernel.org/r/1474641528-18776-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf hists: Make hists__fprintf_headers function globalJiri Olsa
Will be used from external places in the upcoming c2c patch series. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474558645-19956-10-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf hists: Make __hist_entry__snprintf function globalJiri Olsa
Will be used from external places in the upcoming c2c patch series. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474558645-19956-9-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf tools: Make several display functions globalJiri Olsa
Will be used from external places in the upcoming c2c patch series. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474558645-19956-8-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf tools: Make several sorting functions globalJiri Olsa
Will be used from external places in the upcoming c2c patch series. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474558645-19956-7-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf tools: Make output_field_add and sort_dimension__add globalJiri Olsa
Will be used from external places in the upcoming c2c patch series. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474558645-19956-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf tools: Make reset_dimensions globalJiri Olsa
Will be used from external places in the upcoming c2c patch series. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474558645-19956-5-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf hists: Add __hist_entry__snprintf functionJiri Olsa
Add __hist_entry__snprintf() to take a perf_hpp_list as an argument instead of using he->hists->hpp_list. This way we can display arbitrary list of entries regardless of the hists setup, which will be useful in the upcoming c2c patch series. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474558645-19956-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf tools: Add sink configuration for cs_etm PMUMathieu Poirier
Using the PMU::set_drv_config() callback to enable the CoreSight sink that will be used for the trace session. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1474041004-13956-8-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf tools: Add PMU configuration to toolsMathieu Poirier
Now that the required mechanic is there to deal with PMU specific configuration, add the functionality to the tools where events can be selected. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1474041004-13956-7-git-send-email-mathieu.poirier@linaro.org [ Fix the build on XSI-compliant systems, using str_error_r() to make sure we return a string, not an integer ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf pmu: Push configuration down to PMU driverMathieu Poirier
This patch adds a PMU callback and the required mechanic so that drivers can process the command line configuration elements found in evsel::config_terms. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1474041004-13956-6-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf tools: Add coresight etm PMU record capabilitiesMathieu Poirier
Coresight ETMs are IP blocks used to perform HW assisted tracing on a CPU core. This patch introduce the required auxiliary API functions allowing the perf core to interact with a tracer. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1474041004-13956-4-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf tools: Make coresight PMU listableMathieu Poirier
Adding the required mechanic allowing 'perf list pmu' to discover coresight ETM/PTM tracers. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1474041004-13956-3-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-22perf tools: Confine __get_cpuid() to x86 architectureMathieu Poirier
The __get_cpuid() test is only valid when compiling for x86. When compiling for other architectures like ARM/ARM64 the test fails event if the functionality is not required. This patch isolate the build-in feature check to x86 platform, allowing the compilation and usage of PMUs that use the AUXTRACE infrastructure on other architectures (i.e ARM CoreSight). Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1474041004-13956-2-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-21perf hists: Use bigger buffer for stdio headersJiri Olsa
With node column on big CPUs servers we can run out of stdio header space quite soon. Enlarging header buffer. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474290610-23241-5-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-21perf evsel: Remove superfluous initialization of weightJiri Olsa
Removing superfluous initialization of weight, it's already set to 0 via memset. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474290610-23241-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-20perf symbols: Do not open device filesJiri Olsa
The dso__read_binary_type_filename gets the dso's file name to open. We need to check it for regular file before trying to open it, otherwise we might get stuck with device file. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160920161245.GA8995@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-20perf hists: Factor out hists__reset_column_width()Namhyung Kim
The stdio and tui has same code to reset hpp format column width. Factor it out as a new function. Suggested-and-Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20160920053025.13989-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-20perf ui/tui: Reset output width for hierarchyNamhyung Kim
When --hierarchy option is used, each entry has its own hpp_list to show the result. But it missed to update width of each column. Before: - 46.29% 48.12% netctl-auto + 31.44% 29.25% [kernel.vmlinux] + 8.52% 11.55% libc-2.22.so + 5.19% 6.91% bash + 10.75% 11.83% wpa_cli + 8.25% 2.23% swapper + 6.45% 5.40% tr + 4.81% 8.09% awk + 4.15% 2.85% firefox + 3.86% 2.53% sh After: - 46.29% 48.12% netctl-auto + 31.44% 29.25% [kernel.vmlinux] + 8.52% 11.55% libc-2.22.so + 5.19% 6.91% bash + 10.75% 11.83% wpa_cli + 8.25% 2.23% swapper + 6.45% 5.40% tr + 4.81% 8.09% awk + 4.15% 2.85% firefox + 3.86% 2.53% sh Committer note: Full testing instructions: 1) Record with an event group: $ perf record -e '{cycles,instructions}' make -j4 2) Use report in hierarchy mode, to get a few expanded trees on the same screen, use --percent-limit: $ perf report --hierarchy --percent-limit 0.5 Samples: 103K of event 'anon group { cycles:u, instructions:u }', Event count (approx.): 57317631725 Overhead Command / Shared Object / Symbol ◆ - 58.89% 55.12% cc1 ▒ - 50.26% 48.10% cc1 ▒ 3.61% 5.13% [.] _cpp_lex_token ▒ 2.58% 0.78% [.] ht_lookup_with_hash ▒ 1.31% 1.30% [.] ggc_internal_alloc ▒ 1.08% 2.25% [.] get_combined_adhoc_loc ▒ 1.01% 1.95% [.] ira_init ▒ 0.96% 1.78% [.] linemap_position_for_column ▒ 0.65% 1.01% [.] cpp_get_token_with_location ▒ - 7.52% 6.58% libc-2.23.so ▒ 1.70% 1.78% [.] _int_malloc ▒ 0.69% 0.75% [.] _int_free ▒ 0.67% 0.42% [.] malloc_consolidate ▒ - 0.58% 0.42% ld-2.23.so ▒ no entry >= 0.50% ▒ - 0.52% 0.03% [kernel.vmlinux] ▒ no entry >= 0.50% ▒ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 1b2dbbf41a0f ("perf hists: Use own hpp_list for hierarchy mode") Link: http://lkml.kernel.org/r/20160920053025.13989-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-20perf annotate: Resolve 'call' operands to function namesArnaldo Carvalho de Melo
Before this patch the '_raw_spin_lock_irqsave' and 'update_rq_clock' operands were appearing just as hexadecimal numbers: update_blocked_averages /proc/kcore │ push %r12 │ push %rbx │ and $0xfffffffffffffff0,%rsp │ sub $0x40,%rsp │ add -0x662cac00(,%rdi,8),%rax │ mov %rax,%rbx │ mov %rax,%rdi │ mov %rax,0x38(%rsp) │ → callq _raw_spin_lock_irqsave │ mov %rbx,%rdi │ mov %rax,0x30(%rsp) │ → callq update_rq_clock │ mov 0x8d0(%rbx),%rax │ lea 0x8d0(%rbx),%r11 To check that all is right one can always use the 'o' hotkey and see the original objdump -dS output, that for this case is: update_blocked_averages /proc/kcore │ffffffff990d5489: push %r12 │ffffffff990d548b: push %rbx │ffffffff990d548c: and $0xfffffffffffffff0,%rsp │ffffffff990d5490: sub $0x40,%rsp │ffffffff990d5494: add -0x662cac00(,%rdi,8),%rax │ffffffff990d549c: mov %rax,%rbx │ffffffff990d549f: mov %rax,%rdi │ffffffff990d54a2: mov %rax,0x38(%rsp) │ffffffff990d54a7: → callq 0xffffffff997eb7a0 │ffffffff990d54ac: mov %rbx,%rdi │ffffffff990d54af: mov %rax,0x30(%rsp) │ffffffff990d54b4: → callq 0xffffffff990c7720 │ffffffff990d54b9: mov 0x8d0(%rbx),%rax │ffffffff990d54c0: lea 0x8d0(%rbx),%r11 Use the 'h' hotkey to see a list of available hotkeys. More work needed to cover operands for other instructions, such as 'mov', that can resolve variable names, etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-xqgtw9mzmzcjgwkis9kiiv1p@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-20perf annotate: Pass the symbol's map/dso to the instruction parsersArnaldo Carvalho de Melo
So that things like: → callq 0xffffffff993e3230 found while disassembling /proc/kcore can be beautified by later patches, that will resolve that address to a function, looking it up in /proc/kallsyms. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-p76myuke4j7gplg54amaklxk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-20perf annotate: Do not ignore call instruction with indirect targetRavi Bangoria
Do not ignore call instruction with indirect target when its already identified as a call. This is an extension of commit e8ea1561952b ("perf annotate: Use raw form for register indirect call instructions") to generalize annotation for all instructions with indirect calls. This is needed for certain powerpc call instructions that use address in a register (such as bctrl, btarl, ...). Apart from that, when kcore is used to disassemble function, all call instructions were ignored. This patch will fix it as a side effect by not ignoring them. For example, Before (with kcore): mov %r13,%rdi callq 0xffffffff811a7e70 ^ jmpq 64 mov %gs:0x7ef41a6e(%rip),%al After (with kcore): mov %r13,%rdi > callq 0xffffffff811a7e70 ^ jmpq 64 mov %gs:0x7ef41a6e(%rip),%al Suggested-by: Michael Ellerman <mpe@ellerman.id.au> [Suggested about 'bctrl' instruction] Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/1471611578-11255-5-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-20perf hists: Fix width computation for srcline sort entryJiri Olsa
Adding header size to width computation for srcline sort entry, because it's possible to get empty data with ':0' which set width of 2 which is lower than width needed to display column header. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474290610-23241-62-git-send-email-jolsa@kernel.org [ Added declaration to sort.h ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-19perf trace beauty mmap: Add missing MADV_FREEWang Nan
tools/perf/trace/beauty/mmap.c forgets to check MADV_FREE. This patch fixes it. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1473850649-83389-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf tools: Add infrastructure for PMU specific configurationMathieu Poirier
This patch adds PMU driver specific configuration to the parser infrastructure by preceding any term with the '@' letter. As such doing something like: perf record -e some_event/@cfg1,@cfg2=config/ ... will see 'cfg1' and 'cfg2=config' being added to the list of evsel config terms. Token 'cfg1' and 'cfg2=config' are not processed in user space and are meant to be interpreted by the PMU driver. First the lexer/parser are supplemented with the required definitions to recognise the driver specific configuration. From there they are simply added to the list of event terms. The bulk of the work is done in function "parse_events_add_pmu()" where driver config event terms are added to a new list of driver config terms, which in turn spliced with the event's new driver configuration list. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1473179837-3293-4-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf report: Enable group view with hierarchyNamhyung Kim
Now that all the missing pieces are implemented, let's enable it. An example output below: $ perf record -e '{cycles,instructions}' make $ perf report --hierarchy --stdio ... # Overhead Command / Shared Object / Symbol # ...................... .................................. # ... 25.74% 27.18% sh 19.96% 24.14% libc-2.24.so 9.55% 14.64% [.] __strcmp_sse2 1.54% 0.00% [.] __tfind 1.07% 1.13% [.] _int_malloc 0.95% 0.00% [.] __strchr_sse2 0.89% 1.39% [.] __tsearch 0.76% 0.00% [.] strlen ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Requested-by: Andi Kleen <andi@firstfloor.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160913074552.13284-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf ui/stdio: Rename print_hierarchy_header()Namhyung Kim
Now the hists__fprintf_hierarchy_headers() is a simple wrapper passing field separator. Let's do it directly. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160913074552.13284-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf ui/stdio: Always reset output width for hierarchyNamhyung Kim
When the --hierarchy option is used, each entry has its own hpp_list to show the result. But it is not updating the width of each column for perf-top. The perf-report command has no problem since it resets it during header display. $ sudo perf top --hierarchy --stdio PerfTop: 160 irqs/sec kernel:38.8% exact: 100.0% [4000Hz cycles:pp], (all, 12 CPUs) ---------------------------------------------------------------------- 52.32% perf 24.74% [.] __symbols__insert 5.62% [.] rb_next 5.14% [.] dso__load_sym Move the code into hists__fprintf() so that it can be called always. Also it'd be better to put similar code together. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 1b2dbbf41a0f ("perf hists: Use own hpp_list for hierarchy mode") Link: http://lkml.kernel.org/r/20160913074552.13284-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf hist: Initialize hierarchy tree explicitlyNamhyung Kim
The hroot_in and hroot_out are roots of hierarchy trees of hist entries. But when a hist entry is initialized by copying existing template entry, it sometimes has non-empty tree and copies it incorrectly. This is a problem especially when an event group is used since it creates dummy entries from already-processed entries in other event members. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160913074552.13284-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf hists: Introduce hists__link_hierarchy()Namhyung Kim
The hists__link_hierarchy() is to support hierarchy reports with an event group. When it matches the leader event and the other members (using hists__match_hierarchy()), it also needs to link unmatched member entries with a dummy leader event so that it can show up in the output. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160913074552.13284-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf hists: Introduce hists__match_hierarchy()Namhyung Kim
The hists__match_hierarchy() is to find matching hist entries in a group. A matching entry has the same values for all sort keys given. With an event group (e.g.: -e "{cycles,instructions}"), a leader event should show other members in a group. So each entry in the leader should be able to find its pair entries which have same values. With hierarchy mode, it needs to search all matching children in a hierarchy. An example output looks like: # Overhead Command / Shared Object / Symbol # ...................... .................................. # 25.74% 27.18% sh 19.96% 24.14% libc-2.24.so 9.55% 14.64% [.] __strcmp_sse2 1.54% 0.00% [.] __tfind 1.07% 1.13% [.] _int_malloc ... In the above example, two overheads are shown - one for the leader and another for the other group member. They were matched since their command, dso and symbol have the same values. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160913074552.13284-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf build: Compare mman.h related headers against kernel originalsWang Nan
As with other cloned headers, compare the newly introduced mman related headers against their source copy in kernel tree. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1473684871-209320-4-git-send-email-wangnan0@huawei.com [ Added -I to ignore the uapi/ difference ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf tools: Do hugetlb handling in more systemsArnaldo Carvalho de Melo
The csets: 0ac3348e5024 ("perf tools: Recognize hugetlb mapping as anon mapping") d7e404af115b ("perf record: Mark MAP_HUGETLB when synthesizing mmap events") Added code conditional on MAP_HUGETLB, to make it build in older systems where that define wasn't available. Now that we grabbed copies of uapi/linux/mmap.h to have all those definitions in tools/, use it so that we can support building the tools for older systems (without the MAP_HUGETLB define in its libc headers) using new kernels that support such maps. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/n/tip-wv6oqbfkpxbix4umj2kcfmaz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf trace beauty mmap: Fix defines for non !x86_64Arnaldo Carvalho de Melo
Several defines have different values in different arches, so we can't just define it to the x86_64 value, use uapi/linux/mmap.h that was recently introduced to reliably find those, not using possibly outdated libc headers. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/n/tip-4eajp5yp8i2fuw44n7jmcg5t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13tools include: Add uapi mman.h for each architectureWang Nan
Some mmap related macros have different values for different architectures. This patch introduces uapi mman.h for each architectures. Three headers are cloned from kernel include to tools/include: tools/include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman.h tools/include/uapi/linux/mman.h The main part of this patch is generated by following script: macros=`cat $0 | awk 'V==1 {print}; /^# start macro list/ {V=1}'` for arch in `ls tools/arch` do [ -d tools/arch/$arch/include/uapi/asm ] || mkdir -p tools/arch/$arch/include/uapi/asm src=arch/$arch/include/uapi/asm/mman.h target=tools/arch/$arch/include/uapi/asm/mman.h guard="TOOLS_ARCH_"`echo $arch | awk '{print toupper($0)}'`_UAPI_ASM_MMAN_FIX_H echo '#ifndef '$guard > $target echo '#define '$guard >> $target [ -f $src ] && for m in $macros do if grep '#define[ \t]*'$m $src > /dev/null 2>&1 then grep -h '#define[ \t]*'$m $src | sed 's/[ \t]*\/\*.*$//g' >> $target fi done if [ -f $src ] then grep '#include <asm-generic' $src >> $target else echo "#include <asm-generic/mman.h>" >> $target fi echo '#endif' >> $target echo "$target" done exit 0 # Following macros are extracted from: # tools/perf/trace/beauty/mmap.c # # start macro list MADV_DODUMP MADV_DOFORK MADV_DONTDUMP MADV_DONTFORK MADV_DONTNEED MADV_HUGEPAGE MADV_HWPOISON MADV_MERGEABLE MADV_NOHUGEPAGE MADV_NORMAL MADV_RANDOM MADV_REMOVE MADV_SEQUENTIAL MADV_SOFT_OFFLINE MADV_UNMERGEABLE MADV_WILLNEED MAP_32BIT MAP_ANONYMOUS MAP_DENYWRITE MAP_EXECUTABLE MAP_FILE MAP_FIXED MAP_GROWSDOWN MAP_HUGETLB MAP_LOCKED MAP_NONBLOCK MAP_NORESERVE MAP_POPULATE MAP_PRIVATE MAP_SHARED MAP_STACK MAP_UNINITIALIZED MREMAP_FIXED MREMAP_MAYMOVE PROT_EXEC PROT_GROWSDOWN PROT_GROWSUP PROT_NONE PROT_READ PROT_SEM PROT_WRITE Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1473684871-209320-2-git-send-email-wangnan0@huawei.com [ Added new files to tools/perf/MANIFEST to fix the detached tarball build, add mman.h for ARC ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-12perf hists browser: Fix event group displayNamhyung Kim
Milian reported that the event group on TUI shows duplicated overhead. This was due to a bug on calculating hpp->buf position. The hpp_advance() was called from __hpp__slsmg_color_printf() on TUI but it's already called from the hpp__call_print_fn macro in __hpp__fmt(). The end result is that the print function returns number of bytes it printed but the buffer advanced twice of the length. This is generally not a problem since it doesn't need to access the buffer again. But with event group, overhead needs to be printed multiple times and hist_entry__snprintf_alignment() tries to fill the space with buffer after it printed. So it (brokenly) showed the last overhead again. The bug was there from the beginning, but I think it's only revealed when the alignment function was added. Reported-by: Milian Wolff <milian.wolff@kdab.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 89fee7094323 ("perf hists: Do column alignment on the format iterator") Link: http://lkml.kernel.org/r/20160912061958.16656-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-12perf probe: Fix dwarf regs table for x86_64Arnaldo Carvalho de Melo
In 293d5b439483 ("perf probe: Support probing on offline cross-arch binary") DWARF register tables were introduced for many architectures, with the one for the "dx" register being broken for x86_64, which got noticed by the 'perf test bpf' testcase, that has this difference from a successful run to one that fails, with the aforementioned patch: -Writing event: p:perf_bpf_probe/func _text+5197232 f_mode=+68(%di):x32 offset=%si:s64 orig=dx:s32 -Failed to write event: Invalid argument -bpf_probe: failed to apply perf probe eventsFailed to add events selected by BPF +Writing event: p:perf_bpf_probe/func _text+5197232 f_mode=+68(%di):x32 offset=%si:s64 orig=%dx:s32 Add the missing '%' to '%dx' to fix this. Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 293d5b439483 ("perf probe: Support probing on offline cross-arch binary") Link: https://lkml.kernel.org/r/20160909145955.GC32585@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-08perf powerpc: Fix build-test failureRavi Bangoria
'make -C tools/perf build-test' is failing with below log for poewrpc. In file included from /tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf.h:15:0, from util/cpumap.h:8, from util/env.c:1: /tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf-sys.h:23:56: fatal error: ../../arch/powerpc/include/uapi/asm/unistd.h: No such file or directory compilation terminated. I bisected it and found it's failing from commit ad430729ae00 ("Remove: kernel unistd*h files from perf's MANIFEST, not used"). Header file '../../arch/powerpc/include/uapi/asm/unistd.h' is included only for powerpc in tools/perf/perf-sys.h. By looking closly at commit history, I found little weird thing: Commit f2d9cae9ea9e ("perf powerpc: Use uapi/unistd.h to fix build error") replaced 'asm/unistd.h' with 'uapi/asm/unistd.h' Commit d2709c7ce4c5 ("perf: Make perf build for x86 with UAPI disintegration applied") removes all arch specific 'uapi/asm/unistd.h' for all archs and adds generic <asm/unistd.h>. Commit f0b9abfb0446 ("Merge branch 'linus' into perf/core") again includes 'uapi/asm/unistd.h' for powerpc. Don't know how exactly this happened as this change is not part of commit also. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1472630591-5089-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Fixes: ad430729ae00 ("Remove: kernel unistd*h files from perf's MANIFEST, not used") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-08perf pmu: Support alternative sysfs cpumaskMark Rutland
The perf tools can read a cpumask file for a PMU, describing a subset of CPUs which that PMU covers. So far this has only been used to cater for uncore PMUs, which in practice happen to only have a single CPU described in the mask. Until recently, the perf tools only correctly handled cpumask containing a single CPU, and only when monitoring in system-wide mode. For example, prior to commit 00e727bb389359c8 ("perf stat: Balance opening and reading events"), a mask with more than a single CPU could cause perf stat to hang. When a CPU PMU covers a subset of CPUs, but lacks a cpumask, perf record will fail to open events (on the cores the PMU does not support), and gives up. For systems with heterogeneous CPUs such as ARM big.LITTLE systems, this presents a problem. We have a PMU for each microarchitecture (e.g. a big PMU and a little PMU), and would like to expose a cpumask for each (so as to allow perf record and other tools to do the right thing). However, doing so kernel-side will cause old perf binaries to not function (e.g. hitting the issue solved by 00e727bb389359c8), and thus commits the cardinal sin of breaking (existing) userspace. To address this chicken-and-egg problem, this patch adds support got a new file, cpus, which is largely identical to the existing cpumask file. A kernel can expose this file, knowing that new perf binaries will correctly support it, while old perf binaries will not look for it (and thus will not be broken). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1473330112-28528-8-git-send-email-mark.rutland@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-08perf evlist: Only open events on CPUs an evsel permitsMark Rutland
In systems with heterogeneous CPU PMUs, it's possible for each evsel to cover a distinct set of CPUs, and hence the cpu_map associated with each evsel may have a distinct idx<->id mapping. Any of these may be distinct from the evlist's cpu map. Events can be tied to the same fd so long as they use the same per-cpu ringbuffer (i.e. so long as they are on the same CPU). To acquire the correct FDs, we must compare the Linux logical IDs rather than the evsel or evlist indices. This path adds logic to perf_evlist__mmap_per_evsel to handle this, translating IDs as required. As PMUs may cover a subset of CPUs from the evlist, we skip the CPUs a PMU cannot handle. Without this patch, perf record may try to mmap erroneous FDs on heterogeneous systems, and will bail out early rather than running the workload. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1473330112-28528-7-git-send-email-mark.rutland@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-08perf annotate: Add branch stack / basic blockPeter Zijlstra
I wanted to know the hottest path through a function and figured the branch-stack (LBR) information should be able to help out with that. The below uses the branch-stack to create basic blocks and generate statistics from them. from to branch_i * ----> * | | block v * ----> * from to branch_i+1 The blocks are broken down into non-overlapping ranges, while tracking if the start of each range is an entry point and/or the end of a range is a branch. Each block iterates all ranges it covers (while splitting where required to exactly match the block) and increments the 'coverage' count. For the range including the branch we increment the taken counter, as well as the pred counter if flags.predicted. Using these number we can find if an instruction: - had coverage; given by: br->coverage / br->sym->max_coverage This metric ensures each symbol has a 100% spot, which reflects the observation that each symbol must have a most covered/hottest block. - is a branch target: br->is_target && br->start == add - for targets, how much of a branch's coverages comes from it: target->entry / branch->coverage - is a branch: br->is_branch && br->end == addr - for branches, how often it was taken: br->taken / br->coverage after all, all execution that didn't take the branch would have incremented the coverage and continued onward to a later branch. - for branches, how often it was predicted: br->pred / br->taken The coverage percentage is used to color the address and asm sections; for low (<1%) coverage we use NORMAL (uncolored), indicating that these instructions are not 'important'. For high coverage (>75%) we color the address RED. For each branch, we add an asm comment after the instruction with information on how often it was taken and predicted. Output looks like (sans color, which does loose a lot of the information :/) $ perf record --branch-filter u,any -e cycles:p ./branches 27 $ perf annotate branches Percent | Source code & Disassembly of branches for cycles:pu (217 samples) --------------------------------------------------------------------------------- : branches(): 0.00 : 40057a: push %rbp 0.00 : 40057b: mov %rsp,%rbp 0.00 : 40057e: sub $0x20,%rsp 0.00 : 400582: mov %rdi,-0x18(%rbp) 0.00 : 400586: mov %rsi,-0x20(%rbp) 0.00 : 40058a: mov -0x18(%rbp),%rax 0.00 : 40058e: mov %rax,-0x10(%rbp) 0.00 : 400592: movq $0x0,-0x8(%rbp) 0.00 : 40059a: jmpq 400656 <branches+0xdc> 1.84 : 40059f: mov -0x10(%rbp),%rax # +100.00% 3.23 : 4005a3: and $0x1,%eax 1.84 : 4005a6: test %rax,%rax 0.00 : 4005a9: je 4005bf <branches+0x45> # -54.50% (p:42.00%) 0.46 : 4005ab: mov 0x200bbe(%rip),%rax # 601170 <acc> 12.90 : 4005b2: add $0x1,%rax 2.30 : 4005b6: mov %rax,0x200bb3(%rip) # 601170 <acc> 0.46 : 4005bd: jmp 4005d1 <branches+0x57> # -100.00% (p:100.00%) 0.92 : 4005bf: mov 0x200baa(%rip),%rax # 601170 <acc> # +49.54% 13.82 : 4005c6: sub $0x1,%rax 0.46 : 4005ca: mov %rax,0x200b9f(%rip) # 601170 <acc> 2.30 : 4005d1: mov -0x10(%rbp),%rax # +50.46% 0.46 : 4005d5: mov %rax,%rdi 0.46 : 4005d8: callq 400526 <lfsr> # -100.00% (p:100.00%) 0.00 : 4005dd: mov %rax,-0x10(%rbp) # +100.00% 0.92 : 4005e1: mov -0x18(%rbp),%rax 0.00 : 4005e5: and $0x1,%eax 0.00 : 4005e8: test %rax,%rax 0.00 : 4005eb: je 4005ff <branches+0x85> # -100.00% (p:100.00%) 0.00 : 4005ed: mov 0x200b7c(%rip),%rax # 601170 <acc> 0.00 : 4005f4: shr $0x2,%rax 0.00 : 4005f8: mov %rax,0x200b71(%rip) # 601170 <acc> 0.00 : 4005ff: mov -0x10(%rbp),%rax # +100.00% 7.37 : 400603: and $0x1,%eax 3.69 : 400606: test %rax,%rax 0.00 : 400609: jne 400612 <branches+0x98> # -59.25% (p:42.99%) 1.84 : 40060b: mov $0x1,%eax 14.29 : 400610: jmp 400617 <branches+0x9d> # -100.00% (p:100.00%) 1.38 : 400612: mov $0x0,%eax # +57.65% 10.14 : 400617: test %al,%al # +42.35% 0.00 : 400619: je 40062f <branches+0xb5> # -57.65% (p:100.00%) 0.46 : 40061b: mov 0x200b4e(%rip),%rax # 601170 <acc> 2.76 : 400622: sub $0x1,%rax 0.00 : 400626: mov %rax,0x200b43(%rip) # 601170 <acc> 0.46 : 40062d: jmp 400641 <branches+0xc7> # -100.00% (p:100.00%) 0.92 : 40062f: mov 0x200b3a(%rip),%rax # 601170 <acc> # +56.13% 2.30 : 400636: add $0x1,%rax 0.92 : 40063a: mov %rax,0x200b2f(%rip) # 601170 <acc> 0.92 : 400641: mov -0x10(%rbp),%rax # +43.87% 2.30 : 400645: mov %rax,%rdi 0.00 : 400648: callq 400526 <lfsr> # -100.00% (p:100.00%) 0.00 : 40064d: mov %rax,-0x10(%rbp) # +100.00% 1.84 : 400651: addq $0x1,-0x8(%rbp) 0.92 : 400656: mov -0x8(%rbp),%rax 5.07 : 40065a: cmp -0x20(%rbp),%rax 0.00 : 40065e: jb 40059f <branches+0x25> # -100.00% (p:100.00%) 0.00 : 400664: nop 0.00 : 400665: leaveq 0.00 : 400666: retq (Note: the --branch-filter u,any was used to avoid spurious target and branch points due to interrupts/faults, they show up as very small -/+ annotations on 'weird' locations) Committer note: Please take a look at: http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png To see the colors. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> [ Moved sym->max_coverage to 'struct annotate', aka symbol__annotate(sym) ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-08perf record: Mark MAP_HUGETLB when synthesizing mmap eventsWang Nan
When synthesizing mmap events, add MAP_HUGETLB map flag if the source of mapping is file in hugetlbfs. After this patch, perf can identify hugetlb mapping even if perf is started after the mapping of huge pages (like with 'perf top'). Signed-off-by: Wang Nan <wangnan0@huawei.com> Reviewed-by: Nilay Vaish <nilayvaish@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Hou Pengyang <houpengyang@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1473137909-142064-4-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-08perf tools: Recognize hugetlb mapping as anon mappingWang Nan
Hugetlbfs mapping should be recognized as anon mapping so user has a chance to create /tmp/perf-<pid>.map file for symbol resolving. This patch utilizes MAP_HUGETLB to identify hugetlb mapping. After this patch, if perf is started before a program starts using huge pages (so perf gets MMAP2 events from kernel), perf is able to recognize hugetlb mapping as anon mapping. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: He Kuang <hekuang@huawei.com> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1473137909-142064-2-git-send-email-wangnan0@huawei.com Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-05perf symbols: Remove symbol_filter_t machineryArnaldo Carvalho de Melo
We're not using it anymore, few users were, but we really could do without it, simplify lots of functions by removing it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-1zng8wdznn00iiz08bb7q3vn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-05perf test vmlinux: Remove dead symbol_filter_t codeArnaldo Carvalho de Melo
We don't need to initialize that area as we're not using it afterwards, leftover, ditch it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-jb2un8buy4rqawz73mcdm1sn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-05perf machine: Remove machine->symbol_filter and friendsArnaldo Carvalho de Melo
Including machines__set_symbol_filter(), not used anymore. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-7o1qgmrpvzuis4a9f0t8mnri@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-05perf top: Remove old kernel-only symbol filterArnaldo Carvalho de Melo
Not needed, we already have code to prune aliases. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-1ysyce7qjgui93gi1efbjwhf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>