summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2017-10-25perf report: Cache srclines for callchain nodesMilian Wolff
On one hand this ensures that the memory is properly freed when the DSO gets freed. On the other hand this significantly speeds up the processing of the callchain nodes when lots of srclines are requested. For one of my data files e.g.: Before: Performance counter stats for 'perf report -s srcline -g srcline --stdio': 52496.495043 task-clock (msec) # 0.999 CPUs utilized 634 context-switches # 0.012 K/sec 2 cpu-migrations # 0.000 K/sec 191,561 page-faults # 0.004 M/sec 165,074,498,235 cycles # 3.144 GHz 334,170,832,408 instructions # 2.02 insn per cycle 90,220,029,745 branches # 1718.591 M/sec 654,525,177 branch-misses # 0.73% of all branches 52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks! After: Performance counter stats for 'perf report -s srcline -g srcline --stdio': 22606.323706 task-clock (msec) # 1.000 CPUs utilized 31 context-switches # 0.001 K/sec 0 cpu-migrations # 0.000 K/sec 185,471 page-faults # 0.008 M/sec 71,188,113,681 cycles # 3.149 GHz 133,204,943,083 instructions # 1.87 insn per cycle 34,886,384,979 branches # 1543.214 M/sec 278,214,495 branch-misses # 0.80% of all branches 22.609857253 seconds time elapsed Note that the difference is only this large when `--inline` is not passed. In such situations, we would use the inliner cache and thus do not run this code path that often. I think that this cache should actually be used in other places, too. When looking at the valgrind leak report for perf report, we see tons of srclines being leaked, most notably from calls to hist_entry__get_srcline. The problem is that get_srcline has many different formatting options (show_sym, show_addr, potentially even unwind_inlines when calling __get_srcline directly). As such, the srcline cannot easily be cached for all calls, or we'd have to add caches for all formatting combinations (6 so far). An alternative would be to remove the formatting options and handle that on a different level - i.e. print the sym/addr on demand wherever we actually output something. And the unwind_inlines could be moved into a separate function that does not return the srcline. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25perf report: Cache failed lookups of inlined framesMilian Wolff
When no inlined frames could be found for a given address, we did not store this information anywhere. That means we potentially do the costly inliner lookup repeatedly for cases where we know it can never succeed. This patch makes dso__parse_addr_inlines always return a valid inline_node. It will be empty when no inliners are found. This enables us to cache the empty list in the DSO, thereby improving the performance when many addresses fail to find the inliners. For my trivial example, the performance impact is already quite significant: Before: ~~~~~ Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs): 594.804032 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% ) 53 context-switches # 0.089 K/sec ( +- 4.09% ) 0 cpu-migrations # 0.000 K/sec ( +-100.00% ) 5,687 page-faults # 0.010 M/sec ( +- 0.02% ) 2,300,918,213 cycles # 3.868 GHz ( +- 0.09% ) 4,395,839,080 instructions # 1.91 insn per cycle ( +- 0.00% ) 939,177,205 branches # 1578.969 M/sec ( +- 0.00% ) 11,824,633 branch-misses # 1.26% of all branches ( +- 0.10% ) 0.596246531 seconds time elapsed ( +- 0.07% ) ~~~~~ After: ~~~~~ Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs): 113.111405 task-clock (msec) # 0.990 CPUs utilized ( +- 0.89% ) 29 context-switches # 0.255 K/sec ( +- 54.25% ) 0 cpu-migrations # 0.000 K/sec 5,380 page-faults # 0.048 M/sec ( +- 0.01% ) 432,378,779 cycles # 3.823 GHz ( +- 0.75% ) 670,057,633 instructions # 1.55 insn per cycle ( +- 0.01% ) 141,001,247 branches # 1246.570 M/sec ( +- 0.01% ) 2,346,845 branch-misses # 1.66% of all branches ( +- 0.19% ) 0.114222393 seconds time elapsed ( +- 1.19% ) ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25perf report: Properly handle branch count in match_chain()Milian Wolff
Some of the code paths I introduced before returned too early without running the code to handle a node's branch count. By refactoring match_chain to only have one exit point, this can be remedied. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf report: Compare symbol name for inlined frames when sortingMilian Wolff
Similar to the callstack frame matching, we also have to compare the symbol name when sorting hist entries. The reason is twofold: On one hand, multiple inlined functions will use the same symbol start/end values of the parent, non-inlined symbol. As such, all of these symbols often end up missing from top-level report, as they get merged with the non-inlined frame. On the other hand, multiple different functions may end up inlining the same function, and we need to aggregate these values properly. Before: ~~~~~ perf report --stdio --inline -g none # Children Self Command Shared Object Symbol # ........ ........ ............ ............. ................................... # 100.00% 39.69% cpp-inlining cpp-inlining [.] main 100.00% 0.00% cpp-inlining cpp-inlining [.] _start 100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main 97.03% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined) 59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot 55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite 0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs ~~~~~ After: ~~~~~ perf report --stdio --inline -g none # Children Self Command Shared Object Symbol # ........ ........ ............ ............. ................................................................................................................................... # 100.00% 39.69% cpp-inlining cpp-inlining [.] main 100.00% 0.00% cpp-inlining cpp-inlining [.] _start 100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main 62.57% 0.00% cpp-inlining cpp-inlining [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) 62.57% 0.00% cpp-inlining cpp-inlining [.] std::__complex_abs (inlined) 62.57% 0.00% cpp-inlining cpp-inlining [.] std::abs<double> (inlined) 62.57% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined) 59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot 55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite 34.46% 0.00% cpp-inlining cpp-inlining [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) 32.39% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) 32.39% 0.00% cpp-inlining cpp-inlining [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) 12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) 12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) 12.29% 0.00% cpp-inlining cpp-inlining [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) 0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-11-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Compare symbol name for inlined frames when matchingMilian Wolff
The fake symbols we create for inlined frames will represent different functions but can use the symbol start address. This leads to issues when different inline branches all lead to the same function. Before: ~~~~~ $ perf report -s sym -i perf.inlining.data --inline --stdio -g function ... --38.86%--_start __libc_start_main main | --37.57%--std::norm<double> (inlined) std::_Norm_helper<true>::_S_do_it<double> (inlined) | --36.36%--std::abs<double> (inlined) std::__complex_abs (inlined) | --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) ~~~~~ Note that this backtrace representation is completely bogus. Complex abs does not call the linear congruential engine! It is just a side-effect of a longer inlined stack being appended to a shorter, different inlined stack, both of which originate in the same function (main). This patch fixes the issue: ~~~~~ $ perf report -s sym -i perf.inlining.data --inline --stdio -g function ... --38.86%--_start __libc_start_main main | |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | | | --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) | std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | | | --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) | std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) | std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) | --1.99%--std::norm<double> (inlined) std::_Norm_helper<true>::_S_do_it<double> (inlined) std::abs<double> (inlined) std::__complex_abs (inlined) ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com Cc: Arnaldo Carvalho de Melo <acme@redhat.com> [ Fix up conflict with c1fbc0cf81f1 ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf script: Mark inlined frames and do not print DSO for themMilian Wolff
Instead of showing the (repeated) DSO name of the non-inlined frame, we now show the "(inlined)" suffix instead. Before: 214f7 __hypot_finite (/usr/lib/libm-2.25.so) ace3 hypot (/usr/lib/libm-2.25.so) a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining) a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining) a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining) a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining) a4a main (/home/milian/projects/src/perf-tests/inlining) 20510 __libc_start_main (/usr/lib/libc-2.25.so) bd9 _start (/home/milian/projects/src/perf-tests/inlining) After: 214f7 __hypot_finite (/usr/lib/libm-2.25.so) ace3 hypot (/usr/lib/libm-2.25.so) a4a std::__complex_abs (inlined) a4a std::abs<double> (inlined) a4a std::_Norm_helper<true>::_S_do_it<double> (inlined) a4a std::norm<double> (inlined) a4a main (/home/milian/projects/src/perf-tests/inlining) 20510 __libc_start_main (/usr/lib/libc-2.25.so) bd9 _start (/home/milian/projects/src/perf-tests/inlining) Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-9-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Mark inlined frames in output by " (inlined)" suffixMilian Wolff
The original patch that introduced inline frame output in the various browsers used this suffix already. The new centralized approach that uses fake symbols for inlined frames was missing this approach so far. Instead of changing the symbol name itself, we only print the suffix where needed. This allows us to efficiently lookup the symbol for a given name without first having to append the suffix before the lookup. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf report: Fall-back to function name comparison for -g srclineMilian Wolff
When a callchain entry has no srcline available, we ended up comparing the instruction pointer. I consider this to be not too useful. Rather, I think we should group the entries by function name, which this patch adds. For people who want to split the data on the IP boundary, using `-g address` is the correct choice. Before: ~~~~~ 100.00% 38.86% [.] main | |--61.14%--main inlining.cpp:14 | std::norm<double> complex:664 | std::_Norm_helper<true>::_S_do_it<double> complex:654 | std::abs<double> complex:597 | std::__complex_abs complex:589 | | | |--56.03%--hypot | | | | | |--8.45%--__hypot_finite | | | | | |--7.62%--__hypot_finite | | | | | |--2.29%--__hypot_finite | | | | | |--2.24%--__hypot_finite | | | | | |--2.06%--__hypot_finite | | | | | |--1.81%--__hypot_finite ... ~~~~~ After: ~~~~~ 100.00% 38.86% [.] main | |--61.14%--main inlining.cpp:14 | std::norm<double> complex:664 | std::_Norm_helper<true>::_S_do_it<double> complex:654 | std::abs<double> complex:597 | std::__complex_abs complex:589 | | | |--60.29%--hypot | | | | | --56.03%--__hypot_finite | | | --0.85%--cabs ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Create real callchain entries for inlined framesMilian Wolff
The inline_node structs are maintained by the new dso->inlines tree. This in turn keeps ownership of the fake symbols and srcline string representing an inline frame. This tree is sorted by address to allow quick lookups. All other entries of the symbol beside the function name are unused for inline frames. The advantage of this approach is that all existing users of the callchain API can now transparently display inlined frames without having to patch their code. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-6-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Refactor inline_list to store srcline string directlyMilian Wolff
This is a preparation for the creation of real callchain entries for inlined frames. The rest of the perf code uses the srcline string. As such, using that also for the srcline API allows us to simplify some of the upcoming code. Most notably, it will allow us to cache the srcline for a given inline node and reuse it for different callchain entries. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-5-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Refactor inline_list to operate on symbolsMilian Wolff
This is a requirement to create real callchain entries for inlined frames. Since the list of inlines usually contains the target symbol too, i.e. the location where the frames get inlined to, we alias that symbol and reuse it as-is is. This ensures that other dependent functionality keeps working, most notably annotation of the target frames. For all other entries in the inline_list, a fake symbol is created. These are marked by new 'inlined' member which is set to true. Only those symbols are managed by the inline_list and get freed when the inline_list is deleted from within inline_node__delete. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-4-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf callchain: Store srcline in callchain_cursor_nodeMilian Wolff
This is mostly a preparation to enable the creation of full callchain nodes for inline frames. Such frames will reference the IP of the non-inlined frame, but hold the symbol and srcline for an inlined location. As such, we won't be able to query the srcline on-demand based on the IP alone. Instead, we will leverage the functionality provided by this patch here, and store the srcline for the inlined nodes in the new srcline member of callchain_cursor_node. Note that this patch on its own leaks the srcline, as there is no free_callchain_cursor_node or similar. A future patch will add caching of the srcline and handle deletion properly. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf report: Remove code to handle inline frames from browsersMilian Wolff
The follow-up commits will make inline frames first-class citizens in the callchain, thereby obsoleting all of this special code. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-2-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Add Goldmont Plus V1 event fileKan Liang
Add a Intel event file for perf. Signed-off-by: Kan Liang <Kan.liang@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1508331907-395162-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf namespaces: Add more appropriate set of headersArnaldo Carvalho de Melo
We don't need perf.h, that is a kitchen sink, all we need is perf_events.h for perf_ns_link_info, sys/types.h for pid_t and linux/types.h for u64, list_head. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Zhijian <lizhijian@cn.fujitsu.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-f2uxyaj4s2hmntkrezpa6dqz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf kmem: Perform some cleanup if '--time' is given an invalid valueChristophe JAILLET
If the string passed in '--time' is invalid, we must do some cleanup before leaving. As in the other error handling paths of this function. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Fixes: 2a865bd8dddd ("perf kmem: Add option to specify time window of interest") Link: http://lkml.kernel.org/r/20170916060936.28199-1-christophe.jaillet@wanadoo.fr Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf script: Fix error handling pathChristophe JAILLET
If the string passed in '--time' is invalid, or if failed to set libtraceevent function resolver, we must do some cleanup before leaving. As in the other error handling paths of this function. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170916062537.28921-1-christophe.jaillet@wanadoo.fr Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf script: Use fprintf like printing uniformlyArnaldo Carvalho de Melo
We've been mixing print() with fprintf() style printing for a while, but now we need to use fprintf() like syntax uniformly as a preparatory patch for supporting printing to different files, one per event. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lkml.kernel.org/n/tip-kv5z3v8ptfghbarv3a9usvin@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf tools: Introduce binary__fprintf()Arnaldo Carvalho de Melo
Out of print_binary() but receiving a fp pointer and expecting that the printer be a fprintf like function, i.e. receive a FILE pointer and return the number of characters printed. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lkml.kernel.org/n/tip-6oqnxr6lmgqe6q6p3iugnscx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Fix incorrect cmask syntax for some Intel metricsAndi Kleen
Some of the metrics use an incorrect syntax for specifying the cmask for an event. Convert to perf syntax so that they can be resolved. Fixes metrics on Broadwell, SandyBridge. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/n/tip-3k3fkfj8obek9dkmryyrqzhu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf tools: Do not check ABI headers in a detached tarball buildArnaldo Carvalho de Melo
When we use one of: [acme@jouet linux]$ make help | grep perf perf-tar-src-pkg - Build perf-4.14.0-rc3.tar source tarball perf-targz-src-pkg - Build perf-4.14.0-rc3.tar.gz source tarball perf-tarbz2-src-pkg - Build perf-4.14.0-rc3.tar.bz2 source tarball perf-tarxz-src-pkg - Build perf-4.14.0-rc3.tar.xz source tarball [acme@jouet linux]$ I.e. when we create a detached tarball to build perf outside outside the enveloping kernel sources (from a kernel tarball or a checked out linux.git directory) we by definition can't check for differences among the tools/{include,arch}, etc files we originally copied from the kernel, so bail out in that case, to avoid warnings when doing the detached builds. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-vbrga0mhplv7niwxr3ghjyxv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf annotate: Remove arch::cpuid_parse callbackJiri Olsa
There's no need for extra cpuid_parse arch callback, it can be handled directly in init callback. Adding the init function to x86 to cover the cpuid initialization. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171011150158.11895-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf list: Fix group description in the man pageAndi Kleen
Fix an incorrect description in the 'perf list' manpage. When a group does not fit into the hardware it is partially scheduled, but does not error out. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20171010224322.15861-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf tests attr: Make hw events optionalJiri Olsa
Otherwise we fail on virtual machines with no support for specific HW events. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171009130712.14747-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf mmap: Adopt push method from builtin-record.cArnaldo Carvalho de Melo
The previous prep patch was just to show exactly what changed in that function, now its time to move that method and things only it uses to the right place, mmap.[ch] Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-aaxywfgw3d44x6xlu8zm1avu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf record: Make record__mmap_read genericArnaldo Carvalho de Melo
It becomes a perf_mmap method, "push", that build reads from a mmap and "pushes" it to a consumer, that in the initial case, for 'perf record', just writes it to the perf.data file descriptor, but may be used by 'top', etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-u4l1qjbi6l76r2k0nv99220n@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf mmap: Move perf_mmap and methods to separate mmap.[ch] filesArnaldo Carvalho de Melo
To better organize the sources, and we may end up even using it directly, without evlists and evsels. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-oiqrm7grflurnnzo2ovfnslg@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for Skylake ServerAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for SkylakeAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for Sandy BridgeAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for JakeTownAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for IvyTownAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for IvyBridgeAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for Haswell ServerAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for HaswellAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for Broadwell ServerAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23perf vendor events: Update JSON metrics for BroadwellAndi Kleen
Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-20Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18perf test shell trace+probe_libc_inet_pton.sh: Be compatible with Debian/UbuntuLi Zhijian
In debian/ubuntu, libc.so is located at a different place, /lib/x86_64-linux-gnu/libc-2.23.so, so it outputs like this when testing: PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.040 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.040/0.040/0.040/0.000 ms 0.000 probe_libc:inet_pton:(7f0e2db741c0)) __GI___inet_pton (/lib/x86_64-linux-gnu/libc-2.23.so) getaddrinfo (/lib/x86_64-linux-gnu/libc-2.23.so) [0xffffa9d40f34ff4d] (/bin/ping) Fix up the libc path to make sure this test works in more OSes. Committer testing: When this test fails one can use 'perf test -v', i.e. in verbose mode, where it'll show the expected backtrace, so, after applying this test: On Fedora 26: # perf test -v ping 62: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 23322 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.058 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.058/0.058/0.058/0.000 ms 0.000 probe_libc:inet_pton:(7fe344310d80)) __GI___inet_pton (/usr/lib64/libc-2.25.so) getaddrinfo (/usr/lib64/libc-2.25.so) _init (/usr/bin/ping) test child finished with 0 ---- end ---- probe libc's inet_pton & backtrace it with ping: Ok # Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Li Zhijian <lizhijian@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philip Li <philip.li@intel.com> Link: http://lkml.kernel.org/r/1508315649-18836-1-git-send-email-lizhijian@cn.fujitsu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-18perf xyarray: Fix wrong processing when closing evsel fdJin Yao
In current xyarray code, xyarray__max_x() returns max_y, and xyarray__max_y() returns max_x. It's confusing and for code logic it looks not correct. Error happens when closing evsel fd. Let's see this scenario: 1. Allocate an fd (pseudo-code) perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads) { evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int)); } xyarray__new(int xlen, int ylen, size_t entry_size) { size_t row_size = ylen * entry_size; struct xyarray *xy = zalloc(sizeof(*xy) + xlen * row_size); xy->entry_size = entry_size; xy->row_size = row_size; xy->entries = xlen * ylen; xy->max_x = xlen; xy->max_y = ylen; ...... } So max_x is ncpus, max_y is nthreads and row_size = nthreads * 4. 2. Use perf syscall and get the fd int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, struct thread_map *threads) { for (cpu = 0; cpu < cpus->nr; cpu++) { for (thread = 0; thread < nthreads; thread++) { int fd, group_fd; fd = sys_perf_event_open(&evsel->attr, pid, cpus->map[cpu], group_fd, flags); FD(evsel, cpu, thread) = fd; } } static inline void *xyarray__entry(struct xyarray *xy, int x, int y) { return &xy->contents[x * xy->row_size + y * xy->entry_size]; } These codes don't have issues. The issue happens in the closing of fd. 3. Close fd. void perf_evsel__close_fd(struct perf_evsel *evsel) { int cpu, thread; for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) { close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } } Since xyarray__max_x() returns max_y (nthreads) and xyarry__max_y() returns max_x (ncpus), so above code is actually to be: for (cpu = 0; cpu < nthreads; cpu++) for (thread = 0; thread < ncpus; ++thread) { close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } It's not correct! This change is introduced by "475fb533fb7d" ("perf evsel: Fix buffer overflow while freeing events") This fix is to let xyarray__max_x() return max_x (ncpus) and let xyarry__max_y() return max_y (nthreads) Committer note: This was also fixed by Ravi Bangoria, who provided the same patch, noticing the problem with 'perf record': <quote Ravi> I see 'perf record -p <pid>' crashes with following log: *** Error in `./perf': free(): invalid next size (normal): 0x000000000298b340 *** ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f7fd85c87e5] /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f7fd85d137a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f7fd85d553c] ./perf(perf_evsel__close+0xb4)[0x4b7614] ./perf(perf_evlist__delete+0x100)[0x4ab180] ./perf(cmd_record+0x1d9)[0x43a5a9] ./perf[0x49aa2f] ./perf(main+0x631)[0x427841] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7fd8571830] ./perf(_start+0x29)[0x427a59] </> Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Fixes: d74be4767367 ("perf xyarray: Save max_x, max_y") Link: http://lkml.kernel.org/r/1508339478-26674-1-git-send-email-yao.jin@linux.intel.com Link: http://lkml.kernel.org/r/1508327446-15302-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-17perf buildid-list: Fix crash when processing PERF_RECORD_NAMESPACENamhyung Kim
Thomas reported that 'perf buildid-list' gets a SEGFAULT due to NULL pointer deref when he ran it on a data with namespace events. It was because the buildid_id__mark_dso_hit_ops lacks the namespace event handler and perf_too__fill_default() didn't set it. Program received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x (gdb) where #0 0x0000000000000000 in ?? () #1 0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18, evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888, tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287 #2 0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>, sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340 #3 perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470, file_offset=file_offset@entry=1136) at util/session.c:1522 #4 0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>, data_offset=<optimized out>, session=0x0) at util/session.c:1899 #5 perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953 #6 0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>) at builtin-buildid-list.c:83 #7 cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115 #8 0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0) at perf.c:296 #9 0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348 #10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392 #11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536 (gdb) Fix it by adding a stub event handler for namespace event. Committer testing: Further clarifying, plain using 'perf buildid-list' will not end up in a SEGFAULT when processing a perf.data file with namespace info: # perf record -a --namespaces sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ] # perf buildid-list | wc -l 38 # perf buildid-list | head -5 e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux 874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko 5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so # It is only when one asks for checking what of those entries actually had samples, i.e. when we use either -H or --with-hits, that we will process all the PERF_RECORD_ events, and since tools/perf/builtin-buildid-list.c neither explicitely set a perf_tool.namespaces() callback nor the default stub was set that we end up, when processing a PERF_RECORD_NAMESPACE record, causing a SEGFAULT: # perf buildid-list -H Segmentation fault (core dumped) ^C # Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Fixes: f3b3614a284d ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info") Link: http://lkml.kernel.org/r/20171017132900.11043-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-17perf record: Fix documentation for a inexistent option '-l'Taeung Song
'perf record' had a '-l' option that meant "scale counter values" a very long time ago, but it currently belongs to 'perf stat' as '-c'. So remove it. I found this problem in the below case. $ perf record -e cycles -l sleep 3 Error: unknown switch `l Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1507907412-19813-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-13perf tools: Check wether the eBPF file exists in event parsingJiri Olsa
Adding the check wether the eBPF file exists, to consider it as eBPF input file. This way we can differentiate eBPF events from events that end up with same suffix as eBPF file. Before: $ perf stat -e 'cpu/uops_executed.core/' true bpf: builtin compilation failed: -95, try external compiler WARNING: unable to get correct kernel building directory. Hint: Set correct kbuild directory using 'kbuild-dir' option in [llvm] section of ~/.perfconfig or set it to "" to suppress kbuild detection. event syntax error: 'cpu/uops_executed.core/' \___ Failed to load cpu/uops_executed.c from source: 'version' section incorrect or lost After: $ perf stat -e 'cpu/uops_executed.core/' true Performance counter stats for 'true': 181,533 cpu/uops_executed.core/:u 0.002795447 seconds time elapsed If user makes type in the eBPF file, we prioritize the event syntax and show following warning: $ perf stat -e 'krava.c//' true event syntax error: 'krava.c//' \___ Cannot find PMU `krava.c'. Missing kernel support? Reported-and-Tested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20171013083736.15037-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-13perf hists: Add extra integrity checks to fmt_free()Jiri Olsa
Make sure the struct perf_hpp_fmt is properly unhooked before we free it. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20171013083736.15037-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-13perf hists: Fix crash in perf_hpp__reset_output_field()Jiri Olsa
Du Changbin reported crash [1] when calling perf_hpp__reset_output_field() after unregistering field via perf_hpp__column_unregister(). This ends up in calling following list_del* sequence on the same format: perf_hpp__column_unregister: list_del(&format->list); perf_hpp__reset_output_field: list_del_init(&fmt->list); where the later list_del_init might touch already freed formats. Fixing this by replacing list_del() with list_del_init() in perf_hpp__column_unregister(). [1] http://marc.info/?l=linux-kernel&m=149059595826019&w=2 Reported-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20171013083736.15037-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-09perf pmu: Unbreak perf record for arm/arm64 with events with explicit PMUMark Rutland
Currently, perf record is broken on arm/arm64 systems when the PMU is specified explicitly as part of the event, e.g. $ ./perf record -e armv8_cortex_a53/cpu_cycles/u true In such cases, perf record fails to open events unless perf_event_paranoid is set to -1, even if the PMU in question supports mode exclusion. Further, even when perf_event_paranoid is toggled, no samples are recorded. This is an unintended side effect of commit: e3ba76deef23064f ("perf tools: Force uncore events to system wide monitoring) ... which assumes that if a PMU has an associated cpu_map, it is an uncore PMU, and forces events for such PMUs to be system-wide. This is not true for arm/arm64 systems, which can have heterogeneous CPUs. To account for this, multiple CPU PMUs are exposed, each with a "cpus" field under sysfs, which the perf tool parses into a cpu_map. ARM PMUs do not have a "cpumask" file, and only have a "cpus" file. For the gory details as to why, see commit: 7e3fcffe95544010 ("perf pmu: Support alternative sysfs cpumask") Given all of this, we can instead identify uncore PMUs by explicitly checking for a "cpumask" file, and restore arm/arm64 PMU support back to a working state. This patch does so, adding a new perf_pmu::is_uncore field, and splitting the existing cpumask parsing so that it can be reused. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by Will Deacon <will.deacon@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: 4.12+ <stable@vger.kernel.org> Fixes: e3ba76deef23064f ("perf tools: Force uncore events to system wide monitoring) Link: http://lkml.kernel.org/r/1507315102-5942-1-git-send-email-mark.rutland@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-06perf script: Add missing separator for "-F ip,brstack" (and brstackoff)Mark Santaniello
Prior to commit 55b9b50811ca ("perf script: Support -F brstack,dso and brstacksym,dso"), we were printing a space before the brstack data. It seems that this space was important. Without it, parsing is difficult. Very sorry for the mistake. Notice here how the "ip" and "brstack" run together: $ perf script -F ip,brstack | head -n 1 22e18c40x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 0x22e195d/0x22e1990/P/-/-/0 0x22e18e9/0x22e1943/P/-/-/0 0x22e1a69/0x22e18c0/P/-/-/0 0x22e19f7/0x22e1a20/P/-/-/0 0x22e1910/0x22e19ee/P/-/-/0 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 0x22e195d/0x22e1990/P/-/-/0 0x22e18e9/0x22e1943/P/-/-/0 0x22e1a69/0x22e18c0/P/-/-/0 0x22e19f7/0x22e1a20/P/-/-/0 0x22e1910/0x22e19ee/P/-/-/0 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 After this diff, sanity is restored: $ perf script -F ip,brstack | head -n 1 22e18c4 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 0x22e195d/0x22e1990/P/-/-/0 0x22e18e9/0x22e1943/P/-/-/0 0x22e1a69/0x22e18c0/P/-/-/0 0x22e19f7/0x22e1a20/P/-/-/0 0x22e1910/0x22e19ee/P/-/-/0 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 0x22e195d/0x22e1990/P/-/-/0 0x22e18e9/0x22e1943/P/-/-/0 0x22e1a69/0x22e18c0/P/-/-/0 0x22e19f7/0x22e1a20/P/-/-/0 0x22e1910/0x22e19ee/P/-/-/0 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 Signed-off-by: Mark Santaniello <marksan@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: 4.13+ <stable@vger.kernel.org> Fixes: 55b9b50811ca ("perf script: Support -F brstack,dso and brstacksym,dso") Link: http://lkml.kernel.org/r/20171006080722.3442046-1-marksan@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-05perf callchain: Compare dsos (as well) for CCKEY_FUNCTIONRavi Bangoria
Two functions from different binaries can have same start address. Thus, comparing only start address in match_chain() leads to inconsistent callchains. Fix this by adding a check for dsos as well. Ex, https://www.spinics.net/lists/linux-perf-users/msg04067.html Reported-by: Alexander Pozdneev <pozdneyev@gmail.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Cc: zhangmengting@huawei.com Link: http://lkml.kernel.org/r/20171005091234.5874-1-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-03perf tests attr: Fix group stat testsJiri Olsa
We started to use group read whenever it's possible: 82bf311e15d2 perf stat: Use group read for event groups That breaks some of attr tests, this change adds the new possible read_format value. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> LPU-Reference: 20170928160633.GA26973@krava Link: http://lkml.kernel.org/n/tip-1ko2zc4nph93d8lfwjyk9ivz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-03perf top: Add option to set the number of thread for event synthesizeKan Liang
Using UINT_MAX to indicate the default thread#, which is the max number of online CPU. Committer testing: # perf trace --no-inherit -e clone -o /tmp/output perf top --num-thread-synthesize 9 # cat /tmp/output ? ( ? ): ... [continued]: clone()) = 26651 (perf) 0.059 ( 0.010 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfac44f30, parent_tidptr: 0x7f5bfac459d0, child_tidptr: 0x7f5bfac459d0, tls: 0x7f5bfac45700) = 26652 (perf) 0.116 ( 0.014 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfa443f30, parent_tidptr: 0x7f5bfa4449d0, child_tidptr: 0x7f5bfa4449d0, tls: 0x7f5bfa444700) = 26653 (perf) 0.141 ( 0.009 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9c42f30, parent_tidptr: 0x7f5bf9c439d0, child_tidptr: 0x7f5bf9c439d0, tls: 0x7f5bf9c43700) = 26654 (perf) 0.160 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9441f30, parent_tidptr: 0x7f5bf94429d0, child_tidptr: 0x7f5bf94429d0, tls: 0x7f5bf9442700) = 26655 (perf) 0.232 ( 0.013 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf8c40f30, parent_tidptr: 0x7f5bf8c419d0, child_tidptr: 0x7f5bf8c419d0, tls: 0x7f5bf8c41700) = 26656 (perf) 0.393 ( 0.011 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be3ffef30, parent_tidptr: 0x7f5be3fff9d0, child_tidptr: 0x7f5be3fff9d0, tls: 0x7f5be3fff700) = 26657 (perf) 0.802 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be37fdf30, parent_tidptr: 0x7f5be37fe9d0, child_tidptr: 0x7f5be37fe9d0, tls: 0x7f5be37fe700) = 26658 (perf) 1.411 ( 0.022 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26659 (perf) 246.422 ( 0.042 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26660 (perf) # Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-5-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>