Age        Commit message        Author
2017-10-30  perf stat: Move the shadow stats scale computation in perf_stat__update_shadow_stats  (Jiri Olsa)
Move the shadow stats scale computation to the perf_stat__update_shadow_stats() function, so it's centralized and we don't forget to do it. It also saves a few lines of code.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-htg7mmyxv6pcrf57qyo6msid@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
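A minimal sketch of the centralization described above, assuming a simplified helper and struct layout (the field and array names are illustrative, not copied from tools/perf): the scaling that every caller previously had to remember now happens once inside the helper.

~~~~~
/* Callers used to do:
 *   perf_stat__update_shadow_stats(counter, count * counter->scale, cpu);
 * Now they pass the raw count and the helper applies the scale itself. */
static void perf_stat__update_shadow_stats(struct perf_evsel *counter,
                                           u64 count, int cpu)
{
        count *= counter->scale;        /* scale applied centrally, in one place */

        /* ... update the shadow stat tables (cycles, instructions, ...) as before ... */
        update_stats(&runtime_nsecs_stats[cpu], count); /* illustrative */
}
~~~~~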
2017-10-30  perf tools: Add perf_data_file__write function  (Jiri Olsa)
Add a perf_data_file__write() function to provide a single-file write operation.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
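As a rough illustration (the field names and the use of perf's writen() "write all bytes" wrapper are assumptions, not a quote of the actual patch), a single-file write operation can be as small as:

~~~~~
ssize_t perf_data_file__write(struct perf_data_file *file,
                              void *buf, size_t size)
{
        /* writen() keeps writing until every byte is out or an error occurs */
        return writen(file->fd, buf, size);
}
~~~~~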
2017-10-30  perf tools: Add struct perf_data_file  (Jiri Olsa)
Add struct perf_data_file to represent a single file within a perf_data struct. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-c3f9p4xzykr845ktqcek6p4t@git.kernel.org [ Fixup recent changes in 'perf script --per-event-dump' ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
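A hedged sketch of the resulting split, with illustrative member names: struct perf_data describes the session output as a whole, while struct perf_data_file describes one file inside it.

~~~~~
struct perf_data_file {
        const char              *path;  /* e.g. "perf.data" or a per-event dump */
        int                      fd;
};

struct perf_data {
        struct perf_data_file    file;  /* a single file today, an array later */
        bool                     is_pipe;
        bool                     force;
        unsigned long            size;
};
~~~~~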
2017-10-30  perf tools: Rename struct perf_data_file to perf_data  (Jiri Olsa)
Rename struct perf_data_file to perf_data, because we will add the possibility to have multiple files under perf.data, so the 'perf_data' name fits better. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-39wn4d77phel3dgkzo3lyan0@git.kernel.org [ Fixup recent changes in 'perf script --per-event-dump' ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-30  perf script: Print information about per-event-dump files  (Arnaldo Carvalho de Melo)
For a file generated by "perf sched record sleep 50":

  # perf script --per-event-dump
  [ perf script: Wrote 23.121 MB perf.data.sched:sched_switch.dump (206015 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_stat_wait.dump (0 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_stat_sleep.dump (0 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_stat_iowait.dump (0 samples) ]
  [ perf script: Wrote 17.680 MB perf.data.sched:sched_stat_runtime.dump (129342 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_process_fork.dump (24 samples) ]
  [ perf script: Wrote 11.328 MB perf.data.sched:sched_wakeup.dump (106770 samples) ]
  [ perf script: Wrote 0.000 MB perf.data.sched:sched_wakeup_new.dump (24 samples) ]
  [ perf script: Wrote 2.477 MB perf.data.sched:sched_migrate_task.dump (20434 samples) ]
  #

Similar to what is generated by 'perf record'.
Based-on-a-patch-by: yuzhoujian <yuzhoujian@didichuxing.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1508921599-10832-3-git-send-email-yuzhoujian@didichuxing.com
Link: http://lkml.kernel.org/n/tip-xuketkkjuk2c0qz546ypd1u7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27  perf trace beauty prctl: Generate 'option' string table from kernel headers  (Arnaldo Carvalho de Melo)
This is one more case where the way that syscall parameter values are defined in kernel headers are easy to parse using a shell script that will then generate the string table that gets used by the prctl 'option' argument beautifier. This way as soon as the header syncronization mechanism in perf's build system detects a change in a copy of a kernel ABI header and that file is syncronized, we get 'perf trace' updated automagically. Further work needed for the PR_SET_ values, as well for using eBPF to copy the non-integer arguments to/from the kernel. E.g.: System wide prctl tracing: # perf trace -e prctl 1668.028 ( 0.025 ms): TaskSchedulerR/10649 prctl(option: SET_NAME, arg2: 0x2b61d5db15d0) = 0 3365.663 ( 0.018 ms): chrome/10650 prctl(option: SET_SECCOMP, arg2: 2, arg4: 8 ) = -1 EFAULT Bad address 3366.585 ( 0.010 ms): chrome/10650 prctl(option: SET_NO_NEW_PRIVS, arg2: 1 ) = 0 3367.173 ( 0.009 ms): TaskSchedulerR/10652 prctl(option: SET_NAME, arg2: 0x2b61d2aaa300) = 0 3367.222 ( 0.003 ms): TaskSchedulerR/10653 prctl(option: SET_NAME, arg2: 0x2b61d2aaa1e0) = 0 3367.244 ( 0.002 ms): TaskSchedulerR/10654 prctl(option: SET_NAME, arg2: 0x2b61d2aaa0c0) = 0 3367.265 ( 0.002 ms): TaskSchedulerR/10655 prctl(option: SET_NAME, arg2: 0x2b61d2ac7f90) = 0 3367.281 ( 0.002 ms): Chrome_ChildIO/10656 prctl(option: SET_NAME, arg2: 0x7efbe406bb11) = 0 3367.220 ( 0.004 ms): TaskSchedulerS/10651 prctl(option: SET_NAME, arg2: 0x2b61d2ac1be0) = 0 3370.906 ( 0.010 ms): GpuMemoryThrea/10657 prctl(option: SET_NAME, arg2: 0x7efbe386ab11) = 0 3370.983 ( 0.003 ms): File/10658 prctl(option: SET_NAME, arg2: 0x7efbe3069b11 ) = 0 3384.272 ( 0.020 ms): Compositor/10659 prctl(option: SET_NAME, arg2: 0x7efbe2868b11 ) = 0 3612.091 ( 0.012 ms): DOM Worker/11489 prctl(option: SET_NAME, arg2: 0x7f49ab97ebf2 ) = 0 <SNIP> 4512.437 ( 0.004 ms): (sa1)/11490 prctl(option: SET_NAME, arg2: 0x7ffca15af844 ) = 0 4512.468 ( 0.002 ms): (sa1)/11490 prctl(option: SET_MM, arg2: ARG_START, arg3: 0x7f5cb7c81000) = 0 4512.472 ( 0.001 ms): (sa1)/11490 prctl(option: SET_MM, arg2: ARG_END, arg3: 0x7f5cb7c81006) = 0 4514.667 ( 0.002 ms): (sa1)/11490 prctl(option: GET_SECUREBITS ) = 0 Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-q0s2uw579o5ei6xlh2zjirgz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
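The generated table is essentially an array indexed by the option value; a hedged sketch of its shape (the indices and the subset of names shown are illustrative):

~~~~~
static const char *prctl_options[] = {
        [1]  = "SET_PDEATHSIG",
        [2]  = "GET_PDEATHSIG",
        [15] = "SET_NAME",
        [22] = "SET_SECCOMP",
        [38] = "SET_NO_NEW_PRIVS",
};
/* The beautifier prints prctl_options[arg] when the value is in range,
 * falling back to the raw number otherwise. */
~~~~~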
2017-10-27  tools include uapi: Grab a copy of linux/prctl.h  (Arnaldo Carvalho de Melo)
We will use it to generate tables for beautifying prctl's 'option' arg and some of the others eventually. Cc: Andy Lutomirski <luto@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-cg8mpmz4hk9nfih685emnbk9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27  perf script: Allow creating per-event dump files  (Arnaldo Carvalho de Melo)
Introduce a new option to dump trace output to files named after the monitored events, and update the perf-script documentation accordingly. Shown below is the output of the perf script command with the newly introduced option:

  $ perf record -e cycles -e cs -ag -- sleep 1
  $ perf script --per-event-dump
  $ ls
  perf.data.cycles.dump  perf.data.cs.dump

Without per-event-dump support, drawing flamegraphs for different events would require post-processing to separate the events, or monitoring only one event at a time. With this option, the trace output is split into files named after the monitored events, so flamegraphs can be drawn per event name.
Based-on-a-patch-by: yuzhoujian <yuzhoujian@didichuxing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1508921599-10832-3-git-send-email-yuzhoujian@didichuxing.com
Link: http://lkml.kernel.org/n/tip-8ngzsjdhgiovkupl3r5yy570@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27  perf evsel: Restore evsel->priv as a tool private area  (Arnaldo Carvalho de Melo)
When we started using it for stats, not just in builtin-stat.c but also in builtin-script.c, it stopped being a tool private area, so introduce a new pointer for these stats and leave ->priv to its original purpose.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Fixes: cfc8874a4859 ("perf script: Process cpu/threads maps")
Link: http://lkml.kernel.org/n/tip-jtpzx3rjqo78snmmsdzwb2eb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
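A minimal sketch of the separation, with member and type names assumed for illustration: the tool-owned pointer stays as it was, and the stat code gets its own.

~~~~~
struct perf_evsel {
        /* ... */
        void                    *priv;   /* tool private area, as before */
        struct perf_stat_evsel  *stats;  /* new: home for the stat bookkeeping */
        /* ... */
};
~~~~~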
2017-10-27  perf script: Use event_format__fprintf()  (Arnaldo Carvalho de Melo)
Another case where a1a587073ccd ("perf script: Use fprintf like printing uniformly") forgot to redirect output to the FILE pointer; fix this one too.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lkml.kernel.org/n/tip-jmwx4pgfezw98ezfoj9t957s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27  perf script: Use pr_debug where appropriate  (Arnaldo Carvalho de Melo)
We have facilities for reporting unexpected, unlikely errors, use them. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lkml.kernel.org/n/tip-c7j22xfjf1j773g7ufp607q0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27  perf script: Add a few missing conversions to fprintf style  (Arnaldo Carvalho de Melo)
In a1a587073ccd ("perf script: Use fprintf like printing uniformly") a few cases were missed; fix them.
Reported-by: yuzhoujian <yuzhoujian@didichuxing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-sq9hvfk5mkjdqzlpyiq7jkos@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-27  perf/core: Rewrite event timekeeping  (Peter Zijlstra)
The current event timekeeping, which computes enabled and running times, uses 3 distinct timestamps to reflect the various event states: OFF (stopped), INACTIVE (enabled) and ACTIVE (running). Furthermore, the update rules are such that even INACTIVE events need their timestamps updated. This is undesirable because we'd like to not touch INACTIVE events at all if possible; it makes event scheduling (much) more expensive than needed. Rewrite the timekeeping to directly use event->state; this greatly simplifies the code and results in only having to update things when we change state, or when an up-to-date value is requested (read).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
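A simplified sketch of the state-driven scheme, with field names assumed for illustration: times advance from a single timestamp, and only when the state changes or a read wants an up-to-date value.

~~~~~
static void event_update_time(struct perf_event *event, u64 now)
{
        /* enabled time accumulates while the event is INACTIVE or ACTIVE */
        if (event->state >= PERF_EVENT_STATE_INACTIVE)
                event->total_time_enabled += now - event->tstamp;

        /* running time accumulates only while it is ACTIVE on a PMU */
        if (event->state == PERF_EVENT_STATE_ACTIVE)
                event->total_time_running += now - event->tstamp;

        event->tstamp = now;    /* touched only on state change or read */
}
~~~~~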
2017-10-27  perf/core: Fix perf_event_read()  (Peter Zijlstra)
perf_event_read() has a number of issues regarding the timekeeping bits:
 - The IPI didn't update group times when it found INACTIVE
 - The direct call would not re-check ->state after taking ctx->lock, which can result in ->count and timestamps getting out of sync.
And we can make use of the ordering introduced for perf_event_stop() to make it more accurate for ACTIVE.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27  perf/core: Remove wrong barrier  (Peter Zijlstra)
The barrier and comment make no sense:
 - if what the barrier says is true, it should be wmb(), but that should then be part of the arch driver, not the generic code.
 - if it is an SMP barrier, there must be a matching barrier, and there isn't one.
So kill it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27  perf/core: Rename 'enum perf_event_active_state'  (Peter Zijlstra)
It's a weird name: 'active' is just one of the states, so it should not be part of the name; also, it's too long.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27  perf/core: Make sure to update ctx time before using it  (Peter Zijlstra)
We should make sure to update ctx time before we use it to update event times. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27  perf/core: Fix __perf_read_group_add() locking  (Peter Zijlstra)
Event timestamps are serialized using ctx->lock, make sure to hold it over reading all values. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
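A hedged sketch of the locking pattern being fixed (the body is elided and simplified, not the actual function): the whole group read happens under ctx->lock so the counts and times stay mutually consistent.

~~~~~
static int __perf_read_group_add(struct perf_event *leader,
                                 u64 read_format, u64 *values)
{
        struct perf_event_context *ctx = leader->ctx;
        unsigned long flags;

        raw_spin_lock_irqsave(&ctx->lock, flags);
        /* ... gather count, enabled and running times for the leader
         * and every sibling while nothing can change underneath us ... */
        raw_spin_unlock_irqrestore(&ctx->lock, flags);

        return 0;
}
~~~~~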
2017-10-27  perf/core: Update ctx time before detaching events  (Peter Zijlstra)
We should make sure the ctx time is updated before we detach events, which will want to update event times.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27  perf/core: Fix perf_event_read_value() locking  (Peter Zijlstra)
perf_event_read_value() is an external accessor, just like perf_event_{en,dis}able() and should thus use perf_event_ctx_lock(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: f63a8daa5812 ("perf: Fix event->ctx locking") Signed-off-by: Ingo Molnar <mingo@kernel.org>
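A sketch of the resulting access pattern, with the internal helper name being an assumption: the exported accessor takes the ctx lock and then delegates, just like perf_event_{en,dis}able().

~~~~~
u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
{
        struct perf_event_context *ctx;
        u64 count;

        ctx = perf_event_ctx_lock(event);       /* serialize against ctx changes */
        count = __perf_event_read_value(event, enabled, running);
        perf_event_ctx_unlock(event, ctx);

        return count;
}
~~~~~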
2017-10-27  perf/bpf: Extend the perf_event_read_local() interface, a.k.a. "bpf: perf event change needed for subsequent bpf helpers"  (Yonghong Song)
eBPF programs would like access to the (perf) event enabled and running times along with the event value, such that they can deal with event multiplexing (among other things). This patch extends the interface; a future eBPF patch will utilize the new functionality.
[ Note, there's a same-content commit with a poor changelog and a meaningless title in the networking tree as well - but we need this change for subsequent perf work, so apply it here as well, with a proper changelog. Hopefully Git will be able to sort out this somewhat messy workflow, if there are no other, conflicting changes to these files. ]
Signed-off-by: Yonghong Song <yhs@fb.com>
[ Rewrote the changelog. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <ast@fb.com>
Cc: <daniel@iogearbox.net>
Cc: <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20171005161923.332790-2-yhs@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
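A hedged sketch of the extended prototype and its intended use for multiplexing compensation (the NULL-handling details and the caller shown are assumptions):

~~~~~
int perf_event_read_local(struct perf_event *event, u64 *value,
                          u64 *enabled, u64 *running);

/* e.g. from an eBPF helper: scale the raw value by enabled/running to
 * compensate for the event having been multiplexed off the PMU. */
static u64 read_scaled(struct perf_event *event)
{
        u64 value, enabled, running;

        if (perf_event_read_local(event, &value, &enabled, &running) || !running)
                return 0;

        return div64_u64(value * enabled, running);
}
~~~~~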
2017-10-27  Merge branch 'perf/urgent' into perf/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25  Merge tag 'perf-core-for-mingo-4.15-20171025' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core  (Ingo Molnar)
Pull perf/core inline improvements from Arnaldo Carvalho de Melo.

From Milian's cover letter (Milian Wolff):

"This series of patches completely reworks the way inline frames are handled. Instead of querying for the inline nodes on-demand in the individual tools, we now create proper callchain nodes for inlined frames. The advantages this approach brings are numerous:
 - Less duplicated code in the individual browsers
 - Aggregated cost for inlined frames for the --children top-down list
 - Various bug fixes that arose from querying for a srcline/symbol based on the IP of a sample, which will always point to the last inlined frame instead of the corresponding non-inlined frame
 - Overall much better support for visualizing cost for heavily-inlined C++ code, which simply was confusing and unreliable before
 - srcline honors the global setting as to whether full paths or basenames should be shown
 - Caches for inlined frames and srcline information, which allow us to enable inline frame handling by default"

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25  perf util: Enable handling of inlined frames by default  (Milian Wolff)
Now that we have caches in place to speed up the process of finding inlined frames and srcline information repeatedly, we can enable this useful option by default. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25  perf report: Use srcline from callchain for hist entries  (Milian Wolff)
This also removes the symbol name from the srcline column, more on this below. This ensures we use the correct srcline, which could originate from a potentially inlined function. The hist entries used to query for the srcline based purely on the IP, which leads to wrong results for inlined entries. Before: ~~~~~ perf report --inline -s srcline -g none --stdio ... # Children Self Source:Line # ........ ........ .................................................................................................................................. # 94.23% 0.00% __libc_start_main+18446603487898210537 94.23% 0.00% _start+41 44.58% 0.00% main+100 44.58% 0.00% std::_Norm_helper<true>::_S_do_it<double>+100 44.58% 0.00% std::__complex_abs+100 44.58% 0.00% std::abs<double>+100 44.58% 0.00% std::norm<double>+100 36.01% 0.00% hypot+18446603487892193300 25.81% 0.00% main+41 25.81% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41 25.81% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41 25.75% 25.75% random.h:143 18.39% 0.00% main+57 18.39% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57 18.39% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57 13.80% 13.80% random.tcc:3330 5.64% 0.00% ??:0 4.13% 4.13% __hypot_finite+163 4.13% 0.00% __hypot_finite+18446603487892193443 ... ~~~~~ After: ~~~~~ perf report --inline -s srcline -g none --stdio ... # Children Self Source:Line # ........ ........ ........................................... # 94.30% 1.19% main.cpp:39 94.23% 0.00% __libc_start_main+18446603487898210537 94.23% 0.00% _start+41 48.44% 1.70% random.h:1823 48.44% 0.00% random.h:1814 46.74% 2.53% random.h:185 44.68% 0.10% complex:589 44.68% 0.00% complex:597 44.68% 0.00% complex:654 44.68% 0.00% complex:664 40.61% 13.80% random.tcc:3330 36.01% 0.00% hypot+18446603487892193300 26.81% 0.00% random.h:151 26.81% 0.00% random.h:332 25.75% 25.75% random.h:143 5.64% 0.00% ??:0 4.13% 4.13% __hypot_finite+163 4.13% 0.00% __hypot_finite+18446603487892193443 ... ~~~~~ Note that this change removes the symbol from the source:line hist column. If this information is desired, users should explicitly query for it if needed. I.e. run this command instead: ~~~~~ perf report --inline -s sym,srcline -g none --stdio ... # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'cycles:uppp' # Event count (approx.): 1381229476 # # Children Self Symbol Source:Line # ........ ........ ................................................................................................................................... ........................................... # 94.30% 1.19% [.] main main.cpp:39 94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537 94.23% 0.00% [.] _start _start+41 48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1814 48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1823 46.74% 0.00% [.] 
std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) random.h:185 44.68% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) complex:654 44.68% 0.00% [.] std::__complex_abs (inlined) complex:589 44.68% 0.00% [.] std::abs<double> (inlined) complex:597 44.68% 0.00% [.] std::norm<double> (inlined) complex:664 39.80% 13.59% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330 36.01% 0.00% [.] hypot hypot+18446603487892193300 26.81% 0.00% [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) random.h:151 26.81% 0.00% [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) random.h:332 25.75% 0.00% [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) random.h:143 25.19% 25.19% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143 4.13% 4.13% [.] __hypot_finite __hypot_finite+163 4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443 ... ~~~~~ Compared to the old behavior, this reduces duplication in the output. Before we used to print the symbol name in the srcline column even when the sym column was explicitly requested. I.e. the output was: ~~~~~ perf report --inline -s sym,srcline -g none --stdio ... # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'cycles:uppp' # Event count (approx.): 1381229476 # # Children Self Symbol Source:Line # ........ ........ ................................................................................................................................... .................................................................................................................................. # 94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537 94.23% 0.00% [.] _start _start+41 44.58% 0.00% [.] main main+100 44.58% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) std::_Norm_helper<true>::_S_do_it<double>+100 44.58% 0.00% [.] std::__complex_abs (inlined) std::__complex_abs+100 44.58% 0.00% [.] std::abs<double> (inlined) std::abs<double>+100 44.58% 0.00% [.] std::norm<double> (inlined) std::norm<double>+100 36.01% 0.00% [.] hypot hypot+18446603487892193300 25.81% 0.00% [.] main main+41 25.81% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41 25.81% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41 25.69% 25.69% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143 18.39% 0.00% [.] main main+57 18.39% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57 18.39% 0.00% [.] 
std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57 13.80% 13.80% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330 4.13% 4.13% [.] __hypot_finite __hypot_finite+163 4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443 ... ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25  perf report: Cache srclines for callchain nodes  (Milian Wolff)
On one hand this ensures that the memory is properly freed when the DSO gets freed. On the other hand this significantly speeds up the processing of the callchain nodes when lots of srclines are requested. For one of my data files e.g.: Before: Performance counter stats for 'perf report -s srcline -g srcline --stdio': 52496.495043 task-clock (msec) # 0.999 CPUs utilized 634 context-switches # 0.012 K/sec 2 cpu-migrations # 0.000 K/sec 191,561 page-faults # 0.004 M/sec 165,074,498,235 cycles # 3.144 GHz 334,170,832,408 instructions # 2.02 insn per cycle 90,220,029,745 branches # 1718.591 M/sec 654,525,177 branch-misses # 0.73% of all branches 52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks! After: Performance counter stats for 'perf report -s srcline -g srcline --stdio': 22606.323706 task-clock (msec) # 1.000 CPUs utilized 31 context-switches # 0.001 K/sec 0 cpu-migrations # 0.000 K/sec 185,471 page-faults # 0.008 M/sec 71,188,113,681 cycles # 3.149 GHz 133,204,943,083 instructions # 1.87 insn per cycle 34,886,384,979 branches # 1543.214 M/sec 278,214,495 branch-misses # 0.80% of all branches 22.609857253 seconds time elapsed Note that the difference is only this large when `--inline` is not passed. In such situations, we would use the inliner cache and thus do not run this code path that often. I think that this cache should actually be used in other places, too. When looking at the valgrind leak report for perf report, we see tons of srclines being leaked, most notably from calls to hist_entry__get_srcline. The problem is that get_srcline has many different formatting options (show_sym, show_addr, potentially even unwind_inlines when calling __get_srcline directly). As such, the srcline cannot easily be cached for all calls, or we'd have to add caches for all formatting combinations (6 so far). An alternative would be to remove the formatting options and handle that on a different level - i.e. print the sym/addr on demand wherever we actually output something. And the unwind_inlines could be moved into a separate function that does not return the srcline. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25  perf report: Cache failed lookups of inlined frames  (Milian Wolff)
When no inlined frames could be found for a given address, we did not store this information anywhere. That means we potentially do the costly inliner lookup repeatedly for cases where we know it can never succeed. This patch makes dso__parse_addr_inlines always return a valid inline_node. It will be empty when no inliners are found. This enables us to cache the empty list in the DSO, thereby improving the performance when many addresses fail to find the inliners. For my trivial example, the performance impact is already quite significant: Before: ~~~~~ Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs): 594.804032 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% ) 53 context-switches # 0.089 K/sec ( +- 4.09% ) 0 cpu-migrations # 0.000 K/sec ( +-100.00% ) 5,687 page-faults # 0.010 M/sec ( +- 0.02% ) 2,300,918,213 cycles # 3.868 GHz ( +- 0.09% ) 4,395,839,080 instructions # 1.91 insn per cycle ( +- 0.00% ) 939,177,205 branches # 1578.969 M/sec ( +- 0.00% ) 11,824,633 branch-misses # 1.26% of all branches ( +- 0.10% ) 0.596246531 seconds time elapsed ( +- 0.07% ) ~~~~~ After: ~~~~~ Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs): 113.111405 task-clock (msec) # 0.990 CPUs utilized ( +- 0.89% ) 29 context-switches # 0.255 K/sec ( +- 54.25% ) 0 cpu-migrations # 0.000 K/sec 5,380 page-faults # 0.048 M/sec ( +- 0.01% ) 432,378,779 cycles # 3.823 GHz ( +- 0.75% ) 670,057,633 instructions # 1.55 insn per cycle ( +- 0.01% ) 141,001,247 branches # 1246.570 M/sec ( +- 0.01% ) 2,346,845 branch-misses # 1.66% of all branches ( +- 0.19% ) 0.114222393 seconds time elapsed ( +- 1.19% ) ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
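A hedged sketch of the caching idea, with function and member names assumed for illustration (not the actual srcline.c code): the lookup always returns a node, and an empty node is cached in the DSO just like a populated one.

~~~~~
struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
                                            struct symbol *sym)
{
        struct inline_node *node = inlines__tree_find(&dso->inlines, addr);

        if (node)
                return node;                    /* cached hit, possibly empty */

        node = addr2inlines(dso, addr, sym);    /* may find no inliners at all */
        inlines__tree_insert(&dso->inlines, node);

        return node;    /* never NULL: an empty list means "no inline frames" */
}
~~~~~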
2017-10-25  perf report: Properly handle branch count in match_chain()  (Milian Wolff)
Some of the code paths I introduced before returned too early without running the code to handle a node's branch count. By refactoring match_chain to only have one exit point, this can be remedied. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf report: Compare symbol name for inlined frames when sorting  (Milian Wolff)
Similar to the callstack frame matching, we also have to compare the symbol name when sorting hist entries. The reason is twofold: On one hand, multiple inlined functions will use the same symbol start/end values of the parent, non-inlined symbol. As such, all of these symbols often end up missing from top-level report, as they get merged with the non-inlined frame. On the other hand, multiple different functions may end up inlining the same function, and we need to aggregate these values properly. Before: ~~~~~ perf report --stdio --inline -g none # Children Self Command Shared Object Symbol # ........ ........ ............ ............. ................................... # 100.00% 39.69% cpp-inlining cpp-inlining [.] main 100.00% 0.00% cpp-inlining cpp-inlining [.] _start 100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main 97.03% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined) 59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot 55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite 0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs ~~~~~ After: ~~~~~ perf report --stdio --inline -g none # Children Self Command Shared Object Symbol # ........ ........ ............ ............. ................................................................................................................................... # 100.00% 39.69% cpp-inlining cpp-inlining [.] main 100.00% 0.00% cpp-inlining cpp-inlining [.] _start 100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main 62.57% 0.00% cpp-inlining cpp-inlining [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) 62.57% 0.00% cpp-inlining cpp-inlining [.] std::__complex_abs (inlined) 62.57% 0.00% cpp-inlining cpp-inlining [.] std::abs<double> (inlined) 62.57% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined) 59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot 55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite 34.46% 0.00% cpp-inlining cpp-inlining [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) 32.39% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) 32.39% 0.00% cpp-inlining cpp-inlining [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) 12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) 12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) 12.29% 0.00% cpp-inlining cpp-inlining [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) 0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-11-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf callchain: Compare symbol name for inlined frames when matching  (Milian Wolff)
The fake symbols we create for inlined frames will represent different functions but can use the symbol start address. This leads to issues when different inline branches all lead to the same function. Before: ~~~~~ $ perf report -s sym -i perf.inlining.data --inline --stdio -g function ... --38.86%--_start __libc_start_main main | --37.57%--std::norm<double> (inlined) std::_Norm_helper<true>::_S_do_it<double> (inlined) | --36.36%--std::abs<double> (inlined) std::__complex_abs (inlined) | --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) ~~~~~ Note that this backtrace representation is completely bogus. Complex abs does not call the linear congruential engine! It is just a side-effect of a longer inlined stack being appended to a shorter, different inlined stack, both of which originate in the same function (main). This patch fixes the issue: ~~~~~ $ perf report -s sym -i perf.inlining.data --inline --stdio -g function ... --38.86%--_start __libc_start_main main | |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | | | --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) | std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) | | | --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) | std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) | std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) | --1.99%--std::norm<double> (inlined) std::_Norm_helper<true>::_S_do_it<double> (inlined) std::abs<double> (inlined) std::__complex_abs (inlined) ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com Cc: Arnaldo Carvalho de Melo <acme@redhat.com> [ Fix up conflict with c1fbc0cf81f1 ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf script: Mark inlined frames and do not print DSO for them  (Milian Wolff)
Instead of showing the (repeated) DSO name of the non-inlined frame, we now show the "(inlined)" suffix instead. Before: 214f7 __hypot_finite (/usr/lib/libm-2.25.so) ace3 hypot (/usr/lib/libm-2.25.so) a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining) a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining) a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining) a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining) a4a main (/home/milian/projects/src/perf-tests/inlining) 20510 __libc_start_main (/usr/lib/libc-2.25.so) bd9 _start (/home/milian/projects/src/perf-tests/inlining) After: 214f7 __hypot_finite (/usr/lib/libm-2.25.so) ace3 hypot (/usr/lib/libm-2.25.so) a4a std::__complex_abs (inlined) a4a std::abs<double> (inlined) a4a std::_Norm_helper<true>::_S_do_it<double> (inlined) a4a std::norm<double> (inlined) a4a main (/home/milian/projects/src/perf-tests/inlining) 20510 __libc_start_main (/usr/lib/libc-2.25.so) bd9 _start (/home/milian/projects/src/perf-tests/inlining) Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-9-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf callchain: Mark inlined frames in output by " (inlined)" suffix  (Milian Wolff)
The original patch that introduced inline frame output in the various browsers used this suffix already. The new centralized approach that uses fake symbols for inlined frames was missing this approach so far. Instead of changing the symbol name itself, we only print the suffix where needed. This allows us to efficiently lookup the symbol for a given name without first having to append the suffix before the lookup. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf report: Fall-back to function name comparison for -g srcline  (Milian Wolff)
When a callchain entry has no srcline available, we ended up comparing the instruction pointer. I consider this to be not too useful. Rather, I think we should group the entries by function name, which this patch adds. For people who want to split the data on the IP boundary, using `-g address` is the correct choice. Before: ~~~~~ 100.00% 38.86% [.] main | |--61.14%--main inlining.cpp:14 | std::norm<double> complex:664 | std::_Norm_helper<true>::_S_do_it<double> complex:654 | std::abs<double> complex:597 | std::__complex_abs complex:589 | | | |--56.03%--hypot | | | | | |--8.45%--__hypot_finite | | | | | |--7.62%--__hypot_finite | | | | | |--2.29%--__hypot_finite | | | | | |--2.24%--__hypot_finite | | | | | |--2.06%--__hypot_finite | | | | | |--1.81%--__hypot_finite ... ~~~~~ After: ~~~~~ 100.00% 38.86% [.] main | |--61.14%--main inlining.cpp:14 | std::norm<double> complex:664 | std::_Norm_helper<true>::_S_do_it<double> complex:654 | std::abs<double> complex:597 | std::__complex_abs complex:589 | | | |--60.29%--hypot | | | | | --56.03%--__hypot_finite | | | --0.85%--cabs ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf callchain: Create real callchain entries for inlined frames  (Milian Wolff)
The inline_node structs are maintained by the new dso->inlines tree. This in turn keeps ownership of the fake symbols and srcline string representing an inline frame. This tree is sorted by address to allow quick lookups. All other entries of the symbol beside the function name are unused for inline frames. The advantage of this approach is that all existing users of the callchain API can now transparently display inlined frames without having to patch their code. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-6-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf callchain: Refactor inline_list to store srcline string directly  (Milian Wolff)
This is a preparation for the creation of real callchain entries for inlined frames. The rest of the perf code uses the srcline string. As such, using that also for the srcline API allows us to simplify some of the upcoming code. Most notably, it will allow us to cache the srcline for a given inline node and reuse it for different callchain entries. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-5-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf callchain: Refactor inline_list to operate on symbols  (Milian Wolff)
This is a requirement to create real callchain entries for inlined frames. Since the list of inlines usually contains the target symbol too, i.e. the location where the frames get inlined to, we alias that symbol and reuse it as-is. This ensures that other dependent functionality keeps working, most notably annotation of the target frames. For all other entries in the inline_list, a fake symbol is created. These are marked by the new 'inlined' member, which is set to true. Only those symbols are managed by the inline_list and get freed when the inline_list is deleted from within inline_node__delete.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20171009203310.17362-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf callchain: Store srcline in callchain_cursor_node  (Milian Wolff)
This is mostly a preparation to enable the creation of full callchain nodes for inline frames. Such frames will reference the IP of the non-inlined frame, but hold the symbol and srcline for an inlined location. As such, we won't be able to query the srcline on-demand based on the IP alone. Instead, we will leverage the functionality provided by this patch here, and store the srcline for the inlined nodes in the new srcline member of callchain_cursor_node. Note that this patch on its own leaks the srcline, as there is no free_callchain_cursor_node or similar. A future patch will add caching of the srcline and handle deletion properly. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf report: Remove code to handle inline frames from browsers  (Milian Wolff)
The follow-up commits will make inline frames first-class citizens in the callchain, thereby obsoleting all of this special code. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-2-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24  perf/x86/intel/bts: Fix exclusive event reference leak  (Alexander Shishkin)
Commit: d2878d642a4ed ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems") ... adds a privilege check in exactly the wrong place in the event init path: after the 'LBR exclusive' reference has been taken, and doesn't release it in the case of insufficient privileges. After that, nobody in the system gets to use PT or LBR. This patch moves the privilege check to where it should have been in the first place.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d2878d642a4ed ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems")
Link: http://lkml.kernel.org/r/20171023123533.16973-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-24  Merge tag 'perf-core-for-mingo-4.15-20171023' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core  (Ingo Molnar)
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
 - Update vendor events JSON metrics for Intel's Broadwell, Broadwell Server, Haswell, Haswell Server, IvyBridge, IvyTown, JakeTown, Sandy Bridge, Skylake and SkyLake Server (Andi Kleen)
 - Add vendor event file for Intel's Goldmont Plus V1 (Kan Liang)
 - Move perf_mmap methods from 'perf record' and evlist.c to a separate mmap.[ch] pair, to better separate things and pave the way for further work on multithreading tools (Arnaldo Carvalho de Melo)
 - Do not check ABI headers in a detached tarball build, as the kernel headers from where we copied tools/include/ are by definition not available (Arnaldo Carvalho de Melo)
 - Make 'perf script' use fprintf() like printing, i.e. receiving a FILE pointer, so that it gets consistent with other tools/ code and allows for printing to per-event files (Arnaldo Carvalho de Melo)
 - Error handling fixes (resource release on exit) for 'perf script' and 'perf kmem' (Christophe JAILLET)
 - Make some 'perf event attr' tests optional on virtual machines, where tested counters are not available (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-23  perf vendor events: Add Goldmont Plus V1 event file  (Kan Liang)
Add an Intel event file for perf.
Signed-off-by: Kan Liang <Kan.liang@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1508331907-395162-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23  perf namespaces: Add more appropriate set of headers  (Arnaldo Carvalho de Melo)
We don't need perf.h, that is a kitchen sink, all we need is perf_events.h for perf_ns_link_info, sys/types.h for pid_t and linux/types.h for u64, list_head. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Zhijian <lizhijian@cn.fujitsu.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-f2uxyaj4s2hmntkrezpa6dqz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23  perf kmem: Perform some cleanup if '--time' is given an invalid value  (Christophe JAILLET)
If the string passed in '--time' is invalid, we must do some cleanup before leaving. As in the other error handling paths of this function. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Fixes: 2a865bd8dddd ("perf kmem: Add option to specify time window of interest") Link: http://lkml.kernel.org/r/20170916060936.28199-1-christophe.jaillet@wanadoo.fr Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
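A hedged sketch of the error-path pattern the fix applies (labels and calls are illustrative of builtin-kmem.c, not copied from the patch): bail out through the common cleanup label, like the other error paths do.

~~~~~
/* inside cmd_kmem(), after the session has been created (sketch): */
if (perf_time__parse_str(&ptime, time_str) != 0) {
        pr_err("Invalid time string\n");
        err = -EINVAL;
        goto out_delete;                /* do not just return here */
}
/* ... */
out_delete:
        perf_session__delete(session);  /* same cleanup as the other error paths */
        return err;
~~~~~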
2017-10-23  perf script: Fix error handling path  (Christophe JAILLET)
If the string passed in '--time' is invalid, or if failed to set libtraceevent function resolver, we must do some cleanup before leaving. As in the other error handling paths of this function. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170916062537.28921-1-christophe.jaillet@wanadoo.fr Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23  perf script: Use fprintf like printing uniformly  (Arnaldo Carvalho de Melo)
We've been mixing print() with fprintf() style printing for a while, but now we need to use fprintf() like syntax uniformly as a preparatory patch for supporting printing to different files, one per event. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lkml.kernel.org/n/tip-kv5z3v8ptfghbarv3a9usvin@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23  perf tools: Introduce binary__fprintf()  (Arnaldo Carvalho de Melo)
Factor this out of print_binary(), but receive a FILE pointer and expect the printer to be an fprintf-like function, i.e. one that receives a FILE pointer and returns the number of characters printed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lkml.kernel.org/n/tip-6oqnxr6lmgqe6q6p3iugnscx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
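A hedged sketch of what such an fprintf-like printing interface looks like (the typedef and parameter names are assumptions): the callback receives the FILE pointer and returns the number of characters printed, so callers can direct output to per-event files.

~~~~~
typedef int (*binary__fprintf_t)(enum binary_printer_ops op,
                                 unsigned int val, void *extra, FILE *fp);

size_t binary__fprintf(unsigned char *data, size_t len,
                       size_t bytes_per_line,
                       binary__fprintf_t printer, void *extra, FILE *fp);
~~~~~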
2017-10-23  perf vendor events: Fix incorrect cmask syntax for some Intel metrics  (Andi Kleen)
Some of the metrics use an incorrect syntax for specifying the cmask for an event. Convert to perf syntax so that they can be resolved. Fixes metrics on Broadwell, SandyBridge. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/n/tip-3k3fkfj8obek9dkmryyrqzhu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23  perf tools: Do not check ABI headers in a detached tarball build  (Arnaldo Carvalho de Melo)
When we use one of:

  [acme@jouet linux]$ make help | grep perf
    perf-tar-src-pkg    - Build perf-4.14.0-rc3.tar source tarball
    perf-targz-src-pkg  - Build perf-4.14.0-rc3.tar.gz source tarball
    perf-tarbz2-src-pkg - Build perf-4.14.0-rc3.tar.bz2 source tarball
    perf-tarxz-src-pkg  - Build perf-4.14.0-rc3.tar.xz source tarball
  [acme@jouet linux]$

I.e. when we create a detached tarball to build perf outside the enveloping kernel sources (from a kernel tarball or a checked out linux.git directory), we by definition can't check for differences among the tools/{include,arch}, etc. files we originally copied from the kernel, so bail out in that case, to avoid warnings when doing the detached builds.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-vbrga0mhplv7niwxr3ghjyxv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-23  Merge tag 'platform-drivers-x86-v4.14-3' of git://git.infradead.org/linux-platform-drivers-x86  (Linus Torvalds)
Pull x86 platform driver fixes from Darren Hart:
 "Use a spin_lock instead of mutex in atomic context. The devm_ fix is a dependency.

  Summary:

  intel_pmc_ipc:
   - Use spin_lock to protect GCR updates
   - Use devm_* calls in driver probe function"

* tag 'platform-drivers-x86-v4.14-3' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: intel_pmc_ipc: Use spin_lock to protect GCR updates
  platform/x86: intel_pmc_ipc: Use devm_* calls in driver probe function
2017-10-23  platform/x86: intel_pmc_ipc: Use spin_lock to protect GCR updates  (Kuppuswamy Sathyanarayanan)
Currently, the update_no_reboot_bit() function implemented in this driver uses mutex_lock() to protect its register updates. But this function is called in atomic context in the iTCO_wdt_start() and iTCO_wdt_stop() functions of the iTCO_wdt.c driver, which in turn causes a "sleeping in atomic context" issue. This patch fixes the issue by replacing the mutex_lock() with spin_lock() to protect the GCR read/write/update APIs.
Fixes: 9d855d4 ("platform/x86: intel_pmc_ipc: Fix iTCO_wdt GCS memory mapping failure")
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kupuswamy@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
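A simplified sketch of the change described above (the register names and the update helper are illustrative): a spinlock can be taken from the atomic iTCO_wdt start/stop paths, while a mutex cannot.

~~~~~
static int update_no_reboot_bit(void *priv, bool set)
{
        u32 value = set ? PMC_CFG_NO_REBOOT_EN : 0;
        int ret;

        spin_lock(&ipcdev.gcr_lock);            /* was: mutex_lock(&ipcdev.gcr_lock) */
        ret = update_gcr_reg(PMC_GCR_PMC_CFG_REG,
                             PMC_CFG_NO_REBOOT_MASK, value);
        spin_unlock(&ipcdev.gcr_lock);          /* was: mutex_unlock() */

        return ret;
}
~~~~~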