summaryrefslogtreecommitdiff
path: root/tools/perf/Documentation
AgeCommit message (Collapse)Author
2015-10-05perf callchain: Switch default to 'graph,0.5,caller'Arnaldo Carvalho de Melo
Which is the most common default found in other similar tools. Requested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chandler Carruth <chandlerc@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://www.youtube.com/watch?v=nXaxk27zwlk Link: http://lkml.kernel.org/n/tip-v8lq36aispvdwgxdmt9p9jd9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05perf tools: Handle -h and -v optionsJiri Olsa
Adding handling for '-h' and '-v' options to invoke help and version command respectively. Current behaviour is: $ perf -v Unknown option: -v Usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] $ perf -h Unknown option: -h Usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] New behaviour: $ perf -h usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS] The most commonly used perf commands are: annotate Read perf.data (created by perf record) and display annotated code archive Create archive with object files with build-ids found in perf.data file bench General framework for benchmark suites ... $ perf -v perf version 4.3.rc3.gc99e32 Updated man page. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1444068369-20978-10-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05perf tools: Introduce 'P' modifier to request max precisionJiri Olsa
The 'P' will cause the event to get maximum possible detected precise level. Following record: $ perf record -e cycles:P ... will detect maximum precise level for 'cycles' event and use it. Commiter note: Testing it: $ perf record -e cycles:P usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.013 MB perf.data (9 samples) ] $ perf evlist cycles:P $ perf evlist -v cycles:P: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1, mmap2: 1, comm_exec: 1 $ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1444068369-20978-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-02perf stat: Reduce min --interval-print to 10msKan Liang
The --interval-print parameter was limited to 100ms. However, for example, 10ms is required to do sophisticated bandwidth analysis using uncore events. The test shows that the overhead of the system-wide uncore monitoring with 10ms interval is only ~2%. So this patch reduces the minimal interval-print allowd to 10ms. But 10ms may not work well for all cases. For example, when the cpus/threads number is very large, for system-wide core event monitoring the overhead could be high. To handle this issue, a warning will be displayed when the interval-print is set between 10ms to 100ms. So users can make a decision according to their specific cases. # perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 10 -- sleep 1 print interval < 100ms. The overhead percentage could be high in some cases. Please proceed with caution. # time counts unit events 0.010200451 0.10 MiB uncore_imc_1/cas_count_read/ 0.020475117 0.02 MiB uncore_imc_1/cas_count_read/ 0.030692800 0.01 MiB uncore_imc_1/cas_count_read/ 0.040948161 0.02 MiB uncore_imc_1/cas_count_read/ 0.051159564 0.00 MiB uncore_imc_1/cas_count_read/ Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1443776674-42511-1-git-send-email-kan.liang@intel.com [ Added warning about overhead when using sub 100ms intervals to the man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-01perf list: Do event name substring search as last resort when no events foundArnaldo Carvalho de Melo
Before: # perf list _alloc_ | head -10 # After: # perf list _alloc_ | head -10 ext4:ext4_alloc_da_blocks [Tracepoint event] ext4:ext4_get_implied_cluster_alloc_exit [Tracepoint event] kmem:kmem_cache_alloc_node [Tracepoint event] kmem:mm_page_alloc_extfrag [Tracepoint event] kmem:mm_page_alloc_zone_locked [Tracepoint event] xen:xen_mmu_alloc_ptpage [Tracepoint event] # And it works for all types of events: # perf list br List of pre-defined events (to be used in -e): branch-instructions OR branches [Hardware event] branch-misses [Hardware event] branch-load-misses [Hardware cache event] branch-loads [Hardware cache event] branch-instructions OR cpu/branch-instructions/ [Kernel PMU event] branch-misses OR cpu/branch-misses/ [Kernel PMU event] filelock:break_lease_block [Tracepoint event] filelock:break_lease_noblock [Tracepoint event] filelock:break_lease_unblock [Tracepoint event] syscalls:sys_enter_brk [Tracepoint event] syscalls:sys_exit_brk [Tracepoint event] # Suggested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-qieivl18jdemoaghgndj36e6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30perf report: Amend documentation about max_stack and synthesized callchainsAdrian Hunter
The --max_stack option was added as an optimization to reduce processing time, so people specifying --max-stack might get a increased processing time if combined with synthesized callchains, but otherwise no real harm. A warning about setting both --max_stack and the synthesized callchains max depth seems like overkill. Amend the documentation. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/560A5155.4060105@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf intel-pt: Add mispred-all config option to aid use with autofdoAdrian Hunter
autofdo incorrectly expects branch flags to include either mispred or predicted. In fact mispred = predicted = 0 is valid and means the flags are not supported, which they aren't by Intel PT. To make autofdo work, add a config option which will cause Intel PT decoder to set the mispred flag on all branches. Below is an example of using Intel PT with autofdo. The example is also added to the Intel PT documentation. It requires autofdo (https://github.com/google/autofdo) and gcc version 5. The bubble sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial) amended to take the number of elements as a parameter. $ gcc-5 -O3 sort.c -o sort_optimized $ ./sort_optimized 30000 Bubble sorting array of 30000 elements 2254 ms $ cat ~/.perfconfig [intel-pt] mispred-all $ perf record -e intel_pt//u ./sort 3000 Bubble sorting array of 3000 elements 58 ms [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 3.939 MB perf.data ] $ perf inject -i perf.data -o inj --itrace=i100usle --strip $ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1 $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo $ ./sort_autofdo 30000 Bubble sorting array of 30000 elements 2155 ms Note there is currently no advantage to using Intel PT instead of LBR, but that may change in the future if greater use is made of the data. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-26-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf inject: Add --strip option to strip out non-synthesized eventsAdrian Hunter
Add a new option --strip which is used with --itrace to strip out non-synthesized events. This results in a perf.data file that is simpler for external tools to parse. In particular, this can be used to prepare a perf.data file for consumption by autofdo. A subsequent patch makes a change to Intel PT also to enable use with autofdo and gives an example of that use. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-25-git-send-email-adrian.hunter@intel.com [ Made it use perf_evlist__remove() + perf_evsel__delete() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf intel-pt: Support generating branch stackAdrian Hunter
Add support for generating branch stack context for PT samples. The decoder reports a configurable number of branches as branch context for each sample. Internally it keeps track of them by using a simple sliding window. We also flush the last branch buffer on each sample to avoid overlapping intervals. This is useful for: - Reporting accurate basic block edge frequencies through the perf report branch view - Using with --branch-history to get the wider context of samples - Other users of LBRs Also the Documentation is updated. Examples: Record with Intel PT: perf record -e intel_pt//u ls Branch stacks are used by default if synthesized so: perf report --itrace=ile is the same as: perf report --itrace=ile -b Branch history can be requested also: perf report --itrace=igle --branch-history Based-on-patch-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-15-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf auxtrace: Add option to synthesize branch stacks on samplesAdrian Hunter
Add AUX area tracing option 'l' to synthesize branch stacks on samples just like sample type PERF_SAMPLE_BRANCH_STACK. This is taken into use by Intel PT in a subsequent patch. Based-on-patch-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-9-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf script: Allow time to be displayed in nanosecondsAdrian Hunter
Add option --ns to display time to 9 decimal places. That is useful in some cases, for example when using Intel PT cycle accurate mode. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-6-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28perf auxtrace: Fix 'instructions' period of zeroAdrian Hunter
Instruction tracing options (i.e. --itrace) include an option for sampling instructions at an arbitrary period. e.g. --itrace=i10us means make an 'instructions' sample for every 10us of trace. Currently the logic does not distinguish between a period of zero and no period being specified at all, so it gets treated as the default period which is 100000. That doesn't really make sense. Fix it so that zero period is accepted and treated as meaning "as often as possible". In the case of Intel PT that is the same as a period of 1 and a unit of 'instructions' (i.e. --itrace=i1i). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443186956-18718-2-git-send-email-adrian.hunter@intel.com [ Add a few lines describing this in the Documentation/intel-pt.txt file ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28Merge branch 'linus' into perf/core, to pick up fixes before applying new ↵Ingo Molnar
changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-25perf intel-pt: Remove no_force_psb from documentationAdrian Hunter
no_force_psb was dropped as a late change to the kernel driver. Consequently, remove it from the documentation. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1443089122-19082-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-14perf report: Introduce --socket-filter optionKan Liang
Introduce --socket-filter option for 'perf report' to only show entries for a processor socket that match this filter. $ perf report --socket-filter 1 --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 752 of event 'cycles' # Event count (approx.): 350995599 # Processor Socket: 1 # # Overhead Command Shared Object Symbol # ........ ......... ................ ................................. # 97.02% test test [.] plusB_c 0.97% test test [.] plusA_c 0.23% swapper [kernel.vmlinux] [k] acpi_idle_do_entry 0.09% rcu_sched [kernel.vmlinux] [k] dyntick_save_progress_counter 0.01% swapper [kernel.vmlinux] [k] task_waking_fair 0.00% swapper [kernel.vmlinux] [k] run_timer_softirq Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1441377946-44429-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-14perf tools: Introduce new sort type "socket" for the processor socketKan Liang
This patch enable perf report to sort by processor socket: $ perf report --stdio --sort socket,comm,dso,symbol # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 686 of event 'cycles' # Event count (approx.): 349215462 # # Overhead SOCKET Command Shared Object Symbol # ........ ...... ....... ................ ............................ # 97.05% 000 test test [.] plusB_c 0.98% 000 test test [.] plusA_c 0.93% 001 perf [kernel.vmlinux] [k] smp_call_function_single 0.19% 001 perf [kernel.vmlinux] [k] page_fault 0.19% 001 swapper [kernel.vmlinux] [k] pm_qos_request 0.16% 000 test [kernel.vmlinux] [k] add_mm_counter_fast Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1441377946-44429-2-git-send-email-kan.liang@intel.com [ Fix col calc, un-allcapsify col header & read the topology when not using perf.data ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-31perf record: Add ability to name registers to recordStephane Eranian
This patch modifies the -I/--int-regs option to enablepassing the name of the registers to sample on interrupt. Registers can be specified by their symbolic names. For instance on x86, --intr-regs=ax,si. The motivation is to reduce the size of the perf.data file and the overhead of sampling by only collecting the registers useful to a specific analysis. For instance, for value profiling, sampling only the registers used to passed arguements to functions. With no parameter, the --intr-regs still records all possible registers based on the architecture. To name registers, it is necessary to use the long form of the option, i.e., --intr-regs: $ perf record --intr-regs=si,di,r8,r9 ..... To record any possible registers: $ perf record -I ..... $ perf report --intr-regs ... To display the register, one can use perf report -D To list the available registers: $ perf record --intr-regs=\? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 Signed-off-by: Stephane Eranian <eranian@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1441039273-16260-4-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-31perf script: Enable printing of interrupted machine stateStephane Eranian
This patch adds the output of the interrupted machine state (iregs) to perf script. It presents them as NAME:VALUE so this is easy to parse during post processing. To capture the interrupted machine state: $ perf record -I .... to display iregs, use the -F option: $ perf script -F ip,iregs 40afc2 AX:0x6c5770 BX:0x1e CX:0x5f4d80a DX:0x101010101010101 SI:0x1 Signed-off-by: Stephane Eranian <eranian@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1441039273-16260-2-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-28perf script: Add --[no-]-demangle/--[no-]-demangle-kernelMark Drayton
Sometimes when post-processing output from `perf script` one does not want to demangle C++ symbol names. Add an option to allow this. Also add --[no-]demangle-kernel to be consistent with top/report/probe. Signed-off-by: Mark Drayton <mbd@fb.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1440616695-32340-1-git-send-email-scientist@fb.com Signed-off-by: Yannick Brosseau <scientist@fb.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-24perf tools: Update Intel PT documentationAdrian Hunter
Update Intel PT documentation to describe new features. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1437150840-31811-26-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-21perf tools: Put itrace options into an asciidoc includeAdrian Hunter
perf script, report and inject all have the same itrace options. Put them into an asciidoc include file. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1437150840-31811-10-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-21perf tools: Add Intel BTS supportAdrian Hunter
Intel BTS support fits within the new auxtrace infrastructure. Recording is supporting by identifying the Intel BTS PMU, parsing options and setting up events. Decoding is supported by queuing up trace data by thread and then decoding synchronously delivering synthesized event samples into the session processing for tools to consume. Committer note: E.g: [root@felicio ~]# perf record --per-thread -e intel_bts// ls anaconda-ks.cfg apctest.output bin kernel-rt-3.10.0-298.rt56.171.el7.x86_64.rpm libexec lock_page.bpf.c perf.data perf.data.old [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 4.367 MB perf.data ] [root@felicio ~]# perf evlist -v intel_bts//: type: 6, size: 112, { sample_period, sample_freq }: 1, sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1 dummy:u: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 1, sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1 [root@felicio ~]# perf script # the navigate in the pager to some interesting place: ls 1843 1 branches: ffffffff810a60cb flush_signal_handlers ([kernel.kallsyms]) => ffffffff8121a522 setup_new_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8121a529 setup_new_exec ([kernel.kallsyms]) => ffffffff8122fa30 do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122fa5d do_close_on_exec ([kernel.kallsyms]) => ffffffff81767ae0 _raw_spin_lock ([kernel.kallsyms]) ls 1843 1 branches: ffffffff81767af4 _raw_spin_lock ([kernel.kallsyms]) => ffffffff8122fa62 do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122fac9 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fad2 do_close_on_exec ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8122fadd do_close_on_exec ([kernel.kallsyms]) => ffffffff8120fc80 filp_close ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8120fcaf filp_close ([kernel.kallsyms]) => ffffffff8120fcb6 filp_close ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8120fcc2 filp_close ([kernel.kallsyms]) => ffffffff812547f0 dnotify_flush ([kernel.kallsyms]) ls 1843 1 branches: ffffffff81254823 dnotify_flush ([kernel.kallsyms]) => ffffffff8120fcc7 filp_close ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8120fccd filp_close ([kernel.kallsyms]) => ffffffff81261790 locks_remove_posix ([kernel.kallsyms]) ls 1843 1 branches: ffffffff812617a3 locks_remove_posix ([kernel.kallsyms]) => ffffffff812617b9 locks_remove_posix ([kernel.kallsyms]) ls 1843 1 branches: ffffffff812617b9 locks_remove_posix ([kernel.kallsyms]) => ffffffff8120fcd2 filp_close ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8120fcd5 filp_close ([kernel.kallsyms]) => ffffffff812142c0 fput ([kernel.kallsyms]) ls 1843 1 branches: ffffffff812142d6 fput ([kernel.kallsyms]) => ffffffff812142df fput ([kernel.kallsyms]) ls 1843 1 branches: ffffffff8121430c fput ([kernel.kallsyms]) => ffffffff810b6580 task_work_add ([kernel.kallsyms]) ls 1843 1 branches: ffffffff810b65ad task_work_add ([kernel.kallsyms]) => ffffffff810b65b1 task_work_add ([kernel.kallsyms]) ls 1843 1 branches: ffffffff810b65c1 task_work_add ([kernel.kallsyms]) => ffffffff810bc710 kick_process ([kernel.kallsyms]) ls 1843 1 branches: ffffffff810bc725 kick_process ([kernel.kallsyms]) => ffffffff810bc742 kick_process ([kernel.kallsyms]) ls 1843 1 branches: ffffffff810bc742 kick_process ([kernel.kallsyms]) => ffffffff810b65c6 task_work_add ([kernel.kallsyms]) ls 1843 1 branches: ffffffff810b65c9 task_work_add ([kernel.kallsyms]) => ffffffff81214311 fput ([kernel.kallsyms]) Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1437150840-31811-9-git-send-email-adrian.hunter@intel.com [ Merged sample->time fix for bug found after first round of testing on slightly older kernel ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-17perf tools: Take Intel PT into useAdrian Hunter
To record an AUX area, the weak function auxtrace_record__init() must be implemented. Equally to decode an AUX area, the AUX area tracing type must be added to the perf_event__process_auxtrace_info() function. This patch makes those two changes plus hooks up default config for the intel_pt PMU. Also some brief documentation is provided for using the tools with intel_pt. Commiter note: E.g: [root@perf4 ~]# dmesg 451 [0.405807] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver. [root@perf4 ~]# perf --version perf version 4.1.g53874a [root@perf4 ~]# perf record -e intel_pt//u -a sleep 10 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.383 MB perf.data ] [root@perf4 ~]# perf evlist intel_pt//u sched:sched_switch dummy:u [root@perf4 ~]# perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 0 of event 'intel_pt//u' # Event count (approx.): 0 # # Overhead Command Shared Object Symbol # ........ ....... ............. ...... # # Samples: 393 of event 'sched:sched_switch' # Event count (approx.): 393 # # Overhead Command Shared Object Symbol # ........ .............. ................ .............. 49.62% swapper [kernel.vmlinux] [k] __schedule 10.69% rcu_sched [kernel.vmlinux] [k] __schedule 6.62% rcuos/0 [kernel.vmlinux] [k] __schedule 5.60% kworker/0:1 [kernel.vmlinux] [k] __schedule 3.56% rcuos/3 [kernel.vmlinux] [k] __schedule 3.05% kworker/u384:2 [kernel.vmlinux] [k] __schedule 2.54% kworker/2:0 [kernel.vmlinux] [k] __schedule 2.54% tuned [kernel.vmlinux] [k] __schedule <SNIP> # Samples: 0 of event 'dummy:u' # Event count (approx.): 0 # # Overhead Command Shared Object Symbol # ........ ....... ............. ...... # Samples: 28 of event 'instructions:u' # Event count (approx.): 5030172 # # Overhead Command Shared Object Symbol # ........ .......... ................... ................................ # 21.43% tuned libpython2.7.so.1.0 [.] PyEval_EvalFrameEx | ---PyEval_EvalFrameEx | |--83.33%-- PyEval_EvalCodeEx | PyEval_EvalFrameEx | | | |--60.00%-- PyEval_EvalCodeEx | | PyEval_EvalFrameEx | | PyEval_EvalFrameEx | | | --40.00%-- PyEval_EvalFrameEx | --16.67%-- PyEval_EvalFrameEx PyEval_EvalCodeEx PyEval_EvalFrameEx PyEval_EvalCodeEx PyEval_EvalFrameEx PyEval_EvalFrameEx 14.29% tuned libpython2.7.so.1.0 [.] _PyType_Lookup | ---_PyType_Lookup _PyObject_GenericGetAttrWithDict PyEval_EvalFrameEx PyEval_EvalCodeEx PyEval_EvalFrameEx PyEval_EvalCodeEx PyEval_EvalFrameEx | |--75.00%-- PyEval_EvalFrameEx | --25.00%-- PyEval_EvalCodeEx PyEval_EvalFrameEx PyEval_EvalFrameEx 3.57% irqbalance irqbalance [.] 0x0000000000004038 | ---0x4038 0x4761 0x4761 0x4761 0x49f1 0x2295 3.57% irqbalance libc-2.17.so [.] __GI_____strtoull_l_internal | ---__GI_____strtoull_l_internal 0x6f49 0x229a 3.57% irqbalance libc-2.17.so [.] __strchrnul | ---__strchrnul vfprintf __vsprintf_chk __sprintf_chk 0x2724 0x4038 0x2331 3.57% irqbalance libc-2.17.so [.] __strstr_sse42 | ---__strstr_sse42 0x71e0 0x229f # And now to some userspace ftrace on uninstrumented binaries 8-) : # Hand edited to make it a bit more compact, replacing /home/acme/bin/perf # with /bin/perf: [root@perf4 ~]# perf script perf 8921 [3] 7.310889: 1 branches:u: 0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310889: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310889: 1 branches:u: 481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310889: 1 branches:u: 481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310889: 1 branches:u: 4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310889: 1 branches:u: 481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310889: 1 branches:u: 481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf) perf 8921 [3] 7.310889: 1 branches:u: 41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310889: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown]) perf 8921 [3] 7.310890: 1 branches:u: 0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310890: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310890: 1 branches:u: 481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310890: 1 branches:u: 481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310890: 1 branches:u: 481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf) perf 8921 [3] 7.310890: 1 branches:u: 41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310890: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown]) perf 8921 [3] 7.310893: 1 branches:u: 0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310893: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310893: 1 branches:u: 4816a8 perf_evlist__enable (/bin/perf) => 4815f8 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310893: 1 branches:u: 4815fe perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310893: 1 branches:u: 481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310893: 1 branches:u: 481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf) perf 8921 [3] 7.310893: 1 branches:u: 41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310893: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown]) perf 8921 [3] 7.310956: 1 branches:u: 0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310956: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310956: 1 branches:u: 481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310956: 1 branches:u: 481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310956: 1 branches:u: 4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310956: 1 branches:u: 481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310956: 1 branches:u: 481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf) perf 8921 [3] 7.310956: 1 branches:u: 41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310956: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown]) perf 8921 [3] 7.310961: 1 branches:u: 0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310961: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310961: 1 branches:u: 481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310961: 1 branches:u: 481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310961: 1 branches:u: 481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf) perf 8921 [3] 7.310961: 1 branches:u: 41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310961: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown]) perf 8921 [3] 7.310968: 1 branches:u: 0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310968: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310968: 1 branches:u: 4816a8 perf_evlist__enable (/bin/perf) => 4815f8 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310968: 1 branches:u: 4815fe perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310968: 1 branches:u: 481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.310968: 1 branches:u: 481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf) perf 8921 [3] 7.310968: 1 branches:u: 41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.310968: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown]) perf 8921 [3] 7.311040: 1 branches:u: 0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.311040: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.311040: 1 branches:u: 481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.311040: 1 branches:u: 481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.311040: 1 branches:u: 4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.311040: 1 branches:u: 481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.311040: 1 branches:u: 481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf) perf 8921 [3] 7.311040: 1 branches:u: 41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.311040: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown]) perf 8921 [3] 7.311046: 1 branches:u: 0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.311046: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.311046: 1 branches:u: 481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf) perf 8921 [3] 7.311046: 1 branches:u: 481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf) perf 8921 [3] 7.311046: 1 branches:u: 481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf) perf 8921 [3] 7.311046: 1 branches:u: 41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.311046: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown]) perf 8921 [3] 7.311050: 1 branches:u: 0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so) perf 8921 [3] 7.311050: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf) : Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1437150840-31811-8-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-12perf report: Show call graph from reference eventsKan Liang
Introduce --show-ref-call-graph for perf report to print reference callgraph for no callgraph event. Here is an example. perf report --show-ref-call-graph --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 5 of event 'cpu/cpu-cycles,call-graph=fp/' # Event count (approx.): 144985 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................ ........................................ # 72.30% 0.00% sleep [kernel.vmlinux] [k] entry_SYSCALL_64_fastpath | ---entry_SYSCALL_64_fastpath | |--22.62%-- __GI___libc_nanosleep --77.38%-- [...] ...... # Samples: 6 of event 'cpu/instructions,call-graph=no/', show reference callgraph # Event count (approx.): 172780 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................ ........................................ # 73.16% 0.00% sleep [kernel.vmlinux] [k] entry_SYSCALL_64_fastpath | ---entry_SYSCALL_64_fastpath | |--31.44%-- __GI___libc_nanosleep --68.56%-- [...] Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1439289050-40510-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-12perf callchain: Allow disabling call graphs per eventKan Liang
This patch introduce "call-graph=no" to disable per-event callgraph. Here is an example. perf record -e 'cpu/cpu-cycles,call-graph=fp/,cpu/instructions,call-graph=no/' sleep 1 perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6 of event 'cpu/cpu-cycles,call-graph=fp/' # Event count (approx.): 774218 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................ ........................................ # 61.94% 0.00% sleep [kernel.vmlinux] [k] entry_SYSCALL_64_fastpath | ---entry_SYSCALL_64_fastpath | |--97.30%-- __brk | --2.70%-- mmap64 _dl_check_map_versions _dl_check_all_versions 61.94% 0.00% sleep [kernel.vmlinux] [k] perf_event_mmap | ---perf_event_mmap | |--97.30%-- do_brk | sys_brk | entry_SYSCALL_64_fastpath | __brk | --2.70%-- mmap_region do_mmap_pgoff vm_mmap_pgoff sys_mmap_pgoff sys_mmap entry_SYSCALL_64_fastpath mmap64 _dl_check_map_versions _dl_check_all_versions ...... # Samples: 6 of event 'cpu/instructions,call-graph=no/' # Event count (approx.): 359692 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................ ................................. # 89.03% 0.00% sleep [unknown] [.] 0xffff6598ffff6598 89.03% 0.00% sleep ld-2.17.so [.] _dl_resolve_conflicts 89.03% 0.00% sleep [kernel.vmlinux] [k] page_fault Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1439289050-40510-2-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-12perf callchain: Per-event type selection supportKan Liang
This patchkit adds the ability to set callgraph mode (fp, dwarf, lbr) per event. This in term can reduce sampling overhead and the size of the perf.data. Here is an example. perf record -e 'cpu/cpu-cycles,period=1000,call-graph=fp,time=1/,cpu/instructions,call-graph=lbr/' sleep 1 perf evlist -v cpu/cpu-cycles,period=1000,call-graph=fp,time=1/: type: 4, size: 112, config: 0x3c, { sample_period, sample_freq }: 1000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 cpu/instructions,call-graph=lbr/: type: 4, size: 112, config: 0xc0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1 Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1439289050-40510-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-10perf record: Support per-event freq termNamhyung Kim
Now perf can set per-event value of time and (sampling) period. But I guess most users like me just want to set frequency rather than period. So add the 'freq' term in the event parser. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1439102724-14079-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-10perf report: Add support for srcfile sort keyAndi Kleen
In some cases it's useful to characterize samples by file. This is useful to get a higher level categorization, for example to map cost to subsystems. Add a srcfile sort key to perf report. It builds on top of the existing srcline support. Commiter notes: E.g.: # perf record -F 10000 usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data (13 samples) ] [root@zoo ~]# perf report -s srcfile --stdio # Total Lost Samples: 0 # # Samples: 13 of event 'cycles' # Event count (approx.): 869878 # # Overhead Source File # ........ ........... 60.99% . 20.62% paravirt.h 14.23% rmap.c 4.04% signal.c 0.11% msr.h # The first line is collecting all the files for which srcfiles couldn't somehow get resolved to: # perf report -s srcfile,dso --stdio # Total Lost Samples: 0 # # Samples: 13 of event 'cycles' # Event count (approx.): 869878 # # Overhead Source File Shared Object # ........ ........... ................ 40.97% . ld-2.20.so 20.62% paravirt.h [kernel.vmlinux] 20.02% . libc-2.20.so 14.23% rmap.c [kernel.vmlinux] 4.04% signal.c [kernel.vmlinux] 0.11% msr.h [kernel.vmlinux] # XXX: Investigate why that is not resolving on Fedora 21, Andi says he hasn't seen this on Fedora 22. Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1438988064-21834-1-git-send-email-andi@firstfloor.org [ Added column length update, from 0e65bdb3f90f ('perf hists: Update the column width for the "srcline" sort key') ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-10perf tools: Support full source file paths for srclineAndi Kleen
For perf report/script srcline currently only the base file name of the source file is printed. This is a good default because it usually fits on the screen. But in some cases we want to know the full file name, for example to aggregate hits per file. In the later case we need more than the base file name to resolve file naming collisions: for example the kernel source has ~70 files named "core.c" It's also useful as input to post processing tools which want to point to the right file. Add a flag to allow full file name output. Add an option to perf report/script to enable this option. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1438986245-15191-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06perf top: Add branch annotation code to topAndi Kleen
Now that we can process branch data in annotate it makes sense to support enabling branch recording from top too. Most of the code needed for this is already in shared code with report. But we need to add: - The option parsing code (using shared code from the previous patch) - Document the options - Set up the IPC/cycles accounting state in the top session - Call the accounting code in the hist iter callback Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1437233094-12844-8-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-06perf tools: Add support for cycles, weight branch_info fieldAndi Kleen
cycles is a new branch_info field available on some CPUs that indicates the time deltas between branches in the LBR. Add a sort key and output code for the cycles to allow to display the basic block cycles individually in perf report. We also pass in the cycles for weight when LBRs are processed, which allows to get global and local weight, to get an estimate of the total cost. And also print the cycles information for perf report -D. I also added printing for the previously missing LBR flags (mispredict etc.) Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1437233094-12844-2-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-05perf tools: Per-event time supportKan Liang
This patchkit adds the ability to turn off time stamps per event. One usaful case for partial time is to work with per-event callgraph to enable "PEBS threshold > 1" (https://lkml.org/lkml/2015/5/10/196), which can significantly reduce the sampling overhead. The event samples with time stamps off will not be ordered. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1438677022-34296-2-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-29perf tools: Force period term to overload global settingsJiri Olsa
Currently the command line option settings beats the per event period settings: With no global settings, we get per-event configuration: $ perf record -e 'cpu/instructions,period=20000/' sleep 1 $ perf evlist -v ... { sample_period, sample_freq }: 20000 ... With 'c' option period setup, we get 'c' option value: $ perf record -e 'cpu/instructions,period=20000/' -c 1000 sleep 1 $ perf evlist -v ... { sample_period, sample_freq }: 1000 ... This patch makes the per-event settings overload the global 'c' option setup: $ perf record -e 'cpu/instructions,period=20000/' -c 1000 sleep 1 $ perf evlist -v ... { sample_period, sample_freq }: 20000 ... I think the making the per-event settings to overload any other config makes more sense than current state. However it breaks the current 'period' term handling, which might cause some noise.. so let's see ;-). Also fixing parse event tests with the new behaviour. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1438162936-59698-3-git-send-email-kan.liang@intel.com Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-23perf script: Add option --show-switch-eventsAdrian Hunter
Add option --show-switch-events to show switch events in a similar fashion to --show-task-events and --show-mmap-events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1437471846-26995-6-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-23perf record: Add option --switch-events to select PERF_RECORD_SWITCH eventsAdrian Hunter
Add an option to select PERF_RECORD_SWITCH events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1437471846-26995-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-20perf bench futex: Add lock_pi stresserDavidlohr Bueso
Allows a way of measuring low level kernel implementation of FUTEX_LOCK_PI and FUTEX_UNLOCK_PI. The program comes in two flavors: (i) single futex (default), all threads contend on the same uaddr. For the sake of the benchmark, we call into kernel space even when the lock is uncontended. The kernel will set it to TID, any waters that come in and contend for the pi futex will be handled respectively by the kernel. (ii) -M option for multiple futexes, each thread deals with its own futex. This is a trivial scenario and only measures kernel handling of 0->TID transition. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/1436259353.12255.78.camel@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-20perf record: Allow filtering perf's pid via --exclude-perfWang Nan
This patch allows 'perf record' to exclude events issued by perf itself by '--exclude-perf' option. Before this patch, when doing something like: # perf record -a -e syscalls:sys_enter_write <cmd> One could easily get result like this: # /tmp/perf report --stdio ... # Overhead Command Shared Object Symbol # ........ ....... .................. .................... # 99.99% perf libpthread-2.18.so [.] __write_nocancel 0.01% ls libc-2.18.so [.] write 0.01% sshd libc-2.18.so [.] write ... Where most events are generated by perf itself. A shell trick can be done to filter perf itself out: # cat << EOF > ./tmp > #!/bin/sh > exec perf record -e ... --filter="common_pid != \$\$" -a sleep 10 > EOF # chmod a+x ./tmp # ./tmp However, doing so is user unfriendly. This patch extracts evsel iteration framework introduced by patch 'perf record: Apply filter to all events in a glob matching' into foreach_evsel_in_last_glob(), and makes exclude_perf() function append new filter expression to each evsel selected by a '-e' selector. To avoid losing filters if user pass '--filter' after '--exclude-perf', this patch uses perf_evsel__append_filter() in both case, instead of perf_evsel__set_filter() which removes old filter. As a side effect, now it is possible to use multiple '--filter' option for one selector. They are combinded with '&&'. Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1436513770-8896-2-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-14perf record: Document setting '-e pmu/period=N/' in man pageKan Liang
The 'period' param is not defined in /sys/bus/event_sources/devices/<pmu>/format/*, but can be used, document it. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1436345097-11113-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26perf stat: Introduce --per-thread optionJiri Olsa
Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf tools: Configurable per thread proc map processing time outKan Liang
The time out to limit the individual proc map processing was hard code to 500ms. This patch introduce a new option --proc-map-timeout to make the time limit configurable. Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ying Huang <ying.huang@intel.com> Link: http://lkml.kernel.org/r/1434549071-25611-2-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-10perf record: Amend option summariesPeter Zijlstra
Because there's too many options and I cannot read, I frequently get confused between -c and -P, and try to do things like: perf record -P 50000 -- foo Which does not work; try and make the option description slightly longer and hopefully less confusing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150610144850.GP19282@twins.programming.kicks-ass.net [ Do those changes on the man page as well ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-12perf tools: Document relation of per-thread event count featureNamhyung Kim
The 'perf record -s' and 'perf report -T' should be used together to see per-thread event counts. Document the relation of these commands. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1431184784-30525-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-08perf probe: Add --no-inlines option to avoid searching inline functionsMasami Hiramatsu
Add --no-inlines(--inlines) option to avoid searching inline functions. Searching all functions which matches glob pattern can take a long time and find a lot of inline functions. With this option perf-probe searches target on the non-inlined functions. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150508010333.24812.86568.stgit@localhost.localdomain Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-08perf bench futex: Support parallel waker threadsDavidlohr Bueso
The futex-wake benchmark only measures wakeups done within a single process. While this has value in its own, it does not really generate any hb->lock contention. A new benchmark 'wake-parallel' is added, by extending the futex-wake code such that we can measure parallel waker threads. The program output shows the avg per-thread latency in order to complete its share of wakeups: Run summary [PID 13474]: blocking on 512 threads (at [private] futex 0xa88668), 8 threads waking up 64 at a time. [Run 1]: Avg per-thread latency (waking 64/512 threads) in 0.6230 ms (+-15.31%) [Run 2]: Avg per-thread latency (waking 64/512 threads) in 0.5175 ms (+-29.95%) [Run 3]: Avg per-thread latency (waking 64/512 threads) in 0.7578 ms (+-18.03%) [Run 4]: Avg per-thread latency (waking 64/512 threads) in 0.8944 ms (+-12.54%) [Run 5]: Avg per-thread latency (waking 64/512 threads) in 1.1204 ms (+-23.85%) Avg per-thread latency (waking 64/512 threads) in 0.7826 ms (+-9.91%) Naturally, different combinations of numbers of blocking and waker threads will exhibit different information. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Link: http://lkml.kernel.org/r/1431110280-20231-1-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-08perf probe: Support $params special probe argumentMasami Hiramatsu
$params is similar to $vars but matches only function parameters not local variables. Thus, this is useful for tracing function parameter changing or tracing function call with parameters. Testing it: # perf probe tcp_sendmsg '$params' Added new event: probe:tcp_sendmsg (on tcp_sendmsg with $params) You can now use it in all perf tools, such as: perf record -e probe:tcp_sendmsg -aR sleep 1 # perf probe -l probe:tcp_sendmsg (on tcp_sendmsg@acme/git/linux/net/ipv4/tcp.c with iocb sk msg size) # perf record -a -e probe:* press some random letters to generate TCP (sshd) traffic... ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.223 MB perf.data (6 samples) ] # perf script sshd 6385 [2] 3.907529: probe:tcp_sendmsg: iocb=0xffff8800ac4cfe70 sk=0xffff88042196c140 msg=0xffff8800ac4cfda8 size=0x24 sshd 6385 [2] 4.138973: probe:tcp_sendmsg: iocb=0xffff8800ac4cfe70 sk=0xffff88042196c140 msg=0xffff8800ac4cfda8 size=0x24 sshd 6385 [2] 4.378966: probe:tcp_sendmsg: iocb=0xffff8800ac4cfe70 sk=0xffff88042196c140 msg=0xffff8800ac4cfda8 size=0x24 sshd 6385 [2] 4.603681: probe:tcp_sendmsg: iocb=0xffff8800ac4cfe70 sk=0xffff88042196c140 msg=0xffff8800ac4cfda8 size=0x24 sshd 6385 [2] 4.818455: probe:tcp_sendmsg: iocb=0xffff8800ac4cfe70 sk=0xffff88042196c140 msg=0xffff8800ac4cfda8 size=0x24 sshd 6385 [2] 5.043603: probe:tcp_sendmsg: iocb=0xffff8800ac4cfe70 sk=0xffff88042196c140 msg=0xffff8800ac4cfda8 size=0x24 # cat /sys/kernel/debug/tracing/events/probe/tcp_sendmsg/format name: tcp_sendmsg ID: 1927 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned long __probe_ip; offset:8; size:8; signed:0; field:u64 iocb; offset:16; size:8; signed:0; field:u64 sk; offset:24; size:8; signed:0; field:u64 msg; offset:32; size:8; signed:0; field:u64 size; offset:40; size:8; signed:0; print fmt: "(%lx) iocb=0x%Lx sk=0x%Lx msg=0x%Lx size=0x%Lx", REC->__probe_ip, REC->iocb, REC->sk, REC->msg, REC->size # Do some system wide tracing of this probe + write syscalls: # perf trace -e write --ev probe:* --filter-pids 6385 462.612 (0.010 ms): bash/19153 write(fd: 1</dev/pts/1>, buf: 0x7f7556c78000, count: 29 ) = 29 462.701 (0.027 ms): sshd/19152 write(fd: 3<socket:[63117]>, buf: 0x7f78dd12e160, count: 68 ) ... 462.701 ( ): probe:tcp_sendmsg:(ffffffff8163db30) iocb=0xffff8803ebec7e70 sk=0xffff88042196ab80 msg=0xffff8803ebec7da8 size=0x44) 462.710 (0.035 ms): sshd/19152 ... [continued]: write()) = 68 462.787 (0.009 ms): bash/19153 write(fd: 2</dev/pts/1>, buf: 0x7f7556c77000, count: 22 ) = 22 462.865 (0.002 ms): sshd/19152 write(fd: 3<socket:[63117]>, buf: 0x7f78dd12e160, count: 68 ) ... 462.865 ( ): probe:tcp_sendmsg:(ffffffff8163db30) iocb=0xffff8803ebec7e70 sk=0xffff88042196ab80 msg=0xffff8803ebec7da8 size=0x44) 462.873 (0.010 ms): sshd/19152 ... [continued]: write()) = 68 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150506124653.4961.59806.stgit@localhost.localdomain [ Add some examples to the changelog message showing how to use it ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-05perf probe: Accept filter argument for --funcsMasami Hiramatsu
This allows the user to pass the filter pattern directly to the --funcs option as below: ---- # ./perf probe -F *kmalloc __kmalloc devm_kmalloc mempool_kmalloc sg_kmalloc sock_kmalloc ---- We previously needed to use the --filter option for that. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150505022950.23399.22435.stgit@localhost.localdomain Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-05perf record: Add AUX area tracing Snapshot Mode supportAdrian Hunter
Add a new option and support for Instruction Tracing Snapshot Mode. When the new option is selected, no AUX area tracing data is captured until a signal (SIGUSR2) is received. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1430404667-10593-10-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-05perf auxtrace: Add option to synthesize events for transactionsAdrian Hunter
Add AUX area tracing option 'x' to synthesize events for transactions. This will be used by Intel PT to synthesize an event record for each TSX start, commit or abort. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1430404667-10593-6-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-04perf report: Fix placement of itrace option in documentationAdrian Hunter
Unwittingly the itrace options for perf report ended up below the Overhead Calculation section. Move it back with the other options. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1430404667-10593-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-04perf kmem: Add --live option for current allocation statNamhyung Kim
Currently 'perf kmem stat --page' shows total (page) allocation stat by default, but sometimes one might want to see live (total alloc-only) requests/pages only. The new --live option does this by subtracting freed allocation from the stat. E.g.: # perf kmem stat --page SUMMARY (page allocator) ======================== Total allocation requests : 988,858 [ 4,045,368 KB ] Total free requests : 886,484 [ 3,624,996 KB ] Total alloc+freed requests : 885,969 [ 3,622,628 KB ] Total alloc-only requests : 102,889 [ 422,740 KB ] Total free-only requests : 515 [ 2,368 KB ] Total allocation failures : 0 [ 0 KB ] Order Unmovable Reclaimable Movable Reserved CMA/Isolated ----- ------------ ------------ ------------ ------------ ------------ 0 172,173 3,083 806,686 . . 1 284 . . . . 2 6,124 58 . . . 3 114 335 . . . 4 . . . . . 5 . . . . . 6 . . . . . 7 . . . . . 8 . . . . . 9 . . 1 . . 10 . . . . . # perf kmem stat --page --live SUMMARY (page allocator) ======================== Total allocation requests : 988,858 [ 4,045,368 KB ] Total free requests : 886,484 [ 3,624,996 KB ] Total alloc+freed requests : 885,969 [ 3,622,628 KB ] Total alloc-only requests : 102,889 [ 422,740 KB ] Total free-only requests : 515 [ 2,368 KB ] Total allocation failures : 0 [ 0 KB ] Order Unmovable Reclaimable Movable Reserved CMA/Isolated ----- ------------ ------------ ------------ ------------ ------------ 0 2,214 3,025 97,156 . . 1 59 . . . . 2 19 58 . . . 3 23 335 . . . 4 . . . . . 5 . . . . . 6 . . . . . 7 . . . . . 8 . . . . . 9 . . . . . 10 . . . . . # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Pekka Enberg <penberg@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1429592107-1807-4-git-send-email-namhyung@kernel.org [ Added examples to the changeset log ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>