summaryrefslogtreecommitdiff
path: root/tools/perf/builtin-record.c
AgeCommit message (Collapse)Author
2019-05-15perf parse-regs: Split parse_regsKan Liang
The available registers for --int-regs and --user-regs may be different, e.g. XMM registers. Split parse_regs into two dedicated functions for --int-regs and --user-regs respectively. Modify the warning message. "--user-regs=?" should be applied to show the available registers for --user-regs. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1557865174-56264-1-git-send-email-kan.liang@linux.intel.com [ Changed docs as suggested by Ravi and agreed by Kan ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-15perf record: Implement -z,--compression_level[=<n>] optionAlexey Budankov
Implemented -z,--compression_level[=<n>] option that enables compression of mmaped kernel data buffers content in runtime during perf record mode collection. Default option value is 1 (fastest compression). Compression overhead has been measured for serial and AIO streaming when profiling matrix multiplication workload: ------------------------------------------------------------- | SERIAL | AIO-1 | ----------------------------------------------------------------| |-z | OVH(x) | ratio(x) size(MiB) | OVH(x) | ratio(x) size(MiB) | |---------------------------------------------------------------| | 0 | 1,00 | 1,000 179,424 | 1,00 | 1,000 187,527 | | 1 | 1,04 | 8,427 181,148 | 1,01 | 8,474 188,562 | | 2 | 1,07 | 8,055 186,953 | 1,03 | 7,912 191,773 | | 3 | 1,04 | 8,283 181,908 | 1,03 | 8,220 191,078 | | 5 | 1,09 | 8,101 187,705 | 1,05 | 7,780 190,065 | | 8 | 1,05 | 9,217 179,191 | 1,12 | 6,111 193,024 | ----------------------------------------------------------------- OVH = (Execution time with -z N) / (Execution time with -z 0) ratio - compression ratio size - number of bytes that was compressed size ~= trace size x ratio Committer notes: Testing it I noticed that it failed to disable build id processing when compression is enabled, and as we'd have to uncompress everything to look for the PERF_RECORD_{MMAP,SAMPLE,etc} to figure out which build ids to read from DSOs, we better disable build id processing when compression is enabled, logging with pr_debug() when doing so: Original patch: # perf record -z2 ^C[ perf record: Woken up 1 times to write data ] 0x1746e0 [0x76]: failed to process type: 81 [Invalid argument] [ perf record: Captured and wrote 1.568 MB perf.data, compressed (original 0.452 MB, ratio is 3.995) ] # After auto-disabling build id processing when compression is enabled: $ perf record -z2 sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.292) ] $ perf record -v -z2 sleep 1 Compression enabled, disabling build id collection at the end of the session. <SNIP extra -v pr_debug() messages> [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.305) ] $ Also, with parts of the patch originally after this one moved to just before this one we get: $ perf record -z2 sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.371) ] $ perf report -D | grep COMPRESS 0 0x1b8 [0x155]: PERF_RECORD_COMPRESSED: unhandled! 0 0x30d [0x80]: PERF_RECORD_COMPRESSED: unhandled! COMPRESSED events: 2 COMPRESSED events: 0 $ I.e. when faced with PERF_RECORD_COMPRESSED that we still have no code to process, we just show it as not being handled, skip them and continue, while before we had: $ perf report -D | grep COMPRESS 0x1b8 [0x169]: failed to process type: 81 [Invalid argument] Error: failed to process sample 0 0x1b8 [0x169]: PERF_RECORD_COMPRESSED $ Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/9ff06518-ae63-a908-e44d-5d9e56dd66d9@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-15perf record: Implement compression for AIO trace streamingAlexey Budankov
Compression is implemented using the functions from zstd.c. As the memory to operate on the compression uses mmap->aio.data[] buffers. If Zstd streaming compression API fails for some reason the data to be compressed are just copied into the memory buffers using plain memcpy(). Compressed trace frame consists of an array of PERF_RECORD_COMPRESSED records. Each element of the array is not longer that PERF_SAMPLE_MAX_SIZE and consists of perf_event_header followed by the compressed chunk that is decompressed on the loading stage. perf_mmap__aio_push() is replaced by perf_mmap__push() which is now used in the both serial and AIO streaming cases. perf_mmap__push() is extended with positive return values to signify absence of data ready for processing. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/77db2b2c-5d03-dbb0-aeac-c4dd92129ab9@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-15perf record: Implement compression for serial trace streamingAlexey Budankov
Compression is implemented using the functions from zstd.c. As the memory to operate on the compression uses mmap->data buffer. If Zstd streaming compression API fails for some reason the data to be compressed are just copied into the memory buffers using plain memcpy(). Compressed trace frame consists of an array of PERF_RECORD_COMPRESSED records. Each element of the array is not longer that PERF_SAMPLE_MAX_SIZE and consists of perf_event_header followed by the compressed chunk that is decompressed on the loading stage. Comitter notes: Undo some unnecessary line breaks, remove some unnecessary () around zstd_data to then just get its address, and fix conflicts with BPF_PROG_INFO/BPF_BTF patchkits. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/744df43f-3932-2594-ddef-1e99a3cad03a@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-15perf mmap: Implement dedicated memory buffer for data compressionAlexey Budankov
Implemented mmap data buffer that is used as the memory to operate on when compressing data in case of serial trace streaming. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/49b31321-0f70-392b-9a4f-649d3affe090@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-15perf record: Implement COMPRESSED event record and its attributesAlexey Budankov
Implemented PERF_RECORD_COMPRESSED event, related data types, header feature and functions to write, read and print feature attributes from the trace header section. comp_mmap_len preserves the size of mmaped kernel buffer that was used during collection. comp_mmap_len size is used on loading stage as the size of decomp buffer for decompression of COMPRESSED events content. Committer notes: Fixed up conflict with BPF_PROG_INFO and BTF_BTF header features. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ebbaf031-8dda-3864-ebc6-7922d43ee515@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-15perf session: Define 'bytes_transferred' and 'bytes_compressed' metricsAlexey Budankov
Define 'bytes_transferred' and 'bytes_compressed' metrics to calculate ratio in the end of the data collection: compression ratio = bytes_transferred / bytes_compressed The 'bytes_transferred' metric accumulates the amount of bytes that was extracted from the mmaped kernel buffers for compression, while 'bytes_compressed' accumulates the amount of bytes that was received after applying compression. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1d4bf499-cb03-26dc-6fc6-f14fec7622ce@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-15perf record: Fix suggestion to get list of registers usable with --user-regs ↵Arnaldo Carvalho de Melo
and --intr-regs $ perf record -h -I Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use -I ? to list register names $ m $ perf record -I ? Workload failed: No such file or directory $ After: $ perf record -h -I Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use '-I?' to list register names $ $ perf record -I? available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15 Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use '-I?' to list register names $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Fixes: bcc84ec65ad1 ("perf record: Add ability to name registers to record") Link: https://lkml.kernel.org/n/tip-r0xhfhy5radmkhhcbcfs5izf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-04-01perf record: Implement --mmap-flush=<number> optionAlexey Budankov
Implement a --mmap-flush option that specifies minimal number of bytes that is extracted from mmaped kernel buffer to store into a trace. The default option value is 1 byte what means every time trace writing thread finds some new data in the mmaped buffer the data is extracted, possibly compressed and written to a trace. $ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc $ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc The option is independent from -z setting, doesn't vary with compression level and can serve two purposes. The first purpose is to increase the compression ratio of a trace data. Larger data chunks are compressed more effectively so the implemented option allows specifying data chunk size to compress. Also at some cases executing more write syscalls with smaller data size can take longer than executing less write syscalls with bigger data size due to syscall overhead so extracting bigger data chunks specified by the option value could additionally decrease runtime overhead. The second purpose is to avoid self monitoring live-lock issue in system wide (-a) profiling mode. Profiling in system wide mode with compression (-a -z) can additionally induce data into the kernel buffers along with the data from monitored processes. If performance data rate and volume from the monitored processes is high then trace streaming and compression activity in the tool is also high. High tool process activity can lead to subtle live-lock effect when compression of single new byte from some of mmaped kernel buffer leads to generation of the next single byte at some mmaped buffer. So perf tool process ends up in endless self monitoring. Implemented synch parameter is the mean to force data move independently from the specified flush threshold value. Despite the provided flush value the tool needs capability to unconditionally drain memory buffers, at least in the end of the collection. Committer testing: Running with the default value, i.e. as soon as there is something to read go on consuming, we first write the synthesized events, small chunks of about 128 bytes: # perf trace -m 2048 --call-graph dwarf -e write -- perf record <SNIP> 101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120 __libc_write (/usr/lib64/libpthread-2.28.so) ion (/home/acme/bin/perf) record__write (inlined) process_synthesized_event (/home/acme/bin/perf) perf_tool__process_synth_event (inlined) perf_event__synthesize_mmap_events (/home/acme/bin/perf) Then we move to reading the mmap buffers consuming the events put there by the kernel perf infrastructure: 107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336 __libc_write (/usr/lib64/libpthread-2.28.so) ion (/home/acme/bin/perf) record__write (inlined) record__pushfn (/home/acme/bin/perf) perf_mmap__push (/home/acme/bin/perf) record__mmap_read_evlist (inlined) record__mmap_read_all (inlined) __cmd_record (inlined) cmd_record (/home/acme/bin/perf) 12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984 <SNIP same backtrace as in the 107.561 timestamp> 12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816 <SNIP same backtrace as in the 107.561 timestamp> 12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832 <SNIP same backtrace as in the 107.561 timestamp> If we limit it to write only when more than 16MB are available for reading, it throttles that to a quarter of the --mmap-pages set for 'perf record', which by default get to 528384 bytes, found out using 'record -v': mmap flush: 132096 mmap size 528384B With that in place all the writes coming from record__mmap_read_evlist(), i.e. from the mmap buffers setup by the kernel perf infrastructure were at least 132096 bytes long. Trying with a bigger mmap size: perf trace -e write perf record -v -m 2048 --mmap-flush 16M 74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888 74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256 74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232 74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032 74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520 74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688 74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760 Was again limited to a quarter of the mmap size: mmap flush: 2098176 mmap size 8392704B A warning about that would be good to have but can be added later, something like: "max flush is a quarter of the mmap size, if wanting to bump the mmap flush further, bump the mmap size as well using -m/--mmap-pages" Also rename the 'sync' parameters to 'synch' to keep tools/perf building with older glibcs: cc1: warnings being treated as errors builtin-record.c: In function 'record__mmap_read_evlist': builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration /usr/include/unistd.h:933: warning: shadowed declaration is here builtin-record.c: In function 'record__mmap_read_all': builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration /usr/include/unistd.h:933: warning: shadowed declaration is here Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-21perf tools: Save bpf_prog_info and BTF of new BPF programsSong Liu
To fully annotate BPF programs with source code mapping, 4 different information are needed: 1) PERF_RECORD_KSYMBOL 2) PERF_RECORD_BPF_EVENT 3) bpf_prog_info 4) btf This patch handles 3) and 4) for BPF programs loaded after 'perf record|top'. For timely process of these information, a dedicated event is added to the side band evlist. When PERF_RECORD_BPF_EVENT is received via the side band event, the polling thread gathers 3) and 4) vis sys_bpf and store them in perf_env. This information is saved to perf.data at the end of 'perf record'. Committer testing: The 'wakeup_watermark' member in 'struct perf_event_attr' is inside a unnamed union, so can't be used in a struct designated initialization with older gccs, get it out of that, isolating as 'attr.wakeup_watermark = 1;' to work with all gcc versions. We also need to add '--no-bpf-event' to the 'perf record' perf_event_attr tests in 'perf test', as the way that that test goes is to intercept the events being setup and looking if they match the fields described in the control files, since now it finds first the side band event used to catch the PERF_RECORD_BPF_EVENT, they all fail. With these issues fixed: Same scenario as for testing BPF programs loaded before 'perf record' or 'perf top' starts, only start the BPF programs after 'perf record|top', so that its information get collected by the sideband threads, the rest works as for the programs loaded before start monitoring. Add missing 'inline' to the bpf_event__add_sb_event() when HAVE_LIBBPF_SUPPORT is not defined, fixing the build in systems without binutils devel files installed. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stanislav Fomichev <sdf@google.com> Link: http://lkml.kernel.org/r/20190312053051.2690567-16-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-21perf evlist: Introduce side band threadSong Liu
This patch introduces side band thread that captures extended information for events like PERF_RECORD_BPF_EVENT. This new thread uses its own evlist that uses ring buffer with very low watermark for lower latency. To use side band thread, we need to: 1. add side band event(s) by calling perf_evlist__add_sb_event(); 2. calls perf_evlist__start_sb_thread(); 3. at the end of perf run, perf_evlist__stop_sb_thread(). In the next patch, we use this thread to handle PERF_RECORD_BPF_EVENT. Committer notes: Add fix by Jiri Olsa for when te sb_tread can't get started and then at the end the stop_sb_thread() segfaults when joining the (non-existing) thread. That can happen when running 'perf top' or 'perf record' as a normal user, for instance. Further checks need to be done on top of this to more graciously handle these possible failure scenarios. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stanislav Fomichev <sdf@google.com> Link: http://lkml.kernel.org/r/20190312053051.2690567-15-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-19perf bpf: Make synthesize_bpf_events() receive perf_session pointer instead ↵Song Liu
of perf_tool This patch changes the arguments of perf_event__synthesize_bpf_events() to include perf_session* instead of perf_tool*. perf_session will be used in the next patch. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/20190312053051.2690567-6-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-19perf record: Replace option --bpf-event with --no-bpf-eventSong Liu
Currently, monitoring of BPF programs through bpf_event is off by default for 'perf record'. To turn it on, the user need to use option "--bpf-event". As BPF gets wider adoption in different subsystems, this option becomes inconvenient. This patch makes bpf_event on by default, and adds option "--no-bpf-event" to turn it off. Since option --bpf-event is not released yet, it is safe to remove it. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: kernel-team@fb.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stanislav Fomichev <sdf@google.com> Link: http://lkml.kernel.org/r/20190312053051.2690567-2-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-19perf record: Clarify help for --switch-outputAndi Kleen
The help description for --switch-output looks like there are multiple comma separated fields. But it's actually a choice of different options. Make it clear and less confusing. Before: % perf record -h ... --switch-output[=<signal,size,time>] Switch output when receive SIGUSR2 or cross size,time threshold After: % perf record -h ... --switch-output[=<signal or size[BKMG] or time[smhd]>] Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> LPU-Reference: 20190314225002.30108-4-andi@firstfloor.org Link: https://lkml.kernel.org/n/tip-9yecyuha04nyg8toyd1b2pgi@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-19perf record: Allow to limit number of reported perf.data filesAndi Kleen
When doing long term recording and waiting for some event to snapshot on, we often only care about the last minute or so. The --switch-output command line option supports rotating the perf.data file when the size exceeds a threshold. But the disk would still be filled with unnecessary old files. Add a new option to only keep a number of rotated files, so that the disk space usage can be limited. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> LPU-Reference: 20190314225002.30108-3-andi@firstfloor.org Link: https://lkml.kernel.org/n/tip-y5u2lik0ragt4vlktz6qc9ks@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-11perf header: Add DIR_FORMAT feature to describe directory dataJiri Olsa
The data files layout is described by HEADER_DIR_FORMAT feature. Currently it holds only version number (1): uint64_t version; The current version holds only version value (1) means that data files: - Follow the 'data.*' name format. - Contain raw events data in standard perf format as read from kernel (and need to be sorted) Future versions are expected to describe different data files layout according to special needs. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20190308134745.5057-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-11perf data: Don't store auxtrace index for directory data fileJiri Olsa
We can't store the auxtrace index when we store into multiple files, because we keep only offset for it, not the file. The auxtrace data will be processed correctly in the 'pipe' mode. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20190308134745.5057-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-22perf data: Add global path holderJiri Olsa
Add a 'path' member to 'struct perf_data'. It will keep the configured path for the data (const char *). The path in struct perf_data_file is now dynamically allocated (duped) from it. This scheme is useful/used in following patches where struct perf_data::path holds the 'configure' directory path and struct perf_data_file::path holds the allocated path for specific files. Also it actually makes the code little simpler. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20190221094145.9151-3-jolsa@kernel.org [ Fixup data-convert-bt.c missing conversion ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-22perf data: Move size to struct perf_data_fileJiri Olsa
We are about to add support for multiple files, so we need each file to keep its size. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20190221094145.9151-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-11perf record: Implement --affinity=node|cpu optionAlexey Budankov
Implement --affinity=node|cpu option for the record mode defaulting to system affinity mask bouncing. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/083f5422-ece9-10dd-8305-bf59c860f10f@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf record: Apply affinity masks when reading mmap buffersAlexey Budankov
Build node cpu masks for mmap data buffers. Apply node cpu masks to tool thread every time it references data buffers cross node or cross cpu. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/b25e4ebc-078d-2c7b-216c-f0bed108d073@linux.intel.com [ Use cpu-set-sched.h to get the CPU_{EQUAL,OR}() fallbacks for older systems ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf record: Allocate affinity masksAlexey Budankov
Allocate affinity option and masks for mmap data buffers and record thread as well as initialize allocated objects. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/526fa2b0-07de-6dbd-a7e9-26ba875593c9@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf pmu: Remove set_drv_config APIMathieu Poirier
CoreSight was the only client of the PMU's set_drv_config() API. Now that it is no longer needed by CoreSight remove it from the code base. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Suzuki K Poulouse <suzuki.poulose@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-s390@vger.kernel.org Link: http://lkml.kernel.org/r/20190131184714.20388-8-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21perf tools: Synthesize PERF_RECORD_* for loaded BPF programsSong Liu
This patch synthesize PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT for BPF programs loaded before perf-record. This is achieved by gathering information about all BPF programs via sys_bpf. Committer notes: Fix the build on some older systems such as amazonlinux:1 where it was breaking with: util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog': util/bpf-event.c:52:9: error: missing initializer for field 'type' of 'struct bpf_prog_info' [-Werror=missing-field-initializers] struct bpf_prog_info info = {}; ^ In file included from /git/linux/tools/lib/bpf/bpf.h:26:0, from util/bpf-event.c:3: /git/linux/tools/include/uapi/linux/bpf.h:2699:8: note: 'type' declared here __u32 type; ^ cc1: all warnings being treated as errors Further fix on a centos:6 system: cc1: warnings being treated as errors util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog': util/bpf-event.c:50: error: 'func_info_rec_size' may be used uninitialized in this function The compiler is wrong, but to silence it, initialize that variable to zero. One more fix, this time for debian:experimental-x-mips, x-mips64 and x-mipsel: util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog': util/bpf-event.c:93:16: error: implicit declaration of function 'calloc' [-Werror=implicit-function-declaration] func_infos = calloc(sub_prog_cnt, func_info_rec_size); ^~~~~~ util/bpf-event.c:93:16: error: incompatible implicit declaration of built-in function 'calloc' [-Werror] util/bpf-event.c:93:16: note: include '<stdlib.h>' or provide a declaration of 'calloc' Add the missing header. Committer testing: # perf record --bpf-event sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.021 MB perf.data (7 samples) ] # perf report -D | grep PERF_RECORD_BPF_EVENT | nl 1 0 0x4b10 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13 2 0 0x4c60 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14 3 0 0x4db0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15 4 0 0x4f00 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16 5 0 0x5050 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17 6 0 0x51a0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18 7 0 0x52f0 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21 8 0 0x5440 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22 # bpftool prog 13: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 13,14 14: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 13,14 15: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 15,16 16: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 15,16 17: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:44-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 17,18 18: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:44-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 17,18 21: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:45-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 21,22 22: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:45-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 21,22 # # perf report -D | grep -B22 PERF_RECORD_KSYMBOL . ... raw event: size 312 bytes . 0000: 11 00 00 00 00 00 38 01 ff 44 06 c0 ff ff ff ff ......8..D...... . 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog . 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b . 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a............... <SNIP zeroes> . 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!....... . 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%......... . 0130: 00 00 00 00 00 00 00 00 ........ 0 0x49d8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc00644ff len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba -- . ... raw event: size 312 bytes . 0000: 11 00 00 00 00 00 38 01 48 6d 06 c0 ff ff ff ff ......8.Hm...... . 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog . 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17 . 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4............... <SNIP zeroes> . 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!....... . 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........ . 0130: 00 00 00 00 00 00 00 00 ........ 0 0x4b28 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0066d48 len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174 -- . ... raw event: size 312 bytes . 0000: 11 00 00 00 00 00 38 01 04 cf 03 c0 ff ff ff ff ......8......... . 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog . 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b . 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a............... <SNIP zeroes> . 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!....... . 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%......... . 0130: 00 00 00 00 00 00 00 00 ........ 0 0x4c78 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc003cf04 len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba -- . ... raw event: size 312 bytes . 0000: 11 00 00 00 00 00 38 01 96 28 04 c0 ff ff ff ff ......8..(...... . 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog . 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17 . 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4............... <SNIP zeroes> . 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!....... . 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........ . 0130: 00 00 00 00 00 00 00 00 ........ 0 0x4dc8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0042896 len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174 -- . ... raw event: size 312 bytes . 0000: 11 00 00 00 00 00 38 01 05 13 17 c0 ff ff ff ff ......8......... . 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog . 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b . 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a............... <SNIP zeroes> . 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!....... . 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%......... . 0130: 00 00 00 00 00 00 00 00 ........ 0 0x4f18 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0171305 len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba -- . ... raw event: size 312 bytes . 0000: 11 00 00 00 00 00 38 01 0a 8c 23 c0 ff ff ff ff ......8...#..... . 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog . 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17 . 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4............... <SNIP zeroes> . 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!....... . 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........ . 0130: 00 00 00 00 00 00 00 00 ........ 0 0x5068 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0238c0a len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174 -- . ... raw event: size 312 bytes . 0000: 11 00 00 00 00 00 38 01 2a a5 a4 c0 ff ff ff ff ......8.*....... . 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog . 0020: 5f 37 62 65 34 39 65 33 39 33 34 61 31 32 35 62 _7be49e3934a125b . 0030: 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a............... <SNIP zeroes> . 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!....... . 0120: 7b e4 9e 39 34 a1 25 ba 00 00 00 00 00 00 00 00 {..94.%......... . 0130: 00 00 00 00 00 00 00 00 ........ 0 0x51b8 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0a4a52a len 229 type 1 flags 0x0 name bpf_prog_7be49e3934a125ba -- . ... raw event: size 312 bytes . 0000: 11 00 00 00 00 00 38 01 9b c9 a4 c0 ff ff ff ff ......8......... . 0010: e5 00 00 00 01 00 00 00 62 70 66 5f 70 72 6f 67 ........bpf_prog . 0020: 5f 32 61 31 34 32 65 66 36 37 61 61 61 64 31 37 _2a142ef67aaad17 . 0030: 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4............... <SNIP zeroes> . 0110: 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!....... . 0120: 2a 14 2e f6 7a aa d1 74 00 00 00 00 00 00 00 00 *...z..t........ . 0130: 00 00 00 00 00 00 00 00 ........ 0 0x5308 [0x138]: PERF_RECORD_KSYMBOL ksymbol event with addr ffffffffc0a4c99b len 229 type 1 flags 0x0 name bpf_prog_2a142ef67aaad174 Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@fb.com Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20190117161521.1341602-8-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-21perf tools: Handle PERF_RECORD_BPF_EVENTSong Liu
This patch adds basic handling of PERF_RECORD_BPF_EVENT. Tracking of PERF_RECORD_BPF_EVENT is OFF by default. Option --bpf-event is added to turn it on. Committer notes: Add dummy machine__process_bpf_event() variant that returns zero for systems without HAVE_LIBBPF_SUPPORT, such as Alpine Linux, unbreaking the build in such systems. Remove the needless include <machine.h> from bpf->event.h, provide just forward declarations for the structs and unions in the parameters, to reduce compilation time and needless rebuilds when machine.h gets changed. Committer testing: When running with: # perf record --bpf-event On an older kernel where PERF_RECORD_BPF_EVENT and PERF_RECORD_KSYMBOL is not present, we fallback to removing those two bits from perf_event_attr, making the tool to continue to work on older kernels: perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ sys_perf_event_open: pid 5779 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -22 switching off bpf_event ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 ------------------------------------------------------------ sys_perf_event_open: pid 5779 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -22 switching off ksymbol ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ------------------------------------------------------------ And then proceeds to work without those two features. As passing --bpf-event is an explicit action performed by the user, perhaps we should emit a warning telling that the kernel has no such feature, but this can be done on top of this patch. Now with a kernel that supports these events, start the 'record --bpf-event -a' and then run 'perf trace sleep 10000' that will use the BPF augmented_raw_syscalls.o prebuilt (for another kernel version even) and thus should generate PERF_RECORD_BPF_EVENT events: [root@quaco ~]# perf record -e dummy -a --bpf-event ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.713 MB perf.data ] [root@quaco ~]# bpftool prog 13: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 13,14 14: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 13,14 15: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 15,16 16: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 15,16 17: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:44-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 17,18 18: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:44-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 17,18 21: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:45-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 21,22 22: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:45-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 21,22 31: tracepoint name sys_enter tag 12504ba9402f952f gpl loaded_at 2019-01-19T09:19:56-0300 uid 0 xlated 512B jited 374B memlock 4096B map_ids 30,29,28 32: tracepoint name sys_exit tag c1bd85c092d6e4aa gpl loaded_at 2019-01-19T09:19:56-0300 uid 0 xlated 256B jited 191B memlock 4096B map_ids 30,29 # perf report -D | grep PERF_RECORD_BPF_EVENT | nl 1 0 55834574849 0x4fc8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13 2 0 60129542145 0x5118 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14 3 0 64424509441 0x5268 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15 4 0 68719476737 0x53b8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16 5 0 73014444033 0x5508 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17 6 0 77309411329 0x5658 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18 7 0 90194313217 0x57a8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21 8 0 94489280513 0x58f8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22 9 7 620922484360 0xb6390 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 29 10 7 620922486018 0xb6410 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 29 11 7 620922579199 0xb6490 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 30 12 7 620922580240 0xb6510 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 30 13 7 620922765207 0xb6598 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 31 14 7 620922874543 0xb6620 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 32 # There, the 31 and 32 tracepoint BPF programs put in place by 'perf trace'. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@fb.com Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20190117161521.1341602-7-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17perf tools: Allow specifying proc-map-timeout in config fileMark Drayton
The default timeout of 500ms for parsing /proc/<pid>/maps files is too short for profiling many of our services. This can be overridden by passing --proc-map-timeout to the relevant command but it'd be nice to globally increase our default value. This patch permits setting a different default with the core.proc-map-timeout config file parameter. Signed-off-by: Mark Drayton <mbd@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20181204203420.1683114-1-mbd@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17perf record: Extend trace writing to multi AIOAlexey Budankov
Multi AIO trace writing allows caching more kernel data into userspace memory postponing trace writing for the sake of overall profiling data thruput increase. It could be seen as kernel data buffer extension into userspace memory. With an --aio option value different from 0 (default value is 1) the tool has capability to cache more and more data into user space along with delegating spill to AIO. That allows avoiding to suspend at record__aio_sync() between calls of record__mmap_read_evlist() and increases profiling data thruput at the cost of userspace memory. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/050bb053-e7f3-aa83-fde7-f27ff90be7f6@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17perf record: Enable asynchronous trace writingAlexey Budankov
The trace file offset is read once before mmaps iterating loop and written back after all performance data is enqueued for aio writing. The trace file offset is incremented linearly after every successful aio write operation. record__aio_sync() blocks till completion of the started AIO operation and then proceeds. record__aio_mmap_read_sync() implements a barrier for all incomplete aio write requests. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ce2d45e9-d236-871c-7c8f-1bed2d37e8ac@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05perf record: Support weak groupsAndi Kleen
Implement a weak group fallback for 'perf record', similar to the existing 'perf stat' support. This allows to use groups that might be longer than the available counters without failing. Before: $ perf record -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}' -a sleep 1 Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles). /bin/dmesg | grep -i perf may provide additional information. After: $ ./perf record -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}:W' -a sleep 1 WARNING: No sample_id_all support, falling back to unordered processing [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 8.136 MB perf.data (134069 samples) ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20181001195927.14211-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-18perf record: Encode -k clockid frequency into Perf traceAlexey Budankov
Store -k clockid frequency into Perf trace to enable timestamps derived metrics conversion into wall clock time on reporting stage. Below is the example of perf report output: tools/perf/perf record -k raw -- ../../matrix/linux/matrix.gcc ... [ perf record: Captured and wrote 31.222 MB perf.data (818054 samples) ] tools/perf/perf report --header # ======== ... # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, use_clockid = 1, clockid = 4 ... # clockid frequency: 1000 MHz ... # ======== Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/23a4a1dc-b160-85a0-347d-40a2ed6d007b@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-19perf tools: Add 'struct perf_mmap' arg to record__write()Jiri Olsa
The struct perf_mmap map argument will hold the file pointer to write the data to. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180913125450.21342-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-19perf auxtrace: Pass struct perf_mmap into mmap__read* functionsJiri Olsa
The perf_mmap struct will hold a file pointer to write the mmap's contents, so we need to propagate it down the stack to record__write callers instead of its member the auxtrace_mmap struct. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180913125450.21342-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-30perf tools: Switch 'session' argument to 'evlist' in ↵Jiri Olsa
perf_event__synthesize_attrs() To be able to pass in other than session's evlist. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180830063252.23729-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-16perf record: Synthesize features before events in pipe modeJiri Olsa
We need to synthesize events first, because some features works on top of them (on report side). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180314092205.23291-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-16perf record: Avoid duplicate call of perf_default_config()Yisheng Xie
We have brought perf_default_config to the very beginning at main(), so it no need to call perf_default_config() once more for most of config in perf-record but only for record.call-graph. Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1520853957-36106-2-git-send-email-xieyisheng1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-08perf record: Remove progname from struct recordJiri Olsa
It's no longer used. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180307155020.32613-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-08perf record: Move machine variable down the functionJiri Olsa
It's used far more down to be declared on the top of the __cmd_record. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180307155020.32613-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-08perf mmap: Use the stored scope data in perf_mmap__push()Kan Liang
Using the 'start' and 'end' which are stored in struct perf_mmap to replace the temporary 'start' and 'end'. The temporary variables will be discarded later. It doesn't need to pass 'overwrite' to perf_mmap__push(). It's stored in struct perf_mmap. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1520350567-80082-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-07perf record: Combine some auxtrace initialization into a single functionAdrian Hunter
In preparation for adding AUX area sampling support, combine some auxtrace initialization into a single function. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1520327598-1317-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-05perf record: Fix crash in pipe modeJiri Olsa
Currently we can crash perf record when running in pipe mode, like: $ perf record ls | perf report # To display the perf.data header info, please use --header/--header-only options. # perf: Segmentation fault Error: The - file has no samples! The callstack of the crash is: 0x0000000000515242 in perf_event__synthesize_event_update_name 3513 ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]); (gdb) bt #0 0x0000000000515242 in perf_event__synthesize_event_update_name #1 0x00000000005158a4 in perf_event__synthesize_extra_attr #2 0x0000000000443347 in record__synthesize #3 0x00000000004438e3 in __cmd_record #4 0x000000000044514e in cmd_record #5 0x00000000004cbc95 in run_builtin #6 0x00000000004cbf02 in handle_internal_command #7 0x00000000004cc054 in run_argv #8 0x00000000004cc422 in main The reason of the crash is that the evsel does not have ids array allocated and the pipe's synthesize code tries to access it. We don't force evsel ids allocation when we have single event, because it's not needed. However we need it when we are in pipe mode even for single event as a key for evsel update event. Fixing this by forcing evsel ids allocation event for single event, when we are in pipe mode. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180302161354.30192-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-05perf record: Throttle user defined frequencies to the maximum allowedArnaldo Carvalho de Melo
# perf record -F 200000 sleep 1 warning: Maximum frequency rate (15,000 Hz) exceeded, throttling from 200,000 Hz to 15,000 Hz. The limit can be raised via /proc/sys/kernel/perf_event_max_sample_rate. The kernel will lower it when perf's interrupts take too long. Use --strict-freq to disable this throttling, refusing to record. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data (15 samples) ] # perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 For those wanting that it fails if the desired frequency can't be used: # perf record --strict-freq -F 200000 sleep 1 error: Maximum frequency rate (15,000 Hz) exceeded. Please use -F freq option with a lower value or consider tweaking /proc/sys/kernel/perf_event_max_sample_rate. # Suggested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-oyebruc44nlja499nqkr1nzn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-05perf record: Allow asking for the maximum allowed sample rateArnaldo Carvalho de Melo
Add the handy '-F max' shortcut to reading and using the kernel.perf_event_max_sample_rate value as the user supplied sampling frequency: # perf record -F max sleep 1 info: Using a maximum frequency rate of 15,000 Hz [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data (14 samples) ] # sysctl kernel.perf_event_max_sample_rate kernel.perf_event_max_sample_rate = 15000 # perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 # perf record -F 10 sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data (4 samples) ] # perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 10, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 # Suggested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-4y0tiuws62c64gp4cf0hme0m@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf record: Put new line after target override warningJiri Olsa
There's no new-line after target-override warning, now: $ perf record -a --per-thread Warning: SYSTEM/CPU switch overriding PER-THREAD^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.705 MB perf.data (2939 samples) ] with patch: $ perf record -a --per-thread Warning: SYSTEM/CPU switch overriding PER-THREAD ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.705 MB perf.data (2939 samples) ] Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 16ad2ffb822c ("perf tools: Introduce perf_target__strerror()") Link: http://lkml.kernel.org/r/20180206181813.10943-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-05perf record: Fix period option handlingJiri Olsa
Stephan reported we don't unset PERIOD sample type when --no-period is specified. Adding the unset check and reset PERIOD if --no-period is specified. Committer notes: Check the sample_type, it shouldn't have PERF_SAMPLE_PERIOD there when --no-period is used. Before: # perf record --no-period sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.018 MB perf.data (7 samples) ] # perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 # After: [root@jouet ~]# perf record --no-period sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data (17 samples) ] [root@jouet ~]# perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 [root@jouet ~]# Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180201083812.11359-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-01-08perf record: Record the first and last sample time in the headerJin Yao
In the default 'perf record' configuration, all samples are processed, to create the HEADER_BUILD_ID table. So it's very easy to get the first/last samples and save the time to perf file header via the function write_sample_time(). Later, at post processing time, perf report/script will fetch the time from perf file header. Committer testing: # perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ] [root@jouet home]# perf report --header | grep "time of " # time of first sample : 22947.909226 # time of last sample : 22948.910704 # # perf report -D | grep PERF_RECORD_SAMPLE\( 0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa21b1af3 period: 1 addr: 0 0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa200d204 period: 1 addr: 0 <SNIP> 3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xffffffffa22071d8 period: 169518 addr: 0 0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 198807 addr: 0 2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 88111 addr: 0 # Changelog: v7: Just update the patch description according to Arnaldo's suggestion. v6: Currently '--buildid-all' is not enabled at default. So the walking on all samples is the default operation. There is no big overhead to calculate the timestamp boundary in process_sample_event handler once we already go through all samples. So the timestamp boundary calculation is enabled by default when '--buildid-all' is not enabled. While if '--buildid-all' is enabled, we creates a new option "--timestamp-boundary" for user to decide if it enables the timestamp boundary calculation. v5: There is an issue that the sample walking can only work when '--buildid-all' is not enabled. So we need to let the walking be able to work even if '--buildid-all' is enabled and let the processing skips the dso hit marking for this case. At first, I want to provide a new option "--record-time-boundaries". While after consideration, I think a new option is not very necessary. v3: Remove the definitions of first_sample_time and last_sample_time from struct record and directly save them in perf_evlist. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-3-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-27perf evsel: Enable ignore_missing_thread for pid optionMengting Zhang
While monitoring a multithread process with pid option, perf sometimes may return sys_perf_event_open failure with 3(No such process) if any of the process's threads die before we open the event. However, we want perf continue monitoring the remaining threads and do not exit with error. Here, the patch enables perf_evsel::ignore_missing_thread for -p option to ignore complete failure if any of threads die before we open the event. But it may still return sys_perf_event_open failure with 22(Invalid) if we monitors several event groups. sys_perf_event_open: pid 28960 cpu 40 group_fd 118202 flags 0x8 sys_perf_event_open: pid 28961 cpu 40 group_fd 118203 flags 0x8 WARNING: Ignored open failure for pid 28962 sys_perf_event_open: pid 28962 cpu 40 group_fd [118203] flags 0x8 sys_perf_event_open failed, error -22 That is because when we ignore a missing thread, we change the thread_idx without dealing with its fds, FD(evsel, cpu, thread). Then get_group_fd() may return a wrong group_fd for the next thread and sys_perf_event_open() return with 22. sys_perf_event_open(){ ... if (group_fd != -1) perf_fget_light()//to get corresponding group_leader by group_fd ... if (group_leader) if (group_leader->ctx->task != ctx->task)//should on the same task goto err_context ... } This patch also fixes this bug by introducing perf_evsel__remove_fd() and update_fds to allow removing fds for the missing thread. Changes since v1: - Change group_fd__remove() into a more genetic way without changing code logic - Remove redundant condition Changes since v2: - Use a proper function name and add some comment. - Multiline comment style fixes. Committer testing: Before this patch the recently added 'perf stat --per-thread' for system wide counting would race while enumerating all threads using /proc: [root@jouet ~]# perf stat --per-thread failed to parse CPUs map: No such file or directory Usage: perf stat [<options>] [<command>] -C, --cpu <cpu> list of cpus to monitor in system-wide -a, --all-cpus system-wide collection from all CPUs [root@jouet ~]# perf stat --per-thread failed to parse CPUs map: No such file or directory Usage: perf stat [<options>] [<command>] -C, --cpu <cpu> list of cpus to monitor in system-wide -a, --all-cpus system-wide collection from all CPUs [root@jouet ~]# When, say, the kernel was being built, so lots of shortlived threads, after this patch this doesn't happen. Signed-off-by: Mengting Zhang <zhangmengting@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Cheng Jian <cj.chengjian@huawei.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1513148513-6974-1-git-send-email-zhangmengting@huawei.com [ Remove one use 'evlist' alias variable ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-27perf perf: Remove duplicate includesPravin Shedge
These duplicate includes have been found with scripts/checkincludes.pl but they have been removed manually to avoid removing false positives. Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1512582204-6493-1-git-send-email-pravin.shedge4linux@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-05perf tools: Rename 'backward' to 'overwrite' in evlist, mmap and recordWang Nan
Remove the backward/forward concept to make it uniform with user interface (the '--overwrite' option). Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Mengting Zhang <zhangmengting@huawei.com> Link: http://lkml.kernel.org/r/20171204165107.95327-4-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-05perf mmap: Remove overwrite from arguments list of perf_mmap__pushWang Nan
'overwrite' argument is always 'false'. Remove it from arguments list of perf_mmap__push(). Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/20171203020044.81680-5-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-12-05perf evlist: Remove evlist->overwriteWang Nan
evlist->overwrite is set to false in all users. It can be removed. Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/20171203020044.81680-4-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>