summaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)Author
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-09-11perf env: Find correct branch counter info on hybridKan Liang
No event is printed in the "Branch Counter" column on hybrid machines. For example, $ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter $ perf report --total-cycles # Branch counter abbr list: # cpu_core/branch-instructions/pp = A # cpu_core/branches/ = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter # ............... .............. ........... .......... .............. 44.54% 727.1K 0.00% 1 |+ |+ | 36.31% 592.7K 0.00% 2 |+ |+ | 17.83% 291.1K 0.00% 1 |+ |+ | The branch counter information (br_cntr_width and br_cntr_nr) in the perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS is not available on hybrid machines. Without the width information, the number of occurrences of an event cannot be calculated. For a hybrid machine, the caps information should be retrieved from the PMU_CAPS, and stored in the perf_env->pmu_caps. Add a perf_env__find_br_cntr_info() to return the correct branch counter information from the corresponding fields. Committer notes: While testing I couldn't s ee those "Branch counter" columns enabled by pressing 'B' on the TUI, after reporting it to the list Kan explained the situation: <quote Kan Liang> For a hybrid client, the "Branch Counter" feature is only supported starting from the just released Lunar Lake. Perf falls back to only "ANY" on your Raptor Lake. The "The branch counter is not available" message is expected. Here is the 'perf evlist' result from my Lunar Lake machine, # perf evlist -v cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|READ|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID|GROUP|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY|COUNTERS # </quote> Fixes: 6f9d8d1de2c61288 ("perf script: Add branch counters") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240909184201.553519-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11perf pmu: To info add event_type_descIan Rogers
All PMU events are assumed to be "Kernel PMU event", however, this isn't true for fake PMUs and won't be true with the addition of more software PMUs. Make the PMU's type description name configurable - largely for printing callbacks. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240907050830.6752-5-irogers@google.com Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11perf evsel: Add accessor for tool_eventIan Rogers
Currently tool events use a dedicated variable within the evsel. Later changes will move this to the unused struct perf_event_attr config for these events. Add an accessor to allow the later change to be well typed and avoid changing all uses. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11perf pmus: Fake PMU clean upIan Rogers
Rather than passing a fake PMU around, just pass that the fake PMU should be used - true when doing testing. Move the fake PMU into pmus.[ch] and try to abstract the PMU's properties in pmu.c, ie so there is less "if fake_pmu" in non-PMU code. Give the fake PMU a made up type number. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240907050830.6752-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf callchain: Allow symbols to be optional when resolving a callchainIan Rogers
In uses like 'perf inject' it is not necessary to gather the symbol for each call chain location, the map for the sample IP is wanted so that build IDs and the like can be injected. Make gathering the symbol in the callchain_cursor optional. For a 'perf inject -B' command this lowers the peak RSS from 54.1MB to 29.6MB by avoiding loading symbols. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf inject: Lazy build-id mmap2 event insertionIan Rogers
Add -B option that lazily inserts mmap2 events thereby dropping all mmap events without samples. This is similar to the behavior of -b where only build_id events are inserted when a dso is accessed in a sample. File size savings can be significant in system-wide mode, consider: $ perf record -g -a -o perf.data sleep 1 $ perf inject -B -i perf.data -o perf.new.data $ ls -al perf.data perf.new.data 5147049 perf.data 2248493 perf.new.data Give test coverage of the new option in pipe test. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf inject: Add new mmap2-buildid-all optionIan Rogers
Add an option that allows all mmap or mmap2 events to be rewritten as mmap2 events with build IDs. This is similar to the existing -b/--build-ids and --buildid-all options except instead of adding a build_id event an existing mmap/mmap2 event is used as a template and a new mmap2 event synthesized from it. As mmap2 events are typical this avoids the insertion of build_id events. Add test coverage to the pipe test. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf inject: Fix build ID injectionIan Rogers
Build ID injection wasn't inserting a sample ID and aligning events to 64 bytes rather than 8. No sample ID means events are unordered and two different build_id events for the same path, as happens when a file is replaced, can't be differentiated. Add in sample ID insertion for the build_id events alongside some refactoring. The refactoring better aligns the function arguments for different use cases, such as synthesizing build_id events without needing to have a dso. The misc bits are explicitly passed as with callchains the maps/dsos may span user and kernel land, so using sample->cpumode isn't good enough. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Casey Chen <cachen@purestorage.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Link: https://lore.kernel.org/r/20240909203740.143492-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf annotate-data: Add pr_debug_scope()Namhyung Kim
The pr_debug_scope() is to print more information about the scope DIE during the instruction tracking so that it can help finding relevant debug info and the source code like inlined functions more easily. $ perf --debug type-profile annotate --data-type ... ----------------------------------------------------------- find data type for 0(reg0, reg12) at set_task_cpu+0xdd CU for kernel/sched/core.c (die:0x1268dae) frame base: cfa=1 fbreg=7 scope: [3/3] (die:12b6d28) [inlined] set_task_rq <<<--- (here) bb: [9f - dd] var [9f] reg3 type='struct task_struct*' size=0x8 (die:0x126aff0) var [9f] reg6 type='unsigned int' size=0x4 (die:0x1268e0d) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240909214251.3033827-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf annotate: Treat 'call' instruction as stack operationNamhyung Kim
I found some portion of mem-store events sampled on CALL instruction which has no memory access. But it actually saves a return address into stack. It should be considered as a stack operation like RET instruction. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240909214251.3033827-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf trace: Collect augmented data using BPFHoward Chu
Include trace_augment.h for TRACE_AUG_MAX_BUF, so that BPF reads TRACE_AUG_MAX_BUF bytes of buffer maximum. Determine what type of argument and how many bytes to read from user space, us ing the value in the beauty_map. This is the relation of parameter type and its corres ponding value in the beauty map, and how many bytes we read eventually: string: 1 -> size of string (till null) struct: size of struct -> size of struct buffer: -1 * (index of paired len) -> value of paired len (maximum: TRACE_AUG_ MAX_BUF) After reading from user space, we output the augmented data using bpf_perf_event_output(). If the struct augmenter, augment_sys_enter() failed, we fall back to using bpf_tail_call(). I have to make the payload 6 times the size of augmented_arg, to pass the BPF verifier. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-10-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-7-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf trace: Pretty print buffer dataHoward Chu
Define TRACE_AUG_MAX_BUF in trace_augment.h data, which is the maximum buffer size we can augment. BPF will include this header too. Print buffer in a way that's different than just printing a string, we print all the control characters in \digits (such as \0 for null, and \10 for newline, LF). For character that has a bigger value than 127, we print the digits instead of the character itself as well. Committer notes: Simplified the buffer scnprintf to avoid using multiple buffers as discussed in the patch review thread. We can't really all 'buf' args to SCA_BUF as we're collecting so far just on the sys_enter path, so we would be printing the previous 'read' arg buffer contents, not what the kernel puts there. So instead of: static int syscall_fmt__cmp(const void *name, const void *fmtp) @@ -1987,8 +1989,6 @@ syscall_arg_fmt__init_array(struct syscall_arg_fmt *arg, struct tep_format_field - else if (strstr(field->type, "char *") && strstr(field->name, "buf")) - arg->scnprintf = SCA_BUF; Do: static const struct syscall_fmt syscall_fmts[] = { + { .name = "write", .errpid = true, + .arg = { [1] = { .scnprintf = SCA_BUF /* buf */, from_user = true, }, }, }, Signed-off-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-8-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-6-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-10perf trace: Add trace__bpf_sys_enter_beauty_map() to prepare for fetching ↵Howard Chu
data in BPF Set up beauty_map, load it to BPF, in such format: if argument No.3 is a struct of size 32 bytes (of syscall number 114) beauty_map[114][2] = 32; if argument No.3 is a string (of syscall number 114) beauty_map[114][2] = 1; if argument No.3 is a buffer, its size is indicated by argument No.4 (of syscall number 114) beauty_map[114][2] = -4; /* -1 ~ -6, we'll read this buffer size in BPF */ Committer notes: Moved syscall_arg_fmt__cache_btf_struct() from a ifdef HAVE_LIBBPF_SUPPORT to closer to where it is used, that is ifdef'ed on HAVE_BPF_SKEL and thus breaks the build when building with BUILD_BPF_SKEL=0, as detected using 'make -C tools/perf build-test'. Also add 'struct beauty_map_enter' to tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c as we're using it in this patch, otherwise we get this while trying to build at this point in the original patch series: builtin-trace.c: In function ‘trace__init_syscalls_bpf_prog_array_maps’: builtin-trace.c:3725:58: error: ‘struct <anonymous>’ has no member named ‘beauty_map_enter’ 3725 | int beauty_map_fd = bpf_map__fd(trace->skel->maps.beauty_map_enter); | We also have to take into account syscall_arg_fmt.from_user when telling the kernel what to copy in the sys_enter generic collector, we don't want to collect bogus data in buffers that will only be available to us at sys_exit time, i.e. after the kernel has filled it, so leave this for when we have such a sys_exit based collector. Committer testing: Not wired up yet, so all continues to work, using the existing BPF collector and userspace beautifiers that are augmentation aware: root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654 0.000 ( 0.031 ms): mv/20888 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0 root@number:~# perf trace -e connect,sendto ping -c 1 www.google.com 0.000 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 0.040 ( 0.003 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff17980, len: 97, flags: DONTWAIT|NOSIGNAL) = 97 0.480 ( 0.017 ms): ping/20892 sendto(fd: 5, buff: 0x7ffd82d07150, len: 20, addr: { .family: NETLINK }, addr_len: 0xc) = 20 0.526 ( 0.014 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET6, port: 0, addr: 2800:3f0:4004:810::2004 }, addrlen: 28) = 0 0.542 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 0.544 ( 0.004 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 142.251.135.100 }, addrlen: 16) = 0 0.559 ( 0.002 ms): ping/20892 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 142.251.135.100 }, addrlen: 16PING www.google.com (142.251.135.100) 56(84) bytes of data. ) = 0 0.589 ( 0.058 ms): ping/20892 sendto(fd: 3, buff: 0x560b4ff11ac0, len: 64, addr: { .family: INET, port: 0, addr: 142.251.135.100 }, addr_len: 0x10) = 64 45.250 ( 0.029 ms): ping/20892 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 45.344 ( 0.012 ms): ping/20892 sendto(fd: 5, buff: 0x560b4ff19340, len: 111, flags: DONTWAIT|NOSIGNAL) = 111 64 bytes from rio09s08-in-f4.1e100.net (142.251.135.100): icmp_seq=1 ttl=49 time=44.4 ms --- www.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 44.361/44.361/44.361/0.000 ms root@number:~# Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240815013626.935097-4-howardchu95@gmail.com Link: https://lore.kernel.org/r/20240824163322.60796-3-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09perf trace: Use a common encoding for augmented arguments, with size + error ↵Arnaldo Carvalho de Melo
+ payload We were using a more compact format, without explicitely encoding the size and possible error in the payload for an argument. To do it generically, at least as Howard Chu did in his GSoC activities, it is more convenient to use the same model that was being used for string arguments, passing { size, error, payload }. So use that for the non string syscall args we have so far: struct timespec struct perf_event_attr struct sockaddr (this one has even a variable size) With this in place we have the userspace pretty printers: perf_event_attr___scnprintf() syscall_arg__scnprintf_augmented_sockaddr() syscall_arg__scnprintf_augmented_timespec() Ready to have the generic BPF collector in tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c sending its generic payload and thus we'll use them instead of a generic libbpf btf_dump interface that doesn't know about about the sockaddr mux, perf_event_attr non-trivial fields (sample_type, etc), leaving it as a (useful) fallback that prints just basic types until we put in place a more sophisticated pretty printer infrastructure that associates synthesized enums to struct fields using the header scrapers we have in tools/perf/trace/beauty/, some of them in this list: $ ls tools/perf/trace/beauty/*.sh tools/perf/trace/beauty/arch_errno_names.sh tools/perf/trace/beauty/kcmp_type.sh tools/perf/trace/beauty/perf_ioctl.sh tools/perf/trace/beauty/statx_mask.sh tools/perf/trace/beauty/clone.sh tools/perf/trace/beauty/kvm_ioctl.sh tools/perf/trace/beauty/pkey_alloc_access_rights.sh tools/perf/trace/beauty/sync_file_range.sh tools/perf/trace/beauty/drm_ioctl.sh tools/perf/trace/beauty/madvise_behavior.sh tools/perf/trace/beauty/prctl_option.sh tools/perf/trace/beauty/usbdevfs_ioctl.sh tools/perf/trace/beauty/fadvise.sh tools/perf/trace/beauty/mmap_flags.sh tools/perf/trace/beauty/rename_flags.sh tools/perf/trace/beauty/vhost_virtio_ioctl.sh tools/perf/trace/beauty/fs_at_flags.sh tools/perf/trace/beauty/mmap_prot.sh tools/perf/trace/beauty/sndrv_ctl_ioctl.sh tools/perf/trace/beauty/x86_arch_prctl.sh tools/perf/trace/beauty/fsconfig.sh tools/perf/trace/beauty/mount_flags.sh tools/perf/trace/beauty/sndrv_pcm_ioctl.sh tools/perf/trace/beauty/fsmount.sh tools/perf/trace/beauty/move_mount_flags.sh tools/perf/trace/beauty/sockaddr.sh tools/perf/trace/beauty/fspick.sh tools/perf/trace/beauty/mremap_flags.sh tools/perf/trace/beauty/socket.sh $ Testing it: root@number:~# rm -f 987654 ; touch 123456 ; perf trace -e rename* mv 123456 987654 0.000 ( 0.031 ms): mv/1193096 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0 root@number:~# perf trace -e *nanosleep sleep 1.2345678901 0.000 (1234.654 ms): sleep/1192697 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567891 }, rmtp: 0x7ffe1ea80460) = 0 root@number:~# perf trace -e perf_event_open* perf stat -e cpu-clock sleep 1 0.000 ( 0.011 ms): perf/1192701 perf_event_open(attr_uptr: { type: 1 (software), size: 136, config: 0 (PERF_COUNT_SW_CPU_CLOCK), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 1192702 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 Performance counter stats for 'sleep 1': 0.51 msec cpu-clock # 0.001 CPUs utilized 1.001242090 seconds time elapsed 0.000000000 seconds user 0.001010000 seconds sys root@number:~# perf trace -e connect* ping -c 1 bsky.app 0.000 ( 0.130 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: LOCAL, path: /run/systemd/resolve/io.systemd.Resolve }, addrlen: 42) = 0 23.907 ( 0.006 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.20.108.158 }, addrlen: 16) = 0 23.915 PING bsky.app (3.20.108.158) 56(84) bytes of data. ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.917 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.12.170.30 }, addrlen: 16) = 0 23.921 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.923 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.217.70.179 }, addrlen: 16) = 0 23.925 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.927 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.132.20.46 }, addrlen: 16) = 0 23.930 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.931 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.142.89.165 }, addrlen: 16) = 0 23.934 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.935 ( 0.002 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 18.119.147.159 }, addrlen: 16) = 0 23.938 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.940 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.22.38.164 }, addrlen: 16) = 0 23.942 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: UNSPEC }, addrlen: 16) = 0 23.944 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 0, addr: 3.13.14.133 }, addrlen: 16) = 0 23.956 ( 0.001 ms): ping/1192740 connect(fd: 5, uservaddr: { .family: INET, port: 1025, addr: 3.20.108.158 }, addrlen: 16) = 0 ^C --- bsky.app ping statistics --- 1 packets transmitted, 0 received, 100% packet loss, time 0ms root@number:~# Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fW4=2GoP6foAN6qbrCiUzy0a_TzHbd8rvDsakTPfdzvfg@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-09perf trace augmented_syscalls.bpf: Move the renameat aumenter to renameat2, ↵Arnaldo Carvalho de Melo
temporarily While trying to shape Howard Chu's generic BPF augmenter transition into the codebase I got stuck with the renameat2 syscall. Until I noticed that the attempt at reusing augmenters were making it use the 'openat' syscall augmenter, that collect just one string syscall arg, for the 'renameat2' syscall, that takes two strings. So, for the moment, just to help in this transition period, since 'renameat2' is what is used these days in the 'mv' utility, just make the BPF collector be associated with the more widely used syscall, hopefully the transition to Howard's generic BPF augmenter will cure this, so get this out of the way for now! So now we still have that odd "reuse", but for something we're not testing so won't get in the way anymore: root@number:~# rm -f 987654 ; touch 123456 ; perf trace -vv -e rename* mv 123456 987654 |& grep renameat Reusing "openat" BPF sys_enter augmenter for "renameat" 0.000 ( 0.079 ms): mv/1158612 renameat2(olddfd: CWD, oldname: "123456", newdfd: CWD, newname: "987654", flags: NOREPLACE) = 0 root@number:~# Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fXjGYs=tpBgETK-P9U-CuXssytk9pSnTXpfphrmmOydWA@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06perf mem: Fix missed p-core mem events on ADL and RPLKan Liang
The p-core mem events are missed when launching 'perf mem record' on ADL and RPL. root@number:~# perf mem record sleep 1 Memory events are enabled on a subset of CPUs: 16-27 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.032 MB perf.data ] root@number:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u A variable 'record' in the 'struct perf_mem_event' is to indicate whether a mem event in a mem_events[] should be recorded. The current code only configure the variable for the first eligible PMU. It's good enough for a non-hybrid machine or a hybrid machine which has the same mem_events[]. However, if a different mem_events[] is used for different PMUs on a hybrid machine, e.g., ADL or RPL, the 'record' for the second PMU never get a chance to be set. The mem_events[] of the second PMU are always ignored. 'perf mem' doesn't support the per-PMU configuration now. A per-PMU mem_events[] 'record' variable doesn't make sense. Make it global. That could also avoid searching for the per-PMU mem_events[] via perf_pmu__mem_events_ptr every time. Committer testing: root@number:~# perf evlist -g cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/} cpu_core/mem-stores/P dummy:u root@number:~# The :S for '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}' is not being added by 'perf evlist -g', to be checked. Fixes: abbdd79b786e036e ("perf mem: Clean up perf_mem_events__name()") Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/ Link: https://lore.kernel.org/r/20240905170737.4070743-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06perf mem: Check mem_events for all eligible PMUsKan Liang
The current perf_pmu__mem_events_init() only checks the availability of the mem_events for the first eligible PMU. It works for non-hybrid machines and hybrid machines that have the same mem_events. However, it may bring issues if a hybrid machine has a different mem_events on different PMU, e.g., Alder Lake and Raptor Lake. A mem-loads-aux event is only required for the p-core. The mem_events on both e-core and p-core should be checked and marked. The issue was not found, because it's hidden by another bug, which only records the mem-events for the e-core. The wrong check for the p-core events didn't yell. Fixes: abbdd79b786e036e ("perf mem: Clean up perf_mem_events__name()") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240905170737.4070743-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-06perf script python: Avoid buffer overflow in python PEBS register interfaceAndi Kleen
Running a script that processes PEBS records gives buffer overflows in valgrind. The problem is that the allocation of the register string doesn't include the terminating 0 byte. Fix this. I also replaced the very magic "28" with a more reasonable larger buffer that should fit all registers. There's no need to conserve memory here. ==2106591== Memcheck, a memory error detector ==2106591== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. ==2106591== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info ==2106591== Command: ../perf script -i tcall.data gcov.py tcall.gcov ==2106591== ==2106591== Invalid write of size 1 ==2106591== at 0x713354: regs_map (trace-event-python.c:748) ==2106591== by 0x7134EB: set_regs_in_dict (trace-event-python.c:784) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd ==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442) ==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== ==2106591== Invalid read of size 1 ==2106591== at 0x484B6C6: strlen (vg_replace_strmem.c:502) ==2106591== by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899) ==2106591== by 0x7134F7: set_regs_in_dict (trace-event-python.c:786) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd ==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442) ==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== ==2106591== Invalid write of size 1 ==2106591== at 0x713354: regs_map (trace-event-python.c:748) ==2106591== by 0x713539: set_regs_in_dict (trace-event-python.c:789) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd ==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442) ==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== ==2106591== Invalid read of size 1 ==2106591== at 0x484B6C6: strlen (vg_replace_strmem.c:502) ==2106591== by 0x555D494: PyUnicode_FromString (unicodeobject.c:1899) ==2106591== by 0x713545: set_regs_in_dict (trace-event-python.c:791) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== Address 0x7186fe0 is 0 bytes after a block of size 0 alloc'd ==2106591== at 0x484280F: malloc (vg_replace_malloc.c:442) ==2106591== by 0x7134AD: set_regs_in_dict (trace-event-python.c:780) ==2106591== by 0x713E58: get_perf_sample_dict (trace-event-python.c:940) ==2106591== by 0x716327: python_process_general_event (trace-event-python.c:1499) ==2106591== by 0x7164E1: python_process_event (trace-event-python.c:1531) ==2106591== by 0x44F9AF: process_sample_event (builtin-script.c:2549) ==2106591== by 0x6294DC: evlist__deliver_sample (session.c:1534) ==2106591== by 0x6296D0: machines__deliver_event (session.c:1573) ==2106591== by 0x629C39: perf_session__deliver_event (session.c:1655) ==2106591== by 0x625830: ordered_events__deliver_event (session.c:193) ==2106591== by 0x630B23: do_flush (ordered-events.c:245) ==2106591== by 0x630E7A: __ordered_events__flush (ordered-events.c:324) ==2106591== 73056 total, 29 ignored Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240905151058.2127122-2-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04perf parse-events: Vary default_breakpoint_len on i386 and arm64Ian Rogers
On arm64 the breakpoint length should be 4-bytes but 8-bytes is tolerated as perf passes that as sizeof(long). Just pass the correct value. On i386 the sizeof(long) check in the kernel needs to match the kernel's long size. Check using an environment (uname checks) whether 4 or 8 bytes needs to be passed. Cache the value in a static. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240904050606.752788-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04perf parse-events: Add default_breakpoint_len helperIan Rogers
The default breakpoint length is "sizeof(long)" however this is incorrect on platforms like Aarch64 where sizeof(long) is 8 but the breakpoint length is 4. Add a helper function that can be used to determine the correct breakpoint length, in this change it just returns the existing default sizeof(long) value. Use the helper in the bp_account test so that, when modifying the event from a watchpoint to a breakpoint, the breakpoint length is appropriate for the architecture and not just sizeof(long). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240904050606.752788-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf parse-events: Pass cpu_list as a perf_cpu_map in __add_event()Ian Rogers
Previously the cpu_list is a string and typically no cpu_list is passed to __add_event(). Wanting to make events have their cpus distinct from the PMU means that in more occassions we want to pass a cpu_list. If we're reading this from sysfs it is easier to read a perf_cpu_map than allocate and pass around strings that will later be parsed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240718003025.1486232-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf pmu: Merge boolean sysfs event option parsingIan Rogers
Merge perf_pmu__parse_per_pkg() and perf_pmu__parse_snapshot() that do the same parsing except for the file suffix used. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240718003025.1486232-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf script: Minimize "not reaching sample" for '-F +brstackinsn'Andi Kleen
In some situations 'perf script -F +brstackinsn' sees a lot of "not reaching sample" messages. This happens when the last LBR block before the sample contains a branch that is not in the LBR, and the instruction dumping stops. $ perf record -b emacs -Q --batch '()' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.396 MB perf.data (443 samples) ] $ perf script -F +brstackinsn ... 00007f0ab2d171a4 insn: 41 0f 94 c0 00007f0ab2d171a8 insn: 83 fa 01 00007f0ab2d171ab insn: 74 d3 # PRED 6 cycles [313] 1.00 IPC 00007f0ab2d17180 insn: 45 84 c0 00007f0ab2d17183 insn: 74 28 ... not reaching sample ... $ perf script -F +brstackinsn | grep -c reach 136 $ This is a problem for further analysis that wants to see the full code upto the sample. There are two common cases where the message is bogus: - The LBR only logs taken branches, but the branch might be a conditional branch that is not taken (that is the most common case actually) - The LBR sampling uses a filter ignoring some branches, but the perf script check checks for all branches. This patch fixes these two conditions, by only checking for conditional branches, as well as checking the perf_event_attr's branch filter attributes. For the test case above it fixes all the messages: $ ./perf script -F +brstackinsn | grep -c reach 0 Note that there are still conditions when the message is hit -- sometimes there can be a unconditional branch that misses the LBR update before the sample -- but they are much more rare now. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240229161828.386397-1-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf record offcpu: Constify control data for BPFNamhyung Kim
The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf record --off-cpu ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.807 MB perf.data (5645 samples) ] root@x1:~# perf evlist cpu_atom/cycles/P cpu_core/cycles/P offcpu-time dummy:u root@x1:~# perf evlist -v cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 offcpu-time: type: 1 (software), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1 dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|IDENTIFIER, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 root@x1:~# perf trace -e bpf --max-events 5 perf record --off-cpu 0.000 ( 0.015 ms): :2949124/2949124 bpf(cmd: 36, uattr: 0x7ffefc6dbe30, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.031 ( 0.115 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbb60, size: 148) = 14 0.159 ( 0.037 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbc20, size: 148) = 14 23.868 ( 0.144 ms): perf/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbad0, size: 148) = 14 24.027 ( 0.014 ms): perf/2949124 bpf(uattr: 0x7ffefc6dbc80, size: 80) = 14 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf lock contention: Constify control data for BPFNamhyung Kim
The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf lock contention --use-bpf contended total wait max wait avg wait type caller 5 31.57 us 14.93 us 6.31 us mutex btrfs_delayed_update_inode+0x43 1 16.91 us 16.91 us 16.91 us rwsem:R btrfs_tree_read_lock_nested+0x1b 1 15.13 us 15.13 us 15.13 us spinlock btrfs_getattr+0xd1 1 6.65 us 6.65 us 6.65 us rwsem:R btrfs_tree_read_lock_nested+0x1b 1 4.34 us 4.34 us 4.34 us spinlock process_one_work+0x1a9 root@x1:~# root@x1:~# perf trace -e bpf --max-events 10 perf lock contention --use-bpf 0.000 ( 0.013 ms): :2948281/2948281 bpf(cmd: 36, uattr: 0x7ffd5f12d730, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.024 ( 0.120 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d460, size: 148) = 16 0.158 ( 0.034 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d520, size: 148) = 16 26.653 ( 0.154 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d3d0, size: 148) = 16 26.825 ( 0.014 ms): perf/2948281 bpf(uattr: 0x7ffd5f12d580, size: 80) = 16 87.924 ( 0.038 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d400, size: 40) = 16 87.988 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d470, size: 40) = 16 88.019 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d250, size: 40) = 16 88.029 ( 0.172 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d320, size: 148) = 17 88.217 ( 0.005 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d4d0, size: 40) = 16 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf kwork: Constify control data for BPFNamhyung Kim
The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf kwork report --use-bpf Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- (w)intel_atomic_commit_work [ | 0009 | 18.680 ms | 2 | 18.553 ms | 362410.681580 s | 362410.700133 s | (w)pm_runtime_work | 0007 | 13.300 ms | 1 | 13.300 ms | 362410.254996 s | 362410.268295 s | (w)intel_atomic_commit_work [ | 0009 | 9.846 ms | 2 | 9.717 ms | 362410.172352 s | 362410.182069 s | (w)acpi_ec_event_processor | 0002 | 8.106 ms | 1 | 8.106 ms | 362410.463187 s | 362410.471293 s | (s)SCHED:7 | 0000 | 1.351 ms | 106 | 0.063 ms | 362410.658017 s | 362410.658080 s | i915:157 | 0008 | 0.994 ms | 13 | 0.361 ms | 362411.222125 s | 362411.222486 s | (s)SCHED:7 | 0001 | 0.703 ms | 98 | 0.047 ms | 362410.245004 s | 362410.245051 s | (s)SCHED:7 | 0005 | 0.674 ms | 42 | 0.074 ms | 362411.483039 s | 362411.483113 s | (s)NET_RX:3 | 0001 | 0.556 ms | 10 | 0.079 ms | 362411.066388 s | 362411.066467 s | <SNIP> root@x1:~# perf trace -e bpf --max-events 5 perf kwork report --use-bpf 0.000 ( 0.016 ms): perf/2948007 bpf(cmd: 36, uattr: 0x7ffededa6660, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.026 ( 0.106 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6390, size: 148) = 12 0.152 ( 0.032 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6450, size: 148) = 12 26.247 ( 0.138 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6300, size: 148) = 12 26.396 ( 0.012 ms): perf/2948007 bpf(uattr: 0x7ffededa64b0, size: 80) = 12 Starting trace, Hit <Ctrl+C> to stop and report root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240902200515.2103769-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf ftrace latency: Constify control data for BPFNamhyung Kim
The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf ftrace latency --use-bpf -T schedule ^C# DURATION | COUNT | GRAPH | 0 - 1 us | 0 | | 1 - 2 us | 0 | | 2 - 4 us | 0 | | 4 - 8 us | 0 | | 8 - 16 us | 1 | | 16 - 32 us | 5 | | 32 - 64 us | 2 | | 64 - 128 us | 6 | | 128 - 256 us | 7 | | 256 - 512 us | 5 | | 512 - 1024 us | 22 | # | 1 - 2 ms | 36 | ## | 2 - 4 ms | 68 | ##### | 4 - 8 ms | 22 | # | 8 - 16 ms | 91 | ####### | 16 - 32 ms | 11 | | 32 - 64 ms | 26 | ## | 64 - 128 ms | 213 | ################# | 128 - 256 ms | 19 | # | 256 - 512 ms | 14 | # | 512 - 1024 ms | 5 | | 1 - ... s | 8 | | root@x1:~# root@x1:~# perf trace -e bpf perf ftrace latency --use-bpf -T schedule 0.000 ( 0.015 ms): perf/2944525 bpf(cmd: 36, uattr: 0x7ffe80de7b40, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.025 ( 0.102 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7870, size: 148) = 8 0.136 ( 0.026 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7930, size: 148) = 8 0.174 ( 0.026 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de77e0, size: 148) = 8 0.205 ( 0.010 ms): perf/2944525 bpf(uattr: 0x7ffe80de7990, size: 80) = 8 0.227 ( 0.011 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7810, size: 40) = 8 0.244 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40) = 8 0.257 ( 0.006 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7660, size: 40) = 8 0.265 ( 0.058 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7730, size: 148) = 9 0.330 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78e0, size: 40) = 8 0.337 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7890, size: 40) = 8 0.343 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40) = 8 0.349 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78b0, size: 40) = 8 0.355 ( 0.004 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7890, size: 40) = 8 0.361 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de78b0, size: 40) = 8 0.367 ( 0.003 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7880, size: 40) = 8 0.373 ( 0.014 ms): perf/2944525 bpf(cmd: BTF_LOAD, uattr: 0x7ffe80de7a00, size: 40) = 8 0.390 ( 0.358 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.763 ( 0.014 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.783 ( 0.011 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.798 ( 0.017 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.819 ( 0.003 ms): perf/2944525 bpf(uattr: 0x7ffe80de7700, size: 80) = 9 0.824 ( 0.047 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de76c0, size: 148) = 10 0.878 ( 0.008 ms): perf/2944525 bpf(uattr: 0x7ffe80de7950, size: 80) = 9 0.891 ( 0.014 ms): perf/2944525 bpf(cmd: MAP_UPDATE_ELEM, uattr: 0x7ffe80de79e0, size: 32) = 0 0.910 ( 0.103 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7880, size: 148) = 9 1.016 ( 0.143 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7880, size: 148) = 10 3.777 ( 0.068 ms): perf/2944525 bpf(cmd: PROG_LOAD, uattr: 0x7ffe80de7570, size: 148) = 12 3.848 ( 0.003 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de7550, size: 64) = -1 EBADF (Bad file descriptor) 3.859 ( 0.006 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de77c0, size: 64) = 12 6.504 ( 0.010 ms): perf/2944525 bpf(cmd: LINK_CREATE, uattr: 0x7ffe80de77c0, size: 64) = 14 ^C# DURATION | COUNT | GRAPH | 0 - 1 us | 0 | | 1 - 2 us | 0 | | 2 - 4 us | 1 | | 4 - 8 us | 3 | | 8 - 16 us | 3 | | 16 - 32 us | 11 | | 32 - 64 us | 9 | | 64 - 128 us | 17 | | 128 - 256 us | 30 | # | 256 - 512 us | 20 | | 512 - 1024 us | 42 | # | 1 - 2 ms | 151 | ###### | 2 - 4 ms | 106 | #### | 4 - 8 ms | 18 | | 8 - 16 ms | 149 | ###### | 16 - 32 ms | 30 | # | 32 - 64 ms | 17 | | 64 - 128 ms | 360 | ############### | 128 - 256 ms | 52 | ## | 256 - 512 ms | 18 | | 512 - 1024 ms | 28 | # | 1 - ... s | 5 | | root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf stat: Constify control data for BPFNamhyung Kim
The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf stat --bpf-counters -e cpu_core/cycles/,cpu_core/instructions/ sleep 1 Performance counter stats for 'sleep 1': 2,442,583 cpu_core/cycles/ 2,494,425 cpu_core/instructions/ 1.002687372 seconds time elapsed 0.001126000 seconds user 0.001166000 seconds sys root@x1:~# perf trace -e bpf --max-events 10 perf stat --bpf-counters -e cpu_core/cycles/,cpu_core/instructions/ sleep 1 0.000 ( 0.019 ms): perf/2944119 bpf(cmd: OBJ_GET, uattr: 0x7fffdf5cdd40, size: 20) = 5 0.021 ( 0.002 ms): perf/2944119 bpf(cmd: OBJ_GET_INFO_BY_FD, uattr: 0x7fffdf5cdcd0, size: 16) = 0 0.030 ( 0.005 ms): perf/2944119 bpf(cmd: MAP_LOOKUP_ELEM, uattr: 0x7fffdf5ceda0, size: 32) = 0 0.037 ( 0.004 ms): perf/2944119 bpf(cmd: LINK_GET_FD_BY_ID, uattr: 0x7fffdf5ced80, size: 12) = -1 ENOENT (No such file or directory) 0.189 ( 0.004 ms): perf/2944119 bpf(cmd: 36, uattr: 0x7fffdf5cec10, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.201 ( 0.095 ms): perf/2944119 bpf(cmd: PROG_LOAD, uattr: 0x7fffdf5ce940, size: 148) = 10 0.305 ( 0.026 ms): perf/2944119 bpf(cmd: PROG_LOAD, uattr: 0x7fffdf5cea00, size: 148) = 10 0.347 ( 0.012 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce8e0, size: 40) = 10 0.364 ( 0.004 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce950, size: 40) = 10 0.376 ( 0.006 ms): perf/2944119 bpf(cmd: BTF_LOAD, uattr: 0x7fffdf5ce730, size: 40) = 10 root@x1:~# Performance counter stats for 'sleep 1': 271,221 cpu_core/cycles/ 139,150 cpu_core/instructions/ 1.002881677 seconds time elapsed 0.001318000 seconds user 0.001314000 seconds sys root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf time-utils: Fix 32-bit nsec parsingIan Rogers
The "time utils" test fails in 32-bit builds: ... parse_nsec_time("18446744073.709551615") Failed. ptime 4294967295709551615 expected 18446744073709551615 ... Switch strtoul to strtoull as an unsigned long in 32-bit build isn't 64-bits. Fixes: c284d669a20d408b ("perf tools: Move parse_nsec_time to time-utils.c") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf pmus: Fix name comparisons on 32-bit systemsIan Rogers
The hex PMU suffix maybe 64-bit but the comparisons were "unsigned long" or 32-bit on 32-bit systems. This was causing the "PMU name comparison" test to fail in a 32-bit build. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: David Ahern <dsa@cumulusnetworks.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240831070415.506194-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf annotate: LLVM-based disassemblerSteinar H. Gunderson
Support using LLVM as a disassembler method, allowing helperless annotation in non-distro builds. (It is also much faster than using libbfd or bfd objdump on binaries with a lot of debug information.) This is nearly identical to the output of llvm-objdump; there are some very rare whitespace differences, some minor changes to demangling (since we use perf's regular demangling and not LLVM's own) and the occasional case where llvm-objdump makes a different choice when multiple symbols share the same address. It should work across all of LLVM's supported architectures, although I've only tested 64-bit x86, and finding the right triple from perf's idea of machine architecture can sometimes be a bit tricky. Ideally, we should have some way of finding the triplet just from the file itself. Committer notes: Address this on 32-bit systems by using PRIu64 from inttypes.h 3 17.58 almalinux:9-i386 : FAIL gcc version 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC) util/llvm-c-helpers.cpp: In function ‘char* make_symbol_relative_string(dso*, const char*, u64, u64)’: util/llvm-c-helpers.cpp:150:52: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘u64’ {aka +‘long long unsigned int’} [-Werror=format=] 150 | snprintf(buf, sizeof(buf), "%s+0x%lx", | ~~^ | | | long unsigned int | %llx 151 | demangled ? demangled : sym_name, addr - base_addr); | ~~~~~~~~~~~~~~~~ | | | u64 {aka long long unsigned int} cc1plus: all warnings being treated as errors Signed-off-by: Steinar H. Gunderson <sesse@google.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240803152008.2818485-3-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf annotate: Split out read_symbol()Steinar H. Gunderson
The Capstone disassembler code has a useful code snippet to read the bytes for a given code symbol into memory. Split it out into its own function, so that the LLVM disassembler can use it in the next patch. Signed-off-by: Steinar H. Gunderson <sesse@google.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240803152008.2818485-2-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf report: Support LLVM for addr2line()Steinar H. Gunderson
In addition to the existing support for libbfd and calling out to an external addr2line command, add support for using libllvm directly. This is both faster than libbfd, and can be enabled in distro builds (the LLVM license has an explicit provision for GPLv2 compatibility). Thus, it is set as the primary choice if available. As an example, running 'perf report' on a medium-size profile with DWARF-based backtraces took 58 seconds with LLVM, 78 seconds with libbfd, 153 seconds with external llvm-addr2line, and I got tired and aborted the test after waiting for 55 minutes with external bfd addr2line (which is the default for perf as compiled by distributions today). Evidently, for this case, the bfd addr2line process needs 18 seconds (on a 5.2 GHz Zen 3) to load the .debug ELF in question, hits the 1-second timeout and gets killed during initialization, getting restarted anew every time. Having an in-process addr2line makes this much more robust. As future extensions, libllvm can be used in many other places where we currently use libbfd or other libraries: - Symbol enumeration (in particular, for PE binaries). - Demangling (including non-Itanium demangling, e.g. Microsoft or Rust). - Disassembling (perf annotate). However, these are much less pressing; most people don't profile PE binaries, and perf has non-bfd paths for ELF. The same with demangling; the default _cxa_demangle path works fine for most users, and while bfd objdump can be slow on large binaries, it is possible to use --objdump=llvm-objdump to get the speed benefits. (It appears LLVM-based demangling is very simple, should we want that.) Tested with LLVM 14, 15, 16, 18 and 19. For some reason, LLVM 12 was not correctly detected using feature_check, and thus was not tested. Committer notes: Added the name and a __maybe_unused to address: 1 13.50 almalinux:8 : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-22) (GCC) util/srcline.c: In function 'dso__free_a2l': util/srcline.c:184:20: error: parameter name omitted void dso__free_a2l(struct dso *) ^~~~~~~~~~~~ make[3]: *** [/git/perf-6.11.0-rc3/tools/build/Makefile.build:158: util] Error 2 Signed-off-by: Steinar H. Gunderson <sesse@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240803152008.2818485-1-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-02perf tools: Build x86 32-bit syscall table from ↵Arnaldo Carvalho de Melo
arch/x86/entry/syscalls/syscall_32.tbl To remove one more use of the audit libs and address a problem reported with a recent change where a function isn't available when using the audit libs method, that should really go away, this being one step in that direction. The script used to generate the 64-bit syscall table was already parametrized to generate for both 64-bit and 32-bit, so just use it and wire the generated table to the syscalltbl.c routines. Reported-by: Jiri Slaby <jirislaby@kernel.org> Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Jiri Slaby <jirislaby@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/6fe63fa3-6c63-4b75-ac09-884d26f6fb95@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30perf lock contention: Fix spinlock and rwlock accountingNamhyung Kim
The spinlock and rwlock use a single-element per-cpu array to track current locks due to performance reason. But this means the key is always available and it cannot simply account lock stats in the array because some of them are invalid. In fact, the contention_end() program in the BPF invalidates the entry by setting the 'lock' value to 0 instead of deleting the entry for the hashmap. So it should skip entries with the lock value of 0 in the account_end_timestamp(). Otherwise, it'd have spurious high contention on an idle machine: $ sudo perf lock con -ab -Y spinlock sleep 3 contended total wait max wait avg wait type caller 8 4.72 s 1.84 s 590.46 ms spinlock rcu_core+0xc7 8 1.87 s 1.87 s 233.48 ms spinlock process_one_work+0x1b5 2 1.87 s 1.87 s 933.92 ms spinlock worker_thread+0x1a2 3 1.81 s 1.81 s 603.93 ms spinlock tmigr_update_events+0x13c 2 1.72 s 1.72 s 861.98 ms spinlock tick_do_update_jiffies64+0x25 6 42.48 us 13.02 us 7.08 us spinlock futex_q_lock+0x2a 1 13.03 us 13.03 us 13.03 us spinlock futex_wake+0xce 1 11.61 us 11.61 us 11.61 us spinlock rcu_core+0xc7 I don't believe it has contention on a spinlock longer than 1 second. After this change, it only reports some small contentions. $ sudo perf lock con -ab -Y spinlock sleep 3 contended total wait max wait avg wait type caller 4 133.51 us 43.29 us 33.38 us spinlock tick_do_update_jiffies64+0x25 4 69.06 us 31.82 us 17.27 us spinlock process_one_work+0x1b5 2 50.66 us 25.77 us 25.33 us spinlock rcu_core+0xc7 1 28.45 us 28.45 us 28.45 us spinlock rcu_core+0xc7 1 24.77 us 24.77 us 24.77 us spinlock tmigr_update_events+0x13c 1 23.34 us 23.34 us 23.34 us spinlock raw_spin_rq_lock_nested+0x15 Fixes: b5711042a1c8cc88 ("perf lock contention: Use per-cpu array map for spinlocks") Reported-by: Xi Wang <xii@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: bpf@vger.kernel.org Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240828052953.1445862-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30perf lock contention: Do not fail EEXIST for updateNamhyung Kim
When it updates the lock stat for the first time, it needs to create an element in the BPF hash map. But if there's a concurrent thread waiting for the same lock (like for rwsem or rwlock), it might race with the thread and possibly fail to update with -EEXIST. In that case, it can lookup the map again and put the data there instead of failing. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240830065150.1758962-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30perf lock contention: Simplify spinlock checkNamhyung Kim
The LCB_F_SPIN bit is used for spinlock, rwlock and optimistic spinning in mutex. In get_tstamp_elem() it needs to check spinlock and rwlock only. As mutex sets the LCB_F_MUTEX, it can check those two bits and reduce the number of operations. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240830065150.1758962-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30perf lock contention: Handle error in a single placeNamhyung Kim
It has some duplicate codes to do the same job. Let's add a label and goto there to handle errors in a single place. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240830065150.1758962-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30perf header: Remove repipe optionIan Rogers
No longer used by `perf inject` the repipe_fd is always -1 and repipe is always false. Remove the options and associated code knowing the constant values of the removed variables. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-30perf inject: Overhaul handling of pipe filesIan Rogers
Previously inject->is_pipe was set if the input or output were a pipe. Determining the input was a pipe had to be done prior to starting the session and opening the file. This was done by comparing the input file name with '-' but it fails if the pipe file is written to disk. Opening a pipe file from disk will correctly set perf_data.is_pipe, but this is too late for 'perf inject' and results in a broken file. A workaround is 'cat pipe_perf|perf inject -i - ...'. This change removes inject->is_pipe and changes the dependent conditions to use the is_pipe flag on the input (inject->session->data) and output files (inject->output). This ensures the is_pipe condition reflects things like the header being read. The change removes the use of perf file header repiping, that is writing the file header out while reading it in. The case of input pipe and output file cannot repipe as the attributes for the file are unknown. To resolve this, write the file header when writing to disk and as the attributes may be unknown, write them after the data. Update sessions repipe variable to be trace_event_repipe as those are the only events now impacted by it. Update __perf_session__new as the repipe_fd no longer needs passing. Fully removing repipe from session header reading will be done in a later change. Committer testing: root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 | perf report -i - # To display the perf.data header info, please use --header/--header-only options. # [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.050 MB - ] # # Total Lost Samples: 0 # # Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep' # Event count (approx.): 1 # # Overhead Command Shared Object Symbol # ........ ....... ............. ............................... # 100.00% sleep libc.so.6 [.] clock_nanosleep@GLIBC_2.2.5 | ---__libc_start_main@@GLIBC_2.34 __libc_start_call_main 0x562fc2560a9f clock_nanosleep@GLIBC_2.2.5 # # (Tip: Create an archive with symtabs to analyse on other machine: perf archive) # root@number:~# perf record -e syscalls:sys_enter_*sleep/max-stack=4/ -o - sleep 0.01 > pipe.data [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.050 MB - ] root@number:~# perf report --stdio -i pipe.data # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1 of event 'syscalls:sys_enter_clock_nanosleep' # Event count (approx.): 1 # # Overhead Command Shared Object Symbol # ........ ....... ............. ............................... # 100.00% sleep libc.so.6 [.] clock_nanosleep@GLIBC_2.2.5 | ---__libc_start_main@@GLIBC_2.34 __libc_start_call_main 0x55f775975a9f clock_nanosleep@GLIBC_2.2.5 # # (Tip: To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ...) # root@number:~# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29perf header: Allow attributes to be written after dataIan Rogers
With a file, to write data an offset needs to be known. Typically data follows the event attributes in a file. However, if processing a pipe the number of event attributes may not be known. It is convenient in that case to write the attributes after the data. Expand perf_session__do_write_header() to allow this when the data offset and size are known. This approach may be useful for more than just taking a pipe file to write into a data file, `perf inject --itrace` will reserve and additional 8kb for attributes, which would be unnecessary if the attributes were written after the data. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29perf header: Fail read if header sections overlapIan Rogers
Buggy perf.data files can have the attributes and data overlapping. For example, when processing pipe data the attributes aren't known and so file offset header calculations can consider them not present. Later this can cause the attributes to overwrite the data. This can be seen in: $ perf record -o - true > a.data [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.059 MB - ] $ perf inject -i a.data -o b.data $ perf report --stats -i b.data 0x68 [0]: failed to process type: 510379 [Invalid argument] Error: failed to process sample $ This change makes reading the corrupt file fail: $ perf report --stats -i b.data Perf file header corrupt: Attributes and data overlap incompatible file format (rerun with -v to learn more) $ Which is more informative. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29perf header: Add kerneldoc to 'struct perf_file_header'Ian Rogers
Some of the values are a little strange so add documentation to resolve ambiguity. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29perf session: Document 'struct perf_session' and constify its 'auxtrace' memberIan Rogers
perf_session is a central data structure to the tool so let's comment it. The auxtrace callbacks are never modified in session so constify. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240829150154.37929-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29perf: cs-etm: Print queue number in raw trace dumpJames Clark
Now that we have overlapping trace IDs it's also useful to know what the queue number is to be able to distinguish the source of the trace so print it inline. Hide it behind the -v option because it might not be obvious to users what the queue number is. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-8-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29perf: cs-etm: Support version 0.1 of HW_ID packetsJames Clark
v0.1 HW_ID packets have a new field that describes which sink each CPU writes to. Use the sink ID to link trace ID maps to each other so that mappings are shared wherever the sink is shared. Also update the error message to show that overlapping IDs aren't an error in per-thread mode, just not supported. In the future we can use the CPU ID from the AUX records, or watch for changing sink IDs on HW_ID packets to use the correct decoders. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-7-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29perf: cs-etm: Create decoders based on the trace ID mappingsJames Clark
Now that each queue has a unique set of trace ID mappings, use this list to create the decoders. In unformatted mode just add a single mapping so only one decoder is made. Previously each queue would have a decoder created for each traced CPU on the system but this won't work anymore because CPUs can have overlapping trace IDs. This also means that the CORESIGHT_TRACE_ID_UNUSED_FLAG isn't needed any more. If mappings aren't added then decoders aren't created, rather than needing a flag to suppress creation. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-5-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29perf: cs-etm: Move traceid_list to each queueJames Clark
The global list won't work for per-sink trace ID allocations, so put a list in each queue where the IDs will be unique to that queue. To keep the same behavior as before, for version 0 of the HW_ID packets, copy all the HW_ID mappings into all queues. This change doesn't effect the decoders, only trace ID lookups on the Perf side. The decoders are still created with global mappings which will be fixed in a later commit. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-4-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-29perf: cs-etm: Allocate queues for all CPUsJames Clark
Make cs_etm__setup_queue() setup a queue even if it's empty, and pre-allocate queues based on the max CPU that was recorded. In per-CPU mode aux queues are indexed based on CPU ID even if all CPUs aren't recorded, sparse queue arrays aren't used. This will allow HW_IDs to be saved even if no aux data was received in that queue without having to call cs_etm__setup_queue() from two different places. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240722101202.26915-3-james.clark@linaro.org Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>