summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2025-05-26Merge tag 'x86-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "Boot code changes: - A large series of changes to reorganize the x86 boot code into a better isolated and easier to maintain base of PIC early startup code in arch/x86/boot/startup/, by Ard Biesheuvel. Motivation & background: | Since commit | | c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") | | dated Jun 6 2017, we have been using C code on the boot path in a way | that is not supported by the toolchain, i.e., to execute non-PIC C | code from a mapping of memory that is different from the one provided | to the linker. It should have been obvious at the time that this was a | bad idea, given the need to sprinkle fixup_pointer() calls left and | right to manipulate global variables (including non-pointer variables) | without crashing. | | This C startup code has been expanding, and in particular, the SEV-SNP | startup code has been expanding over the past couple of years, and | grown many of these warts, where the C code needs to use special | annotations or helpers to access global objects. This tree includes the first phase of this work-in-progress x86 boot code reorganization. Scalability enhancements and micro-optimizations: - Improve code-patching scalability (Eric Dumazet) - Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper) CPU features enumeration updates: - Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S. Darwish) - Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish, Thomas Gleixner) - Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish) Memory management changes: - Allow temporary MMs when IRQs are on (Andy Lutomirski) - Opt-in to IRQs-off activate_mm() (Andy Lutomirski) - Simplify choose_new_asid() and generate better code (Borislav Petkov) - Simplify 32-bit PAE page table handling (Dave Hansen) - Always use dynamic memory layout (Kirill A. Shutemov) - Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov) - Make 5-level paging support unconditional (Kirill A. Shutemov) - Stop prefetching current->mm->mmap_lock on page faults (Mateusz Guzik) - Predict valid_user_address() returning true (Mateusz Guzik) - Consolidate initmem_init() (Mike Rapoport) FPU support and vector computing: - Enable Intel APX support (Chang S. Bae) - Reorgnize and clean up the xstate code (Chang S. Bae) - Make task_struct::thread constant size (Ingo Molnar) - Restore fpu_thread_struct_whitelist() to fix CONFIG_HARDENED_USERCOPY=y (Kees Cook) - Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg Nesterov) - Always preserve non-user xfeatures/flags in __state_perm (Sean Christopherson) Microcode loader changes: - Help users notice when running old Intel microcode (Dave Hansen) - AMD: Do not return error when microcode update is not necessary (Annie Li) - AMD: Clean the cache if update did not load microcode (Boris Ostrovsky) Code patching (alternatives) changes: - Simplify, reorganize and clean up the x86 text-patching code (Ingo Molnar) - Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish() (Nikolay Borisov) - Refactor the {,un}use_temporary_mm() code (Peter Zijlstra) Debugging support: - Add early IDT and GDT loading to debug relocate_kernel() bugs (David Woodhouse) - Print the reason for the last reset on modern AMD CPUs (Yazen Ghannam) - Add AMD Zen debugging document (Mario Limonciello) - Fix opcode map (!REX2) superscript tags (Masami Hiramatsu) - Stop decoding i64 instructions in x86-64 mode at opcode (Masami Hiramatsu) CPU bugs and bug mitigations: - Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov) - Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov) - Restructure and harmonize the various CPU bug mitigation methods (David Kaplan) - Fix spectre_v2 mitigation default on Intel (Pawan Gupta) MSR API: - Large MSR code and API cleanup (Xin Li) - In-kernel MSR API type cleanups and renames (Ingo Molnar) PKEYS: - Simplify PKRU update in signal frame (Chang S. Bae) NMI handling code: - Clean up, refactor and simplify the NMI handling code (Sohil Mehta) - Improve NMI duration console printouts (Sohil Mehta) Paravirt guests interface: - Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov) SEV support: - Share the sev_secrets_pa value again (Tom Lendacky) x86 platform changes: - Introduce the <asm/amd/> header namespace (Ingo Molnar) - i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to <asm/amd/fch.h> (Mario Limonciello) Fixes and cleanups: - x86 assembly code cleanups and fixes (Uros Bizjak) - Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf, Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)" * tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits) x86/bugs: Fix spectre_v2 mitigation default on Intel x86/bugs: Restructure ITS mitigation x86/xen/msr: Fix uninitialized variable 'err' x86/msr: Remove a superfluous inclusion of <asm/asm.h> x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only x86/mm/64: Make 5-level paging support unconditional x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model x86/mm/64: Always use dynamic memory layout x86/bugs: Fix indentation due to ITS merge x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor() x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2() x86/cpuid: Rename have_cpuid_p() to cpuid_feature() x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h> x86/msr: Add rdmsrl_on_cpu() compatibility wrapper x86/mm: Fix kernel-doc descriptions of various pgtable methods x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too x86/boot: Defer initialization of VM space related global variables ...
2025-05-25Merge branch 'locking/futex' into locking/core, to pick up pending futex changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-21tools headers: Synchronize prctl.h ABI headerSebastian Andrzej Siewior
The prctl.h ABI header was slightly updated during the development of the interface. In particular the "immutable" parameter became a bit in the option argument. Synchronize prctl.h ABI header again and make use of the definition in the testsuite and "perf bench futex". Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20250517151455.1065363-5-bigeasy@linutronix.de
2025-05-06Merge tag 'v6.15-rc5' into x86/cpu, to resolve conflictsIngo Molnar
Conflicts: tools/arch/x86/include/asm/cpufeatures.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-03tools/perf: Allow to select the number of hash bucketsSebastian Andrzej Siewior
Add the -b/ --buckets argument to specify the number of hash buckets for the private futex hash. This is directly passed to prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, buckets, immutable) and must return without an error if specified. The `immutable' is 0 by default and can be set to 1 via the -I/ --immutable argument. The size of the private hash is verified with PR_FUTEX_HASH_GET_SLOTS. If PR_FUTEX_HASH_GET_SLOTS failed then it is assumed that an older kernel was used without the support and that the global hash is used. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250416162921.513656-20-bigeasy@linutronix.de
2025-04-29perf tools: Fix in-source libperf buildJames Clark
When libperf is built alone in-source, $(OUTPUT) isn't set. This causes the generated uapi path to resolve to '/../arch' which results in a permissions error: mkdir: cannot create directory '/../arch': Permission denied Fix it by removing the preceding '/..' which means that it gets generated either in the tools/lib/perf part of the tree or the OUTPUT folder. Some other rules that rely on OUTPUT further refine this conditionally depending on whether it's an in-source or out-of-source build, but I don't think we need the extra complexity here. And this rule is slightly different to others because the header is needed by both libperf and Perf. This is further complicated by the fact that Perf always passes O=... to libperf even for in source builds, meaning that OUTPUT isn't set consistently between projects. Because we're no longer going one level up to try to generate the file in the tools/ folder, Perf's include rule needs to descend into libperf. Also fix the clean rule while we're here. Reported-by: Thorsten Leemhuis <linux@leemhuis.info> Closes: https://lore.kernel.org/linux-perf-users/7703f88e-ccb7-4c98-9da4-8aad224e780f@leemhuis.info/ Fixes: bfb713ea53c7 ("perf tools: Fix arm64 build by generating unistd_64.h") Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Thorsten Leemhuis <linux@leemhuis.info> Link: https://lore.kernel.org/r/20250429-james-perf-fix-libperf-in-source-build-v1-1-a1a827ac15e5@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-23perf tools: Fix arm64 build by generating unistd_64.hJames Clark
Since pulling in the kernel changes in commit 22f72088ffe6 ("tools headers: Update the syscall table with the kernel sources"), arm64 is no longer using a generic syscall header and generates one from the syscall table. Therefore we must also generate the syscall header for arm64 before building Perf. Add it as a dependency to libperf which uses one syscall number. Perf uses more, but as libperf is a dependency of Perf it will be generated for both. Future platforms that need this will have to add their own syscall-y targets in libperf manually. Unfortunately the arch specific files that do this (e.g. arch/arm64/include/asm/Kbuild) can't easily be imported into the Perf build. But Perf only needs a subset of the generated files anyway, so redefining them is probably the correct thing to do. Fixes: 22f72088ffe6 ("tools headers: Update the syscall table with the kernel sources") Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20250417-james-perf-fix-gen-syscall-v1-1-1d268c923901@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-14x86/platform/amd: Move the <asm/amd-ibs.h> header to <asm/amd/ibs.h>Ingo Molnar
Collect AMD specific platform header files in <asm/amd/*.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Link: https://lore.kernel.org/r/20250413084144.3746608-2-mingo@kernel.org
2025-04-11perf tools: Remove evsel__handle_error_quirks()Namhyung Kim
The evsel__handle_error_quirks() is to fixup invalid event attributes on some architecture based on the error code. Currently it's only used for AMD to disable precise_ip not to use IBS which has more restrictions. But the commit c33aea446bf555ab changed call evsel__precise_ip_fallback for any errors so there's no difference with the above function. To make matter worse, it caused a problem with branch stack on Zen3. The IBS doesn't support branch stack so it should use a regular core PMU event. The default event is set precise_max and it starts with 3. And evsel__precise_ip_fallback() tries with it and reduces the level one by one. At last it tries with 0 but it also failed on Zen3 since the branch stack is not supported for the cycles event. At this point, evsel__precise_ip_fallback() restores the original precise_ip value (3) in the hope that it can succeed with other modifier (like exclude_kernel). Then evsel__handle_error_quirks() see it has precise_ip != 0 and make it retry with 0. This created an infinite loop. Before: $ perf record -b -vv |& grep removing removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD removing precise_ip on AMD ... After: $ perf record -b true Error: Failure to open event 'cycles:P' on PMU 'cpu' which will be removed. Invalid event (cycles:P) in per-thread mode, enable system wide with '-a'. Error: Failure to open any events for recording. Fixes: c33aea446bf555ab ("perf tools: Fix precise_ip fallback logic") Tested-by: Chun-Tse Shao <ctshao@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250410010252.402221-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-10perf libunwind arm64: Fix missing close parens in an if statementArnaldo Carvalho de Melo
While testing building with libunwind (using LIBUNWIND=1) in various arches I noticed a problem on arm64, on an rpi5 system, a missing close parens in a change related to dso__data_get_fd() usage, fix it. Fixes: 5ac22c35aa8519f1 ("perf dso: Use lock annotations to fix asan deadlock") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/Z_Z3o8KvB2i5c6ab@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-10tools headers: Update the arch/x86/lib/memset_64.S copy with the kernel sourcesNamhyung Kim
To pick up the changes in: 2981557cb0408e14 x86,kcfi: Fix EXPORT_SYMBOL vs kCFI That required adding a copy of include/linux/cfi_types.h and its checking in tools/perf/check-headers.h. Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S Please see tools/include/uapi/README for further details. Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20250410001125.391820-11-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-10tools headers: Update the linux/unaligned.h copy with the kernel sourcesNamhyung Kim
To pick up the changes in: 3846699217798061 ALSA: rawmidi: Make tied_device=0 as default / unknown 7bb49d2e8b52adac ALSA: rawmidi: Bump protocol version to 2.0.5 b8fefed73a952a33 ALSA: rawmidi: Show substream activity in info ioctl bdf46443f350dd5d ALSA: rawmidi: Expose the tied device number in info ioctl Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h Please see tools/include/uapi/README for further details. Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: linux-sound@vger.kernel.org Link: https://lore.kernel.org/r/20250410001125.391820-9-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-10tools headers: Update the uapi/linux/prctl.h copy with the kernel sourcesNamhyung Kim
To pick up the changes in: ec2d0c04624b3c8a posix-timers: Provide a mechanism to allocate a given timer ID Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Please see tools/include/uapi/README for further details. Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250410001125.391820-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-10tools headers: Update the syscall table with the kernel sourcesNamhyung Kim
To pick up the changes in: c4a16820d9019940 fs: add open_tree_attr() 2df1ad0d25803399 x86/arch_prctl: Simplify sys_arch_prctl() e632bca07c8eef1d arm64: generate 64-bit syscall.tbl This is basically to support the new open_tree_attr syscall. But it also needs to update asm-generic unistd.h header to get the new syscall number. And arm64 unistd.h header was converted to use the generic 64-bit header. Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/scripts/syscall.tbl scripts/syscall.tbl diff -u tools/perf/arch/x86/entry/syscalls/syscall_32.tbl arch/x86/entry/syscalls/syscall_32.tbl diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl diff -u tools/perf/arch/arm/entry/syscalls/syscall.tbl arch/arm/tools/syscall.tbl diff -u tools/perf/arch/sh/entry/syscalls/syscall.tbl arch/sh/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/sparc/entry/syscalls/syscall.tbl arch/sparc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/xtensa/entry/syscalls/syscall.tbl arch/xtensa/kernel/syscalls/syscall.tbl diff -u tools/arch/arm64/include/uapi/asm/unistd.h arch/arm64/include/uapi/asm/unistd.h diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h Please see tools/include/uapi/README for further details. Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: linux-arch@vger.kernel.org Link: https://lore.kernel.org/r/20250410001125.391820-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-10tools headers: Update the VFS headers with the kernel sourcesNamhyung Kim
To pick up the changes in: 7ed6cbe0f8caa6ee fs: add STATX_DIO_READ_ALIGN 8fc7e23a9bd851e6 fs: reformat the statx definition a5874fde3c0884a3 exec: Add a new AT_EXECVE_CHECK flag to execveat(2) 1ebd4a3c095cd538 blk-crypto: add ioctls to create and prepare hardware-wrapped keys af6505e5745b9f3a fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag 10783d0ba0d7731e fs, iov_iter: define meta io descriptor 8f6116b5b77b0536 statmount: add a new supported_mask field 37c4a9590e1efcae statmount: allow to retrieve idmappings Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h diff -u tools/perf/trace/beauty/include/uapi/linux/stat.h include/uapi/linux/stat.h diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h diff -u tools/perf/trace/beauty/include/uapi/linux/fs.h include/uapi/linux/fs.h diff -u tools/perf/trace/beauty/include/uapi/linux/mount.h include/uapi/linux/mount.h Please see tools/include/uapi/README for further details. Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250410001125.391820-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-10tools headers: Update the socket headers with the kernel sourcesNamhyung Kim
To pick up the changes in: 64e844505bc08cde include: uapi: protocol number and packet structs for AGGFRAG in ESP 18912c520674ec4d tcp: devmem: don't write truncated dmabuf CMSGs to userspace Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h Please see tools/include/uapi/README for further details. Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20250410001125.391820-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-04-05tracing/timers: Rename the hrtimer_init event to hrtimer_setupNam Cao
The function hrtimer_init() doesn't exist anymore. It was replaced by hrtimer_setup(). Thus, rename the hrtimer_init trace event to hrtimer_setup to keep it consistent. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/cba84c3d853c5258aa3a262363a6eac08e2c7afc.1738746927.git.namcao@linutronix.de
2025-03-31Merge tag 'perf-tools-for-v6.15-2025-03-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "perf record: - Introduce latency profiling using scheduler information. The latency profiling is to show impacts on wall-time rather than cpu-time. By tracking context switches, it can weight samples and find which part of the code contributed more to the execution latency. The value (period) of the sample is weighted by dividing it by the number of parallel execution at the moment. The parallelism is tracked in perf report with sched-switch records. This will reduce the portion that are run in parallel and in turn increase the portion of serial executions. For now, it's limited to profile processes, IOW system-wide profiling is not supported. You can add --latency option to enable this. $ perf record --latency -- make -C tools/perf I've run the above command for perf build which adds -j option to make with the number of CPUs in the system internally. Normally it'd show something like below: $ perf report -F overhead,comm ... # # Overhead Command # ........ ............... # 78.97% cc1 6.54% python3 4.21% shellcheck 3.28% ld 1.80% as 1.37% cc1plus 0.80% sh 0.62% clang 0.56% gcc 0.44% perl 0.39% make ... The cc1 takes around 80% of the overhead as it's the actual compiler. However it runs in parallel so its contribution to latency may be less than that. Now, perf report will show both overhead and latency (if --latency was given at record time) like below: $ perf report -s comm ... # # Overhead Latency Command # ........ ........ ............... # 78.97% 48.66% cc1 6.54% 25.68% python3 4.21% 0.39% shellcheck 3.28% 13.70% ld 1.80% 2.56% as 1.37% 3.08% cc1plus 0.80% 0.98% sh 0.62% 0.61% clang 0.56% 0.33% gcc 0.44% 1.71% perl 0.39% 0.83% make ... You can see latency of cc1 goes down to around 50% and python3 and ld contribute a lot more than their overhead. You can use --latency option in perf report to get the same result but ordered by latency. $ perf report --latency -s comm perf report: - As a side effect of the latency profiling work, it adds a new output field 'latency' and a sort key 'parallelism'. The below is a result from my system with 64 CPUs. The build was well-parallelized but contained some serial portions. $ perf report -s parallelism ... # # Overhead Latency Parallelism # ........ ........ ........... # 16.95% 1.54% 62 13.38% 1.24% 61 12.50% 70.47% 1 11.81% 1.06% 63 7.59% 0.71% 60 4.33% 12.20% 2 3.41% 0.33% 59 2.05% 0.18% 64 1.75% 1.09% 9 1.64% 1.85% 5 ... - Support Feodra mini-debuginfo which is a LZMA compressed symbol table inside ".gnu_debugdata" ELF section. perf annotate: - Add --code-with-type option to enable data-type profiling with the usual annotate output. Instead of focusing on data structure, it shows code annotation together with data type it accesses in case the instruction refers to a memory location (and it was able to resolve the target data type). Currently it only works with --stdio. $ perf annotate --stdio --code-with-type ... Percent | Source code & Disassembly of vmlinux for cpu/mem-loads,ldlat=30/pp (18 samples, percent: local period) ---------------------------------------------------------------------------------------------------------------------- : 0 0xffffffff81050610 <__fdget>: 0.00 : ffffffff81050610: callq 0xffffffff81c01b80 <__fentry__> # data-type: (stack operation) 0.00 : ffffffff81050615: pushq %rbp # data-type: (stack operation) 0.00 : ffffffff81050616: movq %rsp, %rbp 0.00 : ffffffff81050619: pushq %r15 # data-type: (stack operation) 0.00 : ffffffff8105061b: pushq %r14 # data-type: (stack operation) 0.00 : ffffffff8105061d: pushq %rbx # data-type: (stack operation) 0.00 : ffffffff8105061e: subq $0x10, %rsp 0.00 : ffffffff81050622: movl %edi, %ebx 0.00 : ffffffff81050624: movq %gs:0x7efc4814(%rip), %rax # 0x14e40 <current_task> # data-type: struct task_struct* +0 0.00 : ffffffff8105062c: movq 0x8d0(%rax), %r14 # data-type: struct task_struct +0x8d0 (files) 0.00 : ffffffff81050633: movl (%r14), %eax # data-type: struct files_struct +0 (count.counter) 0.00 : ffffffff81050636: cmpl $0x1, %eax 0.00 : ffffffff81050639: je 0xffffffff810506a9 <__fdget+0x99> 0.00 : ffffffff8105063b: movq 0x20(%r14), %rcx # data-type: struct files_struct +0x20 (fdt) 0.00 : ffffffff8105063f: movl (%rcx), %eax # data-type: struct fdtable +0 (max_fds) 0.00 : ffffffff81050641: cmpl %ebx, %eax 0.00 : ffffffff81050643: jbe 0xffffffff810506ef <__fdget+0xdf> 0.00 : ffffffff81050649: movl %ebx, %r15d 5.56 : ffffffff8105064c: movq 0x8(%rcx), %rdx # data-type: struct fdtable +0x8 (fd) ... The "# data-type:" part was added with this change. The first few entries are not very interesting. But later you can it accesses a couple of fields in the task_struct, files_struct and fdtable. perf trace: - Support syscall tracing for different ABI. For example it can trace system calls for 32-bit applications on 64-bit kernel transparently. - Add --summary-mode=total option to show global syscall summary. The default is 'thread' to show per-thread syscall summary. Python support: - Add more interfaces to 'perf' module to parse events, and config, enable or disable the event list properly so that it can implement basic functionalities purely in Python. There is an example code for these new interfaces in python/tracepoint.py. - Add mypy and pylint support to enable build time checking. Fix some code based on the findings from these tools. Internals: - Introduce io_dir__readdir() API to make directory traveral (usually for proc or sysfs) efficient with less memory footprint. JSON vendor events: - Add events and metrics for ARM Neoverse N3 and V3 - Update events and metrics on various Intel CPUs - Add/update events for a number of SiFive processors" * tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (229 commits) perf bpf-filter: Fix a parsing error with comma perf report: Fix a memory leak for perf_env on AMD perf trace: Fix wrong size to bpf_map__update_elem call perf tools: annotate asm_pure_loop.S perf python: Fix setup.py mypy errors perf test: Address attr.py mypy error perf build: Add pylint build tests perf build: Add mypy build tests perf build: Rename TEST_LOGS to SHELL_TEST_LOGS tools/build: Don't pass test log files to linker perf bench sched pipe: fix enforced blocking reads in worker_thread perf tools: Fix is_compat_mode build break in ppc64 perf build: filter all combinations of -flto for libperl perf vendor events arm64 AmpereOneX: Fix frontend_bound calculation perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by errata perf trace: Fix evlist memory leak perf trace: Fix BTF memory leak perf trace: Make syscall table stable perf syscalltbl: Mask off ABI type for MIPS system calls perf build: Remove Makefile.syscalls ...
2025-03-24perf bpf-filter: Fix a parsing error with commaNamhyung Kim
The previous change to support cgroup filters introduced a bug that pathname can include commas. It confused the lexer to treat an item and the trailing comma as a single token. And it resulted in a parse error: $ sudo perf record -e cycles:P --filter 'period > 0, ip > 64' -- true perf_bpf_filter: Error: Unexpected item: 0, perf_bpf_filter: syntax error, unexpected BFT_ERROR, expecting BFT_NUM Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] --filter <filter> event filter It should get "0" and "," separately. An easiest fix would be to remove "," from the possible pathname characters. As it's for cgroup names, probably ok to assume it won't have commas in the pathname. I found that the existing BPF filtering test didn't have any complex filter condition with commas. Let's update the group filter test which is supposed to test filter combinations like this. Link: https://lore.kernel.org/r/20250307220922.434319-1-namhyung@kernel.org Fixes: 91e88437d5156b20 ("perf bpf-filter: Support filtering on cgroups") Reported-by: Sally Shi <sshii@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24perf report: Fix a memory leak for perf_env on AMDNamhyung Kim
The env.pmu_mapping can be leaked when it reads data from a pipe on AMD. For a pipe data, it reads the header data including pmu_mapping from PERF_RECORD_HEADER_FEATURE runtime. But it's already set in: perf_session__new() __perf_session__new() evlist__init_trace_event_sample_raw() evlist__has_amd_ibs() perf_env__nr_pmu_mappings() Then it'll overwrite that when it processes the HEADER_FEATURE record. Here's a report from address sanitizer. Direct leak of 2689 byte(s) in 1 object(s) allocated from: #0 0x7fed8f814596 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98 #1 0x5595a7d416b1 in strbuf_grow util/strbuf.c:64 #2 0x5595a7d414ef in strbuf_init util/strbuf.c:25 #3 0x5595a7d0f4b7 in perf_env__read_pmu_mappings util/env.c:362 #4 0x5595a7d12ab7 in perf_env__nr_pmu_mappings util/env.c:517 #5 0x5595a7d89d2f in evlist__has_amd_ibs util/amd-sample-raw.c:315 #6 0x5595a7d87fb2 in evlist__init_trace_event_sample_raw util/sample-raw.c:23 #7 0x5595a7d7f893 in __perf_session__new util/session.c:179 #8 0x5595a7b79572 in perf_session__new util/session.h:115 #9 0x5595a7b7e9dc in cmd_report builtin-report.c:1603 #10 0x5595a7c019eb in run_builtin perf.c:351 #11 0x5595a7c01c92 in handle_internal_command perf.c:404 #12 0x5595a7c01deb in run_argv perf.c:448 #13 0x5595a7c02134 in main perf.c:556 #14 0x7fed85833d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 Let's free the existing pmu_mapping data if any. Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250311000416.817631-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24perf trace: Fix wrong size to bpf_map__update_elem callThomas Richter
In linux-next commit c760174401f6 ("perf cpumap: Reduce cpu size from int to int16_t") causes the perf tests 100 126 to fail on s390: Output before: # ./perf test 100 100: perf trace BTF general tests : FAILED! # The root cause is the change from int to int16_t for the cpu maps. The size of the CPU key value pair changes from four bytes to two bytes. However a two byte key size is not supported for bpf_map__update_elem(). Note: validate_map_op() in libbpf.c emits warning libbpf: map '__augmented_syscalls__': \ unexpected key size 2 provided, expected 4 when key size is set to int16_t. Therefore change to variable size back to 4 bytes for invocation of bpf_map__update_elem(). Output after: # ./perf test 100 100: perf trace BTF general tests : Ok # Fixes: c760174401f6 ("perf cpumap: Reduce cpu size from int to int16_t") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Howard Chu <howardchu95@gmail.com> Cc: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250324152756.3879571-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24perf tools: annotate asm_pure_loop.SMarcus Meissner
Annotate so it is built with non-executable stack. Fixes: 8b97519711c3 ("perf test: Add asm pureloop test tool") Signed-off-by: Marcus Meissner <meissner@suse.de> Reviewed-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250323085410.23751-1-meissner@suse.de Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24perf python: Fix setup.py mypy errorsIan Rogers
getenv may return None, so assert it isn't None for CC and srctree environmental variables required for the script. Disable an optional warning related to Popen. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250311213628.569562-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24perf test: Address attr.py mypy errorIan Rogers
ConfigParser existed in python2 but not in python3 causing mypy to fail. Whilst removing a python2 workaround remove reference to __future__. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250311213628.569562-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24perf build: Add pylint build testsIan Rogers
If PYLINT=1 is passed to the build then run pylint over python code in perf. Unlike shellcheck this isn't default on as there are currently too many errors. An example of an error: ``` ************* Module setup util/setup.py:19:0: C0301: Line too long (127/100) (line-too-long) util/setup.py:20:0: C0301: Line too long (138/100) (line-too-long) util/setup.py:63:0: C0301: Line too long (106/100) (line-too-long) util/setup.py:1:0: C0114: Missing module docstring (missing-module-docstring) util/setup.py:24:4: W0622: Redefining built-in 'vars' (redefined-builtin) util/setup.py:11:4: C0103: Constant name "cc_options" doesn't conform to UPPER_CASE naming style (invalid-name) util/setup.py:13:4: C0103: Constant name "cc_options" doesn't conform to UPPER_CASE naming style (invalid-name) util/setup.py:15:34: R1732: Consider using 'with' for resource-allocating operations (consider-using-with) util/setup.py:18:0: C0116: Missing function or method docstring (missing-function-docstring) util/setup.py:19:16: R1732: Consider using 'with' for resource-allocating operations (consider-using-with) util/setup.py:44:0: C0413: Import "from setuptools import setup, Extension" should be placed at the top of the module (wrong-import-position) util/setup.py:46:0: C0413: Import "from setuptools.command.build_ext import build_ext as _build_ext" should be placed at the top of the module (wrong-import-position) util/setup.py:47:0: C0413: Import "from setuptools.command.install_lib import install_lib as _install_lib" should be placed at the top of the module (wrong-import-position) util/setup.py:49:0: C0115: Missing class docstring (missing-class-docstring) util/setup.py:49:0: C0103: Class name "build_ext" doesn't conform to PascalCase naming style (invalid-name) util/setup.py:52:8: W0201: Attribute 'build_lib' defined outside __init__ (attribute-defined-outside-init) util/setup.py:53:8: W0201: Attribute 'build_temp' defined outside __init__ (attribute-defined-outside-init) util/setup.py:55:0: C0115: Missing class docstring (missing-class-docstring) util/setup.py:55:0: C0103: Class name "install_lib" doesn't conform to PascalCase naming style (invalid-name) util/setup.py:58:8: W0201: Attribute 'build_dir' defined outside __init__ (attribute-defined-outside-init) *----------------------------------------------------------------- Your code has been rated at 6.67/10 (previous run: 6.51/10, +0.16) make[4]: *** [util/Build:442: util/setup.py.pylint_log] Error 1 ``` Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250311213628.569562-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24perf build: Add mypy build testsIan Rogers
If MYPY=1 is passed to the build then run mypy over python code in perf. Unlike shellcheck this isn't default on as there are currently too many errors. An example of an error: ``` util/setup.py:8: error: Item "None" of "str | None" has no attribute "split" [union-attr] util/setup.py:15: error: Item "None" of "IO[bytes] | None" has no attribute "readline" [union-attr] util/setup.py:15: error: List item 0 has incompatible type "str | None"; expected "str | bytes | PathLike[str] | PathLike[bytes]" [list-item] util/setup.py:16: error: Unsupported left operand type for + ("None") [operator] util/setup.py:16: note: Left operand is of type "str | None" util/setup.py:74: error: Unsupported left operand type for + ("None") [operator] util/setup.py:74: note: Left operand is of type "str | None" Found 5 errors in 1 file (checked 1 source file) make[4]: *** [util/Build:430: util/setup.py.mypy_log] Error 1 ``` Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250311213628.569562-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-24perf build: Rename TEST_LOGS to SHELL_TEST_LOGSIan Rogers
Rename TEST_LOGS to SHELL_TEST_LOGS as later changes will add more kinds of test logs. Minor comment tweak in Makefile.perf as more than just test shell tests are checked. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250311213628.569562-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-23perf bench sched pipe: fix enforced blocking reads in worker_threadDirk Gouders
The function worker_thread() is programmed in a way that roughly doubles the number of expectable context switches, because it enforces blocking reads: Performance counter stats for 'perf bench sched pipe': 2,000,004 context-switches 11.859548321 seconds time elapsed 0.674871000 seconds user 8.076890000 seconds sys The result of this behavior is that the blocking reads by far dominate the performance analysis of 'perf bench sched pipe': Samples: 78K of event 'cycles:P', Event count (approx.): 27964965844 Overhead Command Shared Object Symbol 25.28% sched-pipe [kernel.kallsyms] [k] read_hpet 8.11% sched-pipe [kernel.kallsyms] [k] retbleed_untrain_ret 2.82% sched-pipe [kernel.kallsyms] [k] pipe_write From the code, it is unclear if that behavior is wanted but the log says that at least Ingo Molnar aims to mimic lmbench's lat_ctx, that doesn't handle the pipe ends that way (https://sourceforge.net/p/lmbench/code/HEAD/tree/trunk/lmbench2/src/lat_ctx.c) Fix worker_thread() by always first feeding the write ends of the pipes and then trying to read. This roughly halves the context switches and runtime of pure 'perf bench sched pipe': Performance counter stats for 'perf bench sched pipe': 1,005,770 context-switches 6.033448041 seconds time elapsed 0.423142000 seconds user 4.519829000 seconds sys And the blocking reads do no longer dominate the analysis at the above extreme: Samples: 40K of event 'cycles:P', Event count (approx.): 14309364879 Overhead Command Shared Object Symbol 12.20% sched-pipe [kernel.kallsyms] [k] read_hpet 9.23% sched-pipe [kernel.kallsyms] [k] retbleed_untrain_ret 3.68% sched-pipe [kernel.kallsyms] [k] pipe_write Signed-off-by: Dirk Gouders <dirk@gouders.net> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250323140316.19027-2-dirk@gouders.net Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-23perf tools: Fix is_compat_mode build break in ppc64Likhitha Korrapati
Commit 54f9aa1092457 ("tools/perf/powerpc/util: Add support to handle compatible mode PVR for perf json events") introduced to select proper JSON events in case of compat mode using auxiliary vector. But this caused a compilation error in ppc64 Big Endian. arch/powerpc/util/header.c: In function 'is_compat_mode': arch/powerpc/util/header.c:20:21: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 20 | if (!strcmp((char *)platform, (char *)base_platform)) | ^ arch/powerpc/util/header.c:20:39: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 20 | if (!strcmp((char *)platform, (char *)base_platform)) | Commit saved the getauxval(AT_BASE_PLATFORM) and getauxval(AT_PLATFORM) return values in u64 which causes the compilation error. Patch fixes this issue by changing u64 to "unsigned long". Fixes: 54f9aa1092457 ("tools/perf/powerpc/util: Add support to handle compatible mode PVR for perf json events") Signed-off-by: Likhitha Korrapati <likhitha@linux.ibm.com> Reviewed-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250321100726.699956-1-likhitha@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-23perf build: filter all combinations of -flto for libperlHolger Hoffstätte
When enabling the libperl feature the build uses perl's build flags (ccopts) but filters out various flags, e.g. for LTO. While this is conceptually correct, it is insufficient in practice, since only "-flto=auto" is filtered out. When perl itself is built with "-flto" this can cause parts of perf being built with LTO and others without, giving exciting build errors like e.g.: ../tools/perf/pmu-events/pmu-events.c:72851:(.text+0xb79): undefined reference to `strcmp_cpuid_str' collect2: error: ld returned 1 exit status Fix this by filtering all matching flag values of -flto{=n,auto,..}. Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com> Link: https://lore.kernel.org/r/20250321082038.27901-2-holger@applied-asynchrony.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf vendor events arm64 AmpereOneX: Fix frontend_bound calculationIlkka Koskinen
frontend_bound metrics was miscalculated due to different scaling in a couple of metrics it depends on. Change the scaling to match with AmpereOne. Fixes: 16438b652b46 ("perf vendor events arm64 AmpereOneX: Add core PMU events and metrics") Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250313201559.11332-3-ilkka@os.amperecomputing.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by ↵Ilkka Koskinen
errata Atomic instructions are both memory-reading and memory-writing instructions and so should be counted by both LD_RETIRED and ST_RETIRED performance monitoring events. However LD_RETIRED does not count atomic instructions. Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250313201559.11332-2-ilkka@os.amperecomputing.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf trace: Fix evlist memory leakIan Rogers
Leak sanitizer was reporting a memory leak in the "perf record and replay" test. Add evlist__delete to trace__exit, also ensure trace__exit is called after trace__record. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-15-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf trace: Fix BTF memory leakIan Rogers
Add missing btf__free in trace__exit. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-14-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf trace: Make syscall table stableIan Rogers
Namhyung fixed the syscall table being reallocated and moving by reloading the system call pointer after a move: https://lore.kernel.org/lkml/Z9YHCzINiu4uBQ8B@google.com/ This could be brittle so this patch changes the syscall table to be an array of pointers of "struct syscall" that don't move. Remove unnecessary copies and searches with this change. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-13-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf syscalltbl: Mask off ABI type for MIPS system callsIan Rogers
Arnd Bergmann described that MIPS system calls don't necessarily start from 0 as an ABI prefix is applied: https://lore.kernel.org/lkml/8ed7dfb2-1e4d-4aa4-a04b-0397a89365d1@app.fastmail.com/ When decoding the "id" (aka system call number) for MIPS ignore values greater-than 1000. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf build: Remove Makefile.syscallsIan Rogers
Now a single beauty file is generated and used by all architectures, remove the per-architecture Makefiles, Kbuild files and previous generator script. Note: there was conversation with Charlie Jenkins <charlie@rivosinc.com> and they'd written an alternate approach to support multiple architectures: https://lore.kernel.org/all/20250114-perf_syscall_arch_runtime-v1-1-5b304e408e11@rivosinc.com/ It would have been better to have helped Charlie fix their series (my apologies) but they agreed that the approach taken here was likely best for longer term maintainability: https://lore.kernel.org/lkml/Z6Jk_UN9i69QGqUj@ghost/ Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Howard Chu <howardchu95@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf syscalltbl: Use lookup table containing multiple architecturesIan Rogers
Switch to use the lookup table containing all architectures rather than tables matching the perf binary. This fixes perf trace when executed on a 32-bit i386 binary on an x86-64 machine. Note in the following the system call names of the 32-bit i386 binary as seen by an x86-64 perf. Before: ``` ? ( ): a.out/447296 ... [continued]: munmap()) = 0 0.024 ( 0.001 ms): a.out/447296 recvfrom(ubuf: 0x2, size: 4160585708, flags: DONTROUTE|CTRUNC|TRUNC|DONTWAIT|EOR|WAITALL|FIN|SYN|CONFIRM|RST|ERRQUEUE|NOSIGNAL|WAITFORONE|BATCH|SOCK_DEVMEM|ZEROCOPY|FASTOPEN|CMSG_CLOEXEC|0x91f80000, addr: 0xe30, addr_len: 0xffce438c) = 1475198976 0.042 ( 0.003 ms): a.out/447296 lgetxattr(name: "", value: 0x3, size: 34) = 4160344064 0.054 ( 0.003 ms): a.out/447296 dup2(oldfd: -134422744, newfd: 4) = -1 ENOENT (No such file or directory) 0.060 ( 0.009 ms): a.out/447296 preadv(fd: 4294967196, vec: (struct iovec){.iov_base = (void *)0x2e646c2f6374652f,.iov_len = (__kernel_size_t)7307199665335594867,}, vlen: 557056, pos_h: 4160585708) = 3 0.074 ( 0.004 ms): a.out/447296 lgetxattr(name: "", value: 0x1, size: 2) = 4160237568 0.080 ( 0.001 ms): a.out/447296 lstat(filename: "", statbuf: 0x193f6) = 0 0.089 ( 0.007 ms): a.out/447296 preadv(fd: 4294967196, vec: (struct iovec){.iov_base = (void *)0x3833692f62696c2f,.iov_len = (__kernel_size_t)3276497845987585334,}, vlen: 557056, pos_h: 4160585708) = 3 0.097 ( 0.002 ms): a.out/447296 close(fd: 3</proc/447296/status>) = 512 0.103 ( 0.002 ms): a.out/447296 lgetxattr(name: "", value: 0x1, size: 2050) = 4157935616 0.107 ( 0.007 ms): a.out/447296 lgetxattr(pathname: "", name: "", value: 0x5, size: 2066) = 4158078976 0.116 ( 0.003 ms): a.out/447296 lgetxattr(pathname: "", name: "", value: 0x1, size: 2066) = 4159639552 0.121 ( 0.003 ms): a.out/447296 lgetxattr(pathname: "", name: "", value: 0x3, size: 2066) = 4160184320 0.129 ( 0.002 ms): a.out/447296 lgetxattr(pathname: "", name: "", value: 0x3, size: 50) = 4160196608 0.138 ( 0.001 ms): a.out/447296 lstat(filename: "") = 0 0.145 ( 0.002 ms): a.out/447296 mq_timedreceive(mqdes: 4291706800, u_msg_ptr: 0xf7f9ea48, msg_len: 134616640, u_msg_prio: 0xf7fd7fec, u_abs_timeout: (struct __kernel_timespec){.tv_sec = (__kernel_time64_t)-578174027777317696,.tv_nsec = (long long int)4160349376,}) = 0 0.148 ( 0.001 ms): a.out/447296 mkdirat(dfd: -134617816, pathname: " ��� ���▒���▒���", mode: IFREG|ISUID|IRUSR|IWGRP|0xf7fd0000) = 447296 0.150 ( 0.001 ms): a.out/447296 process_vm_writev(pid: -134617812, lvec: (struct iovec){.iov_base = (void *)0xf7f9e9c8f7f9e4c0,.iov_len = (__kernel_size_t)4160349376,}, liovcnt: 4160588048, rvec: (struct iovec){}, riovcnt: 4160585708, flags: 4291707352) = 0 0.197 ( 0.004 ms): a.out/447296 capget(header: 4160184320, dataptr: 8192) = 0 0.202 ( 0.002 ms): a.out/447296 capget(header: 1448669184, dataptr: 4096) = 0 0.208 ( 0.002 ms): a.out/447296 capget(header: 4160577536, dataptr: 8192) = 0 0.220 ( 0.001 ms): a.out/447296 getxattr(pathname: "", name: "c������", value: 0xf7f77e34, size: 1) = 0 0.228 ( 0.005 ms): a.out/447296 fchmod(fd: -134729728, mode: IRUGO|IWUGO|IFREG|IFIFO|ISVTX|IXUSR|0x10000) = 0 0.240 ( 0.009 ms): a.out/447296 preadv(fd: 4294967196, vec: 0x5658e008, pos_h: 4160192052) = 3 0.250 ( 0.008 ms): a.out/447296 close(fd: 3</proc/447296/status>) = 1436 0.260 ( 0.018 ms): a.out/447296 stat(filename: "", statbuf: 0xffce32ac) = 1436 0.288 (1000.213 ms): a.out/447296 readlinkat(buf: 0xffce31d4, bufsiz: 4291703244) = 0 ``` After: ``` ? ( ): a.out/442930 ... [continued]: execve()) = 0 0.023 ( 0.002 ms): a.out/442930 brk() = 0x57760000 0.052 ( 0.003 ms): a.out/442930 access(filename: 0xf7f5af28, mode: R) = -1 ENOENT (No such file or directory) 0.059 ( 0.009 ms): a.out/442930 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC|LARGEFILE) = 3 0.078 ( 0.001 ms): a.out/442930 close(fd: 3</proc/442930/status>) = 0 0.087 ( 0.007 ms): a.out/442930 openat(dfd: CWD, filename: "/lib/i386-linux-", flags: RDONLY|CLOEXEC|LARGEFILE) = 3 0.095 ( 0.002 ms): a.out/442930 read(fd: 3</proc/442930/status>, buf: 0xffbdbb70, count: 512) = 512 0.135 ( 0.001 ms): a.out/442930 close(fd: 3</proc/442930/status>) = 0 0.148 ( 0.001 ms): a.out/442930 set_tid_address(tidptr: 0xf7f2b528) = 442930 (a.out) 0.150 ( 0.001 ms): a.out/442930 set_robust_list(head: 0xf7f2b52c, len: 12) = 0.196 ( 0.004 ms): a.out/442930 mprotect(start: 0xf7f03000, len: 8192, prot: READ) = 0 0.202 ( 0.002 ms): a.out/442930 mprotect(start: 0x5658e000, len: 4096, prot: READ) = 0 0.207 ( 0.002 ms): a.out/442930 mprotect(start: 0xf7f63000, len: 8192, prot: READ) = 0 0.230 ( 0.005 ms): a.out/442930 munmap(addr: 0xf7f10000, len: 103414) = 0 0.244 ( 0.010 ms): a.out/442930 openat(dfd: CWD, filename: 0x5658d008) = 3 0.255 ( 0.007 ms): a.out/442930 read(fd: 3</proc/442930/status>, buf: 0xffbdb67c, count: 4096) = 1436 0.264 ( 0.018 ms): a.out/442930 write(fd: 1</dev/pts/4>, buf: , count: 1436) = 1436 0.292 (1000.173 ms): a.out/442930 clock_nanosleep(rqtp: { .tv_sec: 17866546940376776704, .tv_nsec: 4159878336 }, rmtp: 0xffbdb59c) = 0 1000.478 ( ): a.out/442930 exit_group() = ? ``` Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Howard Chu <howardchu95@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf trace beauty: Add syscalltbl.sh generating all system call tablesIan Rogers
Rather than generating individual syscall header files generate a single trace/beauty/generated/syscalltbl.c. In a syscalltbls array have references to each architectures tables along with the corresponding e_machine. When the 32-bit or 64-bit table is ambiguous, match the perf binary's type. For ARM32 don't use the arm64 32-bit table which is smaller. EM_NONE is present for is no machine matches. Conditionally compile the tables, only having the appropriate 32 and 64-bit table. If ALL_SYSCALLTBL is defined all tables can be compiled. Add comment for noreturn column suggested by Arnd Bergmann: https://lore.kernel.org/lkml/d47c35dd-9c52-48e7-a00d-135572f11fbb@app.fastmail.com/ and added in commit 9142be9e6443 ("x86/syscall: Mark exit[_group] syscall handlers __noreturn"). Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Howard Chu <howardchu95@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf thread: Add support for reading the e_machine type for a threadIan Rogers
First try to read the e_machine from the dsos associated with the thread's maps. If live use the executable from /proc/pid/exe and read the e_machine from the ELF header. On failure use EM_HOST. Change builtin-trace syscall functions to pass e_machine from the thread rather than EM_HOST, so that in later patches when syscalltbl can use the e_machine the system calls are specific to the architecture. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf dso: Add support for reading the e_machine type for a dsoIan Rogers
For ELF file dsos read the e_machine from the ELF header. For kernel types assume the e_machine matches the perf tool. In other cases return EM_NONE. When reading from the ELF header use DSO__SWAP that may need dso->needs_swap initializing. Factor out dso__swap_init to allow this. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf syscalltbl: Remove struct syscalltblIan Rogers
The syscalltbl held entries of system call name and number pairs, generated from a native syscalltbl at start up. As there are gaps in the system call number there is a notion of index into the table. Going forward we want the system call table to be identifiable by a machine type, for example, i386 vs x86-64. Change the interface to the syscalltbl so (1) a (currently unused machine type of EM_HOST) is passed (2) the index to syscall number and system call name mapping is computed at build time. Two tables are used for this, an array of system call number to name, an array of system call numbers sorted by the system call name. The sorted array doesn't store strings in part to save memory and relocations. The index notion is carried forward and is an index into the sorted array of system call numbers, the data structures are opaque (held only in syscalltbl.c), and so the number of indices for a machine type is exposed as a new API. The arrays are computed in the syscalltbl.sh script and so no start-up time computation and storage is necessary. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Howard Chu <howardchu95@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf trace: Reorganize syscallsIan Rogers
Identify struct syscall information in the syscalls table by a machine type and syscall number, not just system call number. Having the machine type means that 32-bit system calls can be differentiated from 64-bit ones on a machine capable of both. Having a table for all machine types and all system call numbers would be too large, so maintain a sorted array of system calls as they are encountered. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Howard Chu <howardchu95@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf syscalltbl: Remove syscall_table.hIan Rogers
The definition of "static const char *const syscalltbl[] = {" is done in a generated syscalls_32.h or syscalls_64.h that is architecture dependent. In order to include the appropriate file a syscall_table.h is found via the perf include path and it includes the syscalls_32.h or syscalls_64.h as appropriate. To support having multiple syscall tables, one for 32-bit and one for 64-bit, or for different architectures, an include path cannot be used. Remove syscall_table.h because of this and inline what it does into syscalltbl.c. For architectures without a syscall_table.h this will cause a failure to include either syscalls_32.h or syscalls_64.h rather than a failure to include syscall_table.h. For architectures that only included one or other, the behavior matches BITS_PER_LONG as previously done on architectures supporting both syscalls_32.h and syscalls_64.h. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Howard Chu <howardchu95@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf dso: kernel-doc for enum dso_binary_typeIan Rogers
There are many and non-obvious meanings to the dso_binary_type enum values. Add kernel-doc to speed interpretting their meanings. Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf dso: Move libunwind dso_data variables into ifdefIan Rogers
The variables elf_base_addr, debug_frame_offset, eh_frame_hdr_addr and eh_frame_hdr_offset are only accessed in unwind-libunwind-local.c which is conditionally built on having libunwind support. Make the variables conditional on libunwind support too. Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org> Link: https://lore.kernel.org/r/20250319050741.269828-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf report: Disable children column for data type profilingNamhyung Kim
I've realized that it doesn't make sense to accumulate the samples to parent in the callchain when data type profiling is enabled. Because it won't have the same data type access in the parent. Otherwise it'd see something like this: $ perf report -s type --stdio -g none # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2K of event 'cycles:Pu' # Event count (approx.): 8266456478 # # Children Latency Self Latency Data Type # ........ ....... ........ ........ ......... # 698.97% 697.72% 99.80% 99.61% (unknown) 0.09% 0.18% 0.09% 0.18% Elf64_Rela 0.05% 0.10% 0.05% 0.10% unsigned char 0.05% 0.10% 0.05% 0.10% struct exit_function_list 0.00% 0.01% 0.00% 0.01% struct rtld_global Link: https://lore.kernel.org/r/20250307080829.354947-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf report: Allow hierarchy mode for --childrenNamhyung Kim
It was prohibited because the output fields in the children mode were not handled properly with hierarchy. But we can have the output fields in the same level, it can allow them together. For example, latency mode adds more output fields by default and now they are displayed properly. $ perf record --latency -g -- perf test -w thloop $ perf report -H --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2K of event 'cycles:Pu' # Event count (approx.): 8266456478 # # Children Latency Overhead Latency Command / Shared Object / Symbol # ........................................... ........................................................ # 0.08% 0.16% 100.00% 100.00% perf 0.08% 0.16% 0.24% 0.47% ld-linux-x86-64.so.2 0.12% 0.24% 0.12% 0.24% [.] _dl_relocate_object 0.08% 0.16% 0.08% 0.16% [.] _dl_lookup_symbol_x 0.03% 0.06% 0.03% 0.06% [.] strcmp 0.00% 0.01% 0.00% 0.01% [.] _dl_start 0.00% 0.00% 0.00% 0.00% [.] _dl_start_user 0.00% 0.00% 0.00% 0.00% [.] _dl_sysdep_start 0.00% 0.00% 0.00% 0.00% [.] _start 0.00% 0.00% 0.00% 0.00% [.] dl_main 0.03% 0.06% 0.03% 0.06% libLLVM-16.so.1 0.03% 0.06% 0.03% 0.06% [.] llvm::StringMapImpl::RehashTable(unsigned int) 0.00% 0.00% 0.00% 0.00% [.] 0x00007f137ccd18e8 0.00% 0.00% 99.66% 99.31% perf 99.66% 99.31% 99.66% 99.31% [.] test_loop | |--49.86%--0x7f137b633d68 | 0x55dbdbbb7d2c ... Link: https://lore.kernel.org/r/20250307080829.354947-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-20perf sort: Keep output fields in the same levelNamhyung Kim
This is useful for hierarchy output mode where the first level is considered as output fields. We want them in the same level so that it can show only the remaining groups in the hierarchy. Before: $ perf report -s overhead,sample,period,comm,dso -H --stdio ... # Overhead Samples / Period / Command / Shared Object # ................. .......................................... # 100.00% 4035 100.00% 3835883066 100.00% perf 99.37% perf 0.50% ld-linux-x86-64.so.2 0.06% [unknown] 0.04% libc.so.6 0.02% libLLVM-16.so.1 After: $ perf report -s overhead,sample,period,comm,dso -H --stdio ... # Overhead Samples Period Command / Shared Object # ....................................... ....................... # 100.00% 4035 3835883066 perf 99.37% 4005 3811826223 perf 0.50% 19 19210014 ld-linux-x86-64.so.2 0.06% 8 2367089 [unknown] 0.04% 2 1720336 libc.so.6 0.02% 1 759404 libLLVM-16.so.1 Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250307080829.354947-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-19perf pmu: Handle memory failure in tool_pmu__new()Thomas Richter
On linux-next commit 72c6f57a4193 ("perf pmu: Dynamically allocate tool PMU") allocated PMU named "tool" dynamicly. However that allocation can fail and a NULL pointer is returned. That case is currently not handled and would result in an invalid address reference. Add a check for NULL pointer. Fixes: 72c6f57a4193 ("perf pmu: Dynamically allocate tool PMU") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250319122820.2898333-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>