diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-06-30 11:35:41 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-06-30 11:35:41 -0700 |
commit | b30d7a77c53ec04a6d94683d7680ec406b7f3ac8 (patch) | |
tree | 5c8d99d15eb1a9b28810a5358b098ac18daefa71 /tools/perf/util/thread.c | |
parent | d2a6fd45c5c4a5c5fdfe6c57f74f630e61d8d9a0 (diff) | |
parent | 4d60e83dfcee794213878155463d8f7353a80864 (diff) |
Merge tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull perf tools updates from Namhyung Kim:
"Internal cleanup:
- Refactor PMU data management to handle hybrid systems in a generic
way.
Do more work in the lexer so that legacy event types parse more
easily. A side-effect of this is that if a PMU is specified,
scanning sysfs is avoided improving start-up time.
- Fix hybrid metrics, for example, the TopdownL1 works for both
performance and efficiency cores on Intel machines. To support
this, sort and regroup events after parsing.
- Add reference count checking for the 'thread' data structure.
- Lots of fixes for memory leaks in various places thanks to the ASAN
and Ian's refcount checker.
- Reduce the binary size by replacing static variables with local or
dynamically allocated memory.
- Introduce shared_mutex for annotate data to reduce memory
footprint.
- Make filesystem access library functions more thread safe.
Test:
- Organize cpu_map tests into a single suite.
- Add metric value validation test to check if the values are within
correct value ranges.
- Add perf stat stdio output test to check if event and metric names
match.
- Add perf data converter JSON output test.
- Fix a lot of issues reported by shellcheck(1). This is a
preparation to enable shellcheck by default.
- Make the large x86 new instructions test optional at build time
using EXTRA_TESTS=1.
- Add a test for libpfm4 events.
perf script:
- Add 'dsoff' outpuf field to display offset from the DSO.
$ perf script -F comm,pid,event,ip,dsoff
ls 2695501 cycles: 152cc73ef4b5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so+0x1c4b5)
ls 2695501 cycles: ffffffff99045b3e ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968e107 ([kernel.kallsyms])
ls 2695501 cycles: ffffffffc1f54afb ([kernel.kallsyms])
ls 2695501 cycles: ffffffff9968382f ([kernel.kallsyms])
ls 2695501 cycles: ffffffff99e00094 ([kernel.kallsyms])
ls 2695501 cycles: 152cc718a8d0 (/usr/lib/x86_64-linux-gnu/libselinux.so.1+0x68d0)
ls 2695501 cycles: ffffffff992a6db0 ([kernel.kallsyms])
- Adjust width for large PID/TID values.
perf report:
- Robustify reading addr2line output for srcline by checking sentinel
output before the actual data and by using timeout of 1 second.
- Allow config terms (like 'name=ABC') with breakpoint events.
$ perf record -e mem:0x55feb98dd169:x/name=breakpoint/ -p 19646 -- sleep 1
perf annotate:
- Handle x86 instruction suffix like 'l' in 'movl' generally.
- Parse instruction operands properly even with a whitespace. This is
needed for llvm-objdump output.
- Support RISC-V binutils lookup using the triplet prefixes.
- Add '<' and '>' key to navigate to prev/next symbols in TUI.
- Fix instruction association and parsing for LoongArch.
perf stat:
- Add --per-cache aggregation option, optionally specify a cache
level like `--per-cache=L2`.
$ sudo perf stat --per-cache -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
taskset -c 0-15,64-79,128-143,192-207\
perf bench sched messaging -p -t -l 100000 -g 8
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 8 groups == 320 threads run
Total time: 7.648 [sec]
Performance counter stats for 'system wide':
S0-D0-L3-ID0 16 17,145,912 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID8 16 14,977,628 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID16 16 262,539 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID24 16 3,140 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID32 16 27,403 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID40 16 17,026 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID48 16 7,292 ls_dmnd_fills_from_sys.ext_cache_remote
S0-D0-L3-ID56 16 2,464 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID64 16 22,489,306 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID72 16 21,455,257 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID80 16 11,619 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID88 16 30,978 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID96 16 37,628 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID104 16 13,594 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID112 16 10,164 ls_dmnd_fills_from_sys.ext_cache_remote
S1-D1-L3-ID120 16 11,259 ls_dmnd_fills_from_sys.ext_cache_remote
7.779171484 seconds time elapsed
- Change default (no event/metric) formatting for default metrics so
that events are hidden and the metric and group appear.
Performance counter stats for 'ls /':
1.85 msec task-clock # 0.594 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
97 page-faults # 52.517 K/sec
2,187,173 cycles # 1.184 GHz
2,474,459 instructions # 1.13 insn per cycle
531,584 branches # 287.805 M/sec
13,626 branch-misses # 2.56% of all branches
TopdownL1 # 23.5 % tma_backend_bound
# 11.5 % tma_bad_speculation
# 39.1 % tma_frontend_bound
# 25.9 % tma_retiring
- Allow --cputype option to have any PMU name (not just hybrid).
- Fix output value not to added when it runs multiple times with -r
option.
perf list:
- Show metricgroup description from JSON file called
metricgroups.json.
- Allow 'pfm' argument to list only libpfm4 events and check each
event is supported before showing it.
JSON vendor events:
- Avoid event grouping using "NO_GROUP_EVENTS" constraints. The
topdown events are correctly grouped even if no group exists.
- Add "Default" metric group to print it in the default output. And
use "DefaultMetricgroupName" to indicate the real metric group
name.
- Add AmpereOne core PMU events.
Misc:
- Define man page date correctly.
- Track exception level properly on ARM CoreSight ETM.
- Allow anonymous struct, union or enum when retrieving type names
from DWARF.
- Fix incorrect filename when calling `perf inject --jit`.
- Handle PLT size correctly on LoongArch"
* tag 'perf-tools-for-v6.5-1-2023-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (269 commits)
perf test: Skip metrics w/o event name in stat STD output linter
perf test: Reorder event name checks in stat STD output linter
perf pmu: Remove a hard coded cpu PMU assumption
perf pmus: Add notion of default PMU for JSON events
perf unwind: Fix map reference counts
perf test: Set PERF_EXEC_PATH for script execution
perf script: Initialize buffer for regs_map()
perf tests: Fix test_arm_callgraph_fp variable expansion
perf symbol: Add LoongArch case in get_plt_sizes()
perf test: Remove x permission from lib/stat_output.sh
perf test: Rerun failed metrics with longer workload
perf test: Add skip list for metrics known would fail
perf test: Add metric value validation test
perf jit: Fix incorrect file name in DWARF line table
perf annotate: Fix instruction association and parsing for LoongArch
perf annotation: Switch lock from a mutex to a sharded_mutex
perf sharded_mutex: Introduce sharded_mutex
tools: Fix incorrect calculation of object size by sizeof
perf subcmd: Fix missing check for return value of malloc() in add_cmdname()
perf parse-events: Remove unneeded semicolon
...
Diffstat (limited to 'tools/perf/util/thread.c')
-rw-r--r-- | tools/perf/util/thread.c | 222 |
1 files changed, 106 insertions, 116 deletions
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c index 4b5bdc277baa..0b166404c5c3 100644 --- a/tools/perf/util/thread.c +++ b/tools/perf/util/thread.c @@ -21,38 +21,40 @@ int thread__init_maps(struct thread *thread, struct machine *machine) { - pid_t pid = thread->pid_; + pid_t pid = thread__pid(thread); - if (pid == thread->tid || pid == -1) { - thread->maps = maps__new(machine); + if (pid == thread__tid(thread) || pid == -1) { + thread__set_maps(thread, maps__new(machine)); } else { struct thread *leader = __machine__findnew_thread(machine, pid, pid); + if (leader) { - thread->maps = maps__get(leader->maps); + thread__set_maps(thread, maps__get(thread__maps(leader))); thread__put(leader); } } - return thread->maps ? 0 : -1; + return thread__maps(thread) ? 0 : -1; } struct thread *thread__new(pid_t pid, pid_t tid) { char *comm_str; struct comm *comm; - struct thread *thread = zalloc(sizeof(*thread)); - - if (thread != NULL) { - thread->pid_ = pid; - thread->tid = tid; - thread->ppid = -1; - thread->cpu = -1; - thread->guest_cpu = -1; - thread->lbr_stitch_enable = false; - INIT_LIST_HEAD(&thread->namespaces_list); - INIT_LIST_HEAD(&thread->comm_list); - init_rwsem(&thread->namespaces_lock); - init_rwsem(&thread->comm_lock); + RC_STRUCT(thread) *_thread = zalloc(sizeof(*_thread)); + struct thread *thread; + + if (ADD_RC_CHK(thread, _thread) != NULL) { + thread__set_pid(thread, pid); + thread__set_tid(thread, tid); + thread__set_ppid(thread, -1); + thread__set_cpu(thread, -1); + thread__set_guest_cpu(thread, -1); + thread__set_lbr_stitch_enable(thread, false); + INIT_LIST_HEAD(thread__namespaces_list(thread)); + INIT_LIST_HEAD(thread__comm_list(thread)); + init_rwsem(thread__namespaces_lock(thread)); + init_rwsem(thread__comm_lock(thread)); comm_str = malloc(32); if (!comm_str) @@ -64,12 +66,11 @@ struct thread *thread__new(pid_t pid, pid_t tid) if (!comm) goto err_thread; - list_add(&comm->list, &thread->comm_list); - refcount_set(&thread->refcnt, 1); - RB_CLEAR_NODE(&thread->rb_node); + list_add(&comm->list, thread__comm_list(thread)); + refcount_set(thread__refcnt(thread), 1); /* Thread holds first ref to nsdata. */ - thread->nsinfo = nsinfo__new(pid); - srccode_state_init(&thread->srccode_state); + RC_CHK_ACCESS(thread)->nsinfo = nsinfo__new(pid); + srccode_state_init(thread__srccode_state(thread)); } return thread; @@ -84,89 +85,69 @@ void thread__delete(struct thread *thread) struct namespaces *namespaces, *tmp_namespaces; struct comm *comm, *tmp_comm; - BUG_ON(!RB_EMPTY_NODE(&thread->rb_node)); - thread_stack__free(thread); - if (thread->maps) { - maps__put(thread->maps); - thread->maps = NULL; + if (thread__maps(thread)) { + maps__put(thread__maps(thread)); + thread__set_maps(thread, NULL); } - down_write(&thread->namespaces_lock); + down_write(thread__namespaces_lock(thread)); list_for_each_entry_safe(namespaces, tmp_namespaces, - &thread->namespaces_list, list) { + thread__namespaces_list(thread), list) { list_del_init(&namespaces->list); namespaces__free(namespaces); } - up_write(&thread->namespaces_lock); + up_write(thread__namespaces_lock(thread)); - down_write(&thread->comm_lock); - list_for_each_entry_safe(comm, tmp_comm, &thread->comm_list, list) { + down_write(thread__comm_lock(thread)); + list_for_each_entry_safe(comm, tmp_comm, thread__comm_list(thread), list) { list_del_init(&comm->list); comm__free(comm); } - up_write(&thread->comm_lock); + up_write(thread__comm_lock(thread)); - nsinfo__zput(thread->nsinfo); - srccode_state_free(&thread->srccode_state); + nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo); + srccode_state_free(thread__srccode_state(thread)); - exit_rwsem(&thread->namespaces_lock); - exit_rwsem(&thread->comm_lock); + exit_rwsem(thread__namespaces_lock(thread)); + exit_rwsem(thread__comm_lock(thread)); thread__free_stitch_list(thread); - free(thread); + RC_CHK_FREE(thread); } struct thread *thread__get(struct thread *thread) { - if (thread) - refcount_inc(&thread->refcnt); - return thread; + struct thread *result; + + if (RC_CHK_GET(result, thread)) + refcount_inc(thread__refcnt(thread)); + + return result; } void thread__put(struct thread *thread) { - if (thread && refcount_dec_and_test(&thread->refcnt)) { - /* - * Remove it from the dead threads list, as last reference is - * gone, if it is in a dead threads list. - * - * We may not be there anymore if say, the machine where it was - * stored was already deleted, so we already removed it from - * the dead threads and some other piece of code still keeps a - * reference. - * - * This is what 'perf sched' does and finally drops it in - * perf_sched__lat(), where it calls perf_sched__read_events(), - * that processes the events by creating a session and deleting - * it, which ends up destroying the list heads for the dead - * threads, but before it does that it removes all threads from - * it using list_del_init(). - * - * So we need to check here if it is in a dead threads list and - * if so, remove it before finally deleting the thread, to avoid - * an use after free situation. - */ - if (!list_empty(&thread->node)) - list_del_init(&thread->node); + if (thread && refcount_dec_and_test(thread__refcnt(thread))) thread__delete(thread); - } + else + RC_CHK_PUT(thread); } -static struct namespaces *__thread__namespaces(const struct thread *thread) +static struct namespaces *__thread__namespaces(struct thread *thread) { - if (list_empty(&thread->namespaces_list)) + if (list_empty(thread__namespaces_list(thread))) return NULL; - return list_first_entry(&thread->namespaces_list, struct namespaces, list); + return list_first_entry(thread__namespaces_list(thread), struct namespaces, list); } struct namespaces *thread__namespaces(struct thread *thread) { struct namespaces *ns; - down_read(&thread->namespaces_lock); + down_read(thread__namespaces_lock(thread)); ns = __thread__namespaces(thread); - up_read(&thread->namespaces_lock); + up_read(thread__namespaces_lock(thread)); return ns; } @@ -180,7 +161,7 @@ static int __thread__set_namespaces(struct thread *thread, u64 timestamp, if (!new) return -ENOMEM; - list_add(&new->list, &thread->namespaces_list); + list_add(&new->list, thread__namespaces_list(thread)); if (timestamp && curr) { /* @@ -200,25 +181,25 @@ int thread__set_namespaces(struct thread *thread, u64 timestamp, { int ret; - down_write(&thread->namespaces_lock); + down_write(thread__namespaces_lock(thread)); ret = __thread__set_namespaces(thread, timestamp, event); - up_write(&thread->namespaces_lock); + up_write(thread__namespaces_lock(thread)); return ret; } -struct comm *thread__comm(const struct thread *thread) +struct comm *thread__comm(struct thread *thread) { - if (list_empty(&thread->comm_list)) + if (list_empty(thread__comm_list(thread))) return NULL; - return list_first_entry(&thread->comm_list, struct comm, list); + return list_first_entry(thread__comm_list(thread), struct comm, list); } -struct comm *thread__exec_comm(const struct thread *thread) +struct comm *thread__exec_comm(struct thread *thread) { struct comm *comm, *last = NULL, *second_last = NULL; - list_for_each_entry(comm, &thread->comm_list, list) { + list_for_each_entry(comm, thread__comm_list(thread), list) { if (comm->exec) return comm; second_last = last; @@ -231,7 +212,7 @@ struct comm *thread__exec_comm(const struct thread *thread) * thread, that is very probably wrong. Prefer a later comm to avoid * that case. */ - if (second_last && !last->start && thread->pid_ == thread->tid) + if (second_last && !last->start && thread__pid(thread) == thread__tid(thread)) return second_last; return last; @@ -243,7 +224,7 @@ static int ____thread__set_comm(struct thread *thread, const char *str, struct comm *new, *curr = thread__comm(thread); /* Override the default :tid entry */ - if (!thread->comm_set) { + if (!thread__comm_set(thread)) { int err = comm__override(curr, str, timestamp, exec); if (err) return err; @@ -251,13 +232,13 @@ static int ____thread__set_comm(struct thread *thread, const char *str, new = comm__new(str, timestamp, exec); if (!new) return -ENOMEM; - list_add(&new->list, &thread->comm_list); + list_add(&new->list, thread__comm_list(thread)); if (exec) - unwind__flush_access(thread->maps); + unwind__flush_access(thread__maps(thread)); } - thread->comm_set = true; + thread__set_comm_set(thread, true); return 0; } @@ -267,9 +248,9 @@ int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp, { int ret; - down_write(&thread->comm_lock); + down_write(thread__comm_lock(thread)); ret = ____thread__set_comm(thread, str, timestamp, exec); - up_write(&thread->comm_lock); + up_write(thread__comm_lock(thread)); return ret; } @@ -281,7 +262,7 @@ int thread__set_comm_from_proc(struct thread *thread) int err = -1; if (!(snprintf(path, sizeof(path), "%d/task/%d/comm", - thread->pid_, thread->tid) >= (int)sizeof(path)) && + thread__pid(thread), thread__tid(thread)) >= (int)sizeof(path)) && procfs__read_str(path, &comm, &sz) == 0) { comm[sz - 1] = '\0'; err = thread__set_comm(thread, comm, 0); @@ -290,7 +271,7 @@ int thread__set_comm_from_proc(struct thread *thread) return err; } -static const char *__thread__comm_str(const struct thread *thread) +static const char *__thread__comm_str(struct thread *thread) { const struct comm *comm = thread__comm(thread); @@ -304,9 +285,9 @@ const char *thread__comm_str(struct thread *thread) { const char *str; - down_read(&thread->comm_lock); + down_read(thread__comm_lock(thread)); str = __thread__comm_str(thread); - up_read(&thread->comm_lock); + up_read(thread__comm_lock(thread)); return str; } @@ -315,23 +296,23 @@ static int __thread__comm_len(struct thread *thread, const char *comm) { if (!comm) return 0; - thread->comm_len = strlen(comm); + thread__set_comm_len(thread, strlen(comm)); - return thread->comm_len; + return thread__var_comm_len(thread); } /* CHECKME: it should probably better return the max comm len from its comm list */ int thread__comm_len(struct thread *thread) { - int comm_len = thread->comm_len; + int comm_len = thread__var_comm_len(thread); if (!comm_len) { const char *comm; - down_read(&thread->comm_lock); + down_read(thread__comm_lock(thread)); comm = __thread__comm_str(thread); comm_len = __thread__comm_len(thread, comm); - up_read(&thread->comm_lock); + up_read(thread__comm_lock(thread)); } return comm_len; @@ -339,33 +320,33 @@ int thread__comm_len(struct thread *thread) size_t thread__fprintf(struct thread *thread, FILE *fp) { - return fprintf(fp, "Thread %d %s\n", thread->tid, thread__comm_str(thread)) + - maps__fprintf(thread->maps, fp); + return fprintf(fp, "Thread %d %s\n", thread__tid(thread), thread__comm_str(thread)) + + maps__fprintf(thread__maps(thread), fp); } int thread__insert_map(struct thread *thread, struct map *map) { int ret; - ret = unwind__prepare_access(thread->maps, map, NULL); + ret = unwind__prepare_access(thread__maps(thread), map, NULL); if (ret) return ret; - maps__fixup_overlappings(thread->maps, map, stderr); - return maps__insert(thread->maps, map); + maps__fixup_overlappings(thread__maps(thread), map, stderr); + return maps__insert(thread__maps(thread), map); } static int __thread__prepare_access(struct thread *thread) { bool initialized = false; int err = 0; - struct maps *maps = thread->maps; + struct maps *maps = thread__maps(thread); struct map_rb_node *rb_node; down_read(maps__lock(maps)); maps__for_each_entry(maps, rb_node) { - err = unwind__prepare_access(thread->maps, rb_node->map, &initialized); + err = unwind__prepare_access(thread__maps(thread), rb_node->map, &initialized); if (err || initialized) break; } @@ -388,21 +369,22 @@ static int thread__prepare_access(struct thread *thread) static int thread__clone_maps(struct thread *thread, struct thread *parent, bool do_maps_clone) { /* This is new thread, we share map groups for process. */ - if (thread->pid_ == parent->pid_) + if (thread__pid(thread) == thread__pid(parent)) return thread__prepare_access(thread); - if (thread->maps == parent->maps) { + if (thread__maps(thread) == thread__maps(parent)) { pr_debug("broken map groups on thread %d/%d parent %d/%d\n", - thread->pid_, thread->tid, parent->pid_, parent->tid); + thread__pid(thread), thread__tid(thread), + thread__pid(parent), thread__tid(parent)); return 0; } /* But this one is new process, copy maps. */ - return do_maps_clone ? maps__clone(thread, parent->maps) : 0; + return do_maps_clone ? maps__clone(thread, thread__maps(parent)) : 0; } int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp, bool do_maps_clone) { - if (parent->comm_set) { + if (thread__comm_set(parent)) { const char *comm = thread__comm_str(parent); int err; if (!comm) @@ -412,7 +394,7 @@ int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp, bo return err; } - thread->ppid = parent->tid; + thread__set_ppid(thread, thread__tid(parent)); return thread__clone_maps(thread, parent, do_maps_clone); } @@ -436,13 +418,13 @@ void thread__find_cpumode_addr_location(struct thread *thread, u64 addr, struct thread *thread__main_thread(struct machine *machine, struct thread *thread) { - if (thread->pid_ == thread->tid) + if (thread__pid(thread) == thread__tid(thread)) return thread__get(thread); - if (thread->pid_ == -1) + if (thread__pid(thread) == -1) return NULL; - return machine__find_thread(machine, thread->pid_, thread->pid_); + return machine__find_thread(machine, thread__pid(thread), thread__pid(thread)); } int thread__memcpy(struct thread *thread, struct machine *machine, @@ -456,24 +438,31 @@ int thread__memcpy(struct thread *thread, struct machine *machine, if (machine__kernel_ip(machine, ip)) cpumode = PERF_RECORD_MISC_KERNEL; - if (!thread__find_map(thread, cpumode, ip, &al)) - return -1; + addr_location__init(&al); + if (!thread__find_map(thread, cpumode, ip, &al)) { + addr_location__exit(&al); + return -1; + } dso = map__dso(al.map); - if( !dso || dso->data.status == DSO_DATA_STATUS_ERROR || map__load(al.map) < 0) + if (!dso || dso->data.status == DSO_DATA_STATUS_ERROR || map__load(al.map) < 0) { + addr_location__exit(&al); return -1; + } offset = map__map_ip(al.map, ip); if (is64bit) *is64bit = dso->is_64_bit; + addr_location__exit(&al); + return dso__data_read_offset(dso, machine, offset, buf, len); } void thread__free_stitch_list(struct thread *thread) { - struct lbr_stitch *lbr_stitch = thread->lbr_stitch; + struct lbr_stitch *lbr_stitch = thread__lbr_stitch(thread); struct stitch_list *pos, *tmp; if (!lbr_stitch) @@ -490,5 +479,6 @@ void thread__free_stitch_list(struct thread *thread) } zfree(&lbr_stitch->prev_lbr_cursor); - zfree(&thread->lbr_stitch); + free(thread__lbr_stitch(thread)); + thread__set_lbr_stitch(thread, NULL); } |