summaryrefslogtreecommitdiff
path: root/kernel/trace/trace_events_hist.c
AgeCommit message (Collapse)Author
2022-12-10tracing/hist: Fix issue of losting command info in error_logZheng Yejian
When input some constructed invalid 'trigger' command, command info in 'error_log' are lost [1]. The root cause is that there is a path that event_hist_trigger_parse() is recursely called once and 'last_cmd' which save origin command is cleared, then later calling of hist_err() will no longer record origin command info: event_hist_trigger_parse() { last_cmd_set() // <1> 'last_cmd' save origin command here at first create_actions() { onmatch_create() { action_create() { trace_action_create() { trace_action_create_field_var() { create_field_var_hist() { event_hist_trigger_parse() { // <2> recursely called once hist_err_clear() // <3> 'last_cmd' is cleared here } hist_err() // <4> No longer find origin command!!! Since 'glob' is empty string while running into the recurse call, we can trickly check it and bypass the call of hist_err_clear() to solve it. [1] # cd /sys/kernel/tracing # echo "my_synth_event int v1; int v2; int v3;" >> synthetic_events # echo 'hist:keys=pid' >> events/sched/sched_waking/trigger # echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\ pid,pid1)" >> events/sched/sched_switch/trigger # cat error_log [ 8.405018] hist:sched:sched_switch: error: Couldn't find synthetic event Command: hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1) ^ [ 8.816902] hist:sched:sched_switch: error: Couldn't find field Command: hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1) ^ [ 8.816902] hist:sched:sched_switch: error: Couldn't parse field variable Command: hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(pid,pid1) ^ [ 8.999880] : error: Couldn't find field Command: ^ [ 8.999880] : error: Couldn't parse field variable Command: ^ [ 8.999880] : error: Couldn't find field Command: ^ [ 8.999880] : error: Couldn't create histogram for field Command: ^ Link: https://lore.kernel.org/linux-trace-kernel/20221207135326.3483216-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Cc: <zanussi@kernel.org> Fixes: f404da6e1d46 ("tracing: Add 'last error' error facility for hist triggers") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-10tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'Zheng Yejian
When generate a synthetic event with many params and then create a trace action for it [1], kernel panic happened [2]. It is because that in trace_action_create() 'data->n_params' is up to SYNTH_FIELDS_MAX (current value is 64), and array 'data->var_ref_idx' keeps indices into array 'hist_data->var_refs' for each synthetic event param, but the length of 'data->var_ref_idx' is TRACING_MAP_VARS_MAX (current value is 16), so out-of-bound write happened when 'data->n_params' more than 16. In this case, 'data->match_data.event' is overwritten and eventually cause the panic. To solve the issue, adjust the length of 'data->var_ref_idx' to be SYNTH_FIELDS_MAX and add sanity checks to avoid out-of-bound write. [1] # cd /sys/kernel/tracing/ # echo "my_synth_event int v1; int v2; int v3; int v4; int v5; int v6;\ int v7; int v8; int v9; int v10; int v11; int v12; int v13; int v14;\ int v15; int v16; int v17; int v18; int v19; int v20; int v21; int v22;\ int v23; int v24; int v25; int v26; int v27; int v28; int v29; int v30;\ int v31; int v32; int v33; int v34; int v35; int v36; int v37; int v38;\ int v39; int v40; int v41; int v42; int v43; int v44; int v45; int v46;\ int v47; int v48; int v49; int v50; int v51; int v52; int v53; int v54;\ int v55; int v56; int v57; int v58; int v59; int v60; int v61; int v62;\ int v63" >> synthetic_events # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="bash"' >> \ events/sched/sched_waking/trigger # echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\ pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\ pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\ pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\ pid,pid,pid,pid,pid,pid,pid,pid,pid)" >> events/sched/sched_switch/trigger [2] BUG: unable to handle page fault for address: ffff91c900000000 PGD 61001067 P4D 61001067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 322 Comm: bash Tainted: G W 6.1.0-rc8+ #229 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:strcmp+0xc/0x30 Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3 RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000 RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000 R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580 R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538 FS: 00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0 Call Trace: <TASK> __find_event_file+0x55/0x90 action_create+0x76c/0x1060 event_hist_trigger_parse+0x146d/0x2060 ? event_trigger_write+0x31/0xd0 trigger_process_regex+0xbb/0x110 event_trigger_write+0x6b/0xd0 vfs_write+0xc8/0x3e0 ? alloc_fd+0xc0/0x160 ? preempt_count_add+0x4d/0xa0 ? preempt_count_add+0x70/0xa0 ksys_write+0x5f/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f1d1d0cf077 Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 RSP: 002b:00007ffcebb0e568 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000143 RCX: 00007f1d1d0cf077 RDX: 0000000000000143 RSI: 00005639265aa7e0 RDI: 0000000000000001 RBP: 00005639265aa7e0 R08: 000000000000000a R09: 0000000000000142 R10: 000056392639c017 R11: 0000000000000246 R12: 0000000000000143 R13: 00007f1d1d1ae6a0 R14: 00007f1d1d1aa4a0 R15: 00007f1d1d1a98a0 </TASK> Modules linked in: CR2: ffff91c900000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:strcmp+0xc/0x30 Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3 RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000 RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000 R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580 R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538 FS: 00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0 Link: https://lore.kernel.org/linux-trace-kernel/20221207035143.2278781-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Cc: <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: d380dcde9a07 ("tracing: Fix now invalid var_ref_vals assumption in trace action") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-10tracing/hist: Fix wrong return value in parse_action_params()Zheng Yejian
When number of synth fields is more than SYNTH_FIELDS_MAX, parse_action_params() should return -EINVAL. Link: https://lore.kernel.org/linux-trace-kernel/20221207034635.2253990-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Cc: <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: c282a386a397 ("tracing: Add 'onmatch' hist trigger action support") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-09tracing: Add nohitcount option for suppressing display of raw hitcountMasami Hiramatsu (Google)
Add 'nohitcount' ('NOHC' for short) option for suppressing display of the raw hitcount column in the histogram. Note that you must specify at least one value except raw 'hitcount' when you specify this nohitcount option. # cd /sys/kernel/debug/tracing/ # echo hist:keys=pid:vals=runtime.percent,runtime.graph:sort=pid:NOHC > \ events/sched/sched_stat_runtime/trigger # sleep 10 # cat events/sched/sched_stat_runtime/hist # event histogram # # trigger info: hist:keys=pid:vals=runtime.percent,runtime.graph:sort=pid:size=2048:nohitcount [active] # { pid: 8 } runtime (%): 3.02 runtime: # { pid: 14 } runtime (%): 2.25 runtime: { pid: 16 } runtime (%): 2.25 runtime: { pid: 26 } runtime (%): 0.17 runtime: { pid: 61 } runtime (%): 11.52 runtime: #### { pid: 67 } runtime (%): 1.56 runtime: { pid: 68 } runtime (%): 0.84 runtime: { pid: 76 } runtime (%): 0.92 runtime: { pid: 117 } runtime (%): 2.50 runtime: # { pid: 146 } runtime (%): 49.88 runtime: #################### { pid: 157 } runtime (%): 16.63 runtime: ###### { pid: 158 } runtime (%): 8.38 runtime: ### Link: https://lore.kernel.org/linux-trace-kernel/166610814787.56030.4980636083486339906.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Tested-by: Tom Zanussi <zanussi@kernel.org>
2022-12-09tracing: Add .graph suffix option to histogram valueMasami Hiramatsu (Google)
Add the .graph suffix which shows the bar graph of the histogram value. For example, the below example shows that the bar graph of the histogram of the runtime for each tasks. ------ # cd /sys/kernel/debug/tracing/ # echo hist:keys=pid:vals=runtime.graph:sort=pid > \ events/sched/sched_stat_runtime/trigger # sleep 10 # cat events/sched/sched_stat_runtime/hist # event histogram # # trigger info: hist:keys=pid:vals=hitcount,runtime.graph:sort=pid:size=2048 [active] # { pid: 14 } hitcount: 2 runtime: { pid: 16 } hitcount: 8 runtime: { pid: 26 } hitcount: 1 runtime: { pid: 57 } hitcount: 3 runtime: { pid: 61 } hitcount: 20 runtime: ### { pid: 66 } hitcount: 2 runtime: { pid: 70 } hitcount: 3 runtime: { pid: 72 } hitcount: 2 runtime: { pid: 145 } hitcount: 14 runtime: #################### { pid: 152 } hitcount: 5 runtime: ####### { pid: 153 } hitcount: 2 runtime: #### Totals: Hits: 62 Entries: 11 Dropped: 0 ------- Link: https://lore.kernel.org/linux-trace-kernel/166610813953.56030.10944148382315789485.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Tested-by: Tom Zanussi <zanussi@kernel.org>
2022-12-09tracing: Add .percent suffix option to histogram valuesMasami Hiramatsu (Google)
Add .percent suffix option to show the histogram values in percentage. This feature is useful when we need yo undersntand the overall trend for the histograms of large values. E.g. this shows the runtime percentage for each tasks. ------ # cd /sys/kernel/debug/tracing/ # echo hist:keys=pid:vals=hitcount,runtime.percent:sort=pid > \ events/sched/sched_stat_runtime/trigger # sleep 10 # cat events/sched/sched_stat_runtime/hist # event histogram # # trigger info: hist:keys=pid:vals=hitcount,runtime.percent:sort=pid:size=2048 [active] # { pid: 8 } hitcount: 7 runtime (%): 4.14 { pid: 14 } hitcount: 5 runtime (%): 3.69 { pid: 16 } hitcount: 11 runtime (%): 3.41 { pid: 61 } hitcount: 41 runtime (%): 19.75 { pid: 65 } hitcount: 4 runtime (%): 1.48 { pid: 70 } hitcount: 6 runtime (%): 3.60 { pid: 72 } hitcount: 2 runtime (%): 1.10 { pid: 144 } hitcount: 10 runtime (%): 32.01 { pid: 151 } hitcount: 8 runtime (%): 22.66 { pid: 152 } hitcount: 2 runtime (%): 8.10 Totals: Hits: 96 Entries: 10 Dropped: 0 ----- Link: https://lore.kernel.org/linux-trace-kernel/166610813077.56030.4238090506973562347.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Tested-by: Tom Zanussi <zanussi@kernel.org>
2022-12-09tracing: Allow multiple hitcount values in histogramsTom Zanussi
The hitcount is treated specially in the histograms - since it's always expected to be there regardless of whether the user specified anything or not, it's always added as the first histogram value. Currently the code doesn't allow it to be added more than once as a value, which is inconsistent with all the other possible values. It would seem to be a pointless thing to want to do, but other features being added such as percent and graph modifiers don't work properly with the current hitcount restrictions. Fix this by allowing multiple hitcounts to be added. Link: https://lore.kernel.org/linux-trace-kernel/166610812248.56030.16754785928712505251.stgit@devnote2 Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Tom Zanussi <zanussi@kernel.org>
2022-11-23tracing: Fix race where histograms can be called before the eventSteven Rostedt (Google)
commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before the event") fixed an issue where if an event is soft disabled, and the trigger is being added, there's a small window where the event sees that there's a trigger but does not see that it requires reading the event yet, and then calls the trigger with the record == NULL. This could be solved with adding memory barriers in the hot path, or to make sure that all the triggers requiring a record check for NULL. The latter was chosen. Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but the same needs to be done with histograms. Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/ Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home Cc: Tom Zanussi <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22tracing/hist: add in missing * in comment blocksColin Ian King
There are a couple of missing * in comment blocks. Fix these. Cleans up two clang warnings: kernel/trace/trace_events_hist.c:986: warning: bad line: kernel/trace/trace_events_hist.c:3229: warning: bad line: Link: https://lkml.kernel.org/r/20221020133019.1547587-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing/hist: Call hist functions directly via a switch statementSteven Rostedt (Google)
Due to retpolines, indirect calls are much more expensive than direct calls. The histograms have a select set of functions it uses for the histograms, instead of using function pointers to call them, create a hist_fn_call() function that uses a switch statement to call the histogram functions directly. This gives a 13% speedup to the histogram logic. Using the histogram benchmark: Before: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 129 } hitcount: 2213 { delta: 130 } hitcount: 285965 { delta: 131 } hitcount: 1146545 { delta: 132 } hitcount: 5185432 { delta: 133 } hitcount: 19896215 { delta: 134 } hitcount: 53118616 { delta: 135 } hitcount: 83816709 { delta: 136 } hitcount: 68329562 { delta: 137 } hitcount: 41859349 { delta: 138 } hitcount: 46257797 { delta: 139 } hitcount: 54400831 { delta: 140 } hitcount: 72875007 { delta: 141 } hitcount: 76193272 { delta: 142 } hitcount: 49504263 { delta: 143 } hitcount: 38821072 { delta: 144 } hitcount: 47702679 { delta: 145 } hitcount: 41357297 { delta: 146 } hitcount: 22058238 { delta: 147 } hitcount: 9720002 { delta: 148 } hitcount: 3193542 { delta: 149 } hitcount: 927030 { delta: 150 } hitcount: 850772 { delta: 151 } hitcount: 1477380 { delta: 152 } hitcount: 2687977 { delta: 153 } hitcount: 2865985 { delta: 154 } hitcount: 1977492 { delta: 155 } hitcount: 2475607 { delta: 156 } hitcount: 3403612 After: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 113 } hitcount: 272 { delta: 114 } hitcount: 840 { delta: 118 } hitcount: 344 { delta: 119 } hitcount: 25428 { delta: 120 } hitcount: 350590 { delta: 121 } hitcount: 1892484 { delta: 122 } hitcount: 6205004 { delta: 123 } hitcount: 11583521 { delta: 124 } hitcount: 37590979 { delta: 125 } hitcount: 108308504 { delta: 126 } hitcount: 131672461 { delta: 127 } hitcount: 88700598 { delta: 128 } hitcount: 65939870 { delta: 129 } hitcount: 45055004 { delta: 130 } hitcount: 33174464 { delta: 131 } hitcount: 31813493 { delta: 132 } hitcount: 29011676 { delta: 133 } hitcount: 22798782 { delta: 134 } hitcount: 22072486 { delta: 135 } hitcount: 17034113 { delta: 136 } hitcount: 8982490 { delta: 137 } hitcount: 2865908 { delta: 138 } hitcount: 980382 { delta: 139 } hitcount: 1651944 { delta: 140 } hitcount: 4112073 { delta: 141 } hitcount: 3963269 { delta: 142 } hitcount: 1712508 { delta: 143 } hitcount: 575941 { delta: 144 } hitcount: 351427 { delta: 145 } hitcount: 218077 { delta: 146 } hitcount: 167297 { delta: 147 } hitcount: 146198 { delta: 148 } hitcount: 116122 { delta: 149 } hitcount: 58993 { delta: 150 } hitcount: 40228 The delta above is in nanoseconds. It brings the fastest time down from 129ns to 113ns, and the peak from 141ns to 126ns. Link: https://lkml.kernel.org/r/20220906225529.411545333@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12tracing/histograms: Simplify create_hist_fields()Zheng Yejian
When I look into implements of create_hist_fields(), I think there can be following two simplifications: 1. If something wrong happened in parse_var_defs(), free_var_defs() would have been called in it, so no need goto free again after calling it; 2. After calling create_key_fields(), regardless of the value of 'ret', it then always runs into 'out: ', so the judge of 'ret' is redundant. Link: https://lkml.kernel.org/r/20220630013152.164871-1-zhengyejian1@huawei.com Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Reviewed-by: Tom Rix <trix@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12tracing/histograms: Fix memory leak problemZheng Yejian
This reverts commit 46bbe5c671e06f070428b9be142cc4ee5cedebac. As commit 46bbe5c671e0 ("tracing: fix double free") said, the "double free" problem reported by clang static analyzer is: > In parse_var_defs() if there is a problem allocating > var_defs.expr, the earlier var_defs.name is freed. > This free is duplicated by free_var_defs() which frees > the rest of the list. However, if there is a problem allocating N-th var_defs.expr: + in parse_var_defs(), the freed 'earlier var_defs.name' is actually the N-th var_defs.name; + then in free_var_defs(), the names from 0th to (N-1)-th are freed; IF ALLOCATING PROBLEM HAPPENED HERE!!! -+ \ | 0th 1th (N-1)-th N-th V +-------------+-------------+-----+-------------+----------- var_defs: | name | expr | name | expr | ... | name | expr | name | /// +-------------+-------------+-----+-------------+----------- These two frees don't act on same name, so there was no "double free" problem before. Conversely, after that commit, we get a "memory leak" problem because the above "N-th var_defs.name" is not freed. If enable CONFIG_DEBUG_KMEMLEAK and inject a fault at where the N-th var_defs.expr allocated, then execute on shell like: $ echo 'hist:key=call_site:val=$v1,$v2:v1=bytes_req,v2=bytes_alloc' > \ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger Then kmemleak reports: unreferenced object 0xffff8fb100ef3518 (size 8): comm "bash", pid 196, jiffies 4295681690 (age 28.538s) hex dump (first 8 bytes): 76 31 00 00 b1 8f ff ff v1...... backtrace: [<0000000038fe4895>] kstrdup+0x2d/0x60 [<00000000c99c049a>] event_hist_trigger_parse+0x206f/0x20e0 [<00000000ae70d2cc>] trigger_process_regex+0xc0/0x110 [<0000000066737a4c>] event_trigger_write+0x75/0xd0 [<000000007341e40c>] vfs_write+0xbb/0x2a0 [<0000000087fde4c2>] ksys_write+0x59/0xd0 [<00000000581e9cdf>] do_syscall_64+0x3a/0x80 [<00000000cf3b065c>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Link: https://lkml.kernel.org/r/20220711014731.69520-1-zhengyejian1@huawei.com Cc: stable@vger.kernel.org Fixes: 46bbe5c671e0 ("tracing: fix double free") Reported-by: Hulk Robot <hulkci@huawei.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26tracing: Change "char *" string form to "char []"liqiong
The "char []" string form declares a single variable. It is better than "char *" which creates two variables in the final assembly. Link: https://lkml.kernel.org/r/20220512143230.28796-1-liqiong@nfschina.com Signed-off-by: liqiong <liqiong@nfschina.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26tracing: Fix potential double free in create_var_ref()Keita Suzuki
In create_var_ref(), init_var_ref() is called to initialize the fields of variable ref_field, which is allocated in the previous function call to create_hist_field(). Function init_var_ref() allocates the corresponding fields such as ref_field->system, but frees these fields when the function encounters an error. The caller later calls destroy_hist_field() to conduct error handling, which frees the fields and the variable itself. This results in double free of the fields which are already freed in the previous function. Fix this by storing NULL to the corresponding fields when they are freed in init_var_ref(). Link: https://lkml.kernel.org/r/20220425063739.3859998-1-keitasuzuki.park@sslab.ics.keio.ac.jp Fixes: 067fe038e70f ("tracing: Add variable reference handling to hist triggers") CC: stable@vger.kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-27tracing: Replace usage of found with dedicated list iterator variableJakob Koschel
To move the list iterator variable into the list_for_each_entry_*() macro in the future it should be avoided to use the list iterator variable after the loop body. To *never* use the list iterator variable after the loop it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need to use a found variable and simply checking if the variable was set, can determine if the break/goto was hit. Link: https://lkml.kernel.org/r/20220427170734.819891-4-jakobkoschel@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-26tracing: Change `if (strlen(glob))` to `if (glob[0])`Ammar Faizi
No need to traverse to the end of string. If the first byte is not a NUL char, it's guaranteed `if (strlen(glob))` is true. Link: https://lkml.kernel.org/r/20220417185630.199062-3-ammarfaizi2@gnuweeb.org Cc: Ingo Molnar <mingo@redhat.com> Cc: GNU/Weeb Mailing List <gwml@vger.gnuweeb.org> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-26tracing: Return -EINVAL if WARN_ON(!glob) triggered in ↵Ammar Faizi
event_hist_trigger_parse() If `WARN_ON(!glob)` is ever triggered, we will still continue executing the next lines. This will trigger the more serious problem, a NULL pointer dereference bug. Just return -EINVAL if @glob is NULL. Link: https://lkml.kernel.org/r/20220417185630.199062-2-ammarfaizi2@gnuweeb.org Cc: Ingo Molnar <mingo@redhat.com> Cc: GNU/Weeb Mailing List <gwml@vger.gnuweeb.org> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-26tracing: Separate hist state updates from hist registrationTom Zanussi
hist_register_trigger() handles both new hist registration as well as existing hist registration through event_command.reg(). Adding a new function, existing_hist_update_only(), that checks and updates existing histograms and exits after doing so allows the confusing logic in event_hist_trigger_parse() to be simplified. Link: https://lkml.kernel.org/r/211b2cd3e3d7e00f4f8ad45ef8b33063da6a7e05.1644010576.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-26tracing: Have existing event_command.parse() implementations use helpersTom Zanussi
Simplify the existing event_command.parse() implementations by having them make use of the helper functions previously introduced. Link: https://lkml.kernel.org/r/b353e3427a81f9d3adafd98fd7d73e78a8209f43.1644010576.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-26tracing: Remove redundant trigger_ops paramsTom Zanussi
Since event_trigger_data contains the .ops trigger_ops field, there's no reason to pass the trigger_ops separately. Remove it as a param from functions whenever event_trigger_data is passed. Link: https://lkml.kernel.org/r/9856c9bc81bde57077f5b8d6f8faa47156c6354a.1644010575.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-26tracing: Remove logic for registering multiple event triggers at a timeTom Zanussi
Code for registering triggers assumes it's possible to register more than one trigger at a time. In fact, it's unimplemented and there doesn't seem to be a reason to do that. Remove the n_registered param from event_trigger_register() and fix up callers. Doing so simplifies the logic in event_trigger_register to the point that it just becomes a wrapper calling event_command.reg(). It also removes the problematic call to event_command.unreg() in case of failure. A new function, event_trigger_unregister() is also added for callers to call themselves. The changes to trace_events_hist.c simply allow compilation; a separate patch follows which updates the hist triggers to work correctly with the new changes. Link: https://lkml.kernel.org/r/6149fec7a139d93e84fa4535672fb5bef88006b0.1644010575.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-23Merge tag 'trace-v5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - New user_events interface. User space can register an event with the kernel describing the format of the event. Then it will receive a byte in a page mapping that it can check against. A privileged task can then enable that event like any other event, which will change the mapped byte to true, telling the user space application to start writing the event to the tracing buffer. - Add new "ftrace_boot_snapshot" kernel command line parameter. When set, the tracing buffer will be saved in the snapshot buffer at boot up when the kernel hands things over to user space. This will keep the traces that happened at boot up available even if user space boot up has tracing as well. - Have TRACE_EVENT_ENUM() also update trace event field type descriptions. Thus if a static array defines its size with an enum, the user space trace event parsers can still know how to parse that array. - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the TRACE_EVENT() macro, but will attach to an existing tracepoint. This will make one tracepoint be able to trace different content and not be stuck at only what the original TRACE_EVENT() macro exports. - Fixes to tracing error logging. - Better saving of cmdlines to PIDs when tracing (use the wakeup events for mapping). * tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits) tracing: Have type enum modifications copy the strings user_events: Add trace event call as root for low permission cases tracing/user_events: Use alloc_pages instead of kzalloc() for register pages tracing: Add snapshot at end of kernel boot up tracing: Have TRACE_DEFINE_ENUM affect trace event types as well tracing: Fix strncpy warning in trace_events_synth.c user_events: Prevent dyn_event delete racing with ioctl add/delete tracing: Add TRACE_CUSTOM_EVENT() macro tracing: Move the defines to create TRACE_EVENTS into their own files tracing: Add sample code for custom trace events tracing: Allow custom events to be added to the tracefs directory tracing: Fix last_cmd_set() string management in histogram code user_events: Fix potential uninitialized pointer while parsing field tracing: Fix allocation of last_cmd in last_cmd_set() user_events: Add documentation file user_events: Add sample code for typical usage user_events: Add self-test for validator boundaries user_events: Add self-test for perf_event integration user_events: Add self-test for dynamic_events integration user_events: Add self-test for ftrace integration ...
2022-03-10tracing: Fix last_cmd_set() string management in histogram codeSteven Rostedt (Google)
Using strnlen(dest, str, n) is confusing, as the size of dest must be strlen(dest) + n + 1. Even more confusing, using sizeof(string constant) gives you strlen(string constant) + 1 and not just strlen(). These two together made using strncat() with a constant string a bit off in the calculations as we have: len = sizeof(HIST_PREFIX) + strlen(str) + 1; kfree(last_cmd); last_cmd = kzalloc(len, GFP_KERNEL); strcpy(last_cmd, HIST_PREFIX); len -= sizeof(HIST_PREFIX) + 1; strncat(last_cmd, str, len); The above works if we s/sizeof/strlen/ with HIST_PREFIX (which is defined as "hist:", but because sizeof(HIST_PREFIX) is equal to strlen(HIST_PREFIX) + 1, we can drop the +1 in the code. But at least comment that we are doing so. Link: https://lore.kernel.org/all/202203082112.Iu7tvFl4-lkp@intel.com/ Fixes: 9f8e5aee93ed2 ("tracing: Fix allocation of last_cmd in last_cmd_set()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-01tracing/histogram: Fix sorting on old "cpu" valueSteven Rostedt (Google)
When trying to add a histogram against an event with the "cpu" field, it was impossible due to "cpu" being a keyword to key off of the running CPU. So to fix this, it was changed to "common_cpu" to match the other generic fields (like "common_pid"). But since some scripts used "cpu" for keying off of the CPU (for events that did not have "cpu" as a field, which is most of them), a backward compatibility trick was added such that if "cpu" was used as a key, and the event did not have "cpu" as a field name, then it would fallback and switch over to "common_cpu". This fix has a couple of subtle bugs. One was that when switching over to "common_cpu", it did not change the field name, it just set a flag. But the code still found a "cpu" field. The "cpu" field is used for filtering and is returned when the event does not have a "cpu" field. This was found by: # cd /sys/kernel/tracing # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger # cat events/sched/sched_wakeup/hist Which showed the histogram unsorted: { cpu: 19, pid: 1175 } hitcount: 1 { cpu: 6, pid: 239 } hitcount: 2 { cpu: 23, pid: 1186 } hitcount: 14 { cpu: 12, pid: 249 } hitcount: 2 { cpu: 3, pid: 994 } hitcount: 5 Instead of hard coding the "cpu" checks, take advantage of the fact that trace_event_field_field() returns a special field for "cpu" and "CPU" if the event does not have "cpu" as a field. This special field has the "filter_type" of "FILTER_CPU". Check that to test if the returned field is of the CPU type instead of doing the string compare. Also, fix the sorting bug by testing for the hist_field flag of HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use the special CPU field to know what compare routine to use, and since that special field does not have a size, it returns tracing_map_cmp_none. Cc: stable@vger.kernel.org Fixes: 1e3bac71c505 ("tracing/histogram: Rename "cpu" to "common_cpu"") Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-23tracing: Fix allocation of last_cmd in last_cmd_set()Steven Rostedt (Google)
The strncat() used in last_cmd_set() includes the nul byte of length of the string being copied in, when it should only hold the size of the string being copied (not the nul byte). Change it to subtract the length of the allocated space and the nul byte to pass that into the strncat(). Also, assign "len" instead of initializing it to zero and its first update is to do a "+=". Link: https://lore.kernel.org/all/202202140628.fj6e4w4v-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-10tracing: Remove size restriction on hist trigger cmd error loggingTom Zanussi
Currently, hist trigger command error strings are restricted to a length of MAX_FILTER_STR_VAL (256), which is too short for some commands already seen in the wild (with cmd strings longer than that showing up truncated in err_log). Remove the restriction so that no hist trigger command error string is ever truncated. Link: https://lkml.kernel.org/r/0f9d46407222eaf6632cd3b417bc50a11f401b71.1643399022.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27tracing: Propagate is_signed to expressionTom Zanussi
During expression parsing, a new expression field is created which should inherit the properties of the operands, such as size and is_signed. is_signed propagation was missing, causing spurious errors with signed operands. Add it in parse_expr() and parse_unary() to fix the problem. Link: https://lkml.kernel.org/r/f4dac08742fd7a0920bf80a73c6c44042f5eaa40.1643319703.git.zanussi@kernel.org Cc: stable@vger.kernel.org Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers") Reported-by: Yordan Karadzhov <ykaradzhov@vmware.com> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215513 Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27tracing: Fix smatch warning for do while check in event_hist_trigger_parse()Tom Zanussi
The patch ec5ce0987541: "tracing: Allow whitespace to surround hist trigger filter" from Jan 15, 2018, leads to the following Smatch static checker warning: kernel/trace/trace_events_hist.c:6199 event_hist_trigger_parse() warn: 'p' can't be NULL. Since p is always checked for a NULL value at the top of loop and nothing in the rest of the loop will set it to NULL, the warning is correct and might as well be 1 to silence the warning. Link: https://lkml.kernel.org/r/a1d4c79766c0cf61e20438dc35244d216633fef6.1643319703.git.zanussi@kernel.org Fixes: ec5ce09875410 ("tracing: Allow whitespace to surround hist trigger filter") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27tracing: Fix smatch warning for null glob in event_hist_trigger_parse()Tom Zanussi
The recent rename of event_hist_trigger_parse() caused smatch re-evaluation of trace_events_hist.c and as a result an old warning was found: kernel/trace/trace_events_hist.c:6174 event_hist_trigger_parse() error: we previously assumed 'glob' could be null (see line 6166) glob should never be null (and apparently smatch can also figure that out and skip the warning when using the cross-function DB (but which can't be used with a 0day build as it takes too much time to generate)). Nonetheless for clarity, remove the test but add a WARN_ON() in case the code ever changes. Link: https://lkml.kernel.org/r/96925e5c1f116654ada7ea0613d930b1266b5e1c.1643319703.git.zanussi@kernel.org Fixes: f404da6e1d46c ("tracing: Add 'last error' error facility for hist triggers") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27tracing/histogram: Fix a potential memory leak for kstrdup()Xiaoke Wang
kfree() is missing on an error path to free the memory allocated by kstrdup(): p = param = kstrdup(data->params[i], GFP_KERNEL); So it is better to free it via kfree(p). Link: https://lkml.kernel.org/r/tencent_C52895FD37802832A3E5B272D05008866F0A@qq.com Cc: stable@vger.kernel.org Fixes: d380dcde9a07c ("tracing: Fix now invalid var_ref_vals assumption in trace action") Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-10tracing: Remove ops param from event_command reg()/unreg() callbacksTom Zanussi
The event_trigger_ops for an event_command are already accessible via event_trigger_data.ops so remove the redundant ops from the callback. Link: https://lkml.kernel.org/r/4c6f2a41820452f9cacddc7634ad442928aa2aa6.1641823001.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-10tracing: Change event_trigger_ops func() to trigger()Tom Zanussi
The name of the func() callback on event_trigger_ops is too generic and is easily confused with other callbacks with that name, so change it to something that reflects its actual purpose. In this case, the main purpose of the callback is to implement an event trigger, so call it trigger() instead. Also add some more documentation to event_trigger_ops describing the callbacks a bit better. Link: https://lkml.kernel.org/r/36ab812e3ee74ee03ae0043fda41a858ee728c00.1641823001.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2022-01-10tracing: Change event_command func() to parse()Tom Zanussi
The name of the func() callback on event_command is too generic and is easily confused with other callbacks with that name, so change it to something that reflects its actual purpose. In this case, the main purpose of the callback is to parse an event command, so call it parse() instead. Link: https://lkml.kernel.org/r/7784e321840752ed88aac0b349c0c685fc9247b1.1641823001.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2021-12-06tracing: Support __rel_loc relative dynamic data location attributeMasami Hiramatsu
Add '__rel_loc' new dynamic data location attribute which encodes the data location from the next to the field itself. The '__data_loc' is used for encoding the dynamic data location on the trace event record. But '__data_loc' is not useful if the writer doesn't know the event header (e.g. user event), because it records the dynamic data offset from the entry of the record, not the field itself. This new '__rel_loc' attribute encodes the data location relatively from the next of the field. For example, when there is a record like below (the number in the parentheses is the size of fields) |header(N)|common(M)|fields(K)|__data_loc(4)|fields(L)|data(G)| In this case, '__data_loc' field will be __data_loc = (G << 16) | (N+M+K+4+L) If '__rel_loc' is used, this will be |header(N)|common(M)|fields(K)|__rel_loc(4)|fields(L)|data(G)| where __rel_loc = (G << 16) | (L) This case shows L bytes after the '__rel_loc' attribute field, if there is no fields after the __rel_loc field, L must be 0. This is relatively easy (and no need to consider the kernel header change) when the event data fields are composed by user who doesn't know header and common fields. Link: https://lkml.kernel.org/r/163757341258.510314.4214431827833229956.stgit@devnote2 Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-01tracing/histograms: String compares should not care about signed valuesSteven Rostedt (VMware)
When comparing two strings for the "onmatch" histogram trigger, fields that are strings use string comparisons, which do not care about being signed or not. Do not fail to match two string fields if one is unsigned char array and the other is a signed char array. Link: https://lore.kernel.org/all/20211129123043.5cfd687a@gandalf.local.home/ Cc: stable@vgerk.kernel.org Cc: Tom Zanussi <zanussi@kernel.org> Cc: Yafang Shao <laoar.shao@gmail.com> Fixes: b05e89ae7cf3b ("tracing: Accept different type for synthetic event fields") Reviewed-by: Masami Hiramatsu <mhiramatsu@kernel.org> Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-18tracing/histogram: Fix UAF in destroy_hist_field()Kalesh Singh
Calling destroy_hist_field() on an expression will recursively free any operands associated with the expression. If during expression parsing the operands of the expression are already set when an error is encountered, there is no need to explicity free the operands. Doing so will result in destroy_hist_field() being called twice for the operands and lead to a use-after-free (UAF) error. If the operands are associated with the expression, only call destroy_hist_field() on the expression since the operands will be recursively freed. Link: https://lore.kernel.org/all/CAHk-=wgcrEbFgkw9720H3tW-AhHOoEKhYwZinYJw4FpzSaJ6_Q@mail.gmail.com/ Link: https://lkml.kernel.org/r/20211118011542.1420131-1-kaleshsingh@google.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Fixes: 8b5d46fd7a38 ("tracing/histogram: Optimize division by constants") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-14Merge tag 'trace-v5.16-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Update to tracing histogram variable string copy A fix to only copy the size of the field to the histogram string did not take into account that the size can be larger than the storage" * tag 'trace-v5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Add length protection to histogram string copies
2021-11-14tracing: Add length protection to histogram string copiesSteven Rostedt (VMware)
The string copies to the histogram storage has a max size of 256 bytes (defined by MAX_FILTER_STR_VAL). Only the string size of the event field needs to be copied to the event storage, but no more than what is in the event storage. Although nothing should be bigger than 256 bytes, there's no protection against overwriting of the storage if one day there is. Copy no more than the destination size, and enforce it. Also had to turn MAX_FILTER_STR_VAL into an unsigned int, to keep the min() comparison of the string sizes of comparable types. Link: https://lore.kernel.org/all/CAHk-=wjREUihCGrtRBwfX47y_KrLCGjiq3t6QtoNJpmVrAEb1w@mail.gmail.com/ Link: https://lkml.kernel.org/r/20211114132834.183429a4@rorschach.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 63f84ae6b82b ("tracing/histogram: Do not copy the fixed-size char array field over the field size") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-13Merge tag 'trace-v5.16-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Three tracing fixes: - Make local osnoise_instances static - Copy just actual size of histogram strings - Properly check missing operands in histogram expressions" * tag 'trace-v5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/histogram: Fix check for missing operands in an expression tracing/histogram: Do not copy the fixed-size char array field over the field size tracing/osnoise: Make osnoise_instances static
2021-11-12tracing/histogram: Fix check for missing operands in an expressionKalesh Singh
If a binary operation is detected while parsing an expression string, the operand strings are deduced by splitting the experssion string at the position of the detected binary operator. Both operand strings are sub-strings (can be empty string) of the expression string but will never be NULL. Currently a NULL check is used for missing operands, fix this by checking for empty strings instead. Link: https://lkml.kernel.org/r/20211112191324.1302505-1-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Fixes: 9710b2f341a0 ("tracing: Fix operator precedence for hist triggers expression") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-12tracing/histogram: Do not copy the fixed-size char array field over the ↵Masami Hiramatsu
field size Do not copy the fixed-size char array field of the events over the field size. The histogram treats char array as a string and there are 2 types of char array in the event, fixed-size and dynamic string. The dynamic string (__data_loc) field must be null terminated, but the fixed-size char array field may not be null terminated (not a string, but just a data). In that case, histogram can copy the data after the field. This uses the original field size for fixed-size char array field to restrict the histogram not to access over the original field size. Link: https://lkml.kernel.org/r/163673292822.195747.3696966210526410250.stgit@devnote2 Fixes: 02205a6752f2 (tracing: Add support for 'field variables') Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-11-03Merge tag 'trace-v5.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: - osnoise and timerlat updates that will work with the RTLA tool (Real-Time Linux Analysis). Specifically it disconnects the work load (threads that look for latency) from the tracing instances attached to them, allowing for more than one instance to retrieve data from the work load. - Optimization on division in the trace histogram trigger code to use shift and multiply when possible. Also added documentation. - Fix prototype to my_direct_func in direct ftrace trampoline sample code. * tag 'trace-v5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/samples: Add missing prototype for my_direct_func tracing/selftests: Add tests for hist trigger expression parsing tracing/histogram: Document hist trigger variables tracing/histogram: Update division by 0 documentation tracing/histogram: Optimize division by constants tracing/osnoise: Remove PREEMPT_RT ifdefs from inside functions tracing/osnoise: Remove STACKTRACE ifdefs from inside functions tracing/osnoise: Allow multiple instances of the same tracer tracing/osnoise: Remove TIMERLAT ifdefs from inside functions tracing/osnoise: Support a list of trace_array *tr tracing/osnoise: Use start/stop_per_cpu_kthreads() on osnoise_cpus_write() tracing/osnoise: Split workload start from the tracer start tracing/osnoise: Improve comments about barrier need for NMI callbacks tracing/osnoise: Do not follow tracing_cpumask
2021-11-01Merge tag 'trace-v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - kprobes: Restructured stack unwinder to show properly on x86 when a stack dump happens from a kretprobe callback. - Fix to bootconfig parsing - Have tracefs allow owner and group permissions by default (only denying others). There's been pressure to allow non root to tracefs in a controlled fashion, and using groups is probably the safest. - Bootconfig memory managament updates. - Bootconfig clean up to have the tools directory be less dependent on changes in the kernel tree. - Allow perf to be traced by function tracer. - Rewrite of function graph tracer to be a callback from the function tracer instead of having its own trampoline (this change will happen on an arch by arch basis, and currently only x86_64 implements it). - Allow multiple direct trampolines (bpf hooks to functions) be batched together in one synchronization. - Allow histogram triggers to add variables that can perform calculations against the event's fields. - Use the linker to determine architecture callbacks from the ftrace trampoline to allow for proper parameter prototypes and prevent warnings from the compiler. - Extend histogram triggers to key off of variables. - Have trace recursion use bit magic to determine preempt context over if branches. - Have trace recursion disable preemption as all use cases do anyway. - Added testing for verification of tracing utilities. - Various small clean ups and fixes. * tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits) tracing/histogram: Fix semicolon.cocci warnings tracing/histogram: Fix documentation inline emphasis warning tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together tracing: Show size of requested perf buffer bootconfig: Initialize ret in xbc_parse_tree() ftrace: do CPU checking after preemption disabled ftrace: disable preemption when recursion locked tracing/histogram: Document expression arithmetic and constants tracing/histogram: Optimize division by a power of 2 tracing/histogram: Covert expr to const if both operands are constants tracing/histogram: Simplify handling of .sym-offset in expressions tracing: Fix operator precedence for hist triggers expression tracing: Add division and multiplication support for hist triggers tracing: Add support for creating hist trigger variables from literal selftests/ftrace: Stop tracing while reading the trace file by default MAINTAINERS: Update KPROBES and TRACING entries test_kprobes: Move it from kernel/ to lib/ docs, kprobes: Remove invalid URL and add new reference samples/kretprobes: Fix return value if register_kretprobe() failed lib/bootconfig: Fix the xbc_get_info kerneldoc ...
2021-11-01tracing/histogram: Optimize division by constantsKalesh Singh
If the divisor is a constant use specific division functions to avoid extra branches when the trigger is hit. If the divisor constant but not a power of 2, the division can be replaced with a multiplication and shift in the following case: Let X = dividend and Y = divisor. Choose Z = some power of 2. If Y <= Z, then: X / Y = (X * (Z / Y)) / Z (Z / Y) is a constant (mult) which is calculated at parse time, so: X / Y = (X * mult) / Z The division by Z can be replaced by a shift since Z is a power of 2: X / Y = (X * mult) >> shift As long, as X < Z the results will not be off by more than 1. Link: https://lkml.kernel.org/r/20211029232410.3494196-1-kaleshsingh@google.com Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29tracing/histogram: Fix semicolon.cocci warningskernel test robot
kernel/trace/trace_events_hist.c:6039:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Link: https://lkml.kernel.org/r/20211030005615.GA41257@3074f0d39c61 Fixes: c5eac6ee8bc5 ("tracing/histogram: Simplify handling of .sym-offset in expressions") CC: Kalesh Singh <kaleshsingh@google.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Optimize division by a power of 2Kalesh Singh
The division is a slow operation. If the divisor is a power of 2, use a shift instead. Results were obtained using Android's version of perf (simpleperf[1]) as described below: 1. hist_field_div() is modified to call 2 test functions: test_hist_field_div_[not]_optimized(); passing them the same args. Use noinline and volatile to ensure these are not optimized out by the compiler. 2. Create a hist event trigger that uses division: events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>' >> trigger events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x' >> trigger 3. Run Android's lmkd_test[2] to generate rss_stat events, and record CPU samples with Android's simpleperf: simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g -f 2000 -o perf.data == Results == Divisor is a power of 2 (divisor == 32): test_hist_field_div_not_optimized | 8,717,091 cpu-cycles test_hist_field_div_optimized | 1,643,137 cpu-cycles If the divisor is a power of 2, the optimized version is ~5.3x faster. Divisor is not a power of 2 (divisor == 33): test_hist_field_div_not_optimized | 4,444,324 cpu-cycles test_hist_field_div_optimized | 5,497,958 cpu-cycles If the divisor is not a power of 2, as expected, the optimized version is slightly slower (~24% slower). [1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md [2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Covert expr to const if both operands are constantsKalesh Singh
If both operands of a hist trigger expression are constants, convert the expression to a constant. This optimization avoids having to perform the same calculation multiple times and also saves on memory since the merged constants are represented by a single struct hist_field instead or multiple. Link: https://lkml.kernel.org/r/20211025200852.3002369-6-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Simplify handling of .sym-offset in expressionsKalesh Singh
The '-' in .sym-offset can confuse the hist trigger arithmetic expression parsing. Simplify the handling of this by replacing the 'sym-offset' with 'symXoffset'. This allows us to correctly evaluate expressions where the user may have inadvertently added a .sym-offset modifier to one of the operands in an expression, instead of bailing out. In this case the .sym-offset has no effect on the evaluation of the expression. The only valid use of the .sym-offset is as a hist key modifier. Link: https://lkml.kernel.org/r/20211025200852.3002369-5-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Fix operator precedence for hist triggers expressionKalesh Singh
The current histogram expression evaluation logic evaluates the expression from right to left. This can lead to incorrect results if the operations are not associative (as is the case for subtraction and, the now added, division operators). e.g. 16-8-4-2 should be 2 not 10 --> 16-8-4-2 = ((16-8)-4)-2 64/8/4/2 should be 1 not 16 --> 64/8/4/2 = ((64/8)/4)/2 Division and multiplication are currently limited to single operation expression due to operator precedence support not yet implemented. Rework the expression parsing to support the correct evaluation of expressions containing operators of different precedences; and fix the associativity error by evaluating expressions with operators of the same precedence from left to right. Examples: (1) echo 'hist:keys=common_pid:a=8,b=4,c=2,d=1,w=$a-$b-$c-$d' \ >> event/trigger (2) echo 'hist:keys=common_pid:x=$a/$b/3/2' >> event/trigger (3) echo 'hist:keys=common_pid:y=$a+10/$c*1024' >> event/trigger (4) echo 'hist:keys=common_pid:z=$a/$b+$c*$d' >> event/trigger Link: https://lkml.kernel.org/r/20211025200852.3002369-4-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Add division and multiplication support for hist triggersKalesh Singh
Adds basic support for division and multiplication operations for hist trigger variable expressions. For simplicity this patch only supports, division and multiplication for a single operation expression (e.g. x=$a/$b), as currently expressions are always evaluated right to left. This can lead to some incorrect results: e.g. echo 'hist:keys=common_pid:x=8-4-2' >> event/trigger 8-4-2 should evaluate to 2 i.e. (8-4)-2 but currently x evaluate to 6 i.e. 8-(4-2) Multiplication and division in sub-expressions will work correctly, once correct operator precedence support is added (See next patch in this series). For the undefined case of division by 0, the histogram expression evaluates to (u64)(-1). Since this cannot be detected when the expression is created, it is the responsibility of the user to be aware and account for this possibility. Examples: echo 'hist:keys=common_pid:a=8,b=4,x=$a/$b' \ >> event/trigger echo 'hist:keys=common_pid:y=5*$b' \ >> event/trigger Link: https://lkml.kernel.org/r/20211025200852.3002369-3-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>