summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2018-03-23Merge tag 'trace-v4.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull kprobe fixes from Steven Rostedt: "The documentation for kprobe events says that symbol offets can take both a + and - sign to get to befor and after the symbol address. But in actuality, the code does not support the minus. This fixes that issue, and adds a few more selftests to kprobe events" * tag 'trace-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests: ftrace: Add a testcase for probepoint selftests: ftrace: Add a testcase for string type with kprobe_event selftests: ftrace: Add probe event argument syntax testcase tracing: probeevent: Fix to support minus offset from symbol
2018-03-23tracing: probeevent: Fix to support minus offset from symbolMasami Hiramatsu
In Documentation/trace/kprobetrace.txt, it says @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) However, the parser doesn't parse minus offset correctly, since commit 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") drops minus ("-") offset support for kprobe probe address usage. This fixes the traceprobe_split_symbol_offset() to parse minus offset again with checking the offset range, and add a minus offset check in kprobe probe address usage. Link: http://lkml.kernel.org/r/152129028983.31874.13419301530285775521.stgit@devbox Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned") Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tracing: Fix a potential NULL dereferenceDan Carpenter
We forgot to set the error code on this path so we return ERR_PTR(0) which is NULL. It results in a NULL dereference in the caller. Link: http://lkml.kernel.org/r/20180323113735.GC28518@mwanda Fixes: 100719dcef44 ("tracing: Add simple expression support to hist triggers") Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-20trace/bpf: remove helper bpf_perf_prog_read_value from tracepoint type programsYonghong Song
Commit 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value") added helper bpf_perf_prog_read_value so that perf_event type program can read event counter and enabled/running time. This commit, however, introduced a bug which allows this helper for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value needs to access perf_event through its bpf_perf_event_data_kern type context, which is not available for tracepoint type program. This patch fixed the issue by separating bpf_func_proto between tracepoint and perf_event type programs and removed bpf_perf_prog_read_value from tracepoint func prototype. Fixes: 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-14tracing: Rewrite filter logic to be simpler and fasterSteven Rostedt (VMware)
Al Viro reviewed the filter logic of ftrace trace events and found it to be very troubling. It creates a binary tree based on the logic operators and walks it during tracing. He sent myself and Tom Zanussi a long explanation (and formal proof) of how to do the string parsing better and end up with a program array that can be simply iterated to come up with the correct results. I took his ideas and his pseudo code and rewrote the filter logic based on them. In doing so, I was able to remove a lot of code, and have a much more condensed filter logic in the process. I wrote a very long comment describing the methadology that Al proposed in my own words. For more info on how this works, read the comment above predicate_parse(). Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-14tracing: Clean up and document pred_funcs_##type creation and useSteven Rostedt (VMware)
The pred_funcs_##type arrays consist of five functions that are assigned based on the ops. The array must be in the same order of the ops each function represents. The PRED_FUNC_START macro denotes the op enum that starts the op that maps to the pred_funcs_##type arrays. This is all very subtle and prone to bugs if the code is changed. Add comments describing how PRED_FUNC_START and pred_funcs_##type array is used, and also a PRED_FUNC_MAX that is the maximum number of functions in the arrays. Clean up select_comparison_fn() that assigns the predicates to the pred_funcs_##type array function as well as add protection in case an op is passed in that does not map correctly to the array. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-14tracing: Combine enum and arrays into single macro in filter codeSteven Rostedt (VMware)
Instead of having a separate enum that is the index into another array, like a string array, make a single macro that combines them into a single list, and then the two can not get out of sync. This makes it easier to add and remove items. The macro trick is: #define DOGS \ C( JACK, "Jack Russell") \ C( ITALIAN, "Italian Greyhound") \ C( GERMAN, "German Shepherd") #undef C #define C(a, b) a enum { DOGS }; #undef C #define C(a, b) b static char dogs[] = { DOGS }; Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-13tracing: Unify the "boot" and "mono" tracing clocksThomas Gleixner
Unify the "boot" and "mono" tracing clocks and document the new behaviour. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Easton <kevin@guarana.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20180301165150.489635255@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-10tracing: Embed replace_filter_string() helper functionSteven Rostedt (VMware)
The replace_filter_string() frees the current string and then copies a given string. But in the two locations that it was used, the allocation happened right after the filter was allocated (nothing to replace). There's no need for this to be a helper function. Embedding the allocation in the two places where it was called will make changing the code in the future easier. Also make the variable consistent (always use "filter_string" as the name, as it was used in one instance as "filter_str") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Only add filter list when neededSteven Rostedt (VMware)
replace_system_preds() creates a filter list to free even when it doesn't really need to have it. Only save filters that require synchronize_sched() in the filter list to free. This will allow the code to be updated a bit easier in the future. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Remove filter allocator helperSteven Rostedt (VMware)
The __alloc_filter() function does nothing more that allocate the filter. There's no reason to have it as a helper function. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Use trace_seq instead of open code string appendingSteven Rostedt (VMware)
The filter code does open code string appending to produce an error message. Instead it can be simplified by using trace_seq function helpers. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Remove BUG_ON() from append_filter_string()Steven Rostedt (VMware)
There's no reason to BUG if there's a bug in the filtering code. Simply do a warning and return. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add inter-event blurb to HIST_TRIGGERS config optionTom Zanussi
So that users know that inter-event tracing is supported as part of the HIST_TRIGGERS option, include text to that effect in the help text. Link: http://lkml.kernel.org/r/a38e24231d8d980be636b56d35814570acfd167a.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Use the ring-buffer nesting to allow synthetic events to be tracedSteven Rostedt (VMware)
Synthetic events can be done within the recording of other events. Notify the ring buffer via ring_buffer_nest_start() and ring_buffer_nest_end() that this is intended and not to block it due to its recursion protection. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10ring-buffer: Add nesting for adding events within eventsSteven Rostedt (VMware)
The ring-buffer code has recusion protection in case tracing ends up tracing itself, the ring-buffer will detect that it was called at the same context (normal, softirq, interrupt or NMI), and not continue to record the event. With the histogram synthetic events, they are called while tracing another event at the same context. The recusion protection triggers because it detects tracing at the same context and stops it. Add ring_buffer_nest_start() and ring_buffer_nest_end() that will notify the ring buffer that a trace is about to happen within another trace and that it is intended, and not to trigger the recursion blocking. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add a clock attribute for hist triggersTom Zanussi
The default clock if timestamps are used in a histogram is "global". If timestamps aren't used, the clock is irrelevant. Use the "clock=" param only if you want to override the default "global" clock for a histogram with timestamps. Link: http://lkml.kernel.org/r/427bed1389c5d22aa40c3e0683e30cc3d151e260.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Make tracing_set_clock() non-staticTom Zanussi
Allow tracing code outside of trace.c to access tracing_set_clock(). Some applications may require a particular clock in order to function properly, such as latency calculations. Also, add an accessor returning the current clock string. Link: http://lkml.kernel.org/r/6d1c53e9ee2163f54e1849f5376573f54f0e6009.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add 'last error' error facility for hist triggersTom Zanussi
With the addition of variables and actions, it's become necessary to provide more detailed error information to users about syntax errors. Add a 'last error' facility accessible via the erroring event's 'hist' file. Reading the hist file after an error will display more detailed information about what went wrong, if information is available. This extended error information will be available until the next hist trigger command for that event. # echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger echo: write error: Invalid argument # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist ERROR: Couldn't yyy: zzz Last command: xxx Also add specific error messages for variable and action errors. Link: http://lkml.kernel.org/r/64e9c422fc8aeafcc2f7a3b4328c0cffe7969129.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add hist trigger support for variable reference aliasesTom Zanussi
Add support for alias=$somevar where alias can be used as onmatch.xxx($alias). Aliases are a way of creating a new name for an existing variable, for flexibly in making naming more clear in certain cases. For example in the below the user perhaps feels that using $new_lat in the synthetic event invocation is opaque or doesn't fit well stylistically with previous triggers, so creates an alias of $new_lat named $latency and uses that in the call instead: # echo 'hist:keys=next_pid:new_lat=common_timestamp.usecs' > /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=pid:latency=$new_lat: onmatch(sched.sched_switch).wake2($latency,pid)' > /sys/kernel/debug/tracing/events/synthetic/wake1/trigger Link: http://lkml.kernel.org/r/ef20a65d921af3a873a6f1e8c71407c926d5586f.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add cpu field for hist triggersTom Zanussi
A common key to use in a histogram is the cpuid - add a new cpu 'synthetic' field named 'cpu' for that purpose. Link: http://lkml.kernel.org/r/89537645bfc957e0d76e2cacf5f0ada88691a6cc.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Allow whitespace to surround hist trigger filterTom Zanussi
The existing code only allows for one space before and after the 'if' specifying the filter for a hist trigger. Add code to make that more permissive as far as whitespace goes. Specifically, we want to allow spaces in the trigger itself now that we have additional syntax (onmatch/onmax) where spaces are more natural e.g. spaces after commas in param lists. Link: http://lkml.kernel.org/r/1053090c3c308d4f431accdeb59dff4b511d4554.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add 'onmax' hist trigger action supportTom Zanussi
Add an 'onmax(var).save(field,...)' hist trigger action which is invoked whenever an event exceeds the current maximum. The end result is that the trace event fields or variables specified as the onmax.save() params will be saved if 'var' exceeds the current maximum for that hist trigger entry. This allows context from the event that exhibited the new maximum to be saved for later reference. When the histogram is displayed, additional fields displaying the saved values will be printed. As an example the below defines a couple of hist triggers, one for sched_wakeup and another for sched_switch, keyed on pid. Whenever a sched_wakeup occurs, the timestamp is saved in the entry corresponding to the current pid, and when the scheduler switches back to that pid, the timestamp difference is calculated. If the resulting latency exceeds the current maximum latency, the specified save() values are saved: # echo 'hist:keys=pid:ts0=common_timestamp.usecs \ if comm=="cyclictest"' >> \ /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger # echo 'hist:keys=next_pid:\ wakeup_lat=common_timestamp.usecs-$ts0:\ onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \ if next_comm=="cyclictest"' >> \ /sys/kernel/debug/tracing/events/sched/sched_switch/trigger When the histogram is displayed, the max value and the saved values corresponding to the max are displayed following the rest of the fields: # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist { next_pid: 3728 } hitcount: 199 \ max: 123 next_comm: cyclictest prev_pid: 0 \ prev_prio: 120 prev_comm: swapper/3 { next_pid: 3730 } hitcount: 1321 \ max: 15 next_comm: cyclictest prev_pid: 0 \ prev_prio: 120 prev_comm: swapper/1 { next_pid: 3729 } hitcount: 1973\ max: 25 next_comm: cyclictest prev_pid: 0 \ prev_prio: 120 prev_comm: swapper/0 Totals: Hits: 3493 Entries: 3 Dropped: 0 Link: http://lkml.kernel.org/r/006907f71b1e839bb059337ec3c496f84fcb71de.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add 'onmatch' hist trigger action supportTom Zanussi
Add an 'onmatch(matching.event).<synthetic_event_name>(param list)' hist trigger action which is invoked with the set of variables or event fields named in the 'param list'. The result is the generation of a synthetic event that consists of the values contained in those variables and/or fields at the time the invoking event was hit. As an example the below defines a simple synthetic event using a variable defined on the sched_wakeup_new event, and shows the event definition with unresolved fields, since the sched_wakeup_new event with the testpid variable hasn't been defined yet: # echo 'wakeup_new_test pid_t pid; int prio' >> \ /sys/kernel/debug/tracing/synthetic_events # cat /sys/kernel/debug/tracing/synthetic_events wakeup_new_test pid_t pid; int prio The following hist trigger both defines a testpid variable and specifies an onmatch() trace action that uses that variable along with a non-variable field to generate a wakeup_new_test synthetic event whenever a sched_wakeup_new event occurs, which because of the 'if comm == "cyclictest"' filter only happens when the executable is cyclictest: # echo 'hist:testpid=pid:keys=$testpid:\ onmatch(sched.sched_wakeup_new).wakeup_new_test($testpid, prio) \ if comm=="cyclictest"' >> \ /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger Creating and displaying a histogram based on those events is now just a matter of using the fields and new synthetic event in the tracing/events/synthetic directory, as usual: # echo 'hist:keys=pid,prio:sort=pid,prio' >> \ /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger Link: http://lkml.kernel.org/r/8c2a574bcb7530c876629c901ecd23911b14afe8.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add support for 'field variables'Tom Zanussi
Users should be able to directly specify event fields in hist trigger 'actions' rather than being forced to explicitly create a variable for that purpose. Add support allowing fields to be used directly in actions, which essentially does just that - creates 'invisible' variables for each bare field specified in an action. If a bare field refers to a field on another (matching) event, it even creates a special histogram for the purpose (since variables can't be defined on an existing histogram after histogram creation). Here's a simple example that demonstrates both. Basically the onmatch() action creates a list of variables corresponding to the parameters of the synthetic event to be generated, and then uses those values to generate the event. So for the wakeup_latency synthetic event 'call' below the first param, $wakeup_lat, is a variable defined explicitly on sched_switch, where 'next_pid' is just a normal field on sched_switch, and prio is a normal field on sched_waking. Since the mechanism works on variables, those two normal fields just have 'invisible' variables created internally for them. In the case of 'prio', which is on another event, we actually need to create an additional hist trigger and define the invisible variable on that, since once a hist trigger is defined, variables can't be added to it later. echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> /sys/kernel/debug/tracing/synthetic_events echo 'hist:keys=pid:ts0=common_timestamp.usecs >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio) >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger Link: http://lkml.kernel.org/r/8e8dcdac1ea180ed7a3689e1caeeccede9dc42b3.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add support for 'synthetic' eventsTom Zanussi
Synthetic events are user-defined events generated from hist trigger variables saved from one or more other events. To define a synthetic event, the user writes a simple specification consisting of the name of the new event along with one or more variables and their type(s), to the tracing/synthetic_events file. For instance, the following creates a new event named 'wakeup_latency' with 3 fields: lat, pid, and prio: # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \ /sys/kernel/debug/tracing/synthetic_events Reading the tracing/synthetic_events file lists all the currently-defined synthetic events, in this case the event we defined above: # cat /sys/kernel/debug/tracing/synthetic_events wakeup_latency u64 lat; pid_t pid; int prio At this point, the synthetic event is ready to use, and a histogram can be defined using it: # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \ /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger The new event is created under the tracing/events/synthetic/ directory and looks and behaves just like any other event: # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency enable filter format hist id trigger Although a histogram can be defined for it, nothing will happen until an action tracing that event via the trace_synth() function occurs. The trace_synth() function is very similar to all the other trace_* invocations spread throughout the kernel, except in this case the trace_ function and its corresponding tracepoint isn't statically generated but defined by the user at run-time. How this can be automatically hooked up via a hist trigger 'action' is discussed in a subsequent patch. Link: http://lkml.kernel.org/r/c68df2284b7d172669daf9be29db62ad49bbc559.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> [fix noderef.cocci warnings, sizeof pointer for kcalloc of event->fields] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add hist trigger action hookTom Zanussi
Add a hook for executing extra actions whenever a histogram entry is added or updated. The default 'action' when a hist entry is added to a histogram is to update the set of values associated with it. Some applications may want to perform additional actions at that point, such as generate another event, or compare and save a maximum. Add a simple framework for doing that; specific actions will be implemented on top of it in later patches. Link: http://lkml.kernel.org/r/9482ba6a3eaf5ca6e60954314beacd0e25c05b24.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add variable reference handling to hist triggersTom Zanussi
Add the necessary infrastructure to allow the variables defined on one event to be referenced in another. This allows variables set by a previous event to be referenced and used in expressions combining the variable values saved by that previous event and the event fields of the current event. For example, here's how a latency can be calculated and saved into yet another variable named 'wakeup_lat': # echo 'hist:keys=pid,prio:ts0=common_timestamp ... # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ... In the first event, the event's timetamp is saved into the variable ts0. In the next line, ts0 is subtracted from the second event's timestamp to produce the latency. Further users of variable references will be described in subsequent patches, such as for instance how the 'wakeup_lat' variable above can be displayed in a latency histogram. Link: http://lkml.kernel.org/r/b1d3e6975374e34d501ff417c20189c3f9b2c7b8.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add hist_field 'type' fieldTom Zanussi
Future support for synthetic events requires hist_field 'type' information, so add a field for that. Also, make other hist_field attribute usage consistent (size, is_signed, etc). Link: http://lkml.kernel.org/r/3fd12a2e86316b05151ba0d7c68268e780af2c9d.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Pass tracing_map_elt to hist_field accessor functionsTom Zanussi
Some accessor functions, such as for variable references, require access to a corrsponding tracing_map_elt. Add a tracing_map_elt param to the function signature and update the accessor functions accordingly. Link: http://lkml.kernel.org/r/e0f292b068e9e4948da1d5af21b5ae0efa9b5717.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Generalize per-element hist trigger dataTom Zanussi
Up until now, hist triggers only needed per-element support for saving 'comm' data, which was saved directly as a private data pointer. In anticipation of the need to save other data besides 'comm', add a new hist_elt_data struct for the purpose, and switch the current 'comm'-related code over to that. Link: http://lkml.kernel.org/r/4502c338c965ddf5fc19fb1ec4764391e001ed4b.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add simple expression support to hist triggersTom Zanussi
Add support for simple addition, subtraction, and unary expressions (-(expr) and expr, where expr = b-a, a+b, a+b+c) to hist triggers, in order to support a minimal set of useful inter-event calculations. These operations are needed for calculating latencies between events (timestamp1-timestamp0) and for combined latencies (latencies over 3 or more events). In the process, factor out some common code from key and value parsing. Link: http://lkml.kernel.org/r/9a9308ead4fe32a433d9c7e95921fb798394f6b2.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> [kbuild test robot fix, add static to parse_atom()] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> [ Replaced '//' comments with normal /* */ comments ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Move get_hist_field_flags()Tom Zanussi
Move get_hist_field_flags() to make it more easily accessible for new code (and keep the move separate from new functionality). Link: http://lkml.kernel.org/r/32470f0a7047ec7a6e84ba5ec89d6142cc6ede7d.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Account for variables in named trigger compatibilityTom Zanussi
Named triggers must also have the same set of variables in order to be considered compatible - update the trigger match test to account for that. The reason for this requirement is that named triggers with variables are meant to allow one or more events to set the same variable. Link: http://lkml.kernel.org/r/a17eae6328a99917f9d5c66129c9fcd355279ee9.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add variable support to hist triggersTom Zanussi
Add support for saving the value of a current event's event field by assigning it to a variable that can be read by a subsequent event. The basic syntax for saving a variable is to simply prefix a unique variable name not corresponding to any keyword along with an '=' sign to any event field. Both keys and values can be saved and retrieved in this way: # echo 'hist:keys=next_pid:vals=$ts0:ts0=common_timestamp ... # echo 'hist:timer_pid=common_pid:key=$timer_pid ...' If a variable isn't a key variable or prefixed with 'vals=', the associated event field will be saved in a variable but won't be summed as a value: # echo 'hist:keys=next_pid:ts1=common_timestamp:... Multiple variables can be assigned at the same time: # echo 'hist:keys=pid:vals=$ts0,$b,field2:ts0=common_timestamp,b=field1 ... Multiple (or single) variables can also be assigned at the same time using separate assignments: # echo 'hist:keys=pid:vals=$ts0:ts0=common_timestamp:b=field1:c=field2 ... Variables set as above can be used by being referenced from another event, as described in a subsequent patch. Link: http://lkml.kernel.org/r/fc93c4944d9719dbcb1d0067be627d44e98e2adc.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Baohong Liu <baohong.liu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add usecs modifier for hist trigger timestampsTom Zanussi
Appending .usecs onto a common_timestamp field will cause the timestamp value to be in microseconds instead of the default nanoseconds. A typical latency histogram using usecs would look like this: # echo 'hist:keys=pid,prio:ts0=common_timestamp.usecs ... # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 ... This also adds an external trace_clock_in_ns() to trace.c for the timestamp conversion. Link: http://lkml.kernel.org/r/4e813705a170b3e13e97dc3135047362fb1a39f3.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add hist_data member to hist_fieldTom Zanussi
Allow hist_data access via hist_field. Some users of hist_fields require or will require more access to the associated hist_data. Link: http://lkml.kernel.org/r/d04cd0768f5228ebb4ac0ba4a847bc4d14d4826f.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add per-element variable support to tracing_mapTom Zanussi
In order to allow information to be passed between trace events, add support for per-element variables to tracing_map. This provides a means for histograms to associate a value or values with an entry when it's saved or updated, and retrieved by a subsequent event occurrences. Variables can be set using tracing_map_set_var() and read using tracing_map_read_var(). tracing_map_var_set() returns true or false depending on whether or not the variable has been set or not, which is important for event-matching applications. tracing_map_read_var_once() reads the variable and resets it to the 'unset' state, implementing read-once variables, which are also important for event-matching uses. Link: http://lkml.kernel.org/r/7fa001108252556f0c6dd9d63145eabfe3370d1a.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add hist trigger timestamp supportTom Zanussi
Add support for a timestamp event field. This is actually a 'pseudo-' event field in that it behaves like it's part of the event record, but is really part of the corresponding ring buffer event. To make use of the timestamp field, users can specify "common_timestamp" as a field name for any histogram. Note that this doesn't make much sense on its own either as either a key or value, but needs to be supported even so, since follow-on patches will add support for making use of this field in time deltas. The common_timestamp 'field' is not a bona fide event field - so you won't find it in the event description - but rather it's a synthetic field that can be used like a real field. Note that the use of this field requires the ring buffer be put into 'absolute timestamp' mode, which saves the complete timestamp for each event rather than an offset. This mode will be enabled if and only if a histogram makes use of the "common_timestamp" field. Link: http://lkml.kernel.org/r/97afbd646ed146e26271f3458b4b33e16d7817c2.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Baohong Liu <baohong.liu@intel.com> [kasan use-after-free fix] Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Break out hist trigger assignment parsingTom Zanussi
This will make it easier to add variables, and makes the parsing code cleaner regardless. Link: http://lkml.kernel.org/r/e574b3291bbe15e35a4dfc87e5395aa715701c98.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add ring buffer event param to hist field functionsTom Zanussi
Some events such as timestamps require access to a ring_buffer_event struct; add a param so that hist field functions can access that. Link: http://lkml.kernel.org/r/2ff4af18e72b6002eb86b26b2a7f39cef7d1dfe4.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Give event triggers access to ring_buffer_eventTom Zanussi
The ring_buffer event can provide a timestamp that may be useful to various triggers - pass it into the handlers for that purpose. Link: http://lkml.kernel.org/r/6de592683b59fa70ffa5d43d0109896623fc1367.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add timestamp_mode trace fileTom Zanussi
Add a new option flag indicating whether or not the ring buffer is in 'absolute timestamp' mode. Currently this is only set/unset by hist triggers that make use of a common_timestamp. As such, there's no reason to make this writeable for users - its purpose is only to allow users to determine unequivocally whether or not the ring buffer is in that mode (although absolute timestamps can coexist with the normal delta timestamps, when the ring buffer is in absolute mode, timestamps written while absolute mode is in effect take up more space in the buffer, and are not as efficient). Link: http://lkml.kernel.org/r/e8aa7b1cde1cf15014e66545d06ac6ef2ebba456.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10ring-buffer: Redefine the unimplemented RINGBUF_TYPE_TIME_STAMPTom Zanussi
RINGBUF_TYPE_TIME_STAMP is defined but not used, and from what I can gather was reserved for something like an absolute timestamp feature for the ring buffer, if not a complete replacement of the current time_delta scheme. This code redefines RINGBUF_TYPE_TIME_STAMP to implement absolute time stamps. Another way to look at it is that it essentially forces extended time_deltas for all events. The motivation for doing this is to enable time_deltas that aren't dependent on previous events in the ring buffer, making it feasible to use the ring_buffer_event timetamps in a more random-access way, for purposes other than serial event printing. To set/reset this mode, use tracing_set_timestamp_abs() from the previous interface patch. Link: http://lkml.kernel.org/r/477b362dba1ce7fab9889a1a8e885a62c472f041.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10ring-buffer: Add interface for setting absolute time stampsTom Zanussi
Define a new function, tracing_set_time_stamp_abs(), which can be used to enable or disable the use of absolute timestamps rather than time deltas for a trace array. Only the interface is added here; a subsequent patch will add the underlying implementation. Link: http://lkml.kernel.org/r/ce96119de44c7fe0ee44786d15254e9b493040d3.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Baohong Liu <baohong.liu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Remove code which merges duplicatesVedang Patel
We now have the logic to detect and remove duplicates in the tracing_map hash table. The code which merges duplicates in the histogram is redundant now. So, modify this code just to detect duplicates. The duplication detection code is still kept to ensure that any rare race condition which might cause duplicates does not go unnoticed. Link: http://lkml.kernel.org/r/55215cf59e2674391bdaf772fdafc4c393352b03.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10tracing: Add support to detect and avoid duplicatesVedang Patel
A duplicate in the tracing_map hash table is when 2 different entries have the same key and, as a result, the key_hash. This is possible due to a race condition in the algorithm. This race condition is inherent to the algorithm and not a bug. This was fine because, until now, we were only interested in the sum of all the values related to a particular key (the duplicates are dealt with in tracing_map_sort_entries()). But, with the inclusion of variables[1], we are interested in individual values. So, it will not be clear what value to choose when there are duplicates. So, the duplicates need to be removed. The duplicates can occur in the code in the following scenarios: - A thread is in the process of adding a new element. It has successfully executed cmpxchg() and inserted the key. But, it is still not done acquiring the trace_map_elt struct, populating it and storing the pointer to the struct in the value field of tracing_map hash table. If another thread comes in at this time and wants to add an element with the same key, it will not see the current element and add a new one. - There are multiple threads trying to execute cmpxchg at the same time, one of the threads will succeed and the others will fail. The ones which fail will go ahead increment 'idx' and add a new element there creating a duplicate. This patch detects and avoids the first condition by asking the thread which detects the duplicate to loop one more time. There is also a possibility of infinite loop if the thread which is trying to insert goes to sleep indefinitely and the one which is trying to insert a new element detects a duplicate. Which is why, the thread loops for map_size iterations before returning NULL. The second scenario is avoided by preventing the threads which failed cmpxchg() from incrementing idx. This way, they will loop around and check if the thread which succeeded in executing cmpxchg() had the same key. [1] http://lkml.kernel.org/r/cover.1498510759.git.tom.zanussi@linux.intel.com Link: http://lkml.kernel.org/r/e178e89ec399240331d383bd5913d649713110f4.1516069914.git.tom.zanussi@linux.intel.com Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-08bpf: add support to read sample address in bpf programTeng Qin
This commit adds new field "addr" to bpf_perf_event_data which could be read and used by bpf programs attached to perf events. The value of the field is copied from bpf_perf_event_data_kern.addr and contains the address value recorded by specifying sample_type with PERF_SAMPLE_ADDR when calling perf_event_open. Signed-off-by: Teng Qin <qinteng@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-06Merge tag 'v4.16-rc4' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>