summaryrefslogtreecommitdiff
path: root/kernel/trace/trace_events.c
AgeCommit message (Collapse)Author
2019-12-02tracing: Introduce trace event injectionCong Wang
We have been trying to use rasdaemon to monitor hardware errors like correctable memory errors. rasdaemon uses trace events to monitor various hardware errors. In order to test it, we have to inject some hardware errors, unfortunately not all of them provide error injections. MCE does provide a way to inject MCE errors, but errors like PCI error and devlink error don't, it is not easy to add error injection to each of them. Instead, it is relatively easier to just allow users to inject trace events in a generic way so that all trace events can be injected. This patch introduces trace event injection, where a new 'inject' is added to each tracepoint directory. Users could write into this file with key=value pairs to specify the value of each fields of the trace event, all unspecified fields are set to zero values by default. For example, for the net/net_dev_queue tracepoint, we can inject: INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject echo "" > $INJECT echo "name='test'" > $INJECT echo "name='test' len=1024" > $INJECT cat /sys/kernel/debug/tracing/trace ... <...>-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0 <...>-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0 <...>-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024 Triggers could be triggered as usual too: echo "stacktrace if len == 1025" > /sys/kernel/debug/tracing/events/net/net_dev_queue/trigger echo "len=1025" > $INJECT cat /sys/kernel/debug/tracing/trace ... bash-614 [000] .... 36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0 bash-614 [001] .... 136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0 bash-614 [001] .N.. 208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024 bash-614 [001] .N.1 284.236349: <stack trace> => event_inject_write => vfs_write => ksys_write => do_syscall_64 => entry_SYSCALL_64_after_hwframe The only thing that can't be injected is string pointers as they require constant string pointers, this can't be done at run time. Link: http://lkml.kernel.org/r/20191130045218.18979-1-xiyou.wangcong@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-22tracing: Adding new functions for kernel access to Ftrace instancesDivya Indi
Adding 2 new functions - 1) struct trace_array *trace_array_get_by_name(const char *name); Return pointer to a trace array with given name. If it does not exist, create and return pointer to the new trace array. 2) int trace_array_set_clr_event(struct trace_array *tr, const char *system ,const char *event, bool enable); Enable/Disable events to this trace array. Additionally, - To handle reference counters, export trace_array_put() - Due to introduction of the above 2 new functions, we no longer need to export - ftrace_set_clr_event & trace_array_create APIs. Link: http://lkml.kernel.org/r/1574276919-11119-2-git-send-email-divya.indi@oracle.com Signed-off-by: Divya Indi <divya.indi@oracle.com> Reviewed-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-13tracing: Adding NULL checks for trace_array descriptor pointerDivya Indi
As part of commit f45d1225adb0 ("tracing: Kernel access to Ftrace instances") we exported certain functions. Here, we are adding some additional NULL checks to ensure safe usage by users of these APIs. Link: http://lkml.kernel.org/r/1565805327-579-4-git-send-email-divya.indi@oracle.com Signed-off-by: Divya Indi <divya.indi@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Add locked_down checks to the open calls of files created for tracefsSteven Rostedt (VMware)
Added various checks on open tracefs calls to see if tracefs is in lockdown mode, and if so, to return -EPERM. Note, the event format files (which are basically standard on all machines) as well as the enabled_functions file (which shows what is currently being traced) are not lockde down. Perhaps they should be, but it seems counter intuitive to lockdown information to help you know if the system has been modified. Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Add tracing_check_open_get_tr()Steven Rostedt (VMware)
Currently, most files in the tracefs directory test if tracing_disabled is set. If so, it should return -ENODEV. The tracing_disabled is called when tracing is found to be broken. Originally it was done in case the ring buffer was found to be corrupted, and we wanted to prevent reading it from crashing the kernel. But it's also called if a tracing selftest fails on boot. It's a one way switch. That is, once it is triggered, tracing is disabled until reboot. As most tracefs files can also be used by instances in the tracefs directory, they need to be carefully done. Each instance has a trace_array associated to it, and when the instance is removed, the trace_array is freed. But if an instance is opened with a reference to the trace_array, then it requires looking up the trace_array to get its ref counter (as there could be a race with it being deleted and the open itself). Once it is found, a reference is added to prevent the instance from being removed (and the trace_array associated with it freed). Combine the two checks (tracing_disabled and trace_array_get()) into a single helper function. This will also make it easier to add lockdown to tracefs later. Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Have trace events system open call tracing_open_generic_tr()Steven Rostedt (VMware)
Instead of having the trace events system open call open code the taking of the trace_array descriptor (with trace_array_get()) and then calling trace_open_generic(), have it use the tracing_open_generic_tr() that does the combination of the two. This requires making tracing_open_generic_tr() global. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-16Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann, Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers. As perf and the scheduler is getting bigger and more complex, document the status quo of current responsibilities and interests, and spread the review pain^H^H^H^H fun via an increase in the Cc: linecount generated by scripts/get_maintainer.pl. :-) - Add another series of patches that brings the -rt (PREEMPT_RT) tree closer to mainline: split the monolithic CONFIG_PREEMPT dependencies into a new CONFIG_PREEMPTION category that will allow the eventual introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches to go though. - Extend the CPU cgroup controller with uclamp.min and uclamp.max to allow the finer shaping of CPU bandwidth usage. - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS). - Improve the behavior of high CPU count, high thread count applications running under cpu.cfs_quota_us constraints. - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present. - Improve CPU isolation housekeeping CPU allocation NUMA locality. - Fix deadline scheduler bandwidth calculations and logic when cpusets rebuilds the topology, or when it gets deadline-throttled while it's being offlined. - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from setscheduler() system calls without creating global serialization. Add new synchronization between cpuset topology-changing events and the deadline acceptance tests in setscheduler(), which were broken before. - Rework the active_mm state machine to be less confusing and more optimal. - Rework (simplify) the pick_next_task() slowpath. - Improve load-balancing on AMD EPYC systems. - ... and misc cleanups, smaller fixes and improvements - please see the Git log for more details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) sched/psi: Correct overly pessimistic size calculation sched/fair: Speed-up energy-aware wake-ups sched/uclamp: Always use 'enum uclamp_id' for clamp_id values sched/uclamp: Update CPU's refcount on TG's clamp changes sched/uclamp: Use TG's clamps to restrict TASK's clamps sched/uclamp: Propagate system defaults to the root group sched/uclamp: Propagate parent clamps sched/uclamp: Extend CPU's cgroup controller sched/topology: Improve load balancing on AMD EPYC systems arch, ia64: Make NUMA select SMP sched, perf: MAINTAINERS update, add submaintainers and reviewers sched/fair: Use rq_lock/unlock in online_fair_sched_group cpufreq: schedutil: fix equation in comment sched: Rework pick_next_task() slow-path sched: Allow put_prev_task() to drop rq->lock sched/fair: Expose newidle_balance() sched: Add task_struct pointer to sched_class::set_curr_task sched: Rework CPU hotplug task selection sched/{rt,deadline}: Fix set_next_task vs pick_next_task sched: Fix kerneldoc comment for ia64_set_curr_task ...
2019-08-31tracing: Make exported ftrace_set_clr_event non-staticDenis Efremov
The function ftrace_set_clr_event is declared static and marked EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the function was decided to be a part of API, this commit removes the static attribute and adds the declaration to the header. Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances") Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-31tracing: Use CONFIG_PREEMPTIONThomas Gleixner
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the conditionals in the tracer over to CONFIG_PREEMPTION. This is the first step to make the tracer work on RT. The other small tweaks are submitted separately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.409766323@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-16tracing: Make trace_get_fields() globalCong Wang
trace_get_fields() is the only way to read tracepoint fields at run time, as their fields are defined at compile-time with macros. Make this function visible to all users and it will be used by trace event injection code to calculate the size of a tracepoint entry. Link: http://lkml.kernel.org/r/20190525165802.25944-4-xiyou.wangcong@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-25tracing: Make a separate config for trace event self testsSteven Rostedt (VMware)
The trace event self tests enable loop through *all* events, enables each one, one at a time, runs some code to trigger various events (not necessarily the same events), and checks if anything went wrong. The issue is that trace events are usually the least likely start up test to cause a problem, but they take the longest to run (because there are so many events). When one of the other tests trigger a bug, the trace event start up tests causes the bisect to take much longer, because it takes 10s of seconds to get through the trace event tests. By making them a separate config (even though they are enabled by default if start up tests are set), it is possible to turn them off and still run the other tracing start up tests much quicker. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-08tracing: Fix partial reading of trace event's id fileElazar Leibovich
When reading only part of the id file, the ppos isn't tracked correctly. This is taken care by simple_read_from_buffer. Reading a single byte, and then the next byte would result EOF. While this seems like not a big deal, this breaks abstractions that reads information from files unbuffered. See for example https://github.com/golang/go/issues/29399 This code was mentioned as problematic in commit cd458ba9d5a5 ("tracing: Do not (ab)use trace_seq in event_id_read()") An example C code that show this bug is: #include <stdio.h> #include <stdint.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> int main(int argc, char **argv) { if (argc < 2) return 1; int fd = open(argv[1], O_RDONLY); char c; read(fd, &c, 1); printf("First %c\n", c); read(fd, &c, 1); printf("Second %c\n", c); } Then run with, e.g. sudo ./a.out /sys/kernel/debug/tracing/events/tcp/tcp_set_state/id You'll notice you're getting the first character twice, instead of the first two characters in the id file. Link: http://lkml.kernel.org/r/20181231115837.4932-1-elazar@lightbitslabs.com Cc: Orit Wasserman <orit.was@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 23725aeeab10b ("ftrace: provide an id file for each event") Signed-off-by: Elazar Leibovich <elazar@lightbitslabs.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-02tracing: Kernel access to Ftrace instancesDivya Indi
Ftrace provides the feature “instances” that provides the capability to create multiple Ftrace ring buffers. However, currently these buffers are created/accessed via userspace only. The kernel APIs providing these features are not exported, hence cannot be used by other kernel components. This patch aims to extend this infrastructure to provide the flexibility to create/log/remove/ enable-disable existing trace events to these buffers from within the kernel. Link: http://lkml.kernel.org/r/1553106531-3281-2-git-send-email-divya.indi@oracle.com Signed-off-by: Divya Indi <divya.indi@oracle.com> Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-22tracing: Use str_has_prefix() instead of using fixed sizesSteven Rostedt (VMware)
There are several instances of strncmp(str, "const", 123), where 123 is the strlen of the const string to check if "const" is the prefix of str. But this can be error prone. Use str_has_prefix() instead. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-10tracing: Consolidate trace_add/remove_event_call back to the nolock functionsSteven Rostedt (VMware)
The trace_add/remove_event_call_nolock() functions were added to allow the tace_add/remove_event_call() code be called when the event_mutex lock was already taken. Now that all callers are done within the event_mutex, there's no reason to have two different interfaces. Remove the current wrapper trace_add/remove_event_call()s and rename the _nolock versions back to the original names. Link: http://lkml.kernel.org/r/154140866955.17322.2081425494660638846.stgit@devbox Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08tracing: Lock event_mutex before synth_event_mutexMasami Hiramatsu
synthetic event is using synth_event_mutex for protecting synth_event_list, and event_trigger_write() path acquires locks as below order. event_trigger_write(event_mutex) ->trigger_process_regex(trigger_cmd_mutex) ->event_hist_trigger_func(synth_event_mutex) On the other hand, synthetic event creation and deletion paths call trace_add_event_call() and trace_remove_event_call() which acquires event_mutex. In that case, if we keep the synth_event_mutex locked while registering/unregistering synthetic events, its dependency will be inversed. To avoid this issue, current synthetic event is using a 2 phase process to create/delete events. For example, it searches existing events under synth_event_mutex to check for event-name conflicts, and unlocks synth_event_mutex, then registers a new event under event_mutex locked. Finally, it locks synth_event_mutex and tries to add the new event to the list. But it can introduce complexity and a chance for name conflicts. To solve this simpler, this introduces trace_add_event_call_nolock() and trace_remove_event_call_nolock() which don't acquire event_mutex inside. synthetic event can lock event_mutex before synth_event_mutex to solve the lock dependency issue simpler. Link: http://lkml.kernel.org/r/154140844377.17322.13781091165954002713.stgit@devbox Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-16tracing: Add SPDX License format tags to tracing filesSteven Rostedt (VMware)
Add the SPDX License header to ease license compliance management. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-10tracing: Fix synchronizing to event changes with ↵Steven Rostedt (VMware)
tracepoint_synchronize_unregister() Now that some trace events can be protected by srcu_read_lock(tracepoint_srcu), we need to make sure all locations that depend on this are also protected. There were many places that did a synchronize_sched() thinking that it was enough to protect againts access to trace events. This use to be the case, but now that we use SRCU for _rcuidle() trace events, they may not be protected by synchronize_sched(), as they may be called in paths that RCU is not watching for preempt disable. Fixes: e6753f23d961d ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-03trace: Use rcu_dereference_raw for hooks from trace-event subsystemJoel Fernandes (Google)
Since we switched to using SRCU for tracepoints used in the idle path, we can no longer use rcu_dereference_sched for dereferencing points in trace-event hooks. Since tracepoints can now use either SRCU or sched-RCU, just use rcu_dereference_raw for traceevents just like we're doing when dereferencing the tracepoint table. This prevents an RCU warning reported by Masami: [ 282.060593] WARNING: can't dereference registers at 00000000f3c7f62b [ 282.063200] ============================= [ 282.064082] WARNING: suspicious RCU usage [ 282.064963] 4.18.0-rc6+ #15 Tainted: G W [ 282.066048] ----------------------------- [ 282.066923] /home/mhiramat/ksrc/linux/kernel/trace/trace_events.c:242 suspicious rcu_dereference_check() usage! [ 282.068974] [ 282.068974] other info that might help us debug this: [ 282.068974] [ 282.070770] [ 282.070770] RCU used illegally from idle CPU! [ 282.070770] rcu_scheduler_active = 2, debug_locks = 1 [ 282.072938] RCU used illegally from extended quiescent state! [ 282.074183] no locks held by swapper/0/0. [ 282.075071] [ 282.075071] stack backtrace: [ 282.076121] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W [ 282.077782] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) [ 282.079604] Call Trace: [ 282.080212] <IRQ> [ 282.080755] dump_stack+0x85/0xcb [ 282.081523] trace_event_ignore_this_pid+0x66/0x70 [ 282.082541] trace_event_raw_event_preemptirq_template+0xa2/0xb0 [ 282.083774] ? interrupt_entry+0xc4/0xe0 [ 282.084665] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 282.085669] trace_hardirqs_off_caller+0x90/0xd0 [ 282.086597] trace_hardirqs_off_thunk+0x1a/0x1c [ 282.087433] ? call_function_interrupt+0xa/0x20 [ 282.088201] interrupt_entry+0xc4/0xe0 [ 282.088848] ? call_function_interrupt+0xa/0x20 [ 282.089579] </IRQ> [ 282.090029] ? native_safe_halt+0x2/0x10 [ 282.090695] ? default_idle+0x1f/0x160 [ 282.091330] ? default_idle_call+0x24/0x40 [ 282.091997] ? do_idle+0x210/0x250 [ 282.092658] ? cpu_startup_entry+0x6f/0x80 [ 282.093338] ? start_kernel+0x49d/0x4bd [ 282.093987] ? secondary_startup_64+0xa5/0xb0 Link: http://lkml.kernel.org/r/20180803023407.225852-1-joel@joelfernandes.org Reported-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-29tracing: Do not show filter file for ftrace internal eventsSteven Rostedt (VMware)
The filter file in the ftrace internal events, like in /sys/kernel/tracing/events/ftrace/function/filter is not attached to any functionality. Do not create them as they are meaningless. In the future, if an ftrace internal event gets filter functionality, then it will need to create it directly. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-29tracing: Have event_trace_init() called by trace_init_tracefs()Steven Rostedt (VMware)
Instead of having both trace_init_tracefs() and event_trace_init() be called by fs_initcall() routines, have event_trace_init() called directly by trace_init_tracefs(). This will guarantee order of how the events are created with respect to the rest of the ftrace infrastructure. This is needed to be able to assoctiate event files with ftrace internal events, such as the trace_marker. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-29tracing: Add __find_event_file() to find event files without restrictionsSteven Rostedt (VMware)
By adding the function __find_event_file() that can search for files without restrictions, such as if the event associated with the file has a reg function, or if it has the "ignore" flag set, the files that are associated to ftrace internal events (like trace_marker and function events) can be found and used. find_event_file() still returns a "filtered" file, as most callers need a valid trace event file. One created by the trace_events.h macros and not one created for parsing ftrace specific events. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23tracing: Make sure the parsed string always terminates with '\0'Changbin Du
Always mark the parsed string with a terminated nul '\0' character. This removes the need for the users to have to append the '\0' before using the parsed string. Link: http://lkml.kernel.org/r/1516093350-12045-4-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-18tracing: Fix converting enum's from the map in trace_event_eval_update()Steven Rostedt (VMware)
Since enums do not get converted by the TRACE_EVENT macro into their values, the event format displaces the enum name and not the value. This breaks tools like perf and trace-cmd that need to interpret the raw binary data. To solve this, an enum map was created to convert these enums into their actual numbers on boot up. This is done by TRACE_EVENTS() adding a TRACE_DEFINE_ENUM() macro. Some enums were not being converted. This was caused by an optization that had a bug in it. All calls get checked against this enum map to see if it should be converted or not, and it compares the call's system to the system that the enum map was created under. If they match, then they call is processed. To cut down on the number of iterations needed to find the maps with a matching system, since calls and maps are grouped by system, when a match is made, the index into the map array is saved, so that the next call, if it belongs to the same system as the previous call, could start right at that array index and not have to scan all the previous arrays. The problem was, the saved index was used as the variable to know if this is a call in a new system or not. If the index was zero, it was assumed that the call is in a new system and would keep incrementing the saved index until it found a matching system. The issue arises when the first matching system was at index zero. The next map, if it belonged to the same system, would then think it was the first match and increment the index to one. If the next call belong to the same system, it would begin its search of the maps off by one, and miss the first enum that should be converted. This left a single enum not converted properly. Also add a comment to describe exactly what that index was for. It took me a bit too long to figure out what I was thinking when debugging this issue. Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com Cc: stable@vger.kernel.org Fixes: 0c564a538aa93 ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values") Reported-by: Chuck Lever <chuck.lever@oracle.com> Teste-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-04tracing: Reverse the order of trace_types_lock and event_mutexSteven Rostedt (VMware)
In order to make future changes where we need to call tracing_set_clock() from within an event command, the order of trace_types_lock and event_mutex must be reversed, as the event command will hold event_mutex and the trace_types_lock is taken from within tracing_set_clock(). Link: http://lkml.kernel.org/r/20170921162249.0dde3dca@gandalf.local.home Requested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-09-05tracing: Fix clear of RECORDED_TGID flag when disabling trace eventChunyu Hu
When disabling one trace event, the RECORDED_TGID flag in the event file is not correctly cleared. It's clearing RECORDED_CMD flag when it should clear RECORDED_TGID flag. Link: http://lkml.kernel.org/r/1504589806-8425-1-git-send-email-chuhu@redhat.com Cc: Joel Fernandes <joelaf@google.com> Cc: stable@vger.kernel.org Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-08-31tracing: Only have rmmod clear buffers that its events were active inSteven Rostedt (VMware)
Currently, when a module event is enabled, when that module is removed, it clears all ring buffers. This is to prevent another module from being loaded and having one of its trace event IDs from reusing a trace event ID of the removed module. This could cause undesirable effects as the trace event of the new module would be using its own processing algorithms to process raw data of another event. To prevent this, when a module is loaded, if any of its events have been used (signified by the WAS_ENABLED event call flag, which is never cleared), all ring buffers are cleared, just in case any one of them contains event data of the removed event. The problem is, there's no reason to clear all ring buffers if only one (or less than all of them) uses one of the events. Instead, only clear the ring buffers that recorded the events of a module that is being removed. To do this, instead of keeping the WAS_ENABLED flag with the trace event call, move it to the per instance (per ring buffer) event file descriptor. The event file descriptor maps each event to a separate ring buffer instance. Then when the module is removed, only the ring buffers that activated one of the module's events get cleared. The rest are not touched. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-27tracing: Add support for recording tgid of tasksJoel Fernandes
Inorder to support recording of tgid, the following changes are made: * Introduce a new API (tracing_record_taskinfo) to additionally record the tgid along with the task's comm at the same time. This has has the benefit of not setting trace_cmdline_save before all the information for a task is saved. * Add a new API tracing_record_taskinfo_sched_switch to record task information for 2 tasks at a time (previous and next) and use it from sched_switch probe. * Preserve the old API (tracing_record_cmdline) and create it as a wrapper around the new one so that existing callers aren't affected. * Reuse the existing sched_switch and sched_wakeup probes to record tgid information and add a new option 'record-tgid' to enable recording of tgid When record-tgid option isn't enabled to being with, we take care to make sure that there's isn't memory or runtime overhead. Link: http://lkml.kernel.org/r/20170627020155.5139-1-joelaf@google.com Cc: kernel-team@android.com Cc: Ingo Molnar <mingo@redhat.com> Tested-by: Michael Sartain <mikesart@gmail.com> Signed-off-by: Joel Fernandes <joelaf@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13tracing: Rename enum_replace to eval_replaceJeremy Linton
The enum_replace stanza works as is for sizeof() calls as well as enums. Rename it as well. Link: http://lkml.kernel.org/r/20170531215653.3240-9-jeremy.linton@arm.com Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13trace: rename enum_map functionsJeremy Linton
Rename the core trace enum routines to use eval, to reflect their use by more than just enum to value mapping. Link: http://lkml.kernel.org/r/20170531215653.3240-8-jeremy.linton@arm.com Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-13trace: rename trace_enum_map to trace_eval_mapJeremy Linton
Each enum is loaded into the trace_enum_map, as we are now using this for more than enums rename it. Link: http://lkml.kernel.org/r/20170531215653.3240-3-jeremy.linton@arm.com Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20tracing/ftrace: Add a better way to pass data via the probe functionsSteven Rostedt (VMware)
With the redesign of the registration and execution of the function probes (triggers), data can now be passed from the setup of the probe to the probe callers that are specific to the trace_array it is on. Although, all probes still only affect the toplevel trace array, this change will allow for instances to have their own probes separated from other instances and the top array. That is, something like the stacktrace probe can be set to trace only in an instance and not the toplevel trace array. This isn't implement yet, but this change sets the ground work for the change. When a probe callback is triggered (someone writes the probe format into set_ftrace_filter), it calls register_ftrace_function_probe() passing in init_data that will be used to initialize the probe. Then for every matching function, register_ftrace_function_probe() will call the probe_ops->init() function with the init data that was passed to it, as well as an address to a place holder that is associated with the probe and the instance. The first occurrence will have a NULL in the pointer. The init() function will then initialize it. If other probes are added, or more functions are part of the probe, the place holder will be passed to the init() function with the place holder data that it was initialized to the last time. Then this place_holder is passed to each of the other probe_ops functions, where it can be used in the function callback. When the probe_ops free() function is called, it can be called either with the rip of the function that is being removed from the probe, or zero, indicating that there are no more functions attached to the probe, and the place holder is about to be freed. This gives the probe_ops a way to free the data it assigned to the place holder if it was allocade during the first init call. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Dynamically create the probe ftrace_ops for the trace_arraySteven Rostedt (VMware)
In order to eventually have each trace_array instance have its own unique set of function probes (triggers), the trace array needs to hold the ops and the filters for the probes. This is the first step to accomplish this. Instead of having the private data of the probe ops point to the trace_array, create a separate list that the trace_array holds. There's only one private_data for a probe, we need one per trace_array. The probe ftrace_ops will be dynamically created for each instance, instead of being static. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20tracing: Pass the trace_array into ftrace_probe_ops functionsSteven Rostedt (VMware)
Pass the trace_array associated to a ftrace_probe_ops into the probe_ops func(), init() and free() functions. The trace_array is the descriptor that describes a tracing instance. This will help create the infrastructure that will allow having function probes unique to tracing instances. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20tracing: Have the trace_array hold the list of registered func probesSteven Rostedt (VMware)
Add a link list to the trace_array to hold func probes that are registered. Currently, all function probes are the same for all instances as it was before, that is, only the top level trace_array holds the function probes. But this lays the ground work to have function probes be attached to individual instances, and having the event trigger only affect events in the given instance. But that work is still to be done. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Have unregister_ftrace_function_probe_func() return a valueSteven Rostedt (VMware)
Currently unregister_ftrace_function_probe_func() is a void function. It does not give any feedback if an error occurred or no item was found to remove and nothing was done. Change it to return status and success if it removed something. Also update the callers to return that feedback to the user. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Remove data field from ftrace_func_probe structureSteven Rostedt (VMware)
No users of the function probes uses the data field anymore. Remove it, and change the init function to take a void *data parameter instead of a void **data, because the init will just get the data that the registering function was received, and there's no state after it is called. The other functions for ftrace_probe_ops still take the data parameter, but it will currently only be passed NULL. It will stay as a parameter for future data to be passed to these functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Added ftrace_func_mapper for function probe triggersSteven Rostedt (VMware)
In order to move the ops to the function probes directly, they need a way to map function ips to their own data without depending on the infrastructure of the function probes, as the data field will be going away. New helper functions are added that are based on the ftrace_hash code. ftrace_func_mapper functions are there to let the probes map ips to their data. These can be allocated by the probe ops, and referenced in the function callbacks. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Pass probe ops to probe functionSteven Rostedt (VMware)
In preparation to cleaning up the probe function registration code, the "data" parameter will eventually be removed from the probe->func() call. Instead it will receive its own "ops" function, in which it can set up its own data that it needs to map. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2016-12-09tracing: Have system enable return error if one of the events failSteven Rostedt (Red Hat)
If one of the events within a system fails to enable when "1" is written to the system "enable" file, it should return an error. Note, some events may still be enabled, but the user should know that something did go wrong. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23tracing: Make tracepoint_printk a static_keySteven Rostedt (Red Hat)
Currently, when tracepoint_printk is set (enabled by the "tp_printk" kernel command line), it causes trace events to print via printk(). This is a very dangerous operation, but is useful for debugging. The issue is, it's seldom used, but it is always checked even if it's not enabled by the kernel command line. Instead of having this feature called by a branch against a variable, turn that variable into a static key, and this will remove the test and jump. To simplify things, the functions output_printk() and trace_event_buffer_commit() were moved from trace_events.c to trace.c. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-22tracing: Add error checks to creation of event filesSteven Rostedt (Red Hat)
The creation of the set_event_pid file was assigned to a variable "entry" but that variable was never used. Ideally, it should be used to check if the file was created and warn if it was not. The files header_page, header_event should also be checked and a warning if they fail to be created. The "enable" file was moved up, as it is a more crucial file to have and a hard failure (return -ENOMEM) should be returned if it is not created. Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Show the preempt count of when the event was calledSteven Rostedt (Red Hat)
Because tracepoint callbacks are done with preemption enabled, the trace events are always called with preempt disable due to the rcu_read_lock_sched_notrace() in __DO_TRACE(). This causes the preempt count shown in the recorded trace event to be inaccurate. It is always one more that what the preempt_count was when the tracepoint was called. If CONFIG_PREEMPT is enabled, subtract 1 from the preempt_count before recording it in the trace buffer. Link: http://lkml.kernel.org/r/20160525132537.GA10808@linutronix.de Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Move pid_list write processing into its own functionSteven Rostedt (Red Hat)
The addition of PIDs into a pid_list via the write operation of set_event_pid is a bit complex. The same operation will be needed for function tracing pids. Move the code into its own generic function in trace.c, so that we can avoid duplication of this code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Move the pid_list seq_file functions to be globalSteven Rostedt (Red Hat)
To allow other aspects of ftrace to use the pid_list logic, we need to reuse the seq_file functions. Making the generic part into functions that can be called by other files will help in this regard. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Move filtered_pid helper functions into trace.cSteven Rostedt
As the filtered_pid functions are going to be used by function tracer as well as trace_events, move the code into the generic trace.c file. The functions moved are: trace_find_filtered_pid() trace_ignore_this_task() trace_filter_add_remove_task() Kernel Doc text was also added. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Make the pid filtering helper functions globalSteven Rostedt
Make the functions used for pid filtering global for tracing, such that the function tracer can use the pid code as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-18Merge tag 'trace-v4.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This includes two new updates for the ftrace infrastructure. - With the changing of the code for filtering events by pid, from a list of pids to a bitmask, we can now easily implement following forks. With a new tracing option "event-fork" which, when set, will have tasks with pids in set_event_pid, when they fork, to have their child pids added to set_event_pid and the child will be traced as well. Note, if "event-fork" is set and a task with its pid in set_event_pid exits, its pid will be removed from set_event_pid - The addition of Tom Zanussi's hist triggers. This includes a very thorough documentatino on how to use the hist triggers with events. This introduces a quick and easy way to get histogram data from events and their fields. Some other cleanups and updates were added as well. Like Masami Hiramatsu added test cases for the event trigger and hist triggers. Also I added a speed up of filtering by using a temp buffer when filters are set" * tag 'trace-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits) tracing: Use temp buffer when filtering events tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic tracing: Remove unused function trace_current_buffer_lock_reserve() tracing: Remove one use of trace_current_buffer_lock_reserve() tracing: Have trace_buffer_unlock_commit() call the _regs version with NULL tracing: Remove unused function trace_current_buffer_discard_commit() tracing: Move trace_buffer_unlock_commit{_regs}() to local header tracing: Fold filter_check_discard() into its only user tracing: Make filter_check_discard() local tracing: Move event_trigger_unlock_commit{_regs}() to local header tracing: Don't use the address of the buffer array name in copy_from_user tracing: Handle tracing_map_alloc_elts() error path correctly tracing: Add check for NULL event field when creating hist field tracing: checking for NULL instead of IS_ERR() tracing: Do not inherit event-fork option for instances tracing: Fix unsigned comparison to zero in hist trigger code kselftests/ftrace: Add a test for log2 modifier of hist trigger tracing: Add hist trigger 'log2' modifier kselftests/ftrace: Add hist trigger testcases kselftests/ftrace : Add event trigger testcases ...
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
In netdevice.h we removed the structure in net-next that is being changes in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03tracing: Use temp buffer when filtering eventsSteven Rostedt (Red Hat)
Filtering of events requires the data to be written to the ring buffer before it can be decided to filter or not. This is because the parameters of the filter are based on the result that is written to the ring buffer and not on the parameters that are passed into the trace functions. The ftrace ring buffer is optimized for writing into the ring buffer and committing. The discard procedure used when filtering decides the event should be discarded is much more heavy weight. Thus, using a temporary filter when filtering events can speed things up drastically. Without a temp buffer we have: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.790706626 seconds time elapsed ( +- 0.71% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.566904059 seconds time elapsed ( +- 0.27% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.690598511 seconds time elapsed ( +- 0.19% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.707486364 seconds time elapsed ( +- 0.30% ) The first run above is without any tracing, just to get a based figure. hackbench takes ~0.79 seconds to run on the system. The second run enables tracing all events where nothing is filtered. This increases the time by 100% and hackbench takes 1.57 seconds to run. The third run filters all events where the preempt count will equal "20" (this should never happen) thus all events are discarded. This takes 1.69 seconds to run. This is 10% slower than just committing the events! The last run enables all events and filters where the filter will commit all events, and this takes 1.70 seconds to run. The filtering overhead is approximately 10%. Thus, the discard and commit of an event from the ring buffer may be about the same time. With this patch, the numbers change: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.778233033 seconds time elapsed ( +- 0.38% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.582102692 seconds time elapsed ( +- 0.28% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.309230710 seconds time elapsed ( +- 0.22% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.786001924 seconds time elapsed ( +- 0.20% ) The first run is again the base with no tracing. The second run is all tracing with no filtering. It is a little slower, but that may be well within the noise. The third run shows that discarding all events only took 1.3 seconds. This is a speed up of 23%! The discard is much faster than even the commit. The one downside is shown in the last run. Events that are not discarded by the filter will take longer to add, this is due to the extra copy of the event. Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>