summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2017-04-20ftrace: Have each function probe use its own ftrace_opsSteven Rostedt (VMware)
Have the function probes have their own ftrace_ops, and remove the trace_probe_ops. This simplifies some of the ftrace infrastructure code. Individual entries for each function is still allocated for the use of the output for set_ftrace_filter, but they will be removed soon too. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Have unregister_ftrace_function_probe_func() return a valueSteven Rostedt (VMware)
Currently unregister_ftrace_function_probe_func() is a void function. It does not give any feedback if an error occurred or no item was found to remove and nothing was done. Change it to return status and success if it removed something. Also update the callers to return that feedback to the user. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Add helper function ftrace_hash_move_and_update_ops()Steven Rostedt (VMware)
The processes of updating a ops filter_hash is a bit complex, and requires setting up an old hash to perform the update. This is done exactly the same in two locations for the same reasons. Create a helper function that does it in one place. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Remove data field from ftrace_func_probe structureSteven Rostedt (VMware)
No users of the function probes uses the data field anymore. Remove it, and change the init function to take a void *data parameter instead of a void **data, because the init will just get the data that the registering function was received, and there's no state after it is called. The other functions for ftrace_probe_ops still take the data parameter, but it will currently only be passed NULL. It will stay as a parameter for future data to be passed to these functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Remove printing of data in showing of a function probeSteven Rostedt (VMware)
None of the probe users uses the data field anymore of the entry. They all have their own print() function. Remove showing the data field in the generic function as the data field will be going away. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Remove unused unregister_ftrace_function_probe_all() functionSteven Rostedt (VMware)
There are no users of unregister_ftrace_function_probe_all(). The only probe function that is used is unregister_ftrace_function_probe_func(). Rename the internal static function __unregister_ftrace_function_probe() to unregister_ftrace_function_probe_func() and make it global. Also remove the PROBE_TEST_FUNC as it would be always set. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Remove unused unregister_ftrace_function_probe() functionSteven Rostedt (VMware)
Nothing calls unregister_ftrace_function_probe(). Remove it as well as the flag PROBE_TEST_DATA, as this function was the only one to set it. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Convert the rest of the function trigger over to the mapping functionsSteven Rostedt (VMware)
As the data pointer for individual ips will soon be removed and no longer passed to the callback function probe handlers, convert the rest of the function trigger counters over to the new ftrace_func_mapper helper functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20tracing: Have the snapshot trigger use the mapping helper functionsSteven Rostedt (VMware)
As the data pointer for individual ips will soon be removed and no longer passed to the callback function probe handlers, convert the snapshot trigger counter over to the new ftrace_func_mapper helper functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Added ftrace_func_mapper for function probe triggersSteven Rostedt (VMware)
In order to move the ops to the function probes directly, they need a way to map function ips to their own data without depending on the infrastructure of the function probes, as the data field will be going away. New helper functions are added that are based on the ftrace_hash code. ftrace_func_mapper functions are there to let the probes map ips to their data. These can be allocated by the probe ops, and referenced in the function callbacks. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Pass probe ops to probe functionSteven Rostedt (VMware)
In preparation to cleaning up the probe function registration code, the "data" parameter will eventually be removed from the probe->func() call. Instead it will receive its own "ops" function, in which it can set up its own data that it needs to map. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Remove unused "flags" field from struct ftrace_func_probeSteven Rostedt (VMware)
Nothing uses "flags" in the ftrace_func_probe descriptor. Remove it. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20ftrace: Move the function commands into the tracing directorySteven Rostedt (VMware)
As nothing outside the tracing directory uses the function command mechanism, I'm moving the prototypes out of the include/linux/ftrace.h and into the local kernel/trace/trace.h header. I plan on making them hook to the trace_array structure which is local to kernel/trace, and I do not want to expose it to the rest of the kernel. This requires that the command functions must also be local to tracing. But luckily nothing else uses them. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-20block: remove the errors field from struct requestChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20blktrace: remove the unused block_rq_abort tracepointChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
A function in kernel/bpf/syscall.c which got a bug fix in 'net' was moved to kernel/bpf/verifier.c in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-19ring-buffer: Have ring_buffer_iter_empty() return true when emptySteven Rostedt (VMware)
I noticed that reading the snapshot file when it is empty no longer gives a status. It suppose to show the status of the snapshot buffer as well as how to allocate and use it. For example: ># cat snapshot # tracer: nop # # # * Snapshot is allocated * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') But instead it just showed an empty buffer: ># cat snapshot # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | What happened was that it was using the ring_buffer_iter_empty() function to see if it was empty, and if it was, it showed the status. But that function was returning false when it was empty. The reason was that the iter header page was on the reader page, and the reader page was empty, but so was the buffer itself. The check only tested to see if the iter was on the commit page, but the commit page was no longer pointing to the reader page, but as all pages were empty, the buffer is also. Cc: stable@vger.kernel.org Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-19tracing: Allocate the snapshot buffer before enabling probeSteven Rostedt (VMware)
Currently the snapshot trigger enables the probe and then allocates the snapshot. If the probe triggers before the allocation, it could cause the snapshot to fail and turn tracing off. It's best to allocate the snapshot buffer first, and then enable the trigger. If something goes wrong in the enabling of the trigger, the snapshot buffer is still allocated, but it can also be freed by the user by writting zero into the snapshot buffer file. Also add a check of the return status of alloc_snapshot(). Cc: stable@vger.kernel.org Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18ftrace: Move the probe function into the tracing directorySteven Rostedt (VMware)
As nothing outside the tracing directory uses the function probes mechanism, I'm moving the prototypes out of the include/linux/ftrace.h and into the local kernel/trace/trace.h header. I plan on making them hook to the trace_array structure which is local to kernel/trace, and I do not want to expose it to the rest of the kernel. This requires that the probe functions must also be local to tracing. But luckily nothing else uses them. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17ftrace: Add 'function-fork' trace optionNamhyung Kim
The function-fork option is same as event-fork that it tracks task fork/exit and set the pid filter properly. This can be useful if user wants to trace selected tasks including their children only. Link: http://lkml.kernel.org/r/20170417024430.21194-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17ftrace: Fix function pid filter on instancesNamhyung Kim
When function tracer has a pid filter, it adds a probe to sched_switch to track if current task can be ignored. The probe checks the ftrace_ignore_pid from current tr to filter tasks. But it misses to delete the probe when removing an instance so that it can cause a crash due to the invalid tr pointer (use-after-free). This is easily reproducible with the following: # cd /sys/kernel/debug/tracing # mkdir instances/buggy # echo $$ > instances/buggy/set_ftrace_pid # rmdir instances/buggy ============================================================================ BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90 Read of size 8 by task kworker/0:1/17 CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G B 4.11.0-rc3 #198 Call Trace: dump_stack+0x68/0x9f kasan_object_err+0x21/0x70 kasan_report.part.1+0x22b/0x500 ? ftrace_filter_pid_sched_switch_probe+0x3d/0x90 kasan_report+0x25/0x30 __asan_load8+0x5e/0x70 ftrace_filter_pid_sched_switch_probe+0x3d/0x90 ? fpid_start+0x130/0x130 __schedule+0x571/0xce0 ... To fix it, use ftrace_clear_pids() to unregister the probe. As instance_rmdir() already updated ftrace codes, it can just free the filter safely. Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org Fixes: 0c8916c34203 ("tracing: Add rmdir to remove multibuffer instances") Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17tracing: Have the trace_event benchmark thread call cond_resched_rcu_qs()Steven Rostedt (VMware)
The trace_event benchmark thread runs in kernel space in an infinite loop while also calling cond_resched() in case anything else wants to schedule in. Unfortunately, on a PREEMPT kernel, that makes it a nop, in which case, this will never voluntarily schedule. That will cause synchronize_rcu_tasks() to forever block on this thread, while it is running. This is exactly what cond_resched_rcu_qs() is for. Use that instead. Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17ftrace: Fix indexing of t_hash_start() from t_next()Steven Rostedt (VMware)
t_hash_start() does not increment *pos, where as t_next() must. But when t_next() does increment *pos, it must still pass in the original *pos to t_hash_start() otherwise it will skip the first instance: # cd /sys/kernel/debug/tracing # echo schedule:traceoff > set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter # echo call_rcu > set_ftrace_filter # cat set_ftrace_filter call_rcu schedule:traceoff:unlimited do_IRQ:traceoff:unlimited The above called t_hash_start() from t_start() as there was only one function (call_rcu), but if we add another function: # echo xfrm_policy_destroy_rcu >> set_ftrace_filter # cat set_ftrace_filter call_rcu xfrm_policy_destroy_rcu do_IRQ:traceoff:unlimited The "schedule:traceoff" disappears. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were simply overlapping changes. In the net/ipv4/route.c case the code had simply moved around a little bit and the same fix was made in both 'net' and 'net-next'. In the net/sched/sch_generic.c case a fix in 'net' happened at the same time that a new argument was added to qdisc_hash_add(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-15ftrace: Fix removing of second function probeSteven Rostedt (VMware)
When two function probes are added to set_ftrace_filter, and then one of them is removed, the update to the function locations is not performed, and the record keeping of the function states are corrupted, and causes an ftrace_bug() to occur. This is easily reproducable by adding two probes, removing one, and then adding it back again. # cd /sys/kernel/debug/tracing # echo schedule:traceoff > set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter Causes: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220 Modules linked in: [...] CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 Call Trace: dump_stack+0x68/0x9f __warn+0x111/0x130 ? trace_irq_work_interrupt+0xa0/0xa0 warn_slowpath_null+0x1d/0x20 ftrace_get_addr_curr+0x143/0x220 ? __fentry__+0x10/0x10 ftrace_replace_code+0xe3/0x4f0 ? ftrace_int3_handler+0x90/0x90 ? printk+0x99/0xb5 ? 0xffffffff81000000 ftrace_modify_all_code+0x97/0x110 arch_ftrace_update_code+0x10/0x20 ftrace_run_update_code+0x1c/0x60 ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0 register_ftrace_function_probe+0x4b6/0x590 ? ftrace_startup+0x310/0x310 ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30 ? update_stack_state+0x88/0x110 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? preempt_count_sub+0x18/0xd0 ? mutex_lock_nested+0x104/0x800 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? __unwind_start+0x1c0/0x1c0 ? _mutex_lock_nest_lock+0x800/0x800 ftrace_trace_probe_callback.isra.3+0xc0/0x130 ? func_set_flag+0xe0/0xe0 ? __lock_acquire+0x642/0x1790 ? __might_fault+0x1e/0x20 ? trace_get_user+0x398/0x470 ? strcmp+0x35/0x60 ftrace_trace_onoff_callback+0x48/0x70 ftrace_regex_write.isra.43.part.44+0x251/0x320 ? match_records+0x420/0x420 ftrace_filter_write+0x2b/0x30 __vfs_write+0xd7/0x330 ? do_loop_readv_writev+0x120/0x120 ? locks_remove_posix+0x90/0x2f0 ? do_lock_file_wait+0x160/0x160 ? __lock_is_held+0x93/0x100 ? rcu_read_lock_sched_held+0x5c/0xb0 ? preempt_count_sub+0x18/0xd0 ? __sb_start_write+0x10a/0x230 ? vfs_write+0x222/0x240 vfs_write+0xef/0x240 SyS_write+0xab/0x130 ? SyS_read+0x130/0x130 ? trace_hardirqs_on_caller+0x182/0x280 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7fe61c157c30 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c ? trace_hardirqs_off_caller+0xc0/0x110 ---[ end trace 99fa09b3d9869c2c ]--- Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150) Cc: stable@vger.kernel.org Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-14ftrace: Fix removing of second function probeSteven Rostedt (VMware)
When two function probes are added to set_ftrace_filter, and then one of them is removed, the update to the function locations is not performed, and the record keeping of the function states are corrupted, and causes an ftrace_bug() to occur. This is easily reproducable by adding two probes, removing one, and then adding it back again. # cd /sys/kernel/debug/tracing # echo schedule:traceoff > set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter Causes: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220 Modules linked in: [...] CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 Call Trace: dump_stack+0x68/0x9f __warn+0x111/0x130 ? trace_irq_work_interrupt+0xa0/0xa0 warn_slowpath_null+0x1d/0x20 ftrace_get_addr_curr+0x143/0x220 ? __fentry__+0x10/0x10 ftrace_replace_code+0xe3/0x4f0 ? ftrace_int3_handler+0x90/0x90 ? printk+0x99/0xb5 ? 0xffffffff81000000 ftrace_modify_all_code+0x97/0x110 arch_ftrace_update_code+0x10/0x20 ftrace_run_update_code+0x1c/0x60 ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0 register_ftrace_function_probe+0x4b6/0x590 ? ftrace_startup+0x310/0x310 ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30 ? update_stack_state+0x88/0x110 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? preempt_count_sub+0x18/0xd0 ? mutex_lock_nested+0x104/0x800 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? __unwind_start+0x1c0/0x1c0 ? _mutex_lock_nest_lock+0x800/0x800 ftrace_trace_probe_callback.isra.3+0xc0/0x130 ? func_set_flag+0xe0/0xe0 ? __lock_acquire+0x642/0x1790 ? __might_fault+0x1e/0x20 ? trace_get_user+0x398/0x470 ? strcmp+0x35/0x60 ftrace_trace_onoff_callback+0x48/0x70 ftrace_regex_write.isra.43.part.44+0x251/0x320 ? match_records+0x420/0x420 ftrace_filter_write+0x2b/0x30 __vfs_write+0xd7/0x330 ? do_loop_readv_writev+0x120/0x120 ? locks_remove_posix+0x90/0x2f0 ? do_lock_file_wait+0x160/0x160 ? __lock_is_held+0x93/0x100 ? rcu_read_lock_sched_held+0x5c/0xb0 ? preempt_count_sub+0x18/0xd0 ? __sb_start_write+0x10a/0x230 ? vfs_write+0x222/0x240 vfs_write+0xef/0x240 SyS_write+0xab/0x130 ? SyS_read+0x130/0x130 ? trace_hardirqs_on_caller+0x182/0x280 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7fe61c157c30 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c ? trace_hardirqs_off_caller+0xc0/0x110 ---[ end trace 99fa09b3d9869c2c ]--- Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150) Cc: stable@vger.kernel.org Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-11bpf: remove struct bpf_prog_type_listJohannes Berg
There's no need to have struct bpf_prog_type_list since it just contains a list_head, the type, and the ops pointer. Since the types are densely packed and not actually dynamically registered, it's much easier and smaller to have an array of type->ops pointer. Also initialize this array statically to remove code needed to initialize it. In order to save duplicating the list, move it to a new header file and include it in the places needing it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11Merge tag 'v4.11-rc6' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-10rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not workSteven Rostedt (VMware)
Tracing uses rcu_irq_enter() as a way to make sure that RCU is watching when it needs to use rcu_read_lock() and friends. This is because tracing can happen as RCU is about to enter user space, or about to go idle, and RCU does not watch for RCU read side critical sections as it makes the transition. There is a small location within the RCU infrastructure that rcu_irq_enter() itself will not work. If tracing were to occur in that section it will break if it tries to use rcu_irq_enter(). Originally, this happens with the stack_tracer, because it will call save_stack_trace when it encounters stack usage that is greater than any stack usage it had encountered previously. There was a case where that happened in the RCU section where rcu_irq_enter() did not work, and lockdep complained loudly about it. To fix it, stack tracing added a call to be disabled and RCU would disable stack tracing during the critical section that rcu_irq_enter() was inoperable. This solution worked, but there are other cases that use rcu_irq_enter() and it would be a good idea to let RCU give a way to let others know that rcu_irq_enter() will not work. For example, in trace events. Another helpful aspect of this change is that it also moves the per cpu variable called in the RCU critical section into a cache locale along with other RCU per cpu variables used in that same location. I'm keeping the stack_trace_disable() code, as that still could be used in the future by places that really need to disable it. And since it's only a static inline, it wont take up any kernel text if it is not used. Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10tracing: Rename trace_active to disable_stack_tracer and inline its modificationSteven Rostedt (VMware)
In order to eliminate a function call, make "trace_active" into "disable_stack_tracer" and convert stack_tracer_disable() and friends into static inline functions. Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10tracing: Add stack_tracer_disable/enable() functionsSteven Rostedt (VMware)
There are certain parts of the kernel that cannot let stack tracing proceed (namely in RCU), because the stack tracer uses RCU, and parts of RCU internals cannot handle having RCU read side locks taken. Add stack_tracer_disable() and stack_tracer_enable() functions to let RCU stop stack tracing on the current CPU when it is in those critical sections. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10tracing: Replace the per_cpu() with __this_cpu*() in trace_stack.cSteven Rostedt (VMware)
The updates to the trace_active per cpu variable can be updated with the __this_cpu_*() functions as it only gets updated on the CPU that the variable is on. Thanks to Paul McKenney for suggesting __this_cpu_* instead of this_cpu_*. Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-07ftrace: Add use of synchronize_rcu_tasks() with dynamic trampolinesSteven Rostedt (VMware)
The function tracer needs to be more careful than other subsystems when it comes to freeing data. Especially if that data is actually executable code. When a single function is traced, a trampoline can be dynamically allocated which is called to jump to the function trace callback. When the callback is no longer needed, the dynamic allocated trampoline needs to be freed. This is where the issues arise. The dynamically allocated trampoline must not be used again. As function tracing can trace all subsystems, including subsystems that are used to serialize aspects of freeing (namely RCU), it must take extra care when doing the freeing. Before synchronize_rcu_tasks() was around, there was no way for the function tracer to know that nothing was using the dynamically allocated trampoline when CONFIG_PREEMPT was enabled. That's because a task could be indefinitely preempted while sitting on the trampoline. Now with synchronize_rcu_tasks(), it will wait till all tasks have either voluntarily scheduled (not on the trampoline) or goes into userspace (not on the trampoline). Then it is safe to free the trampoline even with CONFIG_PREEMPT set. Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-05ring-buffer: Fix return value check in test_ringbuffer()Wei Yongjun
In case of error, the function kthread_run() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com Cc: stable@vger.kernel.org Fixes: 6c43e554a ("ring-buffer: Add ring buffer startup selftest") Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-04tracing/kprobes: expose maxactive for kretprobe in kprobe_eventsAlban Crequy
When a kretprobe is installed on a kernel function, there is a maximum limit of how many calls in parallel it can catch (aka "maxactive"). A kernel module could call register_kretprobe() and initialize maxactive (see example in samples/kprobes/kretprobe_example.c). But that is not exposed to userspace and it is currently not possible to choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events The default maxactive can be as low as 1 on single-core with a non-preemptive kernel. This is too low and we need to increase it not only for recursive functions, but for functions that sleep or resched. This patch updates the format of the command that can be written to kprobe_events so that maxactive can be optionally specified. I need this for a bpf program attached to the kretprobe of inet_csk_accept, which can sleep for a long time. This patch includes a basic selftest: > # ./ftracetest -v test.d/kprobe/ > === Ftrace unit tests === > [1] Kprobe dynamic event - adding and removing [PASS] > [2] Kprobe dynamic event - busy event check [PASS] > [3] Kprobe dynamic event with arguments [PASS] > [4] Kprobes event arguments with types [PASS] > [5] Kprobe dynamic event with function tracer [PASS] > [6] Kretprobe dynamic event with arguments [PASS] > [7] Kretprobe dynamic event with maxactive [PASS] > > # of passed: 7 > # of failed: 0 > # of unresolved: 0 > # of untested: 0 > # of unsupported: 0 > # of xfailed: 0 > # of undefined(test bug): 0 BugLink: https://github.com/iovisor/bcc/issues/1072 Link: http://lkml.kernel.org/r/1491215782-15490-1-git-send-email-alban@kinvolk.io Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alban Crequy <alban@kinvolk.io> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-03ftrace: Have init/main.c call ftrace directly to free init memorySteven Rostedt (VMware)
Relying on free_reserved_area() to call ftrace to free init memory proved to not be sufficient. The issue is that on x86, when debug_pagealloc is enabled, the init memory is not freed, but simply set as not present. Since ftrace was uninformed of this, starting function tracing still tries to update pages that are not present according to the page tables, causing ftrace to bug, as well as killing the kernel itself. Instead of relying on free_reserved_area(), have init/main.c call ftrace directly just before it frees the init memory. Then it needs to use __init_begin and __init_end to know where the init memory location is. Looking at all archs (and testing what I can), it appears that this should work for each of them. Reported-by: kernel test robot <xiaolong.ye@intel.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-02Merge branch 'parisc-4.11-3' of ↵Al Viro
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc
2017-03-31ftrace: Create separate t_func_next() to simplify the function / hash logicSteven Rostedt (VMware)
I noticed that if I use dd to read the set_ftrace_filter file that the first hash command is repeated. # cd /sys/kernel/debug/tracing # echo schedule > set_ftrace_filter # echo do_IRQ >> set_ftrace_filter # echo schedule:traceoff >> set_ftrace_filter # echo do_IRQ:traceoff >> set_ftrace_filter # cat set_ftrace_filter schedule do_IRQ schedule:traceoff:unlimited do_IRQ:traceoff:unlimited # dd if=set_ftrace_filter bs=1 schedule do_IRQ schedule:traceoff:unlimited schedule:traceoff:unlimited do_IRQ:traceoff:unlimited 98+0 records in 98+0 records out 98 bytes copied, 0.00265011 s, 37.0 kB/s This is due to the way t_start() calls t_next() as well as the seq_file calls t_next() and the state is slightly different between the two. Namely, t_start() will call t_next() with a local "pos" variable. By separating out the function listing from t_next() into its own function, we can have better control of outputting the functions and the hash of triggers. This simplifies the code. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31ftrace: Update func_pos in t_start() when all functions are enabledSteven Rostedt (VMware)
If all functions are enabled, there's a comment displayed in the file to denote that: # cd /sys/kernel/debug/tracing # cat set_ftrace_filter #### all functions enabled #### If a function trigger is set, those are displayed as well: # echo schedule:traceoff >> /debug/tracing/set_ftrace_filter # cat set_ftrace_filter #### all functions enabled #### schedule:traceoff:unlimited But if you read that file with dd, the output can change: # dd if=/debug/tracing/set_ftrace_filter bs=1 #### all functions enabled #### 32+0 records in 32+0 records out 32 bytes copied, 7.0237e-05 s, 456 kB/s This is because the "pos" variable is updated for the comment, but func_pos is not. "func_pos" is used by the triggers (or hashes) to know how many functions were printed and it bases its index from the pos - func_pos. func_pos should be 1 to count for the comment printed. But since it is not, t_hash_start() thinks that one trigger was already printed. The cat gets to t_hash_start() via t_next() and not t_start() which updates both pos and func_pos. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31ftrace: Return NULL at end of t_start() instead of calling t_hash_start()Steven Rostedt (VMware)
The loop in t_start() of calling t_next() will call t_hash_start() if the pos is beyond the functions and enters the hash items. There's no reason to check if p is NULL and call t_hash_start(), as that would be redundant. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31ftrace: Assign iter->hash to filter or notrace hashes on seq readSteven Rostedt (VMware)
Instead of testing if the hash to use is the filter_hash or the notrace_hash at each iteration, do the test at open, and set the iter->hash to point to the corresponding filter or notrace hash. Then use that directly instead of testing which hash needs to be used each iteration. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-31ftrace: Clean up __seq_open_private() return checkSteven Rostedt (VMware)
The return status check of __seq_open_private() is rather strange: iter = __seq_open_private(); if (iter) { /* do stuff */ } return iter ? 0 : -ENOMEM; It makes much more sense to do the return of failure right away: iter = __seq_open_private(); if (!iter) return -ENOMEM; /* do stuff */ return 0; This clean up will make updates to this code a bit nicer. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-28new helper: uaccess_kernel()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-24tracing: Move trace_handle_return() out of lineSteven Rostedt (VMware)
Currently trace_handle_return() looks like this: static inline enum print_line_t trace_handle_return(struct trace_seq *s) { return trace_seq_has_overflowed(s) ? TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED; } Where trace_seq_overflowed(s) is: static inline bool trace_seq_has_overflowed(struct trace_seq *s) { return s->full || seq_buf_has_overflowed(&s->seq); } And seq_buf_has_overflowed(&s->seq) is: static inline bool seq_buf_has_overflowed(struct seq_buf *s) { return s->len > s->size; } Making trace_handle_return() into: return (s->full || (s->seq->len > s->seq->size)) ? TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED; One would think this is not an issue to keep as an inline. But because this is used in the TRACE_EVENT() macro, it is extended for every tracepoint in the system. Taking a look at a single tracepoint x86_irq_vector (was the first one I randomly chosen). As trace_handle_return is used in the TRACE_EVENT() macro of trace_raw_output_##call() we disassemble trace_raw_output_x86_irq_vector and do a diff: - is the original + is the out-of-line code I removed identical lines that were different just due to different addresses. --- /tmp/irq-vec-orig 2017-03-16 09:12:48.569384851 -0400 +++ /tmp/irq-vec-ool 2017-03-16 09:13:39.378153385 -0400 @@ -6,27 +6,23 @@ 53 push %rbx 48 89 fb mov %rdi,%rbx 4c 8b a7 c0 20 00 00 mov 0x20c0(%rdi),%r12 e8 f7 72 13 00 callq ffffffff81155c80 <trace_raw_output_prep> 83 f8 01 cmp $0x1,%eax 74 05 je ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23> 5b pop %rbx 41 5c pop %r12 5d pop %rbp c3 retq 41 8b 54 24 08 mov 0x8(%r12),%edx - 48 8d bb 98 10 00 00 lea 0x1098(%rbx),%rdi + 48 81 c3 98 10 00 00 add $0x1098,%rbx - 48 c7 c6 7b 8a a0 81 mov $0xffffffff81a08a7b,%rsi + 48 c7 c6 ab 8a a0 81 mov $0xffffffff81a08aab,%rsi - e8 c5 85 13 00 callq ffffffff81156f70 <trace_seq_printf> === here's the start of the main difference === + 48 89 df mov %rbx,%rdi + e8 62 7e 13 00 callq ffffffff81156810 <trace_seq_printf> - 8b 93 b8 20 00 00 mov 0x20b8(%rbx),%edx - 31 c0 xor %eax,%eax - 85 d2 test %edx,%edx - 75 11 jne ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58> - 48 8b 83 a8 20 00 00 mov 0x20a8(%rbx),%rax - 48 39 83 a0 20 00 00 cmp %rax,0x20a0(%rbx) - 0f 93 c0 setae %al + 48 89 df mov %rbx,%rdi + e8 4a c5 12 00 callq ffffffff8114af00 <trace_handle_return> 5b pop %rbx - 0f b6 c0 movzbl %al,%eax === end === 41 5c pop %r12 5d pop %rbp c3 retq If you notice, the original has 22 bytes of text more than the out of line version. As this is for every TRACE_EVENT() defined in the system, this can become quite large. text data bss dec hex filename 8690305 5450490 1298432 15439227 eb957b vmlinux-orig 8681725 5450490 1298432 15430647 eb73f7 vmlinux-handle This change has a total of 8580 bytes in savings. $ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l 324 That's 324 tracepoints. But this does not include modules (which contain many more tracepoints). For an allyesconfig build: $ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l 1401 That's 1401 tracepoints giving us: text data bss dec hex filename 137920629 140221067 53264384 331406080 13c0db00 vmlinux-allyes-orig 137827709 140221067 53264384 331313160 13bf7008 vmlinux-allyes-handle 92920 bytes in savings!!! Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.org Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24ftrace: Allow for function tracing to record init functions on boot upSteven Rostedt (VMware)
Adding a hook into free_reserve_area() that informs ftrace that boot up init text is being free, lets ftrace safely remove those init functions from its records, which keeps ftrace from trying to modify text that no longer exists. Note, this still does not allow for tracing .init text of modules, as modules require different work for freeing its init code. Link: http://lkml.kernel.org/r/1488502497.7212.24.camel@linux.intel.com Cc: linux-mm@kvack.org Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Requested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24ftrace: Have function tracing start in early boot upSteven Rostedt (VMware)
Register the function tracer right after the tracing buffers are initialized in early boot up. This will allow function tracing to begin early if it is enabled via the kernel command line. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24tracing: Postpone tracer start-up tests till the system is more robustSteven Rostedt (VMware)
As tracing can now be enabled very early in boot up, even before some critical system services (like scheduling), do not run the tracer selftests until after early_initcall() is performed. If a tracer is registered before such time, it is saved off in a list and the test is run when the system is able to handle more diverse functions. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24tracing: Split tracing initialization into two for early initializationSteven Rostedt (VMware)
Create an early_trace_init() function that will initialize the buffers and allow for ealier use of trace_printk(). This will also allow for future work to have function tracing start earlier at boot up. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-16Merge tag 'perf-core-for-mingo-4.12-20170316' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: New features: - Add 'brstackinsn' field in 'perf script' to reuse the x86 instruction decoder used in the Intel PT code to study hot paths to samples (Andi Kleen) Kernel changes: - Default UPROBES_EVENTS to Y (Alexei Starovoitov) - Fix check for kretprobe offset within function entry (Naveen N. Rao) Infrastructure changes: - Introduce util func is_sdt_event() (Ravi Bangoria) - Make perf_event__synthesize_mmap_events() scale on older kernels where reading /proc/pid/maps is way slower than reading /proc/pid/task/pid/maps (Stephane Eranian) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16uprobes: Default UPROBES_EVENTS to YArnaldo Carvalho de Melo
As it is already turned on by most distros, so just flip the default to Y. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: David Miller <davem@davemloft.net> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170316005817.GA6805@ast-mbp.thefacebook.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>