summaryrefslogtreecommitdiff
path: root/kernel/kprobes.c
AgeCommit message (Collapse)Author
2020-11-07Merge branch 'linus' into perf/kprobesIngo Molnar
Merge recent kprobes updates into perf/kprobes that came from -mm. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-10-12kprobes: Replace rp->free_instance with freelistPeter Zijlstra
Gets rid of rp->lock, and as a result kretprobes are now fully lockless. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/159870623583.1229682.17472357584134058687.stgit@devnote2
2020-10-12kprobes: Remove kretprobe hashPeter Zijlstra
The kretprobe hash is mostly superfluous, replace it with a per-task variable. This gets rid of the task hash and it's related locking. Note that this may change the kprobes module-exported API for kretprobe handlers. If any out-of-tree kretprobe user uses ri->rp, use get_kretprobe(ri) instead. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/159870620431.1229682.16325792502413731312.stgit@devnote2
2020-09-21kprobes: Init kprobes in early_initcallMasami Hiramatsu
Init kprobes feature in early_initcall as same as jump_label and dynamic_debug does, so that we can use kprobes events in earlier boot stage. Link: https://lkml.kernel.org/r/159974151897.478751.8342374158615496628.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18kprobes: tracing/kprobes: Fix to kill kprobes on initmem after bootMasami Hiramatsu
Since kprobe_event= cmdline option allows user to put kprobes on the functions in initmem, kprobe has to make such probes gone after boot. Currently the probes on the init functions in modules will be handled by module callback, but the kernel init text isn't handled. Without this, kprobes may access non-exist text area to disable or remove it. Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2 Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter") Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()Masami Hiramatsu
Commit 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler") fixed one bug but not completely fixed yet. If we run a kprobe_module.tc of ftracetest, kernel showed a warning as below. # ./ftracetest test.d/kprobe/kprobe_module.tc === Ftrace unit tests === [1] Kprobe dynamic event - probing module ... [ 22.400215] ------------[ cut here ]------------ [ 22.400962] Failed to disarm kprobe-ftrace at trace_printk_irq_work+0x0/0x7e [trace_printk] (-2) [ 22.402139] WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 __disarm_kprobe_ftrace.isra.0+0x7e/0xa0 [ 22.403358] Modules linked in: trace_printk(-) [ 22.404028] CPU: 7 PID: 200 Comm: rmmod Not tainted 5.9.0-rc2+ #66 [ 22.404870] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 22.406139] RIP: 0010:__disarm_kprobe_ftrace.isra.0+0x7e/0xa0 [ 22.406947] Code: 30 8b 03 eb c9 80 3d e5 09 1f 01 00 75 dc 49 8b 34 24 89 c2 48 c7 c7 a0 c2 05 82 89 45 e4 c6 05 cc 09 1f 01 01 e8 a9 c7 f0 ff <0f> 0b 8b 45 e4 eb b9 89 c6 48 c7 c7 70 c2 05 82 89 45 e4 e8 91 c7 [ 22.409544] RSP: 0018:ffffc90000237df0 EFLAGS: 00010286 [ 22.410385] RAX: 0000000000000000 RBX: ffffffff83066024 RCX: 0000000000000000 [ 22.411434] RDX: 0000000000000001 RSI: ffffffff810de8d3 RDI: ffffffff810de8d3 [ 22.412687] RBP: ffffc90000237e10 R08: 0000000000000001 R09: 0000000000000001 [ 22.413762] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88807c478640 [ 22.414852] R13: ffffffff8235ebc0 R14: ffffffffa00060c0 R15: 0000000000000000 [ 22.415941] FS: 00000000019d48c0(0000) GS:ffff88807d7c0000(0000) knlGS:0000000000000000 [ 22.417264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 22.418176] CR2: 00000000005bb7e3 CR3: 0000000078f7a000 CR4: 00000000000006a0 [ 22.419309] Call Trace: [ 22.419990] kill_kprobe+0x94/0x160 [ 22.420652] kprobes_module_callback+0x64/0x230 [ 22.421470] notifier_call_chain+0x4f/0x70 [ 22.422184] blocking_notifier_call_chain+0x49/0x70 [ 22.422979] __x64_sys_delete_module+0x1ac/0x240 [ 22.423733] do_syscall_64+0x38/0x50 [ 22.424366] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 22.425176] RIP: 0033:0x4bb81d [ 22.425741] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48 [ 22.428726] RSP: 002b:00007ffc70fef008 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 [ 22.430169] RAX: ffffffffffffffda RBX: 00000000019d48a0 RCX: 00000000004bb81d [ 22.431375] RDX: 0000000000000000 RSI: 0000000000000880 RDI: 00007ffc70fef028 [ 22.432543] RBP: 0000000000000880 R08: 00000000ffffffff R09: 00007ffc70fef320 [ 22.433692] R10: 0000000000656300 R11: 0000000000000246 R12: 00007ffc70fef028 [ 22.434635] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000 [ 22.435682] irq event stamp: 1169 [ 22.436240] hardirqs last enabled at (1179): [<ffffffff810df542>] console_unlock+0x422/0x580 [ 22.437466] hardirqs last disabled at (1188): [<ffffffff810df19b>] console_unlock+0x7b/0x580 [ 22.438608] softirqs last enabled at (866): [<ffffffff81c0038e>] __do_softirq+0x38e/0x490 [ 22.439637] softirqs last disabled at (859): [<ffffffff81a00f42>] asm_call_on_stack+0x12/0x20 [ 22.440690] ---[ end trace 1e7ce7e1e4567276 ]--- [ 22.472832] trace_kprobe: This probe might be able to register after target module is loaded. Continue. This is because the kill_kprobe() calls disarm_kprobe_ftrace() even if the given probe is not enabled. In that case, ftrace_set_filter_ip() fails because the given probe point is not registered to ftrace. Fix to check the given (going) probe is enabled before invoking disarm_kprobe_ftrace(). Link: https://lkml.kernel.org/r/159888672694.1411785.5987998076694782591.stgit@devnote2 Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler") Cc: Ingo Molnar <mingo@kernel.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-14kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()Masami Hiramatsu
Commit: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler") fixed one bug but the underlying bugs are not completely fixed yet. If we run a kprobe_module.tc of ftracetest, a warning triggers: # ./ftracetest test.d/kprobe/kprobe_module.tc === Ftrace unit tests === [1] Kprobe dynamic event - probing module ... ------------[ cut here ]------------ Failed to disarm kprobe-ftrace at trace_printk_irq_work+0x0/0x7e [trace_printk] (-2) WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 __disarm_kprobe_ftrace.isra.0+0x7e/0xa0 This is because the kill_kprobe() calls disarm_kprobe_ftrace() even if the given probe is not enabled. In that case, ftrace_set_filter_ip() fails because the given probe point is not registered to ftrace. Fix to check the given (going) probe is enabled before invoking disarm_kprobe_ftrace(). Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/159888672694.1411785.5987998076694782591.stgit@devnote2
2020-09-08kprobes: Make local functions staticMasami Hiramatsu
Since we unified the kretprobe trampoline handler from arch/* code, some functions and objects do not need to be exported anymore. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/159870618256.1229682.8692046612635810882.stgit@devnote2
2020-09-08kprobes: Free kretprobe_instance with RCU callbackMasami Hiramatsu
Free kretprobe_instance with RCU callback instead of directly freeing the object in the kretprobe handler context. This will make kretprobe run safer in NMI context. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/159870616685.1229682.11978742048709542226.stgit@devnote2
2020-09-08kprobes: Remove NMI context checkMasami Hiramatsu
The in_nmi() check in pre_handler_kretprobe() is meant to avoid recursion, and blindly assumes that anything NMI is recursive. However, since commit: 9b38cc704e84 ("kretprobe: Prevent triggering kretprobe from within kprobe_flush_task") there is a better way to detect and avoid actual recursion. By setting a dummy kprobe, any actual exceptions will terminate early (by trying to handle the dummy kprobe), and recursion will not happen. Employ this to avoid the kretprobe_table_lock() recursion, replacing the over-eager in_nmi() check. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/159870615628.1229682.6087311596892125907.stgit@devnote2
2020-09-08kprobes: Add generic kretprobe trampoline handlerMasami Hiramatsu
Add a generic kretprobe trampoline handler for unifying the all cloned /arch/* kretprobe trampoline handlers. The generic kretprobe trampoline handler is based on the x86 implementation, because it is the latest implementation. It has frame pointer checking, kprobe_busy_begin/end and return address fixup for user handlers. [ mingo: Minor edits. ] Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/159870600138.1229682.3424065380448088833.stgit@devnote2
2020-08-07Merge tag 'trace-v5.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - The biggest news in that the tracing ring buffer can now time events that interrupted other ring buffer events. Before this change, if an interrupt came in while recording another event, and that interrupt also had an event, those events would all have the same time stamp as the event it interrupted. Now, with the new design, those events will have a unique time stamp and rightfully display the time for those events that were recorded while interrupting another event. - Bootconfig how has an "override" operator that lets the users have a default config, but then add options to override the default. - A fix was made to properly filter function graph tracing to the ftrace PIDs. This came in at the end of the -rc cycle, and needs to be backported. - Several clean ups, performance updates, and minor fixes as well. * tag 'trace-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (39 commits) tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE tracing: Use trace_sched_process_free() instead of exit() for pid tracing bootconfig: Fix to find the initargs correctly Documentation: bootconfig: Add bootconfig override operator tools/bootconfig: Add testcases for value override operator lib/bootconfig: Add override operator support kprobes: Remove show_registers() function prototype tracing/uprobe: Remove dead code in trace_uprobe_register() kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler ftrace: Fix ftrace_trace_task return value tracepoint: Use __used attribute definitions from compiler_attributes.h tracepoint: Mark __tracepoint_string's __used trace : Have tracing buffer info use kvzalloc instead of kzalloc tracing: Remove outdated comment in stack handling ftrace: Do not let direct or IPMODIFY ftrace_ops be added to module and set trampolines ftrace: Setup correct FTRACE_FL_REGS flags for module tracing/hwlat: Honor the tracing_cpumask tracing/hwlat: Drop the duplicate assignment in start_kthread() tracing: Save one trace_event->type by using __TRACE_LAST_TYPE ...
2020-08-06kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACEMuchun Song
Fix compiler warning(as show below) for !CONFIG_KPROBES_ON_FTRACE. kernel/kprobes.c: In function 'kill_kprobe': kernel/kprobes.c:1116:33: warning: statement with no effect [-Wunused-value] 1116 | #define disarm_kprobe_ftrace(p) (-ENODEV) | ^ kernel/kprobes.c:2154:3: note: in expansion of macro 'disarm_kprobe_ftrace' 2154 | disarm_kprobe_ftrace(p); Link: https://lore.kernel.org/r/20200805142136.0331f7ea@canb.auug.org.au Link: https://lkml.kernel.org/r/20200805172046.19066-1-songmuchun@bytedance.com Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-03kprobes: Fix NULL pointer dereference at kprobe_ftrace_handlerMuchun Song
We found a case of kernel panic on our server. The stack trace is as follows(omit some irrelevant information): BUG: kernel NULL pointer dereference, address: 0000000000000080 RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0 RSP: 0018:ffffb512c6550998 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000 RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0 RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ftrace_ops_assist_func+0x56/0xe0 ftrace_call+0x5/0x34 tcpa_statistic_send+0x5/0x130 [ttcp_engine] The tcpa_statistic_send is the function being kprobed. After analysis, the root cause is that the fourth parameter regs of kprobe_ftrace_handler is NULL. Why regs is NULL? We use the crash tool to analyze the kdump. crash> dis tcpa_statistic_send -r <tcpa_statistic_send>: callq 0xffffffffbd8018c0 <ftrace_caller> The tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller. So it is reasonable that the fourth parameter regs of kprobe_ftrace_handler is NULL. In theory, we should call the ftrace_regs_caller instead of the ftrace_caller. After in-depth analysis, we found a reproducible path. Writing a simple kernel module which starts a periodic timer. The timer's handler is named 'kprobe_test_timer_handler'. The module name is kprobe_test.ko. 1) insmod kprobe_test.ko 2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}' 3) echo 0 > /proc/sys/kernel/ftrace_enabled 4) rmmod kprobe_test 5) stop step 2) kprobe 6) insmod kprobe_test.ko 7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}' We mark the kprobe as GONE but not disarm the kprobe in the step 4). The step 5) also do not disarm the kprobe when unregister kprobe. So we do not remove the ip from the filter. In this case, when the module loads again in the step 6), we will replace the code to ftrace_caller via the ftrace_module_enable(). When we register kprobe again, we will not replace ftrace_caller to ftrace_regs_caller because the ftrace is disabled in the step 3). So the step 7) will trigger kernel panic. Fix this problem by disarming the kprobe when the module is going away. Link: https://lkml.kernel.org/r/20200728064536.24405-1-songmuchun@bytedance.com Cc: stable@vger.kernel.org Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-28kprobes: Remove unnecessary module_mutex locking from kprobe_optimizer()Masami Hiramatsu
Since we already lock both kprobe_mutex and text_mutex in the optimizer, text will not be changed and the module unloading will be stopped inside kprobes_module_callback(). The mutex_lock() has originally been introduced to avoid conflict with text modification, at that point we didn't hold text_mutex. But after: f1c6ece23729 ("kprobes: Fix potential deadlock in kprobe_optimizer()") We started holding the text_mutex and don't need the modules mutex anyway. So remove the module_mutex locking. [ mingo: Amended the changelog. ] Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Link: https://lore.kernel.org/r/20200728163400.e00b09c594763349f99ce6cb@kernel.org
2020-07-28Merge tag 'v5.8-rc7' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-08kprobes: Do not expose probe addresses to non-CAP_SYSLOGKees Cook
The kprobe show() functions were using "current"'s creds instead of the file opener's creds for kallsyms visibility. Fix to use seq_file->file->f_cred. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Fixes: 81365a947de4 ("kprobes: Show address of kprobes if kallsyms does") Fixes: ffb9bd68ebdb ("kprobes: Show blacklist addresses as same as kallsyms does") Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-08kallsyms: Refactor kallsyms_show_value() to take credKees Cook
In order to perform future tests against the cred saved during open(), switch kallsyms_show_value() to operate on a cred, and have all current callers pass current_cred(). This makes it very obvious where callers are checking the wrong credential in their "read" contexts. These will be fixed in the coming patches. Additionally switch return value to bool, since it is always used as a direct permission check, not a 0-on-success, negative-on-error style function return. Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-02Merge branch 'perf/vlbr'Peter Zijlstra
2020-06-16kretprobe: Prevent triggering kretprobe from within kprobe_flush_taskJiri Olsa
Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave. My test was also able to trigger lockdep output: ============================================ WARNING: possible recursive locking detected 5.6.0-rc6+ #6 Not tainted -------------------------------------------- sched-messaging/2767 is trying to acquire lock: ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0 but task is already holding lock: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(kretprobe_table_locks[i].lock)); lock(&(kretprobe_table_locks[i].lock)); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by sched-messaging/2767: #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 stack backtrace: CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6 Call Trace: dump_stack+0x96/0xe0 __lock_acquire.cold.57+0x173/0x2b7 ? native_queued_spin_lock_slowpath+0x42b/0x9e0 ? lockdep_hardirqs_on+0x590/0x590 ? __lock_acquire+0xf63/0x4030 lock_acquire+0x15a/0x3d0 ? kretprobe_hash_lock+0x52/0xa0 _raw_spin_lock_irqsave+0x36/0x70 ? kretprobe_hash_lock+0x52/0xa0 kretprobe_hash_lock+0x52/0xa0 trampoline_handler+0xf8/0x940 ? kprobe_fault_handler+0x380/0x380 ? find_held_lock+0x3a/0x1c0 kretprobe_trampoline+0x25/0x50 ? lock_acquired+0x392/0xbc0 ? _raw_spin_lock_irqsave+0x50/0x70 ? __get_valid_kprobe+0x1f0/0x1f0 ? _raw_spin_unlock_irqrestore+0x3b/0x40 ? finish_task_switch+0x4b9/0x6d0 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 The code within the kretprobe handler checks for probe reentrancy, so we won't trigger any _raw_spin_lock_irqsave probe in there. The problem is in outside kprobe_flush_task, where we call: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave where _raw_spin_lock_irqsave triggers the kretprobe and installs kretprobe_trampoline handler on _raw_spin_lock_irqsave return. The kretprobe_trampoline handler is then executed with already locked kretprobe_table_locks, and first thing it does is to lock kretprobe_table_locks ;-) the whole lockup path like: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed ---> kretprobe_table_locks locked kretprobe_trampoline trampoline_handler kretprobe_hash_lock(current, &head, &flags); <--- deadlock Adding kprobe_busy_begin/end helpers that mark code with fake probe installed to prevent triggering of another kprobe within this code. Using these helpers in kprobe_flush_task, so the probe recursion protection check is hit and the probe is never set to prevent above lockup. Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2 Fixes: ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking") Cc: Ingo Molnar <mingo@kernel.org> Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16kprobes: Remove redundant arch_disarm_kprobe() callMasami Hiramatsu
Fix to remove redundant arch_disarm_kprobe() call in force_unoptimize_kprobe(). This arch_disarm_kprobe() will be invoked if the kprobe is optimized but disabled, but that means the kprobe (optprobe) is unused (and unoptimized) state. In that case, unoptimize_kprobe() puts it in freeing_list and kprobe_optimizer (do_unoptimize_kprobes()) automatically disarm it. Thus this arch_disarm_kprobe() is redundant. Link: http://lkml.kernel.org/r/158927058719.27680.17183632908465341189.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutexMasami Hiramatsu
In kprobe_optimizer() kick_kprobe_optimizer() is called without kprobe_mutex, but this can race with other caller which is protected by kprobe_mutex. To fix that, expand kprobe_mutex protected area to protect kick_kprobe_optimizer() call. Link: http://lkml.kernel.org/r/158927057586.27680.5036330063955940456.stgit@devnote2 Fixes: cd7ebe2298ff ("kprobes: Use text_poke_smp_batch for optimizing") Cc: Ingo Molnar <mingo@kernel.org> Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ziqian SUN <zsun@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16kprobes: Use non RCU traversal APIs on kprobe_tables if possibleMasami Hiramatsu
Current kprobes uses RCU traversal APIs on kprobe_tables even if it is safe because kprobe_mutex is locked. Make those traversals to non-RCU APIs where the kprobe_mutex is locked. Link: http://lkml.kernel.org/r/158927056452.27680.9710575332163005121.stgit@devnote2 Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-16kprobes: Suppress the suspicious RCU warning on kprobesMasami Hiramatsu
Anders reported that the lockdep warns that suspicious RCU list usage in register_kprobe() (detected by CONFIG_PROVE_RCU_LIST.) This is because get_kprobe() access kprobe_table[] by hlist_for_each_entry_rcu() without rcu_read_lock. If we call get_kprobe() from the breakpoint handler context, it is run with preempt disabled, so this is not a problem. But in other cases, instead of rcu_read_lock(), we locks kprobe_mutex so that the kprobe_table[] is not updated. So, current code is safe, but still not good from the view point of RCU. Joel suggested that we can silent that warning by passing lockdep_is_held() to the last argument of hlist_for_each_entry_rcu(). Add lockdep_is_held(&kprobe_mutex) at the end of the hlist_for_each_entry_rcu() to suppress the warning. Link: http://lkml.kernel.org/r/158927055350.27680.10261450713467997503.stgit@devnote2 Reported-by: Anders Roxell <anders.roxell@linaro.org> Suggested-by: Joel Fernandes <joel@joelfernandes.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-06-15kprobes: Add perf ksymbol events for kprobe insn pagesAdrian Hunter
Symbols are needed for tools to describe instruction addresses. Pages allocated for kprobe's purposes need symbols to be created for them. Add such symbols to be visible via perf ksymbol events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200512121922.8997-5-adrian.hunter@intel.com
2020-06-15kprobes: Add symbols for kprobe insn pagesAdrian Hunter
Symbols are needed for tools to describe instruction addresses. Pages allocated for kprobe's purposes need symbols to be created for them. Add such symbols to be visible via /proc/kallsyms. Note: kprobe insn pages are not used if ftrace is configured. To see the effect of this patch, the kernel must be configured with: # CONFIG_FUNCTION_TRACER is not set CONFIG_KPROBES=y and for optimised kprobes: CONFIG_OPTPROBES=y Example on x86: # perf probe __schedule Added new event: probe:__schedule (on __schedule) # cat /proc/kallsyms | grep '\[__builtin__kprobes\]' ffffffffc00d4000 t kprobe_insn_page [__builtin__kprobes] ffffffffc00d6000 t kprobe_optinsn_page [__builtin__kprobes] Note: This patch adds "__builtin__kprobes" as a module name in /proc/kallsyms for symbols for pages allocated for kprobes' purposes, even though "__builtin__kprobes" is not a module. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200528080058.20230-1-adrian.hunter@intel.com
2020-06-04kernel/kprobes.c: convert to use DEFINE_SEQ_ATTRIBUTE macroKefeng Wang
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/20200509064031.181091-4-wangkefeng.wang@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...
2020-05-19kprobes: Prevent probes in .noinstr.text sectionThomas Gleixner
Instrumentation is forbidden in the .noinstr.text section. Make kprobes respect this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200505134100.179862032@linutronix.de
2020-05-12kprobes: Support NOKPROBE_SYMBOL() in modulesMasami Hiramatsu
Support NOKPROBE_SYMBOL() in modules. NOKPROBE_SYMBOL() records only symbol address in "_kprobe_blacklist" section in the module. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134059.771170126@linutronix.de
2020-05-12kprobes: Support __kprobes blacklist in modulesMasami Hiramatsu
Support __kprobes attribute for blacklist functions in modules. The __kprobes attribute functions are stored in .kprobes.text section. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134059.678201813@linutronix.de
2020-05-12kprobes: Lock kprobe_mutex while showing kprobe_blacklistMasami Hiramatsu
Lock kprobe_mutex while showing kprobe_blacklist to prevent updating the kprobe_blacklist. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134059.571125195@linutronix.de
2020-04-27sysctl: pass kernel pointers to ->proc_handlerChristoph Hellwig
Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-09kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logicMasami Hiramatsu
optimize_kprobe() and unoptimize_kprobe() cancels if a given kprobe is on the optimizing_list or unoptimizing_list already. However, since the following commit: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") modified the update timing of the KPROBE_FLAG_OPTIMIZED, it doesn't work as expected anymore. The optimized_kprobe could be in the following states: - [optimizing]: Before inserting jump instruction op.kp->flags has KPROBE_FLAG_OPTIMIZED and op->list is not empty. - [optimized]: jump inserted op.kp->flags has KPROBE_FLAG_OPTIMIZED and op->list is empty. - [unoptimizing]: Before removing jump instruction (including unused optprobe) op.kp->flags has KPROBE_FLAG_OPTIMIZED and op->list is not empty. - [unoptimized]: jump removed op.kp->flags doesn't have KPROBE_FLAG_OPTIMIZED and op->list is empty. Current code mis-expects [unoptimizing] state doesn't have KPROBE_FLAG_OPTIMIZED, and that can cause incorrect results. To fix this, introduce optprobe_queued_unopt() to distinguish [optimizing] and [unoptimizing] states and fixes the logic in optimize_kprobe() and unoptimize_kprobe(). [ mingo: Cleaned up the changelog and the code a bit. ] Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bristot@redhat.com Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Link: https://lkml.kernel.org/r/157840814418.7181.13478003006386303481.stgit@devnote2 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27kprobes: Set unoptimized flag after unoptimizing codeMasami Hiramatsu
Set the unoptimized flag after confirming the code is completely unoptimized. Without this fix, when a kprobe hits the intermediate modified instruction (the first byte is replaced by an INT3, but later bytes can still be a jump address operand) while unoptimizing, it can return to the middle byte of the modified code, which causes an invalid instruction exception in the kernel. Usually, this is a rare case, but if we put a probe on the function call while text patching, it always causes a kernel panic as below: # echo p text_poke+5 > kprobe_events # echo 1 > events/kprobes/enable # echo 0 > events/kprobes/enable invalid opcode: 0000 [#1] PREEMPT SMP PTI RIP: 0010:text_poke+0x9/0x50 Call Trace: arch_unoptimize_kprobe+0x22/0x28 arch_unoptimize_kprobes+0x39/0x87 kprobe_optimizer+0x6e/0x290 process_one_work+0x2a0/0x610 worker_thread+0x28/0x3d0 ? process_one_work+0x610/0x610 kthread+0x10d/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x3a/0x50 text_poke() is used for patching the code in optprobes. This can happen even if we blacklist text_poke() and other functions, because there is a small time window during which we show the intermediate code to other CPUs. [ mingo: Edited the changelog. ] Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bristot@redhat.com Fixes: 6274de4984a6 ("kprobes: Support delayed unoptimizing") Link: https://lkml.kernel.org/r/157483422375.25881.13508326028469515760.stgit@devnote2 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-20Merge tag 'trace-v5.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Addition of multiprobes to kprobe and uprobe events (allows for more than one probe attached to the same location) - Addition of adding immediates to probe parameters - Clean up of the recordmcount.c code. This brings us closer to merging recordmcount into objtool, and reuse code. - Other small clean ups * tag 'trace-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) selftests/ftrace: Update kprobe event error testcase tracing/probe: Reject exactly same probe event tracing/probe: Fix to allow user to enable events on unloaded modules selftests/ftrace: Select an existing function in kprobe_eventname test tracing/kprobe: Fix NULL pointer access in trace_porbe_unlink() tracing: Make sure variable reference alias has correct var_ref_idx tracing: Be more clever when dumping hex in __print_hex() ftrace: Simplify ftrace hash lookup code in clear_func_from_hash() tracing: Add "gfp_t" support in synthetic_events tracing: Rename tracing_reset() to tracing_reset_cpu() tracing: Document the stack trace algorithm in the comments tracing/arm64: Have max stack tracer handle the case of return address after data recordmcount: Clarify what cleanup() does recordmcount: Remove redundant cleanup() calls recordmcount: Kernel style formatting recordmcount: Kernel style function signature formatting recordmcount: Rewrite error/success handling selftests/ftrace: Add syntax error test for multiprobe selftests/ftrace: Add syntax error test for immediates selftests/ftrace: Add a testcase for kprobe multiprobe event ...
2019-09-16Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann, Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers. As perf and the scheduler is getting bigger and more complex, document the status quo of current responsibilities and interests, and spread the review pain^H^H^H^H fun via an increase in the Cc: linecount generated by scripts/get_maintainer.pl. :-) - Add another series of patches that brings the -rt (PREEMPT_RT) tree closer to mainline: split the monolithic CONFIG_PREEMPT dependencies into a new CONFIG_PREEMPTION category that will allow the eventual introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches to go though. - Extend the CPU cgroup controller with uclamp.min and uclamp.max to allow the finer shaping of CPU bandwidth usage. - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS). - Improve the behavior of high CPU count, high thread count applications running under cpu.cfs_quota_us constraints. - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present. - Improve CPU isolation housekeeping CPU allocation NUMA locality. - Fix deadline scheduler bandwidth calculations and logic when cpusets rebuilds the topology, or when it gets deadline-throttled while it's being offlined. - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from setscheduler() system calls without creating global serialization. Add new synchronization between cpuset topology-changing events and the deadline acceptance tests in setscheduler(), which were broken before. - Rework the active_mm state machine to be less confusing and more optimal. - Rework (simplify) the pick_next_task() slowpath. - Improve load-balancing on AMD EPYC systems. - ... and misc cleanups, smaller fixes and improvements - please see the Git log for more details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) sched/psi: Correct overly pessimistic size calculation sched/fair: Speed-up energy-aware wake-ups sched/uclamp: Always use 'enum uclamp_id' for clamp_id values sched/uclamp: Update CPU's refcount on TG's clamp changes sched/uclamp: Use TG's clamps to restrict TASK's clamps sched/uclamp: Propagate system defaults to the root group sched/uclamp: Propagate parent clamps sched/uclamp: Extend CPU's cgroup controller sched/topology: Improve load balancing on AMD EPYC systems arch, ia64: Make NUMA select SMP sched, perf: MAINTAINERS update, add submaintainers and reviewers sched/fair: Use rq_lock/unlock in online_fair_sched_group cpufreq: schedutil: fix equation in comment sched: Rework pick_next_task() slow-path sched: Allow put_prev_task() to drop rq->lock sched/fair: Expose newidle_balance() sched: Add task_struct pointer to sched_class::set_curr_task sched: Rework CPU hotplug task selection sched/{rt,deadline}: Fix set_next_task vs pick_next_task sched: Fix kerneldoc comment for ia64_set_curr_task ...
2019-09-05kprobes: Prohibit probing on BUG() and WARN() addressMasami Hiramatsu
Since BUG() and WARN() may use a trap (e.g. UD2 on x86) to get the address where the BUG() has occurred, kprobes can not do single-step out-of-line that instruction. So prohibit probing on such address. Without this fix, if someone put a kprobe on WARN(), the kernel will crash with invalid opcode error instead of outputing warning message, because kernel can not find correct bug address. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N . Rao <naveen.n.rao@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/156750890133.19112.3393666300746167111.stgit@devnote2 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-31kprobes: Allow kprobes coexist with livepatchMasami Hiramatsu
Allow kprobes which do not modify regs->ip, coexist with livepatch by dropping FTRACE_OPS_FL_IPMODIFY from ftrace_ops. User who wants to modify regs->ip (e.g. function fault injection) must set a dummy post_handler to its kprobes when registering. However, if such regs->ip modifying kprobes is set on a function, that function can not be livepatched. Link: http://lkml.kernel.org/r/156403587671.30117.5233558741694155985.stgit@devnote2 Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-19kprobes: Fix potential deadlock in kprobe_optimizer()Andrea Righi
lockdep reports the following deadlock scenario: WARNING: possible circular locking dependency detected kworker/1:1/48 is trying to acquire lock: 000000008d7a62b2 (text_mutex){+.+.}, at: kprobe_optimizer+0x163/0x290 but task is already holding lock: 00000000850b5e2d (module_mutex){+.+.}, at: kprobe_optimizer+0x31/0x290 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (module_mutex){+.+.}: __mutex_lock+0xac/0x9f0 mutex_lock_nested+0x1b/0x20 set_all_modules_text_rw+0x22/0x90 ftrace_arch_code_modify_prepare+0x1c/0x20 ftrace_run_update_code+0xe/0x30 ftrace_startup_enable+0x2e/0x50 ftrace_startup+0xa7/0x100 register_ftrace_function+0x27/0x70 arm_kprobe+0xb3/0x130 enable_kprobe+0x83/0xa0 enable_trace_kprobe.part.0+0x2e/0x80 kprobe_register+0x6f/0xc0 perf_trace_event_init+0x16b/0x270 perf_kprobe_init+0xa7/0xe0 perf_kprobe_event_init+0x3e/0x70 perf_try_init_event+0x4a/0x140 perf_event_alloc+0x93a/0xde0 __do_sys_perf_event_open+0x19f/0xf30 __x64_sys_perf_event_open+0x20/0x30 do_syscall_64+0x65/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (text_mutex){+.+.}: __lock_acquire+0xfcb/0x1b60 lock_acquire+0xca/0x1d0 __mutex_lock+0xac/0x9f0 mutex_lock_nested+0x1b/0x20 kprobe_optimizer+0x163/0x290 process_one_work+0x22b/0x560 worker_thread+0x50/0x3c0 kthread+0x112/0x150 ret_from_fork+0x3a/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(module_mutex); lock(text_mutex); lock(module_mutex); lock(text_mutex); *** DEADLOCK *** As a reproducer I've been using bcc's funccount.py (https://github.com/iovisor/bcc/blob/master/tools/funccount.py), for example: # ./funccount.py '*interrupt*' That immediately triggers the lockdep splat. Fix by acquiring text_mutex before module_mutex in kprobe_optimizer(). Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: d5b844a2cf50 ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()") Link: http://lkml.kernel.org/r/20190812184302.GA7010@xps-13 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-31kprobes: Use CONFIG_PREEMPTIONThomas Gleixner
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch kprobes conditional over to CONFIG_PREEMPTION. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.516286187@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-18Merge tag 'trace-v5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The main changes in this release include: - Add user space specific memory reading for kprobes - Allow kprobes to be executed earlier in boot The rest are mostly just various clean ups and small fixes" * tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) tracing: Make trace_get_fields() global tracing: Let filter_assign_type() detect FILTER_PTR_STRING tracing: Pass type into tracing_generic_entry_update() ftrace/selftest: Test if set_event/ftrace_pid exists before writing ftrace/selftests: Return the skip code when tracing directory not configured in kernel tracing/kprobe: Check registered state using kprobe tracing/probe: Add trace_event_call accesses APIs tracing/probe: Add probe event name and group name accesses APIs tracing/probe: Add trace flag access APIs for trace_probe tracing/probe: Add trace_event_file access APIs for trace_probe tracing/probe: Add trace_event_call register API for trace_probe tracing/probe: Add trace_probe init and free functions tracing/uprobe: Set print format when parsing command tracing/kprobe: Set print format right after parsed command kprobes: Fix to init kprobes in subsys_initcall tracepoint: Use struct_size() in kmalloc() ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS ftrace: Enable trampoline when rec count returns back to one tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline tracing: Make a separate config for trace event self tests ...
2019-07-16kprobes: Fix to init kprobes in subsys_initcallMasami Hiramatsu
Since arm64 kernel initializes breakpoint trap vector in arch_initcall(), initializing kprobe (and run smoke test) in postcore_initcall() causes a kernel panic. To fix this issue, move the kprobe initialization in subsys_initcall() (which is called right afer the arch_initcall). In-kernel kprobe users (ftrace and bpf) are using fs_initcall() which is called after subsys_initcall(), so this shouldn't cause more problem. Link: http://lkml.kernel.org/r/155956708268.12228.10363800793132214198.stgit@devnote2 Link: http://lkml.kernel.org/r/20190709153755.GB10123@lakrids.cambridge.arm.com Reported-by: Anders Roxell <anders.roxell@linaro.org> Fixes: b5f8b32c93b2 ("kprobes: Initialize kprobes at postcore_initcall") Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-03kprobes: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1334 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25kprobes: Initialize kprobes at postcore_initcallMasami Hiramatsu
Initialize kprobes at postcore_initcall level instead of module_init since kprobes is not a module, and it depends on only subsystems initialized in core_initcall. This will allow ftrace kprobe event to add new events when it is initializing because ftrace kprobe event is initialized at later initcall level. Link: http://lkml.kernel.org/r/155851394736.15728.13626739508905120098.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-16kprobes: Fix error check when reusing optimized probesMasami Hiramatsu
The following commit introduced a bug in one of our error paths: 819319fc9346 ("kprobes: Return error if we fail to reuse kprobe instead of BUG_ON()") it missed to handle the return value of kprobe_optready() as error-value. In reality, the kprobe_optready() returns a bool result, so "true" case must be passed instead of 0. This causes some errors on kprobe boot-time selftests on ARM: [ ] Beginning kprobe tests... [ ] Probe ARM code [ ] kprobe [ ] kretprobe [ ] ARM instruction simulation [ ] Check decoding tables [ ] Run test cases [ ] FAIL: test_case_handler not run [ ] FAIL: Test andge r10, r11, r14, asr r7 [ ] FAIL: Scenario 11 ... [ ] FAIL: Scenario 7 [ ] Total instruction simulation tests=1631, pass=1433 fail=198 [ ] kprobe tests failed This can happen if an optimized probe is unregistered and next kprobe is registered on same address until the previous probe is not reclaimed. If this happens, a hidden aggregated probe may be kept in memory, and no new kprobe can probe same address. Also, in that case register_kprobe() will return "1" instead of minus error value, which can mislead caller logic. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N . Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # v5.0+ Fixes: 819319fc9346 ("kprobes: Return error if we fail to reuse kprobe instead of BUG_ON()") Link: http://lkml.kernel.org/r/155530808559.32517.539898325433642204.stgit@devnote2 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13kprobes: Search non-suffixed symbol in blacklistMasami Hiramatsu
Newer GCC versions can generate some different instances of a function with suffixed symbols if the function is optimized and only has a part of that. (e.g. .constprop, .part etc.) In this case, it is not enough to check the entry of kprobe blacklist because it only records non-suffixed symbol address. To fix this issue, search non-suffixed symbol in blacklist if given address is within a symbol which has a suffix. Note that this can cause false positive cases if a kprobe-safe function is optimized to suffixed instance and has same name symbol which is blacklisted. But I would like to chose a fail-safe design for this issue. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154998799234.31052.6136378903570418008.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-26Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle on the kernel side: - rework kprobes blacklist handling (Masami Hiramatsu) - misc cleanups on the tooling side these areas were the main focus: - 'perf trace' enhancements (Arnaldo Carvalho de Melo) - 'perf bench' enhancements (Davidlohr Bueso) - 'perf record' enhancements (Alexey Budankov) - 'perf annotate' enhancements (Jin Yao) - 'perf top' enhancements (Jiri Olsa) - Intel hw tracing enhancements (Adrian Hunter) - ARM hw tracing enhancements (Leo Yan, Mathieu Poirier) - ... plus lots of other enhancements, cleanups and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (171 commits) tools uapi asm: Update asm-generic/unistd.h copy perf symbols: Relax checks on perf-PID.map ownership perf trace: Wire up the fadvise 'advice' table generator perf beauty: Add generator for fadvise64's 'advice' arg constants tools headers uapi: Grab a copy of fadvise.h perf beauty mmap: Print mmap's 'offset' arg in hexadecimal perf beauty mmap: Print PROT_READ before PROT_EXEC to match strace output perf trace beauty: Beautify arch_prctl()'s arguments perf trace: When showing string prefixes show prefix + ??? for unknown entries perf trace: Move strarrays to beauty.h for further reuse perf beauty: Wire up the x86_arch prctl code table generator perf beauty: Add a string table generator for x86's 'arch_prctl' codes tools include arch: Grab a copy of x86's prctl.h perf trace: Show NULL when syscall pointer args are 0 perf trace: Enclose the errno strings with () perf augmented_raw_syscalls: Copy 'access' arg as well perf trace: Add alignment spaces after the closing parens perf trace beauty: Print O_RDONLY when (flags & O_ACCMODE) == 0 perf trace: Allow asking for not suppressing common string prefixes perf trace: Add a prefix member to the strarray class ...
2018-12-17kprobes: Blacklist symbols in arch-defined prohibited areaMasami Hiramatsu
Blacklist symbols in arch-defined probe-prohibited areas. With this change, user can see all symbols which are prohibited to probe in debugfs. All archtectures which have custom prohibit areas should define its own arch_populate_kprobe_blacklist() function, but unless that, all symbols marked __kprobes are blacklisted. Reported-by: Andrea Righi <righi.andrea@gmail.com> Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/154503485491.26176.15823229545155174796.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>