path: root/arch/arm64/kernel/stacktrace.c
2024-12-12  arm64: stacktrace: Don't WARN when unwinding other tasks  (Mark Rutland)

The arm64 stacktrace code has a few error conditions where a WARN_ON_ONCE() is triggered before the stacktrace is terminated and an error is returned to the caller. The conditions shouldn't be triggered when unwinding the current task, but it is possible to trigger these when unwinding another task which is not blocked, as the stack of that task is concurrently modified. Kent reports that these warnings can be triggered while running filesystem tests on bcachefs, which calls the stacktrace code directly.

To produce a meaningful stacktrace of another task, the task in question should be blocked, but the stacktrace code is expected to be robust to cases where it is not blocked. Note that this is purely about not unduly scaring the user and/or crashing the kernel; stacktraces in such cases are meaningless and may leak kernel secrets from the stack of the task being unwound.

Ideally we'd pin the task in a blocked state during the unwind, as we do for /proc/${PID}/wchan since commit:

  42a20f86dc19f928 ("sched: Add wrapper for get_wchan() to keep task blocked")

... but a bunch of places don't do that, notably /proc/${PID}/stack, where we don't pin the task in a blocked state, but do restrict the output to privileged users since commit:

  f8a00cef17206ecd ("proc: restrict kernel stack dumps to root")

... and so it's possible to trigger these warnings accidentally, e.g. by reading /proc/*/stack (as root):

| for n in $(seq 1 10); do
|   while true; do cat /proc/*/stack > /dev/null 2>&1; done &
| done
| ------------[ cut here ]------------
| WARNING: CPU: 3 PID: 166 at arch/arm64/kernel/stacktrace.c:207 arch_stack_walk+0x1c8/0x370
| Modules linked in:
| CPU: 3 UID: 0 PID: 166 Comm: cat Not tainted 6.13.0-rc2-00003-g3dafa7a7925d #2
| Hardware name: linux,dummy-virt (DT)
| pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
| pc : arch_stack_walk+0x1c8/0x370
| lr : arch_stack_walk+0x1b0/0x370
| sp : ffff800080773890
| x29: ffff800080773930 x28: fff0000005c44500 x27: fff00000058fa038
| x26: 000000007ffff000 x25: 0000000000000000 x24: 0000000000000000
| x23: ffffa35a8d9600ec x22: 0000000000000000 x21: fff00000043a33c0
| x20: ffff800080773970 x19: ffffa35a8d960168 x18: 0000000000000000
| x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
| x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
| x8 : ffff8000807738e0 x7 : ffff8000806e3800 x6 : ffff8000806e3818
| x5 : ffff800080773920 x4 : ffff8000806e4000 x3 : ffff8000807738e0
| x2 : 0000000000000018 x1 : ffff8000806e3800 x0 : 0000000000000000
| Call trace:
|  arch_stack_walk+0x1c8/0x370 (P)
|  stack_trace_save_tsk+0x8c/0x108
|  proc_pid_stack+0xb0/0x134
|  proc_single_show+0x60/0x120
|  seq_read_iter+0x104/0x438
|  seq_read+0xf8/0x140
|  vfs_read+0xc4/0x31c
|  ksys_read+0x70/0x108
|  __arm64_sys_read+0x1c/0x28
|  invoke_syscall+0x48/0x104
|  el0_svc_common.constprop.0+0x40/0xe0
|  do_el0_svc+0x1c/0x28
|  el0_svc+0x30/0xcc
|  el0t_64_sync_handler+0x10c/0x138
|  el0t_64_sync+0x198/0x19c
| ---[ end trace 0000000000000000 ]---

Fix this by only warning when unwinding the current task. When unwinding another task the error conditions will be handled by returning an error without producing a warning.

The two warnings in kunwind_next_frame_record_meta() were added recently as part of commit:

  c2c6b27b5aa14fa2 ("arm64: stacktrace: unwind exception boundaries")

The warning when recovering the fgraph return address has changed form many times, but was originally introduced back in commit:

  9f416319f40cd857 ("arm64: fix unwind_frame() for filtered out fn for function graph tracing")

Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries")
Fixes: 9f416319f40c ("arm64: fix unwind_frame() for filtered out fn for function graph tracing")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241211140704.2498712-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
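[Editor's note] For illustration, a minimal sketch of the approach described above, with assumed names and state layout (not the exact kernel code):

  /*
   * Sketch: gate the warning on whether we are unwinding the current
   * task. For other tasks the unwind just fails quietly, and the
   * caller sees a truncated stacktrace rather than a kernel warning.
   */
  static __always_inline int kunwind_fail(const struct kunwind_state *state)
  {
          WARN_ON_ONCE(state->task == current);
          return -EINVAL;
  }

Error paths in the unwinder would then `return kunwind_fail(state);` instead of pairing a bare WARN_ON_ONCE() with the error return.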
2024-12-12  arm64: stacktrace: Skip reporting LR at exception boundaries  (Mark Rutland)

Aishwarya reports that warnings are sometimes seen when running the ftrace kselftests, e.g.

| WARNING: CPU: 5 PID: 2066 at arch/arm64/kernel/stacktrace.c:141 arch_stack_walk+0x4a0/0x4c0
| Modules linked in:
| CPU: 5 UID: 0 PID: 2066 Comm: ftracetest Not tainted 6.13.0-rc2 #2
| Hardware name: linux,dummy-virt (DT)
| pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : arch_stack_walk+0x4a0/0x4c0
| lr : arch_stack_walk+0x248/0x4c0
| sp : ffff800083643d20
| x29: ffff800083643dd0 x28: ffff00007b891400 x27: ffff00007b891928
| x26: 0000000000000001 x25: 00000000000000c0 x24: ffff800082f39d80
| x23: ffff80008003ee8c x22: ffff80008004baa8 x21: ffff8000800533e0
| x20: ffff800083643e10 x19: ffff80008003eec8 x18: 0000000000000000
| x17: 0000000000000000 x16: ffff800083640000 x15: 0000000000000000
| x14: 02a37a802bbb8a92 x13: 00000000000001a9 x12: 0000000000000001
| x11: ffff800082ffad60 x10: ffff800083643d20 x9 : ffff80008003eed0
| x8 : ffff80008004baa8 x7 : ffff800086f2be80 x6 : ffff0000057cf000
| x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800086f2b690
| x2 : ffff80008004baa8 x1 : ffff80008004baa8 x0 : ffff80008004baa8
| Call trace:
|  arch_stack_walk+0x4a0/0x4c0 (P)
|  arch_stack_walk+0x248/0x4c0 (L)
|  profile_pc+0x44/0x80
|  profile_tick+0x50/0x80 (F)
|  tick_nohz_handler+0xcc/0x160 (F)
|  __hrtimer_run_queues+0x2ac/0x340 (F)
|  hrtimer_interrupt+0xf4/0x268 (F)
|  arch_timer_handler_virt+0x34/0x60 (F)
|  handle_percpu_devid_irq+0x88/0x220 (F)
|  generic_handle_domain_irq+0x34/0x60 (F)
|  gic_handle_irq+0x54/0x140 (F)
|  call_on_irq_stack+0x24/0x58 (F)
|  do_interrupt_handler+0x88/0x98
|  el1_interrupt+0x34/0x68 (F)
|  el1h_64_irq_handler+0x18/0x28
|  el1h_64_irq+0x6c/0x70
|  queued_spin_lock_slowpath+0x78/0x460 (P)

The warning in question is:

  WARN_ON_ONCE(state->common.pc == orig_pc)

... in kunwind_recover_return_address(), which is triggered when return_to_handler() is encountered in the trace, but ftrace_graph_ret_addr() cannot find a corresponding original return address on the fgraph return stack.

This happens because the stacktrace code encounters an exception boundary where the LR was not live at the time of the exception, but the LR happens to contain return_to_handler(); either because the task recently returned there, or due to unfortunate usage of the LR as a scratch register. In such cases attempts to recover the return address via ftrace_graph_ret_addr() may fail, triggering the WARN_ON_ONCE() above and aborting the unwind (hence the stacktrace terminating after reporting the PC at the time of the exception).

Handling unreliable LR values in these cases is likely to require some larger rework, so for the moment avoid this problem by restoring the old behaviour of skipping the LR at exception boundaries, which the stacktrace code did prior to commit:

  c2c6b27b5aa14fa2 ("arm64: stacktrace: unwind exception boundaries")

This commit is effectively a partial revert, keeping the structures and logic to explicitly identify exception boundaries while still skipping reporting of the LR. The logic to explicitly identify exception boundaries is still useful for general robustness and as a building block for future support for RELIABLE_STACKTRACE.

Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241211140704.2498712-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17  arm64: stacktrace: unwind exception boundaries  (Mark Rutland)

When arm64's stack unwinder encounters an exception boundary, it uses the pt_regs::stackframe created by the entry code, which has a copy of the PC and FP at the time the exception was taken. The unwinder doesn't know anything about pt_regs, and reports the PC from the stackframe, but does not report the LR.

The LR is only guaranteed to contain the return address at function call boundaries, and can be used as a scratch register at other times, so the LR at an exception boundary may or may not be a legitimate return address. It would be useful to report the LR value regardless, as it can be helpful when debugging, and in future it will be helpful for reliable stacktrace support.

This patch changes the way we unwind across exception boundaries, allowing both the PC and LR to be reported. The entry code creates a frame_record_meta structure embedded within pt_regs, which the unwinder uses to find the pt_regs. The unwinder can then extract pt_regs::pc and pt_regs::lr as two separate unwind steps before continuing with a regular walk of frame records.

When a PC is unwound from pt_regs::lr, dump_backtrace() will log this with an "L" marker so that it can be identified easily. For example, an unwind across an exception boundary will appear as follows:

| el1h_64_irq+0x6c/0x70
| _raw_spin_unlock_irqrestore+0x10/0x60 (P)
| __aarch64_insn_write+0x6c/0x90 (L)
| aarch64_insn_patch_text_nosync+0x28/0x80

... with a (P) entry for pt_regs::pc, and an (L) entry for pt_regs::lr.

Note that the LR may be stale at the point of the exception, for example, shortly after a return:

| el1h_64_irq+0x6c/0x70
| default_idle_call+0x34/0x180 (P)
| default_idle_call+0x28/0x180 (L)
| do_idle+0x204/0x268

... where the LR points a few instructions before the current PC.

This plays nicely with all the other unwind metadata tracking. With the ftrace_graph profiler enabled globally, and kretprobes installed on generic_handle_domain_irq() and do_interrupt_handler(), a backtrace triggered by magic-sysrq + L reports:

| Call trace:
| show_stack+0x20/0x40 (CF)
| dump_stack_lvl+0x60/0x80 (F)
| dump_stack+0x18/0x28
| nmi_cpu_backtrace+0xfc/0x140
| nmi_trigger_cpumask_backtrace+0x1c8/0x200
| arch_trigger_cpumask_backtrace+0x20/0x40
| sysrq_handle_showallcpus+0x24/0x38 (F)
| __handle_sysrq+0xa8/0x1b0 (F)
| handle_sysrq+0x38/0x50 (F)
| pl011_int+0x460/0x5a8 (F)
| __handle_irq_event_percpu+0x60/0x220 (F)
| handle_irq_event+0x54/0xc0 (F)
| handle_fasteoi_irq+0xa8/0x1d0 (F)
| generic_handle_domain_irq+0x34/0x58 (F)
| gic_handle_irq+0x54/0x140 (FK)
| call_on_irq_stack+0x24/0x58 (F)
| do_interrupt_handler+0x88/0xa0
| el1_interrupt+0x34/0x68 (FK)
| el1h_64_irq_handler+0x18/0x28
| el1h_64_irq+0x6c/0x70
| default_idle_call+0x34/0x180 (P)
| default_idle_call+0x28/0x180 (L)
| do_idle+0x204/0x268
| cpu_startup_entry+0x3c/0x50 (F)
| rest_init+0xe4/0xf0
| start_kernel+0x744/0x750
| __primary_switched+0x88/0x98

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-11-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
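[Editor's note] A sketch of the shape of this metadata; field and constant names are approximate, taken from the description above rather than the exact kernel sources:

  /*
   * Sketch: the entry code embeds this at a known offset within
   * pt_regs. The unwinder recognises the boundary via the type field,
   * recovers the enclosing pt_regs, and reports pt_regs::pc then
   * pt_regs::lr as two separate steps before resuming the ordinary
   * frame-record walk.
   */
  struct frame_record_meta {
          struct frame_record record;     /* {fp, lr} stacked by entry code */
          u64 type;                       /* e.g. FRAME_META_TYPE_PT_REGS */
  };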
2024-10-17  arm64: stacktrace: report recovered PCs  (Mark Rutland)

When analysing a stacktrace it can be useful to know whether an unwound PC has been rewritten by fgraph or kretprobes, as in some situations these may be suspect or be known to be unreliable.

This patch adds flags to track when an unwind entry has recovered the PC from fgraph and/or kretprobes, and updates dump_backtrace() to log when this is the case. The flags recorded are:

  "F" - the PC was recovered from fgraph
  "K" - the PC was recovered from kretprobes

These flags are recorded and logged in addition to the original source of the unwound PC. For example, with the ftrace_graph profiler enabled globally, and kretprobes installed on generic_handle_domain_irq() and do_interrupt_handler(), a backtrace triggered by magic-sysrq + L reports:

| Call trace:
| show_stack+0x20/0x40 (CF)
| dump_stack_lvl+0x60/0x80 (F)
| dump_stack+0x18/0x28
| nmi_cpu_backtrace+0xfc/0x140
| nmi_trigger_cpumask_backtrace+0x1c8/0x200
| arch_trigger_cpumask_backtrace+0x20/0x40
| sysrq_handle_showallcpus+0x24/0x38 (F)
| __handle_sysrq+0xa8/0x1b0 (F)
| handle_sysrq+0x38/0x50 (F)
| pl011_int+0x460/0x5a8 (F)
| __handle_irq_event_percpu+0x60/0x220 (F)
| handle_irq_event+0x54/0xc0 (F)
| handle_fasteoi_irq+0xa8/0x1d0 (F)
| generic_handle_domain_irq+0x34/0x58 (F)
| gic_handle_irq+0x54/0x140 (FK)
| call_on_irq_stack+0x24/0x58 (F)
| do_interrupt_handler+0x88/0xa0
| el1_interrupt+0x34/0x68 (FK)
| el1h_64_irq_handler+0x18/0x28
| el1h_64_irq+0x64/0x68
| default_idle_call+0x34/0x180
| do_idle+0x204/0x268
| cpu_startup_entry+0x40/0x50 (F)
| rest_init+0xe4/0xf0
| start_kernel+0x744/0x750
| __primary_switched+0x80/0x90

Note that as these flags are reported next to the recovered PC value, they appear on the callers of instrumented functions. For example gic_handle_irq() has a "K" marker because generic_handle_domain_irq() was instrumented with kretprobes and had its return address rewritten.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
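[Editor's note] Conceptually, the per-entry tracking can be as small as two flags consumed by the dump code (an illustrative sketch, not the exact kernel definitions):

  /* Sketch: per-step notes on how the reported PC was recovered. */
  #define KUNWIND_RECOVERED_FGRAPH        BIT(0)  /* logged as "F" */
  #define KUNWIND_RECOVERED_KRETPROBE     BIT(1)  /* logged as "K" */

dump_backtrace() then appends the markers next to each entry, e.g. "gic_handle_irq+0x54/0x140 (FK)".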
2024-10-17  arm64: stacktrace: report source of unwind data  (Mark Rutland)

When analysing a stacktrace it can be useful to know where an unwound PC came from, as in some situations certain sources may be suspect or known to be unreliable. In future it would also be useful to track this so that certain unwind steps can be performed in a stateful manner. For example when unwinding across an exception boundary, we'd ideally unwind pt_regs::pc, then pt_regs::lr, then the next frame record.

This patch adds an enumerated set of unwind sources, tracks this during the unwind, and updates dump_backtrace() to log these for interesting unwind steps. The interesting sources recorded are:

  "C" - the PC came from the caller of an unwind function.
  "T" - the PC came from thread_saved_pc() for a blocked task.
  "P" - the PC came from a pt_regs::pc.
  "U" - the PC came from an unknown source (indicates an unwinder error).

... with nothing recorded when the PC came from a frame_record::pc as this is the vastly common case and logging this would make it difficult to spot the more interesting cases.

For example, when triggering a backtrace via magic-sysrq + L, the CPU handling the sysrq will have a backtrace whose first element is the caller (C) of dump_backtrace():

| Call trace:
| show_stack+0x18/0x30 (C)
| dump_stack_lvl+0x60/0x80
| dump_stack+0x18/0x24
| nmi_cpu_backtrace+0xfc/0x140
| ...

... and other CPUs will have a backtrace whose first element is their pt_regs::pc (P) at the instant the backtrace IPI was taken:

| Call trace:
| _raw_spin_unlock_irqrestore+0x8/0x50 (P)
| wake_up_process+0x18/0x24
| process_timeout+0x14/0x20
| call_timer_fn.isra.0+0x24/0x80
| ...

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
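[Editor's note] A sketch of what such an enumeration can look like; the names are illustrative, based on the markers listed above:

  /* Sketch: provenance of each unwound PC, tracked across the unwind. */
  enum kunwind_source {
          KUNWIND_SOURCE_FRAME,       /* frame_record::pc; common case, not logged */
          KUNWIND_SOURCE_CALLER,      /* "C": caller of an unwind function */
          KUNWIND_SOURCE_TASK,        /* "T": thread_saved_pc() of a blocked task */
          KUNWIND_SOURCE_REGS_PC,     /* "P": pt_regs::pc */
          KUNWIND_SOURCE_UNKNOWN,     /* "U": unwinder error */
  };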
2024-10-17  arm64: stacktrace: move dump_backtrace() to kunwind_stack_walk()  (Mark Rutland)

Currently dump_backtrace() can only print the PC value at each step of the unwind, as this is all the information that arch_stack_walk() passes to the dump_backtrace_entry() callback.

In future we'd like to print some additional information, such as the origin of entries (e.g. PC, LR, FP) and/or the reliability thereof. In preparation for doing so, this patch moves dump_backtrace() over to kunwind_stack_walk(), which passes the full kunwind_state to the callback.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17  arm64: use a common struct frame_record  (Mark Rutland)

Currently the signal handling code has its own struct frame_record, the definition of struct pt_regs open-codes a frame record as an array, and the kernel unwinder hard-codes frame record offsets.

Move to a common struct frame_record that can be used throughout the kernel.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241017092538.1859841-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
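[Editor's note] The shared definition is essentially the two-slot AAPCS64 frame record (a sketch consistent with the description above):

  /*
   * Sketch: an AAPCS64 frame record as stored on the stack. The frame
   * pointer (x29) points at the record, which holds the caller's frame
   * pointer and the saved return address (x30).
   */
  struct frame_record {
          u64 fp;
          u64 lr;
  };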
2024-09-05  arm64: stacktrace: fix the usage of ftrace_graph_ret_addr()  (Puranjay Mohan)

ftrace_graph_ret_addr() takes an 'idx' integer pointer that is used to optimize the stack unwinding process. arm64 currently passes `NULL` for this parameter, which stops it from utilizing these optimizations. Further, the current code for ftrace_graph_ret_addr() will just return the passed-in return address if this pointer is NULL, which will break this usage.

Pass a valid integer pointer to ftrace_graph_ret_addr(), similar to x86_64's stack unwinder.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Fixes: 29c1c24a2707 ("function_graph: Fix up ftrace_graph_ret_addr()")
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240618162342.28275-1-puranjay@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
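[Editor's note] A sketch of the change, assuming a graph_idx cursor kept in the unwinder's state as x86_64 does (names are illustrative):

  /* Before: no index, so each call rescans the fgraph return stack. */
  pc = ftrace_graph_ret_addr(tsk, NULL, pc, (unsigned long *)fp);

  /* After (sketch): a per-unwind cursor lets lookups resume in order. */
  pc = ftrace_graph_ret_addr(tsk, &state->graph_idx, pc, (unsigned long *)fp);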
2024-05-03  arm64: Add USER_STACKTRACE support  (chenqiwu)

Currently, userstacktrace is unsupported for the ftrace and uprobe tracers on arm64. This patch uses the perf_callchain_user() code as a blueprint to implement arch_stack_walk_user(), which adds userstacktrace support on arm64. Meanwhile, we can use arch_stack_walk_user() to simplify the implementation of perf_callchain_user(). This patch has been tested with the ftrace, uprobe and perf tracers profiling userstacktrace cases.

Tested-by: chenqiwu <qiwu.chen@transsion.com>
Signed-off-by: chenqiwu <qiwu.chen@transsion.com>
Link: https://lore.kernel.org/r/20231219022229.10230-1-qiwu.chen@transsion.com
Signed-off-by: Will Deacon <will@kernel.org>
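[Editor's note] For reference, the core of such a user-space frame walk looks roughly like this. A simplified sketch: no compat handling, PAC stripping, or stack-range checks are shown, and the struct name is illustrative:

  /* Sketch: one AAPCS64 frame record, read from the user stack. */
  struct user_frame {
          unsigned long fp;
          unsigned long lr;
  };

  void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
                            const struct pt_regs *regs)
  {
          unsigned long fp = regs->regs[29];

          if (!consume_entry(cookie, regs->pc))
                  return;

          while (fp && !(fp & 0x7)) {
                  struct user_frame frame;

                  /* Frame records live in user memory; copy carefully. */
                  if (copy_from_user(&frame, (void __user *)fp, sizeof(frame)))
                          break;
                  if (!consume_entry(cookie, frame.lr))
                          break;
                  /* Require strictly increasing FPs to guarantee termination. */
                  if (frame.fp <= fp)
                          break;
                  fp = frame.fp;
          }
  }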
2024-03-12  Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next  (Linus Torvalds)
Pull networking updates from Jakub Kicinski:

"Core & protocols:

- Large effort by Eric to lower rtnl_lock pressure and remove locks:
  - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock.
  - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback.
  - Remove locks / serialization in the socket diag interface.
  - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
  - Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.

Things we sprinkled into general kernel code:

- Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass).

Netfilter:

- Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership.
- Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures.

BPF:

- Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application.
- Introduce bpf_arena which is a sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it.
- Extend the BPF verifier to enable static subprog calls in spin lock critical sections.
- Support registration of struct_ops types from modules which helps projects like fuse-bpf that seek to implement a new struct_ops type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects.

Wireless:

- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.

Driver API:

- Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.

Misc:

- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type".

Drivers:

- Ethernet high-speed NICs:
  - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
  - Intel (100G, ice, idpf):
    - support E825-C devices
  - nVidia/Mellanox:
    - support devices with one port and multiple PCIe links
  - Broadcom (bnxt):
    - support n-tuple filters
    - support configuring the RSS key
  - Wangxun (ngbe/txgbe):
    - implement irq_domain for TXGBE's sub-interrupts
  - Pensando/AMD:
    - support XDP
    - optimize queue submission and wakeup handling (+17% bps)
    - optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
  - Google cloud vNIC:
    - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory
  - Synopsys (stmmac):
    - obey queueMaxSDU and implement counters required by 802.1Qbv
  - Renesas (ravb):
    - support packet checksum offload
    - suspend to RAM and runtime PM support
- Ethernet switches:
  - nVidia/Mellanox:
    - support for nexthop group statistics
  - Microchip:
    - ksz8: implement PHY loopback
    - add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
  - New driver for RENESAS FemtoClock3 Wireless clock generator.
  - Support OCP PTP cards designed and built by Adva.
- CAN:
  - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets.
  - Support for esd GmbH PCIe/402 CAN device family.
  - m_can:
    - Rx/Tx submission coalescing
    - wake on frame Rx
- WiFi:
  - Intel (iwlwifi):
    - enable signaling and payload protected A-MSDUs
    - support wider-bandwidth OFDMA
    - support for new devices
    - bump FW API to 89 for AX devices; 90 for BZ/SC devices
  - MediaTek (mt76):
    - mt7915: newer ADIE version support
    - mt7925: radio temperature sensor support
  - Qualcomm (ath11k):
    - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power (SP) and Very Low Power (VLP)
    - QCA6390 & WCN6855: support 2 concurrent station interfaces
    - QCA2066 support
  - Qualcomm (ath12k):
    - refactoring in preparation for Multi-Link Operation (MLO) support
    - 1024 Block Ack window size support
    - firmware-2.bin support
    - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
    - QCN9274: support split-PHY devices
    - WCN7850: enable Power Save Mode in station mode
    - WCN7850: P2P support
  - RealTek:
    - rtw88: support for more rtw8811cu and rtw8821cu devices
    - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
    - rtlwifi: speed up USB firmware initialization
    - rtl8xxxu:
      - RTL8188F: concurrent interface support
      - Channel Switch Announcement (CSA) support in AP mode
  - Broadcom (brcmfmac):
    - per-vendor feature support
    - per-vendor SAE password setup
    - DMI nvram filename quirk for ACEPC W5 Pro"

* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
  nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
  nexthop: Fix out-of-bounds access during attribute validation
  nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
  nexthop: Only parse NHA_OP_FLAGS for get messages that require it
  bpf: move sleepable flag from bpf_prog_aux to bpf_prog
  bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
  selftests/bpf: Add kprobe multi triggering benchmarks
  ptp: Move from simple ida to xarray
  vxlan: Remove generic .ndo_get_stats64
  vxlan: Do not alloc tstats manually
  devlink: Add comments to use netlink gen tool
  nfp: flower: handle acti_netdevs allocation failure
  net/packet: Add getsockopt support for PACKET_COPY_THRESH
  net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
  selftests/bpf: Add bpf_arena_htab test.
  selftests/bpf: Add bpf_arena_list test.
  selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
  bpf: Add helper macro bpf_addr_space_cast()
  libbpf: Recognize __arena global variables.
  bpftool: Recognize arena map type
  ...
2024-03-04  arm64: prohibit probing on arch_kunwind_consume_entry()  (Puranjay Mohan)

Mark arch_kunwind_consume_entry() as __always_inline, otherwise the compiler might not inline it and allow attaching probes to it.

Without this, just probing arch_kunwind_consume_entry() via <tracefs>/kprobe_events will crash the kernel on arm64. The crash can be reproduced using the following compiler and kernel combination:

  clang version 19.0.0git (https://github.com/llvm/llvm-project.git d68d29516102252f6bf6dc23fb22cef144ca1cb3)
  commit 87adedeba51a ("Merge tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

[root@localhost ~]# echo 'p arch_kunwind_consume_entry' > /sys/kernel/debug/tracing/kprobe_events
[root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable

Modules linked in: aes_ce_blk aes_ce_cipher ghash_ce sha2_ce virtio_net sha256_arm64 sha1_ce arm_smccc_trng net_failover failover virtio_mmio uio_pdrv_genirq uio sch_fq_codel dm_mod dax configfs
CPU: 3 PID: 1405 Comm: bash Not tainted 6.8.0-rc6+ #14
Hardware name: linux,dummy-virt (DT)
pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kprobe_breakpoint_handler+0x17c/0x258
lr : kprobe_breakpoint_handler+0x17c/0x258
sp : ffff800085d6ab60
x29: ffff800085d6ab60 x28: ffff0000066f0040 x27: ffff0000066f0b20
x26: ffff800081fa7b0c x25: 0000000000000002 x24: ffff00000b29bd18
x23: ffff00007904c590 x22: ffff800081fa6590 x21: ffff800081fa6588
x20: ffff00000b29bd18 x19: ffff800085d6ac40 x18: 0000000000000079
x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000004
x14: ffff80008277a940 x13: 0000000000000003 x12: 0000000000000003
x11: 00000000fffeffff x10: c0000000fffeffff x9 : aa95616fdf80cc00
x8 : aa95616fdf80cc00 x7 : 205d343137373231 x6 : ffff800080fb48ec
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff800085d6a910 x0 : 0000000000000079
Call trace:
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_kunwind_consume_entry, .offset = 0, .addr = arch_kunwind_consume_entry+0x0/0x40
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_kunwind_consume_entry, .offset = 0, .addr = arch_kunwind_consume_entry+0x0/0x40

Fixes: 1aba06e7b2b4 ("arm64: stacktrace: factor out kunwind_stack_walk()")
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240229231620.24846-1-puranjay12@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
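[Editor's note] The fix amounts to forcing inlining so that no separately probeable symbol exists. A sketch of the shape; the callback simply forwards the unwound PC:

  /*
   * Always inlined into the noinstr unwind code, so there is no
   * out-of-line symbol for kprobes to attach to.
   */
  static __always_inline bool
  arch_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
  {
          struct kunwind_consume_entry_data *data = cookie;

          return data->consume_entry(data->cookie, state->common.pc);
  }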
2024-02-27  arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT  (Puranjay Mohan)

This will be used by bpf_throw() to unwind until the program marked as the exception boundary, and to run the callback with the stack of the main program. This is required for supporting BPF exceptions on ARM64.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240201125225.72796-2-puranjay12@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
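[Editor's note] The interface is a thin adapter from the kernel unwinder onto a BPF-provided callback that wants (ip, sp, fp) triples. A sketch under assumed names, built on kunwind_stack_walk() from the commit below:

  struct bpf_unwind_consume_entry_data {
          bool (*consume_entry)(void *cookie, u64 ip, u64 sp, u64 fp);
          void *cookie;
  };

  static bool
  arch_bpf_unwind_consume_entry(const struct kunwind_state *state, void *cookie)
  {
          struct bpf_unwind_consume_entry_data *data = cookie;

          /* Sketch: the arm64 unwinder does not track SP, so pass 0. */
          return data->consume_entry(data->cookie, state->common.pc, 0,
                                     state->common.fp);
  }

  void arch_bpf_stack_walk(bool (*consume_entry)(void *, u64, u64, u64),
                           void *cookie)
  {
          struct bpf_unwind_consume_entry_data data = {
                  .consume_entry = consume_entry,
                  .cookie = cookie,
          };

          kunwind_stack_walk(arch_bpf_unwind_consume_entry, &data, current, NULL);
  }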
2023-12-11  arm64: stacktrace: factor out kunwind_stack_walk()  (Mark Rutland)

Currently arm64 uses the generic arch_stack_walk() interface for all stack walking code. This only passes a PC value and cookie to the unwind callback, whereas we'd like to pass some additional information in some cases. For example, the BPF exception unwinder wants the FP, for reliable stacktrace we'll want to perform additional checks on other portions of unwind state, and we'd like to expand the information printed by dump_backtrace() to include provenance and reliability information.

As preparation for all of the above, this patch factors the core unwind logic out of arch_stack_walk() and into a new kunwind_stack_walk() function which provides all of the unwind state to a callback function. The existing arch_stack_walk() interface is implemented atop this.

The kunwind_stack_walk() function is intended to be a private implementation detail of unwinders in stacktrace.c, and not something to be exported generally to kernel code. It is __always_inline'd into its caller so that neither it nor its caller appears in stacktraces (which is the existing/required behavior for arch_stack_walk() and friends) and so that the compiler can optimize away some of the indirection.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Puranjay Mohan <puranjay12@gmail.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20231124110511.2795958-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
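[Editor's note] A sketch of the resulting layering (signatures approximate):

  /* Private: the callback sees the full unwind state, not just a PC. */
  typedef bool (*kunwind_consume_fn)(const struct kunwind_state *state,
                                     void *cookie);

  static __always_inline void
  kunwind_stack_walk(kunwind_consume_fn consume_state, void *cookie,
                     struct task_struct *task, struct pt_regs *regs);

  /* The generic interface is then a thin wrapper that extracts the PC. */
  noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry,
                                        void *cookie, struct task_struct *task,
                                        struct pt_regs *regs)
  {
          struct kunwind_consume_entry_data data = {
                  .consume_entry = consume_entry,
                  .cookie = cookie,
          };

          kunwind_stack_walk(arch_kunwind_consume_entry, &data, task, regs);
  }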
2023-12-11  arm64: stacktrace: factor out kernel unwind state  (Mark Rutland)

On arm64 we share some unwinding code between the regular kernel unwinder and the KVM hyp unwinder. Some of this common code only matters to the regular unwinder, e.g. the `kr_cur` and `task` fields in the common struct unwind_state.

We're likely to add more state which only matters for regular kernel unwinding (or only for hyp unwinding). In preparation for such changes, this patch factors out the kernel-specific state into a new struct kunwind_state, and updates the kernel unwind code accordingly.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Puranjay Mohan <puranjay12@gmail.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Puranjay Mohan <puranjay12@gmail.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20231124110511.2795958-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
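[Editor's note] A sketch of the split, with field placement per the description above:

  /* Sketch: kernel-specific unwind state wrapping the shared core. */
  struct kunwind_state {
          struct unwind_state common;     /* fp, pc, stack bookkeeping */
          struct task_struct *task;       /* task being unwound */
  #ifdef CONFIG_KRETPROBES
          struct llist_node *kr_cur;      /* kretprobe instance cursor */
  #endif
  };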
2023-04-13  arm64: use XPACLRI to strip PAC  (Mark Rutland)

Currently we strip the PAC from pointers using C code, which requires generating bitmasks, and conditionally clearing/setting bits depending on bit 55. We can do better by using XPACLRI directly.

When the logic was originally written to strip PACs from user pointers, contemporary toolchains used for the kernel had assemblers which were unaware of the PAC instructions. As stripping the PAC from userspace pointers required unconditional clearing of a fixed set of bits (which could be performed with a single instruction), it was simpler to implement the masking in C than it was to make use of XPACI or XPACLRI.

When support for in-kernel pointer authentication was added, the stripping logic was extended to cover TTBR1 pointers, requiring several instructions to handle whether to clear/set bits dependent on bit 55 of the pointer.

This patch simplifies the stripping of PACs by using XPACLRI directly, as contemporary toolchains do within __builtin_return_address(). This saves a number of instructions, especially where __builtin_return_address() does not implicitly strip the PAC but is heavily used (e.g. with tracepoints). As the kernel might be compiled with an assembler without knowledge of XPACLRI, it is assembled using the 'HINT #7' alias, which results in an identical opcode.

At the same time, I've split ptrauth_strip_insn_pac() into ptrauth_strip_user_insn_pac() and ptrauth_strip_kernel_insn_pac() helpers so that we can avoid unnecessary PAC stripping when pointer authentication is not in use in userspace or kernel respectively.

The underlying xpaclri() macro uses inline assembly which clobbers x30. The clobber causes the compiler to save/restore the original x30 value in a frame record (protected with PACIASP and AUTIASP when in-kernel authentication is enabled), so this does not provide a gadget to alter the return address. Similarly this does not adversely affect unwinding due to the presence of the frame record.

The ptrauth_user_pac_mask() and ptrauth_kernel_pac_mask() are exported from the kernel in ptrace and core dumps, so these are retained. A subsequent patch will move them out of <asm/compiler.h>.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230412160134.306148-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
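[Editor's note] A sketch of the primitive, simplified from the description above (the real definition is a macro in the arm64 headers):

  /*
   * Sketch: strip a PAC using XPACLRI. The instruction is assembled as
   * "HINT #7" (an identical opcode) so that assemblers without PAC
   * support still accept it. XPACLRI operates only on x30, hence the
   * register constraint; as described above, the compiler's
   * save/restore of x30 goes through a normal frame record, so this is
   * not a gadget for altering return addresses.
   */
  static __always_inline u64 xpaclri(u64 ptr)
  {
          register u64 x30 asm("x30") = ptr;

          asm("hint #7" : "+r"(x30));     /* XPACLRI */

          return x30;
  }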
2023-04-11  arm64: stacktrace: always inline core stacktrace functions  (Mark Rutland)

The arm64 stacktrace code can be used in kprobe context, and so cannot be safely probed. Some (but not all) of the unwind functions are annotated with `NOKPROBE_SYMBOL()` to ensure this, with others marked as `__always_inline`, relying on the top-level unwind function being marked as `noinstr`.

This patch has stacktrace.c consistently mark the internal stacktrace functions as `__always_inline`, removing the need for NOKPROBE_SYMBOL() as the top-level unwind function (arch_stack_walk()) is marked as `noinstr`. This is more consistent and is a simpler pattern to follow for future additions to stacktrace.c.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230411162943.203199-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-11  arm64: stacktrace: move dump functions to end of file  (Mark Rutland)

For historical reasons, the backtrace dumping functions are placed in the middle of stacktrace.c, despite using functions defined later. For clarity, and to make subsequent refactoring easier, move the dumping functions to the end of stacktrace.c.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230411162943.203199-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-04-11  arm64: stacktrace: recover return address for first entry  (Mark Rutland)

The function which calls the top-level backtracing function may have been instrumented with ftrace and/or kprobes, and hence the first return address may have been rewritten.

Factor out the existing fgraph / kretprobes address recovery, and use this for the first address. As the comment for the fgraph case isn't all that helpful, I've also dropped that.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230411162943.203199-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2023-01-16  arm64: efi: Account for the EFI runtime stack in stack unwinder  (Ard Biesheuvel)

The EFI runtime services run from a dedicated stack now, and so the stack unwinder needs to be informed about this.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-12-05  arm64: Prohibit instrumentation on arch_stack_walk()  (Masami Hiramatsu (Google))

Mark arch_stack_walk() as noinstr instead of notrace, and inline functions called from arch_stack_walk() as __always_inline, so that users cannot put any instrumentation on them, because this function can be used from return_address(), which is used by lockdep.

Without this, if the kernel is built with CONFIG_LOCKDEP=y, just probing arch_stack_walk() via <tracefs>/kprobe_events will crash the kernel on arm64.

# echo p arch_stack_walk >> ${TRACEFS}/kprobe_events
# echo 1 > ${TRACEFS}/events/kprobes/enable

kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_stack_walk, .offset = 0, .addr = arch_stack_walk+0x0/0x1c0
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_stack_walk, .offset = 0, .addr = arch_stack_walk+0x0/0x1c0
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
PREEMPT SMP
Modules linked in:
CPU: 0 PID: 17 Comm: migration/0 Tainted: G N 6.1.0-rc5+ #6
Hardware name: linux,dummy-virt (DT)
Stopper: 0x0 <- 0x0
pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : kprobe_breakpoint_handler+0x178/0x17c
lr : kprobe_breakpoint_handler+0x178/0x17c
sp : ffff8000080d3090
x29: ffff8000080d3090 x28: ffff0df5845798c0 x27: ffffc4f59057a774
x26: ffff0df5ffbba770 x25: ffff0df58f420f18 x24: ffff49006f641000
x23: ffffc4f590579768 x22: ffff0df58f420f18 x21: ffff8000080d31c0
x20: ffffc4f590579768 x19: ffffc4f590579770 x18: 0000000000000006
x17: 5f6b636174735f68 x16: 637261203d207264 x15: 64612e202c30203d
x14: 2074657366666f2e x13: 30633178302f3078 x12: 302b6b6c61775f6b
x11: 636174735f686372 x10: ffffc4f590dc5bd8 x9 : ffffc4f58eb31958
x8 : 00000000ffffefff x7 : ffffc4f590dc5bd8 x6 : 80000000fffff000
x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff0df5845798c0 x0 : 0000000000000064
Call trace:
kprobes: Failed to recover from reentered kprobes.
kprobes: Dump kprobe:
.symbol_name = arch_stack_walk, .offset = 0, .addr = arch_stack_walk+0x0/0x1c0
------------[ cut here ]------------
kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!

Fixes: 39ef362d2d45 ("arm64: Make return_address() use arch_stack_walk()")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/166994751368.439920.3236636557520824664.stgit@devnote3
Signed-off-by: Will Deacon <will@kernel.org>
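[Editor's note] The resulting shape, as a sketch (not the exact code):

  /* Inlined into its noinstr caller; never a probeable symbol itself. */
  static __always_inline void
  unwind(struct unwind_state *state, stack_trace_consume_fn consume_entry,
         void *cookie)
  {
          /* ... core unwind loop ... */
  }

  /* The only out-of-line entry point; noinstr forbids instrumentation. */
  noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry,
                               void *cookie, struct task_struct *task,
                               struct pt_regs *regs)
  {
          /* ... initialize state, then unwind(&state, ...) ... */
  }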
2022-09-09  arm64: stacktrace: track hyp stacks in unwinder's address space  (Mark Rutland)

Currently unwind_next_frame_record() has an optional callback to convert the address space of the FP. This is necessary for the NVHE unwinder, which tracks the stacks in the hyp VA space, but accesses the frame records in the kernel VA space.

This is a bit unfortunate since it clutters unwind_next_frame_record(), which will get in the way of future rework.

Instead, this patch changes the NVHE unwinder to track the stacks in the kernel's VA space and translate the FP prior to calling unwind_next_frame_record(). This removes the need for the translate_fp() callback, as all unwinders consistently track stacks in the native address space of the unwinder.

At the same time, this patch consolidates the generation of the stack addresses behind the stackinfo_get_*() helpers.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09  arm64: stacktrace: track all stack boundaries explicitly  (Mark Rutland)

Currently we call an on_accessible_stack() callback for each step of the unwinder, requiring redundant work to be performed in the core of the unwind loop (e.g. disabling preemption around accesses to per-cpu variables containing stack boundaries). To prevent unwind loops which go through a stack multiple times, we have to track the set of unwound stacks, requiring a stack_type enum which needs to cater for all the stacks of all possible callees. To prevent loops within a stack, we must track the prior FP values.

This patch reworks the unwinder to minimize the work in the core of the unwinder, and to remove the need for the stack_type enum. The set of accessible stacks (and their boundaries) are determined at the start of the unwind, and the current stack is tracked during the unwind, with completed stacks removed from the set of accessible stacks. This makes the boundary checks more accurate (e.g. detecting overlapped frame records), and removes the need for separate tracking of the prior FP and visited stacks.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09  arm64: stacktrace: rework stack boundary discovery  (Mark Rutland)

In subsequent patches we'll want to acquire the stack boundaries ahead-of-time, and we'll need to be able to acquire the relevant stack_info regardless of whether we have an object that happens to be on the stack.

This patch replaces the on_XXX_stack() helpers with stackinfo_get_XXX() helpers, with the caller being responsible for checking whether an object is on a relevant stack. For the moment this is moved into the on_accessible_stack() functions, making these slightly larger; subsequent patches will remove the on_accessible_stack() functions and simplify the logic.

The on_irq_stack() and on_task_stack() helpers are kept as these are used by IRQ entry sequences and stackleak respectively. As they're only used as predicates, the stack_info pointer parameter is removed in both cases.

As the on_accessible_stack() functions are always passed a non-NULL info pointer, these now update info unconditionally. When updating the type to STACK_TYPE_UNKNOWN, the low/high bounds are also modified, but as these will not be consumed this should have no adverse effect.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
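[Editor's note] For example, the task stack helper can be written as follows (a sketch matching the description above):

  /*
   * Sketch: return the boundaries of a task's stack. Callers, not the
   * helper, decide whether a given object lies within those bounds.
   */
  static inline struct stack_info stackinfo_get_task(const struct task_struct *tsk)
  {
          unsigned long low = (unsigned long)task_stack_page(tsk);
          unsigned long high = low + THREAD_SIZE;

          return (struct stack_info) {
                  .low = low,
                  .high = high,
          };
  }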
2022-09-09  arm64: stacktrace: move SDEI stack helpers to stacktrace code  (Mark Rutland)

For clarity and ease of maintenance, it would be helpful for all the stack helpers to be in the same place. Move the SDEI stack helpers into the stacktrace code where all the other stack helpers live.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09  arm64: stacktrace: rename unwind_next_common() -> unwind_next_frame_record()  (Mark Rutland)

The unwind_next_common() function unwinds a single frame record. There are other unwind steps (e.g. unwinding through trampolines) which are handled in the regular kernel unwinder, and in future there may be other common unwind helpers.

Clarify the purpose of unwind_next_common() by renaming it to unwind_next_frame_record(). At the same time, add commentary, and delete the redundant comment at the top of asm/stacktrace/common.h.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-09  arm64: stacktrace: simplify unwind_next_common()  (Mark Rutland)

Currently unwind_next_common() takes a pointer to a stack_info which is only ever used within unwind_next_common(). Make it a local variable and simplify callers.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220901130646.1316937-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-07-27  KVM: arm64: Make unwind()/on_accessible_stack() per-unwinder functions  (Marc Zyngier)

Having multiple versions of on_accessible_stack() (one per unwinder) makes it very hard to reason about what is used where due to the complexity of the various includes, the forward declarations, and the reliance on everything being 'inline'.

Instead, move the code back where it should be. Each unwinder implements:

- on_accessible_stack() as well as the helpers it depends on,
- unwind()/unwind_next(), as they pass on_accessible_stack as a parameter to unwind_next_common() (which is the only common code here)

This hardly results in any duplication, and makes it much easier to reason about the code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20220727142906.1856759-4-maz@kernel.org
2022-07-26  arm64: stacktrace: Factor out common unwind()  (Kalesh Singh)

Move unwind() to stacktrace/common.h and, as a result, the kernel unwind_next() to asm/stacktrace.h. This allows reusing unwind() in the implementation of the nVHE HYP stack unwinder, later in the series.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-6-kaleshsingh@google.com
2022-07-26  arm64: stacktrace: Handle frame pointer from different address spaces  (Kalesh Singh)

The unwinder code is made reusable so that it can be used to unwind various types of stacks. One use case is unwinding the nVHE hyp stack from the host (EL1) in non-protected mode. This means that the unwinder must be able to translate HYP stack addresses to kernel addresses.

Add a callback (stack_trace_translate_fp_fn) to allow specifying the translation function.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-5-kaleshsingh@google.com
2022-07-26  arm64: stacktrace: Factor out unwind_next_common()  (Kalesh Singh)

Move common unwind_next logic to stacktrace/common.h. This allows reusing the code in the implementation of the nVHE hypervisor stack unwinder, later in this series.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-4-kaleshsingh@google.com
2022-07-26  arm64: stacktrace: Add shared header for common stack unwinding code  (Kalesh Singh)

In order to reuse the arm64 stack unwinding logic for the nVHE hypervisor stack, move the common code to a shared header (arch/arm64/include/asm/stacktrace/common.h).

The nVHE hypervisor cannot safely link against kernel code, so we make use of the shared header to avoid duplicated logic later in this series.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220726073750.3219117-2-kaleshsingh@google.com
2022-06-27  arm64: Copy the task argument to unwind_state  (Madhavan T. Venkataraman)

Copy the task argument passed to arch_stack_walk() to unwind_state so that it can be passed to unwind functions via unwind_state rather than as a separate argument. The task is a fundamental part of the unwind state.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220617180219.20352-3-madvenka@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-06-27  arm64: Split unwind_init()  (Madhavan T. Venkataraman)

unwind_init() is currently a single function that initializes all of the unwind state. Split it into the following functions and call them appropriately:

- unwind_init_from_regs() - initialize from regs passed by caller.
- unwind_init_from_caller() - initialize for the current task from the caller of arch_stack_walk().
- unwind_init_from_task() - initialize from the saved state of a task other than the current task. In this case, the other task must not be running.

This is done for two reasons:

- the different ways of initializing are clear
- specialized code can be added to each initializer in the future.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220617180219.20352-2-madvenka@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
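[Editor's note] A sketch of the resulting initializers (bodies abbreviated; exact field updates are per the real patch):

  /* Sketch: setup shared by all three specialized initializers. */
  static void unwind_init_common(struct unwind_state *state,
                                 struct task_struct *task);

  static inline void unwind_init_from_regs(struct unwind_state *state,
                                           struct pt_regs *regs)
  {
          unwind_init_common(state, current);
          state->fp = regs->regs[29];
          state->pc = regs->pc;
  }

  static __always_inline void unwind_init_from_caller(struct unwind_state *state)
  {
          unwind_init_common(state, current);
          state->fp = (unsigned long)__builtin_frame_address(1);
          state->pc = (unsigned long)__builtin_return_address(0);
  }

  /* The task must be blocked; its saved FP/PC are otherwise stale. */
  static inline void unwind_init_from_task(struct unwind_state *state,
                                           struct task_struct *task)
  {
          unwind_init_common(state, task);
          state->fp = thread_saved_fp(task);
          state->pc = thread_saved_pc(task);
  }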
2022-06-23  arm64: stacktrace: use non-atomic __set_bit  (Andrey Konovalov)

Use the non-atomic version of set_bit() in arch/arm64/kernel/stacktrace.c, as there are no concurrent accesses to frame->prev_type. This speeds up stack trace collection and improves the boot time of Generic KASAN by 2-5%.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/23dfa36d1cc91e4a1059945b7834eac22fb9854d.1653317461.git.andreyknvl@google.com
Signed-off-by: Will Deacon <will@kernel.org>
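[Editor's note] The change itself is a one-line substitution, sketched below in diff form (identifiers per the commit message; the unwind state is only ever touched by the unwinding thread, so the locked read-modify-write of the atomic bitop buys nothing):

  - set_bit(frame->prev_type, frame->stacks_done);
  + __set_bit(frame->prev_type, frame->stacks_done);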
2022-06-23  arm64: kasan: do not instrument stacktrace.c  (Andrey Konovalov)

Disable KASAN instrumentation of arch/arm64/kernel/stacktrace.c. This speeds up Generic KASAN by 5-20%. As a side-effect, KASAN is now unable to detect bugs in the stack trace collection code. This is taken as an acceptable downside.

Also replace READ_ONCE_NOCHECK() with READ_ONCE() in stacktrace.c. As the file is now not instrumented, there is no need to use the NOCHECK version of READ_ONCE().

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Link: https://lore.kernel.org/r/c4c944a2a905e949760fbeb29258185087171708.1653317461.git.andreyknvl@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-22arm64: stacktrace: align with common namingMadhavan T. Venkataraman
For historical reasons, the naming of parameters and their types in the arm64 stacktrace code differs from that used in generic code and other architectures, even though the types are equivalent. For consistency and clarity, use the generic names. There should be no functional change as a result of this patch. Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series. Link: https://lore.kernel.org/r/20220413145910.3060139-7-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22arm64: stacktrace: rename stackframe to unwind_stateMadhavan T. Venkataraman
Rename "struct stackframe" to "struct unwind_state" for consistency and better naming. Accordingly, rename variable/argument "frame" to "state". There should be no functional change as a result of this patch. Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series. Link: https://lore.kernel.org/r/20220413145910.3060139-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22arm64: stacktrace: rename unwinder functionsMadhavan T. Venkataraman
Rename unwinder functions for consistency and better naming. - Rename start_backtrace() to unwind_init(). - Rename unwind_frame() to unwind_next(). - Rename walk_stackframe() to unwind(). There should be no functional change as a result of this patch. Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series. Link: https://lore.kernel.org/r/20220413145910.3060139-5-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22arm64: stacktrace: make struct stackframe private to stacktrace.cMark Rutland
Now that arm64 uses arch_stack_walk() consistently, struct stackframe is only used within stacktrace.c. To make it easier to read and maintain this code, it would be nicer if the definition were there too. Move the definition into stacktrace.c. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series. Link: https://lore.kernel.org/r/20220413145910.3060139-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22arm64: stacktrace: delete PCS commentMark Rutland
The comment at the top of stacktrace.c isn't all that helpful, as it's not associated with the code which inspects the frame record, and the code example isn't representative of common code generation today. Delete it. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series. Link: https://lore.kernel.org/r/20220413145910.3060139-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-22arm64: stacktrace: remove NULL task check from unwind_frame()Madhavan T. Venkataraman
Currently, there is a check for a NULL task in unwind_frame(). It is not needed since all current callers pass a non-NULL task. There should be no functional change as a result of this patch. Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Kalesh Singh <kaleshsingh@google.com> for the series. Link: https://lore.kernel.org/r/20220413145910.3060139-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
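The removed fallback looked roughly like this (a sketch, assuming the function's shape at the time):

| /* All callers now pass a non-NULL task, so the fallback goes away: */
| - if (!tsk)
| -       tsk = current;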
2022-01-24arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOLMasami Hiramatsu
Mark start_backtrace() as notrace and NOKPROBE_SYMBOL because this function is called from ftrace and lockdep to get the caller address via return_address(). Since lockdep is used by kprobes, it should also be NOKPROBE_SYMBOL. Fixes: b07f3499661c ("arm64: stacktrace: Move start_backtrace() out of the header") Cc: <stable@vger.kernel.org> # 5.13.x Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/164301227374.1433152.12808232644267107415.stgit@devnote2 Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
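A sketch of the shape of the fix (the function's exact signature at the time is assumed):

| static void notrace start_backtrace(struct stackframe *frame,
|                                     unsigned long fp, unsigned long pc)
| {
|         /* body unchanged */
| }
| NOKPROBE_SYMBOL(start_backtrace);      /* keep kprobes (and lockdep) away */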
2021-12-10arm64: Make some stacktrace functions privateMark Rutland
Now that open-coded stack unwinds have been converted to arch_stack_walk(), we no longer need to expose any of unwind_frame(), walk_stackframe(), or start_backtrace() outside of stacktrace.c. Make those functions private to stacktrace.c, removing their prototypes from <asm/stacktrace.h> and marking them static. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Link: https://lore.kernel.org/r/20211129142849.3056714-10-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10arm64: Make dump_backtrace() use arch_stack_walk()Madhavan T. Venkataraman
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to substantially rework arm64's unwinding code. As part of this, we want to minimize the set of unwind interfaces we expose, and avoid open-coding of unwind logic. Currently, dump_backtrace() walks the stack of the current task or a blocked task by calling start_backtrace() and iterating unwind steps using unwind_frame(). This can be written more simply in terms of arch_stack_walk(), considering three distinct cases:

1) When unwinding a blocked task, start_backtrace() is called with the blocked task's saved PC and FP, and the unwind proceeds immediately from this point without skipping any entries. This is functionally equivalent to calling arch_stack_walk() with the blocked task, which will start with the task's saved PC and FP. There is no functional change to this case.

2) When unwinding the current task without regs, start_backtrace() is called with dump_backtrace() as the PC and __builtin_frame_address(0) as the next frame, and the unwind proceeds immediately without skipping. This is *almost* functionally equivalent to calling arch_stack_walk() for the current task, which will start with its caller (i.e. an offset into dump_backtrace()) as the PC, and the caller's frame record as the next frame. The only difference is that dump_backtrace() will be reported with an offset (which is strictly more correct than currently). Otherwise there is no functional change to this case.

3) When unwinding the current task with regs, start_backtrace() is called with dump_backtrace() as the PC and __builtin_frame_address(0) as the next frame, and the unwind is performed silently until the next frame is the frame pointed to by regs->fp. Reporting starts from regs->pc and continues from the frame in regs->fp. Historically, this pre-unwind was necessary to correctly record return addresses rewritten by the ftrace graph caller, but this is no longer necessary as these are now recovered using the FP since commit: c6d3cd32fd0064af ("arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR") This pre-unwind is not necessary to recover return addresses rewritten by kretprobes, which historically were not recovered, and are now recovered using the FP since commit: cd9bc2c9258816dc ("arm64: Recover kretprobe modified return address in stacktrace") Thus, this is functionally equivalent to calling arch_stack_walk() with the current task and regs, which will start with regs->pc as the PC and regs->fp as the next frame, without a pre-unwind.

This patch makes dump_backtrace() use arch_stack_walk(). This simplifies dump_backtrace() and will permit subsequent changes to the unwind code. Aside from the improved reporting when unwinding current without regs, there should be no functional change as a result of this patch. Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> [Mark: elaborate commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211129142849.3056714-9-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
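An illustrative sketch of the simplified shape (callback name and surrounding checks assumed; arch_stack_walk()'s signature is the generic one):

| static bool dump_backtrace_entry(void *arg, unsigned long where)
| {
|         char *loglvl = arg;
|
|         printk("%s %pSb\n", loglvl, (void *)where);
|         return true;
| }
|
| void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
|                     const char *loglvl)
| {
|         /* task/regs validation elided */
|         printk("%sCall trace:\n", loglvl);
|         arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);
| }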
2021-12-10arch: Make ARCH_STACKWALK independent of STACKTRACEPeter Zijlstra
Make arch_stack_walk() available for ARCH_STACKWALK architectures without it being entangled in STACKTRACE. Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [Mark: rebase, drop unnecessary arm change] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-11-16arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTRMark Rutland
When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph tracer is in use, unwind_frame() may erroneously associate a traced function with an incorrect return address. This can happen when starting an unwind from a pt_regs, or when unwinding across an exception boundary. This can be seen when recording with perf while the function graph tracer is in use. For example:

| # echo function_graph > /sys/kernel/debug/tracing/current_tracer
| # perf record -g -e raw_syscalls:sys_enter:k /bin/true
| # perf report

... reports the callchain erroneously as:

| el0t_64_sync
| el0t_64_sync_handler
| el0_svc_common.constprop.0
| perf_callchain
| get_perf_callchain
| syscall_trace_enter
| syscall_trace_enter

... whereas when the function graph tracer is not in use, it reports:

| el0t_64_sync
| el0t_64_sync_handler
| el0_svc
| do_el0_svc
| el0_svc_common.constprop.0
| syscall_trace_enter
| syscall_trace_enter

The underlying problem is that ftrace_graph_get_ret_stack() takes an index offset from the most recent entry added to the fgraph return stack. We start an unwind at offset 0, and increment the offset each time we encounter a rewritten return address (i.e. when we see `return_to_handler`). This is broken in two cases:

1) Between creating a pt_regs and starting the unwind, function calls may place entries on the stack, leaving an arbitrary offset which we can only determine by performing a full unwind from the caller of the unwind code (and relying on none of the unwind code being instrumented). This can result in erroneous entries being reported in a backtrace recorded by perf or kfence when the function graph tracer is in use. Currently show_regs() is unaffected as dump_backtrace() performs an initial unwind.

2) When unwinding across an exception boundary (whether continuing an unwind or starting a new unwind from regs), we currently always skip the LR of the interrupted context. Where this was live and contained a rewritten address, we won't consume the corresponding fgraph ret stack entry, leaving subsequent entries off-by-one. This can result in erroneous entries being reported in a backtrace performed by any in-kernel unwinder when that backtrace crosses an exception boundary, with entries after the boundary being reported incorrectly. This includes perf, kfence, show_regs(), panic(), etc.

To fix this, we need to be able to uniquely identify each rewritten return address such that we can map this back to the original return address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate each rewritten return address with a unique location on the stack. As the return address is passed in the LR (and so is not guaranteed a unique location in memory), we use the FP upon entry to the function (i.e. the address of the caller's frame record) as the return address pointer. Any nested call will have a different FP value as the caller must create its own frame record and update FP to point to this.

Since ftrace_graph_ret_addr() requires the return address with the PAC stripped, the stripping of the PAC is moved before the fixup of the rewritten address. As we would unconditionally strip the PAC, moving this earlier is not harmful, and we can avoid a redundant strip in the return address fixup code.

I've tested this with the perf case above, the ftrace selftests, and a number of ad-hoc unwinder tests. The tests all pass, and I have seen no unexpected behaviour as a result of this change. I've tested with pointer authentication under QEMU TCG where magic-sysrq+l correctly recovers the original return addresses.

Note that this doesn't fix the issue of skipping a live LR at an exception boundary, which is a more general problem and requires more substantial rework. Were we to consume the LR in all cases this would result in warnings where the interrupted context's LR contains `return_to_handler`, but the FP has been altered, e.g.

| func:
|   <--- ftrace entry --->   // logs FP & LR, rewrites LR
|   STP FP, LR, [SP, #-16]!
|   MOV FP, SP
|   <--- INTERRUPT --->

... as ftrace_graph_get_ret_stack() will not find a matching entry, triggering the WARN_ON_ONCE() in unwind_frame().

Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
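The fixup in unwind_frame() then looks roughly like this (a sketch close to the description above; frame->fp is the caller's frame-record address, used as the unique return address pointer):

| #ifdef CONFIG_FUNCTION_GRAPH_TRACER
|         if (tsk->ret_stack &&
|             (frame->pc == (unsigned long)return_to_handler)) {
|                 unsigned long orig_pc;
|
|                 /* Map the rewritten address back via the unique FP "key". */
|                 orig_pc = ftrace_graph_ret_addr(tsk, NULL, frame->pc,
|                                                 (void *)frame->fp);
|                 if (WARN_ON_ONCE(frame->pc == orig_pc))
|                         return -EINVAL;
|                 frame->pc = orig_pc;
|         }
| #endif /* CONFIG_FUNCTION_GRAPH_TRACER */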
2021-10-22arm64: Recover kretprobe modified return address in stacktraceMasami Hiramatsu
Since a kretprobe replaces the function return address with the kretprobe_trampoline on the stack, the stack unwinder shows it instead of the correct return address. This checks whether the next return address is the __kretprobe_trampoline(), and if so, tries to find the correct return address from the kretprobe instance list. For this purpose, this adds a 'kr_cur' loop cursor to memorize the current kretprobe instance. With this fix, arm64 can now enable CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE and pass the kprobe self tests. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
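A sketch of the recovery step in the unwinder (assuming the kretprobe helpers by these names, with kr_cur held in the unwind state):

| #ifdef CONFIG_KRETPROBES
|         if (is_kretprobe_trampoline(frame->pc))
|                 frame->pc = kretprobe_find_ret_addr(tsk, (void *)frame->fp,
|                                                     &frame->kr_cur);
| #endif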
2021-08-03arm64: stacktrace: avoid tracing arch_stack_walk()Mark Rutland
When the function_graph tracer is in use, arch_stack_walk() may unwind the stack incorrectly, erroneously reporting itself, missing the final entry which is being traced, and reporting all traced entries between these off-by-one from where they should be.

When ftrace hooks a function return, the original return address is saved to the fgraph ret_stack, and the return address in the LR (or the function's frame record) is replaced with `return_to_handler`. When arm64's unwinder encounters frames returning to `return_to_handler`, it finds the associated original return address from the fgraph ret stack, assuming the most recent `return_to_handler` entry on the stack corresponds to the most recent entry in the fgraph ret stack, and so on.

When arch_stack_walk() is used to dump the current task's stack, it starts from the caller of arch_stack_walk(). However, arch_stack_walk() can be traced, and so may push an entry on to the fgraph ret stack, leaving the fgraph ret stack offset by one from the expected position. This can be seen when dumping the stack via /proc/self/stack, where enabling the graph tracer results in an unexpected `stack_trace_save_tsk` entry at the start of the trace, and `el0_svc` missing from the end of the trace.

This patch fixes this by marking arch_stack_walk() as notrace, as we do for all other functions on the path to ftrace_graph_get_ret_stack(). While a few helper functions are not marked notrace, their calls/returns are balanced, and will have no observable effect when examining the fgraph ret stack.

It is possible for an exception boundary to cause a similar offset if the return address of the interrupted context was in the LR. Fixing those cases will require some more substantial rework, and is left for subsequent patches.

Before:

| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c
| # echo function_graph > /sys/kernel/tracing/current_tracer
| # cat /proc/self/stack
| [<0>] stack_trace_save_tsk+0xa4/0x110
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c

After:

| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c
| # echo function_graph > /sys/kernel/tracing/current_tracer
| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc4/0x140
| [<0>] proc_single_show+0x6c/0x120
| [<0>] seq_read_iter+0x240/0x4e0
| [<0>] seq_read+0xe8/0x140
| [<0>] vfs_read+0xb8/0x1e4
| [<0>] ksys_read+0x74/0x100
| [<0>] __arm64_sys_read+0x28/0x3c
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xc4/0xd4
| [<0>] do_el0_svc+0x30/0x9c
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0t_64_sync_handler+0x1a8/0x1b0
| [<0>] el0t_64_sync+0x198/0x19c

Cc: <stable@vger.kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210802164845.45506-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
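The fix itself is a one-word annotation; roughly (signature from the generic ARCH_STACKWALK interface):

| noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
|                                       void *cookie, struct task_struct *task,
|                                       struct pt_regs *regs)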
2021-07-08arm64: stacktrace: use %pSb for backtrace printingStephen Boyd
Let's use the new printk format to print the stacktrace entry when printing a backtrace to the kernel logs. This will include any module's build ID[1] in it so that offline/crash debugging can easily locate the debuginfo for a module via something like debuginfod[2]. Link: https://lkml.kernel.org/r/20210511003845.2429846-7-swboyd@chromium.org Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Evan Green <evgreen@chromium.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sasha Levin <sashal@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
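In the backtrace printing path this amounts to a one-character format change; a sketch (exact surrounding line assumed):

| - printk("%s %pS\n", loglvl, (void *)where);
| + printk("%s %pSb\n", loglvl, (void *)where);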
2021-05-26arm64: stacktrace: Relax frame record alignment requirement to 8 bytesPeter Collingbourne
The AAPCS places no requirements on the alignment of the frame record. In theory it could be placed anywhere, although it seems sensible to require it to be aligned to 8 bytes. With an upcoming enhancement to tag-based KASAN Clang will begin creating frame records located at an address that is only aligned to 8 bytes. Accommodate such frame records in the stack unwinding code. As pointed out by Mark Rutland, the userspace stack unwinding code has the same problem, so fix it there as well. Signed-off-by: Peter Collingbourne <pcc@google.com> Link: https://linux-review.googlesource.com/id/Ia22c375230e67ca055e9e4bb639383567f7ad268 Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20210526174927.2477847-2-pcc@google.com Signed-off-by: Will Deacon <will@kernel.org>
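An illustrative sketch of the relaxed check in the unwinder (the exact predicate is assumed from the description):

| /* Frame records need only be 8-byte aligned, not 16-byte aligned. */
| - if (fp & 0xf)
| + if (fp & 0x7)
|         return -EINVAL;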