summaryrefslogtreecommitdiff
path: root/arch/riscv/kernel/asm-offsets.c
AgeCommit message (Collapse)Author
2025-06-05riscv: uaccess: Only restore the CSR_STATUS SUM bitCyril Bur
During switch to csrs will OR the value of the register into the corresponding csr. In this case we're only interested in restoring the SUM bit not the entire register. Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Link: https://lore.kernel.org/r/20250522160954.429333-1-cyrilbur@tenstorrent.com Co-developed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Fixes: 788aa64c01f1 ("riscv: save the SR_SUM status over switches") Link: https://lore.kernel.org/r/20250602121543.1544278-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05Merge tag 'riscv-mw1-6.16-rc1' of ↵Palmer Dabbelt
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into for-next riscv patches for 6.16-rc1 * Implement atomic patching support for ftrace which finally allows to get rid of stop_machine(). * Support for kexec_file_load() syscall * Improve module loading time by changing the algorithm that counts the number of plt/got entries in a module. * Zicbop is now used in the kernel to prefetch instructions [Palmer: There's been two rounds of surgery on this one, so as a result it's a bit different than the PR.] * alex-pr: (734 commits) riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE MAINTAINERS: Update Atish's email address riscv: hwprobe: export Zabha extension riscv: Make regs_irqs_disabled() more clear perf symbols: Ignore mapping symbols on riscv RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND riscv: module: Optimize PLT/GOT entry counting riscv: Add support for PUD THP riscv: xchg: Prefetch the destination word for sc.w riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop riscv: Add support for Zicbop riscv: Introduce Zicbop instructions riscv/kexec_file: Fix comment in purgatory relocator riscv: kexec_file: Support loading Image binary file riscv: kexec_file: Split the loading of kernel and others riscv: Documentation: add a description about dynamic ftrace riscv: ftrace: support direct call using call_ops riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS riscv: ftrace: support PREEMPT riscv: add a data fence for CMODX in the kernel mode ... Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: ftrace: support direct call using call_opsAndy Chiu
jump to FTRACE_ADDR if distance is out of reach Co-developed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andy Chiu <andybnac@gmail.com> Link: https://lore.kernel.org/r/20250407180838.42877-11-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPSPuranjay Mohan
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V. This allows each ftrace callsite to provide an ftrace_ops to the common ftrace trampoline, allowing each callsite to invoke distinct tracer functions without the need to fall back to list processing or to allocate custom trampolines for each callsite. This significantly speeds up cases where multiple distinct trace functions are used and callsites are mostly traced by a single tracer. The idea and most of the implementation is taken from the ARM64's implementation of the same feature. The idea is to place a pointer to the ftrace_ops as a literal at a fixed offset from the function entry point, which can be recovered by the common ftrace trampoline. We use -fpatchable-function-entry to reserve 8 bytes above the function entry by emitting 2 4 byte or 4 2 byte nops depending on the presence of CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime with a pointer to the associated ftrace_ops for that callsite. Functions are aligned to 8 bytes to make sure that the accesses to this literal are atomic. This approach allows for directly invoking ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or part of a module), without going via ftrace_ops_list_func. We've benchamrked this with the ftrace_ops sample module on Spacemit K1 Jupiter: Without this patch: baseline (Linux rivos 6.14.0-09584-g7d06015d936c #3 SMP Sat Mar 29 +-----------------------+-----------------+----------------------------+ | Number of tracers | Total time (ns) | Per-call average time | |-----------------------+-----------------+----------------------------| | Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) | |----------+------------+-----------------+------------+---------------| | 0 | 0 | 1357958 | 13 | - | | 0 | 1 | 1302375 | 13 | - | | 0 | 2 | 1302375 | 13 | - | | 0 | 10 | 1379084 | 13 | - | | 0 | 100 | 1302458 | 13 | - | | 0 | 200 | 1302333 | 13 | - | |----------+------------+-----------------+------------+---------------| | 1 | 0 | 13677833 | 136 | 123 | | 1 | 1 | 18500916 | 185 | 172 | | 1 | 2 | 22856459 | 228 | 215 | | 1 | 10 | 58824709 | 588 | 575 | | 1 | 100 | 505141584 | 5051 | 5038 | | 1 | 200 | 1580473126 | 15804 | 15791 | |----------+------------+-----------------+------------+---------------| | 1 | 0 | 13561000 | 135 | 122 | | 2 | 0 | 19707292 | 197 | 184 | | 10 | 0 | 67774750 | 677 | 664 | | 100 | 0 | 714123125 | 7141 | 7128 | | 200 | 0 | 1918065668 | 19180 | 19167 | +----------+------------+-----------------+------------+---------------+ Note: per-call overhead is estimated relative to the baseline case with 0 relevant tracers and 0 irrelevant tracers. With this patch: v4-rc4 (Linux rivos 6.14.0-09598-gd75747611c93 #4 SMP Sat Mar 29 +-----------------------+-----------------+----------------------------+ | Number of tracers | Total time (ns) | Per-call average time | |-----------------------+-----------------+----------------------------| | Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) | |----------+------------+-----------------+------------+---------------| | 0 | 0 | 1459917 | 14 | - | | 0 | 1 | 1408000 | 14 | - | | 0 | 2 | 1383792 | 13 | - | | 0 | 10 | 1430709 | 14 | - | | 0 | 100 | 1383791 | 13 | - | | 0 | 200 | 1383750 | 13 | - | |----------+------------+-----------------+------------+---------------| | 1 | 0 | 5238041 | 52 | 38 | | 1 | 1 | 5228542 | 52 | 38 | | 1 | 2 | 5325917 | 53 | 40 | | 1 | 10 | 5299667 | 52 | 38 | | 1 | 100 | 5245250 | 52 | 39 | | 1 | 200 | 5238459 | 52 | 39 | |----------+------------+-----------------+------------+---------------| | 1 | 0 | 5239083 | 52 | 38 | | 2 | 0 | 19449417 | 194 | 181 | | 10 | 0 | 67718584 | 677 | 663 | | 100 | 0 | 709840708 | 7098 | 7085 | | 200 | 0 | 2203580626 | 22035 | 22022 | +----------+------------+-----------------+------------+---------------+ Note: per-call overhead is estimated relative to the baseline case with 0 relevant tracers and 0 irrelevant tracers. As can be seen from the above: a) Whenever there is a single relevant tracer function associated with a tracee, the overhead of invoking the tracer is constant, and does not scale with the number of tracers which are *not* associated with that tracee. b) The overhead for a single relevant tracer has dropped to ~1/3 of the overhead prior to this series (from 122ns to 38ns). This is largely due to permitting calls to dynamically-allocated ftrace_ops without going through ftrace_ops_list_func. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> [update kconfig, asm, refactor] Signed-off-by: Andy Chiu <andybnac@gmail.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250407180838.42877-10-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: ftrace: support fastcc in Clang for WITH_ARGSAndy Chiu
Some caller-saved registers which are not defined as function arguments in the ABI can still be passed as arguments when the kernel is compiled with Clang. As a result, we must save and restore those registers to prevent ftrace from clobbering them. - [1]: https://reviews.llvm.org/D68559 Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/ Fixes: 7caa9765465f ("ftrace: riscv: move from REGS to ARGS") Acked-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-05-08riscv: save the SR_SUM status over switchesBen Dooks
When threads/tasks are switched we need to ensure the old execution's SR_SUM state is saved and the new thread has the old SR_SUM state restored. The issue was seen under heavy load especially with the syz-stress tool running, with crashes as follows in schedule_tail: Unable to handle kernel access to user memory without uaccess routines at virtual address 000000002749f0d0 Oops [#1] Modules linked in: CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 Hardware name: riscv-virtio,qemu (DT) epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 ra : task_pid_vnr include/linux/sched.h:1421 [inline] ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264 epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0 gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000 t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0 s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003 a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0 s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850 s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8 s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2 t5 : ffffffc4043cafba t6 : 0000000000040000 status: 0000000000000120 badaddr: 000000002749f0d0 cause: 000000000000000f Call Trace: [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 [<ffffffe000005570>] ret_from_exception+0x0/0x14 Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace b5f8f9231dc87dda ]--- The issue comes from the put_user() in schedule_tail (kernel/sched/core.c) doing the following: asmlinkage __visible void schedule_tail(struct task_struct *prev) { ... if (current->set_child_tid) put_user(task_pid_vnr(current), current->set_child_tid); ... } the put_user() macro causes the code sequence to come out as follows: 1: __enable_user_access() 2: reg = task_pid_vnr(current); 3: *current->set_child_tid = reg; 4: __disable_user_access() The problem is that we may have a sleeping function as argument which could clear SR_SUM causing the panic above. This was fixed by evaluating the argument of the put_user() macro outside the user-enabled section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before enabling user access")" In order for riscv to take advantage of unsafe_get/put_XXX() macros and to avoid the same issue we had with put_user() and sleeping functions we must ensure code flow can go through switch_to() from within a region of code with SR_SUM enabled and come back with SR_SUM still enabled. This patch addresses the problem allowing future work to enable full use of unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost on every access. Make switch_to() save and restore SR_SUM. Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Deepak Gupta <debug@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-2-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-03-18riscv: Remove unused TASK_TI_FLAGSJinjie Ruan
Since commit f0bddf50586d ("riscv: entry: Convert to generic entry"), TASK_TI_FLAGS is not used any more, so remove it. Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20241109014605.2801492-1-ruanjinjie@huawei.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2024-11-20Merge tag 'ftrace-v6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Restructure the function graph shadow stack to prepare it for use with kretprobes With the goal of merging the shadow stack logic of function graph and kretprobes, some more restructuring of the function shadow stack is required. Move out function graph specific fields from the fgraph infrastructure and store it on the new stack variables that can pass data from the entry callback to the exit callback. Hopefully, with this change, the merge of kretprobes to use fgraph shadow stacks will be ready by the next merge window. - Make shadow stack 4k instead of using PAGE_SIZE. Some architectures have very large PAGE_SIZE values which make its use for shadow stacks waste a lot of memory. - Give shadow stacks its own kmem cache. When function graph is started, every task on the system gets a shadow stack. In the future, shadow stacks may not be 4K in size. Have it have its own kmem cache so that whatever size it becomes will still be efficient in allocations. - Initialize profiler graph ops as it will be needed for new updates to fgraph - Convert to use guard(mutex) for several ftrace and fgraph functions - Add more comments and documentation - Show function return address in function graph tracer Add an option to show the caller of a function at each entry of the function graph tracer, similar to what the function tracer does. - Abstract out ftrace_regs from being used directly like pt_regs ftrace_regs was created to store a partial pt_regs. It holds only the registers and stack information to get to the function arguments and return values. On several archs, it is simply a wrapper around pt_regs. But some users would access ftrace_regs directly to get the pt_regs which will not work on all archs. Make ftrace_regs an abstract structure that requires all access to its fields be through accessor functions. - Show how long it takes to do function code modifications When code modification for function hooks happen, it always had the time recorded in how long it took to do the conversion. But this value was never exported. Recently the code was touched due to new ROX modification handling that caused a large slow down in doing the modifications and had a significant impact on boot times. Expose the timings in the dyn_ftrace_total_info file. This file was created a while ago to show information about memory usage and such to implement dynamic function tracing. It's also an appropriate file to store the timings of this modification as well. This will make it easier to see the impact of changes to code modification on boot up timings. - Other clean ups and small fixes * tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits) ftrace: Show timings of how long nop patching took ftrace: Use guard to take ftrace_lock in ftrace_graph_set_hash() ftrace: Use guard to take the ftrace_lock in release_probe() ftrace: Use guard to lock ftrace_lock in cache_mod() ftrace: Use guard for match_records() fgraph: Use guard(mutex)(&ftrace_lock) for unregister_ftrace_graph() fgraph: Give ret_stack its own kmem cache fgraph: Separate size of ret_stack from PAGE_SIZE ftrace: Rename ftrace_regs_return_value to ftrace_regs_get_return_value selftests/ftrace: Fix check of return value in fgraph-retval.tc test ftrace: Use arch_ftrace_regs() for ftrace_regs_*() macros ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regs ftrace: Make ftrace_regs abstract from direct use fgragh: No need to invoke the function call_filter_check_discard() fgraph: Simplify return address printing in function graph tracer function_graph: Remove unnecessary initialization in ftrace_graph_ret_addr() function_graph: Support recording and printing the function return address ftrace: Have calltime be saved in the fgraph storage ftrace: Use a running sleeptime instead of saving on shadow stack fgraph: Use fgraph data to store subtime for profiler ...
2024-10-25riscv: Remove unused GENERATING_ASM_OFFSETSChunyan Zhang
The macro is not used in the current version of kernel, it looks like can be removed to avoid a build warning: ../arch/riscv/kernel/asm-offsets.c: At top level: ../arch/riscv/kernel/asm-offsets.c:7: warning: macro "GENERATING_ASM_OFFSETS" is not used [-Wunused-macros] 7 | #define GENERATING_ASM_OFFSETS Fixes: 9639a44394b9 ("RISC-V: Provide a cleaner raw_smp_processor_id()") Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Link: https://lore.kernel.org/r/20241008094141.549248-2-zhangchunyan@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-10-10ftrace: Make ftrace_regs abstract from direct useSteven Rostedt
ftrace_regs was created to hold registers that store information to save function parameters, return value and stack. Since it is a subset of pt_regs, it should only be used by its accessor functions. But because pt_regs can easily be taken from ftrace_regs (on most archs), it is tempting to use it directly. But when running on other architectures, it may fail to build or worse, build but crash the kernel! Instead, make struct ftrace_regs an empty structure and have the architectures define __arch_ftrace_regs and all the accessor functions will typecast to it to get to the actual fields. This will help avoid usage of ftrace_regs directly. Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/ Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "x86@kernel.org" <x86@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-09-15riscv: Stop emitting preventive sfence.vma for new vmalloc mappingsAlexandre Ghiti
In 6.5, we removed the vmalloc fault path because that can't work (see [1] [2]). Then in order to make sure that new page table entries were seen by the page table walker, we had to preventively emit a sfence.vma on all harts [3] but this solution is very costly since it relies on IPI. And even there, we could end up in a loop of vmalloc faults if a vmalloc allocation is done in the IPI path (for example if it is traced, see [4]), which could result in a kernel stack overflow. Those preventive sfence.vma needed to be emitted because: - if the uarch caches invalid entries, the new mapping may not be observed by the page table walker and an invalidation may be needed. - if the uarch does not cache invalid entries, a reordered access could "miss" the new mapping and traps: in that case, we would actually only need to retry the access, no sfence.vma is required. So this patch removes those preventive sfence.vma and actually handles the possible (and unlikely) exceptions. And since the kernel stacks mappings lie in the vmalloc area, this handling must be done very early when the trap is taken, at the very beginning of handle_exception: this also rules out the vmalloc allocations in the fault path. Link: https://lore.kernel.org/linux-riscv/20230531093817.665799-1-bjorn@kernel.org/ [1] Link: https://lore.kernel.org/linux-riscv/20230801090927.2018653-1-dylan@andestech.com [2] Link: https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/ [3] Link: https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/ [4] Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20240717060125.139416-4-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-22ftrace: riscv: move from REGS to ARGSPuranjay Mohan
This commit replaces riscv's support for FTRACE_WITH_REGS with support for FTRACE_WITH_ARGS. This is required for the ongoing effort to stop relying on stop_machine() for RISCV's implementation of ftrace. The main relevant benefit that this change will bring for the above use-case is that now we don't have separate ftrace_caller and ftrace_regs_caller trampolines. This will allow the callsite to call ftrace_caller by modifying a single instruction. Now the callsite can do something similar to: When not tracing: | When tracing: func: func: auipc t0, ftrace_caller_top auipc t0, ftrace_caller_top nop <=========<Enable/Disable>=========> jalr t0, ftrace_caller_bottom [...] [...] The above assumes that we are dropping the support of calling a direct trampoline from the callsite. We need to drop this as the callsite can't change the target address to call, it can only enable/disable a call to a preset target (ftrace_caller in the above diagram). We can later optimize this by calling an intermediate dispatcher trampoline before ftrace_caller. Currently, ftrace_regs_caller saves all CPU registers in the format of struct pt_regs and allows the tracer to modify them. We don't need to save all of the CPU registers because at function entry only a subset of pt_regs is live: |----------+----------+---------------------------------------------| | Register | ABI Name | Description | |----------+----------+---------------------------------------------| | x1 | ra | Return address for traced function | | x2 | sp | Stack pointer | | x5 | t0 | Return address for ftrace_caller trampoline | | x8 | s0/fp | Frame pointer | | x10-11 | a0-1 | Function arguments/return values | | x12-17 | a2-7 | Function arguments | |----------+----------+---------------------------------------------| See RISCV calling convention[1] for the above table. Saving just the live registers decreases the amount of stack space required from 288 Bytes to 112 Bytes. Basic testing was done with this on the VisionFive 2 development board. Note: - Moving from REGS to ARGS will mean that RISCV will stop supporting KPROBES_ON_FTRACE as it requires full pt_regs to be saved. - KPROBES_ON_FTRACE will be supplanted by FPROBES see [2]. [1] https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf [2] https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/ Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240405142453.4187-1-puranjay@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27riscv: Implement Shadow Call StackSami Tolvanen
Implement CONFIG_SHADOW_CALL_STACK for RISC-V. When enabled, the compiler injects instructions to all non-leaf C functions to store the return address to the shadow stack and unconditionally load it again before returning, which makes it harder to corrupt the return address through a stack overflow, for example. The active shadow call stack pointer is stored in the gp register, which makes SCS incompatible with gp relaxation. Use --no-relax-gp to ensure gp relaxation is disabled and disable global pointer loading. Add SCS pointers to struct thread_info, implement SCS initialization, and task switching Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-12-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27riscv: Deduplicate IRQ stack switchingSami Tolvanen
With CONFIG_IRQ_STACKS, we switch to a separate per-CPU IRQ stack before calling handle_riscv_irq or __do_softirq. We currently have duplicate inline assembly snippets for stack switching in both code paths. Now that we can access per-CPU variables in assembly, implement call_on_irq_stack in assembly, and use that instead of redundant inline assembly. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-10-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27riscv: VMAP_STACK overflow detection thread-safeDeepak Gupta
commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to `shadow_stack` temporarily before switching finally to per-cpu `overflow_stack`. If two CPUs/harts are racing and end up in over flowing kernel stack, one or both will end up corrupting each other state because `shadow_stack` is not per-cpu. This patch optimizes per-cpu overflow stack switch by directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`. Following are the changes in this patch - Defines an asm macro to obtain per-cpu symbols in destination register. - In entry.S, when overflow is detected, per-cpu overflow stack is located using per-cpu asm macro. Computing per-cpu symbol requires a temporary register. x31 is saved away into CSR_SCRATCH (CSR_SCRATCH is anyways zero since we're in kernel). Please see Links for additional relevant disccussion and alternative solution. Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT` Kernel crash log below Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT Task stack: [0xff20000010a98000..0xff20000010a9c000] Overflow stack: [0xff600001f7d98370..0xff600001f7d99370] CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) epc : __memset+0x60/0xfc ra : recursive_loop+0x48/0xc6 [lkdtm] epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80 gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88 t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0 s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000 a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000 a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90 s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684 s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10 s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4 t5 : ffffffff815dbab8 t6 : ff20000010a9bb48 status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f Kernel panic - not syncing: Kernel stack overflow CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80006754>] dump_backtrace+0x30/0x38 [<ffffffff808de798>] show_stack+0x40/0x4c [<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c [<ffffffff808ea2d8>] dump_stack+0x18/0x20 [<ffffffff808dec06>] panic+0x126/0x2fe [<ffffffff800065ea>] walk_stackframe+0x0/0xf0 [<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm] SMP: stopping secondary CPUs ---[ end Kernel panic - not syncing: Kernel stack overflow ]--- Cc: Guo Ren <guoren@kernel.org> Cc: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t Link: https://lore.kernel.org/lkml/20221124094845.1907443-1-debug@rivosinc.com/ Signed-off-by: Deepak Gupta <debug@rivosinc.com> Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Guo Ren <guoren@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-9-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29RISC-V: Add arch functions to support hibernation/suspend-to-diskSia Jee Heng
Low level Arch functions were created to support hibernation. swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write cpu state onto the stack, then calling swsusp_save() to save the memory image. Arch specific hibernation header is implemented and is utilized by the arch_hibernation_header_restore() and arch_hibernation_header_save() functions. The arch specific hibernation header consists of satp, hartid, and the cpu_resume address. The kernel built version is also need to be saved into the hibernation image header to making sure only the same kernel is restore when resume. swsusp_arch_resume() creates a temporary page table that covering only the linear map. It copies the restore code to a 'safe' page, then start to restore the memory image. Once completed, it restores the original kernel's page table. It then calls into __hibernate_cpu_resume() to restore the CPU context. Finally, it follows the normal hibernation path back to the hibernation core. To enable hibernation/suspend to disk into RISCV, the below config need to be enabled: - CONFIG_HIBERNATION - CONFIG_ARCH_HIBERNATION_HEADER - CONFIG_ARCH_HIBERNATION_POSSIBLE Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Mason Huo <mason.huo@starfivetech.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20230330064321.1008373-5-jeeheng.sia@starfivetech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-03-10RISC-V: Add arch functions for non-retentive suspend entry/exitAnup Patel
The hart registers and CSRs are not preserved in non-retentative suspend state so we provide arch specific helper functions which will save/restore hart context upon entry/exit to non-retentive suspend state. These helper functions can be used by cpuidle drivers for non-retentive suspend entry/exit. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-01-20RISC-V: Avoid using per cpu array for ordered bootingAtish Patra
Currently both order booting and spinwait approach uses a per cpu array to update stack & task pointer. This approach will not work for the following cases. 1. If NR_CPUs are configured to be less than highest hart id. 2. A platform has sparse hartid. This issue can be fixed for ordered booting as the booting cpu brings up one cpu at a time using SBI HSM extension which has opaque parameter that is unused until now. Introduce a common secondary boot data structure that can store the stack and task pointer. Secondary harts will use this data while booting up to setup the sp & tp. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2021-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us RISC-V: - New KVM port. x86: - New API to control TSC offset from userspace - TSC scaling for nested hypervisors on SVM - Switch masterclock protection from raw_spin_lock to seqcount - Clean up function prototypes in the page fault code and avoid repeated memslot lookups - Convey the exit reason to userspace on emulation failure - Configure time between NX page recovery iterations - Expose Predictive Store Forwarding Disable CPUID leaf - Allocate page tracking data structures lazily (if the i915 KVM-GT functionality is not compiled in) - Cleanups, fixes and optimizations for the shadow MMU code s390: - SIGP Fixes - initial preparations for lazy destroy of secure VMs - storage key improvements/fixes - Log the guest CPNC Starting from this release, KVM-PPC patches will come from Michael Ellerman's PPC tree" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) RISC-V: KVM: fix boolreturn.cocci warnings RISC-V: KVM: remove unneeded semicolon RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions RISC-V: KVM: Factor-out FP virtualization into separate sources KVM: s390: add debug statement for diag 318 CPNC data KVM: s390: pv: properly handle page flags for protected guests KVM: s390: Fix handle_sske page fault handling KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol KVM: x86: On emulation failure, convey the exit reason, etc. to userspace KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info KVM: x86: Clarify the kvm_run.emulation_failure structure layout KVM: s390: Add a routine for setting userspace CPU state KVM: s390: Simplify SIGP Set Arch handling KVM: s390: pv: avoid stalls when making pages secure KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm KVM: s390: pv: avoid double free of sida page KVM: s390: pv: add macros for UVC CC values s390/mm: optimize reset_guest_reference_bit() s390/mm: optimize set_guest_storage_key() s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present ...
2021-10-04RISC-V: KVM: FP lazy save/restoreAtish Patra
This patch adds floating point (F and D extension) context save/restore for guest VCPUs. The FP context is saved and restored lazily only when kernel enter/exits the in-kernel run loop and not during the KVM world switch. This way FP save/restore has minimal impact on KVM performance. Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alexander Graf <graf@amazon.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04RISC-V: KVM: Handle MMIO exits for VCPUAnup Patel
We will get stage2 page faults whenever Guest/VM access SW emulated MMIO device or unmapped Guest RAM. This patch implements MMIO read/write emulation by extracting MMIO details from the trapped load/store instruction and forwarding the MMIO read/write to user-space. The actual MMIO emulation will happen in user-space and KVM kernel module will only take care of register updates before resuming the trapped VCPU. The handling for stage2 page faults for unmapped Guest RAM will be implemeted by a separate patch later. [jiangyifei: ioeventfd and in-kernel mmio device support] Signed-off-by: Yifei Jiang <jiangyifei@huawei.com> Signed-off-by: Anup Patel <anup.patel@wdc.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alexander Graf <graf@amazon.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-04RISC-V: KVM: Implement VCPU world-switchAnup Patel
This patch implements the VCPU world-switch for KVM RISC-V. The KVM RISC-V world-switch (i.e. __kvm_riscv_switch_to()) mostly switches general purpose registers, SSTATUS, STVEC, SSCRATCH and HSTATUS CSRs. Other CSRs are switched via vcpu_load() and vcpu_put() interface in kvm_arch_vcpu_load() and kvm_arch_vcpu_put() functions respectively. Signed-off-by: Anup Patel <anup.patel@wdc.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alexander Graf <graf@amazon.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-09-30riscv: rely on core code to keep thread_info::cpu updatedArd Biesheuvel
Now that the core code switched back to using thread_info::cpu to keep a task's CPU number, we no longer need to keep it in sync explicitly. So just drop the code that does this. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
2021-07-05riscv: Introduce structure that group all variables regarding kernel mappingAlexandre Ghiti
We have a lot of variables that are used to hold kernel mapping addresses, offsets between physical and virtual mappings and some others used for XIP kernels: they are all defined at different places in mm/init.c, so group them into a single structure with, for some of them, more explicit and concise names. Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-14riscv: Enable per-task stack canariesGuo Ren
This enables the use of per-task stack canary values if GCC has support for emitting the stack canary reference relative to the value of tp, which holds the task struct pointer in the riscv kernel. After compare arm64 and x86 implementations, seems arm64's is more flexible and readable. The key point is how gcc get the offset of stack_canary from gs/el0_sp. x86: Use a fix offset from gs, not flexible. struct fixed_percpu_data { /* * GCC hardcodes the stack canary as %gs:40. Since the * irq_stack is the object at %gs:0, we reserve the bottom * 48 bytes of the irq stack for the canary. */ char gs_base[40]; // :( unsigned long stack_canary; }; arm64: Use -mstack-protector-guard-offset & guard-reg gcc options: -mstack-protector-guard=sysreg -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard-offset=xxx riscv: Use -mstack-protector-guard-offset & guard-reg gcc options: -mstack-protector-guard=tls -mstack-protector-guard-reg=tp -mstack-protector-guard-offset=xxx GCC's implementation has been merged: commit c931e8d5a96463427040b0d11f9c4352ac22b2b0 Author: Cooper Qu <cooper.qu@linux.alibaba.com> Date: Mon Jul 13 16:15:08 2020 +0800 RISC-V: Add support for TLS stack protector canary access In the end, these codes are inserted by gcc before return: * 0xffffffe00020b396 <+120>: ld a5,1008(tp) # 0x3f0 * 0xffffffe00020b39a <+124>: xor a5,a5,a4 * 0xffffffe00020b39c <+126>: mv a0,s5 * 0xffffffe00020b39e <+128>: bnez a5,0xffffffe00020b61c <_do_fork+766> 0xffffffe00020b3a2 <+132>: ld ra,136(sp) 0xffffffe00020b3a4 <+134>: ld s0,128(sp) 0xffffffe00020b3a6 <+136>: ld s1,120(sp) 0xffffffe00020b3a8 <+138>: ld s2,112(sp) 0xffffffe00020b3aa <+140>: ld s3,104(sp) 0xffffffe00020b3ac <+142>: ld s4,96(sp) 0xffffffe00020b3ae <+144>: ld s5,88(sp) 0xffffffe00020b3b0 <+146>: ld s6,80(sp) 0xffffffe00020b3b2 <+148>: ld s7,72(sp) 0xffffffe00020b3b4 <+150>: addi sp,sp,144 0xffffffe00020b3b6 <+152>: ret ... * 0xffffffe00020b61c <+766>: auipc ra,0x7f8 * 0xffffffe00020b620 <+770>: jalr -1764(ra) # 0xffffffe000a02f38 <__stack_chk_fail> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Cooper Qu <cooper.qu@linux.alibaba.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-10riscv: Fixed kernel test robot warningSouptick Joarder
Kernel test robot throws below warning - arch/riscv/kernel/asm-offsets.c:14:6: warning: no previous prototype for 'asm_offsets' [-Wmissing-prototypes] 14 | void asm_offsets(void) | ^~~~~~~~~~~ This patch should fixed it. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-07-30riscv: Cleanup unnecessary define in asm-offset.cGuo Ren
- TASK_THREAD_SP is duplicated define - TASK_STACK is no use at all - Don't worry about thread_info's offset in task_struct, have a look on comment in include/linux/sched.h: struct task_struct { /* * For reasons of header soup (see current_thread_info()), this * must be the first element of task_struct. */ struct thread_info thread_info; Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2019-11-05riscv: abstract out CSR names for supervisor vs machine modeChristoph Hellwig
Many of the privileged CSRs exist in a supervisor and machine version that are used very similarly. Provide versions of the CSR names and fields that map to either the S-mode or M-mode variant depending on a new CONFIG_RISCV_M_MODE kconfig symbol. Contains contributions from Damien Le Moal <Damien.LeMoal@wdc.com> and Paul Walmsley <paul.walmsley@sifive.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> # for drivers/clocksource, drivers/irqchip [paul.walmsley@sifive.com: updated to apply] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 286Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 97 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.025053186@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25riscv: simplify the stack pointer setup in head.SChristoph Hellwig
We don't need THREAD_SIZE in asm-offsets.c as we can just calculate the value of init_thread_union + THREAD_SIZE using cpp, just like we do a few lines above. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2019-01-23RISC-V: Add _TIF_NEED_RESCHED check for kernel thread when CONFIG_PREEMPT=yVincent Chen
The cond_resched() can be used to yield the CPU resource if CONFIG_PREEMPT is not defined. Otherwise, cond_resched() is a dummy function. In order to avoid kernel thread occupying entire CPU, when CONFIG_PREEMPT=y, the kernel thread needs to follow the rescheduling mechanism like a user thread. Signed-off-by: Vincent Chen <vincentc@andestech.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2017-09-26RISC-V: Task implementationPalmer Dabbelt
This patch contains the implementation of tasks on RISC-V, most of which is involved in task switching. Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>