summaryrefslogtreecommitdiff
path: root/arch/riscv/kernel/entry.S
AgeCommit message (Collapse)Author
2025-06-06Merge tag 'riscv-for-linus-6.16-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the FWFT SBI extension, which is part of SBI 3.0 and a dependency for many new SBI and ISA extensions - Support for getrandom() in the VDSO - Support for mseal - Optimized routines for raid6 syndrome and recovery calculations - kexec_file() supports loading Image-formatted kernel binaries - Improvements to the instruction patching framework to allow for atomic instruction patching, along with rules as to how systems need to behave in order to function correctly - Support for a handful of new ISA extensions: Svinval, Zicbop, Zabha, some SiFive vendor extensions - Various fixes and cleanups, including: misaligned access handling, perf symbol mangling, module loading, PUD THPs, and improved uaccess routines * tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (69 commits) riscv: uaccess: Only restore the CSR_STATUS SUM bit RISC-V: vDSO: Wire up getrandom() vDSO implementation riscv: enable mseal sysmap for RV64 raid6: Add RISC-V SIMD syndrome and recovery calculations riscv: mm: Add support for Svinval extension RISC-V: Documentation: Add enough title underlines to CMODX riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE MAINTAINERS: Update Atish's email address riscv: uaccess: do not do misaligned accesses in get/put_user() riscv: process: use unsigned int instead of unsigned long for put_user() riscv: make unsafe user copy routines use existing assembly routines riscv: hwprobe: export Zabha extension riscv: Make regs_irqs_disabled() more clear perf symbols: Ignore mapping symbols on riscv RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND riscv: module: Optimize PLT/GOT entry counting riscv: Add support for PUD THP riscv: xchg: Prefetch the destination word for sc.w riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop riscv: Add support for Zicbop ...
2025-06-05riscv: uaccess: Only restore the CSR_STATUS SUM bitCyril Bur
During switch to csrs will OR the value of the register into the corresponding csr. In this case we're only interested in restoring the SUM bit not the entire register. Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Link: https://lore.kernel.org/r/20250522160954.429333-1-cyrilbur@tenstorrent.com Co-developed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Fixes: 788aa64c01f1 ("riscv: save the SR_SUM status over switches") Link: https://lore.kernel.org/r/20250602121543.1544278-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-05-08riscv: save the SR_SUM status over switchesBen Dooks
When threads/tasks are switched we need to ensure the old execution's SR_SUM state is saved and the new thread has the old SR_SUM state restored. The issue was seen under heavy load especially with the syz-stress tool running, with crashes as follows in schedule_tail: Unable to handle kernel access to user memory without uaccess routines at virtual address 000000002749f0d0 Oops [#1] Modules linked in: CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 Hardware name: riscv-virtio,qemu (DT) epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 ra : task_pid_vnr include/linux/sched.h:1421 [inline] ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264 epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0 gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000 t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0 s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003 a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0 s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850 s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8 s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2 t5 : ffffffc4043cafba t6 : 0000000000040000 status: 0000000000000120 badaddr: 000000002749f0d0 cause: 000000000000000f Call Trace: [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 [<ffffffe000005570>] ret_from_exception+0x0/0x14 Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace b5f8f9231dc87dda ]--- The issue comes from the put_user() in schedule_tail (kernel/sched/core.c) doing the following: asmlinkage __visible void schedule_tail(struct task_struct *prev) { ... if (current->set_child_tid) put_user(task_pid_vnr(current), current->set_child_tid); ... } the put_user() macro causes the code sequence to come out as follows: 1: __enable_user_access() 2: reg = task_pid_vnr(current); 3: *current->set_child_tid = reg; 4: __disable_user_access() The problem is that we may have a sleeping function as argument which could clear SR_SUM causing the panic above. This was fixed by evaluating the argument of the put_user() macro outside the user-enabled section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before enabling user access")" In order for riscv to take advantage of unsafe_get/put_XXX() macros and to avoid the same issue we had with put_user() and sleeping functions we must ensure code flow can go through switch_to() from within a region of code with SR_SUM enabled and come back with SR_SUM still enabled. This patch addresses the problem allowing future work to enable full use of unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost on every access. Make switch_to() save and restore SR_SUM. Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Deepak Gupta <debug@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-2-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-04-29riscv: entry: Split ret_from_fork() into user and kernelCharlie Jenkins
This function was unified into a single function in commit ab9164dae273 ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork"). However that imposed a performance degradation. Partially reverting this commit to have ret_from_fork() split again, results in a 1% increase on the number of times fork is able to be called per second. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/all/20250320-riscv_optimize_entry-v6-2-63e187e26041@rivosinc.com
2025-04-29riscv: entry: Convert ret_from_fork() to CCharlie Jenkins
Move the main section of ret_from_fork() to C to allow inlining of syscall_exit_to_user_mode(). Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/all/20250320-riscv_optimize_entry-v6-1-63e187e26041@rivosinc.com
2025-01-08riscv: use local label names instead of global ones in assemblyClément Léger
Local labels should be prefix by '.L' or they'll be exported in the symbol table. Additionally, this messes up the backtrace by displaying an incorrect symbol: ... [ 12.751810] [<ffffffff80441628>] _copy_from_user+0x28/0xc2 [ 12.752035] [<ffffffff800152ca>] handle_misaligned_load+0x1ca/0x2fc [ 12.752310] [<ffffffff80a033e8>] do_trap_load_misaligned+0x24/0xee [ 12.752596] [<ffffffff80a0dcae>] _new_vmalloc_restore_context_a0+0xc2/0xce After: ... [ 10.243916] [<ffffffff804415e4>] _copy_from_user+0x28/0xc2 [ 10.244026] [<ffffffff800152ca>] handle_misaligned_load+0x1ca/0x2fc [ 10.244150] [<ffffffff80a033a0>] do_trap_load_misaligned+0x24/0xee [ 10.244268] [<ffffffff80a0dc66>] handle_exception+0x146/0x152 Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Fixes: 503638e0babf3 ("riscv: Stop emitting preventive sfence.vma for new vmalloc mappings") Link: https://lore.kernel.org/r/20250103141814.508865-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-08riscv: stacktrace: fix backtracing through exceptionsClément Léger
Prior to commit 5d5fc33ce58e ("riscv: Improve exception and system call latency"), backtrace through exception worked since ra was filled with ret_from_exception symbol address and the stacktrace code checked 'pc' to be equal to that symbol. Now that handle_exception uses regular 'call' instructions, this isn't working anymore and backtrace stops at handle_exception(). Since there are multiple call site to C code in the exception handling path, rather than checking multiple potential return addresses, add a new symbol at the end of exception handling and check pc to be in that range. Fixes: 5d5fc33ce58e ("riscv: Improve exception and system call latency") Signed-off-by: Clément Léger <cleger@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241209155714.1239665-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-15riscv: avoid Imbalance in RASJisheng Zhang
Inspired by[1], modify the code to remove the code of modifying ra to avoid imbalance RAS (return address stack) which may lead to incorret predictions on return. Link: https://lore.kernel.org/linux-riscv/20240607061335.2197383-1-cyrilbur@tenstorrent.com/ [1] Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Cyril Bur <cyrilbur@tenstorrent.com> Link: https://lore.kernel.org/r/20240720170659.1522-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-15Merge patch series "Svvptc extension to remove preventive sfence.vma"Palmer Dabbelt
Alexandre Ghiti <alexghiti@rivosinc.com> says: In RISC-V, after a new mapping is established, a sfence.vma needs to be emitted for different reasons: - if the uarch caches invalid entries, we need to invalidate it otherwise we would trap on this invalid entry, - if the uarch does not cache invalid entries, a reordered access could fail to see the new mapping and then trap (sfence.vma acts as a fence). We can actually avoid emitting those (mostly) useless and costly sfence.vma by handling the traps instead: - for new kernel mappings: only vmalloc mappings need to be taken care of, other new mapping are rare and already emit the required sfence.vma if needed. That must be achieved very early in the exception path as explained in patch 3, and this also fixes our fragile way of dealing with vmalloc faults. - for new user mappings: Svvptc makes update_mmu_cache() a no-op but we can take some gratuitous page faults (which are very unlikely though). Patch 1 and 2 introduce Svvptc extension probing. On our uarch that does not cache invalid entries and a 6.5 kernel, the gains are measurable: * Kernel boot: 6% * ltp - mmapstress01: 8% * lmbench - lat_pagefault: 20% * lmbench - lat_mmap: 5% Here are the corresponding numbers of sfence.vma emitted: * Ubuntu boot to login: Before: ~630k sfence.vma After: ~200k sfence.vma * ltp - mmapstress01 Before: ~45k After: ~6.3k * lmbench - lat_pagefault Before: ~665k After: 832 (!) * lmbench - lat_mmap Before: ~546k After: 718 (!) Thanks to Ved and Matt Evans for triggering the discussion that led to this patchset! * b4-shazam-merge: riscv: Stop emitting preventive sfence.vma for new userspace mappings with Svvptc riscv: Stop emitting preventive sfence.vma for new vmalloc mappings dt-bindings: riscv: Add Svvptc ISA extension description riscv: Add ISA extension parsing for Svvptc Link: https://lore.kernel.org/r/20240717060125.139416-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-15riscv: Stop emitting preventive sfence.vma for new vmalloc mappingsAlexandre Ghiti
In 6.5, we removed the vmalloc fault path because that can't work (see [1] [2]). Then in order to make sure that new page table entries were seen by the page table walker, we had to preventively emit a sfence.vma on all harts [3] but this solution is very costly since it relies on IPI. And even there, we could end up in a loop of vmalloc faults if a vmalloc allocation is done in the IPI path (for example if it is traced, see [4]), which could result in a kernel stack overflow. Those preventive sfence.vma needed to be emitted because: - if the uarch caches invalid entries, the new mapping may not be observed by the page table walker and an invalidation may be needed. - if the uarch does not cache invalid entries, a reordered access could "miss" the new mapping and traps: in that case, we would actually only need to retry the access, no sfence.vma is required. So this patch removes those preventive sfence.vma and actually handles the possible (and unlikely) exceptions. And since the kernel stacks mappings lie in the vmalloc area, this handling must be done very early when the trap is taken, at the very beginning of handle_exception: this also rules out the vmalloc allocations in the fault path. Link: https://lore.kernel.org/linux-riscv/20230531093817.665799-1-bjorn@kernel.org/ [1] Link: https://lore.kernel.org/linux-riscv/20230801090927.2018653-1-dylan@andestech.com [2] Link: https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/ [3] Link: https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/ [4] Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20240717060125.139416-4-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-26riscv: enable HAVE_ARCH_STACKLEAKJisheng Zhang
Add support for the stackleak feature. Whenever the kernel returns to user space the kernel stack is filled with a poison value. At the same time, disables the plugin in EFI stub code because EFI stub is out of scope for the protection. Tested on qemu and milkv duo: / # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT [ 38.675575] lkdtm: Performing direct entry STACKLEAK_ERASING [ 38.678448] lkdtm: stackleak stack usage: [ 38.678448] high offset: 288 bytes [ 38.678448] current: 496 bytes [ 38.678448] lowest: 1328 bytes [ 38.678448] tracked: 1328 bytes [ 38.678448] untracked: 448 bytes [ 38.678448] poisoned: 14312 bytes [ 38.678448] low offset: 8 bytes [ 38.689887] lkdtm: OK: the rest of the thread stack is properly erased Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240623235316.2010-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-26riscv: Improve exception and system call latencyAnton Blanchard
Many CPUs implement return address branch prediction as a stack. The RISCV architecture refers to this as a return address stack (RAS). If this gets corrupted then the CPU will mispredict at least one but potentally many function returns. There are two issues with the current RISCV exception code: - We are using the alternate link stack (x5/t0) for the indirect branch which makes the hardware think this is a function return. This will corrupt the RAS. - We modify the return address of handle_exception to point to ret_from_exception. This will also corrupt the RAS. Testing the null system call latency before and after the patch: Visionfive2 (StarFive JH7110 / U74) baseline: 189.87 ns patched: 176.76 ns Lichee pi 4a (T-Head TH1520 / C910) baseline: 666.58 ns patched: 636.90 ns Just over 7% on the U74 and just over 4% on the C910. Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Tested-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/r/20240607061335.2197383-1-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-24riscv: blacklist assembly symbols for kprobeClément Léger
Adding kprobes on some assembly functions (mainly exception handling) will result in crashes (either recursive trap or panic). To avoid such errors, add ASM_NOKPROBE() macro which allow adding specific symbols into the __kprobe_blacklist section and use to blacklist the following symbols that showed to be problematic: - handle_exception() - ret_from_exception() - handle_kernel_stack_overflow() Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20231004131009.409193-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-16riscv: vector: allow kernel-mode Vector with preemptionAndy Chiu
Add kernel_vstate to keep track of kernel-mode Vector registers when trap introduced context switch happens. Also, provide riscv_v_flags to let context save/restore routine track context status. Context tracking happens whenever the core starts its in-kernel Vector executions. An active (dirty) kernel task's V contexts will be saved to memory whenever a trap-introduced context switch happens. Or, when a softirq, which happens to nest on top of it, uses Vector. Context retoring happens when the execution transfer back to the original Kernel context where it first enable preempt_v. Also, provide a config CONFIG_RISCV_ISA_V_PREEMPTIVE to give users an option to disable preemptible kernel-mode Vector at build time. Users with constraint memory may want to disable this config as preemptible kernel-mode Vector needs extra space for tracking of per thread's kernel-mode V context. Or, users might as well want to disable it if all kernel-mode Vector code is time sensitive and cannot tolerate context switch overhead. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20240115055929.4736-11-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-06riscv: kernel: Use correct SYM_DATA_*() macro for dataClément Léger
Some data were incorrectly annotated with SYM_FUNC_*() instead of SYM_DATA_*() ones. Use the correct ones. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231024132655.730417-4-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-06riscv: use ".L" local labels in assembly when applicableClément Léger
For the sake of coherency, use local labels in assembly when applicable. This also avoid kprobes being confused when applying a kprobe since the size of function is computed by checking where the next visible symbol is located. This might end up in computing some function size to be way shorter than expected and thus failing to apply kprobes to the specified offset. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231024132655.730417-2-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-02Merge patch series "riscv: SCS support"Palmer Dabbelt
Sami Tolvanen <samitolvanen@google.com> says: This series adds Shadow Call Stack (SCS) support for RISC-V. SCS uses compiler instrumentation to store return addresses in a separate shadow stack to protect them against accidental or malicious overwrites. More information about SCS can be found here: https://clang.llvm.org/docs/ShadowCallStack.html Patch 1 is from Deepak, and it simplifies VMAP_STACK overflow handling by adding support for accessing per-CPU variables directly in assembly. The patch is included in this series to make IRQ stack switching cleaner with SCS, and I've simply rebased it and fixed a couple of minor issues. Patch 2 uses this functionality to clean up the stack switching by moving duplicate code into a single function. On RISC-V, the compiler uses the gp register for storing the current shadow call stack pointer, which is incompatible with global pointer relaxation. Patch 3 moves global pointer loading into a macro that can be easily disabled with SCS. Patch 4 implements SCS register loading and switching, and allows the feature to be enabled, and patch 5 adds separate per-CPU IRQ shadow call stacks when CONFIG_IRQ_STACKS is enabled. Patch 6 fixes the backward-edge CFI test in lkdtm for RISC-V. Note that this series requires Clang 17. Earlier Clang versions support SCS on RISC-V, but use the x18 register instead of gp, which isn't ideal. gcc has SCS support for arm64, but I'm not aware of plans to support RISC-V. Once the Zicfiss extension is ratified, it's probably preferable to use hardware-backed shadow stacks instead of SCS on hardware that supports the extension, and we may want to consider implementing CONFIG_DYNAMIC_SCS to patch between the implementation at runtime (similarly to the arm64 implementation, which switches to SCS when hardware PAC support isn't available). * b4-shazam-merge: lkdtm: Fix CFI_BACKWARD on RISC-V riscv: Use separate IRQ shadow call stacks riscv: Implement Shadow Call Stack riscv: Move global pointer loading to a macro riscv: Deduplicate IRQ stack switching riscv: VMAP_STACK overflow detection thread-safe Link: https://lore.kernel.org/r/20230927224757.1154247-8-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-31riscv: put interrupt entries into .irqentry.textNam Cao
The interrupt entries are expected to be in the .irqentry.text section. For example, for kprobes to work properly, exception code cannot be probed; this is ensured by blacklisting addresses in the .irqentry.text section. Fixes: 7db91e57a0ac ("RISC-V: Task implementation") Signed-off-by: Nam Cao <namcaov@gmail.com> Link: https://lore.kernel.org/r/20230821145708.21270-1-namcaov@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27riscv: Use separate IRQ shadow call stacksSami Tolvanen
When both CONFIG_IRQ_STACKS and SCS are enabled, also use a separate per-CPU shadow call stack. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-13-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27riscv: Implement Shadow Call StackSami Tolvanen
Implement CONFIG_SHADOW_CALL_STACK for RISC-V. When enabled, the compiler injects instructions to all non-leaf C functions to store the return address to the shadow stack and unconditionally load it again before returning, which makes it harder to corrupt the return address through a stack overflow, for example. The active shadow call stack pointer is stored in the gp register, which makes SCS incompatible with gp relaxation. Use --no-relax-gp to ensure gp relaxation is disabled and disable global pointer loading. Add SCS pointers to struct thread_info, implement SCS initialization, and task switching Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-12-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27riscv: Move global pointer loading to a macroSami Tolvanen
In Clang 17, -fsanitize=shadow-call-stack uses the newly declared platform register gp for storing shadow call stack pointers. As this is obviously incompatible with gp relaxation, in preparation for CONFIG_SHADOW_CALL_STACK support, move global pointer loading to a single macro, which we can cleanly disable when SCS is used instead. Link: https://reviews.llvm.org/rGaa1d2693c256 Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/commit/a484e843e6eeb51f0cb7b8819e50da6d2444d769 Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-11-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27riscv: Deduplicate IRQ stack switchingSami Tolvanen
With CONFIG_IRQ_STACKS, we switch to a separate per-CPU IRQ stack before calling handle_riscv_irq or __do_softirq. We currently have duplicate inline assembly snippets for stack switching in both code paths. Now that we can access per-CPU variables in assembly, implement call_on_irq_stack in assembly, and use that instead of redundant inline assembly. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-10-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-27riscv: VMAP_STACK overflow detection thread-safeDeepak Gupta
commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to `shadow_stack` temporarily before switching finally to per-cpu `overflow_stack`. If two CPUs/harts are racing and end up in over flowing kernel stack, one or both will end up corrupting each other state because `shadow_stack` is not per-cpu. This patch optimizes per-cpu overflow stack switch by directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`. Following are the changes in this patch - Defines an asm macro to obtain per-cpu symbols in destination register. - In entry.S, when overflow is detected, per-cpu overflow stack is located using per-cpu asm macro. Computing per-cpu symbol requires a temporary register. x31 is saved away into CSR_SCRATCH (CSR_SCRATCH is anyways zero since we're in kernel). Please see Links for additional relevant disccussion and alternative solution. Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT` Kernel crash log below Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT Task stack: [0xff20000010a98000..0xff20000010a9c000] Overflow stack: [0xff600001f7d98370..0xff600001f7d99370] CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) epc : __memset+0x60/0xfc ra : recursive_loop+0x48/0xc6 [lkdtm] epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80 gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88 t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0 s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000 a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000 a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90 s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684 s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10 s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4 t5 : ffffffff815dbab8 t6 : ff20000010a9bb48 status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f Kernel panic - not syncing: Kernel stack overflow CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80006754>] dump_backtrace+0x30/0x38 [<ffffffff808de798>] show_stack+0x40/0x4c [<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c [<ffffffff808ea2d8>] dump_stack+0x18/0x20 [<ffffffff808dec06>] panic+0x126/0x2fe [<ffffffff800065ea>] walk_stackframe+0x0/0xf0 [<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm] SMP: stopping secondary CPUs ---[ end Kernel panic - not syncing: Kernel stack overflow ]--- Cc: Guo Ren <guoren@kernel.org> Cc: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t Link: https://lore.kernel.org/lkml/20221124094845.1907443-1-debug@rivosinc.com/ Signed-off-by: Deepak Gupta <debug@rivosinc.com> Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Guo Ren <guoren@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-9-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20riscv: replace deprecated scall with ecallFangrui Song
scall is a deprecated alias for ecall. ecall is used in several places, so there is no assembler compatibility concern. Signed-off-by: Fangrui Song <maskray@google.com> Link: https://lore.kernel.org/r/20230423223210.126948-1-maskray@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08riscv: Disable Vector Instructions for kernel itselfGuo Ren
Disable vector instructions execution for kernel mode at its entrances. This helps find illegal uses of vector in the kernel space, which is similar to the fpu. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Co-developed-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com> Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com> Co-developed-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20230605110724.21391-7-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23riscv: entry: Consolidate general regs saving/restoringJisheng Zhang
Consolidate the saving/restoring GPs (except zero, ra, sp, gp, tp and t0) into save_from_x6_to_x31/restore_from_x6_to_x31 macros. No functional change intended. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Guo Ren <guoren@kernel.org> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230222033021.983168-8-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23riscv: entry: Consolidate ret_from_kernel_thread into ret_from_forkJisheng Zhang
The ret_from_kernel_thread() behaves similarly with ret_from_fork(), the only difference is whether call the fn(arg) or not, this can be achieved by testing fn is NULL or not, I.E s0 is 0 or not. Many architectures have done the same thing, it makes entry.S more clean. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Guo Ren <guoren@kernel.org> Tested-by: Guo Ren <guoren@kernel.org> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230222033021.983168-7-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23riscv: entry: Convert to generic entryGuo Ren
This patch converts riscv to use the generic entry infrastructure from kernel/entry/*. The generic entry makes maintainers' work easier and codes more elegant. Here are the changes: - More clear entry.S with handle_exception and ret_from_exception - Get rid of complex custom signal implementation - Move syscall procedure from assembly to C, which is much more readable. - Connect ret_from_fork & ret_from_kernel_thread to generic entry. - Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode - Use the standard preemption code instead of custom Suggested-by: Huacai Chen <chenhuacai@kernel.org> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Yipeng Zou <zouyipeng@huawei.com> Tested-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Link: https://lore.kernel.org/r/20230222033021.983168-5-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-12Merge patch series "RISC-V: Align the shadow stack"Palmer Dabbelt
Palmer Dabbelt <palmer@rivosinc.com> says: This contains a pair of cleanups that depend on a fix that has already landed upstream. * b4-shazam-merge: RISC-V: Add some comments about the shadow and overflow stacks RISC-V: Align the shadow stack riscv: fix race when vmap stack overflow Link: https://lore.kernel.org/r/20221130023515.20217-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08Merge patch "RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path"Palmer Dabbelt
I'm merging this in as a single patch to make it easier to handle the backports. * b4-shazam-merge: RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08RISC-V: Fix unannoted hardirqs-on in return to userspace slow-pathAndrew Bresticker
The return to userspace path in entry.S may enable interrupts without the corresponding lockdep annotation, producing a splat[0] when DEBUG_LOCKDEP is enabled. Simply calling __trace_hardirqs_on() here gets a bit messy due to the use of RA to point back to ret_from_exception, so just move the whole slow-path loop into C. It's more readable and it lets us use local_irq_{enable,disable}(), avoiding the need for manual annotations altogether. [0]: ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()) WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5512 check_flags+0x10a/0x1e0 Modules linked in: CPU: 2 PID: 1 Comm: init Not tainted 6.1.0-rc4-00160-gb56b6e2b4f31 #53 Hardware name: riscv-virtio,qemu (DT) epc : check_flags+0x10a/0x1e0 ra : check_flags+0x10a/0x1e0 <snip> status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [<ffffffff808edb90>] lock_is_held_type+0x78/0x14e [<ffffffff8003dae2>] __might_resched+0x26/0x22c [<ffffffff8003dd24>] __might_sleep+0x3c/0x66 [<ffffffff80022c60>] get_signal+0x9e/0xa70 [<ffffffff800054a2>] do_notify_resume+0x6e/0x422 [<ffffffff80003c68>] ret_from_exception+0x0/0x10 irq event stamp: 44512 hardirqs last enabled at (44511): [<ffffffff808f901c>] _raw_spin_unlock_irqrestore+0x54/0x62 hardirqs last disabled at (44512): [<ffffffff80008200>] __trace_hardirqs_off+0xc/0x14 softirqs last enabled at (44472): [<ffffffff808f9fbe>] __do_softirq+0x3de/0x51e softirqs last disabled at (44467): [<ffffffff80017760>] irq_exit+0xd6/0x104 ---[ end trace 0000000000000000 ]--- possible reason: unannotated irqs-on. Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> Fixes: 3c4697982982 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT") Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-05riscv: stacktrace: Make walk_stackframe cross pt_regs frameGuo Ren
The current walk_stackframe with FRAME_POINTER would stop unwinding at ret_from_exception: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192 Call Trace: [<ffffffe0002038c8>] walk_stackframe+0x0/0xee [<ffffffe000aecf48>] show_stack+0x32/0x4a [<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e [<ffffffe000af1648>] dump_stack+0x14/0x1c [<ffffffe000239ad2>] ___might_sleep+0x12e/0x138 [<ffffffe000239aec>] __might_sleep+0x10/0x18 [<ffffffe000afe3fe>] down_read+0x22/0xa4 [<ffffffe000207588>] do_page_fault+0xb0/0x2fe [<ffffffe000201b80>] ret_from_exception+0x0/0xc The optimization would help walk_stackframe cross the pt_regs frame and get more backtrace of debug info: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192 Call Trace: [<ffffffe0002038c8>] walk_stackframe+0x0/0xee [<ffffffe000aecf48>] show_stack+0x32/0x4a [<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e [<ffffffe000af1648>] dump_stack+0x14/0x1c [<ffffffe000239ad2>] ___might_sleep+0x12e/0x138 [<ffffffe000239aec>] __might_sleep+0x10/0x18 [<ffffffe000afe3fe>] down_read+0x22/0xa4 [<ffffffe000207588>] do_page_fault+0xb0/0x2fe [<ffffffe000201b80>] ret_from_exception+0x0/0xc [<ffffffe000613c06>] riscv_intc_irq+0x1a/0x72 [<ffffffe000201b80>] ret_from_exception+0x0/0xc [<ffffffe00033f44a>] vma_link+0x54/0x160 [<ffffffe000341d7a>] mmap_region+0x2cc/0x4d0 [<ffffffe000342256>] do_mmap+0x2d8/0x3ac [<ffffffe000326318>] vm_mmap_pgoff+0x70/0xb8 [<ffffffe00032638a>] vm_mmap+0x2a/0x36 [<ffffffe0003cfdde>] elf_map+0x72/0x84 [<ffffffe0003d05f8>] load_elf_binary+0x69a/0xec8 [<ffffffe000376240>] bprm_execve+0x246/0x53a [<ffffffe00037786c>] kernel_execve+0xe8/0x124 [<ffffffe000aecdf2>] run_init_process+0xfa/0x10c [<ffffffe000aece16>] try_to_run_init_process+0x12/0x3c [<ffffffe000afa920>] kernel_init+0xb4/0xf8 [<ffffffe000201b80>] ret_from_exception+0x0/0xc Here is the error injection test code for the above output: drivers/irqchip/irq-riscv-intc.c: static asmlinkage void riscv_intc_irq(struct pt_regs *regs) { unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG; + u32 tmp; __get_user(tmp, (u32 *)0); Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20221109064937.3643993-3-guoren@kernel.org [Palmer: use SYM_CODE_*] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-11-29riscv: fix race when vmap stack overflowJisheng Zhang
Currently, when detecting vmap stack overflow, riscv firstly switches to the so called shadow stack, then use this shadow stack to call the get_overflow_stack() to get the overflow stack. However, there's a race here if two or more harts use the same shadow stack at the same time. To solve this race, we introduce spin_shadow_stack atomic var, which will be swap between its own address and 0 in atomic way, when the var is set, it means the shadow_stack is being used; when the var is cleared, it means the shadow_stack isn't being used. Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Suggested-by: Guo Ren <guoren@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@kernel.org [Palmer: Add AQ to the swap, and also some comments.] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-06-29context_tracking: Split user tracking KconfigFrederic Weisbecker
Context tracking is going to be used not only to track user transitions but also idle/IRQs/NMIs. The user tracking part will then become a separate feature. Prepare Kconfig for that. [ frederic: Apply Max Filippov feedback. ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-06-29context_tracking: Rename context_tracking_user_enter/exit() to ↵Frederic Weisbecker
user_enter/exit_callable() context_tracking_user_enter() and context_tracking_user_exit() are ASM callable versions of user_enter() and user_exit() for architectures that didn't manage to check the context tracking static key from ASM. Change those function names to better reflect their purpose. [ frederic: Apply Max Filippov feedback. ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-04-26riscv: compat: syscall: Add entry.S implementationGuo Ren
Implement the entry of compat_sys_call_table[] in asm. Ref to riscv-privileged spec 4.1.1 Supervisor Status Register (sstatus): BIT[32:33] = UXL[1:0]: - 1:32 - 2:64 - 3:128 Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20220405071314.3225832-13-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-03-25Merge tag 'riscv-for-linus-5.18-mw0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for Sv57-based virtual memory. - Various improvements for the MicroChip PolarFire SOC and the associated Icicle dev board, which should allow upstream kernels to boot without any additional modifications. - An improved memmove() implementation. - Support for the new Ssconfpmf and SBI PMU extensions, which allows for a much more useful perf implementation on RISC-V systems. - Support for restartable sequences. * tag 'riscv-for-linus-5.18-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (36 commits) rseq/selftests: Add support for RISC-V RISC-V: Add support for restartable sequence MAINTAINERS: Add entry for RISC-V PMU drivers Documentation: riscv: Remove the old documentation RISC-V: Add sscofpmf extension support RISC-V: Add perf platform driver based on SBI PMU extension RISC-V: Add RISC-V SBI PMU extension definitions RISC-V: Add a simple platform driver for RISC-V legacy perf RISC-V: Add a perf core library for pmu drivers RISC-V: Add CSR encodings for all HPMCOUNTERS RISC-V: Remove the current perf implementation RISC-V: Improve /proc/cpuinfo output for ISA extensions RISC-V: Do no continue isa string parsing without correct XLEN RISC-V: Implement multi-letter ISA extension probing framework RISC-V: Extract multi-letter extension names from "riscv, isa" RISC-V: Minimal parser for "riscv, isa" strings RISC-V: Correctly print supported extensions riscv: Fixed misaligned memory access. Fixed pointer comparison. MAINTAINERS: update riscv/microchip entry riscv: dts: microchip: add new peripherals to icicle kit device tree ...
2022-03-22RISC-V: Add support for restartable sequenceVincent Chen
Add calls to rseq_signal_deliver() and rseq_syscall() to introduce RSEQ support. 1. Call the rseq_signal_deliver() function to fixup on the pre-signal frame when a signal is delivered on top of a restartable sequence critical section. 2. Check that system calls are not invoked from within rseq critical sections by invoking rseq_signal() from ret_from_syscall(). With CONFIG_DEBUG_RSEQ, such behavior results in termination of the process with SIGSEGV. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-02-24riscv: fix oops caused by irqsoff latency tracerChangbin Du
The trace_hardirqs_{on,off}() require the caller to setup frame pointer properly. This because these two functions use macro 'CALLER_ADDR1' (aka. __builtin_return_address(1)) to acquire caller info. If the $fp is used for other purpose, the code generated this macro (as below) could trigger memory access fault. 0xffffffff8011510e <+80>: ld a1,-16(s0) 0xffffffff80115112 <+84>: ld s2,-8(a1) # <-- paging fault here The oops message during booting if compiled with 'irqoff' tracer enabled: [ 0.039615][ T0] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000f8 [ 0.041925][ T0] Oops [#1] [ 0.042063][ T0] Modules linked in: [ 0.042864][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc1-00233-g9a20c48d1ed2 #29 [ 0.043568][ T0] Hardware name: riscv-virtio,qemu (DT) [ 0.044343][ T0] epc : trace_hardirqs_on+0x56/0xe2 [ 0.044601][ T0] ra : restore_all+0x12/0x6e [ 0.044721][ T0] epc : ffffffff80126a5c ra : ffffffff80003b94 sp : ffffffff81403db0 [ 0.044801][ T0] gp : ffffffff8163acd8 tp : ffffffff81414880 t0 : 0000000000000020 [ 0.044882][ T0] t1 : 0098968000000000 t2 : 0000000000000000 s0 : ffffffff81403de0 [ 0.044967][ T0] s1 : 0000000000000000 a0 : 0000000000000001 a1 : 0000000000000100 [ 0.045046][ T0] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 [ 0.045124][ T0] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000054494d45 [ 0.045210][ T0] s2 : ffffffff80003b94 s3 : ffffffff81a8f1b0 s4 : ffffffff80e27b50 [ 0.045289][ T0] s5 : ffffffff81414880 s6 : ffffffff8160fa00 s7 : 00000000800120e8 [ 0.045389][ T0] s8 : 0000000080013100 s9 : 000000000000007f s10: 0000000000000000 [ 0.045474][ T0] s11: 0000000000000000 t3 : 7fffffffffffffff t4 : 0000000000000000 [ 0.045548][ T0] t5 : 0000000000000000 t6 : ffffffff814aa368 [ 0.045620][ T0] status: 0000000200000100 badaddr: 00000000000000f8 cause: 000000000000000d [ 0.046402][ T0] [<ffffffff80003b94>] restore_all+0x12/0x6e This because the $fp(aka. $s0) register is not used as frame pointer in the assembly entry code. resume_kernel: REG_L s0, TASK_TI_PREEMPT_COUNT(tp) bnez s0, restore_all REG_L s0, TASK_TI_FLAGS(tp) andi s0, s0, _TIF_NEED_RESCHED beqz s0, restore_all call preempt_schedule_irq j restore_all To fix above issue, here we add one extra level wrapper for function trace_hardirqs_{on,off}() so they can be safely called by low level entry code. Signed-off-by: Changbin Du <changbin.du@gmail.com> Fixes: 3c4697982982 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2021-11-01Merge tag 'cpu-to-thread_info-v5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull thread_info update to move 'cpu' back from task_struct from Kees Cook: "Cross-architecture update to move task_struct::cpu back into thread_info on arm64, x86, s390, powerpc, and riscv. All Acked by arch maintainers. Quoting Ard Biesheuvel: 'Move task_struct::cpu back into thread_info Keeping CPU in task_struct is problematic for architectures that define raw_smp_processor_id() in terms of this field, as it requires linux/sched.h to be included, which causes a lot of pain in terms of circular dependencies (aka 'header soup') This series moves it back into thread_info (where it came from) for all architectures that enable THREAD_INFO_IN_TASK, addressing the header soup issue as well as some pointless differences in the implementations of task_cpu() and set_task_cpu()'" * tag 'cpu-to-thread_info-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: riscv: rely on core code to keep thread_info::cpu updated powerpc: smp: remove hack to obtain offset of task_struct::cpu sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y powerpc: add CPU field to struct thread_info s390: add CPU field to struct thread_info x86: add CPU field to struct thread_info arm64: add CPU field to struct thread_info
2021-10-26irq: riscv: perform irqentry in entry codeMark Rutland
In preparation for removing HANDLE_DOMAIN_IRQ_IRQENTRY, have arch/riscv perform all the irqentry accounting in its entry code. As arch/riscv uses GENERIC_IRQ_MULTI_HANDLER, we can use generic_handle_arch_irq() to do so. Since generic_handle_arch_irq() handles the irq entry and setting the irq regs, and happens before the irqchip code calls handle_IPI(), we can remove the redundant irq entry and irq regs manipulation from handle_IPI(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-09-30riscv: rely on core code to keep thread_info::cpu updatedArd Biesheuvel
Now that the core code switched back to using thread_info::cpu to keep a task's CPU number, we no longer need to keep it in sync explicitly. So just drop the code that does this. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
2021-07-06riscv: add VMAP_STACK overflow detectionTong Tiangen
This patch adds stack overflow detection to riscv, usable when CONFIG_VMAP_STACK=y. Overflow is detected in kernel exception entry(kernel/entry.S), if the kernel stack is overflow and been detected, the overflow handler is invoked on a per-cpu overflow stack. This approach preserves GPRs and the original exception information. The overflow detect is performed before any attempt is made to access the stack and the principle of stack overflow detection: kernel stacks are aligned to double their size, enabling overflow to be detected with a single bit test. For example, a 16K stack is aligned to 32K, ensuring that bit 14 of the SP must be zero. On an overflow (or underflow), this bit is flipped. Thus, overflow (of less than the size of the stack) can be detected by testing whether this bit is set. This gives us a useful error message on stack overflow, as can be trigger with the LKDTM overflow test: [ 388.053267] lkdtm: Performing direct entry EXHAUST_STACK [ 388.053663] lkdtm: Calling function with 1024 frame size to depth 32 ... [ 388.054016] lkdtm: loop 32/32 ... [ 388.054186] lkdtm: loop 31/32 ... [ 388.054491] lkdtm: loop 30/32 ... [ 388.054672] lkdtm: loop 29/32 ... [ 388.054859] lkdtm: loop 28/32 ... [ 388.055010] lkdtm: loop 27/32 ... [ 388.055163] lkdtm: loop 26/32 ... [ 388.055309] lkdtm: loop 25/32 ... [ 388.055481] lkdtm: loop 24/32 ... [ 388.055653] lkdtm: loop 23/32 ... [ 388.055837] lkdtm: loop 22/32 ... [ 388.056015] lkdtm: loop 21/32 ... [ 388.056188] lkdtm: loop 20/32 ... [ 388.058145] Insufficient stack space to handle exception! [ 388.058153] Task stack: [0xffffffd014260000..0xffffffd014264000] [ 388.058160] Overflow stack: [0xffffffe1f8d2c220..0xffffffe1f8d2d220] [ 388.058168] CPU: 0 PID: 89 Comm: bash Not tainted 5.12.0-rc8-dirty #90 [ 388.058175] Hardware name: riscv-virtio,qemu (DT) [ 388.058187] epc : number+0x32/0x2c0 [ 388.058247] ra : vsnprintf+0x2ae/0x3f0 [ 388.058255] epc : ffffffe0002d38f6 ra : ffffffe0002d814e sp : ffffffd01425ffc0 [ 388.058263] gp : ffffffe0012e4010 tp : ffffffe08014da00 t0 : ffffffd0142606e8 [ 388.058271] t1 : 0000000000000000 t2 : 0000000000000000 s0 : ffffffd014260070 [ 388.058303] s1 : ffffffd014260158 a0 : ffffffd01426015e a1 : ffffffd014260158 [ 388.058311] a2 : 0000000000000013 a3 : ffff0a01ffffff10 a4 : ffffffe000c398e0 [ 388.058319] a5 : 511b02ec65f3e300 a6 : 0000000000a1749a a7 : 0000000000000000 [ 388.058327] s2 : ffffffff000000ff s3 : 00000000ffff0a01 s4 : ffffffe0012e50a8 [ 388.058335] s5 : 0000000000ffff0a s6 : ffffffe0012e50a8 s7 : ffffffe000da1cc0 [ 388.058343] s8 : ffffffffffffffff s9 : ffffffd0142602b0 s10: ffffffd0142602a8 [ 388.058351] s11: ffffffd01426015e t3 : 00000000000f0000 t4 : ffffffffffffffff [ 388.058359] t5 : 000000000000002f t6 : ffffffd014260158 [ 388.058366] status: 0000000000000100 badaddr: ffffffd01425fff8 cause: 000000000000000f [ 388.058374] Kernel panic - not syncing: Kernel stack overflow [ 388.058381] CPU: 0 PID: 89 Comm: bash Not tainted 5.12.0-rc8-dirty #90 [ 388.058387] Hardware name: riscv-virtio,qemu (DT) [ 388.058393] Call Trace: [ 388.058400] [<ffffffe000004944>] walk_stackframe+0x0/0xce [ 388.058406] [<ffffffe0006f0b28>] dump_backtrace+0x38/0x46 [ 388.058412] [<ffffffe0006f0b46>] show_stack+0x10/0x18 [ 388.058418] [<ffffffe0006f3690>] dump_stack+0x74/0x8e [ 388.058424] [<ffffffe0006f0d52>] panic+0xfc/0x2b2 [ 388.058430] [<ffffffe0006f0acc>] print_trace_address+0x0/0x24 [ 388.058436] [<ffffffe0002d814e>] vsnprintf+0x2ae/0x3f0 [ 388.058956] SMP: stopping secondary CPUs Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06Merge tag 'riscv-for-linus-5.13-mw0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the memtest= kernel command-line argument. - Support for building the kernel with FORTIFY_SOURCE. - Support for generic clockevent broadcasts. - Support for the buildtar build target. - Some build system cleanups to pass more LLVM-friendly arguments. - Support for kprobes. - A rearranged kernel memory map, the first part of supporting sv48 systems. - Improvements to kexec, along with support for kdump and crash kernels. - An alternatives-based errata framework, along with support for handling a pair of errata that manifest on some SiFive designs (including the HiFive Unmatched). - Support for XIP. - A device tree for the Microchip PolarFire ICICLE SoC and associated dev board. ... along with a bunch of cleanups. There are already a handful of fixes on the list so there will likely be a part 2. * tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (45 commits) RISC-V: Always define XIP_FIXUP riscv: Remove 32b kernel mapping from page table dump riscv: Fix 32b kernel build with CONFIG_DEBUG_VIRTUAL=y RISC-V: Fix error code returned by riscv_hartid_to_cpuid() RISC-V: Enable Microchip PolarFire ICICLE SoC RISC-V: Initial DTS for Microchip ICICLE board dt-bindings: riscv: microchip: Add YAML documentation for the PolarFire SoC RISC-V: Add Microchip PolarFire SoC kconfig option RISC-V: enable XIP RISC-V: Add crash kernel support RISC-V: Add kdump support RISC-V: Improve init_resources() RISC-V: Add kexec support RISC-V: Add EM_RISCV to kexec UAPI header riscv: vdso: fix and clean-up Makefile riscv/mm: Use BUG_ON instead of if condition followed by BUG. riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobe riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU riscv: module: Create module allocations without exec permissions riscv: bpf: Avoid breaking W^X ...
2021-04-26riscv: sifive: Apply errata "cip-453" patchVincent Chen
Add sign extension to the $badaddr before addressing the instruction page fault and instruction access fault to workaround the issue "cip-453". To avoid affecting the existing code sequence, this patch will creates two trampolines to add sign extension to the $badaddr. By the "alternative" mechanism, these two trampolines will replace the original exception handler of instruction page fault and instruction access fault in the excp_vect_table. In this case, only the specific SiFive CPU core jumps to the do_page_fault and do_trap_insn_fault through these two trampolines. Other CPUs are not affected. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-15riscv: keep interrupts disabled for BREAKPOINT exceptionJisheng Zhang
Current riscv's kprobe handlers are run with both preemption and interrupt enabled, this violates kprobe requirements. Fix this issue by keeping interrupts disabled for BREAKPOINT exception. Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Cc: stable@vger.kernel.org Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> [Palmer: add a comment] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-01riscv,entry: fix misaligned base for excp_vect_tableZihao Yu
In RV64, the size of each entry in excp_vect_table is 8 bytes. If the base of the table is not 8-byte aligned, loading an entry in the table will raise a misaligned exception. Although such exception will be handled by opensbi/bbl, this still causes performance degradation. Signed-off-by: Zihao Yu <yuzihao@ict.ac.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-12riscv: Trace irq on only interrupt is enabledAtish Patra
We should call irq trace only if interrupt is going to be enabled during excecption handling. Otherwise, it results in following warning during boot with lock debugging enabled. [ 0.000000] ------------[ cut here ]------------ [ 0.000000] DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled) [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4085 lockdep_hardirqs_on_prepare+0x22a/0x22e [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-00022-ge20097fb37e2-dirty #548 [ 0.000000] epc: c005d5d4 ra : c005d5d4 sp : c1c01e80 [ 0.000000] gp : c1d456e0 tp : c1c0a980 t0 : 00000000 [ 0.000000] t1 : ffffffff t2 : 00000000 s0 : c1c01ea0 [ 0.000000] s1 : c100f360 a0 : 0000002d a1 : c00666ee [ 0.000000] a2 : 00000000 a3 : 00000000 a4 : 00000000 [ 0.000000] a5 : 00000000 a6 : c1c6b390 a7 : 3ffff00e [ 0.000000] s2 : c2384fe8 s3 : 00000000 s4 : 00000001 [ 0.000000] s5 : c1c0a980 s6 : c1d48000 s7 : c1613b4c [ 0.000000] s8 : 00000fff s9 : 80000200 s10: c1613b40 [ 0.000000] s11: 00000000 t3 : 00000000 t4 : 00000000 [ 0.000000] t5 : 00000001 t6 : 00000000 Fixes: 3c4697982982 ("riscv:Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT") Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-07riscv: Enable interrupts during syscalls with M-ModeDamien Le Moal
When running is M-Mode (no MMU config), MPIE does not get set. This results in all syscalls being executed with interrupts disabled as handle_exception never sets SR_IE as it always sees SR_PIE being cleared. Fix this by always force enabling interrupts in handle_syscall when CONFIG_RISCV_M_MODE is enabled. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-07riscv: return -ENOSYS for syscall -1Andreas Schwab
Properly return -ENOSYS for syscall -1 instead of leaving the return value uninitialized. This fixes the strace teststuite. Fixes: 5340627e3fe0 ("riscv: add support for SECCOMP and SECCOMP_FILTER") Cc: stable@vger.kernel.org Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>