summaryrefslogtreecommitdiff
path: root/arch/riscv/kernel/signal.c
AgeCommit message (Collapse)Author
2024-04-03riscv: Fix vector state restore in rt_sigreturn()Björn Töpel
The RISC-V Vector specification states in "Appendix D: Calling Convention for Vector State" [1] that "Executing a system call causes all caller-saved vector registers (v0-v31, vl, vtype) and vstart to become unspecified.". In the RISC-V kernel this is called "discarding the vstate". Returning from a signal handler via the rt_sigreturn() syscall, vector discard is also performed. However, this is not an issue since the vector state should be restored from the sigcontext, and therefore not care about the vector discard. The "live state" is the actual vector register in the running context, and the "vstate" is the vector state of the task. A dirty live state, means that the vstate and live state are not in synch. When vectorized user_from_copy() was introduced, an bug sneaked in at the restoration code, related to the discard of the live state. An example when this go wrong: 1. A userland application is executing vector code 2. The application receives a signal, and the signal handler is entered. 3. The application returns from the signal handler, using the rt_sigreturn() syscall. 4. The live vector state is discarded upon entering the rt_sigreturn(), and the live state is marked as "dirty", indicating that the live state need to be synchronized with the current vstate. 5. rt_sigreturn() restores the vstate, except the Vector registers, from the sigcontext 6. rt_sigreturn() restores the Vector registers, from the sigcontext, and now the vectorized user_from_copy() is used. The dirty live state from the discard is saved to the vstate, making the vstate corrupt. 7. rt_sigreturn() returns to the application, which crashes due to corrupted vstate. Note that the vectorized user_from_copy() is invoked depending on the value of CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD. Default is 768, which means that vlen has to be larger than 128b for this bug to trigger. The fix is simply to mark the live state as non-dirty/clean prior performing the vstate restore. Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-8abdb41-2024-03-26/unpriv-isa-asciidoc.pdf # [1] Reported-by: Charlie Jenkins <charlie@rivosinc.com> Reported-by: Vineet Gupta <vgupta@kernel.org> Fixes: c2a658d41924 ("riscv: lib: vectorize copy_to_user/copy_from_user") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Vineet Gupta <vineetg@rivosinc.com> Link: https://lore.kernel.org/r/20240403072638.567446-1-bjorn@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-16Merge patch series "riscv: support kernel-mode Vector"Palmer Dabbelt
Andy Chiu <andy.chiu@sifive.com> says: This series provides support running Vector in kernel mode. Additionally, kernel-mode Vector can be configured to run without turnning off preemption on a CONFIG_PREEMPT kernel. Along with the suport, we add Vector optimized copy_{to,from}_user. And provide a simple threshold to decide when to run the vectorized functions. We decided to drop vectorized memcpy/memset/memmove for the moment due to the concern of memory side-effect in kernel_vector_begin(). The detailed description can be found at v9[0] This series is composed by 4 parts: patch 1-4: adds basic support for kernel-mode Vector patch 5: includes vectorized copy_{to,from}_user into the kernel patch 6: refactor context switch code in fpu [1] patch 7-10: provides some code refactors and support for preemptible kernel-mode Vector. This series can be merged if we feel any part of {1~4, 5, 6, 7~10} is mature enough. This patch is tested on a QEMU with V and verified that booting, normal userspace operations all work as usual with thresholds set to 0. Also, we test by launching multiple kernel threads which continuously executes and verifies Vector operations in the background. The module that tests these operation is expected to be upstream later. * b4-shazam-merge: riscv: vector: allow kernel-mode Vector with preemption riscv: vector: use kmem_cache to manage vector context riscv: vector: use a mask to write vstate_ctrl riscv: vector: do not pass task_struct into riscv_v_vstate_{save,restore}() riscv: fpu: drop SR_SD bit checking riscv: lib: vectorize copy_to_user/copy_from_user riscv: sched: defer restoring Vector context for user riscv: Add vector extension XOR implementation riscv: vector: make Vector always available for softirq context riscv: Add support for kernel mode vector Link: https://lore.kernel.org/r/20240115055929.4736-1-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-16riscv: vector: do not pass task_struct into riscv_v_vstate_{save,restore}()Andy Chiu
riscv_v_vstate_{save,restore}() can operate only on the knowlege of struct __riscv_v_ext_state, and struct pt_regs. Let the caller decides which should be passed into the function. Meanwhile, the kernel-mode Vector is going to introduce another vstate, so this also makes functions potentially able to be reused. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20240115055929.4736-8-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-16riscv: sched: defer restoring Vector context for userAndy Chiu
User will use its Vector registers only after the kernel really returns to the userspace. So we can delay restoring Vector registers as long as we are still running in kernel mode. So, add a thread flag to indicates the need of restoring Vector and do the restore at the last arch-specific exit-to-user hook. This save the context restoring cost when we switch over multiple processes that run V in kernel mode. For example, if the kernel performs a context swicth from A->B->C, and returns to C's userspace, then there is no need to restore B's V-register. Besides, this also prevents us from repeatedly restoring V context when executing kernel-mode Vector multiple times. The cost of this is that we must disable preemption and mark vector as busy during vstate_{save,restore}. Because then the V context will not get restored back immediately when a trap-causing context switch happens in the middle of vstate_{save,restore}. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20240115055929.4736-5-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09riscv; fix __user annotation in save_v_state()Ben Dooks
The save_v_state() is technically sending a __user pointer through __put_user() and thus is generating a sparse warning so force the value to be "void *" to fix: arch/riscv/kernel/signal.c:94:16: warning: incorrect type in initializer (different address spaces) arch/riscv/kernel/signal.c:94:16: expected void *__val arch/riscv/kernel/signal.c:94:16: got void [noderef] __user *[assigned] datap Fixes: 8ee0b41898fa26f66e32 ("riscv: signal: Add sigcontext save/restore for vector") Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Link: https://lore.kernel.org/r/20231123142708.261733-1-ben.dooks@codethink.co.uk Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-09Merge patch "drivers: perf: Do not broadcast to other cpus when starting a ↵Palmer Dabbelt
counter" This is really just a single patch, but since the offending fix hasn't yet made it to my for-next I'm merging it here. Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05riscv: signal: handle syscall restart before get_signalHaorong Lu
In the current riscv implementation, blocking syscalls like read() may not correctly restart after being interrupted by ptrace. This problem arises when the syscall restart process in arch_do_signal_or_restart() is bypassed due to changes to the regs->cause register, such as an ebreak instruction. Steps to reproduce: 1. Interrupt the tracee process with PTRACE_SEIZE & PTRACE_INTERRUPT. 2. Backup original registers and instruction at new_pc. 3. Change pc to new_pc, and inject an instruction (like ebreak) to this address. 4. Resume with PTRACE_CONT and wait for the process to stop again after executing ebreak. 5. Restore original registers and instructions, and detach from the tracee process. 6. Now the read() syscall in tracee will return -1 with errno set to ERESTARTSYS. Specifically, during an interrupt, the regs->cause changes from EXC_SYSCALL to EXC_BREAKPOINT due to the injected ebreak, which is inaccessible via ptrace so we cannot restore it. This alteration breaks the syscall restart condition and ends the read() syscall with an ERESTARTSYS error. According to include/linux/errno.h, it should never be seen by user programs. X86 can avoid this issue as it checks the syscall condition using a register (orig_ax) exposed to user space. Arm64 handles syscall restart before calling get_signal, where it could be paused and inspected by ptrace/debugger. This patch adjusts the riscv implementation to arm64 style, which also checks syscall using a kernel register (syscallno). It ensures the syscall restart process is not bypassed when changes to the cause register occur, providing more consistent behavior across various architectures. For a simplified reproduction program, feel free to visit: https://github.com/ancientmodern/riscv-ptrace-bug-demo. Signed-off-by: Haorong Lu <ancientmodern4@gmail.com> Link: https://lore.kernel.org/r/20230803224458.4156006-1-ancientmodern4@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12riscv: signal: fix sigaltstack frame size checkingAndy Chiu
The alternative stack checking in get_sigframe introduced by the Vector support is not needed and has a problem. It is not needed as we have already validate it at the beginning of the function if we are already on an altstack. If not, the size of an altstack is always validated at its allocation stage with sigaltstack_size_valid(). Besides, we must only regard the size of an altstack if the handler of a signal is registered with SA_ONSTACK. So, blindly checking overflow of an altstack if sas_ss_size not equals to zero will check against wrong signal handlers if only a subset of signals are registered with SA_ONSTACK. Fixes: 8ee0b41898fa ("riscv: signal: Add sigcontext save/restore for vector") Reported-by: Prashanth Swaminathan <prashanthsw@google.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20230822164904.21660-1-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08riscv: signal: validate altstack to reflect VectorAndy Chiu
Some extensions, such as Vector, dynamically change footprint on a signal frame, so MINSIGSTKSZ is no longer accurate. For example, an RV64V implementation with vlen = 512 may occupy 2K + 40 + 12 Bytes of a signal frame with the upcoming support. And processes that do not execute any vector instructions do not need to reserve the extra sigframe. So we need a way to guard the allocation size of the sigframe at process runtime according to current status of V. Thus, provide the function sigaltstack_size_valid() to validate its size based on current allocation status of supported extensions. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Link: https://lore.kernel.org/r/20230605110724.21391-17-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08riscv: signal: Report signal frame size to userspace via auxvVincent Chen
The vector register belongs to the signal context. They need to be stored and restored as entering and leaving the signal handler. According to the V-extension specification, the maximum length of the vector registers can be 2^16. Hence, if userspace refers to the MINSIGSTKSZ to create a sigframe, it may not be enough. To resolve this problem, this patch refers to the commit 94b07c1f8c39c ("arm64: signal: Report signal frame size to userspace via auxv") to enable userspace to know the minimum required sigframe size through the auxiliary vector and use it to allocate enough memory for signal context. Note that auxv always reports size of the sigframe as if V exists for all starting processes, whenever the kernel has CONFIG_RISCV_ISA_V. The reason is that users usually reference this value to allocate an alternative signal stack, and the user may use V anytime. So the user must reserve a space for V-context in sigframe in case that the signal handler invokes after the kernel allocating V. Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Link: https://lore.kernel.org/r/20230605110724.21391-16-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08riscv: signal: Add sigcontext save/restore for vectorGreentime Hu
This patch facilitates the existing fp-reserved words for placement of the first extension's context header on the user's sigframe. A context header consists of a distinct magic word and the size, including the header itself, of an extension on the stack. Then, the frame is followed by the context of that extension, and then a header + context body for another extension if exists. If there is no more extension to come, then the frame must be ended with a null context header. A special case is rv64gc, where the kernel support no extensions requiring to expose additional regfile to the user. In such case the kernel would place the null context header right after the first reserved word of __riscv_q_ext_state when saving sigframe. And the kernel would check if all reserved words are zeros when a signal handler returns. __riscv_q_ext_state---->| |<-__riscv_extra_ext_header ~ ~ .reserved[0]--->|0 |<- .reserved <-------|magic |<- .hdr | |size |_______ end of sc_fpregs | |ext-bdy| | ~ ~ +)size ------->|magic |<- another context header |size | |ext-bdy| ~ ~ |magic:0|<- null context header |size:0 | The vector registers will be saved in datap pointer. The datap pointer will be allocated dynamically when the task needs in kernel space. On the other hand, datap pointer on the sigframe will be set right after the __riscv_v_ext_state data structure. Co-developed-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Suggested-by: Vineet Gupta <vineetg@rivosinc.com> Suggested-by: Richard Henderson <richard.henderson@linaro.org> Co-developed-by: Andy Chiu <andy.chiu@sifive.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Link: https://lore.kernel.org/r/20230605110724.21391-15-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08riscv: signal: check fp-reserved words unconditionallyAndy Chiu
In order to let kernel/user locate and identify an extension context on the existing sigframe, we are going to utilize reserved space of fp and encode the information there. And since the sigcontext has already preserved a space for fp context w or w/o CONFIG_FPU, we move those reserved words checking/setting routine back into generic code. This commit also undone an additional logical change carried by the refactor commit 007f5c3589578 ("Refactor FPU code in signal setup/return procedures"). Originally we did not restore fp context if restoring of gpr have failed. And it was fine on the other side. In such way the kernel could keep the regfiles intact, and potentially react at the failing point of restore. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Link: https://lore.kernel.org/r/20230605110724.21391-14-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-28Merge tag 'riscv-for-linus-6.4-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for runtime detection of the Svnapot extension - Support for Zicboz when clearing pages - We've moved to GENERIC_ENTRY - Support for !MMU on rv32 systems - The linear region is now mapped via huge pages - Support for building relocatable kernels - Support for the hwprobe interface - Various fixes and cleanups throughout the tree * tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (57 commits) RISC-V: hwprobe: Explicity check for -1 in vdso init RISC-V: hwprobe: There can only be one first riscv: Allow to downgrade paging mode from the command line dt-bindings: riscv: add sv57 mmu-type RISC-V: hwprobe: Remove __init on probe_vendor_features() riscv: Use --emit-relocs in order to move .rela.dyn in init riscv: Check relocations at compile time powerpc: Move script to check relocations at compile time in scripts/ riscv: Introduce CONFIG_RELOCATABLE riscv: Move .rela.dyn outside of init to avoid empty relocations riscv: Prepare EFI header for relocatable kernels riscv: Unconditionnally select KASAN_VMALLOC if KASAN riscv: Fix ptdump when KASAN is enabled riscv: Fix EFI stub usage of KASAN instrumented strcmp function riscv: Move DTB_EARLY_BASE_VA to the kernel address space riscv: Rework kasan population functions riscv: Split early and final KASAN population functions riscv: Use PUD/P4D/PGD pages for the linear mapping riscv: Move the linear mapping creation in its own function riscv: Get rid of riscv_pfn_base variable ...
2023-04-11riscv: add icache flush for nommu sigreturn trampolineMathis Salmen
In a NOMMU kernel, sigreturn trampolines are generated on the user stack by setup_rt_frame. Currently, these trampolines are not instruction fenced, thus their visibility to ifetch is not guaranteed. This patch adds a flush_icache_range in setup_rt_frame to fix this problem. Signed-off-by: Mathis Salmen <mathis.salmen@matsal.de> Fixes: 6bd33e1ece52 ("riscv: add nommu support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230406101130.82304-1-mathis.salmen@matsal.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23riscv: entry: Convert to generic entryGuo Ren
This patch converts riscv to use the generic entry infrastructure from kernel/entry/*. The generic entry makes maintainers' work easier and codes more elegant. Here are the changes: - More clear entry.S with handle_exception and ret_from_exception - Get rid of complex custom signal implementation - Move syscall procedure from assembly to C, which is much more readable. - Connect ret_from_fork & ret_from_kernel_thread to generic entry. - Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode - Use the standard preemption code instead of custom Suggested-by: Huacai Chen <chenhuacai@kernel.org> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Yipeng Zou <zouyipeng@huawei.com> Tested-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Link: https://lore.kernel.org/r/20230222033021.983168-5-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08RISC-V: Fix unannoted hardirqs-on in return to userspace slow-pathAndrew Bresticker
The return to userspace path in entry.S may enable interrupts without the corresponding lockdep annotation, producing a splat[0] when DEBUG_LOCKDEP is enabled. Simply calling __trace_hardirqs_on() here gets a bit messy due to the use of RA to point back to ret_from_exception, so just move the whole slow-path loop into C. It's more readable and it lets us use local_irq_{enable,disable}(), avoiding the need for manual annotations altogether. [0]: ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()) WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5512 check_flags+0x10a/0x1e0 Modules linked in: CPU: 2 PID: 1 Comm: init Not tainted 6.1.0-rc4-00160-gb56b6e2b4f31 #53 Hardware name: riscv-virtio,qemu (DT) epc : check_flags+0x10a/0x1e0 ra : check_flags+0x10a/0x1e0 <snip> status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [<ffffffff808edb90>] lock_is_held_type+0x78/0x14e [<ffffffff8003dae2>] __might_resched+0x26/0x22c [<ffffffff8003dd24>] __might_sleep+0x3c/0x66 [<ffffffff80022c60>] get_signal+0x9e/0xa70 [<ffffffff800054a2>] do_notify_resume+0x6e/0x422 [<ffffffff80003c68>] ret_from_exception+0x0/0x10 irq event stamp: 44512 hardirqs last enabled at (44511): [<ffffffff808f901c>] _raw_spin_unlock_irqrestore+0x54/0x62 hardirqs last disabled at (44512): [<ffffffff80008200>] __trace_hardirqs_off+0xc/0x14 softirqs last enabled at (44472): [<ffffffff808f9fbe>] __do_softirq+0x3de/0x51e softirqs last disabled at (44467): [<ffffffff80017760>] irq_exit+0xd6/0x104 ---[ end trace 0000000000000000 ]--- possible reason: unannotated irqs-on. Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> Fixes: 3c4697982982 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT") Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-09-17riscv: fix a nasty sigreturn bug...Al Viro
riscv has an equivalent of arm bug fixed by 653d48b22166 ("arm: fix really nasty sigreturn bug"); if signal gets caught by an interrupt that hits when we have the right value in a0 (-513), *and* another signal gets delivered upon sigreturn() (e.g. included into the blocked mask for the first signal and posted while the handler had been running), the syscall restart logics will see regs->cause equal to EXC_SYSCALL (we are in a syscall, after all) and a0 already restored to its original value (-513, which happens to be -ERESTARTNOINTR) and assume that we need to apply the usual syscall restart logics. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/YxJEiSq%2FCGaL6Gm9@ZenIV/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-08-18riscv: signal: fix missing prototype warningConor Dooley
Fix the warning: arch/riscv/kernel/signal.c:316:27: warning: no previous prototype for function 'do_notify_resume' [-Wmissing-prototypes] asmlinkage __visible void do_notify_resume(struct pt_regs *regs, All other functions in the file are static & none of the existing headers stood out as an obvious location. Create signal.h to hold the declaration. Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220814141237.493457-4-mail@conchuod.ie Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-17riscv: compat: signal: Add rt_frame implementationGuo Ren
Implement compat_setup_rt_frame for sigcontext save & restore. The main process is the same with signal, but the rv32 pt_regs' size is different from rv64's, so we needs convert them. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20220405071314.3225832-19-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-03-28Merge tag 'ptrace-cleanups-for-v5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace cleanups from Eric Biederman: "This set of changes removes tracehook.h, moves modification of all of the ptrace fields inside of siglock to remove races, adds a missing permission check to ptrace.c The removal of tracehook.h is quite significant as it has been a major source of confusion in recent years. Much of that confusion was around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the semantics clearer). For people who don't know tracehook.h is a vestiage of an attempt to implement uprobes like functionality that was never fully merged, and was later superseeded by uprobes when uprobes was merged. For many years now we have been removing what tracehook functionaly a little bit at a time. To the point where anything left in tracehook.h was some weird strange thing that was difficult to understand" * tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ptrace: Remove duplicated include in ptrace.c ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE ptrace: Return the signal to continue with from ptrace_stop ptrace: Move setting/clearing ptrace_message into ptrace_stop tracehook: Remove tracehook.h resume_user_mode: Move to resume_user_mode.h resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume signal: Move set_notify_signal and clear_notify_signal into sched/signal.h task_work: Decouple TIF_NOTIFY_SIGNAL and task_work task_work: Call tracehook_notify_signal from get_signal on all architectures task_work: Introduce task_work_pending task_work: Remove unnecessary include from posix_timers.h ptrace: Remove tracehook_signal_handler ptrace: Remove arch_syscall_{enter,exit}_tracehook ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h ptrace/arm: Rename tracehook_report_syscall report_syscall ptrace: Move ptrace_report_syscall into ptrace.h
2022-03-22RISC-V: Add support for restartable sequenceVincent Chen
Add calls to rseq_signal_deliver() and rseq_syscall() to introduce RSEQ support. 1. Call the rseq_signal_deliver() function to fixup on the pre-signal frame when a signal is delivered on top of a restartable sequence critical section. 2. Check that system calls are not invoked from within rseq critical sections by invoking rseq_signal() from ret_from_syscall(). With CONFIG_DEBUG_RSEQ, such behavior results in termination of the process with SIGSEGV. Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-03-10resume_user_mode: Move to resume_user_mode.hEric W. Biederman
Move set_notify_resume and tracehook_notify_resume into resume_user_mode.h. While doing that rename tracehook_notify_resume to resume_user_mode_work. Update all of the places that included tracehook.h for these functions to include resume_user_mode.h instead. Update all of the callers of tracehook_notify_resume to call resume_user_mode_work. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-12-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-25riscv: Turn has_fpu into a static key if FPU=yJisheng Zhang
The has_fpu check sits at hot code path: switch_to(). Currently, has_fpu is a bool variable if FPU=y, switch_to() checks it each time, we can optimize out this check by turning the has_fpu into a static key. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-01-14riscv: Add uprobes supportedGuo Ren
This patch adds support for uprobes on riscv architecture. Just like kprobe, it support single-step and simulate instructions. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-12riscv: add support for TIF_NOTIFY_SIGNALJens Axboe
Wire up TIF_NOTIFY_SIGNAL handling for riscv. Cc: linux-riscv@lists.infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()Jens Axboe
All the callers currently do this, clean it up and move the clearing into tracehook_notify_resume() instead. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2019-11-17riscv: add nommu supportChristoph Hellwig
The kernel runs in M-mode without using page tables, and thus can't run bare metal without help from additional firmware. Most of the patch is just stubbing out code not needed without page tables, but there is an interesting detail in the signals implementation: - The normal RISC-V syscall ABI only implements rt_sigreturn as VDSO entry point, but the ELF VDSO is not supported for nommu Linux. We instead copy the code to call the syscall onto the stack. In addition to enabling the nommu code a new defconfig for a small kernel image that can run in nommu mode on qemu is also provided, to run a kernel in qemu you can use the following command line: qemu-system-riscv64 -smp 2 -m 64 -machine virt -nographic \ -kernel arch/riscv/boot/loader \ -drive file=rootfs.ext2,format=raw,id=hd0 \ -device virtio-blk-device,drive=hd0 Contains contributions from Damien Le Moal <Damien.LeMoal@wdc.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anup Patel <anup@brainfault.org> [paul.walmsley@sifive.com: updated to apply; add CONFIG_MMU guards around PCI_IOBASE definition to fix build issues; fixed checkpatch issues; move the PCI_IO_* and VMEMMAP address space macros along with the others; resolve sparse warning] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-11-05riscv: abstract out CSR names for supervisor vs machine modeChristoph Hellwig
Many of the privileged CSRs exist in a supervisor and machine version that are used very similarly. Provide versions of the CSR names and fields that map to either the S-mode or M-mode variant depending on a new CONFIG_RISCV_M_MODE kconfig symbol. Contains contributions from Damien Le Moal <Damien.LeMoal@wdc.com> and Paul Walmsley <paul.walmsley@sifive.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> # for drivers/clocksource, drivers/irqchip [paul.walmsley@sifive.com: updated to apply] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-10-28riscv: for C functions called only from assembly, mark with __visiblePaul Walmsley
Rather than adding prototypes for C functions called only by assembly code, mark them as __visible. This avoids adding prototypes that will never be used by the callers. Resolves the following sparse warnings: arch/riscv/kernel/irq.c:27:29: warning: symbol 'do_IRQ' was not declared. Should it be static? arch/riscv/kernel/ptrace.c:151:6: warning: symbol 'do_syscall_trace_enter' was not declared. Should it be static? arch/riscv/kernel/ptrace.c:165:6: warning: symbol 'do_syscall_trace_exit' was not declared. Should it be static? arch/riscv/kernel/signal.c:295:17: warning: symbol 'do_notify_resume' was not declared. Should it be static? arch/riscv/kernel/traps.c:92:1: warning: symbol 'do_trap_unknown' was not declared. Should it be static? arch/riscv/kernel/traps.c:94:1: warning: symbol 'do_trap_insn_misaligned' was not declared. Should it be static? arch/riscv/kernel/traps.c:96:1: warning: symbol 'do_trap_insn_fault' was not declared. Should it be static? arch/riscv/kernel/traps.c:98:1: warning: symbol 'do_trap_insn_illegal' was not declared. Should it be static? arch/riscv/kernel/traps.c:100:1: warning: symbol 'do_trap_load_misaligned' was not declared. Should it be static? arch/riscv/kernel/traps.c:102:1: warning: symbol 'do_trap_load_fault' was not declared. Should it be static? arch/riscv/kernel/traps.c:104:1: warning: symbol 'do_trap_store_misaligned' was not declared. Should it be static? arch/riscv/kernel/traps.c:106:1: warning: symbol 'do_trap_store_fault' was not declared. Should it be static? arch/riscv/kernel/traps.c:108:1: warning: symbol 'do_trap_ecall_u' was not declared. Should it be static? arch/riscv/kernel/traps.c:110:1: warning: symbol 'do_trap_ecall_s' was not declared. Should it be static? arch/riscv/kernel/traps.c:112:1: warning: symbol 'do_trap_ecall_m' was not declared. Should it be static? arch/riscv/kernel/traps.c:124:17: warning: symbol 'do_trap_break' was not declared. Should it be static? arch/riscv/kernel/smpboot.c:136:24: warning: symbol 'smp_callin' was not declared. Should it be static? Based on a suggestion from Luc Van Oostenryck. This version includes changes based on feedback from Christoph Hellwig <hch@lst.de>. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> # for do_syscall_trace_*
2019-10-28riscv: fp: add missing __user pointer annotationsPaul Walmsley
The __user annotations were removed from the {save,restore}_fp_state() function signatures by commit 007f5c358957 ("Refactor FPU code in signal setup/return procedures"), but should be present, and sparse warns when they are not applied. Add them back in. This change should have no functional impact. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Fixes: 007f5c358957 ("Refactor FPU code in signal setup/return procedures") Cc: Alan Kao <alankao@andestech.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-07-08Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull force_sig() argument change from Eric Biederman: "A source of error over the years has been that force_sig has taken a task parameter when it is only safe to use force_sig with the current task. The force_sig function is built for delivering synchronous signals such as SIGSEGV where the userspace application caused a synchronous fault (such as a page fault) and the kernel responded with a signal. Because the name force_sig does not make this clear, and because the force_sig takes a task parameter the function force_sig has been abused for sending other kinds of signals over the years. Slowly those have been fixed when the oopses have been tracked down. This set of changes fixes the remaining abusers of force_sig and carefully rips out the task parameter from force_sig and friends making this kind of error almost impossible in the future" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus signal: Remove the signal number and task parameters from force_sig_info signal: Factor force_sig_info_to_task out of force_sig_info signal: Generate the siginfo in force_sig signal: Move the computation of force into send_signal and correct it. signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal signal: Remove the task parameter from force_sig_fault signal: Use force_sig_fault_to_task for the two calls that don't deliver to current signal: Explicitly call force_sig_fault on current signal/unicore32: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from ptrace_break signal/nds32: Remove tsk parameter from send_sigtrap signal/riscv: Remove tsk parameter from do_trap signal/sh: Remove tsk parameter from force_sig_info_fault signal/um: Remove task parameter from send_sigtrap signal/x86: Remove task parameter from send_sigtrap signal: Remove task parameter from force_sig_mceerr signal: Remove task parameter from force_sig signal: Remove task parameter from force_sigsegv ...
2019-05-27signal: Remove task parameter from force_sigEric W. Biederman
All of the remaining callers pass current into force_sig so remove the task parameter to make this obvious and to make misuse more difficult in the future. This also makes it clear force_sig passes current into force_sig_info. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 120Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see the file copying or write to the free software foundation inc extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 12 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091651.231300438@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-25riscv/signal: Fixup additional syscall restartingGuo Ren
The function of do_notify_resume called by entry.S could be entered in loop when SIGPENDING was setted again before sret. So we must add prevent code to make syscall restart (regs->sepc -= 0x4) or it may re-execute unexpected instructions. Just like in_syscall & forget_syscall used by arm. Signed-off-by: Guo Ren <ren_guo@c-sky.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-22Auto-detect whether a FPU existsAlan Kao
We expect that a kernel with CONFIG_FPU=y can still support no-FPU machines. To do so, the kernel should first examine the existence of a FPU, then do nothing if a FPU does exist; otherwise, it should disable/bypass all FPU-related functions. In this patch, a new global variable, has_fpu, is created and determined when parsing the hardware capability from device tree during booting. This variable is used in those FPU-related functions. Signed-off-by: Alan Kao <alankao@andestech.com> Cc: Greentime Hu <greentime@andestech.com> Cc: Vincent Chen <vincentc@andestech.com> Cc: Zong Li <zong@andestech.com> Cc: Nick Hu <nickhu@andestech.com> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-10-22Allow to disable FPU supportAlan Kao
FPU codes have been separated from common part in previous patches. This patch add the CONFIG_FPU option and some stubs, so that a no-FPU configuration is allowed. Signed-off-by: Alan Kao <alankao@andestech.com> Cc: Greentime Hu <greentime@andestech.com> Cc: Vincent Chen <vincentc@andestech.com> Cc: Zong Li <zong@andestech.com> Cc: Nick Hu <nickhu@andestech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-10-22Refactor FPU code in signal setup/return proceduresAlan Kao
FPU-related logic is separated from normal signal handling path in this patch. Kernel can easily be configured to exclude those procedures for no-FPU systems. Signed-off-by: Alan Kao <alankao@andestech.com> Cc: Greentime Hu <greentime@andestech.com> Cc: Vincent Chen <vincentc@andestech.com> Cc: Zong Li <zong@andestech.com> Cc: Nick Hu <nickhu@andestech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2017-09-26RISC-V: User-facing APIPalmer Dabbelt
This patch contains code that is in some way visible to the user: including via system calls, the VDSO, module loading and signal handling. It also contains some generic code that is ABI visible. Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>