summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel/signal.c
AgeCommit message (Collapse)Author
2020-07-22powerpc/64s: system call support for scv/rfscv instructionsNicholas Piggin
Add support for the scv instruction on POWER9 and later CPUs. For now this implements the zeroth scv vector 'scv 0', as identical to 'sc' system calls, with the exception that LR is not preserved, nor are volatile CR registers, and error is not indicated with CR0[SO], but by returning a negative errno. rfscv is implemented to return from scv type system calls. It can not be used to return from sc system calls because those are defined to preserve LR. getpid syscall throughput on POWER9 is improved by 26% (428 to 318 cycles), largely due to reducing mtmsr and mtspr. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix ppc64e build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
2020-05-19powerpc/watchpoint: Convert thread_struct->hw_brk to an arrayRavi Bangoria
So far powerpc hw supported only one watchpoint. But Power10 is introducing 2nd DAWR. Convert thread_struct->hw_brk into an array. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-10-ravi.bangoria@linux.ibm.com
2020-05-19powerpc/watchpoint: Provide DAWR number to __set_breakpointRavi Bangoria
Introduce new parameter 'nr' to __set_breakpoint() which indicates which DAWR should be programed. Also convert current_brk variable to an array. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-7-ravi.bangoria@linux.ibm.com
2020-05-15powerpc: Use trap metadata to prevent double restart rather than zeroing trapNicholas Piggin
It's not very nice to zero trap for this, because then system calls no longer have trap_is_syscall(regs) invariant, and we can't distinguish between sc and scv system calls (in a later patch). Take one last unused bit from the low bits of the pt_regs.trap word for this instead. There is not a really good reason why it should be in trap as opposed to another field, but trap has some concept of flags and it exists. Ideally I think we would move trap to 2-byte field and have 2 more bytes available independently. Add a selftests case for this, which can be seen to fail if trap_norestart() is changed to return false. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make them static inlines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-4-mpe@ellerman.id.au
2020-05-15powerpc: trap_is_syscall() helper to hide syscall trap numberNicholas Piggin
A new system call interrupt will be added with a new trap number. Hide the explicit 0xc00 test behind an accessor to reduce churn in callers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make it a static inline] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-3-mpe@ellerman.id.au
2020-04-03powerpc/64: make buildable without CONFIG_COMPATMichal Suchanek
There are numerous references to 32bit functions in generic and 64bit code so ifdef them out. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e5619617020ef3a1f54f0c076e7d74cb9ec9f3bf.1584699455.git.msuchanek@suse.de
2020-04-03powerpc: move common register copy functions from signal_32.c to signal.cMichal Suchanek
These functions are required for 64bit as well. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9fd6d9b7c5e91fab21159fe23534a2f16b4962d3.1584699455.git.msuchanek@suse.de
2020-02-18powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal deliveryGustavo Luiz Duarte
After a treclaim, we expect to be in non-transactional state. If we don't clear the current thread's MSR[TS] before we get preempted, then tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in suspended transaction state. When handling a signal caught in transactional state, handle_rt_signal64() calls get_tm_stackpointer() that treclaims the transaction using tm_reclaim_current() but without clearing the thread's MSR[TS]. This can cause the TM Bad Thing exception below if later we pagefault and get preempted trying to access the user's sigframe, using __put_user(). Afterwards, when we are rescheduled back into do_page_fault() (but now in suspended state since the thread's MSR[TS] was not cleared), upon executing 'rfid' after completion of the page fault handling, the exception is raised because a transition from suspended to non-transactional state is invalid. Unexpected TM Bad Thing exception at c00000000000de44 (msr 0x8000000302a03031) tm_scratch=800000010280b033 Oops: Unrecoverable exception, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32 NIP: c00000000000de44 LR: c000000000034728 CTR: 0000000000000000 REGS: c00000003fe7bd70 TRAP: 0700 Not tainted (5.4.0-rc2) MSR: 8000000302a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[SE]> CR: 44000884 XER: 00000000 CFAR: c00000000000dda4 IRQMASK: 0 PACATMSCRATCH: 800000010280b033 GPR00: c000000000034728 c000000f65a17c80 c000000001662800 00007fffacf3fd78 GPR04: 0000000000001000 0000000000001000 0000000000000000 c000000f611f8af0 GPR08: 0000000000000000 0000000078006001 0000000000000000 000c000000000000 GPR12: c000000f611f84b0 c00000003ffcb200 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000f611f8140 GPR24: 0000000000000000 00007fffacf3fd68 c000000f65a17d90 c000000f611f7800 GPR28: c000000f65a17e90 c000000f65a17e90 c000000001685e18 00007fffacf3f000 NIP [c00000000000de44] fast_exception_return+0xf4/0x1b0 LR [c000000000034728] handle_rt_signal64+0x78/0xc50 Call Trace: [c000000f65a17c80] [c000000000034710] handle_rt_signal64+0x60/0xc50 (unreliable) [c000000f65a17d30] [c000000000023640] do_notify_resume+0x330/0x460 [c000000f65a17e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74 Instruction dump: 7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088 60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989 ---[ end trace 93094aa44b442f87 ]--- The simplified sequence of events that triggers the above exception is: ... # userspace in NON-TRANSACTIONAL state tbegin # userspace in TRANSACTIONAL state signal delivery # kernelspace in SUSPENDED state handle_rt_signal64() get_tm_stackpointer() treclaim # kernelspace in NON-TRANSACTIONAL state __put_user() page fault happens. We will never get back here because of the TM Bad Thing exception. page fault handling kicks in and we voluntarily preempt ourselves do_page_fault() __schedule() __switch_to(other_task) our task is rescheduled and we recheckpoint because the thread's MSR[TS] was not cleared __switch_to(our_task) switch_to_tm() tm_recheckpoint_new_task() trechkpt # kernelspace in SUSPENDED state The page fault handling resumes, but now we are in suspended transaction state do_page_fault() completes rfid <----- trying to get back where the page fault happened (we were non-transactional back then) TM Bad Thing # illegal transition from suspended to non-transactional This patch fixes that issue by clearing the current thread's MSR[TS] just after treclaim in get_tm_stackpointer() so that we stay in non-transactional state in case we are preempted. In order to make treclaim and clearing the thread's MSR[TS] atomic from a preemption perspective when CONFIG_PREEMPT is set, preempt_disable/enable() is used. It's also necessary to save the previous value of the thread's MSR before get_tm_stackpointer() is called so that it can be exposed to the signal handler later in setup_tm_sigcontexts() to inform the userspace MSR at the moment of the signal delivery. Found with tm-signal-context-force-tm kernel selftest. Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9 Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200211033831.11165-1-gustavold@linux.ibm.com
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-22rseq: Avoid infinite recursion when delivering SIGSEGVWill Deacon
When delivering a signal to a task that is using rseq, we call into __rseq_handle_notify_resume() so that the registers pushed in the sigframe are updated to reflect the state of the restartable sequence (for example, ensuring that the signal returns to the abort handler if necessary). However, if the rseq management fails due to an unrecoverable fault when accessing userspace or certain combinations of RSEQ_CS_* flags, then we will attempt to deliver a SIGSEGV. This has the potential for infinite recursion if the rseq code continuously fails on signal delivery. Avoid this problem by using force_sigsegv() instead of force_sig(), which is explicitly designed to reset the SEGV handler to SIG_DFL in the case of a recursive fault. In doing so, remove rseq_signal_deliver() from the internal rseq API and have an optional struct ksignal * parameter to rseq_handle_notify_resume() instead. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: peterz@infradead.org Cc: paulmck@linux.vnet.ibm.com Cc: boqun.feng@gmail.com Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
2018-06-10Merge branch 'core-rseq-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull restartable sequence support from Thomas Gleixner: "The restartable sequences syscall (finally): After a lot of back and forth discussion and massive delays caused by the speculative distraction of maintainers, the core set of restartable sequences has finally reached a consensus. It comes with the basic non disputed core implementation along with support for arm, powerpc and x86 and a full set of selftests It was exposed to linux-next earlier this week, so it does not fully comply with the merge window requirements, but there is really no point to drag it out for yet another cycle" * 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq/selftests: Provide Makefile, scripts, gitignore rseq/selftests: Provide parametrized tests rseq/selftests: Provide basic percpu ops test rseq/selftests: Provide basic test rseq/selftests: Provide rseq library selftests/lib.mk: Introduce OVERRIDE_TARGETS powerpc: Wire up restartable sequences system call powerpc: Add syscall detection for restartable sequences powerpc: Add support for restartable sequences x86: Wire up restartable sequence system call x86: Add support for restartable sequences arm: Wire up restartable sequences system call arm: Add syscall detection for restartable sequences arm: Add restartable sequences support rseq: Introduce restartable sequences system call uapi/headers: Provide types_32_64.h
2018-06-06powerpc: Add support for restartable sequencesBoqun Feng
Call the rseq_handle_notify_resume() function on return to userspace if TIF_NOTIFY_RESUME thread flag is set. Perform fixup on the pre-signal when a signal is delivered on top of a restartable sequence critical section. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: linux-api@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20180602124408.8430-9-mathieu.desnoyers@efficios.com
2018-06-03powerpc: Check address limit on user-mode return (TIF_FSCHECK)Michael Ellerman
set_fs() sets the addr_limit, which is used in access_ok() to determine if an address is a user or kernel address. Some code paths use set_fs() to temporarily elevate the addr_limit so that kernel code can read/write kernel memory as if it were user memory. That is fine as long as the code can't ever return to userspace with the addr_limit still elevated. If that did happen, then userspace can read/write kernel memory as if it were user memory, eg. just with write(2). In case it's not clear, that is very bad. It has also happened in the past due to bugs. Commit 5ea0727b163c ("x86/syscalls: Check address limit on user-mode return") added a mechanism to check the addr_limit value before returning to userspace. Any call to set_fs() sets a thread flag, TIF_FSCHECK, and if we see that on the return to userspace we go out of line to check that the addr_limit value is not elevated. For further info see the above commit, as well as: https://lwn.net/Articles/722267/ https://bugs.chromium.org/p/project-zero/issues/detail?id=990 Verified to work on 64-bit Book3S using a POC that objdumps the system call handler, and a modified lkdtm_CORRUPT_USER_DS() that doesn't kill the caller. Before: $ sudo ./test-tif-fscheck ... 0000000000000000 <.data>: 0: e1 f7 8a 79 rldicl. r10,r12,30,63 4: 80 03 82 40 bne 0x384 8: 00 40 8a 71 andi. r10,r12,16384 c: 78 0b 2a 7c mr r10,r1 10: 10 fd 21 38 addi r1,r1,-752 14: 08 00 c2 41 beq- 0x1c 18: 58 09 2d e8 ld r1,2392(r13) 1c: 00 00 41 f9 std r10,0(r1) 20: 70 01 61 f9 std r11,368(r1) 24: 78 01 81 f9 std r12,376(r1) 28: 70 00 01 f8 std r0,112(r1) 2c: 78 00 41 f9 std r10,120(r1) 30: 20 00 82 41 beq 0x50 34: a6 42 4c 7d mftb r10 After: $ sudo ./test-tif-fscheck Killed And in dmesg: Invalid address limit on user-mode return WARNING: CPU: 1 PID: 3689 at ../include/linux/syscalls.h:260 do_notify_resume+0x140/0x170 ... NIP [c00000000001ee50] do_notify_resume+0x140/0x170 LR [c00000000001ee4c] do_notify_resume+0x13c/0x170 Call Trace: do_notify_resume+0x13c/0x170 (unreliable) ret_from_except_lite+0x70/0x74 Performance overhead is essentially zero in the usual case, because the bit is checked as part of the existing _TIF_USER_WORK_MASK check. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-31Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: - handle 'infinitely'-long sleeping tasks, from Miroslav Benes - remove 'immediate' feature, as it turns out it doesn't provide the originally expected semantics, and brings more issues than value * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add locking to force and signal functions livepatch: Remove immediate feature livepatch: force transition to finish livepatch: send a fake signal to all blocking tasks
2017-12-04livepatch: send a fake signal to all blocking tasksMiroslav Benes
Live patching consistency model is of LEAVE_PATCHED_SET and SWITCH_THREAD. This means that all tasks in the system have to be marked one by one as safe to call a new patched function. Safe means when a task is not (sleeping) in a set of patched functions. That is, no patched function is on the task's stack. Another clearly safe place is the boundary between kernel and userspace. The patching waits for all tasks to get outside of the patched set or to cross the boundary. The transition is completed afterwards. The problem is that a task can block the transition for quite a long time, if not forever. It could sleep in a set of patched functions, for example. Luckily we can force the task to leave the set by sending it a fake signal, that is a signal with no data in signal pending structures (no handler, no sign of proper signal delivered). Suspend/freezer use this to freeze the tasks as well. The task gets TIF_SIGPENDING set and is woken up (if it has been sleeping in the kernel before) or kicked by rescheduling IPI (if it was running on other CPU). This causes the task to go to kernel/userspace boundary where the signal would be handled and the task would be marked as safe in terms of live patching. There are tasks which are not affected by this technique though. The fake signal is not sent to kthreads. They should be handled differently. They can be woken up so they leave the patched set and their TIF_PATCH_PENDING can be cleared thanks to stack checking. For the sake of completeness, if the task is in TASK_RUNNING state but not currently running on some CPU it doesn't get the IPI, but it would eventually handle the signal anyway. Second, if the task runs in the kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not handled on return from the interrupt. It would be handled on return to the userspace in the future when the fake signal is sent again. Stack checking deals with these cases in a better way. If the task was sleeping in a syscall it would be woken by our fake signal, it would check if TIF_SIGPENDING is set (by calling signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with ERESTART* return values are restarted in case of the fake signal (see do_signal()). EINTR is propagated back to the userspace program. This could disturb the program, but... * each process dealing with signals should react accordingly to EINTR return values. * syscalls returning EINTR happen to be quite common situation in the system even if no fake signal is sent. * freezer sends the fake signal and does not deal with EINTR anyhow. Thus EINTR values are returned when the system is resumed. The very safe marking is done in architectures' "entry" on syscall and interrupt/exception exit paths, and in a stack checking functions of livepatch. TIF_PATCH_PENDING is cleared and the next recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also call klp_update_patch_state() before do_signal(), so that recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING immediately and thus prevent a double call of do_signal(). Note that the fake signal is not sent to stopped/traced tasks. Such task prevents the patching to finish till it continues again (is not traced anymore). Last, sending the fake signal is not automatic. It is done only when admin requests it by writing 1 to signal sysfs attribute in livepatch sysfs directory. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: x86@kernel.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-11-13powerpc/signal: Properly handle return value from uprobe_deny_signal()Naveen N. Rao
When a uprobe is installed on an instruction that we currently do not emulate, we copy the instruction into a xol buffer and single step that instruction. If that instruction generates a fault, we abort the single stepping before invoking the signal handler. Once the signal handler is done, the uprobe trap is hit again since the instruction is retried and the process repeats. We use uprobe_deny_signal() to detect if the xol instruction triggered a signal. If so, we clear TIF_SIGPENDING and set TIF_UPROBE so that the signal is not handled until after the single stepping is aborted. In this case, uprobe_deny_signal() returns true and get_signal() ends up returning 0. However, in do_signal(), we are not looking at the return value, but depending on ksig.sig for further action, all with an uninitialized ksig that is not touched in this scenario. Fix the same by initializing ksig.sig to 0. Fixes: 129b69df9c90 ("powerpc: Use get_signal() signal_setup_done()") Cc: stable@vger.kernel.org # v3.17+ Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-08livepatch/powerpc: add TIF_PATCH_PENDING thread flagJosh Poimboeuf
Add the TIF_PATCH_PENDING thread flag to enable the new livepatch per-task consistency model for powerpc. The bit getting set indicates the thread has a pending patch which needs to be applied when the thread exits the kernel. The bit is included in the _TIF_USER_WORK_MASK macro so that do_notify_resume() and klp_update_patch_state() get called when the bit is set. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-04powerpc: signals: Stop using current in signal codeCyril Bur
Much of the signal code takes a pt_regs on which it operates. Over time the signal code has needed to know more about the thread than what pt_regs can supply, this information is obtained as needed by using 'current'. This approach is not strictly incorrect however it does mean that there is now a hard requirement that the pt_regs being passed around does belong to current, this is never checked. A safer approach is for the majority of the signal functions to take a task_struct from which they can obtain pt_regs and any other information they need. The caveat that the task_struct they are passed must be current doesn't go away but can more easily be checked for. Functions called from outside powerpc signal code are passed a pt_regs and they can confirm that the pt_regs is that of current and pass current to other functions, furthurmore, powerpc signal functions can check that the task_struct they are passed is the same as current avoiding possible corruption of current (or the task they are passed) if this assertion ever fails. CC: paulus@samba.org Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01powerpc: Fix misspellings in comments.Adam Buchbinder
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-08-06powerpc: Use sigsp()Richard Weinberger
Use sigsp() instead of the open coded variant. Signed-off-by: Richard Weinberger <richard@nod.at>
2014-08-06powerpc: Use get_signal() signal_setup_done()Richard Weinberger
Use the more generic functions get_signal() signal_setup_done() for signal delivery. This inverts also the return codes of setup_*frame() to follow the kernel convention. Signed-off-by: Richard Weinberger <richard@nod.at>
2014-05-20powerpc: Fix smp_processor_id() in preemptible splat in set_breakpointPaul Gortmaker
Currently, on 8641D, which doesn't set CONFIG_HAVE_HW_BREAKPOINT we get the following splat: BUG: using smp_processor_id() in preemptible [00000000] code: login/1382 caller is set_breakpoint+0x1c/0xa0 CPU: 0 PID: 1382 Comm: login Not tainted 3.15.0-rc3-00041-g2aafe1a4d451 #1 Call Trace: [decd5d80] [c0008dc4] show_stack+0x50/0x158 (unreliable) [decd5dc0] [c03c6fa0] dump_stack+0x7c/0xdc [decd5de0] [c01f8818] check_preemption_disabled+0xf4/0x104 [decd5e00] [c00086b8] set_breakpoint+0x1c/0xa0 [decd5e10] [c00d4530] flush_old_exec+0x2bc/0x588 [decd5e40] [c011c468] load_elf_binary+0x2ac/0x1164 [decd5ec0] [c00d35f8] search_binary_handler+0xc4/0x1f8 [decd5ef0] [c00d4ee8] do_execve+0x3d8/0x4b8 [decd5f40] [c001185c] ret_from_syscall+0x0/0x38 --- Exception: c01 at 0xfeee554 LR = 0xfeee7d4 The call path in this case is: flush_thread --> set_debug_reg_defaults --> set_breakpoint --> __get_cpu_var Since preemption is enabled in the cleanup of flush thread, and there is no need to disable it, introduce the distinction between set_breakpoint and __set_breakpoint, leaving only the flush_thread instance as the current user of set_breakpoint. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc: Don't corrupt transactional state when using FP/VMX in kernelPaul Mackerras
Currently, when we have a process using the transactional memory facilities on POWER8 (that is, the processor is in transactional or suspended state), and the process enters the kernel and the kernel then uses the floating-point or vector (VMX/Altivec) facility, we end up corrupting the user-visible FP/VMX/VSX state. This happens, for example, if a page fault causes a copy-on-write operation, because the copy_page function will use VMX to do the copy on POWER8. The test program below demonstrates the bug. The bug happens because when FP/VMX state for a transactional process is stored in the thread_struct, we store the checkpointed state in .fp_state/.vr_state and the transactional (current) state in .transact_fp/.transact_vr. However, when the kernel wants to use FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(), which saves the current state in .fp_state/.vr_state. Furthermore, when we return to the user process we return with FP/VMX/VSX disabled. The next time the process uses FP/VMX/VSX, we don't know which set of state (the current register values, .fp_state/.vr_state, or .transact_fp/.transact_vr) we should be using, since we have no way to tell if we are still in the same transaction, and if not, whether the previous transaction succeeded or failed. Thus it is necessary to strictly adhere to the rule that if FP has been enabled at any point in a transaction, we must keep FP enabled for the user process with the current transactional state in the FP registers, until we detect that it is no longer in a transaction. Similarly for VMX; once enabled it must stay enabled until the process is no longer transactional. In order to keep this rule, we add a new thread_info flag which we test when returning from the kernel to userspace, called TIF_RESTORE_TM. This flag indicates that there is FP/VMX/VSX state to be restored before entering userspace, and when it is set the .tm_orig_msr field in the thread_struct indicates what state needs to be restored. The restoration is done by restore_tm_state(). The TIF_RESTORE_TM bit is set by new giveup_fpu/altivec_maybe_transactional helpers, which are called from enable_kernel_fp/altivec, giveup_vsx, and flush_fp/altivec_to_thread instead of giveup_fpu/altivec. The other thing to be done is to get the transactional FP/VMX/VSX state from .fp_state/.vr_state when doing reclaim, if that state has been saved there by giveup_fpu/altivec_maybe_transactional. Having done this, we set the FP/VMX bit in the thread's MSR after reclaim to indicate that that part of the state is now valid (having been reclaimed from the processor's checkpointed state). Finally, in the signal handling code, we move the clearing of the transactional state bits in the thread's MSR a bit earlier, before calling flush_fp_to_thread(), so that we don't unnecessarily set the TIF_RESTORE_TM bit. This is the test program: /* Michael Neuling 4/12/2013 * * See if the altivec state is leaked out of an aborted transaction due to * kernel vmx copy loops. * * gcc -m64 htm_vmxcopy.c -o htm_vmxcopy * */ /* We don't use all of these, but for reference: */ int main(int argc, char *argv[]) { long double vecin = 1.3; long double vecout; unsigned long pgsize = getpagesize(); int i; int fd; int size = pgsize*16; char tmpfile[] = "/tmp/page_faultXXXXXX"; char buf[pgsize]; char *a; uint64_t aborted = 0; fd = mkstemp(tmpfile); assert(fd >= 0); memset(buf, 0, pgsize); for (i = 0; i < size; i += pgsize) assert(write(fd, buf, pgsize) == pgsize); unlink(tmpfile); a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); assert(a != MAP_FAILED); asm __volatile__( "lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value TBEGIN "beq 3f ;" TSUSPEND "xxlxor 40,40,40 ; " // set 40 to 0 "std 5, 0(%[map]) ;" // cause kernel vmx copy page TABORT TRESUME TEND "li %[res], 0 ;" "b 5f ;" "3: ;" // Abort handler "li %[res], 1 ;" "5: ;" "stxvd2x 40,0,%[vecoutptr] ; " : [res]"=r"(aborted) : [vecinptr]"r"(&vecin), [vecoutptr]"r"(&vecout), [map]"r"(a) : "memory", "r0", "r3", "r4", "r5", "r6", "r7"); if (aborted && (vecin != vecout)){ printf("FAILED: vector state leaked on abort %f != %f\n", (double)vecin, (double)vecout); exit(1); } munmap(a, size); close(fd); printf("PASSED!\n"); return 0; } Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/tm: Fix userspace stack corruption on signal delivery for active ↵Michael Neuling
transactions When in an active transaction that takes a signal, we need to be careful with the stack. It's possible that the stack has moved back up after the tbegin. The obvious case here is when the tbegin is called inside a function that returns before a tend. In this case, the stack is part of the checkpointed transactional memory state. If we write over this non transactionally or in suspend, we are in trouble because if we get a tm abort, the program counter and stack pointer will be back at the tbegin but our in memory stack won't be valid anymore. To avoid this, when taking a signal in an active transaction, we need to use the stack pointer from the checkpointed state, rather than the speculated state. This ensures that the signal context (written tm suspended) will be written below the stack required for the rollback. The transaction is aborted becuase of the treclaim, so any memory written between the tbegin and the signal will be rolled back anyway. For signals taken in non-TM or suspended mode, we use the normal/non-checkpointed stack pointer. Tested with 64 and 32 bit signals Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@vger.kernel.org> # v3.9 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14powerpc: Set show_unhandled_signals to 1 by defaultBenjamin Herrenschmidt
Just like other architectures Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14powerpc: Exit user context on notify resumeLi Zhong
This patch allows RCU usage in do_notify_resume, e.g. signal handling. It corresponds to [PATCH] x86: Exit RCU extended QS on notify resume commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6 Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-02-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "This is the first pile; another one will come a bit later and will contain SYSCALL_DEFINE-related patches. - a bunch of signal-related syscalls (both native and compat) unified. - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE (fixing several potential problems with missing argument validation, while we are at it) - a lot of now-pointless wrappers killed - a couple of architectures (cris and hexagon) forgot to save altstack settings into sigframe, even though they used the (uninitialized) values in sigreturn; fixed. - microblaze fixes for delivery of multiple signals arriving at once - saner set of helpers for signal delivery introduced, several architectures switched to using those." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits) x86: convert to ksignal sparc: convert to ksignal arm: switch to struct ksignal * passing alpha: pass k_sigaction and siginfo_t using ksignal pointer burying unused conditionals make do_sigaltstack() static arm64: switch to generic old sigaction() (compat-only) arm64: switch to generic compat rt_sigaction() arm64: switch compat to generic old sigsuspend arm64: switch to generic compat rt_sigqueueinfo() arm64: switch to generic compat rt_sigpending() arm64: switch to generic compat rt_sigprocmask() arm64: switch to generic sigaltstack sparc: switch to generic old sigsuspend sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE sparc: kill sign-extending wrappers for native syscalls kill sparc32_open() sparc: switch to use of generic old sigaction sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE mips: switch to generic sys_fork() and sys_clone() ...
2013-02-03powerpc: switch to generic sigaltstackAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-01-16powerpc: Rename set_break to avoid naming conflictMichael Neuling
With allmodconfig we are getting: drivers/tty/synclink_gt.c:160:12: error: conflicting types for 'set_break' arch/powerpc/include/asm/debug.h:49:5: note: previous declaration of 'set_break' was here drivers/tty/synclinkmp.c:526:12: error: conflicting types for 'set_break' arch/powerpc/include/asm/debug.h:49:5: note: previous declaration of 'set_break' was here This renames set_break to set_breakpoint to avoid this naming conflict Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-01-10powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registersMichael Neuling
This is a rewrite so that we don't assume we are using the DABR throughout the code. We now use the arch_hw_breakpoint to store the breakpoint in a generic manner in the thread_struct, rather than storing the raw DABR value. The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from userspace. We keep this functionality, so that future changes (like the POWER8 DAWR), will still fake the DABR to userspace. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-03uprobes/powerpc: Don't clear TIF_UPROBE in do_notify_resume()Oleg Nesterov
Cleanup. No need to clear TIF_UPROBE, uprobe_notify_resume() does this. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-10powerpc: Rework set_dabr so it can take a DABRX value as wellMichael Neuling
Rework set_dabr to take a DABRX value as well. Both the pseries and PS3 hypervisors do some checks on the DABRX values that are passed in the hcall. This patch stops bogus values from being passed to hypervisor. Also, in the case where we are clearing the breakpoint, where DABR and DABRX are zero, we modify the DABRX value to make it valid so that the hcall won't fail. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05powerpc: Uprobes port to powerpcAnanth N Mavinakayanahalli
This is the port of uprobes to powerpc. Usage is similar to x86. [root@xxxx ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc Added new event: probe_libc:malloc (on 0xb4860) You can now use it in all perf tools, such as: perf record -e probe_libc:malloc -aR sleep 1 [root@xxxx ~]# ./bin/perf record -e probe_libc:malloc -aR sleep 20 [ perf record: Woken up 22 times to write data ] [ perf record: Captured and wrote 5.843 MB perf.data (~255302 samples) ] [root@xxxx ~]# ./bin/perf report --stdio ... 69.05% tar libc-2.12.so [.] malloc 28.57% rm libc-2.12.so [.] malloc 1.32% avahi-daemon libc-2.12.so [.] malloc 0.58% bash libc-2.12.so [.] malloc 0.28% sshd libc-2.12.so [.] malloc 0.08% irqbalance libc-2.12.so [.] malloc 0.05% bzip2 libc-2.12.so [.] malloc 0.04% sleep libc-2.12.so [.] malloc 0.03% multipathd libc-2.12.so [.] malloc 0.01% sendmail libc-2.12.so [.] malloc 0.01% automount libc-2.12.so [.] malloc The trap_nr addition patch is a prereq. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-06-01new helper: signal_delivered()Al Viro
Does block_sigmask() + tracehook_signal_handler(); called when sigframe has been successfully built. All architectures converted to it; block_sigmask() itself is gone now (merged into this one). I'm still not too happy with the signature, but that's a separate story (IMO we need a structure that would contain signal number + siginfo + k_sigaction, so that get_signal_to_deliver() would fill one, signal_delivered(), handle_signal() and probably setup...frame() - take one). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01powerpc: get rid of restore_sigmask()Al Viro
... it's just a call of set_current_blocked() now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from setAl Viro
Only 3 out of 63 do not. Renamed the current variant to __set_current_blocked(), added set_current_blocked() that will exclude unblockable signals, switched open-coded instances to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01pull clearing RESTORE_SIGMASK into block_sigmask()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01new helper: sigmask_to_save()Al Viro
replace boilerplate "should we use ->saved_sigmask or ->blocked?" with calls of obvious inlined helper... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01new helper: restore_saved_sigmask()Al Viro
first fruits of ..._restore_sigmask() helpers: now we can take boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK and restore the blocked mask from ->saved_mask" into a common helper. Open-coded instances switched... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23move key_repace_session_keyring() into tracehook_notify_resume()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-28Disintegrate asm/system.h for PowerPCDavid Howells
Disintegrate asm/system.h for PowerPC. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> cc: linuxppc-dev@lists.ozlabs.org
2012-03-07powerpc: Use set_current_blocked() and block_sigmask()Matt Fleming
As described in e6fa16ab ("signal: sigprocmask() should do retarget_shared_pending()") the modification of current->blocked is incorrect as we need to check whether the signal we're about to block is pending in the shared queue. Also, use the new helper function introduced in commit 5e6292c0f28f ("signal: add block_sigmask() for adding sigmask to current->blocked") which centralises the code for updating current->blocked after successfully delivering a signal and reduces the amount of duplicate code across architectures. In the past some architectures got this code wrong, so using this helper function should stop that from happening again. Cc: Oleg Nesterov <oleg@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-22powerpc: Fix various issues with return to userspaceBenjamin Herrenschmidt
We have a few problems when returning to userspace. This is a quick set of fixes for 3.3, I'll look into a more comprehensive rework for 3.4. This fixes: - We kept interrupts soft-disabled when schedule'ing or calling do_signal when returning to userspace as a result of a hardware interrupt. - Rename do_signal to do_notify_resume like all other archs (and do_signal_pending back to do_signal, which it was before Roland changed it). - Add the missing call to key_replace_session_keyring() to do_notify_resume(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> ---
2010-09-22powerpc: fix double syscall restartsAl Viro
Make sigreturn zero regs->trap, make do_signal() do the same on all paths. As it is, signal interrupting e.g. read() from fd 512 (== ERESTARTSYS) with another signal getting unblocked when the first handler finishes will lead to restart one insn earlier than it ought to. Same for multiple signals with in-kernel handlers interrupting that sucker at the same time. Same for multiple signals of any kind interrupting that sucker on 64bit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-22powerpc, hw_breakpoint: Enable hw-breakpoints while handling intervening signalsK.Prasad
A signal delivered between a hw_breakpoint_handler() and the single_step_dabr_instruction() will not have the breakpoint active while the signal handler is running -- the signal delivery will set up a new MSR value which will not have MSR_SE set, so we won't get the signal step interrupt until and unless the signal handler returns (which it may never do). To fix this, we restore the breakpoint when delivering a signal -- we clear the MSR_SE bit and set the DABR again. If the signal handler returns, the DABR interrupt will occur again when the instruction that we were originally trying to single-step gets re-executed. [Paul Mackerras <paulus@samba.org> pointed out the need to do this.] Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2010-02-17powerpc/booke: Add support for advanced debug registersDave Kleikamp
powerpc/booke: Add support for advanced debug registers From: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Based on patches originally written by Torez Smith. This patch defines context switch and trap related functionality for BookE specific Debug Registers. It adds support to ptrace() for setting and getting BookE related Debug Registers Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Torez Smith <lnxtorez@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <dwg@au1.ibm.com> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Sergio Durigan Junior <sergiodj@br.ibm.com> Cc: Thiago Jung Bauermann <bauerman@br.ibm.com> Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-17powerpc/booke: Introduce new CONFIG options for advanced debug registersDave Kleikamp
powerpc/booke: Introduce new CONFIG options for advanced debug registers From: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Introduce new config options to simplify the ifdefs pertaining to the advanced debug registers for booke and 40x processors: CONFIG_PPC_ADV_DEBUG_REGS - boolean: true for dac-based processors CONFIG_PPC_ADV_DEBUG_IACS - number of IAC registers CONFIG_PPC_ADV_DEBUG_DACS - number of DAC registers CONFIG_PPC_ADV_DEBUG_DVCS - number of DVC registers CONFIG_PPC_ADV_DEBUG_DAC_RANGE - DAC ranges supported Beginning conservatively, since I only have the facilities to test 440 hardware. I believe all 40x and booke platforms support at least 2 IAC and 2 DAC registers. For 440, 4 IAC and 2 DVC registers are enabled, as well as the DAC ranges. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Acked-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-03-27powerpc: Sanitize stack pointer in signal handling codeJosh Boyer
On powerpc64 machines running 32-bit userspace, we can get garbage bits in the stack pointer passed into the kernel. Most places handle this correctly, but the signal handling code uses the passed value directly for allocating signal stack frames. This fixes the issue by introducing a get_clean_sp function that returns a sanitized stack pointer. For 32-bit tasks on a 64-bit kernel, the stack pointer is masked correctly. In all other cases, the stack pointer is simply returned. Additionally, we pass an 'is_32' parameter to get_sigframe now in order to get the properly sanitized stack. The callers are know to be 32 or 64-bit statically. Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-28powerpc: Add TIF_NOTIFY_RESUME support for tracehookRoland McGrath
This adds TIF_NOTIFY_RESUME support for powerpc. When set, we call tracehook_notify_resume() on the way to user mode. This overloads do_signal() to do the work, but changes its arguments to it has the TIF_* bits handy in a register and drops the useless first argument that was always zero. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>