summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel/exceptions-64s.S
AgeCommit message (Collapse)Author
2021-04-18Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge some powerpc KVM patches we are keeping in a topic branch just in case anyone else needs to merge them.
2021-04-12powerpc/64s: remove KVM SKIP test from instruction breakpoint handlerNicholas Piggin
The code being executed in KVM_GUEST_MODE_SKIP is hypervisor code with MSR[IR]=0, so the faults of concern are the d-side ones caused by access to guest context by the hypervisor. Instruction breakpoint interrupts are not a concern here. It's unlikely any good would come of causing breaks in this code, but skipping the instruction that caused it won't help matters (e.g., skip the mtmsr that sets MSR[DR]=0 or clears KVM_GUEST_MODE_SKIP). [Paul notes: "the 0x1300 interrupt was dropped from the architecture a long time ago and is not generated by P7, P8, P9 or P10." So add a comment about this in the handler code while we're here. ] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-11-npiggin@gmail.com
2021-04-12powerpc/64s: Remove KVM handler support from CBE_RAS interruptsNicholas Piggin
Cell does not support KVM. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-10-npiggin@gmail.com
2021-04-08powerpc/64s: power4 nap fixup in CNicholas Piggin
There is no need for this to be in asm, use the new intrrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210406025508.821718-1-npiggin@gmail.com
2021-03-10powerpc/64s/exception: Clean up a missed SRR specifierDaniel Axtens
Nick's patch cleaning up the SRR specifiers in exception-64s.S missed a single instance of EXC_HV_OR_STD. Clean that up. Caught by clang's integrated assembler. Fixes: 3f7fbd97d07d ("powerpc/64s/exception: Clean up SRR specifiers") Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210225031006.1204774-2-dja@axtens.net
2021-02-22Merge tag 'powerpc-5.12-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A large series adding wrappers for our interrupt handlers, so that irq/nmi/user tracking can be isolated in the wrappers rather than spread in each handler. - Conversion of the 32-bit syscall handling into C. - A series from Nick to streamline our TLB flushing when using the Radix MMU. - Switch to using queued spinlocks by default for 64-bit server CPUs. - A rework of our PCI probing so that it happens later in boot, when more generic infrastructure is available. - Two small fixes to allow 32-bit little-endian processes to run on 64-bit kernels. - Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun. * tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits) powerpc/perf: Adds support for programming of Thresholding in P10 powerpc/pci: Remove unimplemented prototypes powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user() powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size() powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user() powerpc/64: Fix stack trace not displaying final frame powerpc/time: Remove get_tbl() powerpc/time: Avoid using get_tbl() spi: mpc52xx: Avoid using get_tbl() powerpc/syscall: Avoid storing 'current' in another pointer powerpc/32: Handle bookE debugging in C in syscall entry/exit powerpc/syscall: Do not check unsupported scv vector on PPC32 powerpc/32: Remove the counter in global_dbcr0 powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry powerpc/syscall: implement system call entry/exit logic in C for PPC32 powerpc/32: Always save non volatile GPRs at syscall entry powerpc/syscall: Change condition to check MSR_RI powerpc/syscall: Save r3 in regs->orig_r3 powerpc/syscall: Use is_compat_task() powerpc/syscall: Make interrupt.c buildable on PPC32 ...
2021-02-11powerpc/64s: Remove EXSLB interrupt save areaNicholas Piggin
SLB faults should not be taken while the PACA save areas are live, all memory accesses should be fetches from the kernel text, and access to PACA and the current stack, before C code is called or any other accesses are made. All of these have pinned SLBs so will not take a SLB fault. Therefore EXSLB is not be required. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208063406.331655-1-npiggin@gmail.com
2021-02-11powerpc/64s: syscall real mode entry use mtmsrd rather than rfidNicholas Piggin
Have the real mode system call entry handler branch to the kernel 0xc000... address and then use mtmsrd to enable the MMU, rather than use SRRs and rfid. Commit 8729c26e675c ("powerpc/64s/exception: Move real to virt switch into the common handler") implemented this style of real mode entry for other interrupt handlers, so this brings system calls into line with them, which is the main motivcation for the change. This tends to be slightly faster due to avoiding the mtsprs, and it also does not clobber the SRR registers, which becomes important in a subsequent change. The real mode entry points don't tend to be too important for performance these days, but it is possible for a hypervisor to run guests in AIL=0 mode for certian reasons. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208063326.331502-1-npiggin@gmail.com
2021-02-09powerpc/64s: Handle program checks in wrong endian during early bootMichael Ellerman
There's a short window during boot where although the kernel is running little endian, any exceptions will cause the CPU to switch back to big endian. This situation persists until we call configure_exceptions(), which calls either the hypervisor or OPAL to configure the CPU so that exceptions will be taken in little endian (via HID0[HILE]). We don't intend to take exceptions during early boot, but one way we sometimes do is via a WARN/BUG etc. Those all boil down to a trap instruction, which will cause a program check exception. The first instruction of the program check handler is an mtsprg, which when executed in the wrong endian is an lhzu with a ~3GB displacement from r3. The content of r3 is random, so that becomes a load from some random location, and depending on the system (installed RAM etc.) can easily lead to a checkstop, or an infinitely recursive page fault. That prevents whatever the WARN/BUG was complaining about being printed to the console, and the user just sees a dead system. We can fix it by having a trampoline at the beginning of the program check handler that detects we are in the wrong endian, and flips us back to the correct endian. We can't flip MSR[LE] using mtmsr (alas), so we have to use rfid. That requires backing up SRR0/1 as well as a GPR. To do that we use SPRG0/2/3 (SPRG1 is already used for the paca). SPRG3 is user readable, but this trampoline is only active very early in boot, and SPRG3 will be reinitialised in vdso_getcpu_init() before userspace starts. With this trampoline in place we can survive a WARN early in boot and print a stack trace, which is eventually printed to the console once the console is up, eg: [83565.758545] kexec_core: Starting new kernel [ 0.000000] ------------[ cut here ]------------ [ 0.000000] static_key_enable_cpuslocked(): static key '0xc000000000ea6160' used before call to jump_label_init() [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xfc/0x120 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-gcc-8.2.0-dirty #618 [ 0.000000] NIP: c0000000002fd46c LR: c0000000002fd468 CTR: c000000000170660 [ 0.000000] REGS: c000000001227940 TRAP: 0700 Not tainted (5.10.0-gcc-8.2.0-dirty) [ 0.000000] MSR: 9000000002823003 <SF,HV,VEC,VSX,FP,ME,RI,LE> CR: 24882422 XER: 20040000 [ 0.000000] CFAR: 0000000000000730 IRQMASK: 1 [ 0.000000] GPR00: c0000000002fd468 c000000001227bd0 c000000001228300 0000000000000065 [ 0.000000] GPR04: 0000000000000001 0000000000000065 c0000000010cf970 000000000000000d [ 0.000000] GPR08: 0000000000000000 0000000000000000 0000000000000000 c00000000122763f [ 0.000000] GPR12: 0000000000002000 c000000000f8a980 0000000000000000 0000000000000000 [ 0.000000] GPR16: 0000000000000000 0000000000000000 c000000000f88c8e c000000000f88c9a [ 0.000000] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 0.000000] GPR24: 0000000000000000 c000000000dea3a8 0000000000000000 c000000000f35114 [ 0.000000] GPR28: 0000002800000000 c000000000f88c9a c000000000f88c8e c000000000ea6160 [ 0.000000] NIP [c0000000002fd46c] static_key_enable_cpuslocked+0xfc/0x120 [ 0.000000] LR [c0000000002fd468] static_key_enable_cpuslocked+0xf8/0x120 [ 0.000000] Call Trace: [ 0.000000] [c000000001227bd0] [c0000000002fd468] static_key_enable_cpuslocked+0xf8/0x120 (unreliable) [ 0.000000] [c000000001227c40] [c0000000002fd4c0] static_key_enable+0x30/0x50 [ 0.000000] [c000000001227c70] [c000000000f6629c] early_page_poison_param+0x58/0x9c [ 0.000000] [c000000001227cb0] [c000000000f351b8] do_early_param+0xa4/0x10c [ 0.000000] [c000000001227d30] [c00000000011e020] parse_args+0x270/0x5e0 [ 0.000000] [c000000001227e20] [c000000000f35864] parse_early_options+0x48/0x5c [ 0.000000] [c000000001227e40] [c000000000f358d0] parse_early_param+0x58/0x84 [ 0.000000] [c000000001227e70] [c000000000f3a368] early_init_devtree+0xc4/0x490 [ 0.000000] [c000000001227f10] [c000000000f3bca0] early_setup+0xc8/0x1c8 [ 0.000000] [c000000001227f90] [000000000000c320] 0xc320 [ 0.000000] Instruction dump: [ 0.000000] 4bfffddd 7c2004ac 39200001 913f0000 4bffffb8 7c651b78 3c82ffac 3c62ffc0 [ 0.000000] 38841b00 3863f310 4bdf03a5 60000000 <0fe00000> 4bffff38 60000000 60000000 [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x80 with crng_init=0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] dt-cpu-ftrs: setup for ISA 3000 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202130207.1303975-2-mpe@ellerman.id.au
2021-02-09powerpc/64s: runlatch interrupt handling in CNicholas Piggin
There is no need for this to be in asm, use the new intrrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-42-npiggin@gmail.com
2021-02-09powerpc/64s: move NMI soft-mask handling to CNicholas Piggin
Saving and restoring soft-mask state can now be done in C using the interrupt handler wrapper functions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-41-npiggin@gmail.com
2021-02-09powerpc/64: entry cpu time accounting in CNicholas Piggin
There is no need for this to be in asm, use the new interrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-39-npiggin@gmail.com
2021-02-09powerpc/64s: reconcile interrupts in CNicholas Piggin
There is no need for this to be in asm, use the new intrrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-37-npiggin@gmail.com
2021-02-09powerpc: add and use unknown_async_exceptionNicholas Piggin
This is currently the same as unknown_exception, but it will diverge after interrupt wrappers are added and code moved out of asm into the wrappers (e.g., async handlers will check FINISH_NAP). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-22-npiggin@gmail.com
2021-02-09powerpc/64s: move bad_page_fault handling to CNicholas Piggin
This simplifies code, and it is also useful when introducing interrupt handler wrappers when introducing wrapper functionality that doesn't cope with asm entry code calling into more than one handler function. 32-bit and 64e still have some such cases, which limits some ways they can use interrupt wrappers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-15-npiggin@gmail.com
2021-02-09powerpc/64s: add do_bad_page_fault_segv handlerNicholas Piggin
This function acts like an interrupt handler so it needs to follow the standard interrupt handler function signature which will be introduced in a future change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-13-npiggin@gmail.com
2021-02-09powerpc: remove arguments from fault handler functionsNicholas Piggin
Make mm fault handlers all just take the pt_regs * argument and load DAR/DSISR from that. Make those that return a value return long. This is done to make the function signatures match other handlers, which will help with a future patch to add wrappers. Explicit arguments could be added for performance but that would require more wrapper macro variants. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-7-npiggin@gmail.com
2021-02-09powerpc/64s: move the hash fault handling logic to CNicholas Piggin
The fault handling still has some complex logic particularly around hash table handling, in asm. Implement most of this in C. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-6-npiggin@gmail.com
2021-02-09powerpc/64s: move DABR match out of handle_page_faultNicholas Piggin
Similar to the 32/s change, move the test and call to the do_break handler to the DSI. Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-5-npiggin@gmail.com
2021-01-20powerpc/64s: fix scv entry fallback flush vs interruptNicholas Piggin
The L1D flush fallback functions are not recoverable vs interrupts, yet the scv entry flush runs with MSR[EE]=1. This can result in a timer (soft-NMI) or MCE or SRESET interrupt hitting here and overwriting the EXRFI save area, which ends up corrupting userspace registers for scv return. Fix this by disabling RI and EE for the scv entry fallback flush. Fixes: f79643787e0a0 ("powerpc/64s: flush L1D on kernel entry") Cc: stable@vger.kernel.org # 5.9+ which also have flush L1D patch backport Reported-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210111062408.287092-1-npiggin@gmail.com
2020-12-09powerpc/fault: Perform exception fixup in do_page_fault()Christophe Leroy
Exception fixup doesn't require the heady full regs saving, do it from do_page_fault() directly. For that, split bad_page_fault() in two parts. As bad_page_fault() can also be called from other places than handle_page_fault(), it will still perform exception fixup and fallback on __bad_page_fault(). handle_page_fault() directly calls __bad_page_fault() as the exception fixup will now be done by do_page_fault() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bd07d6fef9237614cd6d318d8f19faeeadaa816b.1607491748.git.christophe.leroy@csgroup.eu
2020-12-04powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry ↵Aneesh Kumar K.V
and exit from kernel This prepare kernel to operate with a different value than userspace AMR/IAMR. For this, AMR/IAMR need to be saved and restored on entry and return from the kernel. With KUAP we modify kernel AMR when accessing user address from the kernel via copy_to/from_user interfaces. We don't need to modify IAMR value in similar fashion. If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering kernel from userspace. If not we can assume that AMR/IAMR is not modified from userspace. We need to save AMR if we have MMU_FTR_BOOK3S_KUAP feature enabled and we are interrupted within kernel. This is required so that if we get interrupted within copy_to/from_user we continue with the right AMR value. If we hae MMU_FTR_BOOK3S_KUEP enabled we need to restore IAMR on return to userspace beause kernel will be running with a different IAMR value. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-11-aneesh.kumar@linux.ibm.com
2020-11-23Merge tag 'powerpc-cve-2020-4788' into fixesMichael Ellerman
From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups.
2020-11-19powerpc/64s: flush L1D after user accessesNicholas Piggin
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D on kernel entryNicholas Piggin
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-18powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU contextNicholas Piggin
Commit 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") removed KVM guest tests from interrupts that do not set HV=1, when PR-KVM is not configured. This is wrong for HV-KVM HPT guest MMIO emulation case which attempts to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with the guest MMU context loaded. This can cause host DSI, DSLB interrupts which must test for KVM guest. Restore this and add a comment. Fixes: 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com
2020-11-16powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=yNicholas Piggin
pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs, which is basically the same as the regular handlers for those interrupts. The system reset FWNMI handler did not have a KVM guest test in it, although it probably should have because the guest can itself run guests. Commit 4f50541f6703b ("powerpc/64s/exception: Move all interrupt handlers to new style code gen macros") convert the handler faithfully to avoid a KVM test with a "clever" trick to modify the IKVM_REAL setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y). This worked when the KVM test was generated in the interrupt entry handlers, but a later patch moved the KVM test to the common handler, and the common handler macro is expanded below the fwnmi entry. This prevents the KVM test from being generated even for the 0x100 entry point as well. The result is NMI IPIs in the host kernel when a guest is running will use gest registers. This goes particularly badly when an HPT guest is running and the MMU is set to guest mode. Remove this trickery and just generate the test always. Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com
2020-08-07Merge tag 'powerpc-5.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for (optionally) using queued spinlocks & rwlocks. - Support for a new faster system call ABI using the scv instruction on Power9 or later. - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on Power10 and future processors, leaving us with no way to implement the functionality it requests. This risks breaking userspace, though we believe it is unused in practice. - A bug fix for, and then the removal of, our custom stack expansion checking. We now allow stack expansion up to the rlimit, like other architectures. - Remove the remnants of our (previously disabled) topology update code, which tried to react to NUMA layout changes on virtualised systems, but was prone to crashes and other problems. - Add PMU support for Power10 CPUs. - A change to our signal trampoline so that we don't unbalance the link stack (branch return predictor) in the signal delivery path. - Lots of other cleanups, refactorings, smaller features and so on as usual. Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong, YueHaibing. * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits) selftests/powerpc: Fix pkey syscall redefinitions powerpc: Fix circular dependency between percpu.h and mmu.h powerpc/powernv/sriov: Fix use of uninitialised variable selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs powerpc/40x: Fix assembler warning about r0 powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric powerpc/papr_scm: Fetch nvdimm performance stats from PHYP cpuidle: pseries: Fixup exit latency for CEDE(0) cpuidle: pseries: Add function to parse extended CEDE records cpuidle: pseries: Set the latency-hint before entering CEDE selftests/powerpc: Fix online CPU selection powerpc/perf: Consolidate perf_callchain_user_[64|32]() powerpc/pseries/hotplug-cpu: Remove double free in error path powerpc/pseries/mobility: Add pr_debug() for device tree changes powerpc/pseries/mobility: Set pr_fmt() powerpc/cacheinfo: Warn if cache object chain becomes unordered powerpc/cacheinfo: Improve diagnostics about malformed cache lists powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages powerpc/cacheinfo: Set pr_fmt() powerpc: fix function annotations to avoid section mismatch warnings with gcc-10 ...
2020-07-27powerpc/64s/hash: Fix hash_preload running with interrupts enabledNicholas Piggin
Commit 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") removed the local_irq_disable from hash_preload, but it was required for more than just the page table walk: the hash pte busy bit is effectively a lock which may be taken in interrupt context, and the local update flag test must not be preempted before it's used. This solves apparent lockups with perf interrupting __hash_page_64K. If get_perf_callchain then also takes a hash fault on the same page while it is already locked, it will loop forever taking hash faults, which looks like this: cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70] pc: c000000000072dc8: hash_page_mm+0x8/0x800 lr: c00000000000c5a4: do_hash_page+0x24/0x38 sp: c0002ac1cc69ac70 msr: 8000000000081033 current = 0xc0002ac1cc602e00 paca = 0xc00000001de1f280 irqmask: 0x03 irq_happened: 0x01 pid = 20118, comm = pread2_processe Linux version 5.8.0-rc6-00345-g1fad14f18bc6 49e:mon> t [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable) --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac [link register ] c000000000335d10 copy_from_user_nofault+0xf0/0x150 [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable) [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0 [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410 [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40 [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360 [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0 [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790 [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0 [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0 [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750 [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510 [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60 [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0 --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160 [link register ] c0000000000846f0 __hash_page_64K+0x210/0x540 [c0002ac1cc69ba50] 0000000000000000 (unreliable) [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0 [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0 [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60 [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60 [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90 [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c --- Exception: 300 (Data Access) at 00007fff8c861fe8 SP (7ffff6b19660) is in userspace Fixes: 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reported-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
2020-07-23Merge branch 'scv' support into nextMichael Ellerman
From Nick's cover letter: Linux powerpc new system call instruction and ABI System Call Vectored (scv) ABI ============================== The scv instruction is introduced with POWER9 / ISA3, it comes with an rfscv counter-part. The benefit of these instructions is performance (trading slower SRR0/1 with faster LR/CTR registers, and entering the kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR updates. The scv instruction has 128 levels (not enough to cover the Linux system call space). Assignment and advertisement ---------------------------- The proposal is to assign scv levels conservatively, and advertise them with HWCAP feature bits as we add support for more. Linux has not enabled FSCR[SCV] yet, so executing the scv instruction will cause the kernel to log a "SCV facility unavilable" message, and deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it. This change allocates the zero level ('scv 0'), advertised with PPC_FEATURE2_SCV, which will be used to provide normal Linux system calls (equivalent to 'sc'). Attempting to execute scv with other levels will cause a SIGILL to be delivered the same as before, but will not log a "SCV facility unavailable" message (because the processor facility is enabled). Calling convention ------------------ The proposal is for scv 0 to provide the standard Linux system call ABI with the following differences from sc convention[1]: - LR is to be volatile across scv calls. This is necessary because the scv instruction clobbers LR. From previous discussion, this should be possible to deal with in GCC clobbers and CFI. - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the kernel system call exit to avoid restoring the volatile cr registers (although we probably still would anyway to avoid information leaks). - Error handling: The consensus among kernel, glibc, and musl is to move to using negative return values in r3 rather than CR0[SO]=1 to indicate error, which matches most other architectures, and is closer to a function call. Notes ----- - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as submitted currently preserves them. This is to leave room for deciding which way to go with these. Some small benefit was found by preserving them[1] but I'm not convinced it's worth deviating from the C function call ABI just for this. Release code should follow the ABI. Previous discussions: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html [1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html
2020-07-22powerpc/64s: system call support for scv/rfscv instructionsNicholas Piggin
Add support for the scv instruction on POWER9 and later CPUs. For now this implements the zeroth scv vector 'scv 0', as identical to 'sc' system calls, with the exception that LR is not preserved, nor are volatile CR registers, and error is not indicated with CR0[SO], but by returning a negative errno. rfscv is implemented to return from scv type system calls. It can not be used to return from sc system calls because those are defined to preserve LR. getpid syscall throughput on POWER9 is improved by 26% (428 to 318 cycles), largely due to reducing mtmsr and mtspr. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix ppc64e build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
2020-07-22powerpc/64s/exception: treat NIA below __end_interrupts as soft-maskedNicholas Piggin
The scv instruction causes an interrupt which can enter the kernel with MSR[EE]=1, thus allowing interrupts to hit at any time. These must not be taken as normal interrupts, because they come from MSR[PR]=0 context, and yet the kernel stack is not yet set up and r13 is not set to the PACA). Treat this as a soft-masked interrupt regardless of the soft masked state. This does not affect behaviour yet, because currently all interrupts are taken with MSR[EE]=0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611081203.995112-2-npiggin@gmail.com
2020-07-08powerpc/64s/exception: Fix 0x1500 interrupt handler crashNicholas Piggin
A typo caused the interrupt handler to branch immediately to the common "unknown interrupt" handler and skip the special case test for denormal cause. This does not affect KVM softpatch handling (e.g., for POWER9 TM assist) because the KVM test was moved to common code by commit 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") just before this bug was introduced. Fixes: 3f7fbd97d07d ("powerpc/64s/exception: Clean up SRR specifiers") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> [mpe: Split selftest into a separate patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200708074942.1713396-1-npiggin@gmail.com
2020-06-16powerpc/64s: Fix KVM interrupt using wrong save areaNicholas Piggin
The CTR register reload in the KVM interrupt path used the wrong save area for SLB (and NMI) interrupts. Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200615061247.1310763-1-npiggin@gmail.com
2020-05-28powerpc/64s/kuap: Conditionally restore AMR in kuap_restore_amr asmNicholas Piggin
Similar to the C code change, make the AMR restore conditional on whether the register has changed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-7-npiggin@gmail.com
2020-05-26Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch from this cycle. It contains several important fixes we need in next for testing purposes, and also some that will conflict with upcoming changes.
2020-05-26powerpc/64s: Fix restore of NV GPRs after facility unavailable exceptionMichael Ellerman
Commit 702f09805222 ("powerpc/64s/exception: Remove lite interrupt return") changed the interrupt return path to not restore non-volatile registers by default, and explicitly restore them in paths where it is required. But it missed that the facility unavailable exception can sometimes modify user registers, ie. when it does emulation of move from DSCR. This is seen as a failure of the dscr_sysfs_thread_test: test: dscr_sysfs_thread_test [cpu 0] User DSCR should be 1 but is 0 failure: dscr_sysfs_thread_test So restore non-volatile GPRs after facility unavailable exceptions. Currently the hypervisor facility unavailable exception is also wired up to call facility_unavailable_exception(). In practice we should never take a hypervisor facility unavailable exception for the DSCR. On older bare metal systems we set HFSCR_DSCR unconditionally in __init_HFSCR, or on newer systems it should be enabled via the "data-stream-control-register" device tree CPU feature. Even if it's not, since commit f3c99f97a3cd ("KVM: PPC: Book3S HV: Don't access HFSCR, LPIDR or LPCR when running nested"), the KVM code has unconditionally set HFSCR_DSCR when running guests. So we should only get a hypervisor facility unavailable for the DSCR if skiboot has disabled the "data-stream-control-register" feature, and we are somehow in guest context but not via KVM. Given all that, it should be unnecessary to add a restore of non-volatile GPRs after the hypervisor facility exception, because we never expect to hit that path. But equally we may as well add the restore, because we never expect to hit that path, and if we ever did, at least we would correctly restore the registers to their post emulation state. In future we can split the non-HV and HV facility unavailable handling so that there is no emulation in the HV handler, and then remove the restore for the HV case. Fixes: 702f09805222 ("powerpc/64s/exception: Remove lite interrupt return") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200526061808.2472279-1-mpe@ellerman.id.au
2020-05-18powerpc/64s/exceptions: Machine check reconcile irq stateNicholas Piggin
pseries fwnmi machine check code pops the soft-irq checks in rtas_call (after the next patch to remove rtas_token from this call path). Rather than play whack a mole with these and forever having fragile code, it seems better to have the early machine check handler perform the same kind of reconcile as the other NMI interrupts. WARNING: CPU: 0 PID: 493 at arch/powerpc/kernel/irq.c:343 CPU: 0 PID: 493 Comm: a Tainted: G W NIP: c00000000001ed2c LR: c000000000042c40 CTR: 0000000000000000 REGS: c0000001fffd38b0 TRAP: 0700 Tainted: G W MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000 CFAR: c00000000001ec90 IRQMASK: 0 GPR00: c000000000043820 c0000001fffd3b40 c0000000012ba300 0000000000000000 GPR04: 0000000048000488 0000000000000000 0000000000000000 00000000deadbeef GPR08: 0000000000000080 0000000000000000 0000000000000000 0000000000001001 GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 0000000000000000 0000000000000001 c000000001360810 0000000000000000 NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100 LR [c000000000042c40] unlock_rtas+0x30/0x90 Call Trace: [c0000001fffd3b40] [c000000001360810] 0xc000000001360810 (unreliable) [c0000001fffd3b60] [c000000000043820] rtas_call+0x1c0/0x280 [c0000001fffd3bb0] [c0000000000dc328] fwnmi_release_errinfo+0x38/0x70 [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540 [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70 [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0 --- interrupt: 200 at 0x13f1307c8 LR = 0x7fff888b8528 Instruction dump: 60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000 60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-5-npiggin@gmail.com
2020-05-18powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing _DAR to ↵Nicholas Piggin
RESULT A spare interrupt stack slot is needed to save irq state when reconciling NMIs (sreset and decrementer soft-nmi). _DAR is used for this, but we want to reconcile machine checks as well, which do use _DAR. Switch to using RESULT instead, as it's used by system calls. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-4-npiggin@gmail.com
2020-05-18powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable pathNicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-3-npiggin@gmail.com
2020-05-18powerpc/64s/exception: Fix machine check no-loss idle wakeupNicholas Piggin
The architecture allows for machine check exceptions to cause idle wakeups which resume at the 0x200 address which has to return via the idle wakeup code, but the early machine check handler is run first. The case of a no state-loss sleep is broken because the early handler uses non-volatile register r1 , which is needed for the wakeup protocol, but it is not restored. Fix this by loading r1 from the MCE exception frame before returning to the idle wakeup code. Also update the comment which has become stale since the idle rewrite in C. This crash was found and fix confirmed with a machine check injection test in qemu powernv model (which is not upstream in qemu yet). Fixes: 10d91611f426d ("powerpc/64s: Reimplement book3s idle code in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-2-npiggin@gmail.com
2020-05-04powerpc/64s/kuap: Restore AMR in system reset exceptionNicholas Piggin
The system reset interrupt handler locks AMR and exits with EXCEPTION_RESTORE_REGS without restoring AMR. Similarly to the soft-NMI handler, it needs to restore. Fixes: 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-5-npiggin@gmail.com
2020-04-03powerpc/64s: Fix doorbell wakeup msgclr optimisationNicholas Piggin
Commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in C") broke the doorbell wakeup optimisation introduced by commit a9af97aa0a12 ("powerpc/64s: msgclr when handling doorbell exceptions from system reset"). This patch restores the msgclr, in C code. It's now done in the system reset wakeup path rather than doorbell interrupt replay where it used to be, because it is always the right thing to do in the wakeup case, but it may be rarely of use in other interrupt replay situations in which case it's wasted work - we would have to run measurements to see if that was a worthwhile optimisation, and I suspect it would not be. The results are similar to those in the original commit, test on POWER8 of context_switch selftests benchmark with polling idle disabled (e.g., always nap, giving cross-CPU IPIs) gives the following results: broken patched Different threads, same core: 317k/s 375k/s +18.7% Different cores: 280k/s 282k/s +1.0% Fixes: 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200402121212.1118218-1-npiggin@gmail.com
2020-04-01powerpc/64s/exception: Remove lite interrupt returnNicholas Piggin
Regular interrupt return restores NVGPRS whereas lite returns do not. This is clumsy: most interrupts can return without restoring NVGPRS in most of the time, but there are special cases that require it (when registers have been modified by the kernel). So change interrupt return to not restore NVGPRS, and have interrupt handlers restore them explicitly in the cases that requires it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-30-npiggin@gmail.com
2020-04-01powerpc/64s: Implement interrupt exit logic in CNicholas Piggin
Implement the bulk of interrupt return logic in C. The asm return code must handle a few cases: restoring full GPRs, and emulating stack store. The stack store emulation is significantly simplfied, rather than creating a new return frame and switching to that before performing the store, it uses the PACA to keep a scratch register around to perform the store. The asm return code is moved into 64e for now. The new logic has made allowance for 64e, but I don't have a full environment that works well to test it, and even booting in emulated qemu is not great for stress testing. 64e shouldn't be too far off working with this, given a bit more testing and auditing of the logic. This is slightly faster on a POWER9 (page fault speed increases about 1.1%), probably due to reduced mtmsrd. mpe: Includes fixes from Nick for _TIF_EMULATE_STACK_STORE handling (including the fast_interrupt_return path), to remove trace_hardirqs_on(), and fixes the interrupt-return part of the MSR_VSX restore bug caught by tm-unavailable selftest. mpe: Incorporate fix from Nick: The return-to-kernel path has to replay any soft-pending interrupts if it is returning to a context that had interrupts soft-enabled. It has to do this carefully and avoid plain enabling interrupts if this is an irq context, which can cause multiple nesting of interrupts on the stack, and other unexpected issues. The code which avoided this case got the soft-mask state wrong, and marked interrupts as enabled before going around again to retry. This seems to be mostly harmless except when PREEMPT=y, this calls preempt_schedule_irq with irqs apparently enabled and runs into a BUG in kernel/sched/core.c Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-29-npiggin@gmail.com
2020-04-01powerpc/64: Implement soft interrupt replay in CNicholas Piggin
When local_irq_enable() finds a pending soft-masked interrupt, it "replays" it by setting up registers like the initial interrupt entry, then calls into the low level handler to set up an interrupt stack frame and process the interrupt. This is not necessary, and uses more stack than needed. The high level interrupt handler can be called directly from C, with just pt_regs set up on stack. This should be faster and use less stack. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-28-npiggin@gmail.com
2020-04-01powerpc/64s/exception: Soft NMI interrupt should not use ret_from_exceptNicholas Piggin
The soft NMI handler does not reconcile interrupt state, so it should not return via the normal ret_from_except path. Return like other NMIs, using the EXCEPTION_RESTORE_REGS macro. This becomes important when the scv interrupt is implemented, which must handle soft-masked interrupts that have r13 set to something other than the PACA -- returning to kernel in this case must restore r13. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-23-npiggin@gmail.com
2020-04-01powerpc/64s/exception: Reconcile interrupts in system_resetNicholas Piggin
This adds IRQ_HARD_DIS to irq_happened. Although it doesn't seem to matter much because we're not allowed to enable irqs in an NMI handler, the soft-irq debugging code is becoming more strict about ensuring IRQ_HARD_DIS is in sync with MSR[EE], this may help avoid asserts or other issues. Add a comment explaining why MCE does not have this. Early machine check is generally much smaller and more contained code which will explode if you look at it wrong anyway as it runs in real mode, though there's an argument that we should do similar reconciling for the MCE as well. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-22-npiggin@gmail.com
2020-04-01powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supportedNicholas Piggin
Apart from SRESET, MCE, and syscall (hcall variant), the SRR type interrupts are not escalated to hypervisor mode, so are delivered to the OS. When running PR KVM, the OS is the hypervisor, and the guest runs with MSR[PR]=1 (ie. usermode), so these interrupts must test if a guest was running when interrupted. These tests are required at the real-mode entry points because the PR KVM host runs with LPCR[AIL]=0. In HV KVM and nested HV KVM, the guest always receives these interrupts, so there is no need for the host to make this test. So remove the tests if PR KVM is not configured. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-21-npiggin@gmail.com
2020-04-01powerpc/64s/exception: Add more comments for interrupt handlersNicholas Piggin
A few of the non-standard handlers are left uncommented. Some more description could be added to some. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-20-npiggin@gmail.com