summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2017-10-05powerpc/jprobes: Validate break handler invocation as being due to a ↵Naveen N. Rao
jprobe_return() Fix a circa 2005 FIXME by implementing a check to ensure that we actually got into the jprobe break handler() due to the trap in jprobe_return(). Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-05powerpc/jprobes: Disable preemption when triggered through ftraceNaveen N. Rao
KPROBES_SANITY_TEST throws the below splat when CONFIG_PREEMPT is enabled: Kprobe smoke test: started DEBUG_LOCKS_WARN_ON(val > preempt_count()) ------------[ cut here ]------------ WARNING: CPU: 19 PID: 1 at kernel/sched/core.c:3094 preempt_count_sub+0xcc/0x140 Modules linked in: CPU: 19 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc7-nnr+ #97 task: c0000000fea80000 task.stack: c0000000feb00000 NIP: c00000000011d3dc LR: c00000000011d3d8 CTR: c000000000a090d0 REGS: c0000000feb03400 TRAP: 0700 Not tainted (4.13.0-rc7-nnr+) MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 28000282 XER: 00000000 CFAR: c00000000015aa18 SOFTE: 0 <snip> NIP preempt_count_sub+0xcc/0x140 LR preempt_count_sub+0xc8/0x140 Call Trace: preempt_count_sub+0xc8/0x140 (unreliable) kprobe_handler+0x228/0x4b0 program_check_exception+0x58/0x3b0 program_check_common+0x16c/0x170 --- interrupt: 0 at kprobe_target+0x8/0x20 LR = init_test_probes+0x248/0x7d0 kp+0x0/0x80 (unreliable) livepatch_handler+0x38/0x74 init_kprobes+0x1d8/0x208 do_one_initcall+0x68/0x1d0 kernel_init_freeable+0x298/0x374 kernel_init+0x24/0x160 ret_from_kernel_thread+0x5c/0x70 Instruction dump: 419effdc 3d22001b 39299240 81290000 2f890000 409effc8 3c82ffcb 3c62ffcb 3884bc68 3863bc18 4803d5fd 60000000 <0fe00000> 4bffffa8 60000000 60000000 ---[ end trace 432dd46b4ce3d29f ]--- Kprobe smoke test: passed successfully The issue is that we aren't disabling preemption in kprobe_ftrace_handler(). Disable it. Fixes: ead514d5fb30a0 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Trim oops a little for formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/kprobes: Fix warnings from __this_cpu_read() on preempt kernelsNaveen N. Rao
Kamalesh pointed out that we are getting the below call traces with livepatched functions when we enable CONFIG_PREEMPT: [ 495.470721] BUG: using __this_cpu_read() in preemptible [00000000] code: cat/8394 [ 495.471167] caller is is_current_kprobe_addr+0x30/0x90 [ 495.471171] CPU: 4 PID: 8394 Comm: cat Tainted: G K 4.13.0-rc7-nnr+ #95 [ 495.471173] Call Trace: [ 495.471178] [c00000008fd9b960] [c0000000009f039c] dump_stack+0xec/0x160 (unreliable) [ 495.471184] [c00000008fd9b9a0] [c00000000059169c] check_preemption_disabled+0x15c/0x170 [ 495.471187] [c00000008fd9ba30] [c000000000046460] is_current_kprobe_addr+0x30/0x90 [ 495.471191] [c00000008fd9ba60] [c00000000004e9a0] ftrace_call+0x1c/0xb8 [ 495.471195] [c00000008fd9bc30] [c000000000376fd8] seq_read+0x238/0x5c0 [ 495.471199] [c00000008fd9bcd0] [c0000000003cfd78] proc_reg_read+0x88/0xd0 [ 495.471203] [c00000008fd9bd00] [c00000000033e5d4] __vfs_read+0x44/0x1b0 [ 495.471206] [c00000008fd9bd90] [c0000000003402ec] vfs_read+0xbc/0x1b0 [ 495.471210] [c00000008fd9bde0] [c000000000342138] SyS_read+0x68/0x110 [ 495.471214] [c00000008fd9be30] [c00000000000bc6c] system_call+0x58/0x6c Commit c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for jprobes") introduced a helper is_current_kprobe_addr() to help determine if the current function has been livepatched or if it has a jprobe installed, both of which modify the NIP. This was subsequently renamed to __is_active_jprobe(). In the case of a jprobe, kprobe_ftrace_handler() disables pre-emption before calling into setjmp_pre_handler() which returns without disabling pre-emption. This is done to ensure that the jprobe handler won't disappear beneath us if the jprobe is unregistered between the setjmp_pre_handler() and the subsequent longjmp_break_handler() called from the jprobe handler. Due to this, we can use __this_cpu_read() in __is_active_jprobe() with the pre-emption check as we know that pre-emption will be disabled. However, if this function has been livepatched, we are still doing this check and when we do so, pre-emption won't necessarily be disabled. This results in the call trace shown above. Fix this by only invoking __is_active_jprobe() when pre-emption is disabled. And since we now guard this within a pre-emption check, we can instead use raw_cpu_read() to get the current_kprobe value skipping the check done by __this_cpu_read(). Fixes: c05b8c4474c030 ("powerpc/kprobes: Skip livepatch_handler() for jprobes") Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/kprobes: Clean up jprobe detection in livepatch handlerNaveen N. Rao
In commit c05b8c4474c03 ("powerpc/kprobes: Skip livepatch_handler() for jprobes"), we added a helper is_current_kprobe_addr() to help detect if the modified regs->nip was due to a jprobe or livepatch. Masami felt that the function name was not quite clear. To that end, this patch renames is_current_kprobe_addr() to __is_active_jprobe() and adds a comment to (hopefully) better clarify the purpose of this helper. The helper has also now been moved to kprobes-ftrace.c so that it is only available for KPROBES_ON_FTRACE. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/kprobes: Do not suppress instruction emulation if a single run failedNaveen N. Rao
Currently, we disable instruction emulation if emulate_step() fails for any reason. However, such failures could be transient and specific to a particular run. Instead, only disable instruction emulation if we have never been able to emulate this. If we had emulated this instruction successfully at least once, then we single step only this probe hit and continue to try emulating the instruction in subsequent probe hits. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/kprobes: Some cosmetic updates to try_to_emulate()Naveen N. Rao
1. This is only used in kprobes.c, so make it static. 2. Remove the un-necessary (ret == 0) comparison in the else clause. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/configs: Add Skiroot defconfigJoel Stanley
This configuration is used by the OpenPower firmware for it's Linux-as-bootloader implementation. Also known as the Petitboot kernel, this configuration broke in 4.12 (CPU_HOTPLUG=n), so add it to the upstream tree in order to get better coverage. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/mm: Call flush_tlb_kernel_range with interrupts enabledGuenter Roeck
flush_tlb_kernel_range() may call smp_call_function_many() which expects interrupts to be enabled. This results in a traceback. WARNING: CPU: 0 PID: 1 at kernel/smp.c:416 smp_call_function_many+0xcc/0x2fc CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc1-00009-g0666f56 #1 task: cf830000 task.stack: cf82e000 NIP: c00a93c8 LR: c00a9634 CTR: 00000001 REGS: cf82fde0 TRAP: 0700 Not tainted (4.14.0-rc1-00009-g0666f56) MSR: 00021000 <CE,ME> CR: 24000082 XER: 00000000 GPR00: c00a9634 cf82fe90 cf830000 c050ad3c c0015a54 00000000 00000001 00000001 GPR08: 00000001 00000000 00000000 cf82e000 24000084 00000000 c0003150 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000001 00000000 c0510000 GPR24: 00000000 c0015a54 00000000 c050ad3c c051823c c050ad3c 00000025 00000000 NIP [c00a93c8] smp_call_function_many+0xcc/0x2fc LR [c00a9634] smp_call_function+0x3c/0x50 Call Trace: [cf82fe90] [00000010] 0x10 (unreliable) [cf82fed0] [c00a9634] smp_call_function+0x3c/0x50 [cf82fee0] [c0015d2c] flush_tlb_kernel_range+0x20/0x38 [cf82fef0] [c001524c] mark_initmem_nx+0x154/0x16c [cf82ff20] [c001484c] free_initmem+0x20/0x4c [cf82ff30] [c000316c] kernel_init+0x1c/0x108 [cf82ff40] [c000f3a8] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7c0803a6 7d808120 38210040 4e800020 3d20c052 812981a0 2f890000 40beffac 3d20c051 8929ac64 2f890000 40beff9c <0fe00000> 4bffff94 7fc3f378 7f64db78 Fixes: 3184cc4b6f6a ("powerpc/mm: Fix kernel RAM protection after freeing ...") Fixes: e611939fc8ec ("powerpc/mm: Ensure change_page_attr() doesn't ...") Cc: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/xive: Clear XIVE internal structures when a CPU is removedCédric Le Goater
Commit eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") introduced support for the XIVE exploitation mode of the P9 interrupt controller on the pseries platform. At that time, support for CPU removal was not complete on PowerVM and CPU hot unplug remained untested. It appears that some cleanups of the XIVE internal structures are required before releasing the CPU, without which the kernel crashes in a RTAS call doing the CPU isolation. These changes fix the crash by deconfiguring the IPI interrupt source and clearing the event queues of the CPU when it is removed. Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/xive: Fix IPI resetCédric Le Goater
When resetting an IPI, hw_ipi should also be set to zero. Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/watchdog: Make use of watchdog_nmi_probe()Thomas Gleixner
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu() on powerpc because it is called multiple times for the boot CPU. The first call is via: start_wd_on_cpu+0x80/0x2f0 watchdog_nmi_reconfigure+0x124/0x170 softlockup_reconfigure_threads+0x110/0x130 lockup_detector_init+0xbc/0xe0 kernel_init_freeable+0x18c/0x37c kernel_init+0x2c/0x160 ret_from_kernel_thread+0x5c/0xbc And then again via the CPU hotplug registration: start_wd_on_cpu+0x80/0x2f0 cpuhp_invoke_callback+0x194/0x620 cpuhp_thread_fun+0x7c/0x1b0 smpboot_thread_fn+0x290/0x2a0 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0xbc This can be avoided by setting up the cpu hotplug state with nocalls and move the initialization to the watchdog_nmi_probe() function. That initializes the hotplug callbacks without invoking the callback and the following core initialization function then configures the watchdog for the online CPUs (in this case CPU0) via softlockup_reconfigure_threads(). Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org
2017-10-04watchdog/core, powerpc: Lock cpus across reconfigurationThomas Gleixner
Instead of dropping the cpu hotplug lock after stopping NMI watchdog and threads and reaquiring for restart, the code and the protection rules become more obvious when holding cpu hotplug lock across the full reconfiguration. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710022105570.2114@nanos
2017-10-04watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()Thomas Gleixner
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure() into two stages. One to stop the NMI and one to restart it after reconfiguration. That was done by adding a boolean 'run' argument to the code, which is functionally correct but not necessarily a piece of art. Replace it by two explicit functions: watchdog_nmi_stop() and watchdog_nmi_start(). Fixes: 6592ad2fcc8f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage") Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
2017-10-03rapidio: remove global irq spinlocks from the subsystemIoan Nicu
Locking of config and doorbell operations should be done only if the underlying hardware requires it. This patch removes the global spinlocks from the rapidio subsystem and moves them to the mport drivers (fsl_rio and tsi721), only to the necessary places. For example, local config space read and write operations (lcread/lcwrite) are atomic in all existing drivers, so there should be no need for locking, while the cread/cwrite operations which generate maintenance transactions need to be synchronized with a lock. Later, each driver could chose to use a per-port lock instead of a global one, or even more granular locking. Link: http://lkml.kernel.org/r/20170824113023.GD50104@nokia.com Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com> Signed-off-by: Frank Kunz <frank.kunz@nokia.com> Acked-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-04powerpc/lib/sstep: Fix fixed-point shift instructions that set CA32Sandipan Das
This fixes the emulated behaviour of existing fixed-point shift right algebraic instructions that are supposed to set both the CA and CA32 bits of XER when running on a system that is compliant with POWER ISA v3.0 independent of whether the system is executing in 32-bit mode or 64-bit mode. The following instructions are affected: * Shift Right Algebraic Word Immediate (srawi[.]) * Shift Right Algebraic Word (sraw[.]) * Shift Right Algebraic Doubleword Immediate (sradi[.]) * Shift Right Algebraic Doubleword (srad[.]) Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()") Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/lib/sstep: Fix fixed-point arithmetic instructions that set CA32Sandipan Das
There are existing fixed-point arithmetic instructions that always set the CA bit of XER to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32-bit mode. In ISA v3.0, these instructions also always set the CA32 bit of XER to reflect the carry out of bit 32. This fixes the emulated behaviour of such instructions when running on a system that is compliant with POWER ISA v3.0. The following instructions are affected: * Add Immediate Carrying (addic) * Add Immediate Carrying and Record (addic.) * Subtract From Immediate Carrying (subfic) * Add Carrying (addc[.]) * Subtract From Carrying (subfc[.]) * Add Extended (adde[.]) * Subtract From Extended (subfe[.]) * Add to Minus One Extended (addme[.]) * Subtract From Minus One Extended (subfme[.]) * Add to Zero Extended (addze[.]) * Subtract From Zero Extended (subfze[.]) Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()") Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/lib/sstep: Add XER bits introduced in POWER ISA v3.0Sandipan Das
This adds definitions for the OV32 and CA32 bits of XER that were introduced in POWER ISA v3.0. There are some existing instructions that currently set the OV and CA bits based on certain conditions. The emulation behaviour of all these instructions needs to be updated to set these new bits accordingly. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/powermac: Use setup_timer() helperAllen Pais
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/6xx: Use setup_timer() helperAllen Pais
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/oprofile: Use setup_timer() helperAllen Pais
Use setup_timer function instead of initializing timer with the function and data fields. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/powernv: Use early_radix_enabled in POWER9 tlb flushNicholas Piggin
This code is used at boot and machine checks, so it should be using early_radix_enabled() (which is usable any time). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESETNicholas Piggin
This allows MSR[EE]=0 lockups to be detected on an OPAL (bare metal) system similarly to the hcall NMI IPI on pseries guests, when the platform/firmware supports it. This is an example of CPU10 spinning with interrupts hard disabled: Watchdog CPU:32 detected Hard LOCKUP other CPUS:10 Watchdog CPU:10 Hard LOCKUP CPU: 10 PID: 4410 Comm: bash Not tainted 4.13.0-rc7-00074-ge89ce1f89f62-dirty #34 task: c0000003a82b4400 task.stack: c0000003af55c000 NIP: c0000000000a7b38 LR: c000000000659044 CTR: c0000000000a7b00 REGS: c00000000fd23d80 TRAP: 0100 Not tainted (4.13.0-rc7-00074-ge89ce1f89f62-dirty) MSR: 90000000000c1033 <SF,HV,ME,IR,DR,RI,LE> CR: 28422222 XER: 20000000 CFAR: c0000000000a7b38 SOFTE: 0 GPR00: c000000000659044 c0000003af55fbb0 c000000001072a00 0000000000000078 GPR04: c0000003c81b5c80 c0000003c81cc7e8 9000000000009033 0000000000000000 GPR08: 0000000000000000 c0000000000a7b00 0000000000000001 9000000000001003 GPR12: c0000000000a7b00 c00000000fd83200 0000000010180df8 0000000010189e60 GPR16: 0000000010189ed8 0000000010151270 000000001018bd88 000000001018de78 GPR20: 00000000370a0668 0000000000000001 00000000101645e0 0000000010163c10 GPR24: 00007fffd14d6294 00007fffd14d6290 c000000000fba6f0 0000000000000004 GPR28: c000000000f351d8 0000000000000078 c000000000f4095c 0000000000000000 NIP [c0000000000a7b38] sysrq_handle_xmon+0x38/0x40 LR [c000000000659044] __handle_sysrq+0xe4/0x270 Call Trace: [c0000003af55fbd0] [c000000000659044] __handle_sysrq+0xe4/0x270 [c0000003af55fc70] [c000000000659810] write_sysrq_trigger+0x70/0xa0 [c0000003af55fca0] [c0000000003da650] proc_reg_write+0xb0/0x110 [c0000003af55fcf0] [c0000000003423bc] __vfs_write+0x6c/0x1b0 [c0000003af55fd90] [c000000000344398] vfs_write+0xd8/0x240 [c0000003af55fde0] [c00000000034632c] SyS_write+0x6c/0x110 [c0000003af55fe30] [c00000000000b220] system_call+0x58/0x6c Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use kernel types for opal_signal_system_reset()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/64s: Implement system reset idle wakeup reasonNicholas Piggin
It is possible to wake from idle due to a system reset exception, in which case the CPU takes a system reset interrupt to wake from idle, with system reset as the wakeup reason. The regular (not idle wakeup) system reset interrupt handler must be invoked in this case, otherwise the system reset interrupt is lost. Handle the system reset interrupt immediately after CPU state has been restored. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/xmon: Avoid tripping SMP hardlockup watchdogNicholas Piggin
The SMP hardlockup watchdog cross-checks other CPUs for lockups, which causes xmon headaches because it's assuming interrupts hard disabled means no watchdog troubles. Try to improve that by calling touch_nmi_watchdog() in obvious places where secondaries are spinning. Also annotate these spin loops with spin_begin/end calls. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/watchdog: Do not trigger SMP crash from touch_nmi_watchdogNicholas Piggin
In xmon, touch_nmi_watchdog() is not expected to be checking that other CPUs have not touched the watchdog, so the code will just call touch_nmi_watchdog() once before re-enabling hard interrupts. Just update our CPU's state, and ignore apparently stuck SMP threads. Arguably touch_nmi_watchdog should check for SMP lockups, and callers should be fixed, but that's not trivial for the input code of xmon. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/watchdog: Do not backtrace locked CPUs twice if allcpus backtrace is ↵Nicholas Piggin
enabled If sysctl_hardlockup_all_cpu_backtrace is enabled, there is no need to IPI stuck CPUs for backtrace before trigger_allbutself_cpu_backtrace(), which does the same thing again. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/watchdog: Do not panic from locked CPU's IPI handlerNicholas Piggin
The SMP watchdog will detect locked CPUs and IPI them to print a backtrace and registers. If panic on hard lockup is enabled, do not panic from this handler, because that can cause recursion into the IPI layer during the panic. The caller already panics in this case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-03powerpc: pseries: only store the device node basename in full_nameRob Herring
With dependencies on full_name containing the entire device node path removed, stop storing the full_name in nodes created by dlpar_configure_connector() and pSeries_reconfig_add_node(). Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org
2017-10-03KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive()Sam Bobroff
In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the value of state->guest_server as "server". However, this value is not set by it's counterpart kvmppc_xive_set_xive(). When the guest uses this interface to migrate interrupts away from a CPU that is going offline, it sees all interrupts as belonging to CPU 0, so they are left assigned to (now) offline CPUs. This patch removes the guest_server field from the state, and returns act_server in it's place (that is, the CPU actually handling the interrupt, which may differ from the one requested). Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-03powerpc/4xx: Fix compile error with 64K pages on 40x, 44xChristian Lamparter
The mmu context on the 40x, 44x does not define pte_frag entry. This causes gcc abort the compilation due to: setup-common.c: In function ‘setup_arch’: setup-common.c:908: error: ‘mm_context_t’ has no ‘pte_frag’ This patch fixes the issue by removing the pte_frag initialization in setup-common.c. This is possible, because the compiler will do the initialization, since the mm_context is a sub struct of init_mm. init_mm is declared in mm_types.h as external linkage. According to C99 6.2.4.3: An object whose identifier is declared with external linkage [...] has static storage duration. C99 defines in 6.7.8.10 that: If an object that has static storage duration is not initialized explicitly, then: - if it has pointer type, it is initialized to a null pointer Fixes: b1923caa6e64 ("powerpc: Merge 32-bit and 64-bit setup_arch()") Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-03powerpc: Fix action argument for cpufeatures-based TLB flushJeremy Kerr
Commit 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9") introduced calls to __flush_tlb_power[89] from the cpufeatures code, specifying the number of sets to flush. However, these functions take an action argument, not a number of sets. This means we hit the BUG() in __flush_tlb_{206,300} when using cpufeatures-style configuration. This change passes TLB_INVAL_SCOPE_GLOBAL instead. Fixes: 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Mixed bugfixes. Perhaps the most interesting one is a latent bug that was finally triggered by PCID support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm/x86: Handle async PF in RCU read-side critical sections KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume KVM: VMX: use cmpxchg64 KVM: VMX: simplify and fix vmx_vcpu_pi_load KVM: VMX: avoid double list add with VT-d posted interrupts KVM: VMX: extract __pi_post_block KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exception KVM: nVMX: fix HOST_CR3/HOST_CR4 cache
2017-09-29powerpc: Fix workaround for spurious MCE on POWER9Michael Neuling
In the recent commit d8bd9f3f0925 ("powerpc: Handle MCE on POWER9 with only DSISR bit 30 set") I screwed up the bit number. It should be bit 25 (IBM bit 38). Fixes: d8bd9f3f0925 ("powerpc: Handle MCE on POWER9 with only DSISR bit 30 set") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-28cxl: Enable global TLBIs for cxl contextsFrederic Barrat
The PSL and nMMU need to see all TLB invalidations for the memory contexts used on the adapter. For the hash memory model, it is done by making all TLBIs global as soon as the cxl driver is in use. For radix, we need something similar, but we can refine and only convert to global the invalidations for contexts actually used by the device. The new mm_context_add_copro() API increments the 'active_cpus' count for the contexts attached to the cxl adapter. As soon as there's more than 1 active cpu, the TLBIs for the context become global. Active cpu count must be decremented when detaching to restore locality if possible and to avoid overflowing the counter. The hash memory model support is somewhat limited, as we can't decrement the active cpus count when mm_context_remove_copro() is called, because we can't flush the TLB for a mm on hash. So TLBIs remain global on hash. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Fixes: f24be42aab37 ("cxl: Add psl9 specific code") Tested-by: Alistair Popple <alistair@popple.id.au> [mpe: Fold in updated comment on the barrier from Fred] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-28powerpc/mm: Export flush_all_mm()Frederic Barrat
With the optimizations introduced by commit a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes"), flush_tlb_mm() no longer flushes the page walk cache (PWC) with radix. This patch introduces flush_all_mm(), which flushes everything, TLB and PWC, for a given mm. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> [mpe: Add a WARN_ON_ONCE() in the empty hash routines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-27powerpc/64s: Add workaround for P9 vector CI load issueMichael Neuling
POWER9 DD2.1 and earlier has an issue where some cache inhibited vector load will return bad data. The workaround is two part, one firmware/microcode part triggers HMI interrupts when hitting such loads, the other part is this patch which then emulates the instructions in Linux. The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and lxvh8x. When an instruction triggers the HMI, all threads in the core will be sent to the HMI handler, not just the one running the vector load. In general, these spurious HMIs are detected by the emulation code and we just return back to the running process. Unfortunately, if a spurious interrupt occurs on a vector load that's to normal memory we have no way to detect that it's spurious (unless we walk the page tables, which is very expensive). In this case we emulate the load but we need do so using a vector load itself to ensure 128bit atomicity is preserved. Some additional debugfs emulated instruction counters are added also. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-26powerpc: Handle MCE on POWER9 with only DSISR bit 30 setMichael Neuling
On POWER9 DD2.1 and below, it's possible for a paste instruction to cause a Machine Check Exception (MCE) where only DSISR bit 30 (IBM 33) is set. This will result in the MCE handler seeing an unknown event, which triggers linux to crash. We change this by detecting unknown events caused by load/stores in the MCE handler and marking them as handled so that we no longer crash. An MCE that occurs like this is spurious, so we don't need to do anything in terms of servicing it. If there is something that needs to be serviced, the CPU will raise the MCE again with the correct DSISR so that it can be serviced properly. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com Acked-by: Balbir Singh <bsingharora@gmail.com> [mpe: Expand comment with details from change log, use normal bit #s] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-26powerpc/powernv: Rework EEH initialization on powernvBenjamin Herrenschmidt
Remove the post_init callback which is only used by powernv, we can just call it explicitly from the powernv code. This partially kills the ability to "disable" eeh at runtime via debugfs as this was calling that same callback again, but this is both unused and broken in several ways. If we want to revive it, we need to create a dedicated enable/disable callback on the backend that does the right thing. Let the bulk of eeh initialize normally at core_initcall() like it does on pseries by removing the hack in eeh_init() that delays it. Instead we make sure our eeh->probe cleanly bails out of the PEs haven't been created yet and we force a re-probe where we used to call eeh_init() again. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-22KVM: PPC: Book3S HV: Check for updated HDSISR on P9 HDSI exceptionMichael Neuling
On POWER9 DD2.1 and below, sometimes on a Hypervisor Data Storage Interrupt (HDSI) the HDSISR is not be updated at all. To work around this we put a canary value into the HDSISR before returning to a guest and then check for this canary when we take a HDSI. If we find the canary on a HDSI, we know the hardware didn't update the HDSISR. In this case we return to the guest to retake the HDSI which should correctly update the HDSISR the second time HDSI entry. After talking to Paulus we've applied this workaround to all POWER9 CPUs. The workaround of returning to the guest shouldn't ever be triggered on well behaving CPU. The extra instructions should have negligible performance impact. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21powerpc/pseries: Fix parent_dn reference leak in add_dt_node()Tyrel Datwyler
A reference to the parent device node is held by add_dt_node() for the node to be added. If the call to dlpar_configure_connector() fails add_dt_node() returns ENOENT and that reference is not freed. Add a call to of_node_put(parent_dn) prior to bailing out after a failed dlpar_configure_connector() call. Fixes: 8d5ff320766f ("powerpc/pseries: Make dlpar_configure_connector parent node aware") Cc: stable@vger.kernel.org # v3.12+ Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-21powerpc/pseries: Fix "OF: ERROR: Bad of_node_put() on /cpus" during DLPARTyrel Datwyler
Commit 215ee763f8cb ("powerpc: pseries: remove dlpar_attach_node dependency on full path") reworked dlpar_attach_node() to no longer look up the parent node "/cpus", but instead to have the parent node passed by the caller in the function parameter list. As a result dlpar_attach_node() is no longer responsible for freeing the reference to the parent node. However, commit 215ee763f8cb failed to remove the of_node_put(parent) call in dlpar_attach_node(), or to take into account that the reference to the parent in the caller dlpar_cpu_add() needs to be held until after dlpar_attach_node() returns. As a result doing repeated cpu add/remove dlpar operations will eventually result in the following error: OF: ERROR: Bad of_node_put() on /cpus CPU: 0 PID: 10896 Comm: drmgr Not tainted 4.13.0-autotest #1 Call Trace: dump_stack+0x15c/0x1f8 (unreliable) of_node_release+0x1a4/0x1c0 kobject_put+0x1a8/0x310 kobject_del+0xbc/0xf0 __of_detach_node_sysfs+0x144/0x210 of_detach_node+0xf0/0x180 dlpar_detach_node+0xc4/0x120 dlpar_cpu_remove+0x280/0x560 dlpar_cpu_release+0xbc/0x1b0 arch_cpu_release+0x6c/0xb0 cpu_release_store+0xa0/0x100 dev_attr_store+0x68/0xa0 sysfs_kf_write+0xa8/0xf0 kernfs_fop_write+0x2cc/0x400 __vfs_write+0x5c/0x340 vfs_write+0x1a8/0x3d0 SyS_write+0xa8/0x1a0 system_call+0x58/0x6c Fix the issue by removing the of_node_put(parent) call from dlpar_attach_node(), and ensuring that the reference to the parent node is properly held and released by the caller dlpar_cpu_add(). Fixes: 215ee763f8cb ("powerpc: pseries: remove dlpar_attach_node dependency on full path") Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> [mpe: Add a comment in the code and frob the change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-21powerpc/eeh: Create PHB PEs after EEH is initializedBenjamin Herrenschmidt
Otherwise we end up not yet having computed the right diag data size on powernv where EEH initialization is delayed, thus causing memory corruption later on when calling OPAL. Fixes: 5cb1f8fdddb7 ("powerpc/powernv/pci: Dynamically allocate PHB diag data") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/kprobes: Update optprobes to use emulate_update_regs()Naveen N. Rao
Optprobes depended on an updated regs->nip from analyse_instr() to identify the location to branch back from the optprobes trampoline. However, since commit 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs"), analyse_instr() doesn't update the registers anymore. Due to this, we end up branching back from the optprobes trampoline to the same branch into the trampoline resulting in a loop. Fix this by calling out to emulate_update_regs() before using the nip. Additionally, explicitly compare the return value from analyse_instr() to 1, rather than just checking for !0 so as to guard against any future changes to analyse_instr() that may result in -1 being returned in more scenarios. Fixes: 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20Merge branch 'next' of ↵Michael Ellerman
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into fixes Merge one commit from Scott which I missed while away.
2017-09-20powerpc/powernv: Clear LPCR[PECE1] via stop-api only for deep state offlineGautham R. Shenoy
Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug") clears the PECE1 bit of the LPCR via stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on an offlined CPU which is in a deep stop state. In the case where the stop-api support is found to be lacking, the commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails") disables deep states that lose hypervisor context. Thus in this case, the offlined CPU will be put to some shallow idle state. However, we currently unconditionally clear the PECE1 in LPCR via stop-api during CPU-Hotplug even when deep states are disabled due to stop-api failure. Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug *only* when the offlined CPU will be put to a deep state that loses hypervisor context. Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug") Reported-by: Pavithra Prakash <pavirampu@linux.vnet.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Pavithra Prakash <pavrampu@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/sstep: mullw should calculate a 64 bit signed resultAnton Blanchard
mullw should do a 32 bit signed multiply and create a 64 bit signed result. It currently truncates the result to 32 bits. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/sstep: Fix issues with mcrfAnton Blanchard
mcrf broke when we changed analyse_instr() to not modify the register state. The instruction writes to the CR, so we need to store the result in op->ccval, not op->val. Fixes: 3cdfcbfd32b9 ("powerpc: Change analyse_instr so it doesn't modify *regs") Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/sstep: Fix issues with set_cr0()Anton Blanchard
set_cr0() broke when we changed analyse_instr() to not modify the register state. Instead of looking at regs->gpr[x] which has not been updated yet, we need to look at op->val. Fixes: 3cdfcbfd32b9 ("powerpc: Change analyse_instr so it doesn't modify *regs") Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/tm: Flush TM only if CPU has TM featureGustavo Romero
Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") added code to access TM SPRs in flush_tmregs_to_thread(). However flush_tmregs_to_thread() does not check if TM feature is available on CPU before trying to access TM SPRs in order to copy live state to thread structures. flush_tmregs_to_thread() is indeed guarded by CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on a CPU without TM feature available, thus rendering the execution of TM instructions that are treated by the CPU as illegal instructions. The fix is just to add proper checking in flush_tmregs_to_thread() if CPU has the TM feature before accessing any TM-specific resource, returning immediately if TM is no available on the CPU. Adding that checking in flush_tmregs_to_thread() instead of in places where it is called, like in vsr_get() and vsr_set(), is better because avoids the same problem cropping up elsewhere. Cc: stable@vger.kernel.org # v4.13+ Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/sysrq: Fix oops whem ppmu is not registeredRavi Bangoria
Kernel crashes if power pmu is not registered and user tries to dump regs with 'echo p > /proc/sysrq-trigger'. Sample log: Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc0000000000d52f0 NIP [c0000000000d52f0] perf_event_print_debug+0x10/0x230 LR [c00000000058a938] sysrq_handle_showregs+0x38/0x50 Call Trace: printk+0x38/0x4c (unreliable) __handle_sysrq+0xe4/0x270 write_sysrq_trigger+0x64/0x80 proc_reg_write+0x80/0xd0 __vfs_write+0x40/0x200 vfs_write+0xc8/0x240 SyS_write+0x60/0x110 system_call+0x58/0x6c Fixes: 5f6d0380c640 ("powerpc/perf: Define perf_event_print_debug() to print PMU register values") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>