summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2016-05-01powerpc/mm: Use a helper for finding pte bits mapping I/O areaAneesh Kumar K.V
Use a helper instead of open coding with constants. A later patch will drop the WIMG bits and use PowerISA 3.0 defines. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27ftrace: Match dot symbols when searching functions on ppc64Thiago Jung Bauermann
In the ppc64 big endian ABI, function symbols point to function descriptors. The symbols which point to the function entry points have a dot in front of the function name. Consequently, when the ftrace filter mechanism searches for the symbol corresponding to an entry point address, it gets the dot symbol. As a result, ftrace filter users have to be aware of this ABI detail on ppc64 and prepend a dot to the function name when setting the filter. The perf probe command insulates the user from this by ignoring the dot in front of the symbol name when matching function names to symbols, but the sysfs interface does not. This patch makes the ftrace filter mechanism do the same when searching symbols. Fixes the following failure in ftracetest's kprobe_ftrace.tc: .../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument That failure is on this line of kprobe_ftrace.tc: echo _do_fork > set_ftrace_filter This is because there's no _do_fork entry in the functions list: # cat available_filter_functions | grep _do_fork ._do_fork This change introduces no regressions on the perf and ftracetest testsuite results. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27powerpc: Add support for userspace P9 copy pasteChris Smart
The copy paste facility introduced in POWER9 provides an optimised mechanism for a userspace application to copy a cacheline. This is provided by a pair of instructions, copy and paste, while a third, cp_abort (copy paste abort), provides a clean up of the state in case of a failure. The copy instruction will read a 128 byte cacheline and store it in an internal buffer. The subsequent paste instruction will store this internal buffer to memory and set a CR field if the paste succeeds. Since the state of the copy paste buffer is internal (and not architecturally visible), in the unlikely event of a context switch, the state cannot be stored and the paste should therefore fail. The cp_abort instruction exists to fail and clean up any such interrupted copy paste sequence and is to be called by the kernel as part of the context switch. Doing so prevents data from a preceding copy in one process leaking into the paste of another. This code enables use of the cp_abort instruction if a supported processor is detected. NOTE: this is for userspace only, not in kernel, and does not deal with KVM guests. Patch created with much assistance from Michael Neuling <mikey@neuling.org> Signed-off-by: Chris Smart <chris@distroguy.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27powerpc/eeh: fix misleading indentationAndrew Donnellan
Found by smatch. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-21powerpc/book3s64: Remove __end_handlers markerHari Bathini
The __end_handlers marker was intended to mark down upto code that gets called from exception prologs. But that hasn't kept pace with code changes. Case in point, slb_miss_realmode being called from exception prolog code but isn't below __end_handlers marker. So, __end_handlers marker is as good as a comment but could be misleading at times if it isn't in sync with the code, as is the case now. So, let us avoid this confusion by having a better comment and removing __end_handlers marker altogether. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-21powerpc/book3s64: Fix branching to OOL handlers in relocatable kernelHari Bathini
Some of the interrupt vectors on 64-bit POWER server processors are only 32 bytes long (8 instructions), which is not enough for the full first-level interrupt handler. For these we need to branch to an out-of-line (OOL) handler. But when we are running a relocatable kernel, interrupt vectors till __end_interrupts marker are copied down to real address 0x100. So, branching to labels (ie. OOL handlers) outside this section must be handled differently (see LOAD_HANDLER()), considering relocatable kernel, which would need at least 4 instructions. However, branching from interrupt vector means that we corrupt the CFAR (come-from address register) on POWER7 and later processors as mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions) that contains the part up to the point where the CFAR is saved in the PACA should be part of the short interrupt vectors before we branch out to OOL handlers. But as mentioned already, there are interrupt vectors on 64-bit POWER server processors that are only 32 bytes long (like vectors 0x4f00, 0x4f20, etc.), which cannot accomodate the above two cases at the same time owing to space constraint. Currently, in these interrupt vectors, we simply branch out to OOL handlers, without using LOAD_HANDLER(), which leaves us vulnerable when running a relocatable kernel (eg. kdump case). While this has been the case for sometime now and kdump is used widely, we were fortunate not to see any problems so far, for three reasons: 1. In almost all cases, production kernel (relocatable) is used for kdump as well, which would mean that crashed kernel's OOL handler would be at the same place where we end up branching to, from short interrupt vector of kdump kernel. 2. Also, OOL handler was unlikely the reason for crash in almost all the kdump scenarios, which meant we had a sane OOL handler from crashed kernel that we branched to. 3. On most 64-bit POWER server processors, page size is large enough that marking interrupt vector code as executable (see commit 429d2e83) leads to marking OOL handler code from crashed kernel, that sits right below interrupt vector code from kdump kernel, as executable as well. Let us fix this by moving the __end_interrupts marker down past OOL handlers to make sure that we also copy OOL handlers to real address 0x100 when running a relocatable kernel. This fix has been tested successfully in kdump scenario, on an LPAR with 4K page size by using different default/production kernel and kdump kernel. Also tested by manually corrupting the OOL handlers in the first kernel and then kdump'ing, and then causing the OOL handlers to fire - mpe. Fixes: c1fb6816fb1b ("powerpc: Add relocation on exception vector handlers") Cc: stable@vger.kernel.org Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-18Merge branch 'topic/livepatch' into nextMichael Ellerman
Merge the support for live patching on ppc64le using mprofile-kernel. This branch has also been merged into the livepatching tree for v4.7.
2016-04-14powerpc/livepatch: Add live patching support on ppc64leMichael Ellerman
Add the kconfig logic & assembly support for handling live patched functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn depends on the new -mprofile-kernel ftrace ABI, which is only supported currently on ppc64le. Live patching is handled by a special ftrace handler. This means it runs from ftrace_caller(). The live patch handler modifies the NIP so as to redirect the return from ftrace_caller() to the new patched function. However there is one particularly tricky case we need to handle. If a function A calls another function B, and it is known at link time that they share the same TOC, then A will not save or restore its TOC, and will call the local entry point of B. When we live patch B, we replace it with a new function C, which may not have the same TOC as A. At live patch time it's too late to modify A to do the TOC save/restore, so the live patching code must interpose itself between A and C, and do the TOC save/restore that A omitted. An additionaly complication is that the livepatch code can not create a stack frame in order to save the TOC. That is because if C takes > 8 arguments, or is varargs, A will have written the arguments for C in A's stack frame. To solve this, we introduce a "livepatch stack" which grows upward from the base of the regular stack, and is used to store the TOC & LR when calling a live patched function. When the patched function returns, we retrieve the real LR & TOC from the livepatch stack, restore them, and pop the livepatch "stack frame". Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Torsten Duwe <duwe@suse.de> Reviewed-by: Balbir Singh <bsingharora@gmail.com>
2016-04-14powerpc/livepatch: Add livepatch stack to struct thread_infoMichael Ellerman
In order to support live patching we need to maintain an alternate stack of TOC & LR values. We use the base of the stack for this, and store the "live patch stack pointer" in struct thread_info. Unlike the other fields of thread_info, we can not statically initialise that value, so it must be done at run time. This patch just adds the code to support that, it is not enabled until the next patch which actually adds live patch support. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com>
2016-04-12powerpc: sparse: Include headers for __weak symbolsDaniel Axtens
Sometimes when sparse warns about undefined symbols, it isn't because they should have 'static' added, it's because they're overriding __weak symbols defined elsewhere, and the header has been missed. Fix a few of them by adding appropriate headers. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-12powerpc: sparse: static-ify some thingsDaniel Axtens
As sparse suggests, these should be made static. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-11powerpc: make kernel/nvram_64.c explicitly non-modularPaul Gortmaker
The Makefile/Kconfig currently controlling compilation of this code is: obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ signal_64.o ptrace32.o \ paca.o nvram_64.o firmware.o arch/powerpc/platforms/Kconfig.cputype:config PPC64 arch/powerpc/platforms/Kconfig.cputype: bool "64-bit kernel" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We don't replace module.h with init.h since the file already has that. We delete the MODULE_LICENSE tag since that information is already contained at the top of the file in the comments. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Andrzej Hajda <a.hajda@samsung.com> Cc: Anton Blanchard <anton@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-11powerpc/eeh: rename EEH from "extended" to "enhanced" error handlingRussell Currey
IBM online documentation for EEH uses "extended error handling" and "enhanced error handling" to refer to the same thing, in different places. The only place mentioning it as "enhanced error handling" in the kernel is the MAINTAINERS file, and it's "extended" in some documentation. IBM originally defined EEH as "enhanced error handling", so standardise all mentions of EEH to use that term. Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-11powerpc: Remove orphaned asm implementation of abs()Michael Ellerman
This has been unused since ~2004, remove it. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-11powerpc/mm: Remove long disabled SLB codeMichael Ellerman
We have a bunch of SLB related code in the tree which is there to handle dynamic VSIDs - but currently it's all disabled at compile time. The comments say "Keep that around for when we re-implement dynamic VSIDs". But that was over 10 years ago (commit 3c726f8dee6f ("[PATCH] ppc64: support 64k pages")). The chance that it would still work unchanged is minimal, and in the meantime it's confusing to folks browsing/grepping the code. If we ever want to re-instate it, it's in the git history. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com>
2016-03-29powerpc/process: Fix altivec SPR not being savedOliver O'Halloran
In save_sprs() in process.c contains the following test: if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC))) t->vrsave = mfspr(SPRN_VRSAVE); CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE so the test is equivilent to: if (cpu_has_feature(CPU_FTR_ALTIVEC) && cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) On CPUs without support for both (i.e G5) this results in vrsave not being saved between context switches. The vector register save/restore code doesn't use VRSAVE to determine which registers to save/restore, but the value of VRSAVE is used to determine if altivec is being used in several code paths. Fixes: 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") Cc: stable@vger.kernel.org Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-25arch, ftrace: for KASAN put hard/soft IRQ entries into separate sectionsAlexander Potapenko
KASAN needs to know whether the allocation happens in an IRQ handler. This lets us strip everything below the IRQ entry point to reduce the number of unique stack traces needed to be stored. Move the definition of __irq_entry to <linux/interrupt.h> so that the users don't need to pull in <linux/ftrace.h>. Also introduce the __softirq_entry macro which is similar to __irq_entry, but puts the corresponding functions to the .softirqentry.text section. Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-19Merge tag 'powerpc-4.6-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "This was delayed a day or two by some build-breakage on old toolchains which we've now fixed. There's two PCI commits both acked by Bjorn. There's one commit to mm/hugepage.c which is (co)authored by Kirill. Highlights: - Restructure Linux PTE on Book3S/64 to Radix format from Paul Mackerras - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh Kumar K.V - Add POWER9 cputable entry from Michael Neuling - FPU/Altivec/VSX save/restore optimisations from Cyril Bur - Add support for new ftrace ABI on ppc64le from Torsten Duwe Various cleanups & minor fixes from: - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy, Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh. General: - atomics: Allow architectures to define their own __atomic_op_* helpers from Boqun Feng - Implement atomic{, 64}_*_return_* variants and acquire/release/ relaxed variants for (cmp)xchg from Boqun Feng - Add powernv_defconfig from Jeremy Kerr - Fix BUG_ON() reporting in real mode from Balbir Singh - Add xmon command to dump OPAL msglog from Andrew Donnellan - Add xmon command to dump process/task similar to ps(1) from Douglas Miller - Clean up memory hotplug failure paths from David Gibson pci/eeh: - Redesign SR-IOV on PowerNV to give absolute isolation between VFs from Wei Yang. - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan. - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang - PCI: Add pcibios_bus_add_device() weak function from Wei Yang - MAINTAINERS: Update EEH details and maintainership from Russell Currey cxl: - Support added to the CXL driver for running on both bare-metal and hypervisor systems, from Christophe Lombard and Frederic Barrat. - Ignore probes for virtual afu pci devices from Vaibhav Jain perf: - Export Power8 generic and cache events to sysfs from Sukadev Bhattiprolu - hv-24x7: Fix usage with chip events, display change in counter values, display domain indices in sysfs, eliminate domain suffix in event names, from Sukadev Bhattiprolu Freescale: - Updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt bits, and minor fixes/cleanup" * tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits) powerpc: Fix unrecoverable SLB miss during restore_math() powerpc/8xx: Fix do_mtspr_cpu6() build on older compilers powerpc/rcpm: Fix build break when SMP=n powerpc/book3e-64: Use hardcoded mttmr opcode powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible powerpc/T104xRDB: add tdm riser card node to device tree powerpc32: PAGE_EXEC required for inittext powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s) powerpc/86xx: Introduce and use common dtsi powerpc/86xx: Update device tree powerpc/86xx: Move dts files to fsl directory powerpc/86xx: Switch to kconfig fragments approach powerpc/86xx: Update defconfigs powerpc/86xx: Consolidate common platform code powerpc32: Remove one insn in mulhdu powerpc32: small optimisation in flush_icache_range() powerpc: Simplify test in __dma_sync() powerpc32: move xxxxx_dcache_range() functions inline powerpc32: Remove clear_pages() and define clear_page() inline ...
2016-03-17param: convert some "on"/"off" users to strtoboolKees Cook
This changes several users of manual "on"/"off" parsing to use strtobool. Some side-effects: - these uses will now parse y/n/1/0 meaningfully too - the early_param uses will now bubble up parse errors Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Amitkumar Karwar <akarwar@marvell.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Joe Perches <joe@perches.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nishant Sarmukadam <nishants@marvell.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Steve French <sfrench@samba.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17powerpc: query dynamic DEBUG_PAGEALLOC settingJoonsoo Kim
We can disable debug_pagealloc processing even if the code is compiled with CONFIG_DEBUG_PAGEALLOC. This patch changes the code to query whether it is enabled or not in runtime. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "One of the largest releases for KVM... Hardly any generic changes, but lots of architecture-specific updates. ARM: - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - various optimizations to the vgic save/restore code. PPC: - enabled KVM-VFIO integration ("VFIO device") - optimizations to speed up IPIs between vcpus - in-kernel handling of IOMMU hypercalls - support for dynamic DMA windows (DDW). s390: - provide the floating point registers via sync regs; - separated instruction vs. data accesses - dirty log improvements for huge guests - bugfixes and documentation improvements. x86: - Hyper-V VMBus hypercall userspace exit - alternative implementation of lowest-priority interrupts using vector hashing (for better VT-d posted interrupt support) - fixed guest debugging with nested virtualizations - improved interrupt tracking in the in-kernel IOAPIC - generic infrastructure for tracking writes to guest memory - currently its only use is to speedup the legacy shadow paging (pre-EPT) case, but in the future it will be used for virtual GPUs as well - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits) KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch KVM: x86: disable MPX if host did not enable MPX XSAVE features arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit arm64: KVM: vgic-v3: Reset LRs at boot time arm64: KVM: vgic-v3: Do not save an LR known to be empty arm64: KVM: vgic-v3: Save maintenance interrupt state only if required arm64: KVM: vgic-v3: Avoid accessing ICH registers KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit KVM: arm/arm64: vgic-v2: Reset LRs at boot time KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers KVM: s390: allocate only one DMA page per VM KVM: s390: enable STFLE interpretation only if enabled for the guest KVM: s390: wake up when the VCPU cpu timer expires KVM: s390: step the VCPU timer while in enabled wait KVM: s390: protect VCPU cpu timer with a seqcount KVM: s390: step VCPU cpu timer during kvm_run ioctl ...
2016-03-16powerpc: Fix unrecoverable SLB miss during restore_math()Cyril Bur
Commit 70fe3d9 "powerpc: Restore FPU/VEC/VSX if previously used" introduces a call to restore_math() late in the syscall return path, after MSR_RI has been cleared. The MSR_RI flag is used to indicate whether the kernel can take another exception or not. A cleared MSR_RI flag indicates that the kernel cannot. Unfortunately when a machine is under SLB pressure an SLB miss can occur in restore_math() which (with MSR_RI cleared) leads to an unrecoverable exception. Unrecoverable exception 4100 at c0000000000088d8 cpu 0x0: Vector: 4100 at [c0000003fa473b20] pc: c0000000000088d8: .load_vr_state+0x70/0x110 lr: c00000000000f710: .restore_math+0x130/0x188 sp: c0000003fa473da0 msr: 9000000002003030 current = 0xc0000007f876f180 paca = 0xc00000000fff0000 softe: 0 irq_happened: 0x01 pid = 1944, comm = K08umountfs [link register ] c00000000000f710 .restore_math+0x130/0x188 [c0000003fa473da0] c0000003fa473e30 (unreliable) [c0000003fa473e30] c000000000007b6c system_call+0x84/0xfc The clearing of MSR_RI is actually an optimisation to avoid multiple MSR writes, what must be disabled are interrupts. See comment in entry_64.S: /* * For performance reasons we clear RI the same time that we * clear EE. We only need to clear RI just before we restore r13 * below, but batching it with EE saves us one expensive mtmsrd call. * We have to be careful to restore RI if we branch anywhere from * here (eg syscall_exit_work). */ At the point of calling restore_math() r13 has not been restored, as such, the quick fix of turning MSR_RI back on for the call to restore_math() will eliminate the occurrence of an unrecoverable exception. We'd like to do a better fix in future. Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-16powerpc/book3e-64: Use hardcoded mttmr opcodeScott Wood
This preserves the ability to build using older binutils (reportedly <= 2.22). Fixes: 6becef7ea04a ("powerpc/mpc85xx: Add CPU hotplug support for E6500") Signed-off-by: Scott Wood <oss@buserror.net> Cc: chenhui.zhao@freescale.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-15Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull cpu hotplug updates from Thomas Gleixner: "This is the first part of the ongoing cpu hotplug rework: - Initial implementation of the state machine - Runs all online and prepare down callbacks on the plugged cpu and not on some random processor - Replaces busy loop waiting with completions - Adds tracepoints so the states can be followed" More detailed commentary on this work from an earlier email: "What's wrong with the current cpu hotplug infrastructure? - Asymmetry The hotplug notifier mechanism is asymmetric versus the bringup and teardown. This is mostly caused by the notifier mechanism. - Largely undocumented dependencies While some notifiers use explicitely defined notifier priorities, we have quite some notifiers which use numerical priorities to express dependencies without any documentation why. - Control processor driven Most of the bringup/teardown of a cpu is driven by a control processor. While it is understandable, that preperatory steps, like idle thread creation, memory allocation for and initialization of essential facilities needs to be done before a cpu can boot, there is no reason why everything else must run on a control processor. Before this patch series, bringup looks like this: Control CPU Booting CPU do preparatory steps kick cpu into life do low level init sync with booting cpu sync with control cpu bring the rest up - All or nothing approach There is no way to do partial bringups. That's something which is really desired because we waste e.g. at boot substantial amount of time just busy waiting that the cpu comes to life. That's stupid as we could very well do preparatory steps and the initial IPI for other cpus and then go back and do the necessary low level synchronization with the freshly booted cpu. - Minimal debuggability Due to the notifier based design, it's impossible to switch between two stages of the bringup/teardown back and forth in order to test the correctness. So in many hotplug notifiers the cancel mechanisms are either not existant or completely untested. - Notifier [un]registering is tedious To [un]register notifiers we need to protect against hotplug at every callsite. There is no mechanism that bringup/teardown callbacks are issued on the online cpus, so every caller needs to do it itself. That also includes error rollback. What's the new design? The base of the new design is a symmetric state machine, where both the control processor and the booting/dying cpu execute a well defined set of states. Each state is symmetric in the end, except for some well defined exceptions, and the bringup/teardown can be stopped and reversed at almost all states. So the bringup of a cpu will look like this in the future: Control CPU Booting CPU do preparatory steps kick cpu into life do low level init sync with booting cpu sync with control cpu bring itself up The synchronization step does not require the control cpu to wait. That mechanism can be done asynchronously via a worker or some other mechanism. The teardown can be made very similar, so that the dying cpu cleans up and brings itself down. Cleanups which need to be done after the cpu is gone, can be scheduled asynchronously as well. There is a long way to this, as we need to refactor the notion when a cpu is available. Today we set the cpu online right after it comes out of the low level bringup, which is not really correct. The proper mechanism is to set it to available, i.e. cpu local threads, like softirqd, hotplug thread etc. can be scheduled on that cpu, and once it finished all booting steps, it's set to online, so general workloads can be scheduled on it. The reverse happens on teardown. First thing to do is to forbid scheduling of general workloads, then teardown all the per cpu resources and finally shut it off completely. This patch series implements the basic infrastructure for this at the core level. This includes the following: - Basic state machine implementation with well defined states, so ordering and prioritization can be expressed. - Interfaces to [un]register state callbacks This invokes the bringup/teardown callback on all online cpus with the proper protection in place and [un]installs the callbacks in the state machine array. For callbacks which have no particular ordering requirement we have a dynamic state space, so that drivers don't have to register an explicit hotplug state. If a callback fails, the code automatically does a rollback to the previous state. - Sysfs interface to drive the state machine to a particular step. This is only partially functional today. Full functionality and therefor testability will be achieved once we converted all existing hotplug notifiers over to the new scheme. - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying processor: Control CPU Booting CPU do preparatory steps kick cpu into life do low level init sync with booting cpu sync with control cpu wait for boot bring itself up Signal completion to control cpu In a previous step of this work we've done a full tree mechanical conversion of all hotplug notifiers to the new scheme. The balance is a net removal of about 4000 lines of code. This is not included in this series, as we decided to take a different approach. Instead of mechanically converting everything over, we will do a proper overhaul of the usage sites one by one so they nicely fit into the symmetric callback scheme. I decided to do that after I looked at the ugliness of some of the converted sites and figured out that their hotplug mechanism is completely buggered anyway. So there is no point to do a mechanical conversion first as we need to go through the usage sites one by one again in order to achieve a full symmetric and testable behaviour" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) cpu/hotplug: Document states better cpu/hotplug: Fix smpboot thread ordering cpu/hotplug: Remove redundant state check cpu/hotplug: Plug death reporting race rcu: Make CPU_DYING_IDLE an explicit call cpu/hotplug: Make wait for dead cpu completion based cpu/hotplug: Let upcoming cpu bring itself fully up arch/hotplug: Call into idle with a proper state cpu/hotplug: Move online calls to hotplugged cpu cpu/hotplug: Create hotplug threads cpu/hotplug: Split out the state walk into functions cpu/hotplug: Unpark smpboot threads from the state machine cpu/hotplug: Move scheduler cpu_online notifier to hotplug core cpu/hotplug: Implement setup/removal interface cpu/hotplug: Make target state writeable cpu/hotplug: Add sysfs state interface cpu/hotplug: Hand in target state to _cpu_up/down cpu/hotplug: Convert the hotplugged cpu work to a state machine cpu/hotplug: Convert to a state machine for the control processor cpu/hotplug: Add tracepoints ...
2016-03-14Merge branch 'next' of ↵Michael Ellerman
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and other dt bits, and minor fixes/cleanup."
2016-03-11powerpc32: Remove one insn in mulhduChristophe Leroy
Remove one instruction in mulhdu Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc32: small optimisation in flush_icache_range()Christophe Leroy
Inlining of _dcache_range() functions has shown that the compiler does the same thing a bit better with one insn less Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc32: move xxxxx_dcache_range() functions inlineChristophe Leroy
flush/clean/invalidate _dcache_range() functions are all very similar and are quite short. They are mainly used in __dma_sync() perf_event locate them in the top 3 consumming functions during heavy ethernet activity They are good candidate for inlining, as __dma_sync() does almost nothing but calling them Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc32: Remove clear_pages() and define clear_page() inlineChristophe Leroy
clear_pages() is never used expect by clear_page, and PPC32 is the only architecture (still) having this function. Neither PPC64 nor any other architecture has it. This patch removes clear_pages() and moves clear_page() function inline (same as PPC64) as it only is a few isns Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc/8xx: rewrite flush_instruction_cache() in CChristophe Leroy
On PPC8xx, flushing instruction cache is performed by writing in register SPRN_IC_CST. This registers suffers CPU6 ERRATA. The patch rewrites the fonction in C so that CPU6 ERRATA will be handled transparently Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc/8xx: rewrite set_context() in CChristophe Leroy
There is no real need to have set_context() in assembly. Now that we have mtspr() handling CPU6 ERRATA directly, we can rewrite set_context() in C language for easier maintenance. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc/8xx: remove special handling of CPU6 errata in set_dec()Christophe Leroy
CPU6 ERRATA is now handled directly in mtspr(), so we can use the standard set_dec() fonction in all cases. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc/8xx: Map linear kernel RAM with 8M pagesChristophe Leroy
On a live running system (VoIP gateway for Air Trafic Control), over a 10 minutes period (with 277s idle), we get 87 millions DTLB misses and approximatly 35 secondes are spent in DTLB handler. This represents 5.8% of the overall time and even 10.8% of the non-idle time. Among those 87 millions DTLB misses, 15% are on user addresses and 85% are on kernel addresses. And within the kernel addresses, 93% are on addresses from the linear address space and only 7% are on addresses from the virtual address space. MPC8xx has no BATs but it has 8Mb page size. This patch implements mapping of kernel RAM using 8Mb pages, on the same model as what is done on the 40x. In 4k pages mode, each PGD entry maps a 4Mb area: we map every two entries to the same 8Mb physical page. In each second entry, we add 4Mb to the page physical address to ease life of the FixupDAR routine. This is just ignored by HW. In 16k pages mode, each PGD entry maps a 64Mb area: each PGD entry will point to the first page of the area. The DTLB handler adds the 3 bits from EPN to map the correct page. With this patch applied, we now get only 13 millions TLB misses during the 10 minutes period. The idle time has increased to 313s and the overall time spent in DTLB miss handler is 6.3s, which represents 1% of the overall time and 2.2% of non-idle time. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc/8xx: Save r3 all the time in DTLB miss handlerChristophe Leroy
We are spending between 40 and 160 cycles with a mean of 65 cycles in the DTLB handling routine (measured with mftbl) so make it more simple althought it adds one instruction. With this modification, we get three registers available at all time, which will help with following patch. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11Merge branch 'topic/mprofile-kernel' into nextMichael Ellerman
Merge the ftrace changes to support -mprofile-kernel on ppc64le. This is a prerequisite for live patching, the support for which will be merged via the livepatch tree based on this topic branch.
2016-03-10Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-09powerpc/8xx: CONFIG_DEBUG_PAGEALLOC requires ITLBmiss for kernel addressesChristophe Leroy
When CONFIG_DEBUG_PAGEALLOC is activated, the initial TLB mapping gets flushed to track accesses to wrong areas. Therefore, kernel addresses will also generate ITLB misses. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-09Merge tag 'kvm-arm-for-4.6' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for 4.6 - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - Various optimizations to the vgic save/restore code Conflicts: include/uapi/linux/kvm.h
2016-03-09powerpc/eeh: eeh_pci_enable(): fix checking of post-request stateAndrew Donnellan
In eeh_pci_enable(), after making the request to set the new options, we call eeh_ops->wait_state() to check that the request finished successfully. At the moment, if eeh_ops->wait_state() returns 0, we return 0 without checking that it reflects the expected outcome. This can lead to callers further up the chain incorrectly assuming the slot has been successfully unfrozen and continuing to attempt recovery. On powernv, this will occur if pnv_eeh_get_pe_state() or pnv_eeh_get_phb_state() return 0, which in turn occurs if the relevant OPAL call returns OPAL_EEH_STOPPED_MMIO_DMA_FREEZE or OPAL_EEH_PHB_ERROR respectively. On pseries, this will occur if pseries_eeh_get_state() returns 0, which in turn occurs if RTAS reports that the PE is in the MMIO Stopped and DMA Stopped states. Obviously, none of these cases represent a successful completion of a request to thaw MMIO or DMA. Fix the check so that a wait_state() return value of 0 won't be considered successful for the EEH_OPT_THAW_MMIO or EEH_OPT_THAW_DMA cases. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Remove duplicated check in eeh_dump_pe_log()Gavin Shan
When eeh_dump_pe_log() is only called by eeh_slot_error_detail(), we already have the check that the PE isn't in PCI config blocked state in eeh_slot_error_detail(). So we needn't the duplicated check in eeh_dump_pe_log(). This removes the duplicated check in eeh_dump_pe_log(). No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Synchronize recovery in host/guestGavin Shan
When passing through SRIOV VFs to guest, we possibly encounter EEH error on PF. In this case, the VF PEs are put into frozen state. The error could be reported to guest before it's captured by the host. That means the guest could attempt to recover errors on VFs before host gets chance to recover errors on PFs. The VFs won't be recovered successfully. This enforces the recovery order for above case: the recovery on child PE in guest is hold until the recovery on parent PE in host is completed. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Don't remove passed VFsGavin Shan
When we have partial hotplug as part of the error recovery on PF, the VFs that are bound with vfio-pci driver will experience hotplug. That's not allowed. This checks if the VF PE is passed or not. If it does, we leave the VF without removing it. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Don't propagate error to guestGavin Shan
When EEH error happened to the parent PE of those PEs that have been passed through to guest, the error is propagated to guest domain and the VFIO driver's error handlers are called. It's not correct as the error in the host domain shouldn't be propagated to guests and affect them. This adds one more limitation when calling EEH error handlers. If the PE has been passed through to guest, the error handlers won't be called. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: powerpc/eeh: Support error recovery for VF PEWei Yang
PFs are enumerated on PCI bus, while VFs are created by PF's driver. In EEH recovery, it has two cases: 1. Device and driver is EEH aware, error handlers are called. 2. Device and driver is not EEH aware, un-plug the device and plug it again by enumerating it. The special thing happens on the second case. For a PF, we could use the original pci core to enumerate the bus, while for VF we need to record the VFs which aer un-plugged then plug it again. Also The patch caches the VF index in pci_dn, which can be used to calculate VF's bus, device and function number. Those information helps to locate the VF's PCI device instance when doing hotplug during EEH recovery if necessary. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/powernv: Support EEH reset for VF PEWei Yang
PEs for VFs don't have primary bus. So they have to have their own reset backend, which is used during EEH recovery. The patch implements the reset backend for VF's PE by issuing FLR or AF FLR to the VFs, which are contained in the PE. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Create PE for VFsWei Yang
This creates PEs for VFs in the weak function pcibios_bus_add_device(). Those PEs for VFs are identified with newly introduced flag EEH_PE_VF so that we treat them differently during EEH recovery. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: EEH device for VFWei Yang
VFs and their corresponding pdn are created and released dynamically when their PF's SRIOV capability is enabled and disabled. This creates and releases EEH devices for VFs when creating and releasing their pdn instances, which means EEH devices and pdn instances have same life cycle. Also, VF's EEH device is identified by (struct eeh_dev::physfn). Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Cache normal BARs, not windows or IOV BARsWei Yang
This restricts the EEH address cache to use only the first 7 BARs. This makes __eeh_addr_cache_insert_dev() ignore PCI bridge window and IOV BARs. As the result of this change, eeh_addr_cache_get_dev() will return VFs from VF's resource addresses instead of parent PFs. This also removes PCI bridge check as we limit __eeh_addr_cache_insert_dev() to 7 BARs and this effectively excludes PCI bridges from being cached. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/pci: Remove VFs prior to PFWei Yang
As commit ac205b7bb72f ("PCI: make sriov work with hotplug remove") indicates, VFs which is on the same PCI bus as their PF, should be removed before the PF. Otherwise, we might run into kernel crash at PCI unplugging time. This applies the above pattern to powerpc PCI hotplug path. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-09powerpc/eeh: Reworked eeh_pe_bus_get()Gavin Shan
The original implementation is ugly: unnecessary if statements and "out" tag. This reworks the function to avoid above weaknesses. No functional changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>