summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2017-04-11powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1Gautham R. Shenoy
POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus the HSPRG0 of a thread waking up from can contain the paca pointer of its sibling. This patch implements a context recovery framework within threads of a core, by provisioning space in paca_struct for saving every sibling threads's paca pointers. Basically, we should be able to arrive at the right paca pointer from any of the thread's existing paca pointer. At bootup, during powernv idle-init, we save the paca address of every CPU in each one its siblings paca_struct in the slot corresponding to this CPU's index in the core. On wakeup from a stop, the thread will determine its index in the core from the TIR register and recover its PACA pointer by indexing into the correct slot in the provisioned space in the current PACA. Furthermore, ensure that the NVGPRs are restored from the stack on the way out by setting the NAPSTATELOST in paca. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Call it a bug] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/powernv/idle: Don't override default/deepest directly in kernelGautham R. Shenoy
Currently during idle-init on power9, if we don't find suitable stop states in the device tree that can be used as the default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default stop state psscr to be used by power9_idle and deepest stop state which is used by CPU-Hotplug. However, if the platform firmware has not configured or enabled a stop state, the kernel should not make any assumptions and fallback to a default choice. If the kernel uses a stop state that is not configured by the platform firmware, it may lead to further failures which should be avoided. In this patch, we modify the init code to ensure that the kernel uses only the stop states exposed by the firmware through the device tree. When a suitable default stop state isn't found, we disable ppc_md.power_save for power9. Similarly, when a suitable deepest_stop_state is not found in the device tree exported by the firmware, fall back to the default busy-wait loop in the CPU-Hotplug code. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/powernv/smp: Add busy-wait loop as fall back for CPU-HotplugGautham R. Shenoy
Currently, the powernv cpu-offline function assumes that platform idle states such as stop on POWER9, winkle/sleep/nap on POWER8 are always available. On POWER8, it picks nap as the default state if other deep idle states like sleep/winkle are not available and enabled in the platform. On POWER9, nap is not available and all idle states are managed by STOP instruction. The parameters to the idle state are passed through processor stop status control register (PSSCR). Hence as such executing STOP would take parameters from current PSSCR. We do not want to make any assumptions in kernel on what STOP states and PSSCR features are configured by the platform. Ideally platform will configure a good set of stop states that can be used in the kernel. We would like to start with a clean slate, if the platform choose to not configure any state or there is an error in platform firmware that lead to no stop states being configured or allowed to be requested. This patch adds a fallback method for CPU-Hotplug that is similar to snooze loop at idle where the threads are left to spin at low priority and hence reduce the cycles consumed. This is a safe fallback mechanism in the case when no stop state would be requested if the platform firmware did not configure them most likely due to an error condition. Requesting a stop state when the platform has not configured them or enabled them would lead to further error conditions which could be difficult to debug. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.cGautham R. Shenoy
Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which transitions the CPU to the deepest available platform idle state to a new function named pnv_cpu_offline() in powernv/idle.c. The rationale behind this code movement is that the data required to determine the deepest available platform state resides in powernv/idle.c. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/hugetlb: Add ABI defines for supported HugeTLB page sizesAnshuman Khandual
Add user space exported API definitions for 512KB, 1MB, 2MB, 8MB, 16MB, 1GB, 16GB non default huge page sizes to be used with mmap() system call. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Reword the comment to emphasise that these are only needed to use the non-default huge page size, and updated the change log.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/mm: Remove reduntant initmem information from logAnshuman Khandual
Generic core VM already prints these information in the log buffer, hence there is no need for a second print. This just removes the second print from arch powerpc NUMA init path. Before the patch: $ dmesg | grep "Initmem" numa: Initmem setup node 0 [mem 0x00000000-0xffffffff] numa: Initmem setup node 1 [mem 0x100000000-0x1ffffffff] numa: Initmem setup node 2 [mem 0x200000000-0x2ffffffff] numa: Initmem setup node 3 [mem 0x300000000-0x3ffffffff] numa: Initmem setup node 4 [mem 0x400000000-0x4ffffffff] numa: Initmem setup node 5 [mem 0x500000000-0x5ffffffff] numa: Initmem setup node 6 [mem 0x600000000-0x6ffffffff] numa: Initmem setup node 7 [mem 0x700000000-0x7ffffffff] Initmem setup node 0 [mem 0x0000000000000000-0x00000000ffffffff] Initmem setup node 1 [mem 0x0000000100000000-0x00000001ffffffff] Initmem setup node 2 [mem 0x0000000200000000-0x00000002ffffffff] Initmem setup node 3 [mem 0x0000000300000000-0x00000003ffffffff] Initmem setup node 4 [mem 0x0000000400000000-0x00000004ffffffff] Initmem setup node 5 [mem 0x0000000500000000-0x00000005ffffffff] Initmem setup node 6 [mem 0x0000000600000000-0x00000006ffffffff] Initmem setup node 7 [mem 0x0000000700000000-0x00000007ffffffff] After the patch just the latter set is printed. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc: Make sparsemem the default on 64-bit Book3SMichael Ellerman
Make sparsemem the default on all 64-bit Book3S platforms. It already is for pseries and ps3, and we need to enable it for powernv because on POWER9 memory between chips is discontiguous. For the other platforms sparsemem should work fine, though it might add a small amount of overhead. We can always force FLATMEM in the defconfigs if necessary. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/nohash: Fix use of mmu_has_feature() in setup_initial_memory_limit()Michael Ellerman
setup_initial_memory_limit() is called from early_init_devtree(), which runs prior to feature patching. If the kernel is built with CONFIG_JUMP_LABEL=y and CONFIG_JUMP_LABEL_FEATURE_CHECKS=y then we will potentially get the wrong value. If we also have CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG=y we get a warning and backtrace: Warning! mmu_has_feature() used prior to jump label init! CPU: 0 PID: 0 Comm: swapper Not tainted 4.11.0-rc4-gccN-next-20170331-g6af2434 #1 Call Trace: [c000000000fc3d50] [c000000000a26c30] .dump_stack+0xa8/0xe8 (unreliable) [c000000000fc3de0] [c00000000002e6b8] .setup_initial_memory_limit+0xa4/0x104 [c000000000fc3e60] [c000000000d5c23c] .early_init_devtree+0xd0/0x2f8 [c000000000fc3f00] [c000000000d5d3b0] .early_setup+0x90/0x11c [c000000000fc3f90] [c000000000000520] start_here_multiplatform+0x68/0x80 Fix it by using early_mmu_has_feature(). Fixes: c12e6f24d413 ("powerpc: Add option to use jump label for mmu_has_feature()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc: Remove unnecessary includes of asm/debug.hMichael Ellerman
These files don't seem to have any need for asm/debug.h, now that all it includes are the debugger hooks and breakpoint definitions. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc: Create asm/debugfs.h and move powerpc_debugfs_root thereMichael Ellerman
powerpc_debugfs_root is the dentry representing the root of the "powerpc" directory tree in debugfs. Currently it sits in asm/debug.h, a long with some other things that have "debug" in the name, but are otherwise unrelated. Pull it out into a separate header, which also includes linux/debugfs.h, and convert all the users to include debugfs.h instead of debug.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/powernv: Require MMU_NOTIFIER to fix NPU buildAlistair Popple
In the recent commit 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") the NPU code gained a dependency on MMU notifiers. All our defconfigs have KVM enabled, which selects MMU_NOTIFIER, but if KVM is not enabled then the build breaks. Fix it by always selecting MMU_NOTIFIER when we're building powernv. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Signed-off-by: Alistair Popple <alistair@popple.id.au> [mpe: Reword change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/mm/radix: Remove unnecessary ptesyncAneesh Kumar K.V
For a tlbiel with pid, we need to issue tlbiel with set number encoded. We don't need to do ptesync for each of those. Instead we need one for the entire tlbiel pid operation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/mm/radix: Don't do page walk cache flush when doing full mm flushAneesh Kumar K.V
For fullmm tlb flush, we do a flush with RIC_FLUSH_ALL which will invalidate all related caches (radix__tlb_flush()). Hence the pwc flush is not needed. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10EDAC: Remove EDAC_MM_EDACBorislav Petkov
Move all the EDAC core functionality behind CONFIG_EDAC and get rid of that indirection. Update defconfigs which had it. While at it, fix dependencies such that EDAC depends on RAS for the tracepoints. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-edac@vger.kernel.org
2017-04-10powerpc: Fixup LPCR:PECE and HEIC setting on POWER9Benjamin Herrenschmidt
We need to set LPES in order for normal external interrupts (0x500) to be directed to the guest while running in guest state. We also need HEIC set to prevent them to be sent to the host while in host state. With XIVE the host never gets one of these and wouldn't know how to handle it. All host external interrupts come in via the new hypervisor virtualization interrupts vector. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10powerpc: Consolidate variants of real-mode MMIOsBenjamin Herrenschmidt
We have all sort of variants of MMIO accessors for the real mode instructions. This creates a clean set of accessors based on Linux normal naming conventions, replacing all occurrences of the old ones in the tree. I have purposefully removed the "out/in" variants in favor of only including __raw variants. Any code using these is already pretty much hand tuned to operate in a very specific environment. I've fixed up the 2 users (only one of them actually needed a barrier in the first place). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10powerpc/kvm: Remove obsolete kvm_vm_ioctl_xics_irq declarationBenjamin Herrenschmidt
The function doesn't exist anymore Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10powerpc/kvm: Make kvmppc_xics_create_icp staticBenjamin Herrenschmidt
It's only used within the same file it's defined Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10powerpc/kvm: Massage order of #includeBenjamin Herrenschmidt
We traditionally have linux/ before asm/ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10powerpc/xive: Native exploitation of the XIVE interrupt controllerBenjamin Herrenschmidt
The XIVE interrupt controller is the new interrupt controller found in POWER9. It supports advanced virtualization capabilities among other things. Currently we use a set of firmware calls that simulate the old "XICS" interrupt controller but this is fairly inefficient. This adds the framework for using XIVE along with a native backend which OPAL for configuration. Later, a backend allowing the use in a KVM or PowerVM guest will also be provided. This disables some fast path for interrupts in KVM when XIVE is enabled as these rely on the firmware emulation code which is no longer available when the XIVE is used natively by Linux. A latter patch will make KVM also directly exploit the XIVE, thus recovering the lost performance (and more). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings, tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV, fix build errors when SMP=n, fold in fixes from Ben: Don't call cpu_online() on an invalid CPU number Fix irq target selection returning out of bounds cpu# Extra sanity checks on cpu numbers ] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-08Merge tag 'powerpc-4.11-7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 4.11: Headed to stable: - disable HFSCR[TM] if TM is not supported, fixes a potential host kernel crash triggered by a hostile guest, but only in configurations that no one uses - don't try to fix up misaligned load-with-reservation instructions - fix flush_(d|i)cache_range() called from modules on little endian kernels - add missing global TLB invalidate if cxl is active - fix missing preempt_disable() in crc32c-vpmsum And a fix for selftests build changes that went in this release: - selftests/powerpc: Fix standalone powerpc build Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran, Paul Mackerras" * tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable() powerpc/mm: Add missing global TLB invalidate if cxl is active powerpc/64: Fix flush_(d|i)cache_range() called from modules powerpc: Don't try to fix up misaligned load-with-reservation instructions powerpc: Disable HFSCR[TM] if TM is not supported selftests/powerpc: Fix standalone powerpc build
2017-04-08New getsockopt option to get socket cookieChenbo Feng
Introduce a new getsockopt operation to retrieve the socket cookie for a specific socket based on the socket fd. It returns a unique non-decreasing cookie for each socket. Tested: https://android-review.googlesource.com/#/c/358163/ Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET publicPaolo Bonzini
Its value has never changed; we might as well make it part of the ABI instead of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO). Because PPC does not always make MMIO available, the code has to be made dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07kvm: make KVM_CAP_COALESCED_MMIO architecture agnosticPaolo Bonzini
Remove code from architecture files that can be moved to virt/kvm, since there is already common code for coalesced MMIO. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Removed a pointless 'break' after 'return'.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()Michael Ellerman
In crc32c_vpmsum() we call enable_kernel_altivec() without first disabling preemption, which is not allowed: WARNING: CPU: 9 PID: 2949 at ../arch/powerpc/kernel/process.c:277 enable_kernel_altivec+0x100/0x120 Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c vmx_crypto ... CPU: 9 PID: 2949 Comm: docker Not tainted 4.11.0-rc5-compiler_gcc-6.3.1-00033-g308ac7563944 #381 ... NIP [c00000000001e320] enable_kernel_altivec+0x100/0x120 LR [d000000003df0910] crc32c_vpmsum+0x108/0x150 [crc32c_vpmsum] Call Trace: 0xc138fd09 (unreliable) crc32c_vpmsum+0x108/0x150 [crc32c_vpmsum] crc32c_vpmsum_update+0x3c/0x60 [crc32c_vpmsum] crypto_shash_update+0x88/0x1c0 crc32c+0x64/0x90 [libcrc32c] dm_bm_checksum+0x48/0x80 [dm_persistent_data] sb_check+0x84/0x120 [dm_thin_pool] dm_bm_validate_buffer.isra.0+0xc0/0x1b0 [dm_persistent_data] dm_bm_read_lock+0x80/0xf0 [dm_persistent_data] __create_persistent_data_objects+0x16c/0x810 [dm_thin_pool] dm_pool_metadata_open+0xb0/0x1a0 [dm_thin_pool] pool_ctr+0x4cc/0xb60 [dm_thin_pool] dm_table_add_target+0x16c/0x3c0 table_load+0x184/0x400 ctl_ioctl+0x2f0/0x560 dm_ctl_ioctl+0x38/0x50 do_vfs_ioctl+0xd8/0x920 SyS_ioctl+0x68/0xc0 system_call+0x38/0xfc It used to be sufficient just to call pagefault_disable(), because that also disabled preemption. But the two were decoupled in commit 8222dbe21e79 ("sched/preempt, mm/fault: Decouple preemption from the page fault logic") in mid 2015. So add the missing preempt_disable/enable(). We should also call disable_kernel_fp(), although it does nothing by default, there is a debug switch to make it active and all enables should be paired with disables. Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-07powerpc/smp: Remove migrate_irq() custom implementationBenjamin Herrenschmidt
Some powerpc platforms use this to move IRQs away from a CPU being unplugged. This function has several bugs such as not taking the right locks or failing to NULL check pointers. There's a new generic function doing exactly the same thing without all the bugs, so let's use it instead. mpe: The obvious place for the select of GENERIC_IRQ_MIGRATION is on HOTPLUG_CPU, but that doesn't work. On some configs PM_SLEEP_SMP will select HOTPLUG_CPU even though its dependencies are not met, which means the select of GENERIC_IRQ_MIGRATION doesn't happen. That leads to the build breaking. Fix it by moving the select of GENERIC_IRQ_MIGRATION to SMP. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-06Merge commit 'b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e' into uaccess.ia64Al Viro
backmerge of mainline ia64 fix
2017-04-06powerpc: get rid of zeroing, switch to RAW_COPY_USERAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-06Merge commit 'a7d2475af7aedcb9b5c6343989a8bfadbf84429b' into uaccess.powerpcAl Viro
backmerge of sorting the arch/powerpc/Kconfig
2017-04-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Mostly simple cases of overlapping changes (adding code nearby, a function whose name changes, for example). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06powerpc: Add optional smp_ops->prepare_cpu SMP callbackBenjamin Herrenschmidt
Some platforms (will) need to perform allocations before bringing a new CPU online. Doing it from smp_ops->setup_cpu is the wrong thing to do: - It has no useful failure path (too late) - Calling any allocator will enable interrupts prematurely causing problems with large decrementer among others Instead, add a new callback that is called from __cpu_up (so from the context trying to online the new CPU) at a point where we can safely allocate and handle failures. This will be used by XIVE support. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-06powerpc: Add more PPC bit conversion macrosBenjamin Herrenschmidt
Add 32 and 8 bit variants Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-06powerpc/powernv: Add XIVE related definitions to opal-api.hBenjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-06Merge commit 'fc69910f329d' into uaccess.mipsAl Viro
backmerge of a build fix from mainline
2017-04-06KVM: PPC: Book3S HV: Check for kmalloc errors in ioctlDan Carpenter
kzalloc() won't actually fail because sizeof(*resize) is small, but static checkers complain. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-05powerpc/mm: Add missing global TLB invalidate if cxl is activeFrederic Barrat
Commit 4c6d9acce1f4 ("powerpc/mm: Add hooks for cxl") converted local TLB invalidates to global if the cxl driver is active. This is necessary because the CAPP snoops invalidations to forward them to the PSL on the cxl adapter. However one path was forgotten. native_flush_hash_range() still does local TLB invalidates, as found out the hard way recently. This patch fixes it by following the same logic as previously: if the cxl driver is active, the local TLB invalidates are 'upgraded' to global. Fixes: 4c6d9acce1f4 ("powerpc/mm: Add hooks for cxl") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-05powerpc/64: Fix flush_(d|i)cache_range() called from modulesOliver O'Halloran
When the kernel is compiled to use 64bit ABIv2 the _GLOBAL() macro does not include a global entry point. A function's global entry point is used when the function is called from a different TOC context and in the kernel this typically means a call from a module into the vmlinux (or vice-versa). There are a few exported asm functions declared with _GLOBAL() and calling them from a module will likely crash the kernel since any TOC relative load will yield garbage. flush_icache_range() and flush_dcache_range() are both exported to modules, and use the TOC, so must use _GLOBAL_TOC(). Fixes: 721aeaa9fdf3 ("powerpc: Build little endian ppc64 kernel with ABIv2") Cc: stable@vger.kernel.org # v3.16+ Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-04powerpc: Don't try to fix up misaligned load-with-reservation instructionsPaul Mackerras
In the past, there was only one load-with-reservation instruction, lwarx, and if a program attempted a lwarx on a misaligned address, it would take an alignment interrupt and the kernel handler would emulate it as though it was lwzx, which was not really correct, but benign since it is loading the right amount of data, and the lwarx should be paired with a stwcx. to the same address, which would also cause an alignment interrupt which would result in a SIGBUS being delivered to the process. We now have 5 different sizes of load-with-reservation instruction. Of those, lharx and ldarx cause an immediate SIGBUS by luck since their entries in aligninfo[] overlap instructions which were not fixed up, but lqarx overlaps with lhz and will be emulated as such. lbarx can never generate an alignment interrupt since it only operates on 1 byte. To straighten this out and fix the lqarx case, this adds code to detect the l[hwdq]arx instructions and return without fixing them up, resulting in a SIGBUS being delivered to the process. Cc: stable@vger.kernel.org Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-04powerpc/powernv: Add OPAL exports attributes to sysfsMatt Brown
New versions of OPAL have a device node /ibm,opal/firmware/exports, each property of which describes a range of memory in OPAL that Linux might want to export to userspace for debugging. This patch adds a sysfs file under 'opal/exports' for each property found there, and makes it read-only by root. Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> [mpe: Drop counting of props, rename to attr, free on sysfs error, c'log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-04powerpc/prom: Increase minimum RMA size to 512MBSukadev Bhattiprolu
When booting very large systems with a large initrd, we run out of space early in boot for either RTAS or the flattened device tree (FDT). Boot fails with messages like: Could not allocate memory for RTAS or No memory for flatten_device_tree (no room) Increasing the minimum RMA size to 512MB fixes the problem. This should not have an impact on smaller LPARs (with 256MB memory), as the firmware will cap the RMA to the memory assigned to the LPAR. Fix is based on input/discussions with Michael Ellerman. Thanks to Praveen K. Pandey for testing on a large system. Reported-by: Praveen K. Pandey <preveen.pandey@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-04powerpc/powernv: Introduce address translation services for Nvlink2Alistair Popple
Nvlink2 supports address translation services (ATS) allowing devices to request address translations from an mmu known as the nest MMU which is setup to walk the CPU page tables. To access this functionality certain firmware calls are required to setup and manage hardware context tables in the nvlink processing unit (NPU). The NPU also manages forwarding of TLB invalidates (known as address translation shootdowns/ATSDs) to attached devices. This patch exports several methods to allow device drivers to register a process id (PASID/PID) in the hardware tables and to receive notification of when a device should stop issuing address translation requests (ATRs). It also adds a fault handler to allow device drivers to demand fault pages in. Signed-off-by: Alistair Popple <alistair@popple.id.au> [mpe: Fix up comment formatting, use flush_tlb_mm()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-03Merge tag 'v4.11-rc5' into x86/mm, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-03powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_devAlistair Popple
The pnv_pci_get_{gpu|npu}_dev functions are used to find associations between nvlink PCIe devices and standard PCIe devices. However they lacked basic sanity checking which results in NULL pointer dereferencing if they are incorrect called can be harder to spot than an explicit WARN_ON. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-03powerpc/mm: Remove stale comment about the DART holeOliver O'Halloran
The code to fix the problem it describes was removed in commit c40785ad305b ("powerpc/dart: Use a cachable DART"), and it uses the stupid comment style. Away it goooooooooooooes! Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-03powerpc: Avoid taking a data miss on every userspace instruction missAnton Blanchard
Early on in do_page_fault() we call store_updates_sp(), regardless of the type of exception. For an instruction miss this doesn't make sense, because we only use this information to detect if a data miss is the result of a stack expansion instruction or not. Worse still, it results in a data miss within every userspace instruction miss handler, because we try and load the very instruction we are about to install a pte for! A simple exec microbenchmark runs 6% faster on POWER8 with this fix: #include <stdlib.h> #include <stdio.h> #include <unistd.h> int main(int argc, char *argv[]) { unsigned long left = atol(argv[1]); char leftstr[16]; if (left-- == 0) return 0; sprintf(leftstr, "%ld", left); execlp(argv[0], argv[0], leftstr, NULL); perror("exec failed\n"); return 0; } Pass the number of iterations on the command line (eg 10000) and time how long it takes to execute. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-03debug: Fix __bug_table[] in arch linker scriptsPeter Zijlstra
The kbuild test robot reported this build failure on a number of architectures: > make.cross ARCH=arm > lib/lib.a(bug.o): In function `find_bug': > >> lib/bug.c:135: undefined reference to `__start___bug_table' > >> lib/bug.c:135: undefined reference to `__stop___bug_table' Caused by: 19d436268dde ("debug: Add _ONCE() logic to report_bug()") Which moved the BUG_TABLE from RO_DATA_SECTION() to RW_DATA_SECTION(), but a number of architectures don't use RW_DATA_SECTION(), so they ended up with no __bug_table[] ... Ideally all those would use RW_DATA_SECTION() in their linker scripts, but that's for another day. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kbuild test robot <fengguang.wu@intel.com> Cc: kbuild-all@01.org Cc: tipbuild@zytor.com Link: http://lkml.kernel.org/r/20170330154927.o6qmgfp4bdhrajbm@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-03powerpc/book3s: Print task info if we take a machine check in user modeMichael Ellerman
For an MCE (Machine Check Exception) that hits while in user mode MSR(PR=1), print the task info to the console MCE error log. This may help to identify an application that triggered the MCE. After this patch the MCE console looks like: Severe Machine check interrupt [Recovered] NIP: [0000000010039778] PID: 762 Comm: ebizzy Initiator: CPU Error type: SLB [Multihit] Effective address: 0000000010039778 Severe Machine check interrupt [Not recovered] NIP: [0000000010039778] PID: 763 Comm: ebizzy Initiator: CPU Error type: UE [Page table walk ifetch] Effective address: 0000000010039778 ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004 Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-03powerpc/book3s: Print the kernel function name in machine checkMahesh Salgaonkar
For D-side errors we print the load/store address that caused the machine check as 'Effective address'. But the instruction that may have caused the machine check can also be helpful, so in addition to printing the NIP, also print the kernel function name as well. After this patch the MCE console log would look like: Severe Machine check interrupt [Recovered] NIP [d00000001bc70194]: init_module+0x194/0x2b0 [bork_kernel] Initiator: CPU Error type: SLB [Parity] Effective address: d000000026de0000 Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-02Merge branch 'parisc-4.11-3' of ↵Al Viro
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc
2017-04-01powerpc/mm: Enable mappings above 128TBAneesh Kumar K.V
Not all user space application is ready to handle wide addresses. It's known that at least some JIT compilers use higher bits in pointers to encode their information. It collides with valid pointers with 512TB addresses and leads to crashes. To mitigate this, we are not going to allocate virtual address space above 128TB by default. But userspace can ask for allocation from full address space by specifying hint address (with or without MAP_FIXED) above 128TB. If hint address set above 128TB, but MAP_FIXED is not specified, we try to look for unmapped area by specified address. If it's already occupied, we look for unmapped area in *full* address space, rather than from 128TB window. This approach helps to easily make application's memory allocator aware about large address space without manually tracking allocated virtual address space. This is going to be a per mmap decision. ie, we can have some mmaps with larger addresses and other that do not. A sample memory layout looks like: 10000000-10010000 r-xp 00000000 fc:00 9057045 /home/max_addr_512TB 10010000-10020000 r--p 00000000 fc:00 9057045 /home/max_addr_512TB 10020000-10030000 rw-p 00010000 fc:00 9057045 /home/max_addr_512TB 10029630000-10029660000 rw-p 00000000 00:00 0 [heap] 7fff834a0000-7fff834b0000 rw-p 00000000 00:00 0 7fff834b0000-7fff83670000 r-xp 00000000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so 7fff83670000-7fff83680000 r--p 001b0000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so 7fff83680000-7fff83690000 rw-p 001c0000 fc:00 9177190 /lib/powerpc64le-linux-gnu/libc-2.23.so 7fff83690000-7fff836a0000 rw-p 00000000 00:00 0 7fff836a0000-7fff836c0000 r-xp 00000000 00:00 0 [vdso] 7fff836c0000-7fff83700000 r-xp 00000000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so 7fff83700000-7fff83710000 r--p 00030000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so 7fff83710000-7fff83720000 rw-p 00040000 fc:00 9177193 /lib/powerpc64le-linux-gnu/ld-2.23.so 7fffdccf0000-7fffdcd20000 rw-p 00000000 00:00 0 [stack] 1000000000000-1000000010000 rw-p 00000000 00:00 0 1ffff83710000-1ffff83720000 rw-p 00000000 00:00 0 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>