summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2016-11-29powerpc/mm: Fix lazy icache flush on pre-POWER5Benjamin Herrenschmidt
On 64-bit CPUs with no-execute support and non-snooping icache, such as 970 or POWER4, we have a software mechanism to ensure coherency of the cache (using exec faults when needed). This was broken due to a logic error when the code was rewritten from assembly to C, previously the assembly code did: BEGIN_FTR_SECTION mr r4,r30 mr r5,r7 bl hash_page_do_lazy_icache END_FTR_SECTION(CPU_FTR_NOEXECUTE|CPU_FTR_COHERENT_ICACHE, CPU_FTR_NOEXECUTE) Which tests that: (cpu_features & (NOEXECUTE | COHERENT_ICACHE)) == NOEXECUTE Which says that the current cpu does have NOEXECUTE, but does not have COHERENT_ICACHE. Fixes: 91f1da99792a ("powerpc/mm: Convert 4k hash insert to C") Fixes: 89ff725051d1 ("powerpc/mm: Convert __hash_page_64K to C") Fixes: a43c0eb8364c ("powerpc/mm: Convert 4k insert from asm to C") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Change log verbosification] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-29powerpc/boot: Fix rebuild when changing kernel endianMichael Ellerman
Now that we don't set ARCH incorrectly when calling the boot Makefile, we can use the generic cpp_lds_S rule for converting our zImage.lds.S into zImage.lds. The main advantage of using the generic rule is that it correctly uses if_changed, which means we correctly regenerate the linker script when switching endian. Fixing that means we are finally able to build one endian and then rebuild the other endian without requiring to clean between builds. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-29powerpc/boot: All uses of if_changed should depend on FORCEMichael Ellerman
If we're using if_changed then we must depend on FORCE, so that if_changed gets a chance to check if something changed. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-29powerpc: Stop passing ARCH=ppc64 to boot MakefileMichael Ellerman
Back in 2005 when the ppc/ppc64 merge started, we used to build the kernel code in arch/powerpc but use the boot code from arch/ppc or arch/ppc64 depending on whether we were building for 32 or 64-bit. Originally we called the boot Makefile passing ARCH=$(OLDARCH), where OLDARCH was ppc or ppc64. In commit 20f629549b30 ("powerpc: Make building the boot image work for both 32-bit and 64-bit") (2005-10-11) we split the call for 32/64-bit using an ifeq check, because the two Makefiles took different targets, and explicitly passed ARCH=ppc64 for the 64-bit case and ARCH=ppc for the 32-bit case. Then in commit 94b212c29f68 ("powerpc: Move ppc64 boot wrapper code over to arch/powerpc") (2005-11-16) we moved the boot code into arch/powerpc and dropped the ppc case, but kept passing ARCH=ppc64 to arch/powerpc/boot/Makefile. Since then there have been several more boot targets added, all of which have copied the ARCH=ppc64 setting, such that now we have four targets using it. Currently it seems that nothing actually uses the ARCH value, but that's basically just luck, and in particular it prevents us from using the generic cpp_lds_S rule. It's also clearly wrong, ARCH=ppc64 is dead, buried and cremated. Fix it by dropping the setting of ARCH completely, the correct value is exported by the top level Makefile. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-29powerpc/virtex: Use generic xilinx irqchip driverZubair Lutfullah Kakakhel
The Xilinx interrupt controller driver is now available in drivers/irqchip. Switch to using that driver. Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-28crypto: crc32c-vpmsum - Rename CRYPT_CRC32C_VPMSUM optionJean Delvare
For consistency with the other 246 kernel configuration options, rename CRYPT_CRC32C_VPMSUM to CRYPTO_CRC32C_VPMSUM. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Anton Blanchard <anton@samba.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-28powerpc/boot: Fix build failure in 32-bit boot wrapperBen Hutchings
OPAL is not callable from 32-bit mode and the assembly code for it may not even build (depending on how binutils was configured). References: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpcspe&ver=4.8.7-1&stamp=1479203712 Fixes: 656ad58ef19e ("powerpc/boot: Add OPAL console to epapr wrappers") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28powerpc/mm: Batch tlb flush when invalidating pte entriesAneesh Kumar K.V
This will improve the task exit case, by batching tlb invalidates. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28powerpc/mm: update radix__pte_update to not do full mm tlb flushAneesh Kumar K.V
When we are updating a pte, we just need to flush the tlb mapping that pte. Right now we do a full mm flush because we don't track page size. Now that we have page size details in pte use that to do the optimized flush Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28powerpc/mm: update radix__ptep_set_access_flag to not do full mm tlb flushAneesh Kumar K.V
When we are updating a pte, we just need to flush the tlb mapping that pte. Right now we do a full mm flush because we don't track the page size. Now that we have page size details in pte use that to do the optimized flush Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28powerpc/mm: Add radix__tlb_flush_pte_p9_dd1()Aneesh Kumar K.V
Now that we have page size details encoded in pte using software pte bits, use that to find the page size needed for tlb flush. This function should only be used on P9 DD1, so give it a horrible name to make that clear. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28powerpc/mm: Introduce _PAGE_LARGE software pte bitsAneesh Kumar K.V
This patch adds a new software defined pte bit. We use the reserved fields of ISA 3.0 pte definition since we will only be using this on DD1 code paths. We can possibly look at removing this code later. The software bit will be used to differentiate between 64K/4K and 2M ptes. This helps in finding the page size mapping by a pte so that we can do efficient tlb flush. We don't support 1G hugetlb pages yet. So we add a DEBUG WARN_ON to catch wrong usage. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28powerpc/mm/hugetlb: Handle hugepage size supported by hash configAneesh Kumar K.V
W.r.t hash page table config, we support 16MB and 16GB as the hugepage size. Update the hstate_get_psize to handle 16M and 16G. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28powerpc/mm: Rename hugetlb-radix.h to hugetlb.hAneesh Kumar K.V
We will start moving some book3s specific hugetlb functions there. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28powerpc/64e: Don't branch to dot symbolsNicholas Piggin
This converts one that was missed by b1576fec7f4d ("powerpc: No need to use dot symbols when branching to a function"). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28powerpc/64e: Convert cmpi to cmpwi in head_64.SNicholas Piggin
From 80f23935cadb ("powerpc: Convert cmp to cmpd in idle enter sequence"): PowerPC's "cmp" instruction has four operands. Normally people write "cmpw" or "cmpd" for the second cmp operand 0 or 1. But, frequently people forget, and write "cmp" with just three operands. With older binutils this is silently accepted as if this was "cmpw", while often "cmpd" is wanted. With newer binutils GAS will complain about this for 64-bit code. For 32-bit code it still silently assumes "cmpw" is what is meant. In this case, cmpwi is called for, so this is just a build fix for new toolchains. Cc: stable@vger.kernel.org # v3.0+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28KVM: PPC: Book3S HV: Comment style and print format fixupsSuraj Jitindar Singh
Fix comment block to match kernel comment style. Fix print format from signed to unsigned. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-28KVM: PPC: Decrease the powerpc default halt poll max valueSuraj Jitindar Singh
KVM_HALT_POLL_NS_DEFAULT is an arch specific constant which sets the default value of the halt_poll_ns kvm module parameter which determines the global maximum halt polling interval. The current value for powerpc is 500000 (500us) which means that any repetitive workload with a period of less than that can drive the cpu usage to 100% where it may have been mostly idle without halt polling. This presents the possibility of a large increase in power usage with a comparatively small performance benefit. Reduce the default to 10000 (10us) and a user can tune this themselves to set their affinity for halt polling based on the trade off between power and performance which they are willing to make. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-28KVM: PPC: Book3S HV: Add check for module parameter halt_poll_nsSuraj Jitindar Singh
The kvm module parameter halt_poll_ns defines the global maximum halt polling interval and can be dynamically changed by writing to the /sys/module/kvm/parameters/halt_poll_ns sysfs file. However in kvm-hv this module parameter value is only ever checked when we grow the current polling interval for the given vcore. This means that if we decrease the halt_poll_ns value below the current polling interval we won't see any effect unless we try to grow the polling interval above the new max at some point or it happens to be shrunk below the halt_poll_ns value. Update the halt polling code so that we always check for a new module param value of halt_poll_ns and set the current halt polling interval to it if it's currently greater than the new max. This means that it's redundant to also perform this check in the grow_halt_poll_ns() function now. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-28KVM: PPC: Book3S HV: Use generic kvm module parametersSuraj Jitindar Singh
The previous patch exported the variables which back the module parameters of the generic kvm module. Now use these variables in the kvm-hv module so that any change to the generic module parameters will also have the same effect for the kvm-hv module. This removes the duplication of the kvm module parameters which was redundant and should reduce confusion when tuning them. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
udplite conflict is resolved by taking what 'net-next' did which removed the backlog receive method assignment, since it is no longer necessary. Two entries were added to the non-priv ethtool operations switch statement, one in 'net' and one in 'net-next, so simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26Merge tag 'powerpc-4.9-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes marked for stable: - Set missing wakeup bit in LPCR on POWER9 - Fix the early OPAL console wrappers - Fixup kernel read only mapping Fixes for code merged this cycle: - Fix missing CRCs, add more asm-prototypes.h declarations" * tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Fixup kernel read only mapping powerpc/boot: Fix the early OPAL console wrappers powerpc: Fix missing CRCs, add more asm-prototypes.h declarations powerpc: Set missing wakeup bit in LPCR on POWER9
2016-11-26powerpc/mm/radix: Prevent kernel execution of user spaceBalbir Singh
ISA 3 defines new encoded access authority that allows instruction access prevention in privileged mode and allows normal access to problem state. This patch just enables IAMR (Instruction Authority Mask Register), enabling AMR would require more work. I've tested this with a buggy driver and a simple payload. The payload is specific to the build I've tested. mpe: Also tested with LKDTM: # echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT lkdtm: Performing direct entry EXEC_USERSPACE lkdtm: attempting ok execution at c0000000005bf560 lkdtm: attempting bad execution at 00003fff8d940000 Unable to handle kernel paging request for instruction fetch Faulting instruction address: 0x3fff8d940000 Oops: Kernel access of bad area, sig: 11 [#1] NIP: 00003fff8d940000 LR: c0000000005bfa58 CTR: 00003fff8d940000 REGS: c0000000f1fcf900 TRAP: 0400 Not tainted (4.9.0-rc5-compiler_gcc-6.2.0-00109-g956dbc06232a) MSR: 9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48002222 XER: 00000000 ... Call Trace: lkdtm_EXEC_USERSPACE+0x104/0x120 (unreliable) lkdtm_do_action+0x3c/0x80 direct_entry+0x100/0x1b0 full_proxy_write+0x94/0x100 __vfs_write+0x3c/0x1b0 vfs_write+0xcc/0x230 SyS_write+0x60/0x110 system_call+0x38/0xfc Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-25powerpc/mm: Detect instruction fetch denied and reportBalbir Singh
ISA 3 allows for prevention of instruction fetch and execution of user mode pages. If such an error occurs, SRR1 bit 35 reports the error. We catch and report the error in do_page_fault(). Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-25powerpc/mm/radix: Setup AMOR in HV mode to allow key 0Balbir Singh
Setup AMOR (Authority Mask Override Register) in HV mode so that the host and guest kernel can in turn setup IAMR. This allows us to enable key 0 in a following patch. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-25powernv: Clear SPRN_PSSCR when a POWER9 CPU comes onlineGautham R. Shenoy
Ensure that PSSCR is set to a safe value corresponding to no state-loss each time a POWER9 CPU comes online. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-By: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-25powerpc/xmon: Add 'dt' command to dump trace buffersMichael Ellerman
There is a nice interface for asking ftrace to dump all its tracing buffers. The only down side for use in xmon is that it uses printk. Depending on circumstances printk may not work when in xmon, but it also may, so add a 'dt' command which dumps the ftrace buffers, and add a note to the help to mentiont that it uses printk. Calling this routine also disables tracing, which is problematic if you return from xmon and expect the system to keep operating normally. So after we do the dump turn tracing back on. Both functions already have nop versions defined for when ftrace is not enabled, so we don't need any extra #ifdefs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-25powerpc/mm: Fixup kernel read only mappingAneesh Kumar K.V
With commit e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO") we started using the ppp value 0b110 to map kernel readonly. But that facility was only added as part of ISA 2.04. For earlier ISA version only supported ppp bit value for readonly mapping is 0b011. (This implies both user and kernel get mapped using the same ppp bit value for readonly mapping.). Update the code such that for earlier architecture version we use ppp value 0b011 for readonly mapping. We don't differentiate between power5+ and power5 here and apply the new ppp bits only from power6 (ISA 2.05). This keep the changes minimal. This fixes issue with PS3 spu usage reported at https://lkml.kernel.org/r/rep.1421449714.geoff@infradead.org Fixes: e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO") Cc: stable@vger.kernel.org # v4.7+ Tested-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-25powerpc/of_platform: Use builtin_platform_driverGeliang Tang
Use builtin_platform_driver() helper to simplify the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-25powerpc: Fix __cmpxchg() to take a volatile ptr againMichael Ellerman
In commit d0563a1297e2 ("powerpc: Implement {cmp}xchg for u8 and u16") we removed the volatile from __cmpxchg(). This is leading to warnings such as: drivers/gpu/drm/drm_lock.c: In function ‘drm_lock_take’: arch/powerpc/include/asm/cmpxchg.h:484:37: warning: passing argument 1 of ‘__cmpxchg’ discards ‘volatile’ qualifier from pointer target (__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_, \ There doesn't seem to be consensus across architectures whether the argument is volatile or not, so at least for now put the volatile back. Fixes: d0563a1297e2 ("powerpc: Implement {cmp}xchg for u8 and u16") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-24Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge the topic branch we're sharing with the kvm-ppc tree.
2016-11-24powerpc/boot: Fix the early OPAL console wrappersOliver O'Halloran
When configured with CONFIG_PPC_EARLY_DEBUG_OPAL=y the kernel expects the OPAL entry and base addresses to be passed in r8 and r9 respectively. Currently the wrapper does not attempt to restore these values before entering the decompressed kernel which causes the kernel to branch into whatever happens to be in r9 when doing a write to the OPAL console in early boot. This patch adds a platform_ops hook that can be used to branch into the new kernel. The OPAL console driver patches this at runtime so that if the console is used it will be restored just prior to entering the kernel. Fixes: 656ad58ef19e ("powerpc/boot: Add OPAL console to epapr wrappers") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-24KVM: PPC: Correctly report KVM_CAP_PPC_ALLOC_HTABDavid Gibson
At present KVM on powerpc always reports KVM_CAP_PPC_ALLOC_HTAB as enabled. However, the ioctl() it advertises (KVM_PPC_ALLOCATE_HTAB) only actually works on KVM HV. On KVM PR it will fail with ENOTTY. QEMU already has a workaround for this, so it's not breaking things in practice, but it would be better to advertise this correctly. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Fix compilation with unusual configurationsPaul Mackerras
This adds the "again" parameter to the dummy version of kvmppc_check_passthru(), so that it matches the real version. This fixes compilation with CONFIG_BOOK3S_64_HV set but CONFIG_KVM_XICS=n. This includes asm/smp.h in book3s_hv_builtin.c to fix compilation with CONFIG_SMP=n. The explicit inclusion is necessary to provide definitions of hard_smp_processor_id() and get_hard_smp_processor_id() in UP configs. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Update kvmppc_set_arch_compat() for ISA v3.00Suraj Jitindar Singh
The function kvmppc_set_arch_compat() is used to determine the value of the processor compatibility register (PCR) for a guest running in a given compatibility mode. There is currently no support for v3.00 of the ISA. Add support for v3.00 of the ISA which adds an ISA v2.07 compatilibity mode to the PCR. We also add a check to ensure the processor we are running on is capable of emulating the chosen processor (for example a POWER7 cannot emulate a POWER8, similarly with a POWER8 and a POWER9). Based on work by: Paul Mackerras <paulus@ozlabs.org> [paulus@ozlabs.org - moved dummy PCR_ARCH_300 definition here; set guest_pcr_bit when arch_compat == 0, added comment.] Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Treat POWER9 CPU threads as independent subcoresPaul Mackerras
With POWER9, each CPU thread has its own MMU context and can be in the host or a guest independently of the other threads; there is still however a restriction that all threads must use the same type of address translation, either radix tree or hashed page table (HPT). Since we only support HPT guests on a HPT host at this point, we can treat the threads as being independent, and avoid all of the work of coordinating the CPU threads. To make this simpler, we introduce a new threads_per_vcore() function that returns 1 on POWER9 and threads_per_subcore on POWER7/8, and use that instead of threads_per_subcore or threads_per_core in various places. This also changes the value of the KVM_CAP_PPC_SMT capability on POWER9 systems from 4 to 1, so that userspace will not try to create VMs with multiple vcpus per vcore. (If userspace did create a VM that thought it was in an SMT mode, the VM might try to use the msgsndp instruction, which will not work as expected. In future it may be possible to trap and emulate msgsndp in order to allow VMs to think they are in an SMT mode, if only for the purpose of allowing migration from POWER8 systems.) With all this, we can now run guests on POWER9 as long as the host is running with HPT translation. Since userspace currently has no way to request radix tree translation for the guest, the guest has no choice but to use HPT translation. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Enable hypervisor virtualization interrupts while in guestPaul Mackerras
The new XIVE interrupt controller on POWER9 can direct external interrupts to the hypervisor or the guest. The interrupts directed to the hypervisor are controlled by an LPCR bit called LPCR_HVICE, and come in as a "hypervisor virtualization interrupt". This sets the LPCR bit so that hypervisor virtualization interrupts can occur while we are in the guest. We then also need to cope with exiting the guest because of a hypervisor virtualization interrupt. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Use stop instruction rather than nap on POWER9Paul Mackerras
POWER9 replaces the various power-saving mode instructions on POWER8 (doze, nap, sleep and rvwinkle) with a single "stop" instruction, plus a register, PSSCR, which controls the depth of the power-saving mode. This replaces the use of the nap instruction when threads are idle during guest execution with the stop instruction, and adds code to set PSSCR to a value which will allow an SMT mode switch while the thread is idle (given that the core as a whole won't be idle in these cases). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9Paul Mackerras
POWER9 includes a new interrupt controller, called XIVE, which is quite different from the XICS interrupt controller on POWER7 and POWER8 machines. KVM-HV accesses the XICS directly in several places in order to send and clear IPIs and handle interrupts from PCI devices being passed through to the guest. In order to make the transition to XIVE easier, OPAL firmware will include an emulation of XICS on top of XIVE. Access to the emulated XICS is via OPAL calls. The one complication is that the EOI (end-of-interrupt) function can now return a value indicating that another interrupt is pending; in this case, the XIVE will not signal an interrupt in hardware to the CPU, and software is supposed to acknowledge the new interrupt without waiting for another interrupt to be delivered in hardware. This adapts KVM-HV to use the OPAL calls on machines where there is no XICS hardware. When there is no XICS, we look for a device-tree node with "ibm,opal-intc" in its compatible property, which is how OPAL indicates that it provides XICS emulation. In order to handle the EOI return value, kvmppc_read_intr() has become kvmppc_read_one_intr(), with a boolean variable passed by reference which can be set by the EOI functions to indicate that another interrupt is pending. The new kvmppc_read_intr() keeps calling kvmppc_read_one_intr() until there are no more interrupts to process. The return value from kvmppc_read_intr() is the largest non-zero value of the returns from kvmppc_read_one_intr(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9Paul Mackerras
On POWER9, the msgsnd instruction is able to send interrupts to other cores, as well as other threads on the local core. Since msgsnd is generally simpler and faster than sending an IPI via the XICS, we use msgsnd for all IPIs sent by KVM on POWER9. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9Paul Mackerras
POWER9 adds new capabilities to the tlbie (TLB invalidate entry) and tlbiel (local tlbie) instructions. Both instructions get a set of new parameters (RIC, PRS and R) which appear as bits in the instruction word. The tlbiel instruction now has a second register operand, which contains a PID and/or LPID value if needed, and should otherwise contain 0. This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9 as well as older processors. Since we only handle HPT guests so far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction word as on previous processors, so we don't need to conditionally execute different instructions depending on the processor. The local flush on first entry to a guest in book3s_hv_rmhandlers.S is a loop which depends on the number of TLB sets. Rather than using feature sections to set the number of iterations based on which CPU we're on, we now work out this number at VM creation time and store it in the kvm_arch struct. That will make it possible to get the number from the device tree in future, which will help with compatibility with future processors. Since mmu_partition_table_set_entry() does a global flush of the whole LPID, we don't need to do the TLB flush on first entry to the guest on each processor. Therefore we don't set all bits in the tlb_need_flush bitmap on VM startup on POWER9. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRsPaul Mackerras
This adds code to handle two new guest-accessible special-purpose registers on POWER9: TIDR (thread ID register) and PSSCR (processor stop status and control register). They are context-switched between host and guest, and the guest values can be read and set via the one_reg interface. The PSSCR contains some fields which are guest-accessible and some which are only accessible in hypervisor mode. We only allow the guest-accessible fields to be read or set by userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Adjust host/guest context switch for POWER9Paul Mackerras
Some special-purpose registers that were present and accessible by guests on POWER8 no longer exist on POWER9, so this adds feature sections to ensure that we don't try to context-switch them when going into or out of a guest on POWER9. These are all relatively obscure, rarely-used registers, but we had to context-switch them on POWER8 to avoid creating a covert channel. They are: SPMC1, SPMC2, MMCRS, CSIGR, TACR, TCSCR, and ACOP. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9Paul Mackerras
On POWER9, the SDR1 register (hashed page table base address) is no longer used, and instead the hardware reads the HPT base address and size from the partition table. The partition table entry also contains the bits that specify the page size for the VRMA mapping, which were previously in the LPCR. The VPM0 bit of the LPCR is now reserved; the processor now always uses the VRMA (virtual real-mode area) mechanism for guest real-mode accesses in HPT mode, and the RMO (real-mode offset) mechanism has been dropped. When entering or exiting the guest, we now only have to set the LPIDR (logical partition ID register), not the SDR1 register. There is also no requirement now to transition via a reserved LPID value. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9Paul Mackerras
This adapts the KVM-HV hashed page table (HPT) code to read and write HPT entries in the new format defined in Power ISA v3.00 on POWER9 machines. The new format moves the B (segment size) field from the first doubleword to the second, and trims some bits from the AVA (abbreviated virtual address) and ARPN (abbreviated real page number) fields. As far as possible, the conversion is done when reading or writing the HPT entries, and the rest of the code continues to use the old format. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras
This merges in the ppc-kvm topic branch to get changes to arch/powerpc code that are necessary for adding POWER9 KVM support. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-23powerpc/32: Change the stack protector canary value per taskChristophe Leroy
Partially copied from commit df0698be14c66 ("ARM: stack protector: change the canary value per task") A new random value for the canary is stored in the task struct whenever a new task is forked. This is meant to allow for different canary values per task. On powerpc, GCC expects the canary value to be found in a global variable called __stack_chk_guard. So this variable has to be updated with the value stored in the task struct whenever a task switch occurs. Because the variable GCC expects is global, this cannot work on SMP unfortunately. So, on SMP, the same initial canary value is kept throughout, making this feature a bit less effective although it is still useful. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-23powerpc: Initial stack protector (-fstack-protector) supportChristophe Leroy
Partialy copied from commit c743f38013aef ("ARM: initial stack protector (-fstack-protector) support") This is the very basic stuff without the changing canary upon task switch yet. Just the Kconfig option and a constant canary value initialized at boot time. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-23powerpc: Implement {cmp}xchg for u8 and u16Pan Xinhui
Implement xchg{u8,u16}{local,relaxed}, and cmpxchg{u8,u16}{,local,acquire,relaxed}. It works on all ppc. remove volatile of first parameter in __cmpxchg_local and __cmpxchg Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Acked-by: Boqun Feng <boqun.feng@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-23powerpc/pseries/ibmebus: Remove legacy suspend/resume supportLars-Peter Clausen
There are no ibmebus driver that make use of legacy suspend/resume. This patch removes the support for it from ibmebus framework, new ibmebus driver (as unlikely as they are) wanting to use suspend/resume should use dev_pm_ops. Since there aren't any special bus specific things to do during suspend/resume and since the PM core will automatically fallback directly to using the device's PM ops if no bus PM ops are specified there is no need to have any special ibmebus PM ops at all. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>