summaryrefslogtreecommitdiff
path: root/arch/powerpc/include/asm
AgeCommit message (Collapse)Author
2019-08-30powerpc/pseries/svm: Force SWIOTLB for secure guestsAnshuman Khandual
SWIOTLB checks range of incoming CPU addresses to be bounced and sees if the device can access it through its DMA window without requiring bouncing. In such cases it just chooses to skip bouncing. But for cases like secure guests on powerpc platform all addresses need to be bounced into the shared pool of memory because the host cannot access it otherwise. Hence the need to do the bouncing is not related to device's DMA window and use of bounce buffers is forced by setting swiotlb_force. Also, connect the shared memory conversion functions into the ARCH_HAS_MEM_ENCRYPT hooks and call swiotlb_update_mem_attributes() to convert SWIOTLB's memory pool to shared memory. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [ bauerman: Use ARCH_HAS_MEM_ENCRYPT hooks to share swiotlb memory pool. ] Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-15-bauerman@linux.ibm.com
2019-08-30powerpc/pseries/svm: Unshare all pages before kexecing a new kernelRam Pai
A new kernel deserves a clean slate. Any pages shared with the hypervisor is unshared before invoking the new kernel. However there are exceptions. If the new kernel is invoked to dump the current kernel, or if there is a explicit request to preserve the state of the current kernel, unsharing of pages is skipped. NOTE: While testing crashkernel, make sure at least 256M is reserved for crashkernel. Otherwise SWIOTLB allocation will fail and crash kernel will fail to boot. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-11-bauerman@linux.ibm.com
2019-08-30powerpc/pseries/svm: Use shared memory for Debug Trace Log (DTL)Anshuman Khandual
Secure guests need to share the DTL buffers with the hypervisor. To that end, use a kmem_cache constructor which converts the underlying buddy allocated SLUB cache pages into shared memory. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-10-bauerman@linux.ibm.com
2019-08-30powerpc/pseries/svm: Use shared memory for LPPACA structuresAnshuman Khandual
LPPACA structures need to be shared with the host. Hence they need to be in shared memory. Instead of allocating individual chunks of memory for a given structure from memblock, a contiguous chunk of memory is allocated and then converted into shared memory. Subsequent allocation requests will come from the contiguous chunk which will be always shared memory for all structures. While we are able to use a kmem_cache constructor for the Debug Trace Log, LPPACAs are allocated very early in the boot process (before SLUB is available) so we need to use a simpler scheme here. Introduce helper is_svm_platform() which uses the S bit of the MSR to tell whether we're running as a secure guest. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-9-bauerman@linux.ibm.com
2019-08-30powerpc: Introduce the MSR_S bitSukadev Bhattiprolu
Protected Execution Facility (PEF) is an architectural change for POWER 9 that enables Secure Virtual Machines (SVMs). When enabled, PEF adds a new higher privileged mode, called Ultravisor mode, to POWER architecture. The hardware changes include the following: * There is a new bit in the MSR that determines whether the current process is running in secure mode, MSR(S) bit 41. MSR(S)=1, process is in secure mode, MSR(s)=0 process is in normal mode. * The MSR(S) bit can only be set by the Ultravisor. * HRFID cannot be used to set the MSR(S) bit. If the hypervisor needs to return to a SVM it must use an ultracall. It can determine if the VM it is returning to is secure. * The privilege of a process is now determined by three MSR bits, MSR(S, HV, PR). In each of the tables below the modes are listed from least privilege to highest privilege. The higher privilege modes can access all the resources of the lower privilege modes. **Secure Mode MSR Settings** +---+---+---+---------------+ | S | HV| PR|Privilege | +===+===+===+===============+ | 1 | 0 | 1 | Problem | +---+---+---+---------------+ | 1 | 0 | 0 | Privileged(OS)| +---+---+---+---------------+ | 1 | 1 | 0 | Ultravisor | +---+---+---+---------------+ | 1 | 1 | 1 | Reserved | +---+---+---+---------------+ **Normal Mode MSR Settings** +---+---+---+---------------+ | S | HV| PR|Privilege | +===+===+===+===============+ | 0 | 0 | 1 | Problem | +---+---+---+---------------+ | 0 | 0 | 0 | Privileged(OS)| +---+---+---+---------------+ | 0 | 1 | 0 | Hypervisor | +---+---+---+---------------+ | 0 | 1 | 1 | Problem (HV) | +---+---+---+---------------+ Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> [ cclaudio: Update the commit message ] Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-7-bauerman@linux.ibm.com
2019-08-30powerpc/pseries/svm: Add helpers for UV_SHARE_PAGE and UV_UNSHARE_PAGERam Pai
These functions are used when the guest wants to grant the hypervisor access to certain pages. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-6-bauerman@linux.ibm.com
2019-08-30powerpc/prom_init: Add the ESM call to prom_initRam Pai
Make the Enter-Secure-Mode (ESM) ultravisor call to switch the VM to secure mode. Pass kernel base address and FDT address so that the Ultravisor is able to verify the integrity of the VM using information from the ESM blob. Add "svm=" command line option to turn on switching to secure mode. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [ andmike: Generate an RTAS os-term hcall when the ESM ucall fails. ] Signed-off-by: Michael Anderson <andmike@linux.ibm.com> [ bauerman: Cleaned up the code a bit. ] Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-5-bauerman@linux.ibm.com
2019-08-30powerpc/pseries: Introduce option to build secure virtual machinesThiago Jung Bauermann
Introduce CONFIG_PPC_SVM to control support for secure guests and include Ultravisor-related helpers when it is selected Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-3-bauerman@linux.ibm.com
2019-08-30Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge our ppc-kvm topic branch to bring in the Ultravisor support patches.
2019-08-30powerpc/kvm: Use UV_RETURN ucall to return to ultravisorSukadev Bhattiprolu
When an SVM makes an hypercall or incurs some other exception, the Ultravisor usually forwards (a.k.a. reflects) the exceptions to the Hypervisor. After processing the exception, Hypervisor uses the UV_RETURN ultracall to return control back to the SVM. The expected register state on entry to this ultracall is: * Non-volatile registers are restored to their original values. * If returning from an hypercall, register R0 contains the return value (unlike other ultracalls) and, registers R4 through R12 contain any output values of the hypercall. * R3 contains the ultracall number, i.e UV_RETURN. * If returning with a synthesized interrupt, R2 contains the synthesized interrupt number. Thanks to input from Paul Mackerras, Ram Pai and Mike Anderson. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-8-cclaudio@linux.ibm.com
2019-08-30powerpc/mm: Write to PTCR only if ultravisor disabledClaudio Carvalho
In ultravisor enabled systems, PTCR becomes ultravisor privileged only for writing and an attempt to write to it will cause a Hypervisor Emulation Assitance interrupt. This patch uses the set_ptcr_when_no_uv() function to restrict PTCR writing to only when ultravisor is disabled. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-6-cclaudio@linux.ibm.com
2019-08-30powerpc/mm: Use UV_WRITE_PATE ucall to register a PATEMichael Anderson
When Ultravisor (UV) is enabled, the partition table is stored in secure memory and can only be accessed via the UV. The Hypervisor (HV) however maintains a copy of the partition table in normal memory to allow Nest MMU translations to occur (for normal VMs). The HV copy includes partition table entries (PATE)s for secure VMs which would currently be unused (Nest MMU translations cannot access secure memory) but they would be needed as we add functionality. This patch adds the UV_WRITE_PATE ucall which is used to update the PATE for a VM (both normal and secure) when Ultravisor is enabled. Signed-off-by: Michael Anderson <andmike@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> [ cclaudio: Write the PATE in HV's table before doing that in UV's ] Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Reviewed-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-5-cclaudio@linux.ibm.com
2019-08-30powerpc/powernv: Introduce FW_FEATURE_ULTRAVISORClaudio Carvalho
In PEF enabled systems, some of the resources which were previously hypervisor privileged are now ultravisor privileged and controlled by the ultravisor firmware. This adds FW_FEATURE_ULTRAVISOR to indicate if PEF is enabled. The host kernel can use FW_FEATURE_ULTRAVISOR, for instance, to skip accessing resources (e.g. PTCR and LDBAR) in case PEF is enabled. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> [ andmike: Device node name to "ibm,ultravisor" ] Signed-off-by: Michael Anderson <andmike@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-4-cclaudio@linux.ibm.com
2019-08-30powerpc/kernel: Add ucall_norets() ultravisor call handlerClaudio Carvalho
The ultracalls (ucalls for short) allow the Secure Virtual Machines (SVM)s and hypervisor to request services from the ultravisor such as accessing a register or memory region that can only be accessed when running in ultravisor-privileged mode. This patch adds the ucall_norets() ultravisor call handler. The specific service needed from an ucall is specified in register R3 (the first parameter to the ucall). Other parameters to the ucall, if any, are specified in registers R4 through R12. Return value of all ucalls is in register R3. Other output values from the ucall, if any, are returned in registers R4 through R12. Each ucall returns specific error codes, applicable in the context of the ucall. However, like with the PowerPC Architecture Platform Reference (PAPR), if no specific error code is defined for a particular situation, then the ucall will fallback to an erroneous parameter-position based code. i.e U_PARAMETER, U_P2, U_P3 etc depending on the ucall parameter that may have caused the error. Every host kernel (powernv) needs to be able to do ucalls in case it ends up being run in a machine with ultravisor enabled. Otherwise, the kernel may crash early in boot trying to access ultravisor resources, for instance, trying to set the partition table entry 0. Secure guests also need to be able to do ucalls and its kernel may not have CONFIG_PPC_POWERNV=y. For that reason, the ucall.S file is placed under arch/powerpc/kernel. If ultravisor is not enabled, the ucalls will be redirected to the hypervisor which must handle/fail the call. Thanks to inputs from Ram Pai and Michael Anderson. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-3-cclaudio@linux.ibm.com
2019-08-30powerpc: Add PowerPC Capabilities ELF noteClaudio Carvalho
Add the PowerPC name and the PPC_ELFNOTE_CAPABILITIES type in the kernel binary ELF note. This type is a bitmap that can be used to advertise kernel capabilities to userland. This patch also defines PPCCAP_ULTRAVISOR_BIT as being the bit zero. Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> [ maxiwell: Define the 'PowerPC' type in the elfnote.h ] Signed-off-by: Maxiwell S. Garcia <maxiwell@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829155021.2915-2-maxiwell@linux.ibm.com
2019-08-30powerpc/powernv/ioda: Remove obsolete iommu_table_ops::exchange callbacksAlexey Kardashevskiy
As now we have xchg_no_kill/tce_kill, these are not used anymore so remove them. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829085252.72370-6-aik@ozlabs.ru
2019-08-30powerpc/powernv/ioda: Split out TCE invalidation from TCE updatesAlexey Kardashevskiy
At the moment updates in a TCE table are made by iommu_table_ops::exchange which update one TCE and invalidates an entry in the PHB/NPU TCE cache via set of registers called "TCE Kill" (hence the naming). Writing a TCE is a simple xchg() but invalidating the TCE cache is a relatively expensive OPAL call. Mapping a 100GB guest with PCI+NPU passed through devices takes about 20s. Thankfully we can do better. Since such big mappings happen at the boot time and when memory is plugged/onlined (i.e. not often), these requests come in 512 pages so we call call OPAL 512 times less which brings 20s from the above to less than 10s. Also, since TCE caches can be flushed entirely, calling OPAL for 512 TCEs helps skiboot [1] to decide whether to flush the entire cache or not. This implements 2 new iommu_table_ops callbacks: - xchg_no_kill() to update a single TCE with no TCE invalidation; - tce_kill() to invalidate multiple TCEs. This uses the same xchg_no_kill() callback for IODA1/2. This implements 2 new wrappers on top of the new callbacks similar to the existing iommu_tce_xchg(). This does not use the new callbacks yet, the next patches will; so this should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829085252.72370-2-aik@ozlabs.ru
2019-08-28powerpc: use the generic dma coherent remap allocatorChristoph Hellwig
This switches to using common code for the DMA allocations, including potential use of the CMA allocator if configured. Switching to the generic code enables DMA allocations from atomic context, which is required by the DMA API documentation, and also adds various other minor features drivers start relying upon. It also makes sure we have on tested code base for all architectures that require uncached pte bits for coherent DMA allocations. Another advantage is that consistent memory allocations now share the general vmalloc pool instead of needing an explicit careout from it. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> # tested on 8xx Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190814132230.31874-2-hch@lst.de
2019-08-28powerpc/32: drop CPU_FTR_UNIFIED_ID_CACHEChristophe Leroy
Only 601 and e200 have unified I/D cache. Drop the feature and use CONFIG_PPC_BOOK3S_601 and CONFIG_E200. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b5902144266d2f4eed1ffea53915bd0245841e02.1566834712.git.christophe.leroy@c-s.fr
2019-08-28powerpc/32s: drop CPU_FTR_USE_RTC featureChristophe Leroy
CPU_FTR_USE_RTC feature only applies to powerpc601. Drop this feature and replace it with tests on CONFIG_PPC_BOOK3S_601. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/170411e2360861f4a95c21faad43519a08bc4040.1566834712.git.christophe.leroy@c-s.fr
2019-08-28powerpc/32s: get rid of CPU_FTR_601 featureChristophe Leroy
Now that 601 is exclusive from other 6xx, CPU_FTR_601 and associated fixups are useless. Drop this feature and use #ifdefs instead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ecdb7194a17dbfa01865df6a82979533adc2c70b.1566834712.git.christophe.leroy@c-s.fr
2019-08-28powerpc/prom: convert PROM_BUG() to standard trapChristophe Leroy
Prior to commit 1bd98d7fbaf5 ("ppc64: Update BUG handling based on ppc32"), BUG() family was using BUG_ILLEGAL_INSTRUCTION which was an invalid instruction opcode to trap into program check exception. That commit converted them to using standard trap instructions, but prom/prom_init and their PROM_BUG() macro were left over. head_64.S and exception-64s.S were left aside as well. Convert them to using the standard BUG infrastructure. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cdaf4bbbb64c288a077845846f04b12683f8875a.1566817807.git.christophe.leroy@c-s.fr
2019-08-27powerpc/spinlocks: Fix oops in __spin_yield() on bare metalChristopher M. Riedl
Booting w/ppc64le_defconfig + CONFIG_PREEMPT on bare metal results in the oops below due to calling into __spin_yield() when not running in an SPLPAR, which means lppaca pointers are NULL. We fixed a similar case previously in commit a6201da34ff9 ("powerpc: Fix oops due to bad access of lppaca on bare metal"), by adding SPLPAR checks in lppaca_shared_proc(). However when PREEMPT is enabled we can call __spin_yield() directly from arch_spin_yield(). To fix it add spin_yield() and rw_yield() which check that shared-processor LPAR is enabled before calling the SPLPAR-only implementation of each. BUG: Kernel NULL pointer dereference at 0x00000100 Faulting instruction address: 0xc000000000097f88 Oops: Kernel access of bad area, sig: 7 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.2.0-rc6-00491-g249155c20f9b #28 NIP: c000000000097f88 LR: c000000000c07a88 CTR: c00000000015ca10 REGS: c0000000727079f0 TRAP: 0300 Not tainted (5.2.0-rc6-00491-g249155c20f9b) MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 84000424 XER: 20040000 CFAR: c000000000c07a84 DAR: 0000000000000100 DSISR: 00080000 IRQMASK: 1 GPR00: c000000000c07a88 c000000072707c80 c000000001546300 c00000007be38a80 GPR04: c0000000726f0c00 0000000000000002 c00000007279c980 0000000000000100 GPR08: c000000001581b78 0000000080000001 0000000000000008 c00000007279c9b0 GPR12: 0000000000000000 c000000001730000 c000000000142558 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: c00000007be38a80 c000000000c002f4 0000000000000000 0000000000000000 GPR28: c000000072221a00 c0000000726c2600 c00000007be38a80 c00000007be38a80 NIP [c000000000097f88] __spin_yield+0x48/0xa0 LR [c000000000c07a88] __raw_spin_lock+0xb8/0xc0 Call Trace: [c000000072707c80] [c000000072221a00] 0xc000000072221a00 (unreliable) [c000000072707cb0] [c000000000bffb0c] __schedule+0xbc/0x850 [c000000072707d70] [c000000000c002f4] schedule+0x54/0x130 [c000000072707da0] [c0000000001427dc] kthreadd+0x28c/0x2b0 [c000000072707e20] [c00000000000c1cc] ret_from_kernel_thread+0x5c/0x70 Instruction dump: 4d9e0020 552a043e 210a07ff 79080fe0 0b080000 3d020004 3908b878 794a1f24 e8e80000 7ce7502a e8e70000 38e70100 <7ca03c2c> 70a70001 78a50020 4d820020 ---[ end trace 474d6b2b8fc5cb7e ]--- Fixes: 499dcd41378e ("powerpc/64s: Allocate LPPACAs individually") Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> [mpe: Reword change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813031314.1828-4-cmr@informatik.wtf
2019-08-27powerpc/spinlocks: Rename SPLPAR-only spinlocksChristopher M. Riedl
The __rw_yield and __spin_yield locks only pertain to SPLPAR mode. Rename them to make this relationship obvious. Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813031314.1828-3-cmr@informatik.wtf
2019-08-27powerpc/spinlocks: Refactor SHARED_PROCESSORChristopher M. Riedl
Determining if a processor is in shared processor mode is not a constant so don't hide it behind a #define. Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813031314.1828-2-cmr@informatik.wtf
2019-08-27powerpc/64: optimise LOAD_REG_IMMEDIATE_SYM()Christophe Leroy
Optimise LOAD_REG_IMMEDIATE_SYM() using a temporary register to parallelise operations. It reduces the path from 5 to 3 instructions. Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bad41ed02531bb0382420cbab50a0d7153b71767.1566311636.git.christophe.leroy@c-s.fr
2019-08-27powerpc: rewrite LOAD_REG_IMMEDIATE() as an intelligent macroChristophe Leroy
Today LOAD_REG_IMMEDIATE() is a basic #define which loads all parts on a value into a register, including the parts that are NUL. This means always 2 instructions on PPC32 and always 5 instructions on PPC64. And those instructions cannot run in parallele as they are updating the same register. Ex: LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in: 3c 20 00 00 lis r1,0 60 21 00 00 ori r1,r1,0 78 21 07 c6 rldicr r1,r1,32,31 64 21 00 00 oris r1,r1,0 60 21 40 00 ori r1,r1,16384 Rewrite LOAD_REG_IMMEDIATE() with GAS macro in order to skip the parts that are NUL. Rename existing LOAD_REG_IMMEDIATE() as LOAD_REG_IMMEDIATE_SYM() and use that one for loading value of symbols which are not known at compile time. Now LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) in head_64.S results in: 38 20 40 00 li r1,16384 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d60ce8dd3a383c7adbfc322bf1d53d81724a6000.1566311636.git.christophe.leroy@c-s.fr
2019-08-27powerpc/mm: split out early ioremap path.Christophe Leroy
ioremap does things differently depending on whether SLAB is available or not at different levels. Try to separate the early path from the beginning. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3acd2dbe04b04f111475e7a59f2b6f2ab9b95ab6.1566309263.git.christophe.leroy@c-s.fr
2019-08-27powerpc/mm: refactor ioremap vm area setup.Christophe Leroy
PPC32 and PPC64 are doing the same once SLAB is available. Create a do_ioremap() function that calls get_vm_area and do the mapping. For PPC64, we add the 4K PFN hack sanity check to __ioremap_caller() in order to avoid using __ioremap_at(). Other checks in __ioremap_at() are irrelevant for __ioremap_caller(). On PPC64, VM area is allocated in the range [ioremap_bot ; IOREMAP_END] On PPC32, VM area is allocated in the range [VMALLOC_START ; VMALLOC_END] Lets define IOREMAP_START is ioremap_bot for PPC64, and alias IOREMAP_START/END to VMALLOC_START/END on PPC32 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/42e7e36ad32e0fdf76692426cc642799c9f689b8.1566309263.git.christophe.leroy@c-s.fr
2019-08-27powerpc/mm: refactor ioremap_range() and use ioremap_page_range()Christophe Leroy
book3s64's ioremap_range() is almost same as fallback ioremap_range(), except that it calls radix__ioremap_range() when radix is enabled. radix__ioremap_range() is also very similar to the other ones, expect that it calls ioremap_page_range when slab is available. PPC32 __ioremap_caller() have a loop doing the same thing as ioremap_range() so use it on PPC32 as well. Lets keep only one version of ioremap_range() which calls ioremap_page_range() on all platforms when slab is available. At the same time, drop the nid parameter which is not used. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4b1dca7096b01823b101be7338983578641547f1.1566309263.git.christophe.leroy@c-s.fr
2019-08-27powerpc/mm: make ioremap_bot common to allChristophe Leroy
Drop multiple definitions of ioremap_bot and make one common to all subarches. Only CONFIG_PPC_BOOK3E_64 had a global static init value for ioremap_bot. Now ioremap_bot is set in early_init_mmu_global(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/920eebfd9f36f14c79d1755847f5bf7c83703bdd.1566309262.git.christophe.leroy@c-s.fr
2019-08-27powerpc/mm: rework io-workaround invocation.Christophe Leroy
ppc_md.ioremap() is only used for I/O workaround on CELL platform, so indirect function call can be avoided. This patch reworks the io-workaround and ioremap() functions to use the global 'io_workaround_inited' flag for the activation of io-workaround. When CONFIG_PPC_IO_WORKAROUNDS or CONFIG_PPC_INDIRECT_MMIO are not selected, the I/O workaround ioremap() voids and the global flag is not used. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5fa3ef069fbd0f152512afaae19e7a60161454cf.1566309262.git.christophe.leroy@c-s.fr
2019-08-27powerpc/mm: drop function __ioremap()Christophe Leroy
__ioremap() is not used anymore, drop it. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ccc439f481a0884e00a6be1bab44bab2a4477fea.1566309262.git.christophe.leroy@c-s.fr
2019-08-27powerpc/mm: drop ppc_md.iounmap() and __iounmap()Christophe Leroy
ppc_md.iounmap() is never set, drop it. Once ppc_md.iounmap() is gone, iounmap() remains the only user of __iounmap() and iounmap() does nothing else than calling __iounmap(). So drop iounmap() and make __iounmap() the new iounmap(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d73ba92bb7a387cc58cc34666d7f5158a45851b0.1566309262.git.christophe.leroy@c-s.fr
2019-08-27powerpc: remove the ppc44x ocm.c fileChristoph Hellwig
The on chip memory allocator is entirely unused in the kernel tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7b1668941ad1041d08b19167030868de5840b153.1566309262.git.christophe.leroy@c-s.fr
2019-08-27KVM: PPC: Book3S: Enable XIVE native capability only if OPAL has required ↵Paul Mackerras
functions There are some POWER9 machines where the OPAL firmware does not support the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls. The impact of this is that a guest using XIVE natively will not be able to be migrated successfully. On the source side, the get_attr operation on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute will fail; on the destination side, the set_attr operation for the same attribute will fail. This adds tests for the existence of the OPAL get/set queue state functions, and if they are not supported, the XIVE-native KVM device is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false. Userspace can then either provide a software emulation of XIVE, or else tell the guest that it does not have a XIVE controller available to it. Cc: stable@vger.kernel.org # v5.2+ Fixes: 3fab2d10588e ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode") Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-08-23KVM: PPC: Book3S HV: Define usage types for rmap array in guest memslotSuraj Jitindar Singh
The rmap array in the guest memslot is an array of size number of guest pages, allocated at memslot creation time. Each rmap entry in this array is used to store information about the guest page to which it corresponds. For example for a hpt guest it is used to store a lock bit, rc bits, a present bit and the index of a hpt entry in the guest hpt which maps this page. For a radix guest which is running nested guests it is used to store a pointer to a linked list of nested rmap entries which store the nested guest physical address which maps this guest address and for which there is a pte in the shadow page table. As there are currently two uses for the rmap array, and the potential for this to expand to more in the future, define a type field (being the top 8 bits of the rmap entry) to be used to define the type of the rmap entry which is currently present and define two values for this field for the two current uses of the rmap array. Since the nested case uses the rmap entry to store a pointer, define this type as having the two high bits set as is expected for a pointer. Define the hpt entry type as having bit 56 set (bit 7 IBM bit ordering). Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-08-22Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM changes from Paul E. McKenney: - A few more RCU flavor consolidation cleanups. - Miscellaneous fixes. - Updates to RCU's list-traversal macros improving lockdep usability. - Torture-test updates. - Forward-progress improvements for no-CBs CPUs: Avoid ignoring incoming callbacks during grace-period waits. - Forward-progress improvements for no-CBs CPUs: Use ->cblist structure to take advantage of others' grace periods. - Also added a small commit that avoids needlessly inflicting scheduler-clock ticks on callback-offloaded CPUs. - Forward-progress improvements for no-CBs CPUs: Reduce contention on ->nocb_lock guarding ->cblist. - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass list to further reduce contention on ->nocb_lock guarding ->cblist. - LKMM updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-22powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse()Sam Bobroff
There are no users of the early-out return value from eeh_pe_dev_traverse(), so remove it. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c648070f5b28fe8ca1880b48e64b267959ffd369.1565930772.git.sbobroff@linux.ibm.com
2019-08-22powerpc/eeh: Convert log messages to eeh_edev_* macrosSam Bobroff
Convert existing messages, where appropriate, to use the eeh_edev_* logging macros. The only effect should be minor adjustments to the log messages, apart from: - A new message in pseries_eeh_probe() "Probing device" to match the powernv case. - The "Probing device" message in pnv_eeh_probe() is now generated slightly later, which will mean that it is no longer emitted for devices that aren't probed due to the initial checks. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ce505a0a7a4a5b0367f0f40f8b26e7c0a9cf4cb7.1565930772.git.sbobroff@linux.ibm.com
2019-08-22powerpc/eeh: Introduce EEH edev logging macrosSam Bobroff
Now that struct eeh_dev includes the BDFN of it's PCI device, make use of it to replace eeh_edev_info() with a set of dev_dbg()-style macros that only need a struct edev. With the BDFN available without the struct pci_dev, eeh_pci_name() is now unnecessary, so remove it. While only the "info" level function is used here, the others will be used in followup work. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f90ae9a53d762be7b0ccbad79e62b5a1b4f4996e.1565930772.git.sbobroff@linux.ibm.com
2019-08-22powerpc/eeh: Add bdfn field to eeh_devOliver O'Halloran
Preparation for removing pci_dn from the powernv EEH code. The only thing we really use pci_dn for is to get the bdfn of the device for config space accesses, so adding that information to eeh_dev reduces the need to carry around the pci_dn. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [SB: Re-wrapped commit message, fixed whitespace damage.] Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e458eb69a1f591d8a120782f23a8506b15d3c654.1565930772.git.sbobroff@linux.ibm.com
2019-08-22powerpc/eeh: Refactor around eeh_probe_devices()Sam Bobroff
Now that EEH support for all devices (on PowerNV and pSeries) is provided by the pcibios bus add device hooks, eeh_probe_devices() and eeh_addr_cache_build() are redundant and can be removed. Move the EEH enabled message into it's own function so that it can be called from multiple places. Note that previously on pSeries, useless EEH sysfs files were created for some devices that did not have EEH support and this change prevents them from being created. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/33b0a6339d5ac88693de092d6fba984f2a5add66.1565930772.git.sbobroff@linux.ibm.com
2019-08-22powerpc/eeh: Initialize EEH address cache earlierSam Bobroff
The EEH address cache is currently initialized and populated by a single function: eeh_addr_cache_build(). While the initial population of the cache can only be done once resources are allocated, initialization (just setting up a spinlock) could be done much earlier. So move the initialization step into a separate function and call it from a core_initcall (rather than a subsys initcall). This will allow future work to make use of the cache during boot time PCI scanning. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0557206741bffee76cdfff042f65321f6f7a5b41.1565930772.git.sbobroff@linux.ibm.com
2019-08-21powerpc: add machine check safe copy_to_userSantosh Sivaraj
Use memcpy_mcsafe() implementation to define copy_to_user_mcsafe() Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820081352.8641-8-santosh@fossix.org
2019-08-21powerpc/memcpy: Add memcpy_mcsafe for pmemBalbir Singh
The pmem infrastructure uses memcpy_mcsafe in the pmem layer so as to convert machine check exceptions into a return value on failure in case a machine check exception is encountered during the memcpy. The return value is the number of bytes remaining to be copied. This patch largely borrows from the copyuser_power7 logic and does not add the VMX optimizations, largely to keep the patch simple. If needed those optimizations can be folded in. Signed-off-by: Balbir Singh <bsingharora@gmail.com> [arbab@linux.ibm.com: Added symbol export] Co-developed-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820081352.8641-7-santosh@fossix.org
2019-08-21powerpc/mce: Handle UE event for memcpy_mcsafeBalbir Singh
If we take a UE on one of the instructions with a fixup entry, set nip to continue execution at the fixup entry. Stop processing the event further or print it. Co-developed-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820081352.8641-6-santosh@fossix.org
2019-08-20powerpc/64s/radix: Remove redundant pfn_pte bitop, add VM_BUG_ONNicholas Piggin
pfn_pte is never given a pte above the addressable physical memory limit, so the masking is redundant. In case of a software bug, it is not obviously better to silently truncate the pfn than to corrupt the pte (either one will result in memory corruption or crashes), so there is no reason to add this to the fast path. Add VM_BUG_ON to catch cases where the pfn is invalid. These would catch the create_section_mapping bug fixed by a previous commit. [16885.256466] ------------[ cut here ]------------ [16885.256492] kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! cpu 0x0: Vector: 700 (Program Check) at [c0000000ee0a36d0] pc: c000000000080738: __map_kernel_page+0x248/0x6f0 lr: c000000000080ac0: __map_kernel_page+0x5d0/0x6f0 sp: c0000000ee0a3960 msr: 9000000000029033 current = 0xc0000000ec63b400 paca = 0xc0000000017f0000 irqmask: 0x03 irq_happened: 0x01 pid = 85, comm = sh kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! Linux version 5.3.0-rc1-00001-g0fe93e5f3394 enter ? for help [c0000000ee0a3a00] c000000000d37378 create_physical_mapping+0x260/0x360 [c0000000ee0a3b10] c000000000d370bc create_section_mapping+0x1c/0x3c [c0000000ee0a3b30] c000000000071f54 arch_add_memory+0x74/0x130 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190724084638.24982-5-npiggin@gmail.com
2019-08-20powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addressesNicholas Piggin
Ensure __va is given a physical address below PAGE_OFFSET, and __pa is given a virtual address above PAGE_OFFSET. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190724084638.24982-4-npiggin@gmail.com
2019-08-20powerpc/64: allow compiler to cache 'current'Nicholas Piggin
current may be cached by the compiler, so remove the volatile asm restriction. This results in better generated code, as well as being smaller and fewer dependent loads, it can avoid store-hit-load flushes like this one that shows up in irq_exit(): preempt_count_sub(HARDIRQ_OFFSET); if (!in_interrupt() && ...) Which ends up as: ((struct thread_info *)current)->preempt_count -= HARDIRQ_OFFSET; if (((struct thread_info *)current)->preempt_count ... Evaluating current twice presently means it has to be loaded twice, and here gcc happens to pick a different register each time, then preempt_count is accessed via that base register: 1058: ld r10,2392(r13) <-- current 105c: lwz r9,0(r10) <-- preempt_count 1060: addis r9,r9,-1 1064: stw r9,0(r10) <-- preempt_count 1068: ld r9,2392(r13) <-- current 106c: lwz r9,0(r9) <-- preempt_count 1070: rlwinm. r9,r9,0,11,23 1074: bne 1090 <irq_exit+0x60> This can frustrate store-hit-load detection heuristics and cause flushes. Allowing the compiler to cache current in a reigster with this patch results in the same base register being used for all accesses, which is more likely to be detected as an alias: 1058: ld r31,2392(r13) ... 1070: lwz r9,0(r31) 1074: addis r9,r9,-1 1078: stw r9,0(r31) 107c: lwz r9,0(r31) 1080: rlwinm. r9,r9,0,11,23 1084: bne 10a0 <irq_exit+0x60> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190612140317.24490-1-npiggin@gmail.com