summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2018-05-25powerpc/mm: Use instruction symbolic names in store_updates_sp()Christophe Leroy
Use symbolic names defined in asm/ppc-opcode.h instead of hardcoded values. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-24bpf: powerpc64: add JIT support for multi-function programsSandipan Das
This adds support for bpf-to-bpf function calls in the powerpc64 JIT compiler. The JIT compiler converts the bpf call instructions to native branch instructions. After a round of the usual passes, the start addresses of the JITed images for the callee functions are known. Finally, to fixup the branch target addresses, we need to perform an extra pass. Because of the address range in which JITed images are allocated on powerpc64, the offsets of the start addresses of these images from __bpf_call_base are as large as 64 bits. So, for a function call, we cannot use the imm field of the instruction to determine the callee's address. Instead, we use the alternative method of getting it from the list of function addresses in the auxiliary data of the caller by using the off field as an index. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24bpf: powerpc64: pad function address loads with NOPsSandipan Das
For multi-function programs, loading the address of a callee function to a register requires emitting instructions whose count varies from one to five depending on the nature of the address. Since we come to know of the callee's address only before the extra pass, the number of instructions required to load this address may vary from what was previously generated. This can make the JITed image grow or shrink. To avoid this, we should generate a constant five-instruction when loading function addresses by padding the optimized load sequence with NOPs. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24powerpc: Export tm_enable()/tm_disable/tm_abort() APIsSimon Guo
This patch exports tm_enable()/tm_disable/tm_abort() APIs, which will be used for PR KVM transactional memory logic. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-24powerpc/reg: Add TEXASR related macrosSimon Guo
This patches add some macros for CR0/TEXASR bits so that PR KVM TM logic (tbegin./treclaim./tabort.) can make use of them later. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-24powerpc: Export msr_check_and_set() to modulesSimon Guo
PR KVM will need to reuse msr_check_and_set(). This patch exports this API for reuse. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-22KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction mmio emulation with ↵Simon Guo
analyse_instr() input This patch reimplements LOAD_VMX/STORE_VMX MMIO emulation with analyse_instr() input. When emulating the store, the VMX reg will need to be flushed so that the right reg val can be retrieved before writing to IO MEM. This patch also adds support for lvebx/lvehx/lvewx/stvebx/stvehx/stvewx MMIO emulation. To meet the requirement of handling different element sizes, kvmppc_handle_load128_by2x64()/kvmppc_handle_store128_by2x64() were replaced with kvmppc_handle_vmx_load()/kvmppc_handle_vmx_store(). The framework used is similar to VSX instruction MMIO emulation. Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22KVM: PPC: Expand mmio_vsx_copy_type to cover VMX load/store element typesSimon Guo
VSX MMIO emulation uses mmio_vsx_copy_type to represent VSX emulated element size/type, such as KVMPPC_VSX_COPY_DWORD_LOAD, etc. This patch expands mmio_vsx_copy_type to cover VMX copy type, such as KVMPPC_VMX_COPY_BYTE(stvebx/lvebx), etc. As a result, mmio_vsx_copy_type is also renamed to mmio_copy_type. It is a preparation for reimplementing VMX MMIO emulation. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22KVM: PPC: Reimplement LOAD_VSX/STORE_VSX instruction mmio emulation with ↵Simon Guo
analyse_instr() input This patch reimplements LOAD_VSX/STORE_VSX instruction MMIO emulation with analyse_instr() input. It utilizes VSX_FPCONV/VSX_SPLAT/SIGNEXT exported by analyse_instr() and handle accordingly. When emulating VSX store, the VSX reg will need to be flushed so that the right reg val can be retrieved before writing to IO MEM. [paulus@ozlabs.org - mask the register number to 5 bits.] Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22KVM: PPC: Reimplement LOAD_FP/STORE_FP instruction mmio emulation with ↵Simon Guo
analyse_instr() input This patch reimplements LOAD_FP/STORE_FP instruction MMIO emulation with analyse_instr() input. It utilizes the FPCONV/UPDATE properties exported by analyse_instr() and invokes kvmppc_handle_load(s)/kvmppc_handle_store() accordingly. For FP store MMIO emulation, the FP regs need to be flushed firstly so that the right FP reg vals can be read from vcpu->arch.fpr, which will be stored into MMIO data. Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22KVM: PPC: Add giveup_ext() hook to PPC KVM opsSimon Guo
Currently HV will save math regs(FP/VEC/VSX) when trap into host. But PR KVM will only save math regs when qemu task switch out of CPU, or when returning from qemu code. To emulate FP/VEC/VSX mmio load, PR KVM need to make sure that math regs were flushed firstly and then be able to update saved VCPU FPR/VEC/VSX area reasonably. This patch adds giveup_ext() field to KVM ops. Only PR KVM has non-NULL giveup_ext() ops. kvmppc_complete_mmio_load() can invoke that hook (when not NULL) to flush math regs accordingly, before updating saved register vals. Math regs flush is also necessary for STORE, which will be covered in later patch within this patch series. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22KVM: PPC: Reimplement non-SIMD LOAD/STORE instruction mmio emulation with ↵Simon Guo
analyse_instr() input This patch reimplements non-SIMD LOAD/STORE instruction MMIO emulation with analyse_instr() input. It utilizes the BYTEREV/UPDATE/SIGNEXT properties exported by analyse_instr() and invokes kvmppc_handle_load(s)/kvmppc_handle_store() accordingly. It also moves CACHEOP type handling into the skeleton. instruction_type within kvm_ppc.h is renamed to avoid conflict with sstep.h. Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-22KVM: PPC: Add KVMPPC_VSX_COPY_WORD_LOAD_DUMP type support for mmio emulationSimon Guo
Some VSX instructions like lxvwsx will splat word into VSR. This patch adds a new VSX copy type KVMPPC_VSX_COPY_WORD_LOAD_DUMP to support this. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-21powerpc/64s: Add support for a store forwarding barrier at kernel entry/exitNicholas Piggin
On some CPUs we can prevent a vulnerability related to store-to-load forwarding by preventing store forwarding between privilege domains, by inserting a barrier in kernel entry and exit paths. This is known to be the case on at least Power7, Power8 and Power9 powerpc CPUs. Barriers must be inserted generally before the first load after moving to a higher privilege, and after the last store before moving to a lower privilege, HV and PR privilege transitions must be protected. Barriers are added as patch sections, with all kernel/hypervisor entry points patched, and the exit points to lower privilge levels patched similarly to the RFI flush patching. Firmware advertisement is not implemented yet, so CPU flush types are hard coded. Thanks to Michal Suchánek for bug fixes and review. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
S390 bpf_jit.S is removed in net-next and had changes in 'net', since that code isn't used any more take the removal. TLS data structures split the TX and RX components in 'net-next', put the new struct members from the bug fix in 'net' into the RX part. The 'net-next' tree had some reworking of how the ERSPAN code works in the GRE tunneling code, overlapping with a one-line headroom calculation fix in 'net'. Overlapping changes in __sock_map_ctx_update_elem(), keep the bits that read the prog members via READ_ONCE() into local variables before using them. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-21powernv: opal-sensor: Add support to read 64bit sensor valuesShilpasri G Bhat
This patch adds support to read 64-bit sensor values. This method is used to read energy sensors and counters which are of type u64. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-21powerpc/fsl/dts: fix the i2c-mux compatible for t104xqdsPeter Rosin
The sanctioned compatible is "nxp,pca9547". Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-21powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREGMichael Neuling
In commit e2a800beaca1 ("powerpc/hw_brk: Fix off by one error when validating DAWR region end") we fixed setting the DAWR end point to its max value via PPC_PTRACE_SETHWDEBUG. Unfortunately we broke PTRACE_SET_DEBUGREG when setting a 512 byte aligned breakpoint. PTRACE_SET_DEBUGREG currently sets the length of the breakpoint to zero (memset() in hw_breakpoint_init()). This worked with arch_validate_hwbkpt_settings() before the above patch was applied but is now broken if the breakpoint is 512byte aligned. This sets the length of the breakpoint to 8 bytes when using PTRACE_SET_DEBUGREG. Fixes: e2a800beaca1 ("powerpc/hw_brk: Fix off by one error when validating DAWR region end") Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-21powerpc/ptrace: Fix enforcement of DAWR constraintsMichael Neuling
Back when we first introduced the DAWR, in commit 4ae7ebe9522a ("powerpc: Change hardware breakpoint to allow longer ranges"), we screwed up the constraint making it a 1024 byte boundary rather than a 512. This makes the check overly permissive. Fortunately GDB is the only real user and it always did they right thing, so we never noticed. This fixes the constraint to 512 bytes. Fixes: 4ae7ebe9522a ("powerpc: Change hardware breakpoint to allow longer ranges") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-19Merge branch 'linus' into timers/2038Thomas Gleixner
Merge upstream to pick up changes on which pending patches depend on.
2018-05-18powerpc/powernv: Use __raw_[rm_]writeq_be() in npu-dma.cMichael Ellerman
This allows us to squash some sparse warnings and also avoids having to do explicity endian conversions in the code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
2018-05-18powerpc/powernv: Use __raw_[rm_]writeq_be() in pci-ioda.cMichael Ellerman
This allows us to squash some sparse warnings and also avoids having to do explicity endian conversions in the code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
2018-05-18powerpc/io: Add __raw_writeq_be() __raw_rm_writeq_be()Michael Ellerman
Add byte-swapping versions of __raw_writeq() and __raw_rm_writeq(). This allows us to avoid sparse warnings caused by passing __be64 to __raw_writeq(), which takes unsigned long: arch/powerpc/platforms/powernv/pci-ioda.c:1981:38: warning: incorrect type in argument 1 (different base types) expected unsigned long [unsigned] v got restricted __be64 [usertype] <noident> It's also generally preferable to use a byte-swapping accessor rather than doing it by hand in the code, which is more bug prone. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
2018-05-18powerpc/perf: Fix memory allocation for core-imc based on num_possible_cpus()Anju T Sudhakar
Currently memory is allocated for core-imc based on cpu_present_mask, which has bit 'cpu' set iff cpu is populated. We use (cpu number / threads per core) as the array index to access the memory. Under some circumstances firmware marks a CPU as GUARDed CPU and boot the system, until cleared of errors, these CPU's are unavailable for all subsequent boots. GUARDed CPUs are possible but not present from linux view, so it blows a hole when we assume the max length of our allocation is driven by our max present cpus, where as one of the cpus might be online and be beyond the max present cpus, due to the hole. So (cpu number / threads per core) value bounds the array index and leads to memory overflow. Call trace observed during a guard test: Faulting instruction address: 0xc000000000149f1c cpu 0x69: Vector: 380 (Data Access Out of Range) at [c000003fea303420] pc:c000000000149f1c: prefetch_freepointer+0x14/0x30 lr:c00000000014e0f8: __kmalloc+0x1a8/0x1ac sp:c000003fea3036a0 msr:9000000000009033 dar:c9c54b2c91dbf6b7 current = 0xc000003fea2c0000 paca = 0xc00000000fddd880 softe: 3 irq_happened: 0x01 pid = 1, comm = swapper/104 Linux version 4.16.7-openpower1 (smc@smc-desktop) (gcc version 6.4.0 (Buildroot 2018.02.1-00006-ga8d1126)) #2 SMP Fri May 4 16:44:54 PDT 2018 enter ? for help call trace: __kmalloc+0x1a8/0x1ac (unreliable) init_imc_pmu+0x7f4/0xbf0 opal_imc_counters_probe+0x3fc/0x43c platform_drv_probe+0x48/0x80 driver_probe_device+0x22c/0x308 __driver_attach+0xa0/0xd8 bus_for_each_dev+0x88/0xb4 driver_attach+0x2c/0x40 bus_add_driver+0x1e8/0x228 driver_register+0xd0/0x114 __platform_driver_register+0x50/0x64 opal_imc_driver_init+0x24/0x38 do_one_initcall+0x150/0x15c kernel_init_freeable+0x250/0x254 kernel_init+0x1c/0x150 ret_from_kernel_thread+0x5c/0xc8 Allocating memory for core-imc based on cpu_possible_mask, which has bit 'cpu' set iff cpu is populatable, will fix this issue. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Fixes: 39a846db1d57 ("powerpc/perf: Add core IMC PMU support") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18powerpc/rtas: Fix spelling mistake "Discharching" -> "Discharging"Colin Ian King
Trivial fix to spelling mistake in battery_charging array. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18KVM: PPC: Book3S PR: Enable use on POWER9 inside HPT-mode guestsPaul Mackerras
This relaxes the restriction on using PR KVM on POWER9. The existing code does work inside a guest partition running in HPT mode, because hypercalls such as H_ENTER use the old HPTE format, not the new format used by POWER9, and so no change to PR KVM's HPT manipulation code is required. PR KVM will still refuse to run if the kernel is using radix translation or if it is running bare-metal. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18powerpc/64s: Clear PCR on bootMichael Neuling
Clear the PCR (Processor Compatibility Register) on boot to ensure we are not running in a compatibility mode. We've seen this cause problems when a crash (and kdump) occurs while running compat mode guests. The kdump kernel then runs with the PCR set and causes problems. The symptom in the kdump kernel (also seen in petitboot after fast-reboot) is early userspace programs taking sigills on newer instructions (seen in libc). Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18powerpc/lib: Fix "integer constant is too large" build failureFinn Thain
My powerpc-linux-gnu-gcc v4.4.5 compiler can't build a 32-bit kernel any more: arch/powerpc/lib/sstep.c: In function 'do_popcnt': arch/powerpc/lib/sstep.c:1068: error: integer constant is too large for 'long' type arch/powerpc/lib/sstep.c:1069: error: integer constant is too large for 'long' type arch/powerpc/lib/sstep.c:1069: error: integer constant is too large for 'long' type arch/powerpc/lib/sstep.c:1070: error: integer constant is too large for 'long' type arch/powerpc/lib/sstep.c:1079: error: integer constant is too large for 'long' type arch/powerpc/lib/sstep.c: In function 'do_prty': arch/powerpc/lib/sstep.c:1117: error: integer constant is too large for 'long' type This file gets compiled with -std=gnu89 which means a constant can be given the type 'long' even if it won't fit. Fix the errors with a 'ULL' suffix on the relevant constants. Fixes: 2c979c489fee ("powerpc/lib/sstep: Add prty instruction emulation") Fixes: dcbd19b48d31 ("powerpc/lib/sstep: Add popcnt instruction emulation") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18KVM: PPC: Book3S HV: Send kvmppc_bad_interrupt NMIs to Linux handlersNicholas Piggin
It's possible to take a SRESET or MCE in these paths due to a bug in the host code or a NMI IPI, etc. A recent bug attempting to load a virtual address from real mode gave th complete but cryptic error, abridged: Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580 REGS: c000000fff76dd80 TRAP: 0200 Not tainted MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000 CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3 NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0 LR [c0000000000c2430] do_tlbies+0x230/0x2f0 Sending the NMIs through the Linux handlers gives a nicer output: Severe Machine check interrupt [Not recovered] NIP [c0000000000155ac]: perf_trace_tlbie+0x2c/0x1a0 Initiator: CPU Error type: Real address [Load (bad)] Effective address: d00017fffcc01a28 opal: Machine check interrupt unrecoverable: MSR(RI=0) opal: Hardware platform error: Unrecoverable Machine Check exception CPU: 0 PID: 6700 Comm: qemu-system-ppc Tainted: G M NIP: c0000000000155ac LR: c0000000000c23c0 CTR: c000000000015580 REGS: c000000fff9e9d80 TRAP: 0200 Tainted: G M MSR: 9000000000201001 <SF,HV,ME,LE> CR: 48082222 XER: 00000000 CFAR: 000000010cbc1a30 DAR: d00017fffcc01a28 DSISR: 00000040 SOFTE: 3 NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0 LR [c0000000000c23c0] do_tlbies+0x1c0/0x280 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Book3S HV: Fix kvmppc_bad_host_intr for real mode interruptsNicholas Piggin
When CONFIG_RELOCATABLE=n, the Linux real mode interrupt handlers call into KVM using real address. This needs to be translated to the kernel linear effective address before the MMU is switched on. kvmppc_bad_host_intr misses adding these bits, so when it is used to handle a system reset interrupt (that always gets delivered in real mode), it results in an instruction access fault immediately after the MMU is turned on. Fix this by ensuring the top 2 address bits are set when the MMU is turned on. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Book3S HV: radix: Do not clear partition PTE when RC or write bits ↵Nicholas Piggin
do not match Adding the write bit and RC bits to pte permissions does not require a pte clear and flush. There should not be other bits changed here, because restricting access or changing the PFN must have already invalidated any existing ptes (otherwise the race is already lost). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Book3S HV: radix: Refine IO region partition scope attributesNicholas Piggin
When the radix fault handler has no page from the process address space (e.g., for IO memory), it looks up the process pte and sets partition table pte using that to get attributes like CI and guarded. If the process table entry is to be writable, set _PAGE_DIRTY as well to avoid an RC update. If not, then ensure _PAGE_DIRTY does not come across. Set _PAGE_ACCESSED as well to avoid RC update. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with ↵Nicholas Piggin
relocation on The radix guest code can has fewer restrictions about what context it can run in, so move this flushing out of assembly and have it use the Linux TLB flush implementations introduced previously. This allows powerpc:tlbie trace events to be used. This changes the tlbiel sequence to only execute RIC=2 flush once on the first set flushed, then RIC=0 for the rest of the sets. The end result of the flush should be unchanged. This matches the local PID flush pattern that was introduced in a5998fcb92 ("powerpc/mm/radix: Optimise tlbiel flush all case"). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Book3S HV: Make radix use the Linux translation flush functions ↵Nicholas Piggin
for partition scope This has the advantage of consolidating TLB flush code in fewer places, and it also implements powerpc:tlbie trace events. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Book3S HV: Recursively unmap all page table entries when unmappingNicholas Piggin
When partition scope mappings are unmapped with kvm_unmap_radix, the pte is cleared, but the page table structure is left in place. If the next page fault requests a different page table geometry (e.g., due to THP promotion or split), kvmppc_create_pte is responsible for changing the page tables. When a page table entry is to be converted to a large pte, the page table entry is cleared, the PWC flushed, then the page table it points to freed. This will cause pte page tables to leak when a 1GB page is to replace a pud entry points to a pmd table with pte tables under it: The pmd table will be freed, but its pte tables will be missed. Fix this by replacing the simple clear and free code with one that walks down the page tables and frees children. Care must be taken to clear the root entry being unmapped then flushing the PWC before freeing any page tables, as explained in comments. This requires PWC flush to logically become a flush-all-PWC (which it already is in hardware, but the KVM API needs to be changed to avoid confusion). This code also checks that no unexpected pte entries exist in any page table being freed, and unmaps those and emits a WARN. This is an expensive operation for the pte page level, but partition scope changes are rare, so it's unconditional for now to iron out bugs. It can be put under a CONFIG option or removed after some time. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Book3S HV: Use a helper to unmap ptes in the radix fault pathNicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Book3S HV: Lockless tlbie for HPT hcallsNicholas Piggin
tlbies to an LPAR do not have to be serialised since POWER4/PPC970, after which the MMU_FTR_LOCKLESS_TLBIE feature was introduced to avoid tlbie locking. Since commit c17b98cf6028 ("KVM: PPC: Book3S HV: Remove code for PPC970 processors"), KVM no longer supports processors that do not have this feature, so the tlbie locking can be removed completely. A sanity check for the feature is put in kvmppc_mmu_hv_init. Testing was done on a POWER9 system in HPT mode, with a -smp 32 guest in HPT mode. 32 instances of the powerpc fork benchmark from selftests were run with --fork, and the results measured. Without this patch, total throughput was about 13.5K/sec, and this is the top of the host profile: 74.52% [k] do_tlbies 2.95% [k] kvmppc_book3s_hv_page_fault 1.80% [k] calc_checksum 1.80% [k] kvmppc_vcpu_run_hv 1.49% [k] kvmppc_run_core After this patch, throughput was about 51K/sec, with this profile: 21.28% [k] do_tlbies 5.26% [k] kvmppc_run_core 4.88% [k] kvmppc_book3s_hv_page_fault 3.30% [k] _raw_spin_lock_irqsave 3.25% [k] gup_pgd_range Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Fix a mmio_host_swabbed uninitialized usage issueSimon Guo
When KVM emulates VMX store, it will invoke kvmppc_get_vmx_data() to retrieve VMX reg val. kvmppc_get_vmx_data() will check mmio_host_swabbed to decide which double word of vr[] to be used. But the mmio_host_swabbed can be uninitialized during VMX store procedure: kvmppc_emulate_loadstore \- kvmppc_handle_store128_by2x64 \- kvmppc_get_vmx_data So vcpu->arch.mmio_host_swabbed is not meant to be used at all for emulation of store instructions, and this patch makes that true for VMX stores. This patch also initializes mmio_host_swabbed to avoid possible future problems. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Move nip/ctr/lr/xer registers to pt_regs in kvm_vcpu_archSimon Guo
This patch moves nip/ctr/lr/xer registers from scattered places in kvm_vcpu_arch to pt_regs structure. cr register is "unsigned long" in pt_regs and u32 in vcpu->arch. It will need more consideration and may move in later patches. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18KVM: PPC: Add pt_regs into kvm_vcpu_arch and move vcpu->arch.gpr[] into itSimon Guo
Current regs are scattered at kvm_vcpu_arch structure and it will be more neat to organize them into pt_regs structure. Also it will enable reimplementation of MMIO emulation code with analyse_instr() later. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras
This merges in the ppc-kvm topic branch of the powerpc repository to get some changes on which future patches will depend, in particular the definitions of various new TLB flushing functions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-18powerpc/powernv: Fix NVRAM sleep in invalid context when crashingNicholas Piggin
Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special case the delay to avoid sleeping in invalid context. Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops") Cc: stable@vger.kernel.org # v3.2 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18powerpc/powernv: Fix opal_event_shutdown() called with interrupts disabledNicholas Piggin
A kernel crash in process context that calls emergency_restart from panic will end up calling opal_event_shutdown with interrupts disabled but not in interrupt. This causes a sleeping function to be called which gives the following warning with sysrq+c: Rebooting in 10 seconds.. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 in_atomic(): 0, irqs_disabled(): 1, pid: 7669, name: bash CPU: 20 PID: 7669 Comm: bash Tainted: G D W 4.17.0-rc5+ #3 Call Trace: dump_stack+0xb0/0xf4 (unreliable) ___might_sleep+0x174/0x1a0 mutex_lock+0x38/0xb0 __free_irq+0x68/0x460 free_irq+0x70/0xc0 opal_event_shutdown+0xb4/0xf0 opal_shutdown+0x24/0xa0 pnv_shutdown+0x28/0x40 machine_shutdown+0x44/0x60 machine_restart+0x28/0x80 emergency_restart+0x30/0x50 panic+0x2a0/0x328 oops_end+0x1ec/0x1f0 bad_page_fault+0xe8/0x154 handle_page_fault+0x34/0x38 --- interrupt: 300 at sysrq_handle_crash+0x44/0x60 LR = __handle_sysrq+0xfc/0x260 flag_spec.62335+0x12b844/0x1e8db4 (unreliable) __handle_sysrq+0xfc/0x260 write_sysrq_trigger+0xa8/0xb0 proc_reg_write+0xac/0x110 __vfs_write+0x6c/0x240 vfs_write+0xd0/0x240 ksys_write+0x6c/0x110 Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18powerpc/32: Use stmw/lmw for registers save/restore in asmChristophe Leroy
arch/powerpc/Makefile activates -mmultiple on BE PPC32 configs in order to use multiple word instructions in functions entry/exit. The patch does the same for the asm parts, for consistency. On processors like the 8xx on which insn fetching is pretty slow, this speeds up registers save/restore. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: PPC32 is BE only, so drop the endian checks] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18powerpc: Avoid an unnecessary test and branch in longjmp()Christophe Leroy
Doing the test at exit of the function avoids an unnecessary test and branch inside longjmp(). Semantics are unchanged. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18Revert "powerpc/64: Fix checksum folding in csum_add()"Christophe Leroy
This reverts commit 6ad966d7303b70165228dba1ee8da1a05c10eefe. That commit was pointless, because csum_add() sums two 32 bits values, so the sum is 0x1fffffffe at the maximum. And then when adding upper part (1) and lower part (0xfffffffe), the result is 0xffffffff which doesn't carry. Any lower value will not carry either. And behind the fact that this commit is useless, it also kills the whole purpose of having an arch specific inline csum_add() because the resulting code gets even worse than what is obtained with the generic implementation of csum_add() 0000000000000240 <.csum_add>: 240: 38 00 ff ff li r0,-1 244: 7c 84 1a 14 add r4,r4,r3 248: 78 00 00 20 clrldi r0,r0,32 24c: 78 89 00 22 rldicl r9,r4,32,32 250: 7c 80 00 38 and r0,r4,r0 254: 7c 09 02 14 add r0,r9,r0 258: 78 09 00 22 rldicl r9,r0,32,32 25c: 7c 00 4a 14 add r0,r0,r9 260: 78 03 00 20 clrldi r3,r0,32 264: 4e 80 00 20 blr In comparison, the generic implementation of csum_add() gives: 0000000000000290 <.csum_add>: 290: 7c 63 22 14 add r3,r3,r4 294: 7f 83 20 40 cmplw cr7,r3,r4 298: 7c 10 10 26 mfocrf r0,1 29c: 54 00 ef fe rlwinm r0,r0,29,31,31 2a0: 7c 60 1a 14 add r3,r0,r3 2a4: 78 63 00 20 clrldi r3,r3,32 2a8: 4e 80 00 20 blr And the reverted implementation for PPC64 gives: 0000000000000240 <.csum_add>: 240: 7c 84 1a 14 add r4,r4,r3 244: 78 80 00 22 rldicl r0,r4,32,32 248: 7c 80 22 14 add r4,r0,r4 24c: 78 83 00 20 clrldi r3,r4,32 250: 4e 80 00 20 blr Fixes: 6ad966d7303b7 ("powerpc/64: Fix checksum folding in csum_add()") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-18powerpc: get rid of PMD_PAGE_SIZE() and _PMD_SIZEChristophe Leroy
PMD_PAGE_SIZE() is nowhere used and _PMD_SIZE is only used by PMD_PAGE_SIZE(). This patch removes them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-17powerpc: Allow LD_DEAD_CODE_DATA_ELIMINATION to be selectedNicholas Piggin
This requires further changes to linker script to KEEP some tables and wildcard compiler generated sections into the right place. This includes pp32 modifications from Christophe Leroy. When compiling powernv_defconfig with this option, the resulting kernel is almost 400kB smaller (and still boots): text data bss dec filename 11827621 4810490 1341080 17979191 vmlinux 11752437 4598858 1338776 17690071 vmlinux.dcde Mathieu's numbers for custom Mac Mini G4 config has almost 200kB saving. It also had some increase in vmlinux size for as-yet unknown reasons. text data bss dec filename 7461457 2475122 1428064 11364643 vmlinux 7386425 2364370 1425432 11176227 vmlinux.dcde Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [8xx] Tested-by: Mathieu Malaterre <malat@debian.org> [32-bit powermac] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-05-17KVM: PPC: Book3S: Change return type to vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit 1c8f422059ae ("mm: change return type to vm_fault_t") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S: Check KVM_CREATE_SPAPR_TCE_64 parametersAlexey Kardashevskiy
Although it does not seem possible to break the host by passing bad parameters when creating a TCE table in KVM, it is still better to get an early clear indication of that than debugging weird effect this might bring. This adds some sanity checks that the page size is 4KB..16GB as this is what the actual LoPAPR supports and that the window actually fits 64bit space. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>