path: root/arch/powerpc
Age  Commit message  Author
2018-08-02  Merge branch 'perf/urgent' into perf/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-08-02  kconfig: include kernel/Kconfig.preempt from init/Kconfig  (Christoph Hellwig)
Almost all architectures include it. Add an ARCH_NO_PREEMPT symbol to disable preempt support for alpha, hexagon, non-coldfire m68k and user mode Linux.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02  Kconfig: consolidate the "Kernel hacking" menu  (Christoph Hellwig)
Move the source of lib/Kconfig.debug and arch/$(ARCH)/Kconfig.debug to the top-level Kconfig. For two architectures that means moving their arch-specific symbols in that menu into a new arch Kconfig.debug file, and for a few more creating a dummy file so that we can include it unconditionally.

Also move the actual 'Kernel hacking' menu to lib/Kconfig.debug, where it belongs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-02  kconfig: include common Kconfig files from top-level Kconfig  (Christoph Hellwig)
Instead of duplicating the source statements in every architecture, just do it once in the top-level Kconfig file. Note that with this the inclusion of arch/$(SRCARCH)/Kconfig moves out of the top-level Kconfig into arch/Kconfig, so that we don't violate ordering constraints while keeping a sensible menu structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-08-01  powerpc/64s/radix: Fix missing global invalidations when removing copro  (Frederic Barrat)
With the optimizations for TLB invalidation from commit 0cef77c7798a ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask"), the scope of a TLBI (global vs. local) can now be influenced by the value of the 'copros' counter of the memory context.

When calling mm_context_remove_copro(), the 'copros' counter is decremented first, before flushing. That can have the unintended side effect of sending local TLBIs when we explicitly need global invalidations in this case, breaking any nMMU user in a bad and unpredictable way.

Fix it by flushing first, before updating the 'copros' counter, so that invalidations will be global.
Fixes: 0cef77c7798a ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
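A minimal sketch of the ordering change, assuming the helper names used in arch/powerpc (flush_all_mm(), dec_mm_active_cpus()); simplified, not the verbatim patch:

    static inline void mm_context_remove_copro(struct mm_struct *mm)
    {
            /*
             * Flush while the copros count is still elevated, so the
             * invalidations are sent globally and reach the nMMU; only
             * then drop the counter, after which flushes may go local.
             */
            if (radix_enabled())
                    flush_all_mm(mm);

            atomic_dec(&mm->context.copros);
            dec_mm_active_cpus(mm);
    }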
2018-07-31  PCI: Fix is_added/is_busmaster race condition  (Hari Vyas)
When a PCI device is detected, pdev->is_added is set to 1 and proc and sysfs entries are created. When the device is removed, pdev->is_added is checked for 1, the device is detached and its proc and sysfs entries are cleared, and at the end pdev->is_added is set to 0.

is_added and is_busmaster are bit fields in the pci_dev structure sharing the same memory location. A strange issue was observed with repeated removal and rescan of a PCIe NVMe device using sysfs commands, where the is_added flag was observed as 0 instead of 1 while removing the device, so the proc and sysfs entries were not cleared. This caused a failure on later device addition, with the warning message "proc_dir_entry" already registered.

Debugging revealed a race condition between the PCI core setting the is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue setting the is_busmaster bit in pci_set_master(). As these fields are not handled atomically, the latter clears the is_added bit.

Move the is_added bit to a separate private flag variable and use atomic functions to set and retrieve the device addition state. This avoids the race because is_added no longer shares a memory location with is_busmaster.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
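A sketch of the fix's shape, assuming a dedicated priv_flags word on struct pci_dev and the usual atomic bitops; names follow the description above and may differ from the patch in detail:

    #define PCI_DEV_ADDED 0    /* bit number in dev->priv_flags */

    static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
    {
            /* Atomic RMW: cannot be clobbered by a concurrent
             * non-atomic bitfield update such as is_busmaster. */
            assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added);
    }

    static inline bool pci_dev_is_added(const struct pci_dev *dev)
    {
            return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
    }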
2018-07-31  powerpc/pseries: fix EEH recovery of some IOV devices  (Sam Bobroff)
EEH recovery currently fails on pSeries for some IOV capable PCI devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide certain device tree properties for the device. (Found on an IOV capable device using the ipr driver.)

Recovery fails in pci_enable_resources() at the check on r->parent, because r->flags is set and r->parent is not. This state is due to sriov_init() setting the start, end and flags members of the IOV BARs but the parent not being set later in pseries_pci_fixup_iov_resources(), because the "ibm,open-sriov-vf-bar-info" property is missing.

Correct this by zeroing the resource flags for IOV BARs when they can't be configured (this is the same method used by sriov_init() and __pci_read_base()).

VFs cleared this way can't be enabled later, because that requires another device tree property, "ibm,number-of-configurable-vfs", as well as support for the RTAS function "ibm_map_pes". These are all part of hypervisor support for IOV and it seems unlikely that a hypervisor would ever partially, but not fully, support it. (None are currently provided by QEMU/KVM.)
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Bryant G. Ly <bryantly@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
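A sketch of the fallback, assuming the loop shape of pseries_pci_fixup_iov_resources(); illustrative, not the verbatim patch:

    static void fixup_iov_resources_sketch(struct pci_dev *pdev)
    {
            int i;

            if (!of_find_property(pdev->dev.of_node,
                                  "ibm,open-sriov-vf-bar-info", NULL)) {
                    /* The IOV BARs can't be configured: zero the flags so
                     * pci_enable_resources() skips them rather than
                     * tripping over r->parent being unset. */
                    for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++)
                            pdev->resource[i].flags = 0;
                    return;
            }
            /* ... normal IOV BAR fixup using the property ... */
    }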
2018-07-31  powerpc/powernv: Add support to enable sensor groups  (Shilpasri G Bhat)
Adds support to enable/disable a sensor group at runtime. This can be used to select the sensor groups that need to be copied to main memory by the OCC. Sensor groups like power, temperature, current, voltage, frequency and utilization can be enabled/disabled at runtime.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31  powernv/cpuidle: Use parsed device tree values for cpuidle_init  (Akshay Adiga)
Export pnv_idle_states and nr_pnv_idle_states so that they are accessible to the cpuidle driver. Use properties from the pnv_idle_states structure for the powernv cpuidle_init.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31  powernv/cpuidle: Parse dt idle properties into global structure  (Akshay Adiga)
Device-tree parsing happens twice: once while deciding the idle state to be used for hotplug and once during cpuidle init. Hence, parsing the device tree once and caching it will reduce code duplication. Parsing code has been moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition to the properties in the device tree, the number of available states is also required.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core  (Christoph Hellwig)
There is nothing arch-specific about PCI or dma-debug, so call dma_debug_add_bus() from the PCI core just after registering the bus type. Most of dma-debug is already generic; this just adds reporting of pending dma-allocations on driver unload for arches other than powerpc, sh, and x86.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
2018-07-30  powerpc/44x: Mark mmu_init_secondary() as __init  (Alexey Spirkov)
mmu_init_secondary() calls ppc44x_pin_tlb() which is marked __init, leading to a warning:

    The function mmu_init_secondary() references the function __init ppc44x_pin_tlb().

There's no CPU hotplug support on 44x, so mmu_init_secondary() will only be called at boot. Therefore we should mark it as __init.
Signed-off-by: Alexey Spirkov <alexeis@astrosoft.ru>
[mpe: Flesh out change log details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
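The change reduces to adding the annotation, in the usual pattern (the body below is a stand-in for the real 44x code):

    void __init helper(void);               /* e.g. ppc44x_pin_tlb() */

    void __init mmu_init_secondary(int cpu) /* __init added: boot-time only */
    {
            helper();                       /* legal now: caller is __init too */
    }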
2018-07-30  powerpc/mm: Don't report PUDs as memory leaks when using kmemleak  (Michael Ellerman)
Paul Menzel reported that kmemleak was producing reports such as:

    unreferenced object 0xc0000000f8b80000 (size 16384):
      comm "init", pid 1, jiffies 4294937416 (age 312.240s)
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
        [<00000000d997deb7>] __pud_alloc+0x80/0x190
        [<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0
        [<00000000091e51c2>] shift_arg_pages+0xc0/0x210
        [<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0
        [<0000000060871529>] load_elf_binary+0x41c/0x1648
        [<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280
        [<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
        [<000000005f953a6e>] sys_execve+0x58/0x70
        [<000000009700a858>] system_call+0x5c/0x70

Indicating that a PUD was being leaked.

However what's really happening is that kmemleak is not able to recognise the references from the PGD to the PUD, because they are not fully qualified pointers.

We can confirm that in xmon. Find the task struct for pid 1 "init":

    0:mon> P
        task_struct     ->thread.ksp    PID   PPID S  P CMD
    c0000001fe7c0000 c0000001fe803960      1      0 S 13 systemd

Dump virtual address 0 to find the PGD:

    0:mon> dv 0 c0000001fe7c0000
    pgd  @ 0xc0000000f8b01000

Dump the memory of the PGD:

    0:mon> d c0000000f8b01000
    c0000000f8b01000 00000000f8b90000 0000000000000000  |................|
    c0000000f8b01010 0000000000000000 0000000000000000  |................|
    c0000000f8b01020 0000000000000000 0000000000000000  |................|
    c0000000f8b01030 0000000000000000 00000000f8b80000  |................|
                                      ^^^^^^^^^^^^^^^^

There we can see the reference to our supposedly leaked PUD. But because it's missing the leading 0xc, kmemleak won't recognise it.

We can confirm it's still in use by translating an address that is mapped via it:

    0:mon> dv 7fff94000000 c0000001fe7c0000
    pgd  @ 0xc0000000f8b01000
    pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <--
    pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000
    pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000
    ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386
    Maps physical address = 0x00000001d5ce0000
    Flags = Accessed Dirty Read Write

The fix is fairly simple. We need to tell kmemleak to ignore PUD allocations and never report them as leaks. We can also tell it not to scan the PGD, because it will never find pointers in there. However it will still notice if we allocate a PGD and then leak it.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
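A sketch of the approach using the kmemleak API (kmemleak_ignore() and kmemleak_no_scan() are the real hooks; the surrounding allocator shape follows arch/powerpc and may differ in detail):

    #include <linux/kmemleak.h>

    static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
    {
            pud_t *pud = kmem_cache_alloc(PGT_CACHE(PUD_CACHE_INDEX),
                                          pgtable_gfp_flags(mm, GFP_KERNEL));

            /*
             * The PGD references PUDs only by physical address (no 0xc
             * quadrant bits), which kmemleak cannot follow, so never
             * report PUDs as leaked.
             */
            kmemleak_ignore(pud);
            return pud;
    }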
2018-07-30  powerpc: split asm/tlbflush.h  (Christophe Leroy)
Split asm/tlbflush.h into:
    asm/nohash/tlbflush.h
    asm/book3s/32/tlbflush.h
    asm/book3s/64/tlbflush.h (already existing)
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc: remove unnecessary inclusion of asm/tlbflush.h  (Christophe Leroy)
asm/tlbflush.h is only needed for:
    - using functions xxx_flush_tlb_xxx()
    - using MMU_NO_CONTEXT
    - including asm-generic/pgtable.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc/44x: remove page.h from mmu-44x.h  (Christophe Leroy)
mmu-44x.h doesn't need asm/page.h if PAGE_SHIFT is replaced by CONFIG_PPC_XX_PAGES.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc/nohash: fix hash related comments in pgtable.h  (Christophe Leroy)
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc: fix includes in asm/processor.h  (Christophe Leroy)
Remove superfluous includes and add missing ones.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc/book3s: Remove PPC_PIN_SIZE  (Christophe Leroy)
PPC_PIN_SIZE is specific to the 44x and is defined in mmu.h.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc: declare set_breakpoint() static  (Christophe Leroy)
set_breakpoint() is only used in process.c, so make it static.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc: remove superfluous inclusions of asm/fixmap.h  (Christophe Leroy)
Files not using fixmap constants or functions don't need asm/fixmap.h.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc: clean inclusions of asm/feature-fixups.h  (Christophe Leroy)
Files not using feature fixups don't need asm/feature-fixups.h; files using them do.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc: clean the inclusion of stringify.h  (Christophe Leroy)
Only include linux/stringify.h in files using __stringify().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc: move ASM_CONST and stringify_in_c() into asm-const.h  (Christophe Leroy)
This patch moves ASM_CONST() and stringify_in_c() into a dedicated asm-const.h, then cleans all related inclusions.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
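The resulting header looks roughly like this (both macros keep their long-standing definitions, previously in asm-compat.h):

    /* arch/powerpc/include/asm/asm-const.h, approximately */
    #ifdef __ASSEMBLY__
    #  define stringify_in_c(...)   __VA_ARGS__
    #  define ASM_CONST(x)          x
    #else
    /* In C: stringify, and append the UL suffix for constants that
     * are shared with assembly. */
    #  define __stringify_in_c(...) #__VA_ARGS__
    #  define stringify_in_c(...)   __stringify_in_c(__VA_ARGS__) " "
    #  define __ASM_CONST(x)        x##UL
    #  define ASM_CONST(x)          __ASM_CONST(x)
    #endif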
2018-07-30  powerpc/405: move PPC405_ERR77 into asm-405.h  (Christophe Leroy)
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc: remove unneeded inclusions of cpu_has_feature.h  (Christophe Leroy)
Files not using cpu_has_feature() don't need cpu_has_feature.h.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30  powerpc: remove kdump.h from page.h  (Christophe Leroy)
page.h doesn't need kdump.h.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-26  KVM: PPC: Book3S HV: Read kvm->arch.emul_smt_mode under kvm->lock  (Paul Mackerras)
Commit 1e175d2 ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space", 2018-07-25) added code that uses kvm->arch.emul_smt_mode before any VCPUs are created. However, userspace can change kvm->arch.emul_smt_mode at any time up until the first VCPU is created. Hence it is (theoretically) possible for the check in kvmppc_core_vcpu_create_hv() to race with another userspace thread changing kvm->arch.emul_smt_mode.

This fixes it by moving the test that uses kvm->arch.emul_smt_mode into the block where kvm->lock is held.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
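A simplified sketch of the fixed shape (the helper name, check and error code below are illustrative; only the locking pattern is the point):

    static int vcpu_id_check_sketch(struct kvm *kvm, unsigned int id)
    {
            int err = 0;

            mutex_lock(&kvm->lock);
            /* emul_smt_mode may be changed by userspace until the first
             * VCPU exists, so only read it while holding kvm->lock. */
            if (id >= KVM_MAX_VCPUS * kvm->arch.emul_smt_mode)
                    err = -EINVAL;          /* hypothetical error code */
            mutex_unlock(&kvm->lock);

            return err;
    }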
2018-07-26  KVM: PPC: Book3S HV: Allow creating max number of VCPUs on POWER9  (Paul Mackerras)
Commit 1e175d2 ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space", 2018-07-25) allowed use of VCPU IDs up to KVM_MAX_VCPU_ID on POWER9 in all guest SMT modes and guest emulated hardware SMT modes. However, with the current definition of KVM_MAX_VCPU_ID, a guest SMT mode of 1 and an emulated SMT mode of 8, it is only possible to create KVM_MAX_VCPUS / 2 VCPUs, because threads_per_subcore is 4 on POWER9 CPUs. (Using an emulated SMT mode of 8 is useful when migrating VMs to or from POWER8 hosts.)

This increases KVM_MAX_VCPU_ID to 8 * KVM_MAX_VCPUS when HV KVM is configured in, so that a full complement of KVM_MAX_VCPUS VCPUs can be created on POWER9 in all guest SMT modes and emulated hardware SMT modes.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
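Per the description, the definition becomes something like this (the exact config guard is an assumption):

    #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
    /* Leave room for VCORE ID packing in all SMT/emulated-SMT modes. */
    #define KVM_MAX_VCPU_ID         (8 * KVM_MAX_VCPUS)
    #else
    #define KVM_MAX_VCPU_ID         KVM_MAX_VCPUS
    #endif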
2018-07-26  KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space  (Sam Bobroff)
It is not currently possible to create the full number of possible VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer threads per core than its core stride (or "VSMT mode"). This is because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS even though the VCPU ID is less than KVM_MAX_VCPU_ID.

To address this, "pack" the VCORE ID and XIVE offsets by using knowledge of the way the VCPU IDs will be used when there are fewer guest threads per core than the core stride. The primary thread of each core will always be used first. Then, if the guest uses more than one thread per core, these secondary threads will sequentially follow the primary in each core.

So, the only way an ID above KVM_MAX_VCPUS can be seen is if the VCPUs are being spaced apart, so at least half of each core is empty, and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped into the second half of each core (4..7, in an 8-thread core). Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of each core is being left empty, and we can map down into the second and third quarters of each core (2, 3 and 5, 6 in an 8-thread core). Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary threads are being used and 7/8 of the core is empty, allowing use of the 1, 5, 3 and 7 thread slots. (Strides less than 8 are handled similarly.)

This allows the VCORE ID or offset to be calculated quickly from the VCPU ID or XIVE server numbers, without access to the VCPU structure.
[paulus@ozlabs.org - tidied up comment a little, changed some WARN_ONCE to pr_devel, wrapped line, fixed id check.]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
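An illustrative sketch of the packing, simplified from the patch; the offset table follows the thread-slot order described above and may not match the kernel's exact table:

    /* Map a VCPU ID that exceeds KVM_MAX_VCPUS into an unused thread
     * slot of an 8-thread core, per the scheme described above. */
    static u32 pack_vcpu_id_sketch(u32 id)
    {
            static const u32 block_offsets[8] = { 0, 4, 2, 6, 1, 5, 3, 7 };
            u32 block = id / KVM_MAX_VCPUS;   /* 0, 1, 2-3 or 4-7 */

            return (id % KVM_MAX_VCPUS) + block_offsets[block];
    }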
2018-07-25  locking/atomics: Rework ordering barriers  (Mark Rutland)
Currently architectures can override __atomic_op_*() to define the barriers used before/after a relaxed atomic when used to build acquire/release/fence variants. This has the unfortunate property of requiring the architecture to define the full wrapper for the atomics, rather than just the barriers they care about, and gets in the way of generating atomics which can be easily read.

Instead, this patch has architectures define an optional set of barriers:

    * __atomic_acquire_fence()
    * __atomic_release_fence()
    * __atomic_pre_full_fence()
    * __atomic_post_full_fence()

... which <linux/atomic.h> uses to build the wrappers.

It would be nice if we could undef these, along with the __atomic_op_*() wrappers, but that would break the cmpxchg() wrappers, which are written in preprocessor. Undefs would have been nice, but alas.

There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andy.shevchenko@gmail.com
Cc: arnd@arndb.de
Cc: aryabinin@virtuozzo.com
Cc: catalin.marinas@arm.com
Cc: dvyukov@google.com
Cc: glider@google.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: peter@hurleysoftware.com
Link: http://lkml.kernel.org/r/20180716113017.3909-7-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
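Roughly how <linux/atomic.h> composes an acquire variant from a relaxed op plus the new hook; a sketch close to, but not verbatim, the patch:

    /* Fallback when an architecture defines no acquire barrier. */
    #ifndef __atomic_acquire_fence
    #define __atomic_acquire_fence         smp_mb__after_atomic
    #endif

    #define __atomic_op_acquire(op, args...)                        \
    ({                                                              \
            typeof(op##_relaxed(args)) __ret = op##_relaxed(args);  \
            __atomic_acquire_fence();                               \
            __ret;                                                  \
    })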
2018-07-25  Merge branch 'perf/urgent' into perf/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25  Merge branch 'sched/urgent' into sched/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-24  Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net  (David S. Miller)
2018-07-24  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (Linus Torvalds)
Pull networking fixes from David Miller:

 1) Handle stations tied to AP_VLANs properly during mac80211 hw reconfig. From Manikanta Pubbisetty.
 2) Fix jump stack depth validation in nf_tables, from Taehee Yoo.
 3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran Ben Elisha.
 4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann.
 5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan.
 6) Fix cached netdev name leak in nf_tables, from Florian Westphal.
 7) Fix memory leaks on chain rename, also from Florian Westphal.
 8) Several fixes to DCTCP congestion control ACK handling, from Yuchung Cheng.
 9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing.
10) Fix link local address handling with VRF, from David Ahern.
11) Don't clobber 'err' on a successful call to __skb_linearize() in skb_segment(). From Eric Dumazet.
12) Fix vxlan fdb notification races, from Roopa Prabhu.
13) Hash UDP fragments consistently, from Paolo Abeni.
14) If TCP receives lots of out of order tiny packets, we do really silly stuff. Make the out-of-order queue handling more robust to this kind of behavior, from Eric Dumazet.
15) Don't leak netlink dump state in nf_tables, from Florian Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net: axienet: Fix double deregister of mdio
  qmi_wwan: fix interface number for DW5821e production firmware
  ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
  bnx2x: Fix invalid memory access in rss hash config path.
  net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
  r8169: restore previous behavior to accept BIOS WoL settings
  cfg80211: never ignore user regulatory hint
  sock: fix sg page frag coalescing in sk_alloc_sg
  netfilter: nf_tables: move dumper state allocation into ->start
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  ip: hash fragments consistently
  ipv6: use fib6_info_hold_safe() when necessary
  can: xilinx_can: fix power management handling
  can: xilinx_can: fix incorrect clear of non-processed interrupts
  can: xilinx_can: fix RX overflow interrupt not being enabled
  can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
  ...
2018-07-24  powerpc/powernv: implement opal_put_chars_atomic  (Nicholas Piggin)
The RAW console does not need writes to be atomic, so relax opal_put_chars to be able to do partial writes, and implement an _atomic variant which does not take a spinlock. This API is used in xmon, so the less locking that is used, the better chance there is that a crash can be debugged.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
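The two entry points after this change, roughly as they would appear in arch/powerpc/include/asm/opal.h (exact declarations may differ):

    /* May lock and write partially; returns bytes written or an error. */
    int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);

    /* Lockless variant for crash paths such as xmon. */
    int opal_put_chars_atomic(uint32_t vtermno, const char *buf, int total_len);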
2018-07-24  powerpc/powernv: move opal console flushing to udbg  (Nicholas Piggin)
OPAL console writes do not have to synchronously flush firmware / hardware buffers unless they are going through the udbg path.

Remove the unconditional flushing from opal_put_chars. Flush if there was no space in the buffer as an optimisation (callers loop waiting for success in that case). udbg flushing is moved to udbg_opal_putc.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc/powernv: Remove OPALv1 support from opal console driver  (Nicholas Piggin)
opal_put_chars deals with partial writes because, in OPALv1, opal_console_write_buffer_space did not work correctly. That firmware is not supported.

This reworks the opal_put_chars code to no longer deal with partial writes by turning them into full writes. Partial write handling is still supported in terms of what gets returned to the caller, but it may not go to the console atomically. A warning message is printed in this case.

This allows console flushing to be moved out of the opal_write_lock spinlock. That could cause the lock to be held for long periods if the console is busy (especially if it was being spammed by firmware), which is dangerous because the lock is taken by xmon to debug the system. Flushing outside the lock improves the situation a bit.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc/powernv: Implement and use opal_flush_console  (Nicholas Piggin)
A new console flushing firmware API was introduced to replace event polling loops, and implemented in opal-kmsg with affddff69c55e ("powerpc/powernv: Add a kmsg_dumper that flushes console output on panic"), to flush the console in the panic path.

The OPAL console driver has other situations where interrupts are off and it needs to flush the console synchronously. These still use a polling loop. So move the opal-kmsg flush code to opal_flush_console, and use the new function in opal-kmsg and opal_put_chars.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc/powernv: opal-kmsg use flush fallback from console code  (Nicholas Piggin)
Use the more refined and tested event polling loop from opal_put_chars as the fallback console flush in the opal-kmsg path. This loop is used by the console driver today, whereas the opal-kmsg fallback is not likely to have been used for years.

Use WARN_ONCE rather than a printk when the fallback is invoked, to prepare for moving the console flush into a common function.
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc/powernv: opal-kmsg standardise OPAL_BUSY handling  (Nicholas Piggin)
OPAL_CONSOLE_FLUSH is documented as being able to return OPAL_BUSY, so implement the standard OPAL_BUSY handling for it.
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
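The standard OPAL_BUSY loop, sketched here for OPAL_CONSOLE_FLUSH (the call and constant names are the kernel's; the real flush code differs in detail):

    static s64 flush_console_sketch(uint32_t vtermno)
    {
            s64 rc = OPAL_BUSY;

            while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
                    rc = opal_console_flush(vtermno);
                    if (rc == OPAL_BUSY_EVENT) {
                            mdelay(OPAL_BUSY_DELAY_MS);
                            opal_poll_events(NULL);
                    } else if (rc == OPAL_BUSY) {
                            mdelay(OPAL_BUSY_DELAY_MS);
                    }
            }
            return rc;
    }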
2018-07-24  powerpc/powernv: Fix OPAL console driver OPAL_BUSY loops  (Nicholas Piggin)
The OPAL console driver does not delay in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware. It can't yet be made to sleep because it is called under spinlock, but it can be changed to the standard OPAL_BUSY loop form, and a delay added to keep it from hitting the firmware too frequently.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc/powernv: opal_put_chars partial write fix  (Nicholas Piggin)
The intention here is to consume and discard the remaining buffer upon error. This works if there has not been a previous partial write. If there has been, then total_len is no longer the total number of bytes to copy. total_len is always "bytes left to copy", so it should be added to the written bytes.

This code may not be exercised any more if partial writes will not be hit, but this is a small bugfix before a larger change.
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
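In sketch form (the surrounding error path is illustrative, not the verbatim driver code):

    /* On an unrecoverable error, consume and discard what is left.
     * After a partial write, total_len means "bytes left to copy",
     * so accumulate it rather than overwriting the running count. */
    written += total_len;   /* was: written = total_len; */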
2018-07-24  powerpc/powernv/opal-dump: Use IRQ_HANDLED instead of numbers in interrupt handler  (Mukesh Ojha)
Convert the explicit numeric return values in the interrupt handler to the more appropriate IRQ_HANDLED. This also gets rid of the "nobody cared" error message that was triggered when the handler returned -1 or 0.
Fixes: 8034f715f ("powernv/opal-dump: Convert to irq domain")
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc/powernv/opal-dump: Handle opal_dump_info properly  (Mukesh Ojha)
Move the return value check of 'opal_dump_info' to the proper place; previously, all the dump info was unnecessarily filled in even on failure.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc/tm: Remove struct thread_info param from tm_reclaim_thread()  (Cyril Bur)
Since commit dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers"), tm_reclaim_thread() doesn't use the parameter anymore, yet both callers have to bother getting it even though they have no other need for a struct thread_info. Just remove it and adjust the callers.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc/tm: Update function prototype comment  (Cyril Bur)
In commit eb5c3f1c8647 ("powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint"), __tm_recheckpoint was modified to no longer take the second parameter 'unsigned long orig_msr', as part of a TM rewrite to simplify the reclaiming/recheckpointing process.

There is a comment in the asm file where the function is declared which still shows the old prototype with the 'orig_msr' parameter. This patch corrects the comment.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp()  (Simon Guo)
This patch is based on the previous VMX patch on memcmp().

To optimize ppc64 memcmp() with VMX instructions, we need to think about the VMX penalty they bring: if the kernel uses VMX instructions, it needs to save/restore the current thread's VMX registers. There are 32 x 128-bit VMX registers on PPC, which means 32 x 16 = 512 bytes for load and store.

The major concern regarding memcmp() performance in the kernel is KSM, which uses memcmp() frequently to merge identical pages. So it makes sense to take some measures/enhancements on KSM to see whether any improvement can be done here. Cyril Bur indicated that memcmp() for KSM has a high probability of failing (unmatching) early, in the first few bytes, in the following mail: https://patchwork.ozlabs.org/patch/817322/#1773629. This patch is a follow-up on that.

Per some testing, KSM memcmp() will fail early within the first 32 bytes. More specifically:
    - 76% of cases fail/unmatch before 16 bytes;
    - 83% of cases fail/unmatch before 32 bytes;
    - 84% of cases fail/unmatch before 64 bytes.
So 32 bytes looks a better pre-checking size than the others. The early failure also holds for memcmp() in the non-KSM case: with a non-typical call load, ~73% of cases fail before the first 32 bytes.

This patch adds a 32-byte pre-check before jumping into VMX operations, to avoid the unnecessary VMX penalty. It is not limited to the KSM case, and testing shows a ~20% improvement in average memcmp() execution time with this patch. Note the 32B pre-check is only performed when the compare size is long enough (>= 4K currently) to allow VMX operation.

The detailed data and analysis are at: https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
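A C-level illustration of the idea (the real implementation is powerpc assembly in memcmp_64.S; memcmp_scalar() and memcmp_vmx() below are hypothetical helpers, and the word-sized loads assume suitably aligned inputs):

    int memcmp_scalar(const void *s1, const void *s2, size_t n); /* hypothetical */
    int memcmp_vmx(const void *s1, const void *s2, size_t n);    /* hypothetical */

    static int memcmp_sketch(const void *s1, const void *s2, size_t n)
    {
            const unsigned long *a = s1, *b = s2;
            size_t i;

            if (n < 4096)                   /* short compares never use VMX */
                    return memcmp_scalar(s1, s2, n);

            /* Pre-check 32 bytes with plain loads: most compares diverge
             * here, skipping the 512-byte VMX save/restore penalty. */
            for (i = 0; i < 32 / sizeof(unsigned long); i++)
                    if (a[i] != b[i])
                            return memcmp_scalar(s1, s2, n);

            return memcmp_vmx(s1, s2, n);
    }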
2018-07-24  powerpc/64: enhance memcmp() with VMX instruction for long bytes comparison  (Simon Guo)
This patch adds VMX primitives to do memcmp() in case the compare size is equal to or greater than 4K bytes. The KSM feature can benefit from this.

Test result with the following test program:
------
# cat tools/testing/selftests/powerpc/stringloops/memcmp.c
#include <malloc.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "utils.h"

#define SIZE (1024 * 1024 * 900)
#define ITERATIONS 40

int test_memcmp(const void *s1, const void *s2, size_t n);

static int testcase(void)
{
        char *s1;
        char *s2;
        unsigned long i;

        s1 = memalign(128, SIZE);
        if (!s1) {
                perror("memalign");
                exit(1);
        }

        s2 = memalign(128, SIZE);
        if (!s2) {
                perror("memalign");
                exit(1);
        }

        for (i = 0; i < SIZE; i++) {
                s1[i] = i & 0xff;
                s2[i] = i & 0xff;
        }

        for (i = 0; i < ITERATIONS; i++) {
                int ret = test_memcmp(s1, s2, SIZE);

                if (ret) {
                        printf("return %d at[%ld]! should have returned zero\n", ret, i);
                        abort();
                }
        }

        return 0;
}

int main(void)
{
        return test_harness(testcase, "memcmp");
}
------
Without this patch (but with the first patch "powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()." in the series):

    4.726728762 seconds time elapsed    ( +- 3.54% )

With the VMX patch:

    4.234335473 seconds time elapsed    ( +- 2.63% )

There is ~10% improvement. Testing with an unaligned and different-offset version (s1 and s2 shifted by a random offset within 16 bytes) can achieve a higher improvement than 10%.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24  powerpc: add vcmpequd/vcmpequb ppc instruction macros  (Simon Guo)
Some old toolchains don't know about instructions like vcmpequd. This patch adds .long macros for vcmpequd and vcmpequb, as a preparation to optimize ppc64 memcmp with VMX instructions.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>