summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2018-05-17KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller ↵Alexey Kardashevskiy
physical pages At the moment we only support in the host the IOMMU page sizes which the guest is aware of, which is 4KB/64KB/16MB. However P9 does not support 16MB IOMMU pages, 2MB and 1GB pages are supported instead. We can still emulate bigger guest pages (for example 16MB) with smaller host pages (4KB/64KB/2MB). This allows the physical IOMMU pages to use a page size smaller or equal than the guest visible IOMMU page size. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S: Use correct page shift in H_STUFF_TCEAlexey Kardashevskiy
The other TCE handlers use page shift from the guest visible TCE table (described by kvmppc_spapr_tce_iommu_table) so let's make H_STUFF_TCE handlers do the same thing. This should cause no behavioral change now but soon we will allow the iommu_table::it_page_shift being different from from the emulated table page size so this will play a role. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: Fix inaccurate commentPaul Mackerras
We now have interrupts hard-disabled when coming back from kvmppc_hv_entry_trampoline, so this changes the comment to reflect that. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: Set RWMR on POWER8 so PURR/SPURR count correctlyPaul Mackerras
Although Linux doesn't use PURR and SPURR ((Scaled) Processor Utilization of Resources Register), other OSes depend on them. On POWER8 they count at a rate depending on whether the VCPU is idle or running, the activity of the VCPU, and the value in the RWMR (Region-Weighting Mode Register). Hardware expects the hypervisor to update the RWMR when a core is dispatched to reflect the number of online VCPUs in the vcore. This adds code to maintain a count in the vcore struct indicating how many VCPUs are online. In kvmppc_run_core we use that count to set the RWMR register on POWER8. If the core is split because of a static or dynamic micro-threading mode, we use the value for 8 threads. The RWMR value is not relevant when the host is executing because Linux does not use the PURR or SPURR register, so we don't bother saving and restoring the host value. For the sake of old userspace which does not set the KVM_REG_PPC_ONLINE register, we set online to 1 if it was 0 at the time of a KVM_RUN ioctl. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: Add 'online' register to ONE_REG interfacePaul Mackerras
This adds a new KVM_REG_PPC_ONLINE register which userspace can set to 0 or 1 via the GET/SET_ONE_REG interface to indicate whether it considers the VCPU to be offline (0), that is, not currently running, or online (1). This will be used in a later patch to configure the register which controls PURR and SPURR accumulation. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book 3S HV: Do ptesync in radix guest exit pathPaul Mackerras
A radix guest can execute tlbie instructions to invalidate TLB entries. After a tlbie or a group of tlbies, it must then do the architected sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation has been processed by all CPUs in the system before it can rely on no CPU using any translation that it just invalidated. In fact it is the ptesync which does the actual synchronization in this sequence, and hardware has a requirement that the ptesync must be executed on the same CPU thread as the tlbies which it is expected to order. Thus, if a vCPU gets moved from one physical CPU to another after it has done some tlbies but before it can get to do the ptesync, the ptesync will not have the desired effect when it is executed on the second physical CPU. To fix this, we do a ptesync in the exit path for radix guests. If there are any pending tlbies, this will wait for them to complete. If there aren't, then ptesync will just do the same as sync. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority changeBenjamin Herrenschmidt
When a vcpu priority (CPPR) is set to a lower value (masking more interrupts), we stop processing interrupts already in the queue for the priorities that have now been masked. If those interrupts were previously re-routed to a different CPU, they might still be stuck until the older one that has them in its queue processes them. In the case of guest CPU unplug, that can be never. To address that without creating additional overhead for the normal interrupt processing path, this changes H_CPPR handling so that when such a priority change occurs, we scan the interrupt queue for that vCPU, and for any interrupt in there that has been re-routed, we replace it with a dummy and force a re-trigger. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: Make radix clear pte when unmappingNicholas Piggin
The current partition table unmap code clears the _PAGE_PRESENT bit out of the pte, which leaves pud_huge/pmd_huge true and does not clear pud_present/pmd_present. This can confuse subsequent page faults and possibly lead to the guest looping doing continual hypervisor page faults. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in ↵Nicholas Piggin
kvmppc_radix_tlbie_page The standard eieio ; tlbsync ; ptesync must follow tlbie to ensure it is ordered with respect to subsequent operations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: Snapshot timebase offset on guest entryPaul Mackerras
Currently, the HV KVM guest entry/exit code adds the timebase offset from the vcore struct to the timebase on guest entry, and subtracts it on guest exit. Which is fine, except that it is possible for userspace to change the offset using the SET_ONE_REG interface while the vcore is running, as there is only one timebase offset per vcore but potentially multiple VCPUs in the vcore. If that were to happen, KVM would subtract a different offset on guest exit from that which it had added on guest entry, leading to the timebase being out of sync between cores in the host, which then leads to bad things happening such as hangs and spurious watchdog timeouts. To fix this, we add a new field 'tb_offset_applied' to the vcore struct which stores the offset that is currently applied to the timebase. This value is set from the vcore tb_offset field on guest entry, and is what is subtracted from the timebase on guest exit. Since it is zero when the timebase offset is not applied, we can simplify the logic in kvmhv_start_timing and kvmhv_accumulate_time. In addition, we had secondary threads reading the timebase while running concurrently with code on the primary thread which would eventually add or subtract the timebase offset from the timebase. This occurred while saving or restoring the DEC register value on the secondary threads. Although no specific incorrect behaviour has been observed, this is a race which should be fixed. To fix it, we move the DEC saving code to just before we call kvmhv_commence_exit, and the DEC restoring code to after the point where we have waited for the primary thread to switch the MMU context and add the timebase offset. That way we are sure that the timebase contains the guest timebase value in both cases. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17powerpc/kvm: Prefer fault_in_pages_readable functionMathieu Malaterre
Directly use fault_in_pages_readable instead of manual __get_user code. Fix warning treated as error with W=1: arch/powerpc/kernel/kvm.c:675:6: error: variable ‘tmp’ set but not used [-Werror=unused-but-set-variable] Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mathieu Malaterre <malat@debian.org> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-17powerpc/mm/radix: implement LPID based TLB flushes to be used by KVMNicholas Piggin
Implement a local TLB flush for invalidating an LPID with variants for process or partition scope. And a global TLB flush for invalidating a partition scoped page of an LPID. These will be used by KVM in subsequent patches. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-16proc: introduce proc_create_single{,_data}Christoph Hellwig
Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-15powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by StarletJonathan Neuschäfer
The interrupt controller inside the Wii's Hollywood chip is connected to two masters, the "Broadway" PowerPC and the "Starlet" ARM926, each with their own interrupt status and mask registers. When booting the Wii with mini[1], interrupts from the SD card controller (IRQ 7) are handled by the ARM, because mini provides SD access over IPC. Linux however can't currently use or disable this IPC service, so both sides try to handle IRQ 7 without coordination. Let's instead make sure that all interrupts that are unmasked on the PPC side are masked on the ARM side; this will also make sure that Linux can properly talk to the SD card controller (and potentially other devices). If access to a device through IPC is desired in the future, interrupts from that device should not be handled by Linux directly. [1]: https://github.com/lewurm/mini Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/embedded6xx/flipper-pic: Don't match all IRQ domainsJonathan Neuschäfer
On the Wii, there is a secondary IRQ controller (hlwd-pic), so flipper-pic's match operation should not be hardcoded to return 1. In fact, the default matching logic is sufficient, and we can completely omit flipper_pic_match. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/book3s64: Enable split pmd ptlock.Aneesh Kumar K.V
Testing with a threaded version of mmap_bench which allocate 1G chunks and with large number of threads we find: without patch 32.72% mmap_bench [kernel.vmlinux] [k] do_raw_spin_lock | ---do_raw_spin_lock | --32.68%--0 | |--15.82%--pte_fragment_alloc | | | --15.79%--do_huge_pmd_anonymous_page | __handle_mm_fault | handle_mm_fault | __do_page_fault | handle_page_fault | test_mmap | test_mmap | start_thread | __clone | |--14.95%--do_huge_pmd_anonymous_page | __handle_mm_fault | handle_mm_fault | __do_page_fault | handle_page_fault | test_mmap | test_mmap | start_thread | __clone | with patch 12.89% mmap_bench [kernel.vmlinux] [k] do_raw_spin_lock | ---do_raw_spin_lock | --12.83%--0 | |--3.21%--pagevec_lru_move_fn | __lru_cache_add | | | --2.74%--do_huge_pmd_anonymous_page | __handle_mm_fault | handle_mm_fault | __do_page_fault | handle_page_fault | test_mmap | test_mmap | start_thread | __clone | |--3.11%--do_huge_pmd_anonymous_page | __handle_mm_fault | handle_mm_fault | __do_page_fault | handle_page_fault | test_mmap | test_mmap | start_thread | __clone ..... | --0.55%--pte_fragment_alloc | --0.55%--do_huge_pmd_anonymous_page __handle_mm_fault handle_mm_fault __do_page_fault handle_page_fault test_mmap test_mmap start_thread __clone Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm: Use page fragments for allocation page table at PMD levelAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm: Implement helpers for pagetable fragment support at PMD levelAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/book3s64/mm: Simplify the rcu callback for page table freeAneesh Kumar K.V
Instead of encoding shift in the table address, use an enumerated index value. This allow us to do different things in the callback for pte and pmd. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm/book3s64/4k: Switch 4k pagesize config to use pagetable fragmentAneesh Kumar K.V
4K config use one full page at level 4 of the pagetable. Add support for single fragment allocation in pagetable fragment code and and use that for 4K config. This makes both 4k and 64k use the same code path. Later we will switch pmd to use the page table fragment code. This is done only for 64bit platforms which is using page table fragment support. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm/nohash: Remove pte fragment dependency from nohashAneesh Kumar K.V
Now that we have removed 64K page size support, the RCU page table free can be much simpler for nohash. Make a copy of the the rcu callback to pgalloc.h header similar to nohash 32. We could possibly merge 32 and 64 bit there. But that is for a later patch We also move the book3s specific handler to pgtable_book3s64.c. This will be updated in a later patch to handle split pmd ptlock. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm/book3e/64: Remove unsupported 64Kpage size from 64bit bookeAneesh Kumar K.V
We have in Kconfig config PPC_64K_PAGES bool "64k page size" depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64) select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64 Only supported BOOK3E 64 bit platforms is FSL_BOOK3E. Remove the dead 64k page support code from 64bit nohash. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm: Rename pte fragment functionsAneesh Kumar K.V
We rename the alloc and get_from_cache to indicate they operate on pte fragments. In later patch we will add pmd fragment support. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm: Use pmd_lockptr instead of opencoding itAneesh Kumar K.V
In later patch we switch pmd_lock from mm->page_table_lock to split pmd ptlock. It avoid compilations issues, use pmd_lockptr helper. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15powerpc/mm/book3s64: Move book3s64 code to pgtable-book3s64Aneesh Kumar K.V
Only code movement and avoid #ifdef. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-15Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
This brings in one commit that we may want to share with the kvm-ppc tree, to avoid merge conflicts and get wider testing.
2018-05-15powerpc/kvm: Switch kvm pmd allocator to custom allocatorAneesh Kumar K.V
In the next set of patches, we will switch pmd allocator to use page fragments and the locking will be updated to split pmd ptlock. We want to avoid using fragments for partition-scoped table. Use slab cache similar to level 4 table Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-14misc: IBM Virtual Management Channel Driver (VMC)Bryant G. Ly
This driver is a logical device which provides an interface between the hypervisor and a management partition. This interface is like a message passing interface. This management partition is intended to provide an alternative to HMC-based system management. VMC enables the Management LPAR to provide basic logical partition functions: - Logical Partition Configuration - Boot, start, and stop actions for individual partitions - Display of partition status - Management of virtual Ethernet - Management of virtual Storage - Basic system management This driver is to be used for the POWER Virtual Management Channel Virtual Adapter on the PowerPC platform. It provides a character device which allows for both request/response and async message support through the /dev/ibmvmc node. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Reviewed-by: Steven Royer <seroyer@linux.vnet.ibm.com> Reviewed-by: Adam Reznechek <adreznec@linux.vnet.ibm.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Taylor Jakobson <tjakobs@us.ibm.com> Tested-by: Brad Warrum <bwarrum@us.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-14powerpc/cell/spufs: Change return type to vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. See commit 1c8f422059ae ("mm: change return type to vm_fault_t"). We are fixing a minor bug, that the error from vm_insert_pfn() was being ignored and the effect of this is likely to be only felt in OOM situations. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-14powerpc/ioda: Use ibm, supported-tce-sizes for IOMMU page size maskAlexey Kardashevskiy
At the moment we assume that IODA2 and newer PHBs can always do 4K/64K/16M IOMMU pages, however this is not the case for POWER9 and now skiboot advertises the supported sizes via the device so we use that instead of hard coding the mask. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-14powerpc/powernv: Fix memtrace build when NUMA=nMichael Ellerman
Currently memtrace doesn't build if NUMA=n: In function ‘memtrace_alloc_node’: arch/powerpc/platforms/powernv/memtrace.c:134:6: error: the address of ‘contig_page_data’ will always evaluate as ‘true’ if (!NODE_DATA(nid) || !node_spanned_pages(nid)) ^ This is because for NUMA=n NODE_DATA(nid) points to an always allocated structure, contig_page_data. But even in the NUMA=y case memtrace_alloc_node() is only called for online nodes, and we should always have a NODE_DATA() allocated for an online node. So remove the (hopefully) overly paranoid check, which also means we can build when NUMA=n. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-14softirq/powerpc: Switch to generic local_softirq_pending() implementationFrederic Weisbecker
Remove the ad-hoc implementation, the generic code now allows us not to reinvent the wheel. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David S. Miller <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1525786706-22846-9-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The bpf syscall and selftests conflicts were trivial overlapping changes. The r8169 change involved moving the added mdelay from 'net' into a different function. A TLS close bug fix overlapped with the splitting of the TLS state into separate TX and RX parts. I just expanded the tests in the bug fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf == X". Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11powerpc/prom: Drop support for old FDT versionsMichael Ellerman
In commit e6a6928c3ea1 ("of/fdt: Convert FDT functions to use libfdt") (Apr 2014), the generic flat device tree code dropped support for flat device tree's older than version 0x10 (16). We still have code in our CPU scanning to cope with flat device tree versions earlier than 2, which can now never trigger, so drop it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-11powerpc/lib: Add alt patching test of branching past the last instructionMichael Ellerman
Add a test of the relative branch patching logic in the alternate section feature fixup code. This tests that if we branch past the last instruction of the alternate section, the branch is not patched. That's because the assembler will have created a branch that already points to the first instruction after the patched section, which is correct and needs no further patching. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-11powerpc/lib: Rename ftr_fixup_test7 to ftr_fixup_test_too_bigMichael Ellerman
We want this to remain the last test (because it's disabled by default), so give it a non-numbered name so we don't have to renumber it when adding new tests before it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-11powerpc/lib: Fix the feature fixup tests to actually workMichael Ellerman
The code patching code has always been a bit confused about whether it's best to use void *, unsigned int *, char *, etc. to point to instructions. In fact in the feature fixups tests we use both unsigned int[] and u8[] in different places. Unfortunately the tests that use unsigned int[] calculate the size of the code blocks using subtraction of those unsigned int pointers, and then pass the result to memcmp(). This means we're only comparing 1/4 of the bytes we need to, because we need to multiply by sizeof(unsigned int) to get the number of *bytes*. The result is that the tests do all the patching and then only compare some of the resulting code, so patching bugs that only effect that last 3/4 of the code could slip through undetected. It turns out that hasn't been happening, although one test had a bad expected case (see previous commit). Fix it for now by multiplying the size by 4 in the affected functions. Fixes: 362e7701fd18 ("powerpc: Add self-tests of the feature fixup code") Epic-brown-paper-bag-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-11powerpc/lib: Fix feature fixup test of external branchMichael Ellerman
The expected case for this test was wrong, the source of the alternate code sequence is: FTR_SECTION_ELSE 2: or 2,2,2 PPC_LCMPI r3,1 beq 3f blt 2b b 3f b 1b ALT_FTR_SECTION_END(0, 1) 3: or 1,1,1 or 2,2,2 4: or 3,3,3 So when it's patched the '3' label should still be on the 'or 1,1,1', and the 4 label is irrelevant and can be removed. Fixes: 362e7701fd18 ("powerpc: Add self-tests of the feature fixup code") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc: Make it clearer that systbl check errors are errorsMichael Ellerman
If the systbl_chk.sh checks fail we print a message, but with no indication that it's an error. That makes it hard to find in build logs with eg. grep. So prefix any output with "Error:". Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/syscalls: timer_create can be handle by perfectly normal COMPAT_SYS_SPUAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/syscalls: kill ppc32_select()Al Viro
it had always been pointless - compat_sys_select() sign-extends the first argument just fine on its own. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [mpe: Use COMPAT_SPU_NEW() to keep systbl_chk.sh happy] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/syscalls: Add COMPAT_SPU_NEW() macroMichael Ellerman
Currently the select system call is wired up with the SYSX_SPU() macro. The SYSX_SPU() is not handled by systbl_chk.c, which means the syscall number for select is not checked. That hides the fact that the syscall number for select is actually __NR__newselect not __NR_select. In a following patch we'd like to drop ppc32_select() which means select will become a regular COMPAT_SYS_SPU() syscall. But COMPAT_SYS_SPU() can't deal with the fact that the syscall number is actually __NR__newselect. We also can't just redefine __NR_select because that's still used for the old select call. So add a new COMPAT_NEW_SPU() that does the same thing as COMPAT_SYS_SPU() except it encodes that we're using the new number. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/syscalls: switch rtas(2) to SYSCALL_DEFINEAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [mpe: Update sys_ni.c for s/ppc_rtas/sys_rtas/] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/syscalls: signal_{32, 64} - switch to SYSCALL_DEFINEAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [mpe: Fix sys_debug_setcontext() prototype to return long] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/syscalls: Switch trivial cases to SYSCALL_DEFINEAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/livepatch: Implement reliable stack tracing for the consistency modelTorsten Duwe
The "Power Architecture 64-Bit ELF V2 ABI" says in section 2.3.2.3: [...] There are several rules that must be adhered to in order to ensure reliable and consistent call chain backtracing: * Before a function calls any other function, it shall establish its own stack frame, whose size shall be a multiple of 16 bytes. – In instances where a function’s prologue creates a stack frame, the back-chain word of the stack frame shall be updated atomically with the value of the stack pointer (r1) when a back chain is implemented. (This must be supported as default by all ELF V2 ABI-compliant environments.) [...] – The function shall save the link register that contains its return address in the LR save doubleword of its caller’s stack frame before calling another function. To me this sounds like the equivalent of HAVE_RELIABLE_STACKTRACE. This patch may be unneccessarily limited to ppc64le, but OTOH the only user of this flag so far is livepatching, which is only implemented on PPCs with 64-LE, a.k.a. ELF ABI v2. Feel free to add other ppc variants, but so far only ppc64le got tested. This change also implements save_stack_trace_tsk_reliable() for ppc64le that checks for the above conditions, where possible. Signed-off-by: Torsten Duwe <duwe@suse.de> Signed-off-by: Nicolai Stange <nstange@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/watchdog: provide more data in watchdog messagesNicholas Piggin
Provide timebase and timebase of last heartbeat in watchdog lockup messages. Also provide a stack trace of when a CPU becomes un-stuck, which can be useful -- it could be where irqs are re-enabled, so it may be the end of the critical section which is responsible for the latency which is useful information. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/watchdog: don't update the watchdog timestamp if a lockup is detectedNicholas Piggin
The watchdog heartbeat timestamp is updated when the local heartbeat timer fires (or touch_nmi_watchdog() is called). This is an interesting data point, so don't overwrite it when the soft-NMI interrupt detects a hard lockup. That code came from a pre- merge version to prevent hard lockup messages flood, but that's taken care of with the stuck CPU logic now, so there is no reason to update the heartbeat timestamp here. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/xive: prepare all hcalls to support long busy delaysCédric Le Goater
This is not the case for the moment, but future releases of pHyp might need to introduce some synchronisation routines under the hood which would make the XIVE hcalls longer to complete. As this was done for H_INT_RESET, let's wrap the other hcalls in a loop catching the H_LONG_BUSY_* codes. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-10powerpc/xive: shutdown XIVE when kexec or kdump is performedCédric Le Goater
The hcall H_INT_RESET should be called to make sure XIVE is fully reseted. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>