|
physical pages
At the moment we only support in the host the IOMMU page sizes which
the guest is aware of: 4KB/64KB/16MB. However, P9 does not support
16MB IOMMU pages; 2MB and 1GB pages are supported instead. We can still
emulate bigger guest pages (for example 16MB) with smaller host pages
(4KB/64KB/2MB).
This allows the physical IOMMU pages to use a page size smaller than
or equal to the guest-visible IOMMU page size.
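A sketch of the emulation (all names here are illustrative, not the
actual KVM code):

/* Sketch: back one guest IOMMU page with several smaller host IOMMU
 * pages; map_host_iommu_page() is a made-up helper. */
unsigned long n = 1UL << (guest_page_shift - host_page_shift);
unsigned long sz = 1UL << host_page_shift;
unsigned long i;

for (i = 0; i < n; i++)
	map_host_iommu_page(hpa + i * sz, gpa + i * sz);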
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
The other TCE handlers use the page shift from the guest-visible TCE
table (described by kvmppc_spapr_tce_iommu_table), so let's make the
H_STUFF_TCE handlers do the same thing.
This should cause no behavioral change now, but soon we will allow
iommu_table::it_page_shift to differ from the emulated table page
size, at which point this will play a role.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
We now have interrupts hard-disabled when coming back from
kvmppc_hv_entry_trampoline, so this changes the comment to reflect
that.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Although Linux doesn't use PURR and SPURR ((Scaled) Processor
Utilization of Resources Register), other OSes depend on them.
On POWER8 they count at a rate depending on whether the VCPU is
idle or running, the activity of the VCPU, and the value in the
RWMR (Region-Weighting Mode Register). Hardware expects the
hypervisor to update the RWMR when a core is dispatched to reflect
the number of online VCPUs in the vcore.
This adds code to maintain a count in the vcore struct indicating
how many VCPUs are online. In kvmppc_run_core we use that count
to set the RWMR register on POWER8. If the core is split because
of a static or dynamic micro-threading mode, we use the value for
8 threads. The RWMR value is not relevant when the host is
executing because Linux does not use the PURR or SPURR register,
so we don't bother saving and restoring the host value.
For the sake of old userspace which does not set the KVM_REG_PPC_ONLINE
register, we set online to 1 if it was 0 at the time of a KVM_RUN
ioctl.
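A sketch of the dispatch-time update (vc->online_count and the value
table are assumed names, not the verbatim patch):

/* Sketch: on POWER8, program RWMR from the online VCPU count; use
 * the 8-thread value if the core is split. */
int threads = is_split ? 8 : atomic_read(&vc->online_count);

mtspr(SPRN_RWMR, p8_rwmr_values[threads]);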
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
This adds a new KVM_REG_PPC_ONLINE register which userspace can set
to 0 or 1 via the GET/SET_ONE_REG interface to indicate whether it
considers the VCPU to be offline (0), that is, not currently running,
or online (1). This will be used in a later patch to configure the
register which controls PURR and SPURR accumulation.
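For illustration, the one_reg plumbing for such a register looks
roughly like this (a sketch, not the verbatim patch):

/* in the get_one_reg handler */
case KVM_REG_PPC_ONLINE:
	*val = get_reg_val(id, vcpu->arch.online);
	break;

/* in the set_one_reg handler */
case KVM_REG_PPC_ONLINE:
	vcpu->arch.online = set_reg_val(id, *val);
	break;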
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
A radix guest can execute tlbie instructions to invalidate TLB entries.
After a tlbie or a group of tlbies, it must then do the architected
sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation
has been processed by all CPUs in the system before it can rely on
no CPU using any translation that it just invalidated.
In fact it is the ptesync which does the actual synchronization in
this sequence, and hardware has a requirement that the ptesync must
be executed on the same CPU thread as the tlbies which it is expected
to order. Thus, if a vCPU gets moved from one physical CPU to
another after it has done some tlbies but before it can get to do the
ptesync, the ptesync will not have the desired effect when it is
executed on the second physical CPU.
To fix this, we do a ptesync in the exit path for radix guests. If
there are any pending tlbies, this will wait for them to complete.
If there aren't, then ptesync will just do the same as sync.
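The fix itself is conceptually tiny (a C sketch; the real change is in
the assembly exit path):

/* guest exit, radix guests only: order any tlbies this vCPU issued
 * on this physical CPU before it can be migrated away */
if (kvm_is_radix(vcpu->kvm))
	asm volatile("ptesync" : : : "memory");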
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
When a vcpu priority (CPPR) is set to a lower value (masking more
interrupts), we stop processing interrupts already in the queue
for the priorities that have now been masked.
If those interrupts were previously re-routed to a different
CPU, they might still be stuck until the older CPU that has
them in its queue processes them. In the case of guest CPU
unplug, that may never happen.
To address that without creating additional overhead for
the normal interrupt processing path, this changes H_CPPR
handling so that when such a priority change occurs, we
scan the interrupt queue for that vCPU, and for any
interrupt in there that has been re-routed, we replace it
with a dummy and force a re-trigger.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
The current partition table unmap code clears the _PAGE_PRESENT bit
out of the pte, which leaves pud_huge/pmd_huge true and does not
clear pud_present/pmd_present. This can confuse subsequent page
faults and possibly lead to the guest looping doing continual
hypervisor page faults.
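A minimal sketch of the fix, assuming an update helper along the lines
of kvmppc_radix_update_pte() (the exact call and signature are
assumptions):

/* clear every bit of the PTE, not just _PAGE_PRESENT, so that
 * pmd_present()/pud_huge() no longer see a live huge entry */
old = kvmppc_radix_update_pte(kvm, ptep, ~0UL, 0, gpa, shift);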
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
kvmppc_radix_tlbie_page
The standard eieio ; tlbsync ; ptesync must follow tlbie to ensure it
is ordered with respect to subsequent operations.
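For reference, the required sequence is (a sketch, with simplified
operands):

/* rb/rs encode the page and LPID to invalidate */
asm volatile("tlbie %0, %1" : : "r" (rb), "r" (rs) : "memory");
asm volatile("eieio; tlbsync; ptesync" : : : "memory");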
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Currently, the HV KVM guest entry/exit code adds the timebase offset
from the vcore struct to the timebase on guest entry, and subtracts
it on guest exit. That is fine, except that it is possible for
userspace to change the offset using the SET_ONE_REG interface while
the vcore is running, as there is only one timebase offset per vcore
but potentially multiple VCPUs in the vcore. If that were to happen,
KVM would subtract a different offset on guest exit from that which
it had added on guest entry, leading to the timebase being out of sync
between cores in the host, which then leads to bad things happening
such as hangs and spurious watchdog timeouts.
To fix this, we add a new field 'tb_offset_applied' to the vcore struct
which stores the offset that is currently applied to the timebase.
This value is set from the vcore tb_offset field on guest entry, and
is what is subtracted from the timebase on guest exit. Since it is
zero when the timebase offset is not applied, we can simplify the
logic in kvmhv_start_timing and kvmhv_accumulate_time.
In addition, we had secondary threads reading the timebase while
running concurrently with code on the primary thread which would
eventually add or subtract the timebase offset from the timebase.
This occurred while saving or restoring the DEC register value on
the secondary threads. Although no specific incorrect behaviour has
been observed, this is a race which should be fixed. To fix it, we
move the DEC saving code to just before we call kvmhv_commence_exit,
and the DEC restoring code to after the point where we have waited
for the primary thread to switch the MMU context and add the timebase
offset. That way we are sure that the timebase contains the guest
timebase value in both cases.
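Conceptually, the offset handling becomes (a C sketch; the real code
lives in the assembly entry/exit path):

/* guest entry, primary thread */
if (vc->tb_offset) {
	mtspr(SPRN_TBU40, mftb() + vc->tb_offset);
	vc->tb_offset_applied = vc->tb_offset;
}

/* guest exit: subtract what was actually applied */
if (vc->tb_offset_applied) {
	mtspr(SPRN_TBU40, mftb() - vc->tb_offset_applied);
	vc->tb_offset_applied = 0;
}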
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Directly use fault_in_pages_readable() instead of manual __get_user()
code. This fixes a warning treated as an error with W=1:
arch/powerpc/kernel/kvm.c:675:6: error: variable ‘tmp’ set but not used [-Werror=unused-but-set-variable]
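i.e. something along these lines (sketch):

/* fault the instruction page in; the value itself is not needed */
if (fault_in_pages_readable((const char __user *)p, sizeof(u32)))
	return;	/* page is unreadable, nothing to patch */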
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Implement a local TLB flush for invalidating an LPID, with variants
for process or partition scope, and a global TLB flush for
invalidating a partition-scoped page of an LPID.
These will be used by KVM in subsequent patches.
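The interface amounts to a few new flush helpers, roughly of this
shape (the signatures below are an approximation, not the exact ones):

/* local flushes of an entire LPID, process or partition scope */
void radix__local_flush_tlb_lpid(unsigned int lpid);
void radix__local_flush_tlb_lpid_guest(unsigned int lpid);

/* global flush of one partition-scoped page of an LPID */
void radix__flush_tlb_lpid_page(unsigned int lpid, unsigned long addr,
				unsigned long page_size);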
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduce the boilerplate code in the callers.
All trivial callers converted over.
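Typical usage after the conversion (a minimal example):

static int foo_show(struct seq_file *m, void *v)
{
	seq_puts(m, "hello\n");
	return 0;
}

/* replaces the open/release/file_operations boilerplate */
proc_create_single("foo", 0444, NULL, foo_show);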
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The interrupt controller inside the Wii's Hollywood chip is connected to
two masters, the "Broadway" PowerPC and the "Starlet" ARM926, each with
their own interrupt status and mask registers.
When booting the Wii with mini[1], interrupts from the SD card
controller (IRQ 7) are handled by the ARM, because mini provides SD
access over IPC. Linux however can't currently use or disable this IPC
service, so both sides try to handle IRQ 7 without coordination.
Let's instead make sure that all interrupts that are unmasked on the PPC
side are masked on the ARM side; this will also make sure that Linux can
properly talk to the SD card controller (and potentially other devices).
If access to a device through IPC is desired in the future, interrupts
from that device should not be handled by Linux directly.
[1]: https://github.com/lewurm/mini
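A sketch of the idea (the register offsets below are assumptions about
the Hollywood layout, not verified):

/* assumed offsets: 0x04 = Broadway (PPC) IRQ mask,
 *                  0x0c = Starlet (ARM) IRQ mask */
setbits32(io_base + 0x04, 1 << irq);	/* unmask on the PPC side */
clrbits32(io_base + 0x0c, 1 << irq);	/* mask on the ARM side */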
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
On the Wii, there is a secondary IRQ controller (hlwd-pic), so
flipper-pic's match operation should not be hardcoded to return 1.
In fact, the default matching logic is sufficient, and we can completely
omit flipper_pic_match.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Testing with a threaded version of mmap_bench, which allocates 1G
chunks with a large number of threads, we find:
without patch
32.72% mmap_bench [kernel.vmlinux] [k] do_raw_spin_lock
|
---do_raw_spin_lock
|
--32.68%--0
|
|--15.82%--pte_fragment_alloc
| |
| --15.79%--do_huge_pmd_anonymous_page
| __handle_mm_fault
| handle_mm_fault
| __do_page_fault
| handle_page_fault
| test_mmap
| test_mmap
| start_thread
| __clone
|
|--14.95%--do_huge_pmd_anonymous_page
| __handle_mm_fault
| handle_mm_fault
| __do_page_fault
| handle_page_fault
| test_mmap
| test_mmap
| start_thread
| __clone
|
with patch
12.89% mmap_bench [kernel.vmlinux] [k] do_raw_spin_lock
|
---do_raw_spin_lock
|
--12.83%--0
|
|--3.21%--pagevec_lru_move_fn
| __lru_cache_add
| |
| --2.74%--do_huge_pmd_anonymous_page
| __handle_mm_fault
| handle_mm_fault
| __do_page_fault
| handle_page_fault
| test_mmap
| test_mmap
| start_thread
| __clone
|
|--3.11%--do_huge_pmd_anonymous_page
| __handle_mm_fault
| handle_mm_fault
| __do_page_fault
| handle_page_fault
| test_mmap
| test_mmap
| start_thread
| __clone
.....
|
--0.55%--pte_fragment_alloc
|
--0.55%--do_huge_pmd_anonymous_page
__handle_mm_fault
handle_mm_fault
__do_page_fault
handle_page_fault
test_mmap
test_mmap
start_thread
__clone
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Instead of encoding the shift in the table address, use an enumerated
index value. This allows us to do different things in the callback for
pte and pmd.
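i.e. the callback switches on an index rather than decoding a shift (a
sketch of the shape of the change; helper names are assumed):

enum pgtable_index {
	PTE_INDEX = 0,
	PMD_INDEX,
};

static void pgtable_free(void *table, int index)
{
	switch (index) {
	case PTE_INDEX:
		pte_fragment_free(table, 0);
		break;
	case PMD_INDEX:
		pmd_fragment_free(table);	/* assumed helper */
		break;
	}
}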
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The 4K config uses one full page at level 4 of the page table. Add
support for single fragment allocation in the page table fragment code
and use that for the 4K config. This makes both 4K and 64K use the same
code path. Later we will switch pmd to use the page table fragment
code. This is done only for 64-bit platforms, since only those use page
table fragment support.
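For the 4K config the fragment constants then degenerate to one
fragment per page, roughly (values assumed):

/* 4K: one fragment == one full page */
#define PTE_FRAG_SIZE_SHIFT	PAGE_SHIFT
#define PTE_FRAG_NR		1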
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Now that we have removed 64K page size support, the RCU page table
free can be much simpler for nohash. Make a copy of the RCU callback in
the pgalloc.h header, similar to nohash 32. We could possibly merge the
32-bit and 64-bit versions there, but that is for a later patch.
We also move the book3s specific handler to pgtable_book3s64.c. This will be
updated in a later patch to handle split pmd ptlock.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We have in Kconfig
config PPC_64K_PAGES
bool "64k page size"
depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64)
select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64
The only supported 64-bit Book3E platform is FSL_BOOK3E. Remove the
dead 64K page support code from 64-bit nohash.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We rename the alloc and get_from_cache functions to indicate they
operate on pte fragments. In a later patch we will add pmd fragment
support.
No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In a later patch we switch pmd_lock from mm->page_table_lock to the
split pmd ptlock. To avoid compilation issues, use the pmd_lockptr
helper.
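i.e. take the lock through the helper so its definition can change
underneath (minimal example):

spinlock_t *ptl = pmd_lockptr(mm, pmd);

spin_lock(ptl);
/* ... modify the pmd ... */
spin_unlock(ptl);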
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This is only code movement, and avoids an #ifdef.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This brings in one commit that we may want to share with the kvm-ppc
tree, to avoid merge conflicts and get wider testing.
|
|
In the next set of patches, we will switch the pmd allocator to use
page fragments, and the locking will be updated to the split pmd
ptlock. We want to avoid using fragments for the partition-scoped
table, so use a slab cache, similar to the level 4 table.
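A sketch of the slab-cache approach (the cache name is illustrative):

/* one cache for partition-scoped PMD tables, as the level 4 table
 * already uses */
pmd_cache = kmem_cache_create("pmd_cache", PMD_TABLE_SIZE,
			      PMD_TABLE_SIZE, 0, NULL);

pmd = kmem_cache_zalloc(pmd_cache, GFP_KERNEL);
/* ... */
kmem_cache_free(pmd_cache, pmd);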
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This driver is a logical device which provides an interface between
the hypervisor and a management partition, in the form of a
message-passing interface. The management partition is intended to
provide an alternative to HMC-based system management.
VMC enables the management LPAR to provide basic logical partition
functions:
- Logical partition configuration
- Boot, start, and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual storage
- Basic system management
This driver is to be used for the POWER Virtual Management Channel
virtual adapter on the PowerPC platform. It provides a character device
which allows for both request/response and async message support
through the /dev/ibmvmc node.
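From userspace, usage is plain character-device I/O, e.g. (illustrative
only; the message format is defined by the VMC protocol):

#include <fcntl.h>
#include <unistd.h>

char req[64], resp[256];
int fd = open("/dev/ibmvmc", O_RDWR);

/* fill req with a VMC message, then: */
write(fd, req, sizeof(req));		/* send a request */
read(fd, resp, sizeof(resp));		/* block for the response */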
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Reviewed-by: Adam Reznechek <adreznec@linux.vnet.ibm.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Taylor Jakobson <tjakobs@us.ibm.com>
Tested-by: Brad Warrum <bwarrum@us.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Use new return type vm_fault_t for fault handler. For now, this is
just documenting that the function returns a VM_FAULT value rather
than an errno. Once all instances are converted, vm_fault_t will
become a distinct type. See commit 1c8f422059ae ("mm: change return
type to vm_fault_t").
We are also fixing a minor bug: the error from vm_insert_pfn() was
being ignored, and the effect of this is likely to be felt only in OOM
situations.
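After conversion a fault handler has this shape (a sketch using the
vmf_insert_pfn() variant, which returns a vm_fault_t; compute_pfn() is
an assumed helper):

static vm_fault_t foo_vm_fault(struct vm_fault *vmf)
{
	unsigned long pfn = compute_pfn(vmf);	/* assumed helper */

	/* VM_FAULT_NOPAGE on success, or a VM_FAULT_* error code,
	 * so OOM is no longer silently dropped */
	return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
}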
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
At the moment we assume that IODA2 and newer PHBs can always do
4K/64K/16M IOMMU pages. However this is not the case for POWER9, and
skiboot now advertises the supported sizes via the device tree, so use
that instead of hard-coding the mask.
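A sketch of building the mask from the device tree (the property name
is, to the best of my knowledge, what skiboot exposes):

const __be32 *sizes;
int i, len;
u64 mask = 0;	/* bitmask of supported IOMMU page shifts */

sizes = of_get_property(np, "ibm,supported-tce-sizes", &len);
if (sizes)
	for (i = 0; i < len / sizeof(*sizes); i++)
		mask |= 1ULL << be32_to_cpu(sizes[i]);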
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently memtrace doesn't build if NUMA=n:
In function ‘memtrace_alloc_node’:
arch/powerpc/platforms/powernv/memtrace.c:134:6:
error: the address of ‘contig_page_data’ will always evaluate as ‘true’
if (!NODE_DATA(nid) || !node_spanned_pages(nid))
^
This is because for NUMA=n NODE_DATA(nid) points to an always
allocated structure, contig_page_data.
But even in the NUMA=y case memtrace_alloc_node() is only called for
online nodes, and we should always have a NODE_DATA() allocated for an
online node. So remove the (hopefully) overly paranoid check, which
also means we can build when NUMA=n.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Remove the ad-hoc implementation; the generic code now allows us not
to reinvent the wheel.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-9-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The bpf syscall and selftests conflicts were trivial
overlapping changes.
The r8169 change involved moving the added mdelay from 'net' into a
different function.
A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts. I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit e6a6928c3ea1 ("of/fdt: Convert FDT functions to use
libfdt") (Apr 2014), the generic flat device tree code dropped support
for flat device trees older than version 0x10 (16).
We still have code in our CPU scanning to cope with flat device tree
versions earlier than 2, which can now never trigger, so drop it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add a test of the relative branch patching logic in the alternate
section feature fixup code. This tests that if we branch past the last
instruction of the alternate section, the branch is not patched.
That's because the assembler will have created a branch that already
points to the first instruction after the patched section, which is
correct and needs no further patching.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
We want this to remain the last test (because it's disabled by
default), so give it a non-numbered name so we don't have to renumber
it when adding new tests before it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The code patching code has always been a bit confused about whether
it's best to use void *, unsigned int *, char *, etc. to point to
instructions. In fact in the feature fixups tests we use both unsigned
int[] and u8[] in different places.
Unfortunately the tests that use unsigned int[] calculate the size of
the code blocks using subtraction of those unsigned int pointers, and
then pass the result to memcmp(). This means we're only comparing 1/4
of the bytes we need to, because we need to multiply by
sizeof(unsigned int) to get the number of *bytes*.
The result is that the tests do all the patching and then only compare
some of the resulting code, so patching bugs that only affect the
last 3/4 of the code could slip through undetected. It turns out that
hasn't been happening, although one test had a bad expected case (see
previous commit).
Fix it for now by multiplying the size by 4 in the affected functions.
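The fix is mechanical (sketch):

/* end and start are unsigned int *, so the difference counts
 * instructions (words), not bytes */
unsigned long size = end - start;

return memcmp(dest, expected, size * sizeof(unsigned int));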
Fixes: 362e7701fd18 ("powerpc: Add self-tests of the feature fixup code")
Epic-brown-paper-bag-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The expected case for this test was wrong, the source of the alternate
code sequence is:
FTR_SECTION_ELSE
2: or 2,2,2
PPC_LCMPI r3,1
beq 3f
blt 2b
b 3f
b 1b
ALT_FTR_SECTION_END(0, 1)
3: or 1,1,1
or 2,2,2
4: or 3,3,3
So when it's patched the '3' label should still be on the 'or 1,1,1',
and the 4 label is irrelevant and can be removed.
Fixes: 362e7701fd18 ("powerpc: Add self-tests of the feature fixup code")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
If the systbl_chk.sh checks fail we print a message, but with no
indication that it's an error. That makes it hard to find in build
logs with eg. grep.
So prefix any output with "Error:".
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
It had always been pointless: compat_sys_select() sign-extends
the first argument just fine on its own.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Use COMPAT_SPU_NEW() to keep systbl_chk.sh happy]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Currently the select system call is wired up with the SYSX_SPU()
macro. The SYSX_SPU() is not handled by systbl_chk.c, which means the
syscall number for select is not checked.
That hides the fact that the syscall number for select is actually
__NR__newselect not __NR_select.
In a following patch we'd like to drop ppc32_select() which means
select will become a regular COMPAT_SYS_SPU() syscall. But
COMPAT_SYS_SPU() can't deal with the fact that the syscall number is
actually __NR__newselect. We also can't just redefine __NR_select
because that's still used for the old select call.
So add a new COMPAT_SPU_NEW() that does the same thing as
COMPAT_SYS_SPU() except it encodes that we're using the new number.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Update sys_ni.c for s/ppc_rtas/sys_rtas/]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[mpe: Fix sys_debug_setcontext() prototype to return long]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The "Power Architecture 64-Bit ELF V2 ABI" says in section 2.3.2.3:
[...] There are several rules that must be adhered to in order to ensure
reliable and consistent call chain backtracing:
* Before a function calls any other function, it shall establish its
own stack frame, whose size shall be a multiple of 16 bytes.
– In instances where a function’s prologue creates a stack frame, the
back-chain word of the stack frame shall be updated atomically with
the value of the stack pointer (r1) when a back chain is implemented.
(This must be supported as default by all ELF V2 ABI-compliant
environments.)
[...]
– The function shall save the link register that contains its return
address in the LR save doubleword of its caller’s stack frame before
calling another function.
To me this sounds like the equivalent of HAVE_RELIABLE_STACKTRACE.
This patch may be unnecessarily limited to ppc64le, but OTOH the only
user of this flag so far is livepatching, which is only implemented on
PPCs with 64-LE, a.k.a. ELF ABI v2.
Feel free to add other ppc variants, but so far only ppc64le got tested.
This change also implements save_stack_trace_tsk_reliable() for ppc64le
that checks for the above conditions, where possible.
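A condensed sketch of the per-frame checks (the helper name is mine;
the real entry point is save_stack_trace_tsk_reliable()):

static bool frame_reliable(unsigned long sp, unsigned long stack_top,
			   unsigned long ip)
{
	if (sp & 0xf)				/* 16-byte frame alignment */
		return false;
	if (sp + STACK_FRAME_OVERHEAD > stack_top)
		return false;
	if (!__kernel_text_address(ip))		/* saved LR must be kernel text */
		return false;
	return true;
}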
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Provide the timebase and the timebase of the last heartbeat in
watchdog lockup messages. Also provide a stack trace of when a CPU
becomes un-stuck, which can be useful: it could be where irqs are
re-enabled, and thus may mark the end of the critical section
responsible for the latency.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The watchdog heartbeat timestamp is updated when the local heartbeat
timer fires (or touch_nmi_watchdog() is called).
This is an interesting data point, so don't overwrite it when the
soft-NMI interrupt detects a hard lockup. That code came from a
pre-merge version of the watchdog, to prevent a flood of hard lockup
messages, but that's taken care of by the stuck-CPU logic now, so there
is no reason to update the heartbeat timestamp here.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This is not the case for the moment, but future releases of pHyp might
need to introduce some synchronisation routines under the hood which
would make the XIVE hcalls take longer to complete.
As this was done for H_INT_RESET, let's wrap the other hcalls in a
loop catching the H_LONG_BUSY_* codes.
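i.e. each wrapper retries along these lines (a sketch following the
pseries hcall conventions; the argument list is the source-config
hcall's):

long rc;

for (;;) {
	rc = plpar_hcall_norets(H_INT_SET_SOURCE_CONFIG, flags, lisn,
				target, prio, sw_irq);
	if (!H_IS_LONG_BUSY(rc))
		break;
	msleep(get_longbusy_msecs(rc));
}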
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The hcall H_INT_RESET should be called to make sure XIVE is fully
reset.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|