Age | Commit message (Collapse) | Author |
|
Add support for bpf_arch_text_poke() and arch_prepare_bpf_trampoline()
for 64-bit powerpc. While the code is generic, BPF trampolines are only
enabled on 64-bit powerpc. 32-bit powerpc will need testing and some
updates.
BPF Trampolines adhere to the existing ftrace ABI utilizing a
two-instruction profiling sequence, as well as the newer ABI utilizing a
three-instruction profiling sequence enabling return with a 'blr'. The
trampoline code itself closely follows x86 implementation.
BPF prog JIT is extended to mimic 64-bit powerpc approach for ftrace
having a single nop at function entry, followed by the function
profiling sequence out-of-line and a separate long branch stub for calls
to trampolines that are out of range. A dummy_tramp is provided to
simplify synchronization similar to arm64.
When attaching a bpf trampoline to a bpf prog, we can patch up to three
things:
- the nop at bpf prog entry to go to the out-of-line stub
- the instruction in the out-of-line stub to either call the bpf trampoline
directly, or to branch to the long_branch stub.
- the trampoline address before the long_branch stub.
We do not need any synchronization here since we always have a valid
branch target regardless of the order in which the above stores are
seen. dummy_tramp ensures that the long_branch stub goes to a valid
destination on other cpus, even when the branch to the long_branch stub
is seen before the updated trampoline address.
However, when detaching a bpf trampoline from a bpf prog, or if changing
the bpf trampoline address, we need synchronization to ensure that other
cpus can no longer branch into the older trampoline so that it can be
safely freed. bpf_tramp_image_put() uses rcu_tasks to ensure all cpus
make forward progress, but we still need to ensure that other cpus
execute isync (or some CSI) so that they don't go back into the
trampoline again. While here, update the stale comment that describes
the redzone usage in ppc64 BPF JIT.
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241030070850.1361304-18-hbathini@linux.ibm.com
|
|
Add jit support for sign division and modulo. Tested using test_bpf
module.
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240517075650.248801-6-asavkov@redhat.com
|
|
Add jit support for sign extended load. Tested using test_bpf module.
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240517075650.248801-4-asavkov@redhat.com
|
|
Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:
test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
eBPF filter opcode 00d7 (@2) unsupported
jited:0 301 PASS
test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
eBPF filter opcode 00d7 (@2) unsupported
jited:0 555 PASS
test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
eBPF filter opcode 00d7 (@2) unsupported
jited:0 268 PASS
test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
eBPF filter opcode 00d7 (@2) unsupported
jited:0 269 PASS
test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
eBPF filter opcode 00d7 (@2) unsupported
jited:0 460 PASS
test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
eBPF filter opcode 00d7 (@2) unsupported
jited:0 320 PASS
test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
eBPF filter opcode 00d7 (@2) unsupported
jited:0 222 PASS
test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
eBPF filter opcode 00d7 (@2) unsupported
jited:0 273 PASS
test_bpf: #344 BPF_LDX_MEMSX | BPF_B
eBPF filter opcode 0091 (@5) unsupported
jited:0 432 PASS
test_bpf: #345 BPF_LDX_MEMSX | BPF_H
eBPF filter opcode 0089 (@5) unsupported
jited:0 381 PASS
test_bpf: #346 BPF_LDX_MEMSX | BPF_W
eBPF filter opcode 0081 (@5) unsupported
jited:0 505 PASS
test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
eBPF filter opcode 0006 (@1) unsupported
jited:0 261 PASS
test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]
Fix them by adding missing processing.
Fixes: daabb2b098e0 ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
|
|
Clang didn't recognize the instruction tlbilxlpid. This was fixed in
clang-18 [0] then backported to clang-17 [1]. To support clang-16 and
older, rather than using that instruction bare in inline asm, add it to
ppc-opcode.h and use that macro as is done elsewhere for other
instructions.
Link: https://github.com/ClangBuiltLinux/linux/issues/1891
Link: https://github.com/llvm/llvm-project/issues/64080
Link: https://github.com/llvm/llvm-project/commit/53648ac1d0c953ae6d008864dd2eddb437a92468 [0]
Link: https://github.com/llvm/llvm-project-release-prs/commit/0af7e5e54a8c7ac665773ac1ada328713e8338f5 [1]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/llvm/202307211945.TSPcyOhh-lkp@intel.com/
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230803-ppc_tlbilxlpid-v3-1-ca84739bfd73@google.com
|
|
Recognise and pass the appropriate signal to the user program when a
hashchk instruction triggers. This is independent of allowing
configuration of DEXCR[NPHIE], as a hypervisor can enforce this aspect
regardless of the kernel.
The signal mirrors how ARM reports their similar check failure. For
example, their FPAC handler in arch/arm64/kernel/traps.c do_el0_fpac()
does this. When we fail to read the instruction that caused the fault
we send a segfault, similar to how emulate_math() does it.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230616034846.311705-5-bgray@linux.ibm.com
|
|
PC-Relative or PCREL addressing is an extension to the ELF ABI which
uses Power ISA v3.1 PC-relative instructions to calculate addresses,
rather than the traditional TOC scheme.
Add an option to build vmlinux using pcrel addressing. Modules continue
to use TOC addressing.
- TOC address helpers and r2 are poisoned with -1 when running vmlinux.
r2 could be used for something useful once things are ironed out.
- Assembly must call C functions with @notoc annotation, or the linker
complains aobut a missing nop after the call. This is done with the
CFUNC macro introduced earlier.
- Boot: with the exception of prom_init, the execution branches to the
kernel virtual address early in boot, before any addresses are
generated, which ensures 34-bit pcrel addressing does not miss the
high PAGE_OFFSET bits. TOC relative addressing has a similar
requirement. prom_init does not go to the virtual address and its
addresses should not carry over to the post-prom kernel.
- Ftrace trampolines are converted from TOC addressing to pcrel
addressing, including module ftrace trampolines that currently use the
kernel TOC to find ftrace target functions.
- BPF function prologue and function calling generation are converted
from TOC to pcrel.
- copypage_64.S has an interesting problem, prefixed instructions have
alignment restrictions so the linker can add padding, which makes the
assembler treat the difference between two local labels as
non-constant even if alignment is arranged so padding is not required.
This may need toolchain help to solve nicely, for now move the prefix
instruction out of the alternate patch section to work around it.
This reduces kernel text size by about 6%.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-6-npiggin@gmail.com
|
|
The wait instruction encoding changed between ISA v2.07 and ISA v3.0.
In v3.1 the instruction gained a new field.
Update the PPC_WAIT macro to the current encoding. Rename the older
incompatible one with a _v203 suffix as it was introduced in v2.03
(the WC field was introduced in v2.07 but the kernel only uses WC=0).
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220920122259.363092-1-npiggin@gmail.com
|
|
PPC_RAW_TW() is erroneously defined with base code 0x7f000008
instead of 0x7c000008.
That's invisible because its only user is PPC_RAW_TRAP() which is
0x7fe00008, but fix it anyway to avoid any risk of future bug.
Fixes: d00d762daf12 ("powerpc/ppc-opcode: Define and use PPC_RAW_TRAP() and PPC_RAW_TW()")
Reported-by: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eca9251f1e1f82c4c46ec6380ddb28356ab3fdfe.1659527244.git.christophe.leroy@csgroup.eu
|
|
The eh field must remain 0 for PPC32 and is only used
by PPC64.
Don't hide that behind a macro, just leave the responsibility
to the user.
At the time being, the only users of PPC_RAW_L{WDQ}ARX are
setting the eh field to 0, so the special handling of __PPC_EH
is useless. Just take the value given by the caller.
Same for DEFINE_TESTOP(), don't do special handling in that
macro, ensure the caller hands over the proper eh value.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Use 'n' constraint per Segher]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8b9c8a1a14f9143552a85fcbf96698224a8c2469.1659430931.git.christophe.leroy@csgroup.eu
|
|
We have PPC_INST_SETB then build the 'setb' instruction in the
user.
Instead, define PPC_RAW_SETB() and use it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b08a4f26919a8f8cdcf7544ab552d9c1c63418b5.1657205708.git.christophe.leroy@csgroup.eu
|
|
Add and use PPC_RAW_TRAP() instead of opencoding.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/52c7e522e56a38e3ff0363906919445920005a8f.1657205708.git.christophe.leroy@csgroup.eu
|
|
The following PPC_INST_XXX macros are not used anymore
outside ppc-opcode.h:
- PPC_INST_LD
- PPC_INST_STD
- PPC_INST_ADDIS
- PPC_INST_ADD
- PPC_INST_DIVD
Remove them.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8c28636126f69141419953b5638b4a908c184dc1.1652074503.git.christophe.leroy@csgroup.eu
|
|
Convert last users of PPC_INST_BL to PPC_RAW_BL()
And remove PPC_INST_BL.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d9eacb758e7ae7cf224211ebe3f6f7d409a333be.1652074503.git.christophe.leroy@csgroup.eu
|
|
Convert last users of PPC_INST_BRANCH to PPC_RAW_BRANCH()
And remove PPC_INST_BRANCH.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fa8807108a2ef2287a2c9651d6e1ff7c051923d9.1652074503.git.christophe.leroy@csgroup.eu
|
|
PPC_RAW_xxx() macros are self explanatory and less error prone
than open coding.
Use them in ftrace.c
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9292094c9a69cef6d29ee83f435a557b59c45065.1652074503.git.christophe.leroy@csgroup.eu
|
|
Per the ISA, a Trace interrupt is not generated for:
- [h|u]rfi[d]
- rfscv
- sc, scv, and Trap instructions that trap
- Power-Saving Mode instructions
- other instructions that cause interrupts (other than Trace interrupts)
- the first instructions of any interrupt handler (applies to Branch and Single Step tracing;
CIABR matches may still occur)
- instructions that are emulated by software
Add a helper to check for instructions belonging to the first four
categories above and to reject kprobes, uprobes and xmon breakpoints on
such instructions. We reject probing on instructions belonging to these
categories across all ISA versions and across both BookS and BookE.
For trap instructions, we can't know in advance if they can cause a
trap, and there is no good reason to allow probing on those. Also,
uprobes already refuses to probe trap instructions and kprobes does not
allow probes on trap instructions used for kernel warnings and bugs. As
such, stop allowing any type of probes/breakpoints on trap instruction
across uprobes, kprobes and xmon.
For some of the fp/altivec instructions that can generate an interrupt
and which we emulate in the kernel (altivec assist, for example), we
check and turn off single stepping in emulate_single_step().
Instructions generating a DSI are restarted and single stepping normally
completes once the instruction is completed.
In uprobes, if a single stepped instruction results in a non-fatal
signal to be delivered to the task, such signals are "delayed" until
after the instruction completes. For fatal signals, single stepping is
cancelled and the instruction restarted in-place so that core dump
captures proper addresses.
In kprobes, we do not allow probes on instructions having an extable
entry and we also do not allow probing interrupt vectors.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f56ee979d50b8711fae350fc97870f3ca34acd75.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
|
|
Some of the primary opcodes are duplicated. Remove those, and sort the
rest of the primary opcodes to make it easy to read.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a05edf638a2638d708fc2db0272f6317837b5eab.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
|
|
The VAS window may not be active if the system looses credits and
the NX generates page fault when it receives request on unmap
paste address.
The kernel handles the fault by remap new paste address if the
window is active again, Otherwise return the paste instruction
failure if the executed instruction that caused the fault was
a paste.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/492b9aefd593061d51dda67ee4d2fc449c000dce.camel@linux.ibm.com
|
|
Johan reported the below crash with test_bpf on ppc64 e5500:
test_bpf: #296 ALU_END_FROM_LE 64: 0x0123456789abcdef -> 0x67452301 jited:1
Oops: Exception in kernel mode, sig: 4 [#1]
BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
Modules linked in: test_bpf(+)
CPU: 0 PID: 76 Comm: insmod Not tainted 5.14.0-03771-g98c2059e008a-dirty #1
NIP: 8000000000061c3c LR: 80000000006dea64 CTR: 8000000000061c18
REGS: c0000000032d3420 TRAP: 0700 Not tainted (5.14.0-03771-g98c2059e008a-dirty)
MSR: 0000000080089000 <EE,ME> CR: 88002822 XER: 20000000 IRQMASK: 0
<...>
NIP [8000000000061c3c] 0x8000000000061c3c
LR [80000000006dea64] .__run_one+0x104/0x17c [test_bpf]
Call Trace:
.__run_one+0x60/0x17c [test_bpf] (unreliable)
.test_bpf_init+0x6a8/0xdc8 [test_bpf]
.do_one_initcall+0x6c/0x28c
.do_init_module+0x68/0x28c
.load_module+0x2460/0x2abc
.__do_sys_init_module+0x120/0x18c
.system_call_exception+0x110/0x1b8
system_call_common+0xf0/0x210
--- interrupt: c00 at 0x101d0acc
<...>
---[ end trace 47b2bf19090bb3d0 ]---
Illegal instruction
The illegal instruction turned out to be 'ldbrx' emitted for
BPF_FROM_[L|B]E, which was only introduced in ISA v2.06. Guard use of
the same and implement an alternative approach for older processors.
Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Reported-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d1e51c6fdf572062cf3009a751c3406bda01b832.1641468127.git.naveen.n.rao@linux.vnet.ibm.com
|
|
The llvm integrated assembler does not recognise the ISA 2.05 tlbiel
version. Work around it by switching to .long when an old arch level
detected.
Signed-off-by: Daniel Axtens <dja@axtens.net>
[aik: did "Eventually do this more smartly"]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-7-aik@ozlabs.ru
|
|
The dssall ("Data Stream Stop All") instruction is obsolete altogether
with other Data Cache Instructions since ISA 2.03 (year 2006).
LLVM IAS does not support it but PPC970 seems to be using it.
This switches dssall to .long as there is no much point in fixing LLVM.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-6-aik@ozlabs.ru
|
|
Define and use PPC_RAW_BRANCH() macro instead of open coding it. This
macro is used while adding BPF_PROBE_MEM support.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-5-hbathini@linux.ibm.com
|
|
Force the eh flag at 0 on PPC32.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.leroy@csgroup.eu
|
|
Use PPC_RAW_ macros to simplify the code.
And use PPC_LO/PPC_HI instead of IMM_L/IMM_H which are for
internal use inside ppc-opcode.h
Those macros are self explanatory, comments can go as well.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5a167b8ba4d33a5c09cd504f0c862e25ffe85459.1621516826.git.christophe.leroy@csgroup.eu
|
|
On the road to removing all PPC_INST_xx defines in
asm/ppc-opcodes.h, change PPC_INST_NOP to PPC_RAW_NOP().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ad46c195ca1b8572629ef07ba6bfe247585239a6.1621506159.git.christophe.leroy@csgroup.eu
|
|
Start using PPC_RAW_xx() macros where relevant.
PPC_INST_SYNC is used to both represent the 'sync' instruction and
the family of synchronisation instructions. Keep it for the later,
maybe we'll change the name in the future to avoid confusion.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0945c155d6cb113431185fc1296ac127359fe29b.1621506159.git.christophe.leroy@csgroup.eu
|
|
Use PPC_RAW_xxx() macros instead of open coding assembly
opcodes.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix bad converison in do_stf_exit_barrier_fixups()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e79cd8e111ca13bf8c61a384bac365aa7e207647.1621506159.git.christophe.leroy@csgroup.eu
|
|
Use PPC_RAW_MFLR() instead of open coding with PPC_INST_MFLR.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c1887623e91e8b4da36e669e4c74de86320a5092.1621506159.git.christophe.leroy@csgroup.eu
|
|
On the road to remove all use of PPC_INST_xxx, replace
PPC_INST_BLR by PPC_RAW_BLR(). Same for PPC_INST_NOP.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c04f88d0e53d2122fbbe92226892a01ebc668b6a.1621506159.git.christophe.leroy@csgroup.eu
|
|
To improve readability, use PPC_RAW_xx() macros instead of
open coding. Those macros are self-explanatory so the comments
can go as well.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/99d9ee8849d3992beeadb310a665aae01c3abfb1.1621506159.git.christophe.leroy@csgroup.eu
|
|
To improve readability, use PPC_RAW_xx() macros instead of
open coding. Those macros are self-explanatory so the comments
can go as well.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4ca2bfdca2f47a293d05f61eb3c4e487ee170f1f.1621506159.git.christophe.leroy@csgroup.eu
|
|
Today we have __REG_Rx macros . They are mainly meant for
internal use by macros __PPC_RA() and friends macros which
allows uses like __PPC_RA(R12).
When used with PPC_RAW_xx() macros, it gives a result which is
not very readable.
Add shorter macros _Rx in order to improve readability when
used with PPC_RAW_xx() macros.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ec34d92b7c2f810622261acfeeed4b0a0f4d01bd.1621506159.git.christophe.leroy@csgroup.eu
|
|
At the time being, we have PPC_RAW_PLXVP() and PPC_RAW_PSTXVP() which
provide a 64 bits value, and then it gets split by open coding to
format it into a 'struct ppc_inst' instruction.
Instead, define a PPC_RAW_xxx_P() and a PPC_RAW_xxx_S() to be used
as is.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5d146b31b943e7ad674894421db4feef54804b9b.1621506159.git.christophe.leroy@csgroup.eu
|
|
This adds selftests for setb instruction.
Signed-off-by: Sathvika Vasireddy <sathvika@linux.vnet.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b05b61ccb5f10279d46fed490796f32ea2ccc270.1620727160.git.sathvika@linux.vnet.ibm.com
|
|
If the target of a function call is within 32 Mbytes distance, use a
standard function call with 'bl' instead of the 'lis/ori/mtlr/blrl'
sequence.
In the first pass, no memory has been allocated yet and the code
position is not known yet (image pointer is NULL). This pass is there
to calculate the amount of memory to allocate for the EBPF code, so
assume the 4 instructions sequence is required, so that enough memory
is allocated.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/74944a1e3e5cfecc141e440a6ccd37920e186b70.1618227846.git.christophe.leroy@csgroup.eu
|
|
The following opcodes will be needed for the implementation
of eBPF for PPC32. Add them in asm/ppc-opcode.h
PPC_RAW_ADDE
PPC_RAW_ADDZE
PPC_RAW_ADDME
PPC_RAW_MFLR
PPC_RAW_ADDIC
PPC_RAW_ADDIC_DOT
PPC_RAW_SUBFC
PPC_RAW_SUBFE
PPC_RAW_SUBFIC
PPC_RAW_SUBFZE
PPC_RAW_ANDIS
PPC_RAW_NOR
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f7bd573a368edd78006f8a5af508c726e7ce1ed2.1616430991.git.christophe.leroy@csgroup.eu
|
|
Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as
macros for new VSX vector paired instructions,
* Load VSX Vector Paired (lxvp)
* Load VSX Vector Paired Indexed (lxvpx)
* Prefixed Load VSX Vector Paired (plxvp)
* Store VSX Vector Paired (stxvp)
* Store VSX Vector Paired Indexed (stxvpx)
* Prefixed Store VSX Vector Paired (pstxvp)
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-5-ravi.bangoria@linux.ibm.com
|
|
Add PPC_RAW_MFSPR() to replace open coding done in 8xx-pmu.c
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e281e3a611eead8817c49cf06a60072a021af823.1606231483.git.christophe.leroy@csgroup.eu
|
|
Include instruction opcodes for divde and divdeu as macros.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200728130308.1790982-2-bala24@linux.ibm.com
|
|
Add tests for the prefixed versions of the floating-point load/stores
that are currently tested. This includes the following instructions:
* Prefixed Load Floating-Point Single (plfs)
* Prefixed Load Floating-Point Double (plfd)
* Prefixed Store Floating-Point Single (pstfs)
* Prefixed Store Floating-Point Double (pstfd)
Skip the new tests if ISA v3.10 is unsupported.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Fix conflicts with ppc-opcode.h changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200525025923.19843-2-jniethe5@gmail.com
|
|
Add tests for the prefixed versions of the integer load/stores that
are currently tested. This includes the following instructions:
* Prefixed Load Doubleword (pld)
* Prefixed Load Word and Zero (plwz)
* Prefixed Store Doubleword (pstd)
Skip the new tests if ISA v3.1 is unsupported.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Fix conflicts with ppc-opcode.h changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200525025923.19843-1-jniethe5@gmail.com
|
|
Lots of PPC_INST_* macros are used only ever in PPC_* macros, fold
those PPC_INST_* into PPC_RAW_* to avoid using PPC_INST_*
accidentally.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Deal with PHWSYNC, PLWSYNC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-7-bala24@linux.ibm.com
|
|
Wrap existing stringify macros to reuse raw instruction encoding
macros that are newly added.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Add DCBFPS, DCBSTPS, PHWSYNC, PLWSYNC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-6-bala24@linux.ibm.com
|
|
Move macro definitions of powerpc instructions from bpf_jit.h to
ppc-opcode.h and adopt the users of the macros accordingly. `PPC_MR()`
is defined twice in bpf_jit.h, remove the duplicate one.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-5-bala24@linux.ibm.com
|
|
Few ppc instructions are encoded in test_emulate_step.c, consolidate
them and use it from ppc-opcode.h
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-3-bala24@linux.ibm.com
|
|
Introduce PPC_RAW_* macros to have all the bare encoding of ppc
instructions. Move `VSX_XX*()` and `TMRN()` macros up to reuse it.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Add DCBFPS, DCBSTPS, PHWSYNC, PLWSYNC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200624113038.908074-2-bala24@linux.ibm.com
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200609070610.846703-6-npiggin@gmail.com
|
|
POWER10 introduces two new variants of dcbf instructions (dcbstps and dcbfps)
that can be used to write modified locations back to persistent storage.
Additionally, POWER10 also introduce phwsync and plwsync which can be used
to establish order of these writes to persistent storage.
This patch exposes these instructions to the rest of the kernel. The existing
dcbf and hwsync instructions in P8 and P9 are adequate to enable appropriate
synchronization with OpenCAPI-hosted persistent storage. Hence the new
instructions are added as a variant of the old ones that old hardware
won't differentiate.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-3-aneesh.kumar@linux.ibm.com
|
|
Returning from an interrupt or syscall to a signal handler currently
begins execution directly at the handler's entry point, with LR set to
the address of the sigreturn trampoline. When the signal handler
function returns, it runs the trampoline. It looks like this:
# interrupt at user address xyz
# kernel stuff... signal is raised
rfid
# void handler(int sig)
addis 2,12,.TOC.-.LCF0@ha
addi 2,2,.TOC.-.LCF0@l
mflr 0
std 0,16(1)
stdu 1,-96(1)
# handler stuff
ld 0,16(1)
mtlr 0
blr
# __kernel_sigtramp_rt64
addi r1,r1,__SIGNAL_FRAMESIZE
li r0,__NR_rt_sigreturn
sc
# kernel executes rt_sigreturn
rfid
# back to user address xyz
Note the blr with no matching bl. This can corrupt the return
predictor.
Solve this by instead resuming execution at the signal trampoline
which then calls the signal handler. qtrace-tools link_stack checker
confirms the entire user/kernel/vdso cycle is balanced after this
patch, whereas it's not upstream.
Alan confirms the dwarf unwind info still looks good. gdb still
recognises the signal frame and can step into parent frames if it
break inside a signal handler.
Performance is pretty noisy, not a very significant change on a POWER9
here, but branch misses are consistently a lot lower on a
microbenchmark:
Performance counter stats for './signal':
13,085.72 msec task-clock # 1.000 CPUs utilized
45,024,760,101 cycles # 3.441 GHz
65,102,895,542 instructions # 1.45 insn per cycle
11,271,673,787 branches # 861.372 M/sec
59,468,979 branch-misses # 0.53% of all branches
12,989.09 msec task-clock # 1.000 CPUs utilized
44,692,719,559 cycles # 3.441 GHz
65,109,984,964 instructions # 1.46 insn per cycle
11,282,136,057 branches # 868.585 M/sec
39,786,942 branch-misses # 0.35% of all branches
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200511101952.1463138-1-npiggin@gmail.com
|