linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
5 days	bpf, arm64: inline bpf_get_smp_processor_id() helper	Puranjay Mohan
	Inline calls to bpf_get_smp_processor_id() helper in the JIT by emitting a read from struct thread_info. The SP_EL0 system register holds the pointer to the task_struct and thread_info is the first member of this struct. We can read the cpu number from the thread_info. Here is how the ARM64 JITed assembly changes after this commit: ARM64 JIT =========== BEFORE AFTER -------- ------- int cpu = bpf_get_smp_processor_id(); int cpu = bpf_get_smp_processor_id(); mov x10, #0xfffffffffffff4d0 mrs x10, sp_el0 movk x10, #0x802b, lsl #16 ldr w7, [x10, #24] movk x10, #0x8000, lsl #32 blr x10 add x7, x0, #0x0 Performance improvement using benchmark[1] ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc +---------------+-------------------+-------------------+--------------+ \| Name \| Before \| After \| % change \| \|---------------+-------------------+-------------------+--------------\| \| glob-arr-inc \| 23.380 ± 1.675M/s \| 25.893 ± 0.026M/s \| + 10.74% \| \| arr-inc \| 23.928 ± 0.034M/s \| 25.213 ± 0.063M/s \| + 5.37% \| \| hash-inc \| 12.352 ± 0.005M/s \| 12.609 ± 0.013M/s \| + 2.08% \| +---------------+-------------------+-------------------+--------------+ [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-5-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 days	arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs	Puranjay Mohan
	Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use them directly. They will only be used for internal inlining optimizations for now between BPF verifier and BPF JITs. Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu access using tpidr_el1"), the per-cpu offset for the CPU is stored in the tpidr_el1/2 register of that CPU. To support this BPF instruction in the ARM64 JIT, the following ARM64 instructions are emitted: mov dst, src // Move src to dst, if src != dst mrs tmp, tpidr_el1/2 // Move per-cpu offset of the current cpu in tmp. add dst, dst, tmp // Add the per cpu offset to the dst. To measure the performance improvement provided by this change, the benchmark in [1] was used: Before: glob-arr-inc : 23.597 ± 0.012M/s arr-inc : 23.173 ± 0.019M/s hash-inc : 12.186 ± 0.028M/s After: glob-arr-inc : 23.819 ± 0.034M/s arr-inc : 23.285 ± 0.017M/s hash-inc : 12.419 ± 0.011M/s [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-18	bpf, arm64: Support signed div/mod instructions	Xu Kuohai
	Add JIT for signed div/mod instructions. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-7-xukuohai@huaweicloud.com
2023-08-18	bpf, arm64: Support sign-extension mov instructions	Xu Kuohai
	Add JIT support for BPF sign-extension mov instructions with arm64 SXTB/SXTH/SXTW instructions. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-4-xukuohai@huaweicloud.com
2023-08-18	bpf, arm64: Support sign-extension load instructions	Xu Kuohai
	Add JIT support for sign-extension load instructions. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-3-xukuohai@huaweicloud.com
2023-04-03	bpf, arm64: Fixed a BTI error on returning to patched function	Xu Kuohai
	When BPF_TRAMP_F_CALL_ORIG is set, BPF trampoline uses BLR to jump back to the instruction next to call site to call the patched function. For BTI-enabled kernel, the instruction next to call site is usually PACIASP, in this case, it's safe to jump back with BLR. But when the call site is not followed by a PACIASP or bti, a BTI exception is triggered. Here is a fault log: Unhandled 64-bit el1h sync exception on CPU0, ESR 0x0000000034000002 -- BTI CPU: 0 PID: 263 Comm: test_progs Tainted: GF Hardware name: linux,dummy-virt (DT) pstate: 40400805 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=-c) pc : bpf_fentry_test1+0xc/0x30 lr : bpf_trampoline_6442573892_0+0x48/0x1000 sp : ffff80000c0c3a50 x29: ffff80000c0c3a90 x28: ffff0000c2e6c080 x27: 0000000000000000 x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000050 x23: 0000000000000000 x22: 0000ffffcfd2a7f0 x21: 000000000000000a x20: 0000ffffcfd2a7f0 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffcfd2a7f0 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: ffff80000914f5e4 x9 : ffff8000082a1528 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0101010101010101 x5 : 0000000000000000 x4 : 00000000fffffff2 x3 : 0000000000000001 x2 : ffff8001f4b82000 x1 : 0000000000000000 x0 : 0000000000000001 Kernel panic - not syncing: Unhandled exception CPU: 0 PID: 263 Comm: test_progs Tainted: GF Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0xec/0x144 show_stack+0x24/0x7c dump_stack_lvl+0x8c/0xb8 dump_stack+0x18/0x34 panic+0x1cc/0x3ec __el0_error_handler_common+0x0/0x130 el1h_64_sync_handler+0x60/0xd0 el1h_64_sync+0x78/0x7c bpf_fentry_test1+0xc/0x30 bpf_fentry_test1+0xc/0x30 bpf_prog_test_run_tracing+0xdc/0x2a0 __sys_bpf+0x438/0x22a0 __arm64_sys_bpf+0x30/0x54 invoke_syscall+0x78/0x110 el0_svc_common.constprop.0+0x6c/0x1d0 do_el0_svc+0x38/0xe0 el0_svc+0x30/0xd0 el0t_64_sync_handler+0x1ac/0x1b0 el0t_64_sync+0x1a0/0x1a4 Kernel Offset: disabled CPU features: 0x0000,00034c24,f994fdab Memory Limit: none And the instruction next to call site of bpf_fentry_test1 is ADD, not PACIASP: <bpf_fentry_test1>: bti c nop nop add w0, w0, #0x1 paciasp For BPF prog, JIT always puts a PACIASP after call site for BTI-enabled kernel, so there is no problem. To fix it, replace BLR with RET to bypass the branch target check. Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64") Reported-by: Florent Revest <revest@chromium.org> Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230401234144.3719742-1-xukuohai@huaweicloud.com
2022-07-11	bpf, arm64: Implement bpf_arch_text_poke() for arm64	Xu Kuohai
	Implement bpf_arch_text_poke() for arm64, so bpf prog or bpf trampoline can be patched with it. When the target address is NULL, the original instruction is patched to a NOP. When the target address and the source address are within the branch range, the original instruction is patched to a bl instruction to the target address directly. To support attaching bpf trampoline to both regular kernel function and bpf prog, we follow the ftrace patchsite way for bpf prog. That is, two instructions are inserted at the beginning of bpf prog, the first one saves the return address to x9, and the second is a nop which will be patched to a bl instruction when a bpf trampoline is attached. However, when a bpf trampoline is attached to bpf prog, the distance between target address and source address may exceed 128MB, the maximum branch range, because bpf trampoline and bpf prog are allocated separately with vmalloc. So long jump should be handled. When a bpf prog is constructed, a plt pointing to empty trampoline dummy_tramp is placed at the end: bpf_prog: mov x9, lr nop // patchsite ... ret plt: ldr x10, target br x10 target: .quad dummy_tramp // plt target This is also the state when no trampoline is attached. When a short-jump bpf trampoline is attached, the patchsite is patched to a bl instruction to the trampoline directly: bpf_prog: mov x9, lr bl <short-jump bpf trampoline address> // patchsite ... ret plt: ldr x10, target br x10 target: .quad dummy_tramp // plt target When a long-jump bpf trampoline is attached, the plt target is filled with the trampoline address and the patchsite is patched to a bl instruction to the plt: bpf_prog: mov x9, lr bl plt // patchsite ... ret plt: ldr x10, target br x10 target: .quad <long-jump bpf trampoline address> dummy_tramp is used to prevent another CPU from jumping to an unknown location during the patching process, making the patching process easier. The patching process is as follows: 1. when neither the old address or the new address is a long jump, the patchsite is replaced with a bl to the new address, or nop if the new address is NULL; 2. when the old address is not long jump but the new one is, the branch target address is written to plt first, then the patchsite is replaced with a bl instruction to the plt; 3. when the old address is long jump but the new one is not, the address of dummy_tramp is written to plt first, then the patchsite is replaced with a bl to the new address, or a nop if the new address is NULL; 4. when both the old address and the new address are long jump, the new address is written to plt and the patchsite is not changed. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220711150823.2128542-4-xukuohai@huawei.com
2022-04-06	bpf, arm64: Sign return address for JITed code	Xu Kuohai
	Sign return address for JITed code when the kernel is built with pointer authentication enabled: 1. Sign LR with paciasp instruction before LR is pushed to stack. Since paciasp acts like landing pads for function entry, no need to insert bti instruction before paciasp. 2. Authenticate LR with autiasp instruction after LR is popped from stack. For BPF tail call, the stack frame constructed by the caller is reused by the callee. That is, the stack frame is constructed by the caller and destructed by the callee. Thus LR is signed and pushed to the stack in the caller's prologue, and poped from the stack and authenticated in the callee's epilogue. For BPF2BPF call, the caller and callee construct their own stack frames, and sign and authenticate their own LRs. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf Link: https://lore.kernel.org/bpf/20220402073942.3782529-1-xukuohai@huawei.com
2022-04-01	bpf, arm64: Optimize BPF store/load using arm64 str/ldr(immediate offset)	Xu Kuohai
	The current BPF store/load instruction is translated by the JIT into two instructions. The first instruction moves the immediate offset into a temporary register. The second instruction uses this temporary register to do the real store/load. In fact, arm64 supports addressing with immediate offsets. So This patch introduces optimization that uses arm64 str/ldr instruction with immediate offset when the offset fits. Example of generated instuction for r2 = (u64 )(r1 + 0): without optimization: mov x10, 0 ldr x1, [x0, x10] with optimization: ldr x1, [x0, 0] If the offset is negative, or is not aligned correctly, or exceeds max value, rollback to the use of temporary register. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220321152852.2334294-3-xukuohai@huawei.com
2022-02-28	bpf, arm64: Support more atomic operations	Hou Tao
	Atomics for eBPF patch series adds support for atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and atomic[64]_{xchg\|cmpxchg}, but it only adds support for x86-64, so support these atomic operations for arm64 as well. Basically the implementation procedure is almost mechanical translation of code snippets in atomic_ll_sc.h & atomic_lse.h & cmpxchg.h located under arch/arm64/include/asm. When LSE atomic is unavailable, an extra temporary register is needed for (BPF_ADD \| BPF_FETCH) to save the value of src register, instead of adding TMP_REG_4 just use BPF_REG_AX instead. Also make emit_lse_atomic() as an empty inline function when CONFIG_ARM64_LSE_ATOMICS is disabled. For cpus_have_cap(ARM64_HAS_LSE_ATOMICS) case and no-LSE-ATOMICS case, the following three tests: "./test_verifier", "./test_progs -t atomic" and "insmod ./test_bpf.ko" are exercised and passed. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220217072232.1186625-4-houtao1@huawei.com
2022-02-22	arm64: insn: add encoders for atomic operations	Hou Tao
	It is a preparation patch for eBPF atomic supports under arm64. eBPF needs support atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and atomic[64]_{xchg\|cmpxchg}. The ordering semantics of eBPF atomics are the same with the implementations in linux kernel. Add three helpers to support LDCLR/LDEOR/LDSET/SWP, CAS and DMB instructions. STADD/STCLR/STEOR/STSET are simply encoded as aliases for LDADD/LDCLR/LDEOR/LDSET with XZR as the destination register, so no extra helper is added. atomic_fetch_add() and other atomic ops needs support for STLXR instruction, so extend enum aarch64_insn_ldst_type to do that. LDADD/LDEOR/LDSET/SWP and CAS instructions are only available when LSE atomics is enabled, so just return AARCH64_BREAK_FAULT directly in these newly-added helpers if CONFIG_ARM64_LSE_ATOMICS is disabled. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20220217072232.1186625-3-houtao1@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2020-05-28	Merge branch 'for-next/bti' into for-next/core	Will Deacon
	Support for Branch Target Identification (BTI) in user and kernel (Mark Brown and others) * for-next/bti: (39 commits) arm64: vdso: Fix CFI directives in sigreturn trampoline arm64: vdso: Don't prefix sigreturn trampoline with a BTI C instruction arm64: bti: Fix support for userspace only BTI arm64: kconfig: Update and comment GCC version check for kernel BTI arm64: vdso: Map the vDSO text with guarded pages when built for BTI arm64: vdso: Force the vDSO to be linked as BTI when built for BTI arm64: vdso: Annotate for BTI arm64: asm: Provide a mechanism for generating ELF note for BTI arm64: bti: Provide Kconfig for kernel mode BTI arm64: mm: Mark executable text as guarded pages arm64: bpf: Annotate JITed code for BTI arm64: Set GP bit in kernel page tables to enable BTI for the kernel arm64: asm: Override SYM_FUNC_START when building the kernel with BTI arm64: bti: Support building kernel C code using BTI arm64: Document why we enable PAC support for leaf functions arm64: insn: Report PAC and BTI instructions as skippable arm64: insn: Don't assume unrecognized HINTs are skippable arm64: insn: Provide a better name for aarch64_insn_is_nop() arm64: insn: Add constants for new HINT instruction decode arm64: Disable old style assembly annotations ...
2020-05-11	bpf, arm64: Optimize ADD,SUB,JMP BPF_K using arm64 add/sub immediates	Luke Nelson
	The current code for BPF_{ADD,SUB} BPF_K loads the BPF immediate to a temporary register before performing the addition/subtraction. Similarly, BPF_JMP BPF_K cases load the immediate to a temporary register before comparison. This patch introduces optimizations that use arm64 immediate add, sub, cmn, or cmp instructions when the BPF immediate fits. If the immediate does not fit, it falls back to using a temporary register. Example of generated code for BPF_ALU64_IMM(BPF_ADD, R0, 2): without optimization: 24: mov x10, #0x2 28: add x7, x7, x10 with optimization: 24: add x7, x7, #0x2 The code could use A64_{ADD,SUB}_I directly and check if it returns AARCH64_BREAK_FAULT, similar to how logical immediates are handled. However, aarch64_insn_gen_add_sub_imm from insn.c prints error messages when the immediate does not fit, and it's simpler to check if the immediate fits ahead of time. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20200508181547.24783-4-luke.r.nels@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2020-05-11	bpf, arm64: Optimize AND,OR,XOR,JSET BPF_K using arm64 logical immediates	Luke Nelson
	The current code for BPF_{AND,OR,XOR,JSET} BPF_K loads the immediate to a temporary register before use. This patch changes the code to avoid using a temporary register when the BPF immediate is encodable using an arm64 logical immediate instruction. If the encoding fails (due to the immediate not being encodable), it falls back to using a temporary register. Example of generated code for BPF_ALU32_IMM(BPF_AND, R0, 0x80000001): without optimization: 24: mov w10, #0x8000ffff 28: movk w10, #0x1 2c: and w7, w7, w10 with optimization: 24: and w7, w7, #0x80000001 Since the encoding process is quite complex, the JIT reuses existing functionality in arch/arm64/kernel/insn.c for encoding logical immediates rather than duplicate it in the JIT. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20200508181547.24783-3-luke.r.nels@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2020-05-07	arm64: bpf: Annotate JITed code for BTI	Mark Brown
	In order to extend the protection offered by BTI to all code executing in kernel mode we need to annotate JITed BPF code appropriately for BTI. To do this we need to add a landing pad to the start of each BPF function and also immediately after the function prologue if we are emitting a function which can be tail called. Jumps within BPF functions are all to immediate offsets and therefore do not require landing pads. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200506195138.22086-6-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2019-09-03	arm64: bpf: optimize modulo operation	Jerin Jacob
	Optimize modulo operation instruction generation by using single MSUB instruction vs MUL followed by SUB instruction scheme. Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-19	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234	Thomas Gleixner
	Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-26	bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd	Daniel Borkmann
	Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016, lets add support for STADD and use that in favor of LDXR / STXR loop for the XADD mapping if available. STADD is encoded as an alias for LDADD with XZR as the destination register, therefore add LDADD to the instruction encoder along with STADD as special case and use it in the JIT for CPUs that advertise LSE atomics in CPUID register. If immediate offset in the BPF XADD insn is 0, then use dst register directly instead of temporary one. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26	bpf, arm64: remove prefetch insn in xadd mapping	Daniel Borkmann
	Prefetch-with-intent-to-write is currently part of the XADD mapping in the AArch64 JIT and follows the kernel's implementation of atomic_add. This may interfere with other threads executing the LDXR/STXR loop, leading to potential starvation and fairness issues. Drop the optional prefetch instruction. Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD") Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-08-09	bpf, arm64: implement jiting of BPF_J{LT, LE, SLT, SLE}	Daniel Borkmann
	This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions with BPF_X/BPF_K variants for the arm64 eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-02	bpf, arm64: implement jiting of BPF_XADD	Daniel Borkmann
	This work adds BPF_XADD for BPF_W/BPF_DW to the arm64 JIT and therefore completes JITing of all BPF instructions, meaning we can thus also remove the 'notyet' label and do not need to fall back to the interpreter when BPF_XADD is used in a program! This now also brings arm64 JIT in line with x86_64, s390x, ppc64, sparc64, where all current eBPF features are supported. BPF_W example from test_bpf: .u.insns_int = { BPF_ALU32_IMM(BPF_MOV, R0, 0x12), BPF_ST_MEM(BPF_W, R10, -40, 0x10), BPF_STX_XADD(BPF_W, R10, R0, -40), BPF_LDX_MEM(BPF_W, R0, R10, -40), BPF_EXIT_INSN(), }, [...] 00000020: 52800247 mov w7, #0x12 // #18 00000024: 928004eb mov x11, #0xffffffffffffffd8 // #-40 00000028: d280020a mov x10, #0x10 // #16 0000002c: b82b6b2a str w10, [x25,x11] // start of xadd mapping: 00000030: 928004ea mov x10, #0xffffffffffffffd8 // #-40 00000034: 8b19014a add x10, x10, x25 00000038: f9800151 prfm pstl1strm, [x10] 0000003c: 885f7d4b ldxr w11, [x10] 00000040: 0b07016b add w11, w11, w7 00000044: 880b7d4b stxr w11, w11, [x10] 00000048: 35ffffab cbnz w11, 0x0000003c // end of xadd mapping: [...] BPF_DW example from test_bpf: .u.insns_int = { BPF_ALU32_IMM(BPF_MOV, R0, 0x12), BPF_ST_MEM(BPF_DW, R10, -40, 0x10), BPF_STX_XADD(BPF_DW, R10, R0, -40), BPF_LDX_MEM(BPF_DW, R0, R10, -40), BPF_EXIT_INSN(), }, [...] 00000020: 52800247 mov w7, #0x12 // #18 00000024: 928004eb mov x11, #0xffffffffffffffd8 // #-40 00000028: d280020a mov x10, #0x10 // #16 0000002c: f82b6b2a str x10, [x25,x11] // start of xadd mapping: 00000030: 928004ea mov x10, #0xffffffffffffffd8 // #-40 00000034: 8b19014a add x10, x10, x25 00000038: f9800151 prfm pstl1strm, [x10] 0000003c: c85f7d4b ldxr x11, [x10] 00000040: 8b07016b add x11, x11, x7 00000044: c80b7d4b stxr w11, x11, [x10] 00000048: 35ffffab cbnz w11, 0x0000003c // end of xadd mapping: [...] Tested on Cavium ThunderX ARMv8, test suite results after the patch: No JIT: [ 3751.855362] test_bpf: Summary: 311 PASSED, 0 FAILED, [0/303 JIT'ed] With JIT: [ 3573.759527] test_bpf: Summary: 311 PASSED, 0 FAILED, [303/303 JIT'ed] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10	arm64: bpf: implement bpf_tail_call() helper	Zi Shen Lim
	Add support for JMP_CALL_X (tail call) introduced by commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other bpf programs"). bpf_tail_call() arguments: ctx - context pointer passed to next program array - pointer to map which type is BPF_MAP_TYPE_PROG_ARRAY index - index inside array that selects specific program to run In this implementation arm64 JIT jumps into callee program after prologue, so callee program reuses the same stack. For tail_call_cnt, we use the callee-saved R26 (which was already saved/restored but previously unused by JIT). With this patch a tail call generates the following code on arm64: if (index >= array->map.max_entries) goto out; 34: mov x10, #0x10 // #16 38: ldr w10, [x1,x10] 3c: cmp w2, w10 40: b.ge 0x0000000000000074 if (tail_call_cnt > MAX_TAIL_CALL_CNT) goto out; tail_call_cnt++; 44: mov x10, #0x20 // #32 48: cmp x26, x10 4c: b.gt 0x0000000000000074 50: add x26, x26, #0x1 prog = array->ptrs[index]; if (prog == NULL) goto out; 54: mov x10, #0x68 // #104 58: ldr x10, [x1,x10] 5c: ldr x11, [x10,x2] 60: cbz x11, 0x0000000000000074 goto *(prog->bpf_func + prologue_size); 64: mov x10, #0x20 // #32 68: ldr x10, [x11,x10] 6c: add x10, x10, #0x20 70: br x10 74: Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-06	arm64: bpf: fix div-by-zero case	Zi Shen Lim
	In the case of division by zero in a BPF program: A = A / X; (X == 0) the expected behavior is to terminate with return value 0. This is confirmed by the test case introduced in commit 86bf1721b226 ("test_bpf: add tests checking that JIT/interpreter sets A and X to 0."). Reported-by: Yang Shi <yang.shi@linaro.org> Tested-by: Yang Shi <yang.shi@linaro.org> CC: Xi Wang <xi.wang@gmail.com> CC: Alexei Starovoitov <ast@plumgrid.com> CC: linux-arm-kernel@lists.infradead.org CC: linux-kernel@vger.kernel.org Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler") Cc: <stable@vger.kernel.org> # 3.18+ Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-06-26	arm64: bpf: fix endianness conversion bugs	Xi Wang
	Upper bits should be zeroed in endianness conversion: - even when there's no need to change endianness (i.e., BPF_FROM_BE on big endian or BPF_FROM_LE on little endian); - after rev16. This patch fixes such bugs by emitting extra instructions to clear upper bits. Cc: Zi Shen Lim <zlim.lnx@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler") Cc: <stable@vger.kernel.org> # 3.18+ Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-10-20	arm64: bpf: add 'shift by register' instructions	Zi Shen Lim
	Commit 72b603ee8cfc ("bpf: x86: add missing 'shift by register' instructions to x64 eBPF JIT") noted support for 'shift by register' in eBPF and added support for it for x64. Let's enable this for arm64 as well. The arm64 eBPF JIT compiler now passes the new 'shift by register' test case introduced in the same commit 72b603ee8cfc. Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-09-08	arm64: eBPF JIT compiler	Zi Shen Lim
	The JIT compiler emits A64 instructions. It supports eBPF only. Legacy BPF is supported thanks to conversion by BPF core. JIT is enabled in the same way as for other architectures: echo 1 > /proc/sys/net/core/bpf_jit_enable Or for additional compiler output: echo 2 > /proc/sys/net/core/bpf_jit_enable See Documentation/networking/filter.txt for more information. The implementation passes all 57 tests in lib/test_bpf.c on ARMv8 Foundation Model :) Also tested by Will on Juno platform. Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>