Age | Commit message (Collapse) | Author |
|
Andrii Nakryiko says:
====================
BPF control flow graph and precision backtrack fixes
A small fix to BPF verifier's CFG logic around handling and reporting ldimm64
instructions. Patch #1 was previously submitted separately ([0]), and so this
patch set supersedes that patch.
Second patch is fixing obscure corner case in mark_chain_precise() logic. See
patch for details. Patch #3 adds a dedicated test, however fragile it might.
[0] https://patchwork.kernel.org/project/netdevbpf/patch/20231101205626.119243-1-andrii@kernel.org/
====================
Link: https://lore.kernel.org/r/20231110002638.4168352-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a dedicated selftests to try to set up conditions to have a state
with same first and last instruction index, but it actually is a loop
3->4->1->2->3. This confuses mark_chain_precision() if verifier doesn't
take into account jump history.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231110002638.4168352-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Fix an edge case in __mark_chain_precision() which prematurely stops
backtracking instructions in a state if it happens that state's first
and last instruction indexes are the same. This situations doesn't
necessarily mean that there were no instructions simulated in a state,
but rather that we starting from the instruction, jumped around a bit,
and then ended up at the same instruction before checkpointing or
marking precision.
To distinguish between these two possible situations, we need to consult
jump history. If it's empty or contain a single record "bridging" parent
state and first instruction of processed state, then we indeed
backtracked all instructions in this state. But if history is not empty,
we are definitely not done yet.
Move this logic inside get_prev_insn_idx() to contain it more nicely.
Use -ENOENT return code to denote "we are out of instructions"
situation.
This bug was exposed by verifier_loop1.c's bounded_recursion subtest, once
the next fix in this patch set is applied.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231110002638.4168352-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
ldimm64 instructions are 16-byte long, and so have to be handled
appropriately in check_cfg(), just like the rest of BPF verifier does.
This has implications in three places:
- when determining next instruction for non-jump instructions;
- when determining next instruction for callback address ldimm64
instructions (in visit_func_call_insn());
- when checking for unreachable instructions, where second half of
ldimm64 is expected to be unreachable;
We take this also as an opportunity to report jump into the middle of
ldimm64. And adjust few test_verifier tests accordingly.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Fixes: 475fb78fbf48 ("bpf: verifier (add branch/goto checks)")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231110002638.4168352-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Crossbuilding selftests/bpf for architecture arm64, format specifies
type error show up like.
xskxceiver.c:912:34: error: format specifies type 'int' but the argument
has type '__u64' (aka 'unsigned long long') [-Werror,-Wformat]
ksft_print_msg("[%s] expected meta_count [%d], got meta_count [%d]\n",
~~
%llu
__func__, pkt->pkt_nb, meta->count);
^~~~~~~~~~~
xskxceiver.c:929:55: error: format specifies type 'unsigned long long' but
the argument has type 'u64' (aka 'unsigned long') [-Werror,-Wformat]
ksft_print_msg("Frag invalid addr: %llx len: %u\n", addr, len);
~~~~ ^~~~
Fixing the issues by casting to (unsigned long long) and changing the
specifiers to be %llu from %d and %u, since with u64s it might be %llx
or %lx, depending on architecture.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20231109174328.1774571-1-anders.roxell@linaro.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Martin and Vadim reported a verifier failure with bpf_dynptr usage.
The issue is mentioned but Vadim workarounded the issue with source
change ([1]). The below describes what is the issue and why there
is a verification failure.
int BPF_PROG(skb_crypto_setup) {
struct bpf_dynptr algo, key;
...
bpf_dynptr_from_mem(..., ..., 0, &algo);
...
}
The bpf program is using vmlinux.h, so we have the following definition in
vmlinux.h:
struct bpf_dynptr {
long: 64;
long: 64;
};
Note that in uapi header bpf.h, we have
struct bpf_dynptr {
long: 64;
long: 64;
} __attribute__((aligned(8)));
So we lost alignment information for struct bpf_dynptr by using vmlinux.h.
Let us take a look at a simple program below:
$ cat align.c
typedef unsigned long long __u64;
struct bpf_dynptr_no_align {
__u64 :64;
__u64 :64;
};
struct bpf_dynptr_yes_align {
__u64 :64;
__u64 :64;
} __attribute__((aligned(8)));
void bar(void *, void *);
int foo() {
struct bpf_dynptr_no_align a;
struct bpf_dynptr_yes_align b;
bar(&a, &b);
return 0;
}
$ clang --target=bpf -O2 -S -emit-llvm align.c
Look at the generated IR file align.ll:
...
%a = alloca %struct.bpf_dynptr_no_align, align 1
%b = alloca %struct.bpf_dynptr_yes_align, align 8
...
The compiler dictates the alignment for struct bpf_dynptr_no_align is 1 and
the alignment for struct bpf_dynptr_yes_align is 8. So theoretically compiler
could allocate variable %a with alignment 1 although in reallity the compiler
may choose a different alignment by considering other local variables.
In [1], the verification failure happens because variable 'algo' is allocated
on the stack with alignment 4 (fp-28). But the verifer wants its alignment
to be 8.
To fix the issue, the RFC patch ([1]) tried to add '__attribute__((aligned(8)))'
to struct bpf_dynptr plus other similar structs. Andrii suggested that
we could directly modify uapi struct with named fields like struct 'bpf_iter_num':
struct bpf_iter_num {
/* opaque iterator state; having __u64 here allows to preserve correct
* alignment requirements in vmlinux.h, generated from BTF
*/
__u64 __opaque[1];
} __attribute__((aligned(8)));
Indeed, adding named fields for those affected structs in this patch can preserve
alignment when bpf program references them in vmlinux.h. With this patch,
the verification failure in [1] can also be resolved.
[1] https://lore.kernel.org/bpf/1b100f73-7625-4c1f-3ae5-50ecf84d3ff0@linux.dev/
[2] https://lore.kernel.org/bpf/20231103055218.2395034-1-yonghong.song@linux.dev/
Cc: Vadim Fedorenko <vadfed@meta.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231104024900.1539182-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Dave Marchevsky says:
====================
Allow bpf_refcount_acquire of mapval obtained via direct LD
Consider this BPF program:
struct cgv_node {
int d;
struct bpf_refcount r;
struct bpf_rb_node rb;
};
struct val_stash {
struct cgv_node __kptr *v;
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, int);
__type(value, struct val_stash);
__uint(max_entries, 10);
} array_map SEC(".maps");
long bpf_program(void *ctx)
{
struct val_stash *mapval;
struct cgv_node *p;
int idx = 0;
mapval = bpf_map_lookup_elem(&array_map, &idx);
if (!mapval || !mapval->v) { /* omitted */ }
p = bpf_refcount_acquire(mapval->v); /* Verification FAILs here */
/* Add p to some tree */
return 0;
}
Verification fails on the refcount_acquire:
160: (79) r1 = *(u64 *)(r8 +8) ; R1_w=untrusted_ptr_or_null_cgv_node(id=11,off=0,imm=0) R8_w=map_value(id=10,off=0,ks=8,vs=16,imm=0) refs=6
161: (b7) r2 = 0 ; R2_w=0 refs=6
162: (85) call bpf_refcount_acquire_impl#117824
arg#0 is neither owning or non-owning ref
The above verifier dump is actually from sched_ext's scx_flatcg [0],
which is the motivating usecase for this series' changes. Specifically,
scx_flatcg stashes a rb_node type w/ cgroup-specific info (struct
cgv_node) in a map when the cgroup is created, then later puts that
cgroup's node in a rbtree in .enqueue . Making struct cgv_node
refcounted would simplify the code a bit by virtue of allowing us to
remove the kptr_xchg's, but "later puts that cgroups node in a rbtree"
is not possible without a refcount_acquire, which suffers from the above
verification failure.
If we get rid of PTR_UNTRUSTED flag, and add MEM_ALLOC | NON_OWN_REF,
mapval->v would be a non-owning ref and verification would succeed. Due
to the most recent set of refcount changes [1], which modified
bpf_obj_drop behavior to not reuse refcounted graph node's underlying
memory until after RCU grace period, this is safe to do. Once mapval->v
has the correct flags it _is_ a non-owning reference and verification of
the motivating example will succeed.
[0]: https://github.com/sched-ext/sched_ext/blob/52911e1040a0f94b9c426dddcc00be5364a7ae9f/tools/sched_ext/scx_flatcg.bpf.c#L275
[1]: https://lore.kernel.org/bpf/20230821193311.3290257-1-davemarchevsky@fb.com/
Summary of patches:
* Patch 1 fixes an issue with bpf_refcount_acquire verification
letting MAYBE_NULL ptrs through
* Patch 2 tests Patch 1's fix
* Patch 3 broadens the use of "free only after RCU GP" to all
user-allocated types
* Patch 4 is a small nonfunctional refactoring
* Patch 5 changes verifier to mark direct LD of stashed graph node
kptr as non-owning ref
* Patch 6 tests Patch 5's verifier changes
Changelog:
v1 -> v2: https://lore.kernel.org/bpf/20231025214007.2920506-1-davemarchevsky@fb.com/
Series title changed to "Allow bpf_refcount_acquire of mapval obtained via
direct LD". V1's title was mistakenly truncated.
* Patch 5 ("bpf: Mark direct ld of stashed bpf_{rb,list}_node as non-owning ref")
* Direct LD of percpu kptr should not have MEM_ALLOC flag (Yonghong)
* Patch 6 ("selftests/bpf: Test bpf_refcount_acquire of node obtained via direct ld")
* Test read from stashed value as well
====================
Link: https://lore.kernel.org/r/20231107085639.3016113-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The addition of is_reg_const() in commit 171de12646d2 ("bpf: generalize
is_branch_taken to handle all conditional jumps in one place") has made the
register_is_const() redundant. Give the former has more feature, plus the
fact the latter is only used in one place, replace register_is_const() with
is_reg_const(), and remove the definition of register_is_const.
This requires moving the definition of is_reg_const() further up. And since
the comment of reg_const_value() reference is_reg_const(), move it up as
well.
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231108140043.12282-1-shung-hsi.yu@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch demonstrates that verifier changes earlier in this series
result in bpf_refcount_acquire(mapval->stashed_kptr) passing
verification. The added test additionally validates that stashing a kptr
in mapval and - in a separate BPF program - refcount_acquiring the kptr
without unstashing works as expected at runtime.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20231107085639.3016113-7-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add ability to filter top B results, both in replay/verifier mode and
comparison mode. Just adding `-n10` will emit only first 10 rows, or
less, if there is not enough rows.
This is not just a shortcut instead of passing veristat output through
`head`, though. Filtering out all the other rows influences final table
formatting, as table column widths are calculated based on actual
emitted test.
To demonstrate the difference, compare two "equivalent" forms below, one
using head and another using -n argument.
TOP N FEATURE
=============
[vmuser@archvm bpf]$ sudo ./veristat -C ~/baseline-results-selftests.csv ~/sanity2-results-selftests.csv -e file,prog,insns,states -s '|insns_diff|' -n10
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
---------------------------------------- --------------------- --------- --------- ------------ ---------- ---------- -------------
test_seg6_loop.bpf.linked3.o __add_egr_x 12440 12360 -80 (-0.64%) 364 357 -7 (-1.92%)
async_stack_depth.bpf.linked3.o async_call_root_check 145 145 +0 (+0.00%) 3 3 +0 (+0.00%)
async_stack_depth.bpf.linked3.o pseudo_call_check 139 139 +0 (+0.00%) 3 3 +0 (+0.00%)
atomic_bounds.bpf.linked3.o sub 7 7 +0 (+0.00%) 0 0 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o kmalloc 5 5 +0 (+0.00%) 0 0 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o sched_process_fork 22 22 +0 (+0.00%) 2 2 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o socket_post_create 23 23 +0 (+0.00%) 2 2 +0 (+0.00%)
bind4_prog.bpf.linked3.o bind_v4_prog 358 358 +0 (+0.00%) 33 33 +0 (+0.00%)
bind6_prog.bpf.linked3.o bind_v6_prog 429 429 +0 (+0.00%) 37 37 +0 (+0.00%)
bind_perm.bpf.linked3.o bind_v4_prog 15 15 +0 (+0.00%) 1 1 +0 (+0.00%)
PIPING TO HEAD
==============
[vmuser@archvm bpf]$ sudo ./veristat -C ~/baseline-results-selftests.csv ~/sanity2-results-selftests.csv -e file,prog,insns,states -s '|insns_diff|' | head -n12
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
----------------------------------------------------- ---------------------------------------------------- --------- --------- ------------ ---------- ---------- -------------
test_seg6_loop.bpf.linked3.o __add_egr_x 12440 12360 -80 (-0.64%) 364 357 -7 (-1.92%)
async_stack_depth.bpf.linked3.o async_call_root_check 145 145 +0 (+0.00%) 3 3 +0 (+0.00%)
async_stack_depth.bpf.linked3.o pseudo_call_check 139 139 +0 (+0.00%) 3 3 +0 (+0.00%)
atomic_bounds.bpf.linked3.o sub 7 7 +0 (+0.00%) 0 0 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o kmalloc 5 5 +0 (+0.00%) 0 0 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o sched_process_fork 22 22 +0 (+0.00%) 2 2 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o socket_post_create 23 23 +0 (+0.00%) 2 2 +0 (+0.00%)
bind4_prog.bpf.linked3.o bind_v4_prog 358 358 +0 (+0.00%) 33 33 +0 (+0.00%)
bind6_prog.bpf.linked3.o bind_v6_prog 429 429 +0 (+0.00%) 37 37 +0 (+0.00%)
bind_perm.bpf.linked3.o bind_v4_prog 15 15 +0 (+0.00%) 1 1 +0 (+0.00%)
Note all the wasted whitespace in the "PIPING TO HEAD" variant.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231108051430.1830950-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch enables the following pattern:
/* mapval contains a __kptr pointing to refcounted local kptr */
mapval = bpf_map_lookup_elem(&map, &idx);
if (!mapval || !mapval->some_kptr) { /* omitted */ }
p = bpf_refcount_acquire(&mapval->some_kptr);
Currently this doesn't work because bpf_refcount_acquire expects an
owning or non-owning ref. The verifier defines non-owning ref as a type:
PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
while mapval->some_kptr is PTR_TO_BTF_ID | PTR_UNTRUSTED. It's possible
to do the refcount_acquire by first bpf_kptr_xchg'ing mapval->some_kptr
into a temp kptr, refcount_acquiring that, and xchg'ing back into
mapval, but this is unwieldy and shouldn't be necessary.
This patch modifies btf_ld_kptr_type such that user-allocated types are
marked MEM_ALLOC and if those types have a bpf_{rb,list}_node they're
marked NON_OWN_REF as well. Additionally, due to changes to
bpf_obj_drop_impl earlier in this series, rcu_protected_object now
returns true for all user-allocated types, resulting in
mapval->some_kptr being marked MEM_RCU.
After this patch's changes, mapval->some_kptr is now:
PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
which results in it passing the non-owning ref test, and the motivating
example passing verification.
Future work will likely get rid of special non-owning ref lifetime logic
in the verifier, at which point we'll be able to delete the NON_OWN_REF
flag entirely.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20231107085639.3016113-6-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add ability to sort results by absolute values of specified stats. This
is especially useful to find biggest deviations in comparison mode. When
comparing verifier change effect against a large base of BPF object
files, it's necessary to see big changes both in positive and negative
directions, as both might be a signal for regressions or bugs.
The syntax is natural, e.g., adding `-s '|insns_diff|'^` will instruct
veristat to sort by absolute value of instructions difference in
ascending order.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231108051430.1830950-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This refactoring patch removes the unused BPF_GRAPH_NODE_OR_ROOT
btf_field_type and moves BPF_GRAPH_{NODE,ROOT} macros into the
btf_field_type enum. Further patches in the series will use
BPF_GRAPH_NODE, so let's move this useful definition out of btf.c.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20231107085639.3016113-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Martin reported that there is a libbpf complaining of non-zero-value tail
padding with LIBBPF_OPTS_RESET macro if struct bpf_netkit_opts is modified
to have a 4-byte tail padding. This only happens to clang compiler.
The commend line is: ./test_progs -t tc_netkit_multi_links
Martin and I did some investigation and found this indeed the case and
the following are the investigation details.
Clang:
clang version 18.0.0
<I tried clang15/16/17 and they all have similar results>
tools/lib/bpf/libbpf_common.h:
#define LIBBPF_OPTS_RESET(NAME, ...) \
do { \
memset(&NAME, 0, sizeof(NAME)); \
NAME = (typeof(NAME)) { \
.sz = sizeof(NAME), \
__VA_ARGS__ \
}; \
} while (0)
#endif
tools/lib/bpf/libbpf.h:
struct bpf_netkit_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
__u32 flags;
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
size_t :0;
};
#define bpf_netkit_opts__last_field expected_revision
In the above struct bpf_netkit_opts, there is no tail padding.
prog_tests/tc_netkit.c:
static void serial_test_tc_netkit_multi_links_target(int mode, int target)
{
...
LIBBPF_OPTS(bpf_netkit_opts, optl);
...
LIBBPF_OPTS_RESET(optl,
.flags = BPF_F_BEFORE,
.relative_fd = bpf_program__fd(skel->progs.tc1),
);
...
}
Let us make the following source change, note that we have a 4-byte
tailing padding now.
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 6cd9c501624f..0dd83910ae9a 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -803,13 +803,13 @@ bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
struct bpf_netkit_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
- __u32 flags;
__u32 relative_fd;
__u32 relative_id;
__u64 expected_revision;
+ __u32 flags;
size_t :0;
};
-#define bpf_netkit_opts__last_field expected_revision
+#define bpf_netkit_opts__last_field flags
The clang 18 generated asm code looks like below:
; LIBBPF_OPTS_RESET(optl,
55e3: 48 8d 7d 98 leaq -0x68(%rbp), %rdi
55e7: 31 f6 xorl %esi, %esi
55e9: ba 20 00 00 00 movl $0x20, %edx
55ee: e8 00 00 00 00 callq 0x55f3 <serial_test_tc_netkit_multi_links_target+0x18d3>
55f3: 48 c7 85 10 fd ff ff 20 00 00 00 movq $0x20, -0x2f0(%rbp)
55fe: 48 8b 85 68 ff ff ff movq -0x98(%rbp), %rax
5605: 48 8b 78 18 movq 0x18(%rax), %rdi
5609: e8 00 00 00 00 callq 0x560e <serial_test_tc_netkit_multi_links_target+0x18ee>
560e: 89 85 18 fd ff ff movl %eax, -0x2e8(%rbp)
5614: c7 85 1c fd ff ff 00 00 00 00 movl $0x0, -0x2e4(%rbp)
561e: 48 c7 85 20 fd ff ff 00 00 00 00 movq $0x0, -0x2e0(%rbp)
5629: c7 85 28 fd ff ff 08 00 00 00 movl $0x8, -0x2d8(%rbp)
5633: 48 8b 85 10 fd ff ff movq -0x2f0(%rbp), %rax
563a: 48 89 45 98 movq %rax, -0x68(%rbp)
563e: 48 8b 85 18 fd ff ff movq -0x2e8(%rbp), %rax
5645: 48 89 45 a0 movq %rax, -0x60(%rbp)
5649: 48 8b 85 20 fd ff ff movq -0x2e0(%rbp), %rax
5650: 48 89 45 a8 movq %rax, -0x58(%rbp)
5654: 48 8b 85 28 fd ff ff movq -0x2d8(%rbp), %rax
565b: 48 89 45 b0 movq %rax, -0x50(%rbp)
; link = bpf_program__attach_netkit(skel->progs.tc2, ifindex, &optl);
At -O0 level, the clang compiler creates an intermediate copy.
We have below to store 'flags' with 4-byte store and leave another 4 byte
in the same 8-byte-aligned storage undefined,
5629: c7 85 28 fd ff ff 08 00 00 00 movl $0x8, -0x2d8(%rbp)
and later we store 8-byte to the original zero'ed buffer
5654: 48 8b 85 28 fd ff ff movq -0x2d8(%rbp), %rax
565b: 48 89 45 b0 movq %rax, -0x50(%rbp)
This caused a problem as the 4-byte value at [%rbp-0x2dc, %rbp-0x2e0)
may be garbage.
gcc (gcc 11.4) does not have this issue as it does zeroing struct first before
doing assignments:
; LIBBPF_OPTS_RESET(optl,
50fd: 48 8d 85 40 fc ff ff leaq -0x3c0(%rbp), %rax
5104: ba 20 00 00 00 movl $0x20, %edx
5109: be 00 00 00 00 movl $0x0, %esi
510e: 48 89 c7 movq %rax, %rdi
5111: e8 00 00 00 00 callq 0x5116 <serial_test_tc_netkit_multi_links_target+0x1522>
5116: 48 8b 45 f0 movq -0x10(%rbp), %rax
511a: 48 8b 40 18 movq 0x18(%rax), %rax
511e: 48 89 c7 movq %rax, %rdi
5121: e8 00 00 00 00 callq 0x5126 <serial_test_tc_netkit_multi_links_target+0x1532>
5126: 48 c7 85 40 fc ff ff 00 00 00 00 movq $0x0, -0x3c0(%rbp)
5131: 48 c7 85 48 fc ff ff 00 00 00 00 movq $0x0, -0x3b8(%rbp)
513c: 48 c7 85 50 fc ff ff 00 00 00 00 movq $0x0, -0x3b0(%rbp)
5147: 48 c7 85 58 fc ff ff 00 00 00 00 movq $0x0, -0x3a8(%rbp)
5152: 48 c7 85 40 fc ff ff 20 00 00 00 movq $0x20, -0x3c0(%rbp)
515d: 89 85 48 fc ff ff movl %eax, -0x3b8(%rbp)
5163: c7 85 58 fc ff ff 08 00 00 00 movl $0x8, -0x3a8(%rbp)
; link = bpf_program__attach_netkit(skel->progs.tc2, ifindex, &optl);
It is not clear how to resolve the compiler code generation as the compiler
generates correct code w.r.t. how to handle unnamed padding in C standard.
So this patch changed LIBBPF_OPTS_RESET macro to avoid uninitialized tail
padding. We already knows LIBBPF_OPTS macro works on both gcc and clang,
even with tail padding. So LIBBPF_OPTS_RESET is changed to be a
LIBBPF_OPTS followed by a memcpy(), thus avoiding uninitialized tail padding.
The below is asm code generated with this patch and with clang compiler:
; LIBBPF_OPTS_RESET(optl,
55e3: 48 8d bd 10 fd ff ff leaq -0x2f0(%rbp), %rdi
55ea: 31 f6 xorl %esi, %esi
55ec: ba 20 00 00 00 movl $0x20, %edx
55f1: e8 00 00 00 00 callq 0x55f6 <serial_test_tc_netkit_multi_links_target+0x18d6>
55f6: 48 c7 85 10 fd ff ff 20 00 00 00 movq $0x20, -0x2f0(%rbp)
5601: 48 8b 85 68 ff ff ff movq -0x98(%rbp), %rax
5608: 48 8b 78 18 movq 0x18(%rax), %rdi
560c: e8 00 00 00 00 callq 0x5611 <serial_test_tc_netkit_multi_links_target+0x18f1>
5611: 89 85 18 fd ff ff movl %eax, -0x2e8(%rbp)
5617: c7 85 1c fd ff ff 00 00 00 00 movl $0x0, -0x2e4(%rbp)
5621: 48 c7 85 20 fd ff ff 00 00 00 00 movq $0x0, -0x2e0(%rbp)
562c: c7 85 28 fd ff ff 08 00 00 00 movl $0x8, -0x2d8(%rbp)
5636: 48 8b 85 10 fd ff ff movq -0x2f0(%rbp), %rax
563d: 48 89 45 98 movq %rax, -0x68(%rbp)
5641: 48 8b 85 18 fd ff ff movq -0x2e8(%rbp), %rax
5648: 48 89 45 a0 movq %rax, -0x60(%rbp)
564c: 48 8b 85 20 fd ff ff movq -0x2e0(%rbp), %rax
5653: 48 89 45 a8 movq %rax, -0x58(%rbp)
5657: 48 8b 85 28 fd ff ff movq -0x2d8(%rbp), %rax
565e: 48 89 45 b0 movq %rax, -0x50(%rbp)
; link = bpf_program__attach_netkit(skel->progs.tc2, ifindex, &optl);
In the above code, a temporary buffer is zeroed and then has proper value assigned.
Finally, values in temporary buffer are copied to the original variable buffer,
hence tail padding is guaranteed to be 0.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/bpf/20231107201511.2548645-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The use of bpf_mem_free_rcu to free refcounted local kptrs was added
in commit 7e26cd12ad1c ("bpf: Use bpf_mem_free_rcu when
bpf_obj_dropping refcounted nodes"). In the cover letter for the
series containing that patch [0] I commented:
Perhaps it makes sense to move to mem_free_rcu for _all_
non-owning refs in the future, not just refcounted. This might
allow custom non-owning ref lifetime + invalidation logic to be
entirely subsumed by MEM_RCU handling. IMO this needs a bit more
thought and should be tackled outside of a fix series, so it's not
attempted here.
It's time to start moving in the "non-owning refs have MEM_RCU
lifetime" direction. As mentioned in that comment, using
bpf_mem_free_rcu for all local kptrs - not just refcounted - is
necessarily the first step towards that goal. This patch does so.
After this patch the memory pointed to by all local kptrs will not be
reused until RCU grace period elapses. The verifier's understanding of
non-owning ref validity and the clobbering logic it uses to enforce
that understanding are not changed here, that'll happen gradually in
future work, including further patches in the series.
[0]: https://lore.kernel.org/all/20230821193311.3290257-1-davemarchevsky@fb.com/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20231107085639.3016113-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The test added in this patch exercises the logic fixed in the previous
patch in this series. Before the previous patch's changes,
bpf_refcount_acquire accepts MAYBE_NULL local kptrs; after the change
the verifier correctly rejects the such a call.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20231107085639.3016113-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Refcounted local kptrs are kptrs to user-defined types with a
bpf_refcount field. Recent commits ([0], [1]) modified the lifetime of
refcounted local kptrs such that the underlying memory is not reused
until RCU grace period has elapsed.
Separately, verification of bpf_refcount_acquire calls currently
succeeds for MAYBE_NULL non-owning reference input, which is a problem
as bpf_refcount_acquire_impl has no handling for this case.
This patch takes advantage of aforementioned lifetime changes to tag
bpf_refcount_acquire_impl kfunc KF_RCU, thereby preventing MAYBE_NULL
input to the kfunc. The KF_RCU flag applies to all kfunc params; it's
fine for it to apply to the void *meta__ign param as that's populated by
the verifier and is tagged __ign regardless.
[0]: commit 7e26cd12ad1c ("bpf: Use bpf_mem_free_rcu when
bpf_obj_dropping refcounted nodes") is the actual change to
allocation behaivor
[1]: commit 0816b8c6bf7f ("bpf: Consider non-owning refs to refcounted
nodes RCU protected") modified verifier understanding of
refcounted local kptrs to match [0]'s changes
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Fixes: 7c50b1cb76ac ("bpf: Add bpf_refcount_acquire kfunc")
Link: https://lore.kernel.org/r/20231107085639.3016113-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Song Liu says:
====================
This set contains the first 3 patches of set [1]. Currently, [1] is waiting
for [3] to be merged to bpf-next tree. So send these 3 patches first to
unblock other works, such as [2]. This set is verified with new version of
[1] in CI run [4].
Changes since v12 of [1]:
1. Reuse bpf_dynptr_slice() in __bpf_dynptr_data(). (Andrii)
2. Add Acked-by from Vadim Fedorenko.
[1] https://lore.kernel.org/bpf/20231104001313.3538201-1-song@kernel.org/
[2] https://lore.kernel.org/bpf/20231031134900.1432945-1-vadfed@meta.com/
[3] https://lore.kernel.org/bpf/20231031215625.2343848-1-davemarchevsky@fb.com/
[4] https://github.com/kernel-patches/bpf/actions/runs/6779945063/job/18427926114
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When looking up an element in LPM trie, the condition 'matchlen ==
trie->max_prefixlen' will never return true, if key->prefixlen is larger
than trie->max_prefixlen. Consequently all elements in the LPM trie will
be visited and no element is returned in the end.
To resolve this, check key->prefixlen first before walking the LPM trie.
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Signed-off-by: Florian Lehner <dev@der-flo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231105085801.3742-1-dev@der-flo.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Similar to ARG_PTR_TO_CONST_STR for BPF helpers, KF_ARG_PTR_TO_CONST_STR
specifies kfunc args that point to const strings. Annotation "__str" is
used to specify kfunc arg of type KF_ARG_PTR_TO_CONST_STR. Also, add
documentation for the "__str" annotation.
bpf_get_file_xattr() will be the first kfunc that uses this type.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/bpf/20231107045725.2278852-4-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Building an arm64 kernel and seftests/bpf with defconfig +
selftests/bpf/config and selftests/bpf/config.aarch64 the fragment
CONFIG_DEBUG_INFO_REDUCED is enabled in arm64's defconfig, it should be
disabled in file sefltests/bpf/config.aarch64 since if its not disabled
CONFIG_DEBUG_INFO_BTF wont be enabled.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231103220912.333930-1-anders.roxell@linaro.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
ARG_PTR_TO_CONST_STR is used to specify constant string args for BPF
helpers. The logic that verifies a reg is ARG_PTR_TO_CONST_STR is
implemented in check_func_arg().
As we introduce kfuncs with constant string args, it is necessary to
do the same check for kfuncs (in check_kfunc_args). Factor out the logic
for ARG_PTR_TO_CONST_STR to a new check_reg_const_str() so that it can be
reused.
check_func_arg() ensures check_reg_const_str() is only called with reg of
type PTR_TO_MAP_VALUE. Add a redundent type check in check_reg_const_str()
to avoid misuse in the future. Other than this redundent check, there is
no change in behavior.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/bpf/20231107045725.2278852-3-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpftool's man page lists "program" as one of possible values for OBJECT,
while in fact bpftool accepts "prog" instead.
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231103081126.170034-1-asavkov@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Different types of bpf dynptr have different internal data storage.
Specifically, SKB and XDP type of dynptr may have non-continuous data.
Therefore, it is not always safe to directly access dynptr->data.
Add __bpf_dynptr_data and __bpf_dynptr_data_rw to replace direct access to
dynptr->data.
Update bpf_verify_pkcs7_signature to use __bpf_dynptr_data instead of
dynptr->data.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/bpf/20231107045725.2278852-2-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Those configs are needed to be able to run VM somewhat consistently.
For instance, ATM, s390x is missing the `CONFIG_VIRTIO_CONSOLE` which
prevents s390x kernels built in CI to leverage qemu-guest-agent.
By moving them to `config,vm`, we should have selftest kernels which are
equal in term of VM functionalities when they include this file.
The set of config unabled were picked using
grep -h -E '(_9P|_VIRTIO)' config.x86_64 config | sort | uniq
added to `config.vm` and then
grep -vE '(_9P|_VIRTIO)' config.{x86_64,aarch64,s390x}
as a side-effect, some config may have disappeared to the aarch64 and
s390x kernels, but they should not be needed. CI will tell.
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231031212717.4037892-1-chantr4@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Hou Tao says:
====================
From: Hou Tao <houtao1@huawei.com>
Hi,
BPF CI failed due to map_percpu_stats_percpu_hash from time to time [1].
It seems that the failure reason is per-cpu bpf memory allocator may not
be able to allocate per-cpu pointer successfully and it can not refill
free llist timely, and bpf_map_update_elem() will return -ENOMEM.
Patch #1 fixes the size of value passed to per-cpu map update API. The
problem was found when fixing the ENOMEM problem, so also post it in
this patchset. Patch #2 & #3 mitigates the ENOMEM problem by retrying
the update operation for non-preallocated per-cpu map.
Please see individual patches for more details. And comments are always
welcome.
Regards,
Tao
[1]: https://github.com/kernel-patches/bpf/actions/runs/6713177520/job/18244865326?pr=5909
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Andrii Nakryiko says:
====================
BPF register bounds logic and testing improvements
This patch set adds a big set of manual and auto-generated test cases
validating BPF verifier's register bounds tracking and deduction logic. See
details in the last patch.
We start with building a tester that validates existing <range> vs <scalar>
verifier logic for range bounds. To make all this work, BPF verifier's logic
needed a bunch of improvements to handle some cases that previously were not
covered. This had no implications as to correctness of verifier logic, but it
was incomplete enough to cause significant disagreements with alternative
implementation of register bounds logic that tests in this patch set
implement. So we need BPF verifier logic improvements to make all the tests
pass. This is what we do in patches #3 through #9.
The end goal of this work, though, is to extend BPF verifier range state
tracking such as to allow to derive new range bounds when comparing non-const
registers. There is some more investigative work required to investigate and
fix existing potential issues with range tracking as part of ALU/ALU64
operations, so <range> x <range> part of v5 patch set ([0]) is dropped until
these issues are sorted out.
For now, we include preparatory refactorings and clean ups, that set up BPF
verifier code base to extend the logic to <range> vs <range> logic in
subsequent patch set. Patches #10-#16 perform preliminary refactorings without
functionally changing anything. But they do clean up check_cond_jmp_op() logic
and generalize a bunch of other pieces in is_branch_taken() logic.
[0] https://patchwork.kernel.org/project/netdevbpf/list/?series=797178&state=*
v5->v6:
- dropped <range> vs <range> patches (original patches #18 through #23) to
add more register range sanity checks and fix preexisting issues;
- comments improvements, addressing other feedback on first 17 patches
(Eduard, Alexei);
v4->v5:
- added entirety of verifier reg bounds tracking changes, now handling
<range> vs <range> cases (Alexei);
- added way more comments trying to explain why deductions added are
correct, hopefully they are useful and clarify things a bit (Daniel,
Shung-Hsi);
- added two preliminary selftests fixes necessary for RELEASE=1 build to
work again, it keeps breaking.
v3->v4:
- improvements to reg_bounds tester (progress report, split 32-bit and
64-bit ranges, fix various verbosity output issues, etc);
v2->v3:
- fix a subtle little-endianness assumption inside parge_reg_state() (CI);
v1->v2:
- fix compilation when building selftests with llvm-16 toolchain (CI).
====================
Link: https://lore.kernel.org/r/20231102033759.2541186-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
BPF CI failed due to map_percpu_stats_percpu_hash from time to time [1].
It seems that the failure reason is per-cpu bpf memory allocator may not
be able to allocate per-cpu pointer successfully and it can not refill
free llist timely, and bpf_map_update_elem() will return -ENOMEM.
So mitigate the problem by retrying the update operation for
non-preallocated per-cpu map.
[1]: https://github.com/kernel-patches/bpf/actions/runs/6713177520/job/18244865326?pr=5909
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231101032455.3808547-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Change reg_set_min_max() to take FALSE/TRUE sets of two registers each,
instead of assuming that we are always comparing to a constant. For now
we still assume that right-hand side registers are constants (and make
sure that's the case by swapping src/dst regs, if necessary), but
subsequent patches will remove this limitation.
reg_set_min_max() is now called unconditionally for any register
comparison, so that might include pointer vs pointer. This makes it
consistent with is_branch_taken() generality. But we currently only
support adjustments based on SCALAR vs SCALAR comparisons, so
reg_set_min_max() has to guard itself againts pointers.
Taking two by two registers allows to further unify and simplify
check_cond_jmp_op() logic. We utilize fake register for BPF_K
conditional jump case, just like with is_branch_taken() part.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-18-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Export map_update_retriable() to make it usable for other map_test
cases. These cases may only need retry for specific errno, so add
a new callback parameter to let map_update_retriable() decide whether or
not the errno is retriable.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231101032455.3808547-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Similarly to is_branch_taken()-related refactorings, start preparing
reg_set_min_max() to handle more generic case of two non-const
registers. Start with renaming arguments to accommodate later addition
of second register as an input argument.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-17-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When updating per-cpu map in map_percpu_stats test, patch_map_thread()
only passes 4-bytes-sized value to bpf_map_update_elem(). The expected
size of the value is 8 * num_possible_cpus(), so fix it by passing a
value with enough-size for per-cpu map update.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231101032455.3808547-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Combine 32-bit and 64-bit is_branch_taken logic for SCALAR_VALUE
registers. It makes it easier to see parallels between two domains
(32-bit and 64-bit), and makes subsequent refactoring more
straightforward.
No functional changes.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-16-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Make is_branch_taken() a single entry point for branch pruning decision
making, handling both pointer vs pointer, pointer vs scalar, and scalar
vs scalar cases in one place. This also nicely cleans up check_cond_jmp_op().
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-15-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Move is_branch_taken() slightly down. In subsequent patched we'll need
both flip_opcode() and is_pkt_ptr_branch_taken() for is_branch_taken(),
but instead of sprinkling forward declarations around, it makes more
sense to move is_branch_taken() lower below is_pkt_ptr_branch_taken(),
and also keep it closer to very tightly related reg_set_min_max(), as
they are two critical parts of the same SCALAR range tracking logic.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-14-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
While still assuming that second register is a constant, generalize
is_branch_taken-related code to accept two registers instead of register
plus explicit constant value. This also, as a side effect, allows to
simplify check_cond_jmp_op() by unifying BPF_K case with BPF_X case, for
which we use a fake register to represent BPF_K's imm constant as
a register.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://lore.kernel.org/r/20231102033759.2541186-13-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Just taking mundane refactoring bits out into a separate patch. No
functional changes.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://lore.kernel.org/r/20231102033759.2541186-12-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When performing 32-bit conditional operation operating on lower 32 bits
of a full 64-bit register, register full value isn't changed. We just
potentially gain new knowledge about that register's lower 32 bits.
Unfortunately, __reg_combine_{32,64}_into_{64,32} logic that
reg_set_min_max() performs as a last step, can lose information in some
cases due to __mark_reg64_unbounded() and __reg_assign_32_into_64().
That's bad and completely unnecessary. Especially __reg_assign_32_into_64()
looks completely out of place here, because we are not performing
zero-extending subregister assignment during conditional jump.
So this patch replaced __reg_combine_* with just a normal
reg_bounds_sync() which will do a proper job of deriving u64/s64 bounds
from u32/s32, and vice versa (among all other combinations).
__reg_combine_64_into_32() is also used in one more place,
coerce_reg_to_size(), while handling 1- and 2-byte register loads.
Looking into this, it seems like besides marking subregister as
unbounded before performing reg_bounds_sync(), we were also performing
deduction of smin32/smax32 and umin32/umax32 bounds from respective
smin/smax and umin/umax bounds. It's now redundant as reg_bounds_sync()
performs all the same logic more generically (e.g., without unnecessary
assumption that upper 32 bits of full register should be zero).
Long story short, we remove __reg_combine_64_into_32() completely, and
coerce_reg_to_size() now only does resetting subreg to unbounded and then
performing reg_bounds_sync() to recover as much information as possible
from 64-bit umin/umax and smin/smax bounds, set explicitly in
coerce_reg_to_size() earlier.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://lore.kernel.org/r/20231102033759.2541186-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
There are cases (caught by subsequent reg_bounds tests in selftests/bpf)
where performing one round of __reg_deduce_bounds() doesn't propagate
all the information from, say, s32 to u32 bounds and than from newly
learned u32 bounds back to u64 and s64. So perform __reg_deduce_bounds()
twice to make sure such derivations are propagated fully after
reg_bounds_sync().
One such example is test `(s64)[0xffffffff00000001; 0] (u64)<
0xffffffff00000000` from selftest patch from this patch set. It demonstrates an
intricate dance of u64 -> s64 -> u64 -> u32 bounds adjustments, which requires
two rounds of __reg_deduce_bounds(). Here are corresponding refinement log from
selftest, showing evolution of knowledge.
REFINING (FALSE R1) (u64)SRC=[0xffffffff00000000; U64_MAX] (u64)DST_OLD=[0; U64_MAX] (u64)DST_NEW=[0xffffffff00000000; U64_MAX]
REFINING (FALSE R1) (u64)SRC=[0xffffffff00000000; U64_MAX] (s64)DST_OLD=[0xffffffff00000001; 0] (s64)DST_NEW=[0xffffffff00000001; -1]
REFINING (FALSE R1) (s64)SRC=[0xffffffff00000001; -1] (u64)DST_OLD=[0xffffffff00000000; U64_MAX] (u64)DST_NEW=[0xffffffff00000001; U64_MAX]
REFINING (FALSE R1) (u64)SRC=[0xffffffff00000001; U64_MAX] (u32)DST_OLD=[0; U32_MAX] (u32)DST_NEW=[1; U32_MAX]
R1 initially has smin/smax set to [0xffffffff00000001; -1], while umin/umax is
unknown. After (u64)< comparison, in FALSE branch we gain knowledge that
umin/umax is [0xffffffff00000000; U64_MAX]. That causes smin/smax to learn that
zero can't happen and upper bound is -1. Then smin/smax is adjusted from
umin/umax improving lower bound from 0xffffffff00000000 to 0xffffffff00000001.
And then eventually umin32/umax32 bounds are drived from umin/umax and become
[1; U32_MAX].
Selftest in the last patch is actually implementing a multi-round fixed-point
convergence logic, but so far all the tests are handled by two rounds of
reg_bounds_sync() on the verifier state, so we keep it simple for now.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a few interesting cases in which we can tighten 64-bit bounds based
on newly learnt information about 32-bit bounds. E.g., when full u64/s64
registers are used in BPF program, and then eventually compared as
u32/s32. The latter comparison doesn't change the value of full
register, but it does impose new restrictions on possible lower 32 bits
of such full registers. And we can use that to derive additional full
register bounds information.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://lore.kernel.org/r/20231102033759.2541186-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a special case where we can derive valid s32 bounds from umin/umax
or smin/smax by stitching together negative s32 subrange and
non-negative s32 subrange. That requires upper 32 bits to form a [N, N+1]
range in u32 domain (taking into account wrap around, so 0xffffffff
to 0x00000000 is a valid [N, N+1] range in this sense). See code comment
for concrete examples.
Eduard Zingerman also provided an alternative explanation ([0]) for more
mathematically inclined readers:
Suppose:
. there are numbers a, b, c
. 2**31 <= b < 2**32
. 0 <= c < 2**31
. umin = 2**32 * a + b
. umax = 2**32 * (a + 1) + c
The number of values in the range represented by [umin; umax] is:
. N = umax - umin + 1 = 2**32 + c - b + 1
. min(N) = 2**32 + 0 - (2**32-1) + 1 = 2, with b = 2**32-1, c = 0
. max(N) = 2**32 + (2**31 - 1) - 2**31 + 1 = 2**32, with b = 2**31, c = 2**31-1
Hence [(s32)b; (s32)c] forms a valid range.
[0] https://lore.kernel.org/bpf/d7af631802f0cfae20df77fe70068702d24bbd31.camel@gmail.com/
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Comments in code try to explain the idea behind why this is correct.
Please check the code and comments.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
All the logic that applies to u64 vs s64, equally applies for u32 vs s32
relationships (just taken in a smaller 32-bit numeric space). So do the
same deduction of smin32/smax32 from umin32/umax32, if we can.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add smin/smax derivation from appropriate umin/umax values. Previously the
logic was surprisingly asymmetric, trying to derive umin/umax from smin/smax
(if possible), but not trying to do the same in the other direction. A simple
addition to __reg64_deduce_bounds() fixes this.
Added also generic comment about u64/s64 ranges and their relationship.
Hopefully that helps readers to understand all the bounds deductions
a bit better.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since some malloc calls in bpf_iter may at times fail,
this patch adds the appropriate fail checks, and ensures that
any previously allocated resource is appropriately destroyed
before returning the function.
Signed-off-by: Yuran Pereira <yuran.pereira@hotmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/DB3PR10MB6835F0ECA792265FA41FC39BE8A3A@DB3PR10MB6835.EURPRD10.PROD.OUTLOOK.COM
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Some compilers complain about get_pprint_mapv_size() not returning value
in some code paths. Fix with explicit return.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
As it was pointed out by Yonghong Song [1], in the bpf selftests the use
of the ASSERT_* series of macros is preferred over the CHECK macro.
This patch replaces all CHECK calls in bpf_iter with the appropriate
ASSERT_* macros.
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Yuran Pereira <yuran.pereira@hotmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Kui-Feng Lee <thinker.li@gmail.com>
Link: https://lore.kernel.org/r/DB3PR10MB6835E9C8DFCA226DD6FEF914E8A3A@DB3PR10MB6835.EURPRD10.PROD.OUTLOOK.COM
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Compiler complains about malloc(). We also don't need to dynamically
allocate anything, so make the life easier by using statically sized
buffer.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp:
- fix usec timestamps with TCP fastopen
- fix possible out-of-bounds reads in tcp_hash_fail()
- fix SYN option room calculation for TCP-AO
- tcp_sigpool: fix some off by one bugs
- bpf: fix compilation error without CGROUPS
- ptp:
- ptp_read() should not release queue
- fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
- fix check_stack_write_fixed_off() to correctly spill imm
- fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
- check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come"
* tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
net: ti: icss-iep: fix setting counter value
ptp: fix corrupted list in ptp_open
ptp: ptp_read should not release queue
net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
net: kcm: fill in MODULE_DESCRIPTION()
net/sched: act_ct: Always fill offloading tuple iifidx
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
ipvs: add missing module descriptions
netfilter: nf_tables: remove catchall element in GC sync path
netfilter: add missing module descriptions
drivers/net/ppp: use standard array-copy-function
net: enetc: shorten enetc_setup_xdp_prog() error message to fit NETLINK_MAX_FMTMSG_LEN
virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()
r8169: respect userspace disabling IFF_MULTICAST
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
net: phylink: initialize carrier state at creation
test/vsock: add dobule bind connect test
test/vsock: refactor vsock_accept
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a regression in ahash and hides the Kconfig sub-options for
the jitter RNG"
* tag 'v6.7-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ahash - Set using_shash for cloned ahash wrapper over shash
crypto: jitterentropy - Hide esoteric Kconfig options under FIPS and EXPERT
|