summaryrefslogtreecommitdiff
path: root/tools/testing/selftests/bpf/progs
AgeCommit message (Collapse)Author
2025-02-20Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull BPF fixes from Daniel Borkmann: - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels (Alan Maguire) - Fix a missing allocation failure check in BPF verifier's acquire_lock_state (Kumar Kartikeya Dwivedi) - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima) - Fix a deadlock when freeing BPF cgroup storage (Abel Wu) - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex (Andrii Nakryiko) - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is accessing skb data not containing an Ethernet header (Shigeru Yoshida) - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai) - Several BPF sockmap fixes to address incorrect TCP copied_seq calculations, which prevented correct data reads from recv(2) in user space (Jiayuan Chen) - Two fixes for BPF map lookup nullness elision (Daniel Xu) - Fix a NULL-pointer dereference from vmlinux BTF lookup in bpf_sk_storage_tracing_allowed (Jared Kangas) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests: bpf: test batch lookup on array of maps with holes bpf: skip non exist keys in generic_map_lookup_batch bpf: Handle allocation failure in acquire_lock_state bpf: verifier: Disambiguate get_constant_map_key() errors bpf: selftests: Test constant key extraction on irrelevant maps bpf: verifier: Do not extract constant map keys for irrelevant maps bpf: Fix softlockup in arena_map_free on 64k page kernel net: Add rx_skb of kfree_skb to raw_tp_null_args[]. bpf: Fix deadlock when freeing cgroup storage selftests/bpf: Add strparser test for bpf selftests/bpf: Fix invalid flag of recv() bpf: Disable non stream socket for strparser bpf: Fix wrong copied_seq calculation strparser: Add read_sock callback bpf: avoid holding freeze_mutex during mmap operation bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic selftests/bpf: Adjust data size to have ETH_HLEN bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type() bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-07bpf: selftests: Test constant key extraction on irrelevant mapsDaniel Xu
Test that very high constant map keys are not interpreted as an error value by the verifier. This would previously fail. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/c0590b62eb9303f389b2f52c0c7e9cf22a358a30.1738689872.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-30Merge tag 'pull-revalidate' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs d_revalidate updates from Al Viro: "Provide stable parent and name to ->d_revalidate() instances Most of the filesystem methods where we care about dentry name and parent have their stability guaranteed by the callers; ->d_revalidate() is the major exception. It's easy enough for callers to supply stable values for expected name and expected parent of the dentry being validated. That kills quite a bit of boilerplate in ->d_revalidate() instances, along with a bunch of races where they used to access ->d_name without sufficient precautions" * tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: fix ->rename_sem exclusion orangefs_d_revalidate(): use stable parent inode and name passed by caller ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller nfs: fix ->d_revalidate() UAF on ->d_name accesses nfs{,4}_lookup_validate(): use stable parent inode passed by caller gfs2_drevalidate(): use stable parent inode and name passed by caller fuse_dentry_revalidate(): use stable parent inode and name passed by caller vfat_revalidate{,_ci}(): use stable parent inode passed by caller exfat_d_revalidate(): use stable parent inode passed by caller fscrypt_d_revalidate(): use stable parent inode passed by caller ceph_d_revalidate(): propagate stable name down into request encoding ceph_d_revalidate(): use stable parent inode passed by caller afs_d_revalidate(): use stable name and parent inode passed by caller Pass parent directory inode and expected name to ->d_revalidate() generic_ci_d_compare(): use shortname_storage ext4 fast_commit: make use of name_snapshot primitives dissolve external_name.u into separate members make take_dentry_name_snapshot() lockless dcache: back inline names with a struct-wrapped array of unsigned long make sure that DNAME_INLINE_LEN is a multiple of word size
2025-01-29selftests/bpf: Add strparser test for bpfJiayuan Chen
Add test cases for bpf + strparser and separated them from sockmap_basic, as strparser has more encapsulation and parsing capabilities compared to standard sockmap. Signed-off-by: Jiayuan Chen <mrpre@163.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://patch.msgid.link/20250122100917.49845-6-mrpre@163.com
2025-01-23Merge tag 'bpf-next-6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: "A smaller than usual release cycle. The main changes are: - Prepare selftest to run with GCC-BPF backend (Ihor Solodrai) In addition to LLVM-BPF runs the BPF CI now runs GCC-BPF in compile only mode. Half of the tests are failing, since support for btf_decl_tag is still WIP, but this is a great milestone. - Convert various samples/bpf to selftests/bpf/test_progs format (Alexis Lothoré and Bastien Curutchet) - Teach verifier to recognize that array lookup with constant in-range index will always succeed (Daniel Xu) - Cleanup migrate disable scope in BPF maps (Hou Tao) - Fix bpf_timer destroy path in PREEMPT_RT (Hou Tao) - Always use bpf_mem_alloc in bpf_local_storage in PREEMPT_RT (Martin KaFai Lau) - Refactor verifier lock support (Kumar Kartikeya Dwivedi) This is a prerequisite for upcoming resilient spin lock. - Remove excessive 'may_goto +0' instructions in the verifier that LLVM leaves when unrolls the loops (Yonghong Song) - Remove unhelpful bpf_probe_write_user() warning message (Marco Elver) - Add fd_array_cnt attribute for prog_load command (Anton Protopopov) This is a prerequisite for upcoming support for static_branch" * tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (125 commits) selftests/bpf: Add some tests related to 'may_goto 0' insns bpf: Remove 'may_goto 0' instruction in opt_remove_nops() bpf: Allow 'may_goto 0' instruction in verifier selftests/bpf: Add test case for the freeing of bpf_timer bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT bpf: Free element after unlock in __htab_map_lookup_and_delete_elem() bpf: Bail out early in __htab_map_lookup_and_delete_elem() bpf: Free special fields after unlock in htab_lru_map_delete_node() tools: Sync if_xdp.h uapi tooling header libbpf: Work around kernel inconsistently stripping '.llvm.' suffix bpf: selftests: verifier: Add nullness elision tests bpf: verifier: Support eliding map lookup nullness bpf: verifier: Refactor helper access type tracking bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write bpf: verifier: Add missing newline on verbose() call selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED libbpf: Fix return zero when elf_begin failed selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test veristat: Load struct_ops programs only once ...
2025-01-20selftests/bpf: Add some tests related to 'may_goto 0' insnsYonghong Song
Add both asm-based and C-based tests which have 'may_goto 0' insns. For the following code in C-based test, int i, tmp[3]; for (i = 0; i < 3 && can_loop; i++) tmp[i] = 0; The clang compiler (clang 19 and 20) generates may_goto 2 may_goto 1 may_goto 0 r1 = 0 r2 = 0 r3 = 0 The above asm codes are due to llvm pass SROAPass. This ensures the successful verification since tmp[0-2] are initialized. Otherwise, the code without SROAPass like may_goto 5 r1 = 0 may_goto 3 r2 = 0 may_goto 1 r3 = 0 will have verification failure. Although from the source code C-based test should have verification failure, clang compiler optimization generates code with successful verification. If gcc generates different asm codes than clang, the following code can be used for gcc: int i, tmp[3]; for (i = 0; i < 3; i++) tmp[i] = 0; Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250118192034.2124952-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-20selftests/bpf: Add test case for the freeing of bpf_timerHou Tao
The main purpose of the test is to demonstrate the lock problem for the free of bpf_timer under PREEMPT_RT. When freeing a bpf_timer which is running on other CPU in bpf_timer_cancel_and_free(), hrtimer_cancel() will try to acquire a spin-lock (namely softirq_expiry_lock), however the freeing procedure has already held a raw-spin-lock. The test first creates two threads: one to start timers and the other to free timers. The start-timers thread will start the timer and then wake up the free-timers thread to free these timers when the starts complete. After freeing, the free-timer thread will wake up the start-timer thread to complete the current iteration. A loop of 10 iterations is used. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20250117101816.2101857-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-17dcache: back inline names with a struct-wrapped array of unsigned longAl Viro
... so that they can be copied with struct assignment (which generates better code) and accessed word-by-word. The type is union shortname_storage; it's a union of arrays of unsigned char and unsigned long. struct name_snapshot.inline_name turned into union shortname_storage; users (all in fs/dcache.c) adjusted. struct dentry.d_iname has some users outside of fs/dcache.c; to reduce the amount of noise in commit, it is replaced with union shortname_storage d_shortname and d_iname is turned into a macro that expands to d_shortname.string (similar to d_lock handling). That compat macro is temporary - most of the remaining instances will be taken out by debugfs series, and once that is merged and few others are taken care of this will go away. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-16bpf: selftests: verifier: Add nullness elision testsDaniel Xu
Test that nullness elision works for common use cases. For example, we want to check that both constant scalar spills and STACK_ZERO functions. As well as when there's both const and non-const values of R2 leading up to a lookup. And obviously some bound checks. Particularly tricky are spills both smaller or larger than key size. For smaller, we need to ensure verifier doesn't let through a potential read into unchecked bytes. For larger, endianness comes into play, as the native endian value tracked in the verifier may not be the bytes the kernel would have read out of the key pointer. So check that we disallow both. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/f1dacaa777d4516a5476162e0ea549f7c3354d73.1736886479.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-16bpf: verifier: Support eliding map lookup nullnessDaniel Xu
This commit allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check. This is useful for two reasons: 1. Large numbers of nullness checks (especially when they cannot fail) unnecessarily pushes prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ. 2. It forms a tighter contract between programmer and verifier. For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, for user scripts with large number of unrolled loops, we are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyways, as we only use static key values. Eliding nullness probably results in less work for verifier as well. For (2), percpu scratch maps are often used as a larger stack, as the currrent stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double check the logic. Tests also have to be updated in sync with these changes, as the verifier is more efficient with this change. Notable, iters.c tests had to be changed to use a map type that still requires null checks, as it's exercising verifier tracking logic w.r.t iterators. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/68f3ea96ff3809a87e502a11a4bd30177fc5823e.1736886479.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-16bpf: verifier: Refactor helper access type trackingDaniel Xu
Previously, the verifier was treating all PTR_TO_STACK registers passed to a helper call as potentially written to by the helper. However, all calls to check_stack_range_initialized() already have precise access type information available. Rather than treat ACCESS_HELPER as a proxy for BPF_WRITE, pass enum bpf_access_type to check_stack_range_initialized() to more precisely track helper arguments. One benefit from this precision is that registers tracked as valid spills and passed as a read-only helper argument remain tracked after the call. Rather than being marked STACK_MISC afterwards. An additional benefit is the verifier logs are also more precise. For this particular error, users will enjoy a slightly clearer message. See included selftest updates for examples. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/ff885c0e5859e0cd12077c3148ff0754cad4f7ed.1736886479.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-15selftests/bpf: Fix test_xdp_adjust_tail_grow2 selftest on powerpcSaket Kumar Bhaskar
On powerpc cache line size is 128 bytes, so skb_shared_info must be aligned accordingly. Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250110103109.3670793-1-skb99@linux.ibm.com
2025-01-10selftests/bpf: Migrate test_xdp_redirect.c to test_xdp_do_redirect.cBastien Curutchet (eBPF Foundation)
prog_tests/xdp_do_redirect.c is the only user of the BPF programs located in progs/test_xdp_do_redirect.c and progs/test_xdp_redirect.c. There is no need to keep both files with such close names. Move test_xdp_redirect.c contents to test_xdp_do_redirect.c and remove progs/test_xdp_redirect.c Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250110-xdp_redirect-v2-3-b8f3ae53e894@bootlin.com
2025-01-10selftests/bpf: test_xdp_redirect: Rename BPF sectionsBastien Curutchet (eBPF Foundation)
SEC("redirect_to_111") and SEC("redirect_to_222") can't be loaded by the __load() helper. Rename both sections SEC("xdp") so it can be interpreted by the __load() helper in upcoming patch. Update the test_xdp_redirect.sh to use the program name instead of the section name to load the BPF program. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://patch.msgid.link/20250110-xdp_redirect-v2-1-b8f3ae53e894@bootlin.com
2025-01-08selftests/bpf: Add kprobe session recursion check testJiri Olsa
Adding kprobe.session probe to bpf_kfunc_common_test that misses bpf program execution due to recursion check and making sure it increases the program missed count properly. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250106175048.1443905-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-07Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2025-01-07 We've added 7 non-merge commits during the last 32 day(s) which contain a total of 11 files changed, 190 insertions(+), 103 deletions(-). The main changes are: 1) Migrate the test_xdp_meta.sh BPF selftest into test_progs framework, from Bastien Curutchet. 2) Add ability to configure head/tailroom for netkit devices, from Daniel Borkmann. 3) Fixes and improvements to the xdp_hw_metadata selftest, from Song Yoong Siang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Extend netkit tests to validate set {head,tail}room netkit: Add add netkit {head,tail}room to rt_link.yaml netkit: Allow for configuring needed_{head,tail}room selftests/bpf: Migrate test_xdp_meta.sh into xdp_context_test_run.c selftests/bpf: test_xdp_meta: Rename BPF sections selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata ==================== Link: https://patch.msgid.link/20250107130908.143644-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-06selftests/bpf: test bpf_for within spin lock sectionEmil Tsalapatis
Add a selftest to ensure BPF for loops within critical sections are accepted by the verifier. Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250104202528.882482-3-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-06selftests/bpf: Extend netkit tests to validate set {head,tail}roomDaniel Borkmann
Extend the netkit selftests to specify and validate the {head,tail}room on the netdevice: # ./vmtest.sh -- ./test_progs -t netkit [...] ./test_progs -t netkit [ 1.174147] bpf_testmod: loading out-of-tree module taints kernel. [ 1.174585] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel [ 1.422307] tsc: Refined TSC clocksource calibration: 3407.983 MHz [ 1.424511] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc3e5084, max_idle_ns: 440795359833 ns [ 1.428092] clocksource: Switched to clocksource tsc #363 tc_netkit_basic:OK #364 tc_netkit_device:OK #365 tc_netkit_multi_links:OK #366 tc_netkit_multi_opts:OK #367 tc_netkit_neigh_links:OK #368 tc_netkit_pkt_type:OK #369 tc_netkit_scrub:OK Summary: 7/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/bpf/20241220234658.490686-3-daniel@iogearbox.net
2024-12-30selftests/bpf: Add testcases for BPF_MULMatan Shachnai
The previous commit improves precision of BPF_MUL. Add tests to exercise updated BPF_MUL. Signed-off-by: Matan Shachnai <m.shachnai@gmail.com> Link: https://lore.kernel.org/r/20241218032337.12214-3-m.shachnai@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-20selftests/bpf: Test bpf_skb_change_tail() in TC ingressCong Wang
Similarly to the previous test, we also need a test case to cover positive offsets as well, TC is an excellent hook for this. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Zijian Zhang <zijianzhang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241213034057.246437-5-xiyou.wangcong@gmail.com
2024-12-20selftests/bpf: Add a BPF selftest for bpf_skb_change_tail()Cong Wang
As requested by Daniel, we need to add a selftest to cover bpf_skb_change_tail() cases in skb_verdict. Here we test trimming, growing and error cases, and validate its expected return values and the expected sizes of the payload. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241213034057.246437-3-xiyou.wangcong@gmail.com
2024-12-16selftests/bpf: test_xdp_meta: Rename BPF sectionsBastien Curutchet
SEC("t") and SEC("x") can't be loaded by the __load() helper. Rename these sections SEC("tc") and SEC("xdp") so they can be interpreted by the __load() helper in upcoming patch. Update the test_xdp_meta.sh to fit these new names. Signed-off-by: Bastien Curutchet <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20241213-xdp_meta-v2-1-634582725b90@bootlin.com
2024-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfAlexei Starovoitov
Cross-merge bpf fixes after downstream PR. No conflicts. Adjacent changes in: Auto-merging include/linux/bpf.h Auto-merging include/linux/bpf_verifier.h Auto-merging kernel/bpf/btf.c Auto-merging kernel/bpf/verifier.c Auto-merging kernel/trace/bpf_trace.c Auto-merging tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13selftests/bpf: Add tests for raw_tp NULL argsKumar Kartikeya Dwivedi
Add tests to ensure that arguments are correctly marked based on their specified positions, and whether they get marked correctly as maybe null. For modules, all tracepoint parameters should be marked PTR_MAYBE_NULL by default. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241213221929.3495062-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13bpf: Augment raw_tp arguments with PTR_MAYBE_NULLKumar Kartikeya Dwivedi
Arguments to a raw tracepoint are tagged as trusted, which carries the semantics that the pointer will be non-NULL. However, in certain cases, a raw tracepoint argument may end up being NULL. More context about this issue is available in [0]. Thus, there is a discrepancy between the reality, that raw_tp arguments can actually be NULL, and the verifier's knowledge, that they are never NULL, causing explicit NULL check branch to be dead code eliminated. A previous attempt [1], i.e. the second fixed commit, was made to simulate symbolic execution as if in most accesses, the argument is a non-NULL raw_tp, except for conditional jumps. This tried to suppress branch prediction while preserving compatibility, but surfaced issues with production programs that were difficult to solve without increasing verifier complexity. A more complete discussion of issues and fixes is available at [2]. Fix this by maintaining an explicit list of tracepoints where the arguments are known to be NULL, and mark the positional arguments as PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments are known to be ERR_PTR, and mark these arguments as scalar values to prevent potential dereference. Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2), shifted by the zero-indexed argument number x 4. This can be represented as follows: 1st arg: 0x1 2nd arg: 0x10 3rd arg: 0x100 ... and so on (likewise for ERR_PTR case). In the future, an automated pass will be used to produce such a list, or insert __nullable annotations automatically for tracepoints. Each compilation unit will be analyzed and results will be collated to find whether a tracepoint pointer is definitely not null, maybe null, or an unknown state where verifier conservatively marks it PTR_MAYBE_NULL. A proof of concept of this tool from Eduard is available at [3]. Note that in case we don't find a specification in the raw_tp_null_args array and the tracepoint belongs to a kernel module, we will conservatively mark the arguments as PTR_MAYBE_NULL. This is because unlike for in-tree modules, out-of-tree module tracepoints may pass NULL freely to the tracepoint. We don't protect against such tracepoints passing ERR_PTR (which is uncommon anyway), lest we mark all such arguments as SCALAR_VALUE. While we are it, let's adjust the test raw_tp_null to not perform dereference of the skb->mark, as that won't be allowed anymore, and make it more robust by using inline assembly to test the dead code elimination behavior, which should still stay the same. [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb [1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com [2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com [3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs") Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Co-developed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"Kumar Kartikeya Dwivedi
This patch reverts commit cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The patch was well-intended and meant to be as a stop-gap fixing branch prediction when the pointer may actually be NULL at runtime. Eventually, it was supposed to be replaced by an automated script or compiler pass detecting possibly NULL arguments and marking them accordingly. However, it caused two main issues observed for production programs and failed to preserve backwards compatibility. First, programs relied on the verifier not exploring == NULL branch when pointer is not NULL, thus they started failing with a 'dereference of scalar' error. Next, allowing raw_tp arguments to be modified surfaced the warning in the verifier that warns against reg->off when PTR_MAYBE_NULL is set. More information, context, and discusson on both problems is available in [0]. Overall, this approach had several shortcomings, and the fixes would further complicate the verifier's logic, and the entire masking scheme would have to be removed eventually anyway. Hence, revert the patch in preparation of a better fix avoiding these issues to replace this commit. [0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com Reported-by: Manu Bretelle <chantra@meta.com> Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13selftest/bpf: Replace magic constants by macrosAnton Protopopov
Replace magic constants in a BTF structure initialization code by proper macros, as is done in other similar selftests. Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241213130934.1087929-8-aspsk@isovalent.com
2024-12-12selftests/bpf: Add test for narrow ctx load for pointer argsKumar Kartikeya Dwivedi
Ensure that performing narrow ctx loads other than size == 8 are rejected when the argument is a pointer type. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241212092050.3204165-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-12bpf: Check size for BTF-based ctx access of pointer membersKumar Kartikeya Dwivedi
Robert Morris reported the following program type which passes the verifier in [0]: SEC("struct_ops/bpf_cubic_init") void BPF_PROG(bpf_cubic_init, struct sock *sk) { asm volatile("r2 = *(u16*)(r1 + 0)"); // verifier should demand u64 asm volatile("*(u32 *)(r2 +1504) = 0"); // 1280 in some configs } The second line may or may not work, but the first instruction shouldn't pass, as it's a narrow load into the context structure of the struct ops callback. The code falls back to btf_ctx_access to ensure correctness and obtaining the types of pointers. Ensure that the size of the access is correctly checked to be 8 bytes, otherwise the verifier thinks the narrow load obtained a trusted BTF pointer and will permit loads/stores as it sees fit. Perform the check on size after we've verified that the load is for a pointer field, as for scalar values narrow loads are fine. Access to structs passed as arguments to a BPF program are also treated as scalars, therefore no adjustment is needed in their case. Existing verifier selftests are broken by this change, but because they were incorrect. Verifier tests for d_path were performing narrow load into context to obtain path pointer, had this program actually run it would cause a crash. The same holds for verifier_btf_ctx_access tests. [0]: https://lore.kernel.org/bpf/51338.1732985814@localhost Fixes: 9e15db66136a ("bpf: Implement accurate raw_tp context access via BTF") Reported-by: Robert Morris <rtm@mit.edu> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241212092050.3204165-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-12selftests/bpf: extend changes_pkt_data with cases w/o subprogramsEduard Zingerman
Extend changes_pkt_data tests with test cases freplacing the main program that does not have subprograms. Try four combinations when both main program and replacement do and do not change packet data. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241212070711.427443-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-10selftests/bpf: validate that tail call invalidates packet pointersEduard Zingerman
Add a test case with a tail call done from a global sub-program. Such tails calls should be considered as invalidating packet pointers. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241210041100.1898468-9-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-10bpf: consider that tail calls invalidate packet pointersEduard Zingerman
Tail-called programs could execute any of the helpers that invalidate packet pointers. Hence, conservatively assume that each tail call invalidates packet pointers. Making the change in bpf_helper_changes_pkt_data() automatically makes use of check_cfg() logic that computes 'changes_pkt_data' effect for global sub-programs, such that the following program could be rejected: int tail_call(struct __sk_buff *sk) { bpf_tail_call_static(sk, &jmp_table, 0); return 0; } SEC("tc") int not_safe(struct __sk_buff *sk) { int *p = (void *)(long)sk->data; ... make p valid ... tail_call(sk); *p = 42; /* this is unsafe */ ... } The tc_bpf2bpf.c:subprog_tc() needs change: mark it as a function that can invalidate packet pointers. Otherwise, it can't be freplaced with tailcall_freplace.c:entry_freplace() that does a tail call. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241210041100.1898468-8-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-10selftests/bpf: freplace tests for tracking of changes_packet_dataEduard Zingerman
Try different combinations of global functions replacement: - replace function that changes packet data with one that doesn't; - replace function that changes packet data with one that does; - replace function that doesn't change packet data with one that does; - replace function that doesn't change packet data with one that doesn't; Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241210041100.1898468-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-10selftests/bpf: test for changing packet data from global functionsEduard Zingerman
Check if verifier is aware of packet pointers invalidation done in global functions. Based on a test shared by Nick Zavaritsky in [0]. [0] https://lore.kernel.org/bpf/0498CA22-5779-4767-9C0C-A9515CEA711F@gmail.com/ Suggested-by: Nick Zavaritsky <mejedi@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241210041100.1898468-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-09selftests/bpf: Fix fill_link_info selftest on powerpcSaket Kumar Bhaskar
With CONFIG_KPROBES_ON_FTRACE enabled on powerpc, ftrace_location_range returns ftrace location for bpf_fentry_test1 at offset of 4 bytes from function entry. This is because branch to _mcount function is at offset of 4 bytes in function profile sequence. To fix this, add entry_offset of 4 bytes while verifying the address for kprobe entry address of bpf_fentry_test1 in verify_perf_link_info in selftest, when CONFIG_KPROBES_ON_FTRACE is enabled. Disassemble of bpf_fentry_test1: c000000000e4b080 <bpf_fentry_test1>: c000000000e4b080: a6 02 08 7c mflr r0 c000000000e4b084: b9 e2 22 4b bl c00000000007933c <_mcount> c000000000e4b088: 01 00 63 38 addi r3,r3,1 c000000000e4b08c: b4 07 63 7c extsw r3,r3 c000000000e4b090: 20 00 80 4e blr When CONFIG_PPC_FTRACE_OUT_OF_LINE [1] is enabled, these function profile sequence is moved out of line with an unconditional branch at offset 0. So, the test works without altering the offset for 'CONFIG_KPROBES_ON_FTRACE && CONFIG_PPC_FTRACE_OUT_OF_LINE' case. Disassemble of bpf_fentry_test1: c000000000f95190 <bpf_fentry_test1>: c000000000f95190: 00 00 00 60 nop c000000000f95194: 01 00 63 38 addi r3,r3,1 c000000000f95198: b4 07 63 7c extsw r3,r3 c000000000f9519c: 20 00 80 4e blr [1] https://lore.kernel.org/all/20241030070850.1361304-13-hbathini@linux.ibm.com/ Fixes: 23cf7aa539dc ("selftests/bpf: Add selftest for fill_link_info") Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241209065720.234344-1-skb99@linux.ibm.com
2024-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfAlexei Starovoitov
Cross-merge bpf fixes after downstream PR. Trivial conflict: tools/testing/selftests/bpf/prog_tests/verifier.c Adjacent changes in: Auto-merging kernel/bpf/verifier.c Auto-merging samples/bpf/Makefile Auto-merging tools/testing/selftests/bpf/.gitignore Auto-merging tools/testing/selftests/bpf/Makefile Auto-merging tools/testing/selftests/bpf/prog_tests/verifier.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-06selftests/bpf: Consolidate kernel modules into common directoryToke Høiland-Jørgensen
The selftests build four kernel modules which use copy-pasted Makefile targets. This is a bit messy, and doesn't scale so well when we add more modules, so let's consolidate these rules into a single rule generated for each module name, and move the module sources into a single directory. To avoid parallel builds of the different modules stepping on each other's toes during the 'modpost' phase of the Kbuild 'make modules', the module files should really be a grouped target. However, make only added explicit support for grouped targets in version 4.3, which is newer than the minimum version supported by the kernel. However, make implicitly treats pattern matching rules with multiple targets as a grouped target, so we can work around this by turning the rule into a pattern matching target. We do this by replacing '.ko' with '%ko' in the targets with subst(). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Viktor Malik <vmalik@redhat.com> Link: https://lore.kernel.org/bpf/20241204-bpf-selftests-mod-compile-v5-1-b96231134a49@redhat.com
2024-12-04selftests/bpf: Add test for narrow spill into 64-bit spilled scalarKumar Kartikeya Dwivedi
Add a test case to verify that without CAP_PERFMON, the test now succeeds instead of failing due to a verification error. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204044757.1483141-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04selftests/bpf: Add test for reading from STACK_INVALID slotsKumar Kartikeya Dwivedi
Ensure that when CAP_PERFMON is dropped, and the verifier sees allow_ptr_leaks as false, we are not permitted to read from a STACK_INVALID slot. Without the fix, the test will report unexpected success in loading. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204044757.1483141-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04selftests/bpf: Introduce __caps_unpriv annotation for testsEduard Zingerman
Add a __caps_unpriv annotation so that tests requiring specific capabilities while dropping the rest can conveniently specify them during selftest declaration instead of munging with capabilities at runtime from the testing binary. While at it, let us convert test_verifier_mtu to use this new support instead. Since we do not want to include linux/capability.h, we only defined the four main capabilities BPF subsystem deals with in bpf_misc.h for use in tests. If the user passes a CAP_SYS_NICE or anything else that's not defined in the header, capability parsing code will return a warning. Also reject strtol returning 0. CAP_CHOWN = 0 but we'll never need to use it, and strtol doesn't errno on failed conversion. Fail the test in such a case. The original diff for this idea is available at link [0]. [0]: https://lore.kernel.org/bpf/a1e48f5d9ae133e19adc6adf27e19d585e06bab4.camel@gmail.com Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> [ Kartikeya: rebase on bpf-next, add warn to parse_caps, convert test_verifier_mtu ] Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204044757.1483141-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04selftests/bpf: Add IRQ save/restore testsKumar Kartikeya Dwivedi
Include tests that check for rejection in erroneous cases, like unbalanced IRQ-disabled counts, within and across subprogs, invalid IRQ flag state or input to kfuncs, behavior upon overwriting IRQ saved state on stack, interaction with sleepable kfuncs/helpers, global functions, and out of order restore. Include some success scenarios as well to demonstrate usage. #128/1 irq/irq_save_bad_arg:OK #128/2 irq/irq_restore_bad_arg:OK #128/3 irq/irq_restore_missing_2:OK #128/4 irq/irq_restore_missing_3:OK #128/5 irq/irq_restore_missing_3_minus_2:OK #128/6 irq/irq_restore_missing_1_subprog:OK #128/7 irq/irq_restore_missing_2_subprog:OK #128/8 irq/irq_restore_missing_3_subprog:OK #128/9 irq/irq_restore_missing_3_minus_2_subprog:OK #128/10 irq/irq_balance:OK #128/11 irq/irq_balance_n:OK #128/12 irq/irq_balance_subprog:OK #128/13 irq/irq_global_subprog:OK #128/14 irq/irq_restore_ooo:OK #128/15 irq/irq_restore_ooo_3:OK #128/16 irq/irq_restore_3_subprog:OK #128/17 irq/irq_restore_4_subprog:OK #128/18 irq/irq_restore_ooo_3_subprog:OK #128/19 irq/irq_restore_invalid:OK #128/20 irq/irq_save_invalid:OK #128/21 irq/irq_restore_iter:OK #128/22 irq/irq_save_iter:OK #128/23 irq/irq_flag_overwrite:OK #128/24 irq/irq_flag_overwrite_partial:OK #128/25 irq/irq_ooo_refs_array:OK #128/26 irq/irq_sleepable_helper:OK #128/27 irq/irq_sleepable_kfunc:OK #128 irq:OK Summary: 1/27 PASSED, 0 SKIPPED, 0 FAILED Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204030400.208005-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04selftests/bpf: Expand coverage of preempt tests to sleepable kfuncKumar Kartikeya Dwivedi
For preemption-related kfuncs, we don't test their interaction with sleepable kfuncs (we do test helpers) even though the verifier has code to protect against such a pattern. Expand coverage of the selftest to include this case. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204030400.208005-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04bpf: Improve verifier log for resource leak on exitKumar Kartikeya Dwivedi
The verifier log when leaking resources on BPF_EXIT may be a bit confusing, as it's a problem only when finally existing from the main prog, not from any of the subprogs. Hence, update the verifier error string and the corresponding selftests matching on it. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204030400.208005-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-02bpf: Zero index arg error string for dynptr and iterKumar Kartikeya Dwivedi
Andrii spotted that process_dynptr_func's rejection of incorrect argument register type will print an error string where argument numbers are not zero-indexed, unlike elsewhere in the verifier. Fix this by subtracting 1 from regno. The same scenario exists for iterator messages. Fix selftest error strings that match on the exact argument number while we're at it to ensure clean bisection. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241203002235.3776418-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-02selftests/bpf: Add tests for iter arg checkKumar Kartikeya Dwivedi
Add selftests to cover argument type check for iterator kfuncs, and cover all three kinds (new, next, destroy). Without the fix in the previous patch, the selftest would not cause a verifier error. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241203000238.3602922-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-02bpf: Ensure reg is PTR_TO_STACK in process_iter_argTao Lyu
Currently, KF_ARG_PTR_TO_ITER handling missed checking the reg->type and ensuring it is PTR_TO_STACK. Instead of enforcing this in the caller of process_iter_arg, move the check into it instead so that all callers will gain the check by default. This is similar to process_dynptr_func. An existing selftest in verifier_bits_iter.c fails due to this change, but it's because it was passing a NULL pointer into iter_next helper and getting an error further down the checks, but probably meant to pass an uninitialized iterator on the stack (as is done in the subsequent test below it). We will gain coverage for non-PTR_TO_STACK arguments in later patches hence just change the declaration to zero-ed stack object. Fixes: 06accc8779c1 ("bpf: add support for open-coded iterator loops") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tao Lyu <tao.lyu@epfl.ch> [ Kartikeya: move check into process_iter_arg, rewrite commit log ] Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241203000238.3602922-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-02selftests/bpf: add cgroup skb direct packet access testMahe Tardy
This verifies that programs of BPF_PROG_TYPE_CGROUP_SKB can access skb->data_end with direct packet access when being run with BPF_PROG_TEST_RUN. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com> Link: https://lore.kernel.org/r/20241125152603.375898-2-mahe.tardy@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-02selftests/bpf: use the same udp and tcp headers in tests under test_progsAlexis Lothoré (eBPF Foundation)
Trying to add udp-dedicated helpers in network_helpers involves including some udp header, which makes multiple test_progs tests build fail: In file included from ./progs/test_cls_redirect.h:13, from [...]/prog_tests/cls_redirect.c:15: [...]/usr/include/linux/udp.h:23:8: error: redefinition of ‘struct udphdr’ 23 | struct udphdr { | ^~~~~~ In file included from ./network_helpers.h:17, from [...]/prog_tests/cls_redirect.c:13: [...]/usr/include/netinet/udp.h:55:8: note: originally defined here 55 | struct udphdr | ^~~~~~ This error is due to struct udphdr being defined in both <linux/udp.h> and <netinet/udp.h>. Use only <netinet/udp.h> in every test. While at it, perform the same for tcp.h. For some tests, the change needs to be done in the eBPF program part as well, because of some headers sharing between both sides. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://lore.kernel.org/r/20241120-flow_dissector-v3-11-45b46494f937@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-26selftests/bpf: Check for PREEMPTION instead of PREEMPTSebastian Andrzej Siewior
CONFIG_PREEMPT is a preemtion model the so called "Low-Latency Desktop". A different preemption model is PREEMPT_RT the so called "Real-Time". Both implement preemption in kernel and set CONFIG_PREEMPTION. There is also the so called "LAZY PREEMPT" which the "Scheduler controlled preemption model". Here we have also preemption in the kernel the rules are slightly different. Therefore the testsuite should not check for CONFIG_PREEMPT (as one model) but for CONFIG_PREEMPTION to figure out if preemption in the kernel is possible. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20241119161819.qvEcs-n_@linutronix.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-21Merge tag 'net-next-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "The most significant set of changes is the per netns RTNL. The new behavior is disabled by default, regression risk should be contained. Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its default value from PTP_1588_CLOCK_KVM, as the first is intended to be a more reliable replacement for the latter. Core: - Started a very large, in-progress, effort to make the RTNL lock scope per network-namespace, thus reducing the lock contention significantly in the containerized use-case, comprising: - RCU-ified some relevant slices of the FIB control path - introduce basic per netns locking helpers - namespacified the IPv4 address hash table - remove rtnl_register{,_module}() in favour of rtnl_register_many() - refactor rtnl_{new,del,set}link() moving as much validation as possible out of RTNL lock - convert all phonet doit() and dumpit() handlers to RCU - convert IPv4 addresses manipulation to per-netns RTNL - convert virtual interface creation to per-netns RTNL the per-netns lock infrastructure is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim. - Introduce NAPI suspension, to efficiently switching between busy polling (NAPI processing suspended) and normal processing. - Migrate the IPv4 routing input, output and control path from direct ToS usage to DSCP macros. This is a work in progress to make ECN handling consistent and reliable. - Add drop reasons support to the IPv4 rotue input path, allowing better introspection in case of packets drop. - Make FIB seqnum lockless, dropping RTNL protection for read access. - Make inet{,v6} addresses hashing less predicable. - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets and timestamps Things we sprinkled into general kernel code: - Add small file operations for debugfs, to reduce the struct ops size. - Refactoring and optimization for the implementation of page_frag API, This is a preparatory work to consolidate the page_frag implementation. Netfilter: - Optimize set element transactions to reduce memory consumption - Extended netlink error reporting for attribute parser failure. - Make legacy xtables configs user selectable, giving users the option to configure iptables without enabling any other config. - Address a lot of false-positive RCU issues, pointed by recent CI improvements. BPF: - Put xsk sockets on a struct diet and add various cleanups. Overall, this helps to bump performance by 12% for some workloads. - Extend BPF selftests to increase coverage of XDP features in combination with BPF cpumap. - Optimize and homogenize bpf_csum_diff helper for all archs and also add a batch of new BPF selftests for it. - Extend netkit with an option to delegate skb->{mark,priority} scrubbing to its BPF program. - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs. Protocols: - Introduces 4-tuple hash for connected udp sockets, speeding-up significantly connected sockets lookup. - Add a fastpath for some TCP timers that usually expires after close, the socket lock contention. - Add inbound and outbound xfrm state caches to speed up state lookups. - Avoid sending MPTCP advertisements on stale subflows, reducing risks on loosing them. - Make neighbours table flushing more scalable, maintaining per device neigh lists. Driver API: - Introduce a unified interface to configure transmission H/W shaping, and expose it to user-space via generic-netlink. - Add support for per-NAPI config via netlink. This makes napi configuration persistent across queues removal and re-creation. Requires driver updates, currently supported drivers are: nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice. - Add ethtool support for writing SFP / PHY firmware blocks. - Track RSS context allocation from ethtool core. - Implement support for mirroring to DSA CPU port, via TC mirror offload. - Consolidate FDB updates notification, to avoid duplicates on device-specific entries. - Expose DPLL clock quality level to the user-space. - Support master-slave PHY config via device tree. Tests and tooling: - forwarding: introduce deferred commands, to simplify the cleanup phase Drivers: - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic, Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the IRQs and queues to NAPI IDs, allowing busy polling and better introspection. - Ethernet high-speed NICs: - nVidia/Mellanox: - mlx5: - a large refactor to implement support for cross E-Switch scheduling - refactor H/W conter management to let it scale better - H/W GRO cleanups - Intel (100G, ice):: - add support for ethtool reset - implement support for per TX queue H/W shaping - AMD/Solarflare: - implement per device queue stats support - Broadcom (bnxt): - improve wildcard l4proto on IPv4/IPv6 ntuple rules - Marvell Octeon: - Add representor support for each Resource Virtualization Unit (RVU) device. - Hisilicon: - add support for the BMC Gigabit Ethernet - IBM (EMAC): - driver cleanup and modernization - Cisco (VIC): - raise the queues number limit to 256 - Ethernet virtual: - Google vNIC: - implement page pool support - macsec: - inherit lower device's features and TSO limits when offloading - virtio_net: - enable premapped mode by default - support for XDP socket(AF_XDP) zerocopy TX - wireguard: - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger packets. - Ethernet NICs embedded and virtual: - Broadcom ASP: - enable software timestamping - Freescale: - add enetc4 PF driver - MediaTek: Airoha SoC: - implement BQL support - RealTek r8169: - enable TSO by default on r8168/r8125 - implement extended ethtool stats - Renesas AVB: - enable TX checksum offload - Synopsys (stmmac): - support header splitting for vlan tagged packets - move common code for DWMAC4 and DWXGMAC into a separate FPE module. - add dwmac driver support for T-HEAD TH1520 SoC - Synopsys (xpcs): - driver refactor and cleanup - TI: - icssg_prueth: add VLAN offload support - Xilinx emaclite: - add clock support - Ethernet switches: - Microchip: - implement support for the lan969x Ethernet switch family - add LAN9646 switch support to KSZ DSA driver - Ethernet PHYs: - Marvel: 88q2x: enable auto negotiation - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2 - PTP: - Add support for the Amazon virtual clock device - Add PtP driver for s390 clocks - WiFi: - mac80211 - EHT 1024 aggregation size for transmissions - new operation to indicate that a new interface is to be added - support radio separation of multi-band devices - move wireless extension spy implementation to libiw - Broadcom: - brcmfmac: optional LPO clock support - Microchip: - add support for Atmel WILC3000 - Qualcomm (ath12k): - firmware coredump collection support - add debugfs support for a multitude of statistics - Qualcomm (ath5k): - Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support - Realtek: - rtw88: 8821au and 8812au USB adapters support - rtw89: add thermal protection - rtw89: fine tune BT-coexsitence to improve user experience - rtw89: firmware secure boot for WiFi 6 chip - Bluetooth - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and 0x13d3:0x3623 - add Realtek RTL8852BE support for id Foxconn 0xe123 - add MediaTek MT7920 support for wireless module ids - btintel_pcie: add handshake between driver and firmware - btintel_pcie: add recovery mechanism - btnxpuart: add GPIO support to power save feature" * tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits) mm: page_frag: fix a compile error when kernel is not compiled Documentation: tipc: fix formatting issue in tipc.rst selftests: nic_performance: Add selftest for performance of NIC driver selftests: nic_link_layer: Add selftest case for speed and duplex states selftests: nic_link_layer: Add link layer selftest for NIC driver bnxt_en: Add FW trace coredump segments to the coredump bnxt_en: Add a new ethtool -W dump flag bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr() bnxt_en: Add functions to copy host context memory bnxt_en: Do not free FW log context memory bnxt_en: Manage the FW trace context memory bnxt_en: Allocate backing store memory for FW trace logs bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem() bnxt_en: Refactor bnxt_free_ctx_mem() bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type bnxt_en: Update firmware interface spec to 1.10.3.85 selftests/bpf: Add some tests with sockmap SK_PASS bpf: fix recursive lock when verdict program return SK_PASS wireguard: device: support big tcp GSO wireguard: selftests: load nf_conntrack if not present ...