Age | Commit message | Author
2025-03-15 | bpf: Add verifier support for timed may_goto | Kumar Kartikeya Dwivedi

Implement support in the verifier for switching the may_goto implementation from a counter-based approach to one which samples time on the local CPU to have a bigger loop bound.

We implement it by maintaining 16 bytes per stack frame, using 8 bytes for the count that amortizes time sampling and 8 bytes for the starting timestamp. To minimize overhead, we need to avoid spilling and filling of registers around this sequence, so we push this cost into the time sampling function 'arch_bpf_timed_may_goto'. This is a JIT-specific wrapper around bpf_check_timed_may_goto which returns us the count to store into the stack through BPF_REG_AX. All caller-saved registers (r0-r5) are guaranteed to remain untouched.

The loop can be broken by returning count as 0; otherwise we dispatch into the function when the count drops to 0, and the runtime chooses to refresh it (by returning count as BPF_MAX_TIMED_LOOPS) or to return 0 and abort the loop on the next iteration.

Since the check for 0 is done right after loading the count from the stack, all subsequent cond_break sequences should immediately break as well, of the same loop or subsequent loops in the program.

We pass in the stack_depth of the count (and thus the timestamp, by adding 8 to it) to the arch_bpf_timed_may_goto call so that it can be passed in to bpf_check_timed_may_goto as an argument after r1 is saved, by adding the offset to r10/fp. This adjustment will be arch specific, and the next patch will introduce support for x86.

Note that depending on loop complexity, time spent in the loop can be more than the current limit (250 ms), but imposing an upper bound on program runtime is an orthogonal problem which will be addressed when program cancellations are supported.

The current time afforded by cond_break may not be enough for cases where BPF programs want to implement locking algorithms inline, and use cond_break as a promise to the verifier that they will eventually terminate.

Below are some benchmarking numbers on the time taken per iteration for an empty loop that counts the number of iterations until cond_break fires. For comparison, we compare it against bpf_for/bpf_repeat, which is another way to achieve the same number of spins (BPF_MAX_LOOPS). The hardware used for benchmarking was a Sapphire Rapids Intel server with the performance governor and mitigations enabled.

+------------------------------+------------+-----------+----------------+
| Loop type                    | Iterations | Time (ms) | Time/iter (ns) |
+------------------------------+------------+-----------+----------------+
| may_goto                     |    8388608 |         3 |           0.36 |
| timed_may_goto (count=65535) |  589674932 |       250 |           0.42 |
| bpf_for                      |    8388608 |        10 |           1.19 |
+------------------------------+------------+-----------+----------------+

This gives a good approximation at low overhead while staying close to the current implementation.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250304003239.2390751-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
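
The count/timestamp scheme described in this commit can be modeled in plain userspace C. The following is only a sketch of the idea, not the verifier or JIT code: the struct layout, the function names, the injectable clock and the assumption that a frame starts with a full count and a zero timestamp are all illustrative. The constants mirror the figures quoted in the message (count=65535, 250 ms).

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TIMED_LOOPS 0xffffULL     /* count=65535, as in the benchmark table */
#define TIME_BUDGET_NS  250000000ULL  /* the 250 ms limit quoted in the message */

/* Hypothetical model of the 16 bytes kept per stack frame. */
struct timed_may_goto {
	uint64_t count;     /* iterations left before the next time sample */
	uint64_t timestamp; /* first sample for this frame; 0 = not sampled yet */
};

typedef uint64_t (*clock_fn)(void);

/* Slow path, reached only when count drops to 0:
 * refresh the count, or return 0 so that the next iteration breaks. */
static uint64_t check_timed_may_goto(struct timed_may_goto *p, clock_fn now_ns)
{
	uint64_t now = now_ns();

	if (!p->timestamp) {            /* first sample: start the clock */
		p->timestamp = now;
		return MAX_TIMED_LOOPS;
	}
	if (now - p->timestamp >= TIME_BUDGET_NS)
		return 0;               /* time budget exhausted */
	return MAX_TIMED_LOOPS;
}

/* Fast path, run once per iteration; false means "break out of the loop". */
static bool may_goto(struct timed_may_goto *p, clock_fn now_ns)
{
	uint64_t count = p->count;

	if (count == 0)
		return false;
	if (--count == 0)
		count = check_timed_may_goto(p, now_ns);
	p->count = count;
	return true;
}

/* Fake clock for demonstration: advances 1 ms on every sample. */
static uint64_t fake_clock_ns(void)
{
	static uint64_t t;

	t += 1000000ULL;
	return t;
}

/* Spin on an empty loop and count iterations until may_goto() says stop. */
static uint64_t count_iterations(clock_fn now_ns)
{
	struct timed_may_goto frame = { .count = MAX_TIMED_LOOPS, .timestamp = 0 };
	uint64_t iters = 0;

	while (may_goto(&frame, now_ns))
		iters++;
	return iters;
}
```

With the fake clock, the slow path is entered once every 65535 iterations and the 251st sample is the first to see 250 ms elapsed, so count_iterations(fake_clock_ns) returns 251 * 65535. The clock is sampled only 251 times for roughly 16 million iterations, which is the amortization the message refers to.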
2025-03-15 | libbpf: Split bpf object load into prepare/load | Mykyta Yatsenko

Introduce the bpf_object__prepare API: an additional intermediate preparation step that performs ELF processing and relocations, prepares the final state of BPF program instructions (accessible with bpf_program__insns()), creates and (potentially) pins maps, and stops short of loading BPF programs.

We anticipate a few use cases for this API, such as:
* Use prepare to initialize bpf_token without loading freplace programs, unlocking the possibility to look up BTF of other programs.
* Execute prepare to obtain finalized BPF program instructions without loading programs, enabling tools like veristat to process one program at a time without incurring the cost of ELF parsing and processing.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250303135752.158343-4-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | libbpf: Introduce more granular state for bpf_object | Mykyta Yatsenko

We are going to split bpf_object loading into 2 stages: preparation and loading. This will increase flexibility when working with bpf_object and unlock some optimizations and use cases.

This patch replaces the boolean flag (loaded) with a more fine-grained state for bpf_object.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250303135752.158343-3-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
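
To illustrate why an ordered state is more useful than a single loaded flag, here is a minimal sketch of a two-stage object. The enum, struct and function names are hypothetical and are not claimed to match libbpf's internals; the rule that loading implicitly prepares an object that has not been prepared yet is likewise an assumption made for the example.

```c
#include <errno.h>

/* Hypothetical states, ordered so that comparisons express "at least". */
enum obj_state { OBJ_OPEN, OBJ_PREPARED, OBJ_LOADED };

struct obj {
	enum obj_state state;
};

/* Stage 1: the work that must happen before programs can be loaded. */
static int obj_prepare(struct obj *o)
{
	if (o->state >= OBJ_PREPARED)
		return -EINVAL;          /* preparing twice is an error */
	/* ... ELF processing, relocations, map creation would go here ... */
	o->state = OBJ_PREPARED;
	return 0;
}

/* Stage 2: load programs, preparing first if the caller skipped stage 1. */
static int obj_load(struct obj *o)
{
	int err;

	if (o->state >= OBJ_LOADED)
		return -EINVAL;          /* loading twice is an error */
	if (o->state < OBJ_PREPARED) {
		err = obj_prepare(o);
		if (err)
			return err;
	}
	/* ... program loading would go here ... */
	o->state = OBJ_LOADED;
	return 0;
}

/* A check that a plain boolean cannot express: "has stage 1 run yet?" */
static int obj_maps_are_created(const struct obj *o)
{
	return o->state >= OBJ_PREPARED;
}
```

A caller can stop after obj_prepare(), inspect the object, and call obj_load() later; a caller that only wants the old behavior calls obj_load() directly.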
2025-03-15 | net: filter: Avoid shadowing variable in bpf_convert_ctx_access() | Breno Leitao

Rename the local variable 'off' to 'offset' to avoid shadowing the existing 'off' variable that is declared as an `int` in the outer scope of bpf_convert_ctx_access().

This fixes a compiler warning:

  net/core/filter.c:9679:8: warning: declaration shadows a local variable [-Wshadow]

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://patch.msgid.link/20250228-fix_filter-v1-1-ce13eae66fe9@debian.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | libbpf: Use map_is_created helper in map setters | Mykyta Yatsenko

Refactoring: use the map_is_created helper in map setters that need to check the state of the map. This reduces the number of places that depend explicitly on the loaded flag, simplifying refactoring in the next patch of this set.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250303135752.158343-2-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | Merge branch 'selftests-bpf-migrate-test_tunnel-sh-to-test_progs' | Martin KaFai Lau

Bastien Curutchet says:

====================
selftests/bpf: Migrate test_tunnel.sh to test_progs

Hi all,

This patch series continues the work to migrate the *.sh tests into prog_tests framework.

The test_tunnel.sh script has already been partly migrated to test_progs in prog_tests/test_tunnel.c so I add my work to it.

PATCH 1 & 2 create some helpers to avoid code duplication and ease the migration in the following patches.
PATCH 3 to 9 migrate the tests of gre, ip6gre, erspan, ip6erspan, geneve, ip6geneve and ip6tnl tunnels.
PATCH 10 removes test_tunnel.sh
====================

Link: https://patch.msgid.link/20250303-tunnels-v2-0-8329f38f0678@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Remove test_tunnel.sh | Bastien Curutchet (eBPF Foundation)

All tests from test_tunnel.sh have been migrated into test_progs. The last test remaining in the script is test_ipip(), which is already covered in the test_progs framework by the NONE case of test_ipip_tunnel().

Remove the test_tunnel.sh script and its Makefile entry.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-10-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Move ip6tnl tunnel tests to test_progs | Bastien Curutchet (eBPF Foundation)

ip6tnl tunnels are tested in test_tunnel.sh but not in the test_progs framework.

Add a new test in test_progs to test ip6tnl tunnels. It uses the same network topology and the same BPF programs as the script.

Remove test_ipip6() and test_ip6ip6() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-9-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Move ip6geneve tunnel test to test_progs | Bastien Curutchet (eBPF Foundation)

ip6geneve tunnels are tested in test_tunnel.sh but not in the test_progs framework.

Add a new test in test_progs to test ip6geneve tunnels. It uses the same network topology and the same BPF programs as the script.

Remove test_ip6geneve() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-8-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Move geneve tunnel test to test_progs | Bastien Curutchet (eBPF Foundation)

geneve tunnels are tested in test_tunnel.sh but not in the test_progs framework.

Add a new test in test_progs to test geneve tunnels. It uses the same network topology and the same BPF programs as the script.

Remove test_geneve() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-7-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Move ip6erspan tunnel test to test_progs | Bastien Curutchet (eBPF Foundation)

ip6erspan tunnels are tested in test_tunnel.sh but not in the test_progs framework.

Add a new test in test_progs to test ip6erspan tunnels. It uses the same network topology and the same BPF programs as the script.

Remove test_ip6erspan() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-6-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Move erspan tunnel tests to test_progs | Bastien Curutchet (eBPF Foundation)

erspan tunnels are tested in test_tunnel.sh but not in the test_progs framework.

Add a new test in test_progs to test erspan tunnels. It uses the same network topology and the same BPF programs as the script.

Remove test_erspan() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-5-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Move ip6gre tunnel test to test_progs | Bastien Curutchet (eBPF Foundation)

ip6gre tunnels are tested in test_tunnel.sh but not in the test_progs framework.

Add a new test in test_progs to test ip6gre tunnels. It uses the same network topology and the same BPF programs as the script. Disable the IPv6 DAD feature because it can take a lot of time and cause some tests to fail depending on the environment they're run on.

Remove test_ip6gre() and test_ip6gretap() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-4-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Move gre tunnel test to test_progs | Bastien Curutchet (eBPF Foundation)

gre tunnels are tested in test_tunnel.sh but not in the test_progs framework.

Add a new test in test_progs to test gre tunnels. It uses the same network topology and the same BPF programs as the script.

Remove test_gre() and test_gre_no_tunnel_key() from the script.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-3-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | Merge branch 'veristat-files-list-txt-notation-for-object-files-list' | Andrii Nakryiko

Eduard Zingerman says:

====================
veristat: @files-list.txt notation for object files list

A few small veristat improvements:
- It is possible to hit the limit on the number of command line parameters, e.g. when running veristat for all object files generated for test_progs. This patch-set adds an option to read the object files list from a file.
- Correct usage of strerror() function.
- Avoid printing log lines to CSV output.

Changelog:
- v1 -> v2:
  - replace strerror(errno) with strerror(-err) in patch #2 (Andrii)

v1: https://lore.kernel.org/bpf/3ee39a16-bc54-4820-984a-0add2b5b5f86@gmail.com/T/
====================

Link: https://patch.msgid.link/20250301000147.1583999-1-eddyz87@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Add ping helpers | Bastien Curutchet (eBPF Foundation)

All tests use more or less the same ping commands as final validation. Also test_ping()'s return value is checked with ASSERT_OK() while this check is already done by the SYS() macro inside test_ping().

Create helpers around test_ping() and use them in the tests to avoid code duplication. Remove the unnecessary ASSERT_OK() from the tests.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-2-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | bpf: Factor out check_load_mem() and check_store_reg() | Peilin Ye

Extract BPF_LDX and most non-ATOMIC BPF_STX instruction handling logic in do_check() into helper functions to be used later. While we are here, make that comment about "reserved fields" more specific.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/8b39c94eac2bb7389ff12392ca666f939124ec4f.1740978603.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | veristat: Report program type guess results to stderr | Eduard Zingerman

In order not to pollute CSV output, e.g.:

  $ ./veristat -o csv exceptions_ext.bpf.o > test.csv
  Using guessed program type 'sched_cls' for exceptions_ext.bpf.o/extension...
  Using guessed program type 'sched_cls' for exceptions_ext.bpf.o/throwing_extension...

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Link: https://lore.kernel.org/bpf/20250301000147.1583999-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: test_tunnel: Add generic_attach* helpers | Bastien Curutchet (eBPF Foundation)

A fair amount of code duplication is present among tests to attach BPF programs.

Create generic_attach* helpers that attach BPF programs to a given interface. Use ASSERT_OK_FD() instead of ASSERT_GE() to check fd's validity. Use these helpers in all the available tests.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-1-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | bpf: Factor out check_atomic_rmw() | Peilin Ye

Currently, check_atomic() only handles atomic read-modify-write (RMW) instructions. Since we are planning to introduce other types of atomic instructions (i.e., atomic load/store), extract the existing RMW handling logic into its own function named check_atomic_rmw().

Remove the @insn_idx parameter as it is not really necessary. Use 'env->insn_idx' instead, as in other places in verifier.c.

Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/6323ac8e73a10a1c8ee547c77ed68cf8eb6b90e1.1740978603.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | veristat: Strerror expects positive number (errno) | Eduard Zingerman

Before:

  ./veristat -G @foobar iters.bpf.o
  Failed to open presets in 'foobar': Unknown error -2
  ...

After:

  ./veristat -G @foobar iters.bpf.o
  Failed to open presets in 'foobar': No such file or directory
  ...

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Link: https://lore.kernel.org/bpf/20250301000147.1583999-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
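
The "Unknown error -2" output above is what strerror() produces when it is handed a negative error code. A small sketch of the pattern follows; the helper names are hypothetical and this is not veristat's code, only the convention of returning negative errno values and negating them before calling strerror().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that reports failure as a negative errno value. */
static int open_presets(const char *path)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -errno;   /* e.g. -ENOENT, which is -2 on Linux */
	fclose(f);
	return 0;
}

/* strerror() expects a positive errno, so negate the negative code first. */
static const char *err_text(int err)
{
	return strerror(-err);
}
```

Passing the negative value straight through, as in strerror(err), is the mistake this commit fixes.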
2025-03-15 | bpf: Factor out atomic_ptr_type_ok() | Peilin Ye

Factor out atomic_ptr_type_ok() as a helper function to be used later.

Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/e5ef8b3116f3fffce78117a14060ddce05eba52a.1740978603.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | veristat: @files-list.txt notation for object files list | Eduard Zingerman

Allow reading the object file list from a file. E.g. the following command:

  ./veristat @list.txt

is equivalent to the following invocation:

  ./veristat line-1 line-2 ... line-N

where line-i corresponds to lines from list.txt. Lines starting with '#' are ignored.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Link: https://lore.kernel.org/bpf/20250301000147.1583999-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
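
A reader for such a list file could look roughly like the sketch below. It is not veristat's implementation: the function name, the fixed line buffer and the decision to skip empty lines are assumptions made for the example. Only the rule that lines starting with '#' are ignored comes from the commit message.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read up to max object file names from f, one per line.
 * Returns the number of names stored in out, or -1 on allocation failure.
 * The caller owns (and must free) every returned string. */
static int read_file_list(FILE *f, char **out, int max)
{
	char line[4096];
	int n = 0;

	while (n < max && fgets(line, sizeof(line), f)) {
		size_t len;

		line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
		if (line[0] == '#')                   /* comment line, per the commit */
			continue;
		if (line[0] == '\0')                  /* empty line: skipped by assumption */
			continue;

		len = strlen(line) + 1;
		out[n] = malloc(len);
		if (!out[n])
			return -1;
		memcpy(out[n], line, len);
		n++;
	}
	return n;
}
```

The resulting array can then be treated exactly like the positional arguments that would otherwise have been passed on the command line.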
2025-03-15 | bpf: no longer acquire map_idr_lock in bpf_map_inc_not_zero() | Eric Dumazet

bpf_sk_storage_clone() is the only caller of bpf_map_inc_not_zero() and is holding rcu_read_lock().

map_idr_lock does not add any protection, so stop taking it and remove the cost for passive TCP flows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kui-Feng Lee <kuifeng@meta.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20250301191315.1532629-1-edumazet@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
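
For readers unfamiliar with the "increment unless zero" idiom behind bpf_map_inc_not_zero(), here is a userspace sketch using C11 atomics. It is an illustration of the idiom, not the kernel's code: the function name and the use of <stdatomic.h> are assumptions. The point is that the read-check-increment is a single atomic operation on the counter, so no extra lock is needed as long as something else (in the commit's case, the RCU read-side section) keeps the object's memory valid.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Take a reference only if the object is still alive (refcount != 0).
 * Returns true when the reference was taken. */
static bool refcnt_inc_not_zero(_Atomic int64_t *refcnt)
{
	int64_t old = atomic_load(refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;
	}
	return false;   /* the count already hit zero: the object is being freed */
}
```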
2025-03-15 | Merge branch 'global-subprogs-in-rcu-preempt-irq-disabled-sections' | Alexei Starovoitov

Kumar Kartikeya Dwivedi says:

====================
Global subprogs in RCU/{preempt,irq}-disabled sections

Small change to allow non-sleepable global subprogs in RCU, preempt-disabled, and irq-disabled sections. For now, we don't lift the limitation for locks as it requires more analysis, and will do this once resilient spin locks land.

This surfaced a bug where sleepable global subprogs were allowed in RCU read sections, which has been fixed. Tests have been added to cover various cases.

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20250301030205.1221223-1-memxor@gmail.com

 * Fix broken to_be_replaced argument in the selftest.
 * Adjust selftest program type.

v1 -> v2
v1: https://lore.kernel.org/bpf/20250228162858.1073529-1-memxor@gmail.com

 * Rename subprog_info[i].sleepable to might_sleep, which more accurately reflects the nature of the bit. 'sleepable' means whether a given context is allowed to, while might_sleep captures if it does.
 * Disallow extensions that might sleep to attach to targets that don't sleep, since they'd be permitted to be called in atomic contexts. (Eduard)
 * Add tests for mixing non-sleepable and sleepable global function calls, and extensions attaching to non-sleepable global functions. (Eduard)
 * Rename changes_pkt_data -> summarization
====================

Link: https://patch.msgid.link/20250301151846.1552362-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | bpf/selftests: test_select_reuseport_kern: Remove unused header | Alexis Lothoré (eBPF Foundation)

test_select_reuseport_kern.c is currently including <stdlib.h>, but it does not use any definition from there.

Remove stdlib.h inclusion from test_select_reuseport_kern.c

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250227-remove_wrong_header-v1-1-bc94eb4e2f73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: Add tests for extending sleepable global subprogs | Kumar Kartikeya Dwivedi

Add tests for freplace behavior with the combination of sleepable and non-sleepable global subprogs. The changes_pkt_data selftest did all the hard work, so simply rename it and include new support for more summarization tests for the might_sleep bit.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250301151846.1552362-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: Add selftests allowing cgroup prog pre-ordering | Yonghong Song

Add a few selftests with cgroup prog pre-ordering.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250224230121.283601-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: Test sleepable global subprogs in atomic contexts | Kumar Kartikeya Dwivedi

Add tests for rejecting sleepable and accepting non-sleepable global function calls in atomic contexts. For spin locks, we still reject all global function calls. Once resilient spin locks land, we will carefully lift this restriction in cases where we deem it safe.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250301151846.1552362-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | bpf: Allow pre-ordering for bpf cgroup progs | Yonghong Song

Currently for bpf progs in a cgroup hierarchy, the effective prog array is computed from bottom cgroup to upper cgroups (post-ordering). For example, the following cgroup hierarchy

    root cgroup: p1, p2
        subcgroup: p3, p4

has BPF_F_ALLOW_MULTI for both cgroup levels. The effective cgroup array ordering looks like

    p3 p4 p1 p2

and at run time, progs will execute based on that order.

But in some cases, it is desirable to have root progs execute earlier than children progs (pre-ordering). For example,
  - prog p1 intends to collect original pkt dest addresses.
  - prog p3 will modify original pkt dest addresses to a proxy address for security reasons.
The end result is that prog p1 gets the proxy address, which is not what it wants. Putting p1 in every child cgroup is not desirable either as it will duplicate itself in many child cgroups. And this is exactly a use case we are encountering at Meta.

To fix this issue, let us introduce a flag BPF_F_PREORDER. If the flag is specified at attachment time, the prog has higher priority and the ordering with that flag will be from top to bottom (pre-ordering). For example, in the above example,

    root cgroup: p1, p2
        subcgroup: p3, p4

let us say p2 and p4 are marked with BPF_F_PREORDER. The final effective array ordering will be

    p2 p4 p3 p1

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250224230116.283071-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
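
The ordering rule in this commit can be reproduced with a short two-pass sketch: first collect the BPF_F_PREORDER progs from the root down to the leaf, then the remaining progs from the leaf up to the root. This is a model written from the description above, not the kernel's implementation; the struct names are hypothetical, BPF_F_ALLOW_MULTI is assumed at every level, and keeping attach order within one cgroup is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one attached prog. */
struct prog {
	const char *name;
	bool preorder;   /* attached with BPF_F_PREORDER */
};

/* Progs attached at one cgroup level, in attach order. */
struct cgroup_level {
	const struct prog *progs;
	size_t nr;
};

/* levels[0] is the root, levels[nr_levels - 1] is the cgroup being computed.
 * out must have room for every prog; returns the number of entries written. */
static size_t effective_order(const struct cgroup_level *levels,
			      size_t nr_levels, const char **out)
{
	size_t n = 0;

	/* Pass 1: pre-ordered progs, top (root) to bottom (leaf). */
	for (size_t l = 0; l < nr_levels; l++)
		for (size_t i = 0; i < levels[l].nr; i++)
			if (levels[l].progs[i].preorder)
				out[n++] = levels[l].progs[i].name;

	/* Pass 2: everything else, bottom (leaf) to top (root). */
	for (size_t l = nr_levels; l-- > 0; )
		for (size_t i = 0; i < levels[l].nr; i++)
			if (!levels[l].progs[i].preorder)
				out[n++] = levels[l].progs[i].name;

	return n;
}
```

For the example in the message (root: p1, p2; subcgroup: p3, p4; p2 and p4 pre-ordered) this yields p2 p4 p3 p1, and with no flags set it yields the existing post-order p3 p4 p1 p2.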
2025-03-15 | bpf: Summarize sleepable global subprogs | Kumar Kartikeya Dwivedi

The verifier currently does not permit global subprog calls when a lock is held, preemption is disabled, or when IRQs are disabled. This is because we don't know whether the global subprog calls sleepable functions or not.

In case of locks, there's an additional reason: functions called by the global subprog may hold additional locks etc. The verifier won't know while verifying the global subprog whether it was called in a context where a spin lock is already held by the program.

Perform summarization of the sleepable nature of a global subprog just like changes_pkt_data and then allow calls to global subprogs for non-sleepable ones from atomic context.

While making this change, I noticed that RCU read sections had no protection against sleepable global subprog calls, so include them in the checks and fix this while we're at it.

Care needs to be taken to not allow global subprog calls when a regular bpf_spin_lock is held. When resilient spin locks are held, we want to potentially have this check relaxed, but not for now.

Also make sure extensions freplacing global functions cannot do so in case the target is non-sleepable, but the extension is. The other combination is ok.

Tests are included in the next patch to handle all special conditions.

Fixes: 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250301151846.1552362-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
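
The rules in this commit can be written down as a small model: a subprog "might sleep" if it sleeps directly or calls something that might, calls into a might-sleep subprog are rejected in atomic context, and an extension that might sleep cannot replace a target that does not. The sketch below is only an illustration; the data structures, the fixed-point propagation and the function names are hypothetical and are not the verifier's actual algorithm.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical per-subprog summary. */
struct subprog {
	bool might_sleep;             /* initially: sleeps directly */
	size_t callees[MAX_CALLEES];  /* indices of subprogs it calls */
	size_t nr_callees;
};

/* Propagate might_sleep from callees to callers until nothing changes. */
static void summarize_might_sleep(struct subprog *sp, size_t n)
{
	bool changed = true;

	while (changed) {
		changed = false;
		for (size_t i = 0; i < n; i++) {
			if (sp[i].might_sleep)
				continue;
			for (size_t c = 0; c < sp[i].nr_callees; c++) {
				if (sp[sp[i].callees[c]].might_sleep) {
					sp[i].might_sleep = true;
					changed = true;
					break;
				}
			}
		}
	}
}

/* Atomic context here means RCU read section, preemption or IRQs disabled. */
static bool global_call_allowed(const struct subprog *callee,
				bool in_atomic_ctx, bool holds_spin_lock)
{
	if (holds_spin_lock)
		return false;   /* still rejected regardless of the summary */
	return !(in_atomic_ctx && callee->might_sleep);
}

/* freplace: only "sleeping extension onto non-sleeping target" is refused. */
static bool freplace_allowed(bool target_might_sleep, bool ext_might_sleep)
{
	return target_might_sleep || !ext_might_sleep;
}
```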
2025-03-15 | Merge branch 'optimize-bpf-selftest-to-increase-ci-success-rate' | Alexei Starovoitov

Jiayuan Chen says:

====================
Optimize bpf selftest to increase CI success rate

1. Optimized some static bound port selftests to avoid port occupation when running test_progs -j.
2. Optimized the retry logic for test_maps.

Some Failed CI:
https://github.com/kernel-patches/bpf/actions/runs/13275542359/job/37064974076
https://github.com/kernel-patches/bpf/actions/runs/13549227497/job/37868926343
https://github.com/kernel-patches/bpf/actions/runs/13548089029/job/37865812030
https://github.com/kernel-patches/bpf/actions/runs/13553536268/job/37883329296
(Perhaps it's due to the large number of pull requests requiring CI runs?)
====================

Link: https://patch.msgid.link/20250227142646.59711-1-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: Fixes for test_maps test | Jiayuan Chen

BPF CI has failed 3 times in the last 24 hours. Add retry for ENOMEM. It's similar to the optimization plan:
commit 2f553b032cad ("selftsets/bpf: Retry map update for non-preallocated per-cpu map")

Failed CI:
https://github.com/kernel-patches/bpf/actions/runs/13549227497/job/37868926343
https://github.com/kernel-patches/bpf/actions/runs/13548089029/job/37865812030
https://github.com/kernel-patches/bpf/actions/runs/13553536268/job/37883329296

selftests/bpf: Fixes for test_maps test
Fork 100 tasks to 'test_update_delete'
Fork 100 tasks to 'test_update_delete'
Fork 100 tasks to 'test_update_delete'
Fork 100 tasks to 'test_update_delete'
......
test_task_storage_map_stress_lookup:PASS
test_maps: OK, 0 SKIPPED

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250227142646.59711-4-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | Merge branch 'introduce-bpf_dynptr_copy-kfunc' | Andrii Nakryiko

Mykyta Yatsenko says:

====================
introduce bpf_dynptr_copy kfunc

From: Mykyta Yatsenko <yatsenko@meta.com>

Introduce a new kfunc, bpf_dynptr_copy, which enables copying of data from one dynptr to another. This functionality may be useful in scenarios such as capturing XDP data to a ring buffer.

The patch set is split into 3 patches:
1. Refactor bpf_dynptr_read and bpf_dynptr_write by extracting code into static functions, which allows calling them with no compiler warnings
2. Introduce bpf_dynptr_copy
3. Add tests for bpf_dynptr_copy

v2->v3:
* Implemented bpf_memcmp in dynptr_success.c test, as __builtin_memcmp was not inlined on GCC-BPF.
====================

Link: https://patch.msgid.link/20250226183201.332713-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: Allow auto port binding for bpf nf | Jiayuan Chen

Allow auto port binding for bpf nf test to avoid binding conflict.

./test_progs -a bpf_nf
24/1    bpf_nf/xdp-ct:OK
24/2    bpf_nf/tc-bpf-ct:OK
24/3    bpf_nf/alloc_release:OK
24/4    bpf_nf/insert_insert:OK
24/5    bpf_nf/lookup_insert:OK
24/6    bpf_nf/set_timeout_after_insert:OK
24/7    bpf_nf/set_status_after_insert:OK
24/8    bpf_nf/change_timeout_after_alloc:OK
24/9    bpf_nf/change_status_after_alloc:OK
24/10   bpf_nf/write_not_allowlisted_field:OK
24      bpf_nf:OK
Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250227142646.59711-3-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: Allow auto port binding for cgroup connect | Jiayuan Chen

Allow auto port binding for cgroup connect test to avoid binding conflict.

Result:
./test_progs -a cgroup_v1v2
59      cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250227142646.59711-2-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | selftests/bpf: Add tests for bpf_dynptr_copy | Mykyta Yatsenko

Add an XDP setup type for dynptr tests, enabling testing of non-contiguous buffers.

Add 2 tests:
 - test_dynptr_copy - verifies correctness of the fast (contiguous buffer) code path.
 - test_dynptr_copy_xdp - verifies the code paths that handle non-contiguous buffers.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250226183201.332713-4-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | bpf/helpers: Introduce bpf_dynptr_copy kfunc | Mykyta Yatsenko

Introduce the bpf_dynptr_copy kfunc, which allows copying data from one dynptr to another. This functionality is useful in scenarios such as capturing XDP data to a ring buffer.

The implementation consists of 4 branches:
 * A fast branch for contiguous buffer capacity in both source and destination dynptrs
 * 3 branches utilizing __bpf_dynptr_read and __bpf_dynptr_write to copy data to/from non-contiguous buffers

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250226183201.332713-3-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | bpf/helpers: Refactor bpf_dynptr_read and bpf_dynptr_write | Mykyta Yatsenko

Refactor the bpf_dynptr_read and bpf_dynptr_write helpers: extract their code into the static functions __bpf_dynptr_read and __bpf_dynptr_write, which can be called without compiler warnings.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250226183201.332713-2-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15 | Merge branch 'selftests-bpf-implement-setting-global-variables-in-veristat' | Andrii Nakryiko

Mykyta Yatsenko says:

====================
selftests/bpf: implement setting global variables in veristat

From: Mykyta Yatsenko <yatsenko@meta.com>

To better verify some complex BPF programs by veristat, it would be useful to preset global variables. This patch set implements this functionality and introduces tests for veristat.

v4->v5
 * Rework parsing to use sscanf for integers
 * Addressing nits

v3->v4:
 * Fixing bug in set_global_var introduced by refactoring in previous patch set
 * Addressed nits from Eduard

v2->v3:
 * Reworked parsing of the presets, using sscanf to split into variable and value, but still use strtoll/strtoull to support range checks when parsing integers
 * Fix test failures for no_alu32 & cpuv4 by checking if veristat binary is in parent folder
 * Introduce __CHECK_STR macro for simplifying checks in test
 * Modify tests into sub-tests
====================

Link: https://patch.msgid.link/20250225163101.121043-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-15Merge tag 'fsnotify_for_v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds
Pull fsnotify reverts from Jan Kara:
 "Syzbot has found out that fsnotify HSM events generated on page fault can be generated while we already hold freeze protection for the filesystem (when you do buffered write from a buffer which is mmapped file on the same filesystem) which violates expectations for HSM events and could lead to deadlocks of HSM clients with filesystem freezing. Since it's quite late in the cycle we've decided to revert changes implementing HSM events on page fault for now and instead just generate one event for the whole range on mmap(2) so that HSM client can fetch the data at that moment"

* tag 'fsnotify_for_v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  Revert "fanotify: disable readahead if we have pre-content watches"
  Revert "mm: don't allow huge faults for files with pre content watches"
  Revert "fsnotify: generate pre-content permission event on page fault"
  Revert "xfs: add pre-content fsnotify hook for DAX faults"
  Revert "ext4: add pre-content fsnotify hook for DAX faults"
  fsnotify: add pre-content hooks on mmap()
2025-03-15mm: Fix the flipped condition in gfpflags_allow_spinning()Vlastimil Babka
The function gfpflags_allow_spinning() has a bug that makes it return the opposite of the intended result. This could contribute to deadlocks as usage proliferates; for now it was noticed as a performance regression due to try_charge_memcg() not refilling memcg stock when it could. Fix the flipped condition. Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation") Reported-by: kernel test robot <oliver.sang@intel.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250310124017.187-1-alexei.starovoitov@gmail.com Closes: https://lore.kernel.org/oe-lkp/202503101254.cfd454df-lkp@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
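A small userspace model of a flipped flag test is sketched below. It assumes the intended rule is "spinning is allowed when at least one reclaim flag is set"; the commit message itself only says the condition was flipped. The type, the bit values and the function names are hypothetical and do not match the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_mask_t;	/* hypothetical stand-in for gfp_t */

/* Illustrative bit values only; the kernel's real values differ. */
#define GFP_DIRECT_RECLAIM_BIT	0x1u
#define GFP_KSWAPD_RECLAIM_BIT	0x2u
#define GFP_RECLAIM_MASK	(GFP_DIRECT_RECLAIM_BIT | GFP_KSWAPD_RECLAIM_BIT)

/* Flipped variant: reports "may spin" exactly when no reclaim flag is set. */
static bool allow_spinning_flipped(gfp_mask_t flags)
{
	return !(flags & GFP_RECLAIM_MASK);
}

/* Assumed intent: "may spin" when at least one reclaim flag is set. */
static bool allow_spinning_fixed(gfp_mask_t flags)
{
	return !!(flags & GFP_RECLAIM_MASK);
}

/* Usage: an ordinary caller passes reclaim flags, an opportunistic one passes none. */
static void check_examples(void)
{
	assert(allow_spinning_fixed(GFP_RECLAIM_MASK));
	assert(!allow_spinning_fixed(0));
}
```

Because the two variants differ by a single negation, they disagree on every input, which is why a bug of this shape can show up as a behaviour change on common paths rather than only in corner cases.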
2025-03-15PCI: imx6: Use devm_clk_bulk_get_all() to fetch clocksRichard Zhu
Use the devm_clk_bulk_get_all() helper to simplify the clock handling code. No functional changes intended. Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> [kwilczynski: commit log, refactor to use dev_err_probe()] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20250226025628.1681206-1-hongxing.zhu@nxp.com
2025-03-15PCI: imx6: Identify controller via 'linux,pci-domain', not addressRichard Zhu
Instead of testing the controller register address to distinguish controller 1 from controller 0 on i.MX8MQ platforms, use the PCI domain number, which comes from the devicetree 'linux,pci-domain' property. All relevant devicetrees should already supply 'linux,pci-domain', which was added by c0b70f05c87f ("arm64: dts: imx8mq: use_dt_domains for pci node"). Instead of being set directly in imx_pcie_probe(), pci->dbi_base will be set by the DWC core in dw_pcie_get_resources(). No functional changes intended. Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250226024256.1678103-3-hongxing.zhu@nxp.com
2025-03-15remoteproc: qcom: pas: add minidump_id to SC7280 WPSSLuca Weiss
Add the minidump ID to the wpss resources, based on msm-5.4 devicetree. Fixes: 300ed425dfa9 ("remoteproc: qcom_q6v5_pas: Add SC7280 ADSP, CDSP & WPSS") Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20250314-sc7280-wpss-minidump-v1-1-d869d53fd432@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-03-15arm64: dts: rockchip: remove ethm0_clk0_25m_out from Sige5 gmac0Nicolas Frattaroli
The GPIO3 A4 pin on the ArmSoM Sige5 is routed to the 40-pin GPIO header. This pin can serve a variety of functions, including ones of questionable use to us on a GPIO header such as the 25MHz clock of the ethernet controller. Unfortunately, this is the precise function that it is being claimed for by the gmac0 node in the Sige5 board dts, meaning it can't be used for anything else despite serving no useful function in this role. Since it goes through a RS0108 bidirectional voltage level translator with a maximum data rate of 24Mbit/s in push-pull mode and 2Mbit/s data rate in open-drain mode, it's doubtful as to whether the 25MHz clock signal would even survive to the actual user-accessible pin it terminates in. Remove it to leave the pin for users to play with. It's infinitely more useful as a GPIO or even as a PWM. Fixes: 40f742b07ab2 ("arm64: dts: rockchip: Add rk3576-armsom-sige5 board") Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/20250314-rk3576-sige5-eth-clk-begone-v1-1-2858338fc555@collabora.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-03-15ALSA: hda/realtek: Always honor no_shutup_pinsTakashi Iwai
The workaround for Dell machines to skip the pin-shutup for mic pins introduced alc_headset_mic_no_shutup(), which replaces the generic snd_hda_shutup_pins() for certain codecs. The problem is that the call is done unconditionally even if spec->no_shutup_pins is set. This seems to cause problems on other platforms like Lenovo. This patch corrects the behavior: the driver always honors the spec->no_shutup_pins flag and skips alc_headset_mic_no_shutup() if it's set. Fixes: dad3197da7a3 ("ALSA: hda/realtek - Fixup headphone noise via runtime suspend") Reported-and-tested-by: Oleg Gorobets <oleg.goro@gmail.com> Link: https://patch.msgid.link/20250315143020.27184-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
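The corrected control flow can be illustrated with a userspace model. This is a sketch under assumptions, not the driver code: `struct codec_model`, its fields and `shutup_pins` are hypothetical, and the two counters merely stand in for calls to alc_headset_mic_no_shutup() and snd_hda_shutup_pins().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the codec state relevant to pin shutup. */
struct codec_model {
	bool no_shutup_pins;		/* models spec->no_shutup_pins */
	bool headset_mic_workaround;	/* codec uses the Dell mic workaround */
	int headset_mic_shutups;	/* counts the workaround path */
	int generic_shutups;		/* counts the generic path */
};

static void shutup_pins(struct codec_model *c)
{
	assert(c != NULL);

	/* Check the flag first, so that it covers both paths below. */
	if (c->no_shutup_pins)
		return;

	if (c->headset_mic_workaround)
		c->headset_mic_shutups++;
	else
		c->generic_shutups++;
}
```

Placing the flag check ahead of the codec-specific dispatch means a later workaround added to the dispatch cannot bypass it.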
2025-03-15arm64: dts: marvell: Use preferred node names for "simple-bus"Rob Herring (Arm)
The "simple-bus" binding has preferred node names such as "bus", ".*-bus", or "soc". Rename the Marvell platforms to use these names. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2025-03-15arm64: dts: marvell: Drop unused CP11X_TYPE defineRob Herring (Arm)
The CP11X_TYPE define is not used anywhere, so remove it. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2025-03-15arm64: dts: marvell: Move arch timer and pmu nodes to top-levelRob Herring (Arm)
The Arm arch timer and PMU are not memory-mapped peripherals, and therefore should not be under a "simple-bus" node. Move them to the top-level like other platforms. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>