linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-04-02	selftests/bpf: Skip test when perf_event_open returns EOPNOTSUPP	Pu Lehui
	When testing send_signal and stacktrace_build_id_nmi using the riscv sbi pmu driver without the sscofpmf extension or the riscv legacy pmu driver, then failures as follows are encountered: test_send_signal_common:FAIL:perf_event_open unexpected perf_event_open: actual -1 < expected 0 #272/3 send_signal/send_signal_nmi:FAIL test_stacktrace_build_id_nmi:FAIL:perf_event_open err -1 errno 95 #304 stacktrace_build_id_nmi:FAIL The reason is that the above pmu driver or hardware does not support sampling events, that is, PERF_PMU_CAP_NO_INTERRUPT is set to pmu capabilities, and then perf_event_open returns EOPNOTSUPP. Since PERF_PMU_CAP_NO_INTERRUPT is not only set in the riscv-related pmu driver, it is better to skip testing when this capability is set. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240402073029.1299085-1-pulehui@huaweicloud.com
2024-04-02	selftests/bpf: Using llvm may_goto inline asm for cond_break macro	Yonghong Song
	Currently, cond_break macro uses bytes to encode the may_goto insn. Patch [1] in llvm implemented may_goto insn in BPF backend. Replace byte-level encoding with llvm inline asm for better usability. Using llvm may_goto insn is controlled by macro __BPF_FEATURE_MAY_GOTO. [1] https://github.com/llvm/llvm-project/commit/0e0bfacff71859d1f9212205f8f873d47029d3fb Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240402025446.3215182-1-yonghong.song@linux.dev
2024-04-01	selftests: mptcp: join: fix dev in check_endpoint	Geliang Tang
	There's a bug in pm_nl_check_endpoint(), 'dev' didn't be parsed correctly. If calling it in the 2nd test of endpoint_tests() too, it fails with an error like this: creation [FAIL] expected '10.0.2.2 id 2 subflow dev dev' \ found '10.0.2.2 id 2 subflow dev ns2eth2' The reason is '$2' should be set to 'dev', not '$1'. This patch fixes it. Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-2-324a8981da48@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01	mptcp: don't account accept() of non-MPC client as fallback to TCP	Davide Caratti
	Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they accept non-MPC connections. As reported by Christoph, this is "surprising" because the counter might become greater than MPTcpExtMPCapableSYNRX. MPTcpExtMPCapableFallbackACK counter's name suggests it should only be incremented when a connection was seen using MPTCP options, then a fallback to TCP has been done. Let's do that by incrementing it when the subflow context of an inbound MPC connection attempt is dropped. Also, update mptcp_connect.sh kselftest, to ensure that the above MIB does not increment in case a pure TCP client connects to a MPTCP server. Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01	selftests: reuseaddr_conflict: add missing new line at the end of the output	Jakub Kicinski
	The netdev CI runs in a VM and captures serial, so stdout and stderr get combined. Because there's a missing new line in stderr the test ends up corrupting KTAP: # Successok 1 selftests: net: reuseaddr_conflict which should have been: # Success ok 1 selftests: net: reuseaddr_conflict Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test") Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240329160559.249476-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	selftests/bpf: make multi-uprobe tests work in RELEASE=1 mode	Andrii Nakryiko
	When BPF selftests are built in RELEASE=1 mode with -O2 optimization level, uprobe_multi binary, called from multi-uprobe tests is optimized to the point that all the thousands of target uprobe_multi_func_XXX functions are eliminated, breaking tests. So ensure they are preserved by using weak attribute. But, actually, compiling uprobe_multi binary with -O2 takes a really long time, and is quite useless (it's not a benchmark). So in addition to ensuring that uprobe_multi_func_XXX functions are preserved, opt-out of -O2 explicitly in Makefile and stick to -O0. This saves a lot of compilation time. With -O2, just recompiling uprobe_multi: $ touch uprobe_multi.c $ time make RELEASE=1 -j90 make RELEASE=1 -j90 291.66s user 2.54s system 99% cpu 4:55.52 total With -O0: $ touch uprobe_multi.c $ time make RELEASE=1 -j90 make RELEASE=1 -j90 22.40s user 1.91s system 99% cpu 24.355 total 5 minutes vs (still slow, but...) 24 seconds. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240329190410.4191353-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-29	Merge tag 'linux_kselftest-fixes-6.9-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "Fixes to seccomp and ftrace tests and a change to add config file for dmabuf-heap test to increase coverage" * tag 'linux_kselftest-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: dmabuf-heap: add config file for the test selftests/seccomp: Try to fit runtime of benchmark into timeout selftests/ftrace: Fix event filter target_func selection
2024-03-29	Merge tag 'linux_kselftest-kunit-fixes-6.9-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fixes from Shuah Khan: "One urgent fix for --alltests build failure related to renaming of CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED to the missing config option" * tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
2024-03-29	selftest: tcp: Add bind() tests for SO_REUSEADDR/SO_REUSEPORT.	Kuniyuki Iwashima
	This patch adds two tests using SO_REUSEADDR and SO_REUSEPORT and defines errno for each test case. SO_REUSEADDR/SO_REUSEPORT is set for the per-fixture two bind() calls. The notable pattern is the pair of v6only [::] and plain [::]. The two sockets are put into the same tb2, where per-bucket v6only flag would be useless to detect bind() conflict. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	selftest: tcp: Add bind() tests for IPV6_V6ONLY.	Kuniyuki Iwashima
	bhash2 was not well tested for IPv6-only sockets. This patch adds test cases where we set IPV6_V6ONLY for per-fixture bind() calls if variant->ipv6_only[i] is true. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	selftest: tcp: Add more bind() calls.	Kuniyuki Iwashima
	In addtition to the two addresses defined in the fixtures, this patch add 6 more bind calls(): * 0.0.0.0 * 127.0.0.1 * :: * ::1 * ::ffff:0.0.0.0 * ::ffff:127.0.0.1 The first two per-fixture bind() calls control how inet_bind2_bucket is created, and the rest 6 bind() calls cover as many conflicting patterns as possible. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	selftest: tcp: Add v4-v4 and v6-v6 bind() conflict tests.	Kuniyuki Iwashima
	We don't have bind() conflict tests for the same protocol pairs. Let's add them except for the same address pair, which will be covered by the following patch adding 6 more bind() calls for each test case. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	selftest: tcp: Define the reverse order bind() tests explicitly.	Kuniyuki Iwashima
	Currently, bind_wildcard.c calls bind() twice for two addresses and checks the pre-defined errno against the 2nd call. Also, the two bind() calls are swapped to cover various patterns how bind buckets are created. However, only testing two addresses is insufficient to detect regression. So, we will add more bind() calls, and then, we need to define different errno for each bind() per test case. As a prepartion, let's define the reverse order bind() test cases as fixtures. No functional changes are intended. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	selftest: tcp: Make bind() selftest flexible.	Kuniyuki Iwashima
	Currently, bind_wildcard.c tests only (IPv4, IPv6) pairs, but we will add more tests for the same protocol pairs. This patch makes it possible by changing the address pointer to void. No functional changes are intended. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	selftests: dmabuf-heap: add config file for the test	Muhammad Usama Anjum
	The config fragment enlists all the config options needed for the test. This config is merged into the kernel's config on which this test is run. Fixed whitespace errors during commit: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: T.J. Mercier <tjmercier@google.com> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-29	selftests/seccomp: Try to fit runtime of benchmark into timeout	Mark Brown
	The seccomp benchmark runs five scenarios, one calibration run with no seccomp filters enabled then four further runs each adding a filter. The calibration run times itself for 15s and then each additional run executes for the same number of times. Currently the seccomp tests, including the benchmark, run with an extended 120s timeout but this is not sufficient to robustly run the tests on a lot of platforms. Sample timings from some recent runs: Platform Run 1 Run 2 Run 3 Run 4 --------- ----- ----- ----- ----- PowerEdge R200 16.6s 16.6s 31.6s 37.4s BBB (arm) 20.4s 20.4s 54.5s Synquacer (arm64) 20.7s 23.7s 40.3s The x86 runs from the PowerEdge are quite marginal and routinely fail, for the successful run reported here the timed portions of the run are at 117.2s leaving less than 3s of margin which is frequently breached. The added overhead of adding filters on the other platforms is such that there is no prospect of their runs fitting into the 120s timeout, especially on 32 bit arm where there is no BPF JIT. While we could lower the time we calibrate for I'm also already seeing the currently completing runs reporting issues with the per filter overheads not matching expectations: Let's instead raise the timeout to 180s which is only a 50% increase on the current timeout which is itself not too large given that there's only two tests in this suite. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-29	selftests/ftrace: Fix event filter target_func selection	Mark Rutland
	The event filter function test has been failing in our internal test farm: \| # not ok 33 event filter function - test event filtering on functions Running the test in verbose mode indicates that this is because the test erroneously determines that kmem_cache_free() is the most common caller of kmem_cache_free(): # # + cut -d: -f3 trace # # + sed s/call_site=([^+])+0x./1/ # # + sort # # + uniq -c # # + sort # # + tail -n 1 # # + sed s/^[ 0-9]// # # + target_func=kmem_cache_free ... and as kmem_cache_free() doesn't call itself, setting this as the filter function for kmem_cache_free() results in no hits, and consequently the test fails: # # + grep kmem_cache_free trace # # + grep kmem_cache_free # # + wc -l # # + hitcnt=0 # # + grep kmem_cache_free trace # # + grep -v kmem_cache_free # # + wc -l # # + misscnt=0 # # + [ 0 -eq 0 ] # # + exit_fail This seems to be because the system in question has tasks with ':' in their name (which a number of kernel worker threads have). These show up in the trace, e.g. test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache ... and so when we try to extact the call_site with: cut -d: -f3 trace \| sed 's/call_site=$[^+]$+0x.*/\1/' ... the 'cut' command will extrace the column containing 'kmem_cache_free' rather than the column containing 'call_site=...', and the 'sed' command will leave this unchanged. Consequently, the test will decide to use 'kmem_cache_free' as the filter function, resulting in the failure seen above. Fix this by matching the 'call_site=<func>' part specifically to extract the function name. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-kernel@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: linux-trace-kernel@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-29	selftest: af_unix: Test GC for SCM_RIGHTS.	Kuniyuki Iwashima
	This patch adds test cases to verify the new GC. We run each test for the following cases: * SOCK_DGRAM * SOCK_STREAM without embryo socket * SOCK_STREAM without embryo socket + MSG_OOB * SOCK_STREAM with embryo sockets * SOCK_STREAM with embryo sockets + MSG_OOB Before and after running each test case, we ensure that there is no AF_UNIX socket left in the netns by reading /proc/net/protocols. We cannot use /proc/net/unix and UNIX_DIAG because the embryo socket does not show up there. Each test creates multiple sockets in an array. We pass sockets in the even index using the peer sockets in the odd index. So, send_fd(0, 1) actually sends fd[0] to fd[2] via fd[0 + 1]. Test 1 : A <-> A Test 2 : A <-> B Test 3 : A -> B -> C <- D ^.___\|___.' ^ `---------' Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240325202425.60930-16-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	selftests: net: gro fwd: update vxlan GRO test expectations	Antoine Tenart
	UDP tunnel packets can't be GRO in-between their endpoints as this causes different issues. The UDP GRO fwd vxlan tests were relying on this and their expectations have to be fixed. We keep both vxlan tests and expected no GRO from happening. The vxlan UDP GRO bench test was removed as it's not providing any valuable information now. Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-29	x86/selftests: Skip the tests if prerequisites aren't fulfilled	Muhammad Usama Anjum
	Skip instead of failing when prerequisite conditions aren't fulfilled, such as invalid xstate values etc. Make the tests show as 'SKIP' when run: make -C tools/testing/selftests/ TARGETS=x86 run_tests ... # timeout set to 45 # selftests: x86: amx_64 # # xstate cpuid: invalid tile data size/offset: 0/0 ok 42 selftests: x86: amx_64 # SKIP # timeout set to 45 # selftests: x86: lam_64 # # Unsupported LAM feature! ok 43 selftests: x86: lam_64 # SKIP ... In the AMX test, Move away from check_cpuid_xsave() and start using arch_prctl() to find out if AMX support is present or not. In the kernels where AMX isn't present, arch_prctl() returns -EINVAL, hence it is backward compatible. Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240327111720.3509180-1-usama.anjum@collabora.com
2024-03-28	selftests/bpf: Drop settimeo in do_test	Geliang Tang
	settimeo is invoked in start_server() and in connect_fd_to_fd() already, no need to invoke settimeo(lfd, 0) and settimeo(fd, 0) in do_test() anymore. This patch drops them. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/dbc3613bee3b1c78f95ac9ff468bf47c92f106ea.1711447102.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-28	selftests/bpf: Use connect_fd_to_fd in bpf_tcp_ca	Geliang Tang
	To simplify the code, use BPF selftests helper connect_fd_to_fd() in bpf_tcp_ca.c instead of open-coding it. This helper is defined in network_helpers.c, and exported in network_helpers.h, which is already included in bpf_tcp_ca.c. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/e105d1f225c643bee838409378dd90fd9aabb6dc.1711447102.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-28	selftests/bpf: Add a kprobe_multi subtest to use addrs instead of syms	Yonghong Song
	Get addrs directly from available_filter_functions_addrs and send to the kernel during kprobe_multi_attach. This avoids consultation of /proc/kallsyms. But available_filter_functions_addrs is introduced in 6.5, i.e., it is introduced recently, so I skip the test if the kernel does not support it. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041523.1200301-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: Fix kprobe_multi_bench_attach test failure with LTO kernel	Yonghong Song
	In my locally build clang LTO kernel (enabling CONFIG_LTO and CONFIG_LTO_CLANG_THIN), kprobe_multi_bench_attach/kernel subtest failed like: test_kprobe_multi_bench_attach:PASS:get_syms 0 nsec test_kprobe_multi_bench_attach:PASS:kprobe_multi_empty__open_and_load 0 nsec libbpf: prog 'test_kprobe_empty': failed to attach: No such process test_kprobe_multi_bench_attach:FAIL:bpf_program__attach_kprobe_multi_opts unexpected error: -3 #117/1 kprobe_multi_bench_attach/kernel:FAIL There are multiple symbols in /sys/kernel/debug/tracing/available_filter_functions are renamed in /proc/kallsyms due to cross file inlining. One example is for static function __access_remote_vm in mm/memory.c. In a non-LTO kernel, we have the following call stack: ptrace_access_vm (global, kernel/ptrace.c) access_remote_vm (global, mm/memory.c) __access_remote_vm (static, mm/memory.c) With LTO kernel, it is possible that access_remote_vm() is inlined by ptrace_access_vm(). So we end up with the following call stack: ptrace_access_vm (global, kernel/ptrace.c) __access_remote_vm (static, mm/memory.c) The compiler renames __access_remote_vm to __access_remote_vm.llvm.<hash> to prevent potential name collision. The kernel bpf_kprobe_multi_link_attach() and ftrace_lookup_symbols() try to find addresses based on /proc/kallsyms, hence the current test failed with LTO kenrel. This patch consulted /proc/kallsyms to find the corresponding entries for the ksym and this solved the issue. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041518.1199758-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: Add {load,search}_kallsyms_custom_local()	Yonghong Song
	These two functions allow selftests to do loading/searching kallsyms based on their specific compare functions. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041513.1199440-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: Refactor trace helper func load_kallsyms_local()	Yonghong Song
	Refactor trace helper function load_kallsyms_local() such that it invokes a common function with a compare function as input. The common function will be used later for other local functions. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041508.1199239-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: Refactor some functions for kprobe_multi_test	Yonghong Song
	Refactor some functions in kprobe_multi_test.c to extract some helper functions who will be used in later patches to avoid code duplication. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041503.1198982-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: Replace CHECK with ASSERT macros for ksyms test	Yonghong Song
	Replace CHECK with ASSERT macros for ksyms tests. This test failed earlier with clang lto kernel, but the issue is gone with latest code base. But replacing CHECK with ASSERT still improves code as ASSERT is preferred in selftests. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240326041448.1197812-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: Test loading bpf-tcp-cc prog calling the kernel tcp-cc kfuncs	Martin KaFai Lau
	This patch adds a test to ensure all static tcp-cc kfuncs is visible to the struct_ops bpf programs. It is checked by successfully loading the struct_ops programs calling these tcp-cc kfuncs. This patch needs to enable the CONFIG_TCP_CONG_DCTCP and the CONFIG_TCP_CONG_BBR. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240322191433.4133280-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: add batched tp/raw_tp/fmodret tests	Andrii Nakryiko
	Utilize bpf_modify_return_test_tp() kfunc to have a fast way to trigger tp/raw_tp/fmodret programs from another BPF program, which gives us comparable batched benchmarks to (batched) kprobe/fentry benchmarks. We don't switch kprobe/fentry batched benchmarks to this kfunc to make bench tool usable on older kernels as well. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: lazy-load trigger bench BPF programs	Andrii Nakryiko
	Instead of front-loading all possible benchmarking BPF programs for trigger benchmarks, explicitly specify which BPF programs are used by specific benchmark and load only it. This allows to be more flexible in supporting older kernels, where some program types might not be possible to load (e.g., those that rely on newly added kfunc). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: remove syscall-driven benchs, keep syscall-count only	Andrii Nakryiko
	Remove "legacy" benchmarks triggered by syscalls in favor of newly added in-kernel/batched benchmarks. Drop -batched suffix now as well. Next patch will restore "feature parity" by adding back tp/raw_tp/fmodret benchmarks based on in-kernel kfunc approach. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: add batched, mostly in-kernel BPF triggering benchmarks	Andrii Nakryiko
	Existing kprobe/fentry triggering benchmarks have 1-to-1 mapping between one syscall execution and BPF program run. While we use a fast get_pgid() syscall, syscall overhead can still be non-trivial. This patch adds kprobe/fentry set of benchmarks significantly amortizing the cost of syscall vs actual BPF triggering overhead. We do this by employing BPF_PROG_TEST_RUN command to trigger "driver" raw_tp program which does a tight parameterized loop calling cheap BPF helper (bpf_get_numa_node_id()), to which kprobe/fentry programs are attached for benchmarking. This way 1 bpf() syscall causes N executions of BPF program being benchmarked. N defaults to 100, but can be adjusted with --trig-batch-iters CLI argument. For comparison we also implement a new baseline program that instead of triggering another BPF program just does N atomic per-CPU counter increments, establishing the limit for all other types of program within this batched benchmarking setup. Taking the final set of benchmarks added in this patch set (including tp/raw_tp/fmodret, added in later patch), and keeping for now "legacy" syscall-driven benchmarks, we can capture all triggering benchmarks in one place for comparison, before we remove the legacy ones (and rename xxx-batched into just xxx). $ benchs/run_bench_trigger.sh usermode-count : 79.500 ± 0.024M/s kernel-count : 49.949 ± 0.081M/s syscall-count : 9.009 ± 0.007M/s fentry-batch : 31.002 ± 0.015M/s fexit-batch : 20.372 ± 0.028M/s fmodret-batch : 21.651 ± 0.659M/s rawtp-batch : 36.775 ± 0.264M/s tp-batch : 19.411 ± 0.248M/s kprobe-batch : 12.949 ± 0.220M/s kprobe-multi-batch : 15.400 ± 0.007M/s kretprobe-batch : 5.559 ± 0.011M/s kretprobe-multi-batch: 5.861 ± 0.003M/s fentry-legacy : 8.329 ± 0.004M/s fexit-legacy : 6.239 ± 0.003M/s fmodret-legacy : 6.595 ± 0.001M/s rawtp-legacy : 8.305 ± 0.004M/s tp-legacy : 6.382 ± 0.001M/s kprobe-legacy : 5.528 ± 0.003M/s kprobe-multi-legacy : 5.864 ± 0.022M/s kretprobe-legacy : 3.081 ± 0.001M/s kretprobe-multi-legacy: 3.193 ± 0.001M/s Note how xxx-batch variants are measured with significantly higher throughput, even though it's exactly the same in-kernel overhead. As such, results can be compared only between benchmarks of the same kind (syscall vs batched): fentry-legacy : 8.329 ± 0.004M/s fentry-batch : 31.002 ± 0.015M/s kprobe-multi-legacy : 5.864 ± 0.022M/s kprobe-multi-batch : 15.400 ± 0.007M/s Note also that syscall-count is setting a theoretical limit for syscall-triggered benchmarks, while kernel-count is setting similar limits for batch variants. usermode-count is a happy and unachievable case of user space counting without doing any syscalls, and is mostly the measure of CPU speed for such a trivial benchmark. As was mentioned, tp/raw_tp/fmodret require kernel-side kfunc to produce similar benchmark, which we address in a separate patch. Note that run_bench_trigger.sh allows to override a list of benchmarks to run, which is very useful for performance work. Cc: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: rename and clean up userspace-triggered benchmarks	Andrii Nakryiko
	Rename uprobe-base to more precise usermode-count (it will match other baseline-like benchmarks, kernel-count and syscall-count). Also use BENCH_TRIG_USERMODE() macro to define all usermode-based triggering benchmarks, which include usermode-count and uprobe/uretprobe benchmarks. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240326162151.3981687-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	bpf: improve error message for unsupported helper	Mykyta Yatsenko
	BPF verifier emits "unknown func" message when given BPF program type does not support BPF helper. This message may be confusing for users, as important context that helper is unknown only to current program type is not provided. This patch changes message to "program of this type cannot use helper " and aligns dependent code in libbpf and tests. Any suggestions on improving/changing this message are welcome. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/r/20240325152210.377548-1-yatsenko@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests/bpf: Add BPF_FIB_LOOKUP_MARK tests	Anton Protopopov
	This patch extends the fib_lookup test suite by adding a few test cases for each IP family to test the new BPF_FIB_LOOKUP_MARK flag to the bpf_fib_lookup: * Test destination IP address selection with and without a mark and/or the BPF_FIB_LOOKUP_MARK flag set Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240326101742.17421-3-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28	selftests: forwarding: Add a test for testing lib.sh functionality	Petr Machata
	Rerunning various scenarios to make sure lib.sh changes do not impact the observable behavior is no fun. Add a selftest at least for the bare basics -- the mechanics of setting RET, retmsg, and EXIT_STATUS. Since the selftest itself uses lib.sh, it would be possible to break lib.sh in such a way that invalidates result of the selftest. Since the metatest only uses the bare basics (just pass/fail), hopefully such fundamental breakages would be noticed. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/6d25cedbf2d4b83614944809a34fe023fbe8db38.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth	Petr Machata
	When the NH group stats tests are currently run on a veth topology, the HW-stats leg of each test is SKIP'ped. But kernel networking CI interprets skips as a sign that tooling is missing, and prompts maintainer investigation. Lack of capability to pass a test should be expressed as XFAIL. Selftests that require HW should normally be put in drivers/net/hw, but doing so for the NH counter selftests would just lead to a lot of duplicity. So instead, introduce a helper, xfail_on_veth(), which can be used to mark selftests that should XFAIL instead of FAILing when run on a veth topology. On non-veth topology, they don't do anything. Use the helper in the HW-stats part of router_mpath_nh_lib selftest. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/15f0ab9637aa0497f164ec30e83c1c8f53d53719.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: Mark performance-sensitive tests	Petr Machata
	When run on a slow machine, the scheduler traffic tests can be expected to fail, and should be reported as XFAIL in that case. Therefore run these tests through the perf_sensitive wrapper. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/9a357f8cf34f5ececac08d43a3eb023008996035.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: Support for performance sensitive tests	Petr Machata
	Several tests in the suite use large amounts of traffic to e.g. cause congestion and evaluate RED or shaper performance. These tests will not run well on a slow machine, be it one with heavy debug kernel, or a VM, or e.g. a single-board computer. Allow users to specify an environment variable, KSFT_MACHINE_SLOW=yes, to indicate that the tests are being run on one such machine. Performance sensitive tests can then use a new helper, xfail_on_slow(), to mark parts of the test that are sensitive to low-performance machines. The helper can be used to just mark the whole suite, like so: xfail_on_slow tests_run ... or, on the other side of the granularity spectrum, to override individual checks: xfail_on_slow check_err $? "Expected much, got little." Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/99a376a2d2ffdaeee7752b1910cb0c3ea5d80fbe.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: Convert log_test() to recognize RET values	Petr Machata
	In a previous patch, the interpretation of RET value was changed to mean the kselftest framework constant with the test outcome: $ksft_pass, $ksft_xfail, etc. Update log_test() to recognize the various possible RET values. Then have EXIT_STATUS track the RET value of the current test. This differs subtly from the way RET tracks the value: while for RET we want to recognize XFAIL as a separate status, for purposes of exit code, we want to to conflate XFAIL and PASS, because they both communicate non-failure. Thus add a new helper, ksft_exit_status_merge(). With this log_test_skip() and log_test_xfail() can be reexpressed as thin wrappers around log_test. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/e5f807cb5476ab795fd14ac74da53a731a9fc432.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: Have RET track kselftest framework constants	Petr Machata
	The variable RET keeps track of whether the test under execution has so far failed or not. Currently it works in binary fashion: zero means everything is fine, non-zero means something failed. log_test() then uses the value to given a human-readable message. In order to allow log_test() to report skips and xfails, the semantics of RET need to be more fine-grained. Therefore have RET value be one of kselftest framework constants: $ksft_fail, $ksft_xfail, etc. The current logic in check_err() is such that first non-zero value of RET trumps all those that follow. But that is not right when RET has more fine-grained value semantics. Different outcomes have different weights. The results of PASS and XFAIL are mostly the same: they both communicate a test that did not go wrong. SKIP communicates lack of tooling, which the user should go and try to fix, and as such should not be overridden by the passes. So far, the higher-numbered statuses can be considered weightier. But FAIL should be the weightiest. Add a helper, ksft_status_merge(), which merges two statuses in a way that respects the above conditions. Express it in a generic manner, because exit status merge is subtly different, and we want to reuse the same logic. Use the new helper when setting RET in check_err(). Re-express check_fail() in terms of check_err() to avoid duplication. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/7dfff51cc925c7a3ac879b9050a0d6a327c8d21f.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: lib: Define more kselftest exit codes	Petr Machata
	The following patches will operate with more exit codes besides ksft_skip. Add them here. Additionally, move a duplicated skip exit code definition from forwarding/tc_tunnel_key.sh. Keep a similar duplicate in forwarding/devlink_lib.sh, because even though lib.sh will have been sourced in all cases where devlink_lib is, the inclusion is not visible in the file itself, and relying on it would be confusing. Cc: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/545a03046c7aca0628a51a389a9b81949ab288ce.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: Change inappropriate log_test_skip() calls	Petr Machata
	The SKIP return should be used for cases where tooling of the machine under test is lacking. For cases where HW is lacking, the appropriate outcome is XFAIL. This is the case with ethtool_rmon and mlxsw_lib. For these, introduce a new helper, log_test_xfail(). Do the same for router_mpath_nh_lib. Note that it will be fixed using a more reusable way in a following patch. For the two resource_scale selftests, the log should simply not be written, because there is no problem. Cc: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/3d668d8fb6fa0d9eeb47ce6d9e54114348c7c179.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: Ditch skip_on_veth()	Petr Machata
	Since the selftests that are not supposed to run on veth pairs are now in their own dedicated directory, the skip_on_veth logic can go away. Drop it from the selftests, and from lib.sh. Cc: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/63b470e10d65270571ee7de709b31672ce314872.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: Move several selftests	Petr Machata
	The tests in net/forwarding are generally expected to be HW-independent. There are however several tests that, while not depending on any HW in particular, nevertheless depend on being used on HW interfaces. Placing these selftests to net/forwarding is confusing, because the selftest will just report it can't be run on veth pairs. At the same time, placing them to a particular driver's selftests subdirectory would be wrong. Instead, add a new directory, drivers/net/hw, where these generic but HW independent selftests should be placed. Move over several such tests including one helper library. Since typically these tests will not be expected to run, omit the directory drivers/net/hw from the TARGETS list in selftests/Makefile. Retain a Makefile in the new directory itself, so that a user can make -C into that directory and act on those tests explicitly. Cc: Roger Quadros <rogerq@kernel.org> Cc: Tobias Waldekranz <tobias@waldekranz.com> Cc: Danielle Ratson <danieller@nvidia.com> Cc: Davide Caratti <dcaratti@redhat.com> Cc: Johannes Nixdorf <jnixdorf-oss@avm.de> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/e11dae1f62703059e9fc2240004288ac7cc15756.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: ipip_lib: Do not import lib.sh	Petr Machata
	This library is always sourced in the context where lib.sh has already been sourced as well. Therefore drop the explicit sourcing and expect the client to already have done it. This will simplify moving some of the clients to a different directory. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/a4da5e9cd42a34cbace917a048ca71081719d6ac.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding: README: Document customization	Petr Machata
	That any sort of customization is possible at all, let alone how it should be done, is currently not at all clear. Document the whats and hows in README. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/e819623af6aaeea49e9dc36cecd95694fad73bb8.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: forwarding.config.sample: Move overrides to lib.sh	Petr Machata
	forwarding.config.sample, net/lib.sh and net/forwarding/lib.sh contain definitions and redefinitions of some of the same variables. The overlap between net/forwarding/lib.sh and forwarding.config.sample is especially large. This duplication is a potential source of confusion and problems. It would be overall less error prone if each variable were defined in one place only. In this patch set, that place is the library itself. Therefore move all comments from forwarding.config.sample to net/forwarding/lib.sh. Move over also a definition of TC_FLAG, which was missing from lib.sh entirely. Additionally, add to lib.sh a default definition of the topology variables. The logic behind this is that forgetting to specify forwarding.config was a frequent source of frustration for the selftest users. But really, most of the time the default veth based topology is just fine. We considered just sourcing forwarding.config.sample instead if forwarding.config is not available, but this is a cleaner solution. That means the syntax of the forwarding.config.sample override has to change to an array assignment, so that the whole variable is overwritten, not just individual keys, which could leave the value of some keys unchanged. Do the same in lib.sh for any cut'n'pasters out there. The config file is then given a sort of carte blanche to redefine whatever variables it sees fit from the libraries. This is described in a comment in the file. Only a handful of variables are left behind, to illustrate the customization. The fact that the variables are now missing from forwarding.config.sample, and therefore would miss from forwarding.config derived from that file as well, should not change anything. This is just the sample file. Users that keep their own forwarding.config would retain it as before. The only observable change is introduction of TC_FLAG to lib.sh, because now the filters would not be attempted to install to HW datapath. For veth pairs this does not change anything. For HW deployments, users presumably have forwarding.config with this value overridden. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/b9b8a11a22821a7aa532211ff461a34f596e26bf.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28	selftests: net: libs: Change variable fallback syntax	Petr Machata
	The current syntax of X=${X:=X} first evaluates the ${X:=Y} expression, which either uses the existing value of $X if there is one, or uses the value of "Y" as a fallback, and assigns it to X. The expression is then replaced with the now-current value of $X. Assigning that value to X once more is meaningless. So avoid the outer X=... bit, and instead express the same idea though the do-nothing ":" built-in as : "${X:=Y}". This also cleans up the block nicely and makes it more readable. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/1890ddc58420c2c0d5ba3154c87ecc6d9faf6947.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>