path: root/tools/testing/selftests
AgeCommit message (Collapse)Author
2023-03-13selftests/bpf: use canonical ftrace pathRoss Zwisler
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing Many tests in the bpf selftest code still refer to this older debugfs path, so let's update them to avoid confusion. Signed-off-by: Ross Zwisler <zwisler@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230313205628.1058720-3-zwisler@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
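A minimal sketch of how a user-space test helper might prefer the canonical tracefs path and fall back to the debugfs compatibility path. The two paths come from the commit message; `pick_tracefs_dir()`, `path_exists()`, and `fake_legacy_only()` are hypothetical names used only for illustration and are not part of the selftests.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Both paths are quoted from the commit message. */
#define TRACEFS_CANONICAL "/sys/kernel/tracing"
#define TRACEFS_LEGACY    "/sys/kernel/debug/tracing"

/* Hypothetical predicate type, so the lookup can be tried without a real mount. */
typedef int (*path_exists_fn)(const char *path);

/* Real predicate: the path exists if access(2) succeeds. */
int path_exists(const char *path)
{
	return access(path, F_OK) == 0;
}

/* Stand-in predicate that pretends only the debugfs path is mounted. */
int fake_legacy_only(const char *path)
{
	return strcmp(path, TRACEFS_LEGACY) == 0;
}

/* Return the preferred tracefs directory, or NULL if neither path exists. */
const char *pick_tracefs_dir(path_exists_fn exists)
{
	if (exists(TRACEFS_CANONICAL))
		return TRACEFS_CANONICAL;
	if (exists(TRACEFS_LEGACY))
		return TRACEFS_LEGACY;
	return NULL;
}
```

Calling `pick_tracefs_dir(path_exists)` on a live system returns whichever mount point is present, checking the canonical one first.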
2023-03-10selftests/bpf: Add local kptr stashing testDave Marchevsky
Add a new selftest, local_kptr_stash, which uses bpf_kptr_xchg to stash a bpf_obj_new-allocated object in a map. Test the following scenarios: * Stash two rb_nodes in an arraymap, don't unstash them, rely on map free to destruct them * Stash two rb_nodes in an arraymap, unstash the second one in a separate program, rely on map free to destruct first Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230310230743.2320707-4-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10selftests/bpf: Add local-storage-create benchmarkMartin KaFai Lau
This patch tests how many kmallocs are needed to create and free a batch of UDP sockets when each socket has a 64-byte bpf storage. It also measures how fast the UDP sockets can be created. The results are from my qemu setup.
Before bpf_mem_cache_alloc/free:
./bench -p 1 local-storage-create
Setting up benchmark 'local-storage-create'...
Benchmark 'local-storage-create' started.
Iter 0 ( 73.193us): creates 213.552k/s (213.552k/prod), 3.09 kmallocs/create
Iter 1 (-20.724us): creates 211.908k/s (211.908k/prod), 3.09 kmallocs/create
Iter 2 ( 9.280us): creates 212.574k/s (212.574k/prod), 3.12 kmallocs/create
Iter 3 ( 11.039us): creates 213.209k/s (213.209k/prod), 3.12 kmallocs/create
Iter 4 (-11.411us): creates 213.351k/s (213.351k/prod), 3.12 kmallocs/create
Iter 5 ( -7.915us): creates 214.754k/s (214.754k/prod), 3.12 kmallocs/create
Iter 6 ( 11.317us): creates 210.942k/s (210.942k/prod), 3.12 kmallocs/create
Summary: creates 212.789 ± 1.310k/s (212.789k/prod), 3.12 kmallocs/create
After bpf_mem_cache_alloc/free:
./bench -p 1 local-storage-create
Setting up benchmark 'local-storage-create'...
Benchmark 'local-storage-create' started.
Iter 0 ( 68.265us): creates 243.984k/s (243.984k/prod), 1.04 kmallocs/create
Iter 1 ( 30.357us): creates 238.424k/s (238.424k/prod), 1.04 kmallocs/create
Iter 2 (-18.712us): creates 232.963k/s (232.963k/prod), 1.04 kmallocs/create
Iter 3 (-15.885us): creates 238.879k/s (238.879k/prod), 1.04 kmallocs/create
Iter 4 ( 5.590us): creates 237.490k/s (237.490k/prod), 1.04 kmallocs/create
Iter 5 ( 8.577us): creates 237.521k/s (237.521k/prod), 1.04 kmallocs/create
Iter 6 ( -6.263us): creates 238.508k/s (238.508k/prod), 1.04 kmallocs/create
Summary: creates 237.298 ± 2.198k/s (237.298k/prod), 1.04 kmallocs/create
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-18-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
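To put the two Summary lines side by side, the relative change can be computed directly. This is only arithmetic on the figures quoted in the commit message; `percent_change()` and `print_local_storage_deltas()` are hypothetical helper names.

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print the deltas between the "Before" and "After" summary lines. */
void print_local_storage_deltas(void)
{
	printf("creates/s:       %+.1f%%\n", percent_change(212.789, 237.298));
	printf("kmallocs/create: %+.1f%%\n", percent_change(3.12, 1.04));
}
```

On these numbers, socket creation is roughly 11.5% faster and the kmalloc count per create drops by about two thirds.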
2023-03-10selftests/bpf: Check freeing sk->sk_local_storage with sk_local_storage->smap is NULLMartin KaFai Lau
This patch tweaks the socket_bind bpf prog to test the local_storage->smap == NULL case in the bpf_local_storage_free() code path. The idea is to create the local_storage with the sk_storage_map's selem first. Then add the sk_storage_map2's selem and then delete the earlier sk_storage_map's selem. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-17-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10selftests/bpf: Replace CHECK with ASSERT in test_local_storageMartin KaFai Lau
This patch migrates the CHECK macro to ASSERT macro. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-16-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10bpf/selftests: Fix send_signal tracepoint testsDavid Vernet
The send_signal tracepoint tests are non-deterministically failing in CI. The test works as follows:
1. Two pairs of file descriptors are created using the pipe() function. One pair is used to communicate between a parent process -> child process, and the other for the reverse direction.
2. A child is fork()'ed. The child process registers a signal handler, notifies its parent that the signal handler is registered, and then waits for its parent to have enabled a BPF program that sends a signal.
3. The parent opens and loads a BPF skeleton with programs that send signals to the child process. The different programs are triggered by different perf events (either NMI or normal perf), or by regular tracepoints. The signal is delivered to the child whenever the child triggers the program.
4. The child's signal handler is invoked, which sets a flag saying that the signal handler was reached. The child then signals to the parent that it received the signal, and the test ends.
The perf testcases (send_signal_perf{_thread} and send_signal_nmi{_thread}) work 100% of the time, but the tracepoint testcases fail non-deterministically because the tracepoint is not always being fired for the child. There are two tracepoint programs registered in the test: 'tracepoint/sched/sched_switch', and 'tracepoint/syscalls/sys_enter_nanosleep'. The child never intentionally blocks, nor sleeps, so neither tracepoint is guaranteed to be triggered. To fix this, we can have the child trigger the nanosleep program with a usleep(). Before this patch, the test would fail locally every 2-3 runs. Now, it doesn't fail after more than 1000 runs.
Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230310061909.1420887-1-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
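The handshake described in this commit can be sketched in plain user space. This is an assumption-laden model, not the selftest itself: the parent's `kill()` stands in for the BPF program that sends the signal, the child reports back through its exit status rather than through the pipe, and `run_handshake()` is a hypothetical name. The `usleep()` call is the part that mirrors the fix, since it makes the child enter nanosleep on purpose.

```c
#define _DEFAULT_SOURCE /* for usleep() under strict -std= modes */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t sigusr1_received;

static void on_sigusr1(int sig)
{
	(void)sig;
	sigusr1_received = 1;
}

/* Returns 0 if the child observed the signal, -1 otherwise. */
int run_handshake(void)
{
	int to_child[2], to_parent[2];
	char buf = 0;
	int status;
	pid_t pid;

	if (pipe(to_child) || pipe(to_parent))
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: install the handler, tell the parent, wait for go-ahead. */
		close(to_child[1]);
		close(to_parent[0]);
		signal(SIGUSR1, on_sigusr1);
		if (write(to_parent[1], &buf, 1) != 1)
			_exit(2);
		if (read(to_child[0], &buf, 1) != 1)
			_exit(2);
		/* Sleep in short steps so the child really enters nanosleep;
		 * give up after roughly one second. */
		for (int i = 0; i < 1000 && !sigusr1_received; i++)
			usleep(1000);
		_exit(sigusr1_received ? 0 : 1);
	}

	/* Parent: wait until the child's handler is installed, then release it. */
	close(to_child[0]);
	close(to_parent[1]);
	if (read(to_parent[0], &buf, 1) != 1 ||
	    write(to_child[1], &buf, 1) != 1) {
		kill(pid, SIGKILL);
		waitpid(pid, &status, 0);
		return -1;
	}

	/* Stand-in for the BPF program delivering the signal. */
	kill(pid, SIGUSR1);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	close(to_child[1]);
	close(to_parent[0]);
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

In the real test the signal only arrives if the child hits the attached tracepoint, which is why a child that never sleeps can miss it.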
2023-03-10selftests/bpf: make BPF compiler flags stricterAndrii Nakryiko
We recently added -Wuninitialized, but it's not enough to catch various silly mistakes or omissions. Let's go all the way to -Wall, just like we do for user-space code. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230309054015.4068562-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10selftests/bpf: fix lots of silly mistakes pointed out by compilerAndrii Nakryiko
Once we enable -Wall for BPF sources, compiler will complain about lots of unused variables, variables that are set but never read, etc. Fix all these issues first before enabling -Wall in Makefile. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230309054015.4068562-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10selftests/bpf: add __sink() macro to fake variable consumptionAndrii Nakryiko
Add __sink(expr) macro that forces the compiler to believe that the passed-in expression is both read and written. It uses a simple embedded asm for this. This is useful in a lot of tests where we assign a value to some variable to trigger some action, but later don't read the variable, causing the compiler to complain (if the corresponding compiler warnings are turned on, which we'll do in the next patch). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230309054015.4068562-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
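A sketch of what such a macro could look like, assuming an empty inline asm statement with a read-write operand; the exact definition in the selftests headers may differ. `trigger_action()` is a hypothetical function that shows the set-but-never-read pattern the macro is meant to silence.

```c
/* Assumed shape: empty asm with a "+g" (read-write) operand, so the
 * compiler must treat expr as both consumed and modified. */
#define __sink(expr) __asm__ __volatile__("" : "+g"(expr))

int trigger_action(int input)
{
	int scratch;

	scratch = input * 2;	/* assigned only to provoke some behavior */
	__sink(scratch);	/* without this, -Wall reports set-but-unused */
	return input;
}
```

The asm body is empty, so the macro emits no instructions of its own; it only constrains what the optimizer and the warning passes may assume.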
2023-03-10selftests/bpf: prevent unused variable warning in bpf_for()Andrii Nakryiko
Add __attribute__((unused)) to inner __p variable inside bpf_for(), bpf_for_each(), and bpf_repeat() macros to avoid compiler warnings about unused variable. Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230309054015.4068562-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-09selftests/bpf: Workaround verification failure for ↵Yonghong Song
fexit_bpf2bpf/func_replace_return_code With latest llvm17, selftest fexit_bpf2bpf/func_replace_return_code has the following verification failure: 0: R1=ctx(off=0,imm=0) R10=fp0 ; int connect_v4_prog(struct bpf_sock_addr *ctx) 0: (bf) r7 = r1 ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0) 1: (b4) w6 = 0 ; R6_w=0 ; memset(&tuple.ipv4.saddr, 0, sizeof(tuple.ipv4.saddr)); ... ; return do_bind(ctx) ? 1 : 0; 179: (bf) r1 = r7 ; R1=ctx(off=0,imm=0) R7=ctx(off=0,imm=0) 180: (85) call pc+147 Func#3 is global and valid. Skipping. 181: R0_w=scalar() 181: (bc) w6 = w0 ; R0_w=scalar() R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 182: (05) goto pc-129 ; } 54: (bc) w0 = w6 ; R0_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 55: (95) exit At program exit the register R0 has value (0x0; 0xffffffff) should have been in (0x0; 0x1) processed 281 insns (limit 1000000) max_states_per_insn 1 total_states 26 peak_states 26 mark_read 13 -- END PROG LOAD LOG -- libbpf: prog 'connect_v4_prog': failed to load: -22 The corresponding source code: __attribute__ ((noinline)) int do_bind(struct bpf_sock_addr *ctx) { struct sockaddr_in sa = {}; sa.sin_family = AF_INET; sa.sin_port = bpf_htons(0); sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4); if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0) return 0; return 1; } ... SEC("cgroup/connect4") int connect_v4_prog(struct bpf_sock_addr *ctx) { ... return do_bind(ctx) ? 1 : 0; } Insn 180 is a call to 'do_bind'. The call's return value is also the return value for the program. Since do_bind() returns 0/1, so it is legitimate for compiler to optimize 'return do_bind(ctx) ? 1 : 0' to 'return do_bind(ctx)'. However, such optimization breaks verifier as the return value of 'do_bind()' is marked as any scalar which violates the requirement of prog return value 0/1. There are two ways to fix this problem, (1) changing 'return 1' in do_bind() to e.g. 
'return 10' so the compiler has to do 'do_bind(ctx) ? 1 : 0', or (2) suggested by Andrii, marking do_bind() with the __weak attribute so the compiler cannot make any assumption about do_bind()'s return value. This patch adopts the __weak approach, which is simpler and more resistant to potential compiler optimizations. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230310012410.2920570-1-yhs@fb.com
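The effect of the weak attribute can be illustrated with a user-space analogue. This is a sketch under assumptions: `should_succeed` replaces the `bpf_bind()` call, `connect_prog()` is a simplified stand-in for `connect_v4_prog()`, and the attribute is spelled out rather than using the selftests' `__weak` shorthand.

```c
/* weak: another definition could replace this one at link time, so the
 * compiler cannot rely on the 0/1 return range at the call site. */
__attribute__((weak, noinline)) int do_bind(int should_succeed)
{
	if (!should_succeed)
		return 0;
	return 1;
}

int connect_prog(int should_succeed)
{
	/* The normalization to 0/1 has to stay, because do_bind() is opaque. */
	return do_bind(should_succeed) ? 1 : 0;
}
```

With a non-weak `do_bind()`, a compiler that can see the 0/1 range may drop the `? 1 : 0`, which is the optimization the commit message reports as tripping the verifier.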
2023-03-09selftests/bpf: Improve error logs in XDP compliance test toolLorenzo Bianconi
Improve some error logs reported in the XDP compliance test tool. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/212fc5bd214ff706f6ef1acbe7272cf4d803ca9c.1678382940.git.lorenzo@kernel.org
2023-03-09selftests/bpf: Use ifname instead of ifindex in XDP compliance test toolLorenzo Bianconi
Rely on interface name instead of interface index in error messages or logs from XDP compliance test tool. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/7dc5a8ff56c252b1a7ae29b059d0b2b1543c8b5d.1678382940.git.lorenzo@kernel.org
2023-03-09selftests/bpf: Fix flaky fib_lookup testMartin KaFai Lau
There is a report that the fib_lookup test is flaky when running in parallel, which is a symptom of slowness or delay. An example:
Testing IPv6 stale neigh
set_lookup_params:PASS:inet_pton(IPV6_IFACE_ADDR) 0 nsec
test_fib_lookup:PASS:bpf_prog_test_run_opts 0 nsec
test_fib_lookup:FAIL:fib_lookup_ret unexpected fib_lookup_ret: actual 0 != expected 7
test_fib_lookup:FAIL:dmac not match unexpected dmac not match: actual 1 != expected 0
dmac expected 11:11:11:11:11:11 actual 00:00:00:00:00:00
[ Note that the "fib_lookup_ret unexpected fib_lookup_ret actual 0 ..." is reversed in terms of expected and actual value. Fixing in this patch also. ]
One possibility is that the stale neigh entry under test was marked dead by the gc (in neigh_periodic_work). The default gc_stale_time sysctl is 60s. This patch increases it to 15 mins. It also: - fixes the reversed arg (actual vs expected) in one of the ASSERT_EQ tests - removes the nodad command arg when adding a v4 neigh entry, which currently produces a warning.
Fixes: 168de0233586 ("selftests/bpf: Add bpf_fib_lookup test") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230309060244.3242491-1-martin.lau@linux.dev
2023-03-08selftests/bpf: implement and test custom testmod_seq iteratorAndrii Nakryiko
Implement a trivial iterator returning same specified integer value N times as part of bpf_testmod kernel module. Add selftests to validate everything works end to end. We also reuse these tests as "verification-only" tests to validate that kernel prints the state of custom kernel module-defined iterator correctly: fp-16=iter_testmod_seq(ref_id=1,state=drained,depth=0) "testmod_seq" part is an iterator type, and is coming from module's BTF data dynamically at runtime. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
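The behavior described here (return the same integer N times, then stop) can be modeled in user space. The `seq_iter_*` names and the struct layout below are hypothetical and only mirror the new/next/destroy shape of an open-coded iterator; they are not the kfuncs added to bpf_testmod.

```c
#include <stddef.h>

/* User-space model of an iterator that yields `value` exactly `cnt` times. */
struct seq_iter {
	long value;
	int cnt;
};

/* Returns 0 on success, -1 for a negative count (iterator left empty). */
int seq_iter_new(struct seq_iter *it, long value, int cnt)
{
	it->value = value;
	it->cnt = cnt < 0 ? 0 : cnt;
	return cnt < 0 ? -1 : 0;
}

/* Returns a pointer to the value, or NULL once the iterator is drained. */
long *seq_iter_next(struct seq_iter *it)
{
	if (it->cnt <= 0)
		return NULL;
	it->cnt--;
	return &it->value;
}

void seq_iter_destroy(struct seq_iter *it)
{
	it->cnt = 0;
}

/* Drive the iterator end to end and add up everything it yields. */
long seq_iter_sum(long value, int cnt)
{
	struct seq_iter it;
	long sum = 0, *v;

	seq_iter_new(&it, value, cnt);
	while ((v = seq_iter_next(&it)))
		sum += *v;
	seq_iter_destroy(&it);
	return sum;
}
```

Summing the yielded values is one simple way to check end to end that the iterator produced the right element the right number of times.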
2023-03-08selftests/bpf: add number iterator testsAndrii Nakryiko
Add number iterator (bpf_iter_num_{new,next,destroy}()) tests, validating the correct handling of various corner and common cases *at runtime*. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-08selftests/bpf: add iterators testsAndrii Nakryiko
Add various tests for open-coded iterators. Some of them exercise various possible coding patterns in C, some go down to low-level assembly for more control over various conditions, especially invalid ones. We also make use of bpf_for(), bpf_for_each(), bpf_repeat() macros in some of these tests. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-08selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macrosAndrii Nakryiko
Add bpf_for_each(), bpf_for(), and bpf_repeat() macros that make writing open-coded iterator-based loops much more convenient and natural. These macros utilize cleanup attribute to ensure proper destruction of the iterator and thanks to that manage to provide the ergonomics that is very close to C language's for() construct. Typical loop would look like: int i; int arr[N]; bpf_for(i, 0, N) { /* verifier will know that i >= 0 && i < N, so could be used to * directly access array elements with no extra checks */ arr[i] = i; } bpf_repeat() is very similar, but it doesn't expose iteration number and is meant as a simple "repeat action N times" loop: bpf_repeat(N) { /* whatever, N times */ } Note that `break` and `continue` statements inside the {} block work as expected. bpf_for_each() is a generalization over any kind of BPF open-coded iterator allowing to use for-each-like approach instead of calling low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.: struct cgroup *cg; bpf_for_each(cgroup, cg, some, input, args) { /* do something with each cg */ } would call (not-yet-implemented) bpf_iter_cgroup_{new,next,destroy}() functions to form a loop over cgroups, where `some, input, args` are passed verbatim into constructor as bpf_iter_cgroup_new(&it, some, input, args). As a first demonstration, add pyperf variant based on the bpf_for() loop. Also clean up a few tests that either included bpf_misc.h header unnecessarily from the user-space, which is unsupported, or included it before any common types are defined (and thus leading to unnecessary compilation warnings, potentially). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
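The cleanup-attribute trick can be tried outside of BPF. The following is a user-space mock, not the selftests' macro: `struct num_iter`, the `num_iter_*` functions, and `for_range()` are hypothetical names, and the destroy counter exists only to make the automatic destruction observable.

```c
#include <stddef.h>

/* Mock number iterator over the half-open range [next, end). */
struct num_iter {
	int next;
	int end;
	int val;
};

int num_iter_destroy_calls; /* counts destructor runs, for demonstration */

struct num_iter num_iter_make(int start, int end)
{
	struct num_iter it = { .next = start, .end = end, .val = 0 };
	return it;
}

int *num_iter_next(struct num_iter *it)
{
	if (it->next >= it->end)
		return NULL;
	it->val = it->next++;
	return &it->val;
}

void num_iter_destroy(struct num_iter *it)
{
	it->next = it->end = 0;
	num_iter_destroy_calls++;
}

/* Advance the iterator; store the element in *out and return 1, or return 0 when done. */
int num_iter_step(struct num_iter *it, int *out)
{
	int *v = num_iter_next(it);

	if (!v)
		return 0;
	*out = *v;
	return 1;
}

/* The cleanup attribute runs num_iter_destroy() whenever the loop is left,
 * including through `break`. */
#define for_range(i, start, end)                                            \
	for (struct num_iter it_ __attribute__((cleanup(num_iter_destroy))) = \
		     num_iter_make((start), (end));                             \
	     num_iter_step(&it_, &(i));)

/* Sum 0..n-1, stopping early if `stop` is reached. */
int sum_until(int n, int stop)
{
	int i, sum = 0;

	for_range(i, 0, n) {
		if (i == stop)
			break;
		sum += i;
	}
	return sum;
}
```

Because the iterator variable is declared in the `for` initializer, its scope ends with the loop, so the destructor runs exactly once per loop regardless of how the loop exits.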
2023-03-08selftests/bpf: Fix IMA testRoberto Sassu
Commit 62622dab0a28 ("ima: return IMA digest value only when IMA_COLLECTED flag is set") caused bpf_ima_inode_hash() to refuse to give non-fresh digests. IMA test #3 assumed the old behavior, that bpf_ima_inode_hash() still returned also non-fresh digests. Correct the test by accepting both cases. If the samples returned are 1, assume that the commit above is applied and that the returned digest is fresh. If the samples returned are 2, assume that the commit above is not applied, and check both the non-fresh and fresh digest. Fixes: 62622dab0a28 ("ima: return IMA digest value only when IMA_COLLECTED flag is set") Reported-by: David Vernet <void@manifault.com> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/bpf/20230308103713.1681200-1-roberto.sassu@huaweicloud.com
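The "accept both cases" logic might be sketched as follows. Everything here is an assumption made for illustration: digests are modeled as 64-bit values, the non-fresh sample is assumed to come first when two are returned, and `check_ima_samples()` is a hypothetical name.

```c
#include <stdint.h>

/* Returns 0 when the samples match one of the two accepted outcomes, -1 otherwise. */
int check_ima_samples(const uint64_t *samples, int n,
		      uint64_t fresh, uint64_t non_fresh)
{
	switch (n) {
	case 1:
		/* Commit 62622dab0a28 applied: only the fresh digest is returned. */
		return samples[0] == fresh ? 0 : -1;
	case 2:
		/* Commit not applied: check both digests (order is an assumption). */
		return (samples[0] == non_fresh && samples[1] == fresh) ? 0 : -1;
	default:
		return -1;
	}
}
```

Keying the check on the sample count lets one test binary pass on kernels with and without the IMA change.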
2023-03-06Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-03-06 We've added 85 non-merge commits during the last 13 day(s) which contain a total of 131 files changed, 7102 insertions(+), 1792 deletions(-). The main changes are: 1) Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses, from Joanne Koong. 2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities, from Andrii Nakryiko. 3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc, from Alexei Starovoitov. 4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps, from Kumar Kartikeya Dwivedi. 5) Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them, from Eduard Zingerman. 6) Make uprobe attachment Android APK aware by supporting attachment to functions inside ELF objects contained in APKs via function names, from Daniel Müller. 7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper to start the timer with absolute expiration value instead of relative one, from Tero Kristo. 8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id, from Tejun Heo. 9) Extend libbpf to support users manually attaching kprobes/uprobes in the legacy/perf/link mode, from Menglong Dong. 10) Implement workarounds in the mips BPF JIT for DADDI/R4000, from Jiaxun Yang. 11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT, from Hengqi Chen. 12) Extend BPF instruction set doc with describing the encoding of BPF instructions in terms of how bytes are stored under big/little endian, from Jose E. Marchesi. 
13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui. 14) Fix bpf_xdp_query() backwards compatibility on old kernels, from Yonghong Song. 15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS, from Florent Revest. 16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache, from Hou Tao. 17) Fix BPF verifier's check_subprogs to not unnecessarily mark a subprogram with has_tail_call, from Ilya Leoshkevich. 18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode selftests/bpf: Split test_attach_probe into multi subtests libbpf: Add support to set kprobe/uprobe attach mode tools/resolve_btfids: Add /libsubcmd to .gitignore bpf: add support for fixed-size memory pointer returns for kfuncs bpf: generalize dynptr_get_spi to be usable for iters bpf: mark PTR_TO_MEM as non-null register type bpf: move kfunc_call_arg_meta higher in the file bpf: ensure that r0 is marked scratched after any function call bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper bpf: clean up visit_insn()'s instruction processing selftests/bpf: adjust log_fixup's buffer size for proper truncation bpf: honor env->test_state_freq flag in is_state_visited() selftests/bpf: enhance align selftest's expected log matching bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} bpf: improve stack slot state printing selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access() selftests/bpf: test if pointer type is tracked for BPF_ST_MEM bpf: allow ctx writes using BPF_ST_MEM instruction bpf: Use separate RCU callbacks for freeing selem ... ==================== Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06selftests/bpf: Add test for legacy/perf kprobe/uprobe attach modeMenglong Dong
Add tests for kprobe/uprobe attaching in default, legacy, perf and link modes. The tests pass:
./test_progs -t attach_probe
$5/1 attach_probe/manual-default:OK
$5/2 attach_probe/manual-legacy:OK
$5/3 attach_probe/manual-perf:OK
$5/4 attach_probe/manual-link:OK
$5/5 attach_probe/auto:OK
$5/6 attach_probe/kprobe-sleepable:OK
$5/7 attach_probe/uprobe-lib:OK
$5/8 attach_probe/uprobe-sleepable:OK
$5/9 attach_probe/uprobe-ref_ctr:OK
$5 attach_probe:OK
Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Biao Jiang <benbjiang@tencent.com> Link: https://lore.kernel.org/bpf/20230306064833.7932-4-imagedong@tencent.com
2023-03-06selftests/bpf: Split test_attach_probe into multi subtestsMenglong Dong
To accommodate older kernels, split the "attach_probe" test into multiple subtests:
manual // manual attach tests for kprobe/uprobe
auto // auto-attach tests for kprobe and uprobe
kprobe-sleepable // kprobe sleepable test
uprobe-lib // uprobe tests for library function by name
uprobe-sleepable // uprobe sleepable test
uprobe-ref_ctr // uprobe ref_ctr test
Because a sleepable kprobe needs the BPF_F_SLEEPABLE flag set before loading, move it to a standalone skel file, so that a kernel that does not support it no longer makes the whole load fail. This way, a subset of the subtests can still be enabled on older kernels.
Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Biao Jiang <benbjiang@tencent.com> Link: https://lore.kernel.org/bpf/20230306064833.7932-3-imagedong@tencent.com
2023-03-04selftests/bpf: adjust log_fixup's buffer size for proper truncationAndrii Nakryiko
Adjust log_fixup's expected buffer length to fix the test. It's pretty finicky in its length expectation, but it doesn't break often. So just adjust the length to work on current kernel and with follow up iterator changes as well. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04selftests/bpf: enhance align selftest's expected log matchingAndrii Nakryiko
Allow to search for expected register state in all the verifier log output that's related to specified instruction number. See added comment for an example of possible situation that is happening due to a simple enhancement done in the next patch, which fixes handling of env->test_state_freq flag in state checkpointing logic. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230302235015.2044271-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()Eduard Zingerman
Function verifier.c:convert_ctx_access() applies some rewrites to BPF instructions that read or write BPF program context. This commit adds machinery to allow test cases that inspect BPF program after these rewrites are applied. An example of a test case: { // Shorthand for field offset and size specification N(CGROUP_SOCKOPT, struct bpf_sockopt, retval), // Pattern generated for field read .read = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::current_task);" "$dst = *(u64 *)($dst + task_struct::bpf_ctx);" "$dst = *(u32 *)($dst + bpf_cg_run_ctx::retval);", // Pattern generated for field write .write = "*(u64 *)($ctx + bpf_sockopt_kern::tmp_reg) = r9;" "r9 = *(u64 *)($ctx + bpf_sockopt_kern::current_task);" "r9 = *(u64 *)(r9 + task_struct::bpf_ctx);" "*(u32 *)(r9 + bpf_cg_run_ctx::retval) = $src;" "r9 = *(u64 *)($ctx + bpf_sockopt_kern::tmp_reg);" , }, For each test case, up to three programs are created: - One that uses BPF_LDX_MEM to read the context field. - One that uses BPF_STX_MEM to write to the context field. - One that uses BPF_ST_MEM to write to the context field. The disassembly of each program is compared with the pattern specified in the test case. Kernel code for disassembly is reused (as is in the bpftool). To keep Makefile changes to the minimum, symbolic links to `kernel/bpf/disasm.c` and `kernel/bpf/disasm.h ` are added. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230304011247.566040-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03selftests/bpf: test if pointer type is tracked for BPF_ST_MEMEduard Zingerman
Check that verifier tracks pointer types for BPF_ST_MEM instructions and reports error if pointer types do not match for different execution branches. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230304011247.566040-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03bpf: allow ctx writes using BPF_ST_MEM instructionEduard Zingerman
Lift verifier restriction to use BPF_ST_MEM instructions to write to context data structures. This requires the following changes: - verifier.c:do_check() for BPF_ST updated to: - no longer forbid writes to registers of type PTR_TO_CTX; - track dst_reg type in the env->insn_aux_data[...].ptr_type field (same way it is done for BPF_STX and BPF_LDX instructions). - verifier.c:convert_ctx_access() and various callbacks invoked by it are updated to handle BPF_ST instruction alongside BPF_STX. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03bpf: Refactor RCU enforcement in the verifier.Alexei Starovoitov
bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack of such a key mechanism makes it impossible for sleepable bpf programs to use RCU pointers. Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't support btf_type_tag yet) and allowlist certain field dereferences in important data structures like task_struct, cgroup, socket that are used by sleepable programs either as RCU pointer or full trusted pointer (which is valid outside of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such tagging. They will be removed once GCC supports btf_type_tag. With that refactor check_ptr_to_btf_access(). Make it strict in enforcing PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without modifier flags. There is a chance that this strict enforcement might break existing programs (especially on GCC compiled kernels), but this cleanup has to start sooner rather than later. Note PTR_TO_CTX access still yields old deprecated PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0. Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
2023-03-03selftests/bpf: Tweak cgroup kfunc test.Alexei Starovoitov
Adjust cgroup kfunc test to dereference RCU protected cgroup pointer as PTR_TRUSTED and pass into KF_TRUSTED_ARGS kfunc. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-6-alexei.starovoitov@gmail.com
2023-03-03selftests/bpf: Add a test case for kptr_rcu.Alexei Starovoitov
Tweak existing map_kptr test to check kptr_rcu. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-5-alexei.starovoitov@gmail.com
2023-03-03bpf: Introduce kptr_rcu.Alexei Starovoitov
The lifetime of certain kernel structures like 'struct cgroup' is protected by RCU. Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps. The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU. Dereference of other kptr-s returns PTR_UNTRUSTED. For example:
    struct map_value {
        struct cgroup __kptr *cgrp;
    };

    SEC("tp_btf/cgroup_mkdir")
    int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
    {
        struct cgroup *cg, *cg2;

        cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
        bpf_kptr_xchg(&v->cgrp, cg);

        cg2 = v->cgrp; // This is new feature introduced by this patch.
        // cg2 is PTR_MAYBE_NULL | MEM_RCU.
        // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero

        if (cg2)
            bpf_cgroup_ancestor(cg2, level); // safe to do.
    }
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
2023-03-03bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.Alexei Starovoitov
__kptr meant to store PTR_UNTRUSTED kernel pointers inside bpf maps. The concept felt useful, but didn't get much traction, since bpf_rdonly_cast() was added soon after and bpf programs received a simpler way to access PTR_UNTRUSTED kernel pointers without going through restrictive __kptr usage. Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate its intended usage. The main goal of __kptr_untrusted was to read/write such pointers directly while bpf_kptr_xchg was a mechanism to access refcnted kernel pointers. The next patch will allow RCU protected __kptr access with direct read. At that point __kptr_untrusted will be deprecated. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com
2023-03-02selftests/bpf: Add absolute timer testTero Kristo
Add test for the absolute BPF timer under the existing timer tests. This will run the timer two times with 1us expiration time, and then re-arm the timer at ~35s in the future. At the end, it is verified that the absolute timer expired exactly two times. Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Link: https://lore.kernel.org/r/20230302114614.2985072-3-tero.kristo@linux.intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-02selftests/bpf: Add -Wuninitialized flag to bpf prog flagsDave Marchevsky
Per C99 standard [0], Section 6.7.8, Paragraph 10:

    If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

And in the same document, in appendix "J.2 Undefined behavior":

    The behavior is undefined in the following circumstances:
    [...]
    The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8).

This means that use of an uninitialized stack variable is undefined behavior, and therefore that clang can choose to do a variety of scary things, such as not generating bytecode for "bunch of useful code" in the below example:

    void some_func()
    {
        int i;
        if (!i)
            return;
        // bunch of useful code
    }

To add insult to injury, if some_func above is a helper function for some BPF program, clang can choose to not generate an "exit" insn, causing verifier to fail with "last insn is not an exit or jmp". Going from that verification failure to the root cause of uninitialized use is certain to be frustrating.

This patch adds -Wuninitialized to the cflags for selftest BPF progs and fixes up existing instances of uninitialized use.

[0]: https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: David Vernet <void@manifault.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230303005500.1614874-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
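A minimal sketch of the bug class this flag targets, in plain C rather than a BPF program. The helper name count_nonzero is hypothetical and does not come from the selftests. If the accumulator below were declared as a bare "int count;", the first count++ would read an indeterminate value, which is the kind of use that compilers typically report under -Wuninitialized. The version shown is the fixed form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only. */
static int count_nonzero(const int *vals, size_t n)
{
	/* Without "= 0" the increment below would use an indeterminate value. */
	int count = 0;
	size_t i;

	assert(vals != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (vals[i])
			count++;
	}
	return count;
}
```

Initializing at the point of declaration keeps the fix local and does not change behavior on any path that previously assigned the variable before use.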
2023-03-01selftests/bpf: Support custom per-test flags and multiple expected messagesAndrii Nakryiko
Extend __flag attribute by allowing to specify one of the following:
* BPF_F_STRICT_ALIGNMENT
* BPF_F_ANY_ALIGNMENT
* BPF_F_TEST_RND_HI32
* BPF_F_TEST_STATE_FREQ
* BPF_F_SLEEPABLE
* BPF_F_XDP_HAS_FRAGS
* Some numeric value

Extend __msg attribute by allowing to specify multiple expected messages. All messages are expected to be present in the verifier log in the order of application.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301175417.3146070-2-eddyz87@gmail.com
[ Eduard: added commit message, formatting, comments ]
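The "all messages present, in order" rule can be sketched as a sequential substring search. This is only one plausible reading of the description above, not the selftest loader's actual code. The function name log_has_msgs_in_order and the sample log text used to exercise it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if every string in msgs[] occurs in log,
 * each one found after the end of the previous match; returns 0 otherwise.
 */
static int log_has_msgs_in_order(const char *log, const char *const *msgs, size_t n)
{
	const char *pos = log;
	size_t i;

	assert(log != NULL);
	for (i = 0; i < n; i++) {
		const char *hit = strstr(pos, msgs[i]);

		if (!hit)
			return 0;
		/* Continue searching after this match so that order matters. */
		pos = hit + strlen(msgs[i]);
	}
	return 1;
}
```

Advancing the search position after each hit is what turns a plain "contains all" check into an ordered one: the same messages supplied in the opposite order would not match.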
2023-03-01selftests/bpf: Set __BITS_PER_LONG if target is bpf for LoongArchTiezhu Yang
If the target is bpf, __loongarch__ is not defined, so __BITS_PER_LONG defaults to 32 and __NR_nanosleep is not defined:

    #if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32
    #define __NR_nanosleep 101
    __SC_3264(__NR_nanosleep, sys_nanosleep_time32, sys_nanosleep)
    #endif

Work around this problem by explicitly setting __BITS_PER_LONG to __loongarch_grlen, which is defined by the compiler as 64 for LA64.

This is similar to commit 36e70b9b06bf ("selftests, bpf: Fix broken riscv build").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1677585781-21628-1-git-send-email-yangtiezhu@loongson.cn
2023-03-01selftests/bpf: Add more tests for kptrs in mapsKumar Kartikeya Dwivedi
Firstly, ensure programs successfully load when using all of the supported maps. Then, extend existing tests to test more cases at runtime. We are currently testing both the synchronous freeing of items and asynchronous destruction when map is freed, but the code needs to be adjusted a bit to be able to also accommodate support for percpu maps. We now do a delete on the item (and update for array maps, which has a similar effect for kptrs) to perform a synchronous free of the kptr, and test destruction both for the synchronous and asynchronous deletion. Next time the program runs, it should observe the refcount as 1 since all existing references should have been released by then. By running the program after both possible paths freeing kptrs, we establish that they correctly release resources. Next, we augment the existing test to also test the same code path shared by all local storage maps using a task local storage map. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230225154010.391965-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01selftests/bpf: tests for using dynptrs to parse skb and xdp buffersJoanne Koong
Test skb and xdp dynptr functionality in the following ways: 1) progs/test_cls_redirect_dynptr.c * Rewrite "progs/test_cls_redirect.c" test to use dynptrs to parse skb data * This is a great example of how dynptrs can be used to simplify a lot of the parsing logic for non-statically known values. When measuring the user + system time between the original version vs. using dynptrs, and averaging the time for 10 runs (using "time ./test_progs -t cls_redirect"): original version: 0.092 sec with dynptrs: 0.078 sec 2) progs/test_xdp_dynptr.c * Rewrite "progs/test_xdp.c" test to use dynptrs to parse xdp data When measuring the user + system time between the original version vs. using dynptrs, and averaging the time for 10 runs (using "time ./test_progs -t xdp_attach"): original version: 0.118 sec with dynptrs: 0.094 sec 3) progs/test_l4lb_noinline_dynptr.c * Rewrite "progs/test_l4lb_noinline.c" test to use dynptrs to parse skb data When measuring the user + system time between the original version vs. using dynptrs, and averaging the time for 10 runs (using "time ./test_progs -t l4lb_all"): original version: 0.062 sec with dynptrs: 0.081 sec For number of processed verifier instructions: original version: 6268 insns with dynptrs: 2588 insns 4) progs/test_parse_tcp_hdr_opt_dynptr.c * Add sample code for parsing tcp hdr opt lookup using dynptrs. This logic is lifted from a real-world use case of packet parsing in katran [0], a layer 4 load balancer. The original version "progs/test_parse_tcp_hdr_opt.c" (not using dynptrs) is included here as well, for comparison. When measuring the user + system time between the original version vs. using dynptrs, and averaging the time for 10 runs (using "time ./test_progs -t parse_tcp_hdr_opt"): original version: 0.031 sec with dynptrs: 0.045 sec 5) progs/dynptr_success.c * Add test case "test_skb_readonly" for testing attempts at writes on a prog type with read-only skb ctx. 
* Add "test_dynptr_skb_data" for testing that bpf_dynptr_data isn't supported for skb progs. 6) progs/dynptr_fail.c * Add test cases "skb_invalid_data_slice{1,2,3,4}" and "xdp_invalid_data_slice{1,2}" for testing that helpers that modify the underlying packet buffer automatically invalidate the associated data slice. * Add test cases "skb_invalid_ctx" and "xdp_invalid_ctx" for testing that prog types that do not support bpf_dynptr_from_skb/xdp don't have access to the API. * Add test case "dynptr_slice_var_len{1,2}" for testing that variable-sized len can't be passed in to bpf_dynptr_slice * Add test case "skb_invalid_slice_write" for testing that writes to a read-only data slice are rejected by the verifier. * Add test case "data_slice_out_of_bounds_skb" for testing that writes to an area outside the slice are rejected. * Add test case "invalid_slice_rdwr_rdonly" for testing that prog types that don't allow writes to packet data don't accept any calls to bpf_dynptr_slice_rdwr. [0] https://github.com/facebookincubator/katran/blob/main/katran/lib/bpf/pckt_parsing.h Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230301154953.641654-11-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-27Merge tag 'net-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and netfilter. The notable fixes here are the EEE fix which restores boot for many embedded platforms (real and QEMU); WiFi warning suppression and the ICE Kconfig cleanup. Current release - regressions: - phy: multiple fixes for EEE rework - wifi: wext: warn about usage only once - wifi: ath11k: allow system suspend to survive ath11k Current release - new code bugs: - mlx5: Fix memory leak in IPsec RoCE creation - ibmvnic: assign XPS map to correct queue index Previous releases - regressions: - netfilter: ip6t_rpfilter: Fix regression with VRF interfaces - netfilter: ctnetlink: make event listener tracking global - nf_tables: allow to fetch set elements when table has an owner - mlx5: - fix skb leak while fifo resync and push - fix possible ptp queue fifo use-after-free Previous releases - always broken: - sched: fix action bind logic - ptp: vclock: use mutex to fix "sleep on atomic" bug if driver also uses a mutex - netfilter: conntrack: fix rmmod double-free race - netfilter: xt_length: use skb len to match in length_mt6, avoid issues with BIG TCP Misc: - ice: remove unnecessary CONFIG_ICE_GNSS - mlx5e: remove hairpin write debugfs files - sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy" * tag 'net-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) tcp: tcp_check_req() can be called from process context net: phy: c45: fix network interface initialization failures on xtensa, arm:cubieboard xen-netback: remove unused variables pending_idx and index net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy net: dsa: ocelot_ext: remove unnecessary phylink.h include net: mscc: ocelot: fix duplicate driver name error net: dsa: felix: fix internal MDIO controller resource length net: dsa: seville: ignore mscc-miim read errors from Lynx PCS net/sched: act_sample: fix action bind logic 
net/sched: act_mpls: fix action bind logic net/sched: act_pedit: fix action bind logic wifi: wext: warn about usage only once wifi: mt76: usb: fix use-after-free in mt76u_free_rx_queue qede: avoid uninitialized entries in coal_entry array nfc: fix memory leak of se_io context in nfc_genl_se_io ice: remove unnecessary CONFIG_ICE_GNSS net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy() ibmvnic: Assign XPS map to correct queue index docs: net: fix inaccuracies in msg_zerocopy.rst tools: net: add __pycache__ to gitignore ...
2023-02-27selftests/bpf: Fix compilation errors: Assign a value to a constantRong Tao
Commit bc292ab00f6c ("mm: introduce vma->vm_flags wrapper functions") turned vm_flags into a const variable. The bpf_find_vma test added in commit f108662b27c9 ("selftests/bpf: Add tests for bpf_find_vma") assigns to that now-const member in find_vma_fail1.c, which is a compile error and therefore never exercises the BPF verifier. It is better to replace 'const vm_flags_t vm_flags' with 'unsigned long vm_start' for testing.

    $ make -C tools/testing/selftests/bpf/ -j8
    ...
    progs/find_vma_fail1.c:16:16: error: cannot assign to non-static data member 'vm_flags' with const-qualified type 'const vm_flags_t' (aka 'const unsigned long')
            vma->vm_flags |= 0x55;
            ~~~~~~~~~~~~~ ^
    ../tools/testing/selftests/bpf/tools/include/vmlinux.h:1898:20: note: non-static data member 'vm_flags' declared const here
            const vm_flags_t vm_flags;
            ~~~~~~~~~~~`~~~~~~^~~~~~~~

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_CB281722B3C1BD504C16CDE586CACC2BE706@qq.com
2023-02-27selftests/bpf: Use __NR_prlimit64 instead of __NR_getrlimit in user_ringbuf testTiezhu Yang
After commit 80d7da1cac62 ("asm-generic: Drop getrlimit and setrlimit syscalls from default list"), new architectures won't need to include getrlimit and setrlimit; they are superseded by prlimit64.

In order to maintain compatibility for the new architectures, such as LoongArch which does not define __NR_getrlimit, it is better to use __NR_prlimit64 instead of __NR_getrlimit in user_ringbuf test to fix the following build error:

    TEST-OBJ [test_progs] user_ringbuf.test.o
    tools/testing/selftests/bpf/prog_tests/user_ringbuf.c: In function 'kick_kernel_cb':
    tools/testing/selftests/bpf/prog_tests/user_ringbuf.c:593:17: error: '__NR_getrlimit' undeclared (first use in this function)
      593 | syscall(__NR_getrlimit);
          | ^~~~~~~~~~~~~~
    tools/testing/selftests/bpf/prog_tests/user_ringbuf.c:593:17: note: each undeclared identifier is reported only once for each function it appears in
    make: *** [Makefile:573: tools/testing/selftests/bpf/user_ringbuf.test.o] Error 1
    make: Leaving directory 'tools/testing/selftests/bpf'

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1677235015-21717-4-git-send-email-yangtiezhu@loongson.cn
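For readers unfamiliar with prlimit64, here is a hedged sketch of invoking it directly through syscall(2) on Linux to query a limit. It is not the code from user_ringbuf.c, which the message above only describes as switching syscall numbers. The struct limits64 and the function read_nofile_limit are hypothetical names. The struct mirrors the two 64-bit fields the kernel fills in and is declared locally so the example does not depend on libc feature-test macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical local mirror of the kernel's 64-bit rlimit layout. */
struct limits64 {
	uint64_t cur; /* soft limit */
	uint64_t max; /* hard limit */
};

/* Query RLIMIT_NOFILE for the calling process; returns 0 on success. */
static int read_nofile_limit(struct limits64 *out)
{
	assert(out != NULL);
	/* pid 0 means the calling process; a NULL new limit means query only. */
	return (int)syscall(__NR_prlimit64, 0, RLIMIT_NOFILE, NULL, out);
}
```

Because prlimit64 takes both an optional new limit and an optional old limit, it covers the roles of getrlimit and setrlimit.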
2023-02-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogeneous systems. Non secure software has no practical need to traverse the caches by set/way in the first place - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company) - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Sort out confusion between virtual and physical addresses, which currently are the same on s390 - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecturally legal but unlikely to happen in practice - Mark
APIC timer as expired if it's in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlightened MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs.
Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits) KVM: SVM: hyper-v: placate modpost section mismatch error KVM: x86/mmu: Make tdp_mmu_allowed static KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table ...
2023-02-25Merge tag 'powerpc-6.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Support for configuring secure boot with user-defined keys on PowerVM LPARs - Simplify the replay of soft-masked IRQs by making it non-recursive - Add support for KCSAN on 64-bit Book3S - Improvements to the API & code which interacts with RTAS (pseries firmware) - Change 32-bit powermac to assign PCI bus numbers per domain by default - Some improvements to the 32-bit BPF JIT - Various other small features and fixes Thanks to Anders Roxell, Andrew Donnellan, Andrew Jeffery, Benjamin Gray, Christophe Leroy, Frederic Barrat, Ganesh Goudar, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Josh Poimboeuf, Kajol Jain, Laurent Dufour, Mahesh Salgaonkar, Mathieu Desnoyers, Mimi Zohar, Murphy Zhou, Nathan Chancellor, Nathan Lynch, Nayna Jain, Nicholas Piggin, Pali Rohár, Petr Mladek, Rohan McLure, Russell Currey, Sachin Sant, Sathvika Vasireddy, Sourabh Jain, Stefan Berger, Stephen Rothwell, and Sudhakar Kuppusamy. 
* tag 'powerpc-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (114 commits) powerpc/pseries: Avoid hcall in plpks_is_available() on non-pseries powerpc: dts: turris1x.dts: Set lower priority for CPLD syscon-reboot powerpc/e500: Add missing prototype for 'relocate_init' powerpc/64: Fix unannotated intra-function call warning powerpc/epapr: Don't use wrteei on non booke powerpc: Pass correct CPU reference to assembler powerpc/mm: Rearrange if-else block to avoid clang warning powerpc/nohash: Fix build with llvm-as powerpc/nohash: Fix build error with binutils >= 2.38 powerpc/pseries: Fix endianness issue when parsing PLPKS secvar flags macintosh: windfarm: Use unsigned type for 1-bit bitfields powerpc/kexec_file: print error string on usable memory property update failure powerpc/machdep: warn when machine_is() used too early powerpc/64: Replace -mcpu=e500mc64 by -mcpu=e5500 powerpc/eeh: Set channel state after notifying the drivers selftests/powerpc: Fix incorrect kernel headers search path powerpc/rtas: arch-wide function token lookup conversions powerpc/rtas: introduce rtas_function_token() API powerpc/pseries/lpar: convert to papr_sysparm API powerpc/pseries/hv-24x7: convert to papr_sysparm API ...
2023-02-24selftests/bpf: run mptcp in a dedicated netnsHangbin Liu
The current mptcp test is run in init netns. If the user or the default system config has disabled mptcp, the test fails. Run the mptcp test in a dedicated netns so that it is not affected by mptcp settings that differ from the kernel defaults. Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20230224061343.506571-3-liuhangbin@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-24selftests/bpf: move SYS() macro into the test_progs.hHangbin Liu
Many tests define their own SYS() macro to run commands via system() and jump to a goto label on failure. Move this macro to test_progs.h and add a configurable "goto_label" as the first arg. Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20230224061343.506571-2-liuhangbin@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
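A simplified sketch of what a SYS()-style macro with a configurable goto label might look like. This is a stand-in written for illustration and is not the definition from test_progs.h. The buffer size, the plain system() return check and the function run_two_steps are all assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a SYS()-style macro: format a shell command,
 * run it with system(), and jump to the given label if it fails.
 * ##__VA_ARGS__ is a GNU extension accepted by gcc and clang.
 */
#define SYS(goto_label, fmt, ...)                                  \
	do {                                                       \
		char cmd_[256];                                    \
		snprintf(cmd_, sizeof(cmd_), fmt, ##__VA_ARGS__);  \
		if (system(cmd_) != 0)                             \
			goto goto_label;                           \
	} while (0)

/* Hypothetical caller: run two commands, bail out on the first failure. */
static int run_two_steps(const char *first, const char *second)
{
	int err = -1;

	assert(first != NULL && second != NULL);
	SYS(out, "%s", first);
	SYS(out, "%s", second);
	err = 0;
out:
	return err;
}
```

Passing the label as the first argument lets each caller choose its own cleanup target instead of relying on a label name hard-coded inside the macro.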
2023-02-24Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Some polishing and small fixes for iommufd: - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem - Use GFP_KERNEL_ACCOUNT inside the iommu_domains - Support VFIO_NOIOMMU mode with iommufd - Various typos - A list corruption bug if HWPTs are used for attach" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Do not add the same hwpt to the ioas->hwpt_list twice iommufd: Make sure to zero vfio_iommu_type1_info before copying to user vfio: Support VFIO_NOIOMMU with iommufd iommufd: Add three missing structures in ucmd_buffer selftests: iommu: Fix test_cmd_destroy_access() call in user_copy iommu: Remove IOMMU_CAP_INTR_REMAP irq/s390: Add arch_is_isolated_msi() for s390 iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code iommufd: Convert to msi_device_has_isolated_msi() vfio/type1: Convert to iommu_group_has_isolated_msi() iommu: Add iommu_group_has_isolated_msi() genirq/msi: Add msi_device_has_isolated_msi()
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". 
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in this series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-23Merge tag 'probes-v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull kprobes updates from Masami Hiramatsu: - Skip negative return code check for snprintf in eprobe - Add recursive call test cases for kprobe unit test - Add 'char' type to probe events to show it as the character instead of value - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols - Fix kselftest to check filter on eprobe event correctly - Add filter on eprobe to the README file in tracefs - Fix optprobes to check whether there is 'under unoptimizing' optprobe when optimizing another kprobe correctly - Fix optprobe to check whether there is 'under unoptimizing' optprobe when fetching the original instruction correctly - Fix optprobe to free 'forcibly unoptimized' optprobe correctly * tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/eprobe: no need to check for negative ret value for snprintf test_kprobes: Add recursed kprobe test case tracing/probe: add a char type to show the character value of traced arguments selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols selftests/ftrace: Fix eprobe syntax test case to check filter support tracing/eprobe: Fix to add filter on eprobe description in README file x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range x86/kprobes: Fix __recover_optprobed_insn check optimizing logic kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
2023-02-23Merge tag 'trace-v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Add function names as a way to filter function addresses - Add sample module to test ftrace ops and dynamic trampolines - Allow stack traces to be passed from beginning event to end event for synthetic events. This will allow seeing the stack trace of when a task is scheduled out and recorded when it gets scheduled back in. - Add trace event helper __get_buf() to use as a temporary buffer when printing out trace event output. - Add kernel command line to create trace instances on boot up. - Add enabling of events to instances created at boot up. - Add trace_array_puts() to write into instances. - Allow boot instances to take a snapshot at the end of boot up. - Allow live patch modules to include trace events - Minor fixes and clean ups * tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits) tracing: Remove unnecessary NULL assignment tracepoint: Allow livepatch module add trace event tracing: Always use canonical ftrace path tracing/histogram: Fix stacktrace histogram Documententation tracing/histogram: Fix stacktrace key tracing/histogram: Fix a few problems with stacktrace variable printing tracing: Add BUILD_BUG() to make sure stacktrace fits in strings tracing/histogram: Don't use strlen to find length of stacktrace variables tracing: Allow boot instances to have snapshot buffers tracing: Add trace_array_puts() to write into instance tracing: Add enabling of events to boot instances tracing: Add creation of instances at boot command line tracing: Fix trace_event_raw_event_synth() if else statement samples: ftrace: Make some global variables static ftrace: sample: avoid open-coded 64-bit division samples: ftrace: Include the nospec-branch.h only for x86 tracing: Acquire buffer from temparary trace sequence tracing/histogram: Wrap remaining shell snippets in code blocks tracing/osnoise: No need for 
schedule_hrtimeout range bpf/tracing: Use stage6 of tracing to not duplicate macros ...
2023-02-23Merge tag 'linux-kselftest-next-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest update from Shuah Khan: - several patches to fix incorrect kernel headers search path from Mathieu Desnoyers - a few follow-on fixes found during testing the above change - miscellaneous fixes - support for filtering and enumerating tests * tag 'linux-kselftest-next-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (40 commits) selftests/user_events: add a note about user_events.h dependency selftests/mount_setattr: fix to make run_tests failure selftests/mount_setattr: fix redefine struct mount_attr build error selftests/sched: fix warn_unused_result build warns selftests/ptp: Remove clean target from Makefile selftests: use printf instead of echo -ne selftests/ftrace: Fix bash specific "==" operator selftests: tpm2: remove redundant ord() selftests: find echo binary to use -ne options selftests: Fix spelling mistake "allright" -> "all right" selftests: tdx: Use installed kernel headers search path selftests: ptrace: Use installed kernel headers search path selftests: memfd: Use installed kernel headers search path selftests: iommu: Use installed kernel headers search path selftests: x86: Fix incorrect kernel headers search path selftests: vm: Fix incorrect kernel headers search path selftests: user_events: Fix incorrect kernel headers search path selftests: sync: Fix incorrect kernel headers search path selftests: seccomp: Fix incorrect kernel headers search path selftests: sched: Fix incorrect kernel headers search path ...