summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-21selftests/bpf: Add extra link to uprobe_multi testsJiri Olsa
Attaching extra program to same functions system wide for api and link tests. This way we can test the pid filter works properly when there's extra system wide consumer on the same uprobe that will trigger the original uprobe handler. We expect to have the same counts as before. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-29-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Add uprobe_multi pid filter testsJiri Olsa
Running api and link tests also with pid filter and checking the probe gets executed only for specific pid. Spawning extra process to trigger attached uprobes and checking we get correct counts from executed programs. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-28-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Add uprobe_multi cookie testJiri Olsa
Adding test for cookies setup/retrieval in uprobe_link uprobes and making sure bpf_get_attach_cookie works properly. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-27-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Add uprobe_multi usdt bench testJiri Olsa
Adding test that attaches 50k usdt probes in usdt_multi binary. After the attach is done we run the binary and make sure we get proper amount of hits. With current uprobes: # perf stat --null ./test_progs -n 254/6 #254/6 uprobe_multi_test/bench_usdt:OK #254 uprobe_multi_test:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Performance counter stats for './test_progs -n 254/6': 1353.659680562 seconds time elapsed With uprobe_multi link: # perf stat --null ./test_progs -n 254/6 #254/6 uprobe_multi_test/bench_usdt:OK #254 uprobe_multi_test:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Performance counter stats for './test_progs -n 254/6': 0.322046364 seconds time elapsed Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-26-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Add uprobe_multi usdt test codeJiri Olsa
Adding code in uprobe_multi test binary that defines 50k usdts and will serve as attach point for uprobe_multi usdt bench test in following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-25-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Add uprobe_multi bench testJiri Olsa
Adding test that attaches 50k uprobes in uprobe_multi binary. After the attach is done we run the binary and make sure we get proper amount of hits. The resulting attach/detach times on my setup: test_bench_attach_uprobe:PASS:uprobe_multi__open 0 nsec test_bench_attach_uprobe:PASS:uprobe_multi__attach 0 nsec test_bench_attach_uprobe:PASS:uprobes_count 0 nsec test_bench_attach_uprobe: attached in 0.346s test_bench_attach_uprobe: detached in 0.419s #262/5 uprobe_multi_test/bench_uprobe:OK Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-24-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Add uprobe_multi test programJiri Olsa
Adding uprobe_multi test program that defines 50k uprobe_multi_func_* functions and will serve as attach point for uprobe_multi bench test in following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-23-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Add uprobe_multi link testJiri Olsa
Adding uprobe_multi test for bpf_link_create attach function. Testing attachment using the struct bpf_link_create_opts. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-22-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Add uprobe_multi api testJiri Olsa
Adding uprobe_multi test for bpf_program__attach_uprobe_multi attach function. Testing attachment using glob patterns and via bpf_uprobe_multi_opts paths/syms fields. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-21-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Add uprobe_multi skel testJiri Olsa
Adding uprobe_multi test for skeleton load/attach functions, to test skeleton auto attach for uprobe_multi link. Test that bpf_get_func_ip works properly for uprobe_multi attachment. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-20-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21selftests/bpf: Move get_time_ns to testing_helpers.hJiri Olsa
We'd like to have single copy of get_time_ns used b bench and test_progs, but we can't just include bench.h, because of conflicting 'struct env' objects. Moving get_time_ns to testing_helpers.h which is being included by both bench and test_progs objects. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-19-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add uprobe multi link support to bpf_program__attach_usdtJiri Olsa
Adding support for usdt_manager_attach_usdt to use uprobe_multi link to attach to usdt probes. The uprobe_multi support is detected before the usdt program is loaded and its expected_attach_type is set accordingly. If uprobe_multi support is detected the usdt_manager_attach_usdt gathers uprobes info and calls bpf_program__attach_uprobe to create all needed uprobes. If uprobe_multi support is not detected the old behaviour stays. Also adding usdt.s program section for sleepable usdt probes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-18-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add uprobe multi link detectionJiri Olsa
Adding uprobe-multi link detection. It will be used later in bpf_program__attach_usdt function to check and use uprobe_multi link over standard uprobe links. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-17-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add support for u[ret]probe.multi[.s] program sectionsJiri Olsa
Adding support for several uprobe_multi program sections to allow auto attach of multi_uprobe programs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-16-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add bpf_program__attach_uprobe_multi functionJiri Olsa
Adding bpf_program__attach_uprobe_multi function that allows to attach multiple uprobes with uprobe_multi link. The user can specify uprobes with direct arguments: binary_path/func_pattern/pid or with struct bpf_uprobe_multi_opts opts argument fields: const char **syms; const unsigned long *offsets; const unsigned long *ref_ctr_offsets; const __u64 *cookies; User can specify 2 mutually exclusive set of inputs: 1) use only path/func_pattern/pid arguments 2) use path/pid with allowed combinations of: syms/offsets/ref_ctr_offsets/cookies/cnt - syms and offsets are mutually exclusive - ref_ctr_offsets and cookies are optional Any other usage results in error. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-15-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add bpf_link_create support for multi uprobesJiri Olsa
Adding new uprobe_multi struct to bpf_link_create_opts object to pass multiple uprobe data to link_create attr uapi. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-14-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add elf_resolve_pattern_offsets functionJiri Olsa
Adding elf_resolve_pattern_offsets function that looks up offsets for symbols specified by pattern argument. The 'pattern' argument allows wildcards (*?' supported). Offsets are returned in allocated array together with its size and needs to be released by the caller. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-13-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add elf_resolve_syms_offsets functionJiri Olsa
Adding elf_resolve_syms_offsets function that looks up offsets for symbols specified in syms array argument. Offsets are returned in allocated array with the 'cnt' size, that needs to be released by the caller. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-12-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add elf symbol iteratorJiri Olsa
Adding elf symbol iterator object (and some functions) that follow open-coded iterator pattern and some functions to ease up iterating elf object symbols. The idea is to iterate single symbol section with: struct elf_sym_iter iter; struct elf_sym *sym; if (elf_sym_iter_new(&iter, elf, binary_path, SHT_DYNSYM)) goto error; while ((sym = elf_sym_iter_next(&iter))) { ... } I considered opening the elf inside the iterator and iterate all symbol sections, but then it gets more complicated wrt user checks for when the next section is processed. Plus side is the we don't need 'exit' function, because caller/user is in charge of that. The returned iterated symbol object from elf_sym_iter_next function is placed inside the struct elf_sym_iter, so no extra allocation or argument is needed. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-11-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add elf_open/elf_close functionsJiri Olsa
Adding elf_open/elf_close functions and using it in elf_find_func_offset_from_file function. It will be used in following changes to save some common code. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-10-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Move elf_find_func_offset* functions to elf objectJiri Olsa
Adding new elf object that will contain elf related functions. There's no functional change. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-9-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21libbpf: Add uprobe_multi attach type and link namesJiri Olsa
Adding new uprobe_multi attach type and link names, so the functions can resolve the new values. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230809083440.3209381-8-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21bpf: Add bpf_get_func_ip helper support for uprobe linkJiri Olsa
Adding support for bpf_get_func_ip helper being called from ebpf program attached by uprobe_multi link. It returns the ip of the uprobe. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-7-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21bpf: Add pid filter support for uprobe_multi linkJiri Olsa
Adding support to specify pid for uprobe_multi link and the uprobes are created only for task with given pid value. Using the consumer.filter filter callback for that, so the task gets filtered during the uprobe installation. We still need to check the task during runtime in the uprobe handler, because the handler could get executed if there's another system wide consumer on the same uprobe (thanks Oleg for the insight). Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21bpf: Add cookies support for uprobe_multi linkJiri Olsa
Adding support to specify cookies array for uprobe_multi link. The cookies array share indexes and length with other uprobe_multi arrays (offsets/ref_ctr_offsets). The cookies[i] value defines cookie for i-the uprobe and will be returned by bpf_get_attach_cookie helper when called from ebpf program hooked to that specific uprobe. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21bpf: Add multi uprobe linkJiri Olsa
Adding new multi uprobe link that allows to attach bpf program to multiple uprobes. Uprobes to attach are specified via new link_create uprobe_multi union: struct { __aligned_u64 path; __aligned_u64 offsets; __aligned_u64 ref_ctr_offsets; __u32 cnt; __u32 flags; } uprobe_multi; Uprobes are defined for single binary specified in path and multiple calling sites specified in offsets array with optional reference counters specified in ref_ctr_offsets array. All specified arrays have length of 'cnt'. The 'flags' supports single bit for now that marks the uprobe as return probe. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21bpf: Add attach_type checks under bpf_prog_attach_check_attach_typeJiri Olsa
Add extra attach_type checks from link_create under bpf_prog_attach_check_attach_type. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enumJiri Olsa
Switching BPF_F_KPROBE_MULTI_RETURN macro to anonymous enum, so it'd show up in vmlinux.h. There's not functional change compared to having this as macro. Acked-by: Yafang Shao <laoar.shao@gmail.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21Merge branch 'samples-bpf-make-bpf-programs-more-libbpf-aware'Alexei Starovoitov
Daniel T. Lee says: ==================== samples/bpf: make BPF programs more libbpf aware The existing tracing programs have been developed for a considerable period of time and, as a result, do not properly incorporate the features of the current libbpf, such as CO-RE. This is evident in frequent usage of functions like PT_REGS* and the persistence of "hack" methods using underscore-style bpf_probe_read_kernel from the past. These programs are far behind the current level of libbpf and can potentially confuse users. The kernel has undergone significant changes, and some of these changes have broken these programs, but on the other hand, more robust APIs have been developed for increased stableness. To list some of the kernel changes that this patch set is focusing on, - symbol mismatch occurs due to compiler optimization [1] - inline of blk_account_io* breaks BPF kprobe program [2] - new tracepoints for the block_io_start/done are introduced [3] - map lookup probes can't be triggered (bpf_disable_instrumentation)[4] - BPF_KSYSCALL has been introduced to simplify argument fetching [5] - convert to vmlinux.h and use tp argument structure within it - make tracing programs to be more CO-RE centric In this regard, this patch set aims not only to integrate the latest features of libbpf into BPF programs but also to reduce confusion and clarify the BPF programs. This will help with the potential confusion among users and make the programs more intutitive. [1]: https://github.com/iovisor/bcc/issues/1754 [2]: https://github.com/iovisor/bcc/issues/4261 [3]: commit 5a80bd075f3b ("block: introduce block_io_start/block_io_done tracepoints") [4]: commit 7c4cd051add3 ("bpf: Fix syscall's stackmap lookup potential deadlock") [5]: commit 6f5d467d55f0 ("libbpf: improve BPF_KPROBE_SYSCALL macro and rename it to BPF_KSYSCALL") ==================== Link: https://lore.kernel.org/r/20230818090119.477441-1-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21samples/bpf: simplify spintest with kprobe.multiDaniel T. Lee
With the introduction of kprobe.multi, it is now possible to attach multiple kprobes to a single BPF program without the need for multiple definitions. Additionally, this method supports wildcard-based matching, allowing for further simplification of BPF programs. In here, an asterisk (*) wildcard is used to map to all symbols relevant to spin_{lock|unlock}. Furthermore, since kprobe.multi handles symbol matching, this commit eliminates the need for the previous logic of reading the ksym table to verify the existence of symbols. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-10-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21samples/bpf: refactor syscall tracing programs using BPF_KSYSCALL macroDaniel T. Lee
This commit refactors the syscall tracing programs by adopting the BPF_KSYSCALL macro. This change aims to enhance the clarity and simplicity of the BPF programs by reducing the complexity of argument parsing from pt_regs. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-9-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21samples/bpf: fix broken map lookup probeDaniel T. Lee
In the commit 7c4cd051add3 ("bpf: Fix syscall's stackmap lookup potential deadlock"), a potential deadlock issue was addressed, which resulted in *_map_lookup_elem not triggering BPF programs. (prior to lookup, bpf_disable_instrumentation() is used) To resolve the broken map lookup probe using "htab_map_lookup_elem", this commit introduces an alternative approach. Instead, it utilize "bpf_map_copy_value" and apply a filter specifically for the hash table with map_type. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Fixes: 7c4cd051add3 ("bpf: Fix syscall's stackmap lookup potential deadlock") Link: https://lore.kernel.org/r/20230818090119.477441-8-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21samples/bpf: fix bio latency check with tracepointDaniel T. Lee
Recently, a new tracepoint for the block layer, specifically the block_io_start/done tracepoints, was introduced in commit 5a80bd075f3b ("block: introduce block_io_start/block_io_done tracepoints"). Previously, the kprobe entry used for this purpose was quite unstable and inherently broke relevant probes [1]. Now that a stable tracepoint is available, this commit replaces the bio latency check with it. One of the changes made during this replacement is the key used for the hash table. Since 'struct request' cannot be used as a hash key, the approach taken follows that which was implemented in bcc/biolatency [2]. (uses dev:sector for the key) [1]: https://github.com/iovisor/bcc/issues/4261 [2]: https://github.com/iovisor/bcc/pull/4691 Fixes: 450b7879e345 ("block: move blk_account_io_{start,done} to blk-mq.c") Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-7-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21samples/bpf: make tracing programs to be more CO-RE centricDaniel T. Lee
The existing tracing programs have been developed for a considerable period of time and, as a result, do not properly incorporate the features of the current libbpf, such as CO-RE. This is evident in frequent usage of functions like PT_REGS* and the persistence of "hack" methods using underscore-style bpf_probe_read_kernel from the past. These programs are far behind the current level of libbpf and can potentially confuse users. Therefore, this commit aims to convert the outdated BPF programs to be more CO-RE centric. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-6-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21samples/bpf: fix symbol mismatch by compiler optimizationDaniel T. Lee
Currently, multiple kprobe programs are suffering from symbol mismatch due to compiler optimization. These optimizations might induce additional suffix to the symbol name such as '.isra' or '.constprop'. # egrep ' finish_task_switch| __netif_receive_skb_core' /proc/kallsyms ffffffff81135e50 t finish_task_switch.isra.0 ffffffff81dd36d0 t __netif_receive_skb_core.constprop.0 ffffffff8205cc0e t finish_task_switch.isra.0.cold ffffffff820b1aba t __netif_receive_skb_core.constprop.0.cold To avoid this, this commit replaces the original kprobe section to kprobe.multi in order to match symbol with wildcard characters. Here, asterisk is used for avoiding symbol mismatch. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-5-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21samples/bpf: unify bpf program suffix to .bpf with tracing programsDaniel T. Lee
Currently, BPF programs typically have a suffix of .bpf.c. However, some programs still utilize a mixture of _kern.c suffix alongside the naming convention. In order to achieve consistency in the naming of these programs, this commit unifies the inconsistency in the naming convention of BPF kernel programs. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-4-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21samples/bpf: convert to vmlinux.h with tracing programsDaniel T. Lee
This commit replaces separate headers with a single vmlinux.h to tracing programs. Thanks to that, we no longer need to define the argument structure for tracing programs directly. For example, argument for the sched_switch tracpepoint (sched_switch_args) can be replaced with the vmlinux.h provided trace_event_raw_sched_switch. Additional defines have been added to the BPF program either directly or through the inclusion of net_shared.h. Defined values are PERF_MAX_STACK_DEPTH, IFNAMSIZ constants and __stringify() macro. This change enables the BPF program to access internal structures with BTF generated "vmlinux.h" header. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-3-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21samples/bpf: fix warning with ignored-attributesDaniel T. Lee
Currently, compiling the bpf programs will result the warning with the ignored attribute as follows. This commit fixes the warning by adding cf-protection option. In file included from ./arch/x86/include/asm/linkage.h:6: ./arch/x86/include/asm/ibt.h:77:8: warning: 'nocf_check' attribute ignored; use -fcf-protection to enable the attribute [-Wignored-attributes] extern __noendbr u64 ibt_save(bool disable); ^ ./arch/x86/include/asm/ibt.h:32:34: note: expanded from macro '__noendbr' ^ Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-2-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21Merge branch 'remove-unnecessary-synchronizations-in-cpumap'Alexei Starovoitov
Hou Tao says: ==================== Remove unnecessary synchronizations in cpumap From: Hou Tao <houtao1@huawei.com> Hi, This is the formal patchset to remove unnecessary synchronizations in cpu-map after address comments and collect Rvb tags from Toke Høiland-Jørgensen (Big thanks to Toke). Patch #1 removes the unnecessary rcu_barrier() when freeing bpf_cpu_map_entry and replaces it by queue_rcu_work(). Patch #2 removes the unnecessary call_rcu() and queue_work() when destroying cpu-map and does the freeing directly. Test the patchset by using xdp_redirect_cpu and virtio-net. Both xdp-mode and skb-mode have been exercised and no issues were reported. As ususal, comments and suggestions are always welcome. Change Log: v1: * address comments from Toke Høiland-Jørgensen * add Rvb tags from Toke Høiland-Jørgensen * update outdated comment in cpu_map_delete_elem() RFC: https://lore.kernel.org/bpf/20230728023030.1906124-1-houtao@huaweicloud.com ==================== Link: https://lore.kernel.org/r/20230816045959.358059-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21bpf, cpumask: Clean up bpf_cpu_map_entry directly in cpu_map_freeHou Tao
After synchronous_rcu(), both the dettached XDP program and xdp_do_flush() are completed, and the only user of bpf_cpu_map_entry will be cpu_map_kthread_run(), so instead of calling __cpu_map_entry_replace() to stop kthread and cleanup entry after a RCU grace period, do these things directly. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20230816045959.358059-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21bpf, cpumap: Use queue_rcu_work() to remove unnecessary rcu_barrier()Hou Tao
As for now __cpu_map_entry_replace() uses call_rcu() to wait for the inflight xdp program to exit the RCU read critical section, and then launch kworker cpu_map_kthread_stop() to call kthread_stop() to flush all pending xdp frames or skbs. But it is unnecessary to use rcu_barrier() in cpu_map_kthread_stop() to wait for the completion of __cpu_map_entry_free(), because rcu_barrier() will wait for all pending RCU callbacks and cpu_map_kthread_stop() only needs to wait for the completion of a specific __cpu_map_entry_free(). So use queue_rcu_work() to replace call_rcu(), schedule_work() and rcu_barrier(). queue_rcu_work() will queue a __cpu_map_entry_free() kworker after a RCU grace period. Because __cpu_map_entry_free() is running in a kworker context, so it is OK to do all of these freeing procedures include kthread_stop() in it. After the update, there is no need to do reference-counting for bpf_cpu_map_entry, because bpf_cpu_map_entry is freed directly in __cpu_map_entry_free(), so just remove it. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20230816045959.358059-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-21scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5Neil Armstrong
The qunipro_g4_sel clear is also needed for new platforms with major version > 5. Fix the version check to take this into account. Fixes: 9c02aa24bf40 ("scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW version major 5") Acked-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Nitin Rawat <quic_nitirawa@quicinc.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20230821-topic-sm8x50-upstream-ufs-major-5-plus-v2-1-f42a4b712e58@linaro.org Reviewed-by: "Bao D. Nguyen" <quic_nguyenb@quicinc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21scsi: ufs: mcq: Fix the search/wrap around logicBao D. Nguyen
The search and wrap around logic in the ufshcd_mcq_sqe_search() function does not work correctly when the hwq's queue depth is not a power of two number. Correct it so that any queue depth with a positive integer value within the supported range would work. Signed-off-by: "Bao D. Nguyen" <quic_nguyenb@quicinc.com> Link: https://lore.kernel.org/r/ff49c15be205135ed3ec186f3086694c02867dbd.1692149603.git.quic_nguyenb@quicinc.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Fixes: 8d7290348992 ("scsi: ufs: mcq: Add supporting functions for MCQ abort") Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-21of/platform: increase refcount of fwnodePeng Fan
commit 0f8e5651095b ("of/platform: Propagate firmware node by calling device_set_node()") use of_fwnode_handle to replace of_node_get, which introduces a side effect that the refcount is not increased. Then the out of tree jailhouse hypervisor enable/disable test will trigger kernel dump in of_overlay_remove, with the following sequence " of_changeset_revert(&overlay_changeset); of_changeset_destroy(&overlay_changeset); of_overlay_remove(&overlay_id); " So increase the refcount to avoid issues. This patch also release the refcount when releasing amba device to avoid refcount leakage. Fixes: 0f8e5651095b ("of/platform: Propagate firmware node by calling device_set_node()") Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20230821023928.3324283-2-peng.fan@oss.nxp.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-08-21mm: multi-gen LRU: don't spin during memcg releaseT.J. Mercier
When a memcg is in the process of being released mem_cgroup_tryget will fail because its reference count has already reached 0. This can happen during reclaim if the memcg has already been offlined, and we reclaim all remaining pages attributed to the offlined memcg. shrink_many attempts to skip the empty memcg in this case, and continue reclaiming from the remaining memcgs in the old generation. If there is only one memcg remaining, or if all remaining memcgs are in the process of being released then shrink_many will spin until all memcgs have finished being released. The release occurs through a workqueue, so it can take a while before kswapd is able to make any further progress. This fix results in reductions in kswapd activity and direct reclaim in a test where 28 apps (working set size > total memory) are repeatedly launched in a random sequence: A B delta ratio(%) allocstall_movable 5962 3539 -2423 -40.64 allocstall_normal 2661 2417 -244 -9.17 kswapd_high_wmark_hit_quickly 53152 7594 -45558 -85.71 pageoutrun 57365 11750 -45615 -79.52 Link: https://lkml.kernel.org/r/20230814151636.1639123-1-tjmercier@google.com Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21mm: memory-failure: fix unexpected return value in soft_offline_page()Miaohe Lin
When page_handle_poison() fails to handle the hugepage or free page in retry path, soft_offline_page() will return 0 while -EBUSY is expected in this case. Consequently the user will think soft_offline_page succeeds while it in fact failed. So the user will not try again later in this case. Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21radix tree: remove unused variableArnd Bergmann
Recent versions of clang warn about an unused variable, though older versions saw the 'slot++' as a use and did not warn: radix-tree.c:1136:50: error: parameter 'slot' set but not used [-Werror,-Wunused-but-set-parameter] It's clearly not needed any more, so just remove it. Link: https://lkml.kernel.org/r/20230811131023.2226509-1-arnd@kernel.org Fixes: 3a08cd52c37c7 ("radix tree: Remove multiorder support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Rong Tao <rongtao@cestc.cn> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21mm: add a call to flush_cache_vmap() in vmap_pfn()Alexandre Ghiti
flush_cache_vmap() must be called after new vmalloc mappings are installed in the page table in order to allow architectures to make sure the new mapping is visible. It could lead to a panic since on some architectures (like powerpc), the page table walker could see the wrong pte value and trigger a spurious page fault that can not be resolved (see commit f1cb8f9beba8 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")). But actually the patch is aiming at riscv: the riscv specification allows the caching of invalid entries in the TLB, and since we recently removed the vmalloc page fault handling, we now need to emit a tlb shootdown whenever a new vmalloc mapping is emitted (https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/). That's a temporary solution, there are ways to avoid that :) Link: https://lkml.kernel.org/r/20230809164633.1556126-1-alexghiti@rivosinc.com Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function") Reported-by: Dylan Jhong <dylan@andestech.com> Closes: https://lore.kernel.org/linux-riscv/ZMytNY2J8iyjbPPy@atctrx.andestech.com/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Dylan Jhong <dylan@andestech.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21selftests/mm: FOLL_LONGTERM need to be updated to 0x100Ayush Jain
After commit 2c2241081f7d ("mm/gup: move private gup FOLL_ flags to internal.h") FOLL_LONGTERM flag value got updated from 0x10000 to 0x100 at include/linux/mm_types.h. As hmm.hmm_device_private.hmm_gup_test uses FOLL_LONGTERM Updating same here as well. Before this change test goes in an infinite assert loop in hmm.hmm_device_private.hmm_gup_test ========================================================== RUN hmm.hmm_device_private.hmm_gup_test ... hmm-tests.c:1962:hmm_gup_test:Expected HMM_DMIRROR_PROT_WRITE.. ..(2) == m[2] (34) hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0) hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0) ... ========================================================== Call Trace: <TASK> ? sched_clock+0xd/0x20 ? __lock_acquire.constprop.0+0x120/0x6c0 ? ktime_get+0x2c/0xd0 ? sched_clock+0xd/0x20 ? local_clock+0x12/0xd0 ? lock_release+0x26e/0x3b0 pin_user_pages_fast+0x4c/0x70 gup_test_ioctl+0x4ff/0xbb0 ? gup_test_ioctl+0x68c/0xbb0 __x64_sys_ioctl+0x99/0xd0 do_syscall_64+0x60/0x90 ? syscall_exit_to_user_mode+0x2a/0x50 ? do_syscall_64+0x6d/0x90 ? syscall_exit_to_user_mode+0x2a/0x50 ? do_syscall_64+0x6d/0x90 ? irqentry_exit_to_user_mode+0xd/0x20 ? irqentry_exit+0x3f/0x50 ? exc_page_fault+0x96/0x200 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f6aaa31aaff After this change test is able to pass successfully. Link: https://lkml.kernel.org/r/20230808124347.79163-1-ayush.jain3@amd.com Fixes: 2c2241081f7d ("mm/gup: move private gup FOLL_ flags to internal.h") Signed-off-by: Ayush Jain <ayush.jain3@amd.com> Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()Ryusuke Konishi
A syzbot stress test reported that create_empty_buffers() called from nilfs_lookup_dirty_data_buffers() can cause a general protection fault. Analysis using its reproducer revealed that the back reference "mapping" from a page/folio has been changed to NULL after dirty page/folio gang lookup in nilfs_lookup_dirty_data_buffers(). Fix this issue by excluding pages/folios from being collected if, after acquiring a lock on each page/folio, its back reference "mapping" differs from the pointer to the address space struct that held the page/folio. Link: https://lkml.kernel.org/r/20230805132038.6435-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+0ad741797f4565e7e2d2@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/0000000000002930a705fc32b231@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>