summaryrefslogtreecommitdiff
path: root/tools/testing/selftests/bpf/progs
AgeCommit message (Collapse)Author
2025-05-29Merge tag 'trace-v6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Have module addresses get updated in the persistent ring buffer The addresses of the modules from the previous boot are saved in the persistent ring buffer. If the same modules are loaded and an address is in the old buffer points to an address that was both saved in the persistent ring buffer and is loaded in memory, shift the address to point to the address that is loaded in memory in the trace event. - Print function names for irqs off and preempt off callsites When ignoring the print fmt of a trace event and just printing the fields directly, have the fields for preempt off and irqs off events still show the function name (via kallsyms) instead of just showing the raw address. - Clean ups of the histogram code The histogram functions saved over 800 bytes on the stack to process events as they come in. Instead, create per-cpu buffers that can hold this information and have a separate location for each context level (thread, softirq, IRQ and NMI). Also add some more comments to the code. - Add "common_comm" field for histograms Add "common_comm" that uses the current->comm as a field in an event histogram and acts like any of the other fields of the event. - Show "subops" in the enabled_functions file When the function graph infrastructure is used, a subsystem has a "subops" that it attaches its callback function to. Instead of the enabled_functions just showing a function calling the function that calls the subops functions, also show the subops functions that will get called for that function too. - Add "copy_trace_marker" option to instances There are cases where an instance is created for tooling to write into, but the old tooling has the top level instance hardcoded into the application. New tools want to consume the data from an instance and not the top level buffer. By adding a copy_trace_marker option, whenever the top instance trace_marker is written into, a copy of it is also written into the instance with this option set. This allows new tools to read what old tools are writing into the top buffer. If this option is cleared by the top instance, then what is written into the trace_marker is not written into the top instance. This is a way to redirect the trace_marker writes into another instance. - Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp() If a tracepoint is created by DECLARE_TRACE() instead of TRACE_EVENT(), then it will not be exposed via tracefs. Currently there's no way to differentiate in the kernel the tracepoint functions between those that are exposed via tracefs or not. A calling convention has been made manually to append a "_tp" prefix for events created by DECLARE_TRACE(). Instead of doing this manually, force it so that all DECLARE_TRACE() events have this notation. - Use __string() for task->comm in some sched events Instead of hardcoding the comm to be TASK_COMM_LEN in some of the scheduler events use __string() which makes it dynamic. Note, if these events are parsed by user space it they may break, and the event may have to be converted back to the hardcoded size. - Have function graph "depth" be unsigned to the user Internally to the kernel, the "depth" field of the function graph event is signed due to -1 being used for end of boundary. What actually gets recorded in the event itself is zero or positive. Reflect this to user space by showing "depth" as unsigned int and be consistent across all events. - Allow an arbitrary long CPU string to osnoise_cpus_write() The filtering of which CPUs to write to can exceed 256 bytes. If a machine has 256 CPUs, and the filter is to filter every other CPU, the write would take a string larger than 256 bytes. Instead of using a fixed size buffer on the stack that is 256 bytes, allocate it to handle what is passed in. - Stop having ftrace check the per-cpu data "disabled" flag The "disabled" flag in the data structure passed to most ftrace functions is checked to know if tracing has been disabled or not. This flag was added back in 2008 before the ring buffer had its own way to disable tracing. The "disable" flag is now not always set when needed, and the ring buffer flag should be used in all locations where the disabled is needed. Since the "disable" flag is redundant and incorrect, stop using it. Fix up some locations that use the "disable" flag to use the ring buffer info. - Use a new tracer_tracing_disable/enable() instead of data->disable flag There's a few cases that set the data->disable flag to stop tracing, but this flag is not consistently used. It is also an on/off switch where if a function set it and calls another function that sets it, the called function may incorrectly enable it. Use a new trace_tracing_disable() and tracer_tracing_enable() that uses a counter and can be nested. These use the ring buffer flags which are always checked making the disabling more consistent. - Save the trace clock in the persistent ring buffer Save what clock was used for tracing in the persistent ring buffer and set it back to that clock after a reboot. - Remove unused reference to a per CPU data pointer in mmiotrace functions - Remove unused buffer_page field from trace_array_cpu structure - Remove more strncpy() instances - Other minor clean ups and fixes * tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (36 commits) tracing: Fix compilation warning on arm32 tracing: Record trace_clock and recover when reboot tracing/sched: Use __string() instead of fixed lengths for task->comm tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffix tracing: Cleanup upper_empty() in pid_list tracing: Allow the top level trace_marker to write into another instances tracing: Add a helper function to handle the dereference arg in verifier tracing: Remove unnecessary "goto out" that simply returns ret is trigger code tracing: Fix error handling in event_trigger_parse() tracing: Rename event_trigger_alloc() to trigger_data_alloc() tracing: Replace deprecated strncpy() with strscpy() for stack_trace_filter_buf tracing: Remove unused buffer_page field from trace_array_cpu structure tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer tracing: Convert the per CPU "disabled" counter to local from atomic tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field ring-buffer: Add ring_buffer_record_is_on_cpu() tracing: Do not use per CPU array_buffer.data->disabled for cpumask ftrace: Do not disabled function graph based on "disabled" field tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled tracing: Use tracer_tracing_disable() instead of "disabled" field for ftrace_dump_one() ...
2025-05-28Merge tag 'bpf-next-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Fix and improve BTF deduplication of identical BTF types (Alan Maguire and Andrii Nakryiko) - Support up to 12 arguments in BPF trampoline on arm64 (Xu Kuohai and Alexis Lothoré) - Support load-acquire and store-release instructions in BPF JIT on riscv64 (Andrea Parri) - Fix uninitialized values in BPF_{CORE,PROBE}_READ macros (Anton Protopopov) - Streamline allowed helpers across program types (Feng Yang) - Support atomic update for hashtab of BPF maps (Hou Tao) - Implement json output for BPF helpers (Ihor Solodrai) - Several s390 JIT fixes (Ilya Leoshkevich) - Various sockmap fixes (Jiayuan Chen) - Support mmap of vmlinux BTF data (Lorenz Bauer) - Support BPF rbtree traversal and list peeking (Martin KaFai Lau) - Tests for sockmap/sockhash redirection (Michal Luczaj) - Introduce kfuncs for memory reads into dynptrs (Mykyta Yatsenko) - Add support for dma-buf iterators in BPF (T.J. Mercier) - The verifier support for __bpf_trap() (Yonghong Song) * tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (135 commits) bpf, arm64: Remove unused-but-set function and variable. selftests/bpf: Add tests with stack ptr register in conditional jmp bpf: Do not include stack ptr register in precision backtracking bookkeeping selftests/bpf: enable many-args tests for arm64 bpf, arm64: Support up to 12 function arguments bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem() bpf: Avoid __bpf_prog_ret0_warn when jit fails bpftool: Add support for custom BTF path in prog load/loadall selftests/bpf: Add unit tests with __bpf_trap() kfunc bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variable bpf: Remove special_kfunc_set from verifier selftests/bpf: Add test for open coded dmabuf_iter selftests/bpf: Add test for dmabuf_iter bpf: Add open coded dmabuf iterator bpf: Add dmabuf iterator dma-buf: Rename debugfs symbols bpf: Fix error return value in bpf_copy_from_user_dynptr libbpf: Use mmap to parse vmlinux BTF from sysfs selftests: bpf: Add a test for mmapable vmlinux BTF btf: Allow mmap of vmlinux btf ...
2025-05-28Merge tag 'net-next-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Implement the Device Memory TCP transmit path, allowing zero-copy data transmission on top of TCP from e.g. GPU memory to the wire. - Move all the IPv6 routing tables management outside the RTNL scope, under its own lock and RCU. The route control path is now 3x times faster. - Convert queue related netlink ops to instance lock, reducing again the scope of the RTNL lock. This improves the control plane scalability. - Refactor the software crc32c implementation, removing unneeded abstraction layers and improving significantly the related micro-benchmarks. - Optimize the GRO engine for UDP-tunneled traffic, for a 10% performance improvement in related stream tests. - Cover more per-CPU storage with local nested BH locking; this is a prep work to remove the current per-CPU lock in local_bh_disable() on PREMPT_RT. - Introduce and use nlmsg_payload helper, combining buffer bounds verification with accessing payload carried by netlink messages. Netfilter: - Rewrite the procfs conntrack table implementation, improving considerably the dump performance. A lot of user-space tools still use this interface. - Implement support for wildcard netdevice in netdev basechain and flowtables. - Integrate conntrack information into nft trace infrastructure. - Export set count and backend name to userspace, for better introspection. BPF: - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops programs and can be controlled in similar way to traditional qdiscs using the "tc qdisc" command. - Refactor the UDP socket iterator, addressing long standing issues WRT duplicate hits or missed sockets. Protocols: - Improve TCP receive buffer auto-tuning and increase the default upper bound for the receive buffer; overall this improves the single flow maximum thoughput on 200Gbs link by over 60%. - Add AFS GSSAPI security class to AF_RXRPC; it provides transport security for connections to the AFS fileserver and VL server. - Improve TCP multipath routing, so that the sources address always matches the nexthop device. - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS, and thus preventing DoS caused by passing around problematic FDs. - Retire DCCP socket. DCCP only receives updates for bugs, and major distros disable it by default. Its removal allows for better organisation of TCP fields to reduce the number of cache lines hit in the fast path. - Extend TCP drop-reason support to cover PAWS checks. Driver API: - Reorganize PTP ioctl flag support to require an explicit opt-in for the drivers, avoiding the problem of drivers not rejecting new unsupported flags. - Converted several device drivers to timestamping APIs. - Introduce per-PHY ethtool dump helpers, improving the support for dump operations targeting PHYs. Tests and tooling: - Add support for classic netlink in user space C codegen, so that ynl-c can now read, create and modify links, routes addresses and qdisc layer configuration. - Add ynl sub-types for binary attributes, allowing ynl-c to output known struct instead of raw binary data, clarifying the classic netlink output. - Extend MPTCP selftests to improve the code-coverage. - Add tests for XDP tail adjustment in AF_XDP. New hardware / drivers: - OpenVPN virtual driver: offload OpenVPN data channels processing to the kernel-space, increasing the data transfer throughput WRT the user-space implementation. - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC. - Broadcom asp-v3.0 ethernet driver. - AMD Renoir ethernet device. - ReakTek MT9888 2.5G ethernet PHY driver. - Aeonsemi 10G C45 PHYs driver. Drivers: - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - refactor the steering table handling to significantly reduce the amount of memory used - add support for complex matches in H/W flow steering - improve flow streeing error handling - convert to netdev instance locking - Intel (100G, ice, igb, ixgbe, idpf): - ice: add switchdev support for LLDP traffic over VF - ixgbe: add firmware manipulation and regions devlink support - igb: introduce support for frame transmission premption - igb: adds persistent NAPI configuration - idpf: introduce RDMA support - idpf: add initial PTP support - Meta (fbnic): - extend hardware stats coverage - add devlink dev flash support - Broadcom (bnxt): - add support for RX-side device memory TCP - Wangxun (txgbe): - implement support for udp tunnel offload - complete PTP and SRIOV support for AML 25G/10G devices - Ethernet NICs embedded and virtual: - Google (gve): - add device memory TCP TX support - Amazon (ena): - support persistent per-NAPI config - Airoha: - add H/W support for L2 traffic offload - add per flow stats for flow offloading - RealTek (rtl8211): add support for WoL magic packet - Synopsys (stmmac): - dwmac-socfpga 1000BaseX support - add Loongson-2K3000 support - introduce support for hardware-accelerated VLAN stripping - Broadcom (bcmgenet): - expose more H/W stats - Freescale (enetc, dpaa2-eth): - enetc: add MAC filter, VLAN filter RSS and loopback support - dpaa2-eth: convert to H/W timestamping APIs - vxlan: convert FDB table to rhashtable, for better scalabilty - veth: apply qdisc backpressure on full ring to reduce TX drops - Ethernet switches: - Microchip (kzZ88x3): add ETS scheduler support - Ethernet PHYs: - RealTek (rtl8211): - add support for WoL magic packet - add support for PHY LEDs - CAN: - Adds RZ/G3E CANFD support to the rcar_canfd driver. - Preparatory work for CAN-XL support. - Add self-tests framework with support for CAN physical interfaces. - WiFi: - mac80211: - scan improvements with multi-link operation (MLO) - Qualcomm (ath12k): - enable AHB support for IPQ5332 - add monitor interface support to QCN9274 - add multi-link operation support to WCN7850 - add 802.11d scan offload support to WCN7850 - monitor mode for WCN7850, better 6 GHz regulatory - Qualcomm (ath11k): - restore hibernation support - MediaTek (mt76): - WiFi-7 improvements - implement support for mt7990 - Intel (iwlwifi): - enhanced multi-link single-radio (EMLSR) support on 5 GHz links - rework device configuration - RealTek (rtw88): - improve throughput for RTL8814AU - RealTek (rtw89): - add multi-link operation support - STA/P2P concurrency improvements - support different SAR configs by antenna - Bluetooth: - introduce HCI Driver protocol - btintel_pcie: do not generate coredump for diagnostic events - btusb: add HCI Drv commands for configuring altsetting - btusb: add RTL8851BE device 0x0bda:0xb850 - btusb: add new VID/PID 13d3/3584 for MT7922 - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925 - btnxpuart: implement host-wakeup feature" * tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits) selftests/bpf: Fix bpf selftest build warning selftests: netfilter: Fix skip of wildcard interface test net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames net: openvswitch: Fix the dead loop of MPLS parse calipso: Don't call calipso functions for AF_INET sk. selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem net_sched: hfsc: Address reentrant enqueue adding class to eltree twice octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback octeontx2-pf: QOS: Perform cache sync on send queue teardown net: mana: Add support for Multi Vports on Bare metal net: devmem: ncdevmem: remove unused variable net: devmem: ksft: upgrade rx test to send 1K data net: devmem: ksft: add 5 tuple FS support net: devmem: ksft: add exit_wait to make rx test pass net: devmem: ksft: add ipv4 support net: devmem: preserve sockc_err page_pool: fix ugly page_pool formatting net: devmem: move list_add to net_devmem_bind_dmabuf. selftests: netfilter: nft_queue.sh: include file transfer duration in log message net: phy: mscc: Fix memory leak when using one step timestamping ...
2025-05-27Merge tag 'cgroup-for-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cgroup rstat shared the tracking tree across all controllers with the rationale being that a cgroup which is using one resource is likely to be using other resources at the same time (ie. if something is allocating memory, it's probably consuming CPU cycles). However, this turned out to not scale very well especially with memcg using rstat for internal operations which made memcg stat read and flush patterns substantially different from other controllers. JP Kobryn split the rstat tree per controller. - cgroup BPF support was hooking into cgroup init/exit paths directly. Convert them to use a notifier chain instead so that other usages can be added easily. The two of the patches which implement this are mislabeled as belonging to sched_ext instead of cgroup. Sorry. - Relatively minor cpuset updates - Documentation updates * tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits) sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier sched_ext: Introduce cgroup_lifetime_notifier cgroup: Minor reorganization of cgroup_create() cgroup, docs: cpu controller's interaction with various scheduling policies cgroup, docs: convert space indentation to tab indentation cgroup: avoid per-cpu allocation of size zero rstat cpu locks cgroup, docs: be specific about bandwidth control of rt processes cgroup: document the rstat per-cpu initialization cgroup: helper for checking rstat participation of css cgroup: use subsystem-specific rstat locks to avoid contention cgroup: use separate rstat trees for each subsystem cgroup: compare css to cgroup::self in helper for distingushing css cgroup: warn on rstat usage by early init subsystems cgroup/cpuset: drop useless cpumask_empty() in compute_effective_exclusive_cpumask() cgroup/rstat: Improve cgroup_rstat_push_children() documentation cgroup: fix goto ordering in cgroup_init() cgroup: fix pointer check in css_rstat_init() cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs cgroup/cpuset: Fix obsolete comment in cpuset_css_offline() cgroup/cpuset: Always use cpu_active_mask ...
2025-05-27selftests/bpf: Add tests with stack ptr register in conditional jmpYonghong Song
Add two tests: - one test has 'rX <op> r10' where rX is not r10, and - another test has 'rX <op> rY' where rX and rY are not r10 but there is an early insn 'rX = r10'. Without previous verifier change, both tests will fail. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250524041340.4046304-1-yonghong.song@linux.dev
2025-05-27selftests/bpf: Add unit tests with __bpf_trap() kfuncYonghong Song
Add some inline-asm tests and C tests where __bpf_trap() or __builtin_trap() is used in the code. The __builtin_trap() test is guarded with llvm21 ([1]) since otherwise the compilation failure will happen. [1] https://github.com/llvm/llvm-project/pull/131731 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205331.1291734-1-yonghong.song@linux.dev Tested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for open coded dmabuf_iterT.J. Mercier
Use the same test buffers as the traditional iterator and a new BPF map to verify the test buffers can be found with the open coded dmabuf iterator. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-6-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for dmabuf_iterT.J. Mercier
This test creates a udmabuf, and a dmabuf from the system dmabuf heap, and uses a BPF program that prints dmabuf metadata with the new dmabuf_iter to verify they can be found. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-5-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-23tcp: Restrict SO_TXREHASH to TCP socket.Kuniyuki Iwashima
sk->sk_txrehash is only used for TCP. Let's restrict SO_TXREHASH to TCP to reflect this. Later, we will make sk_txrehash a part of the union for other protocol families. Note that we need to modify BPF selftest not to get/set SO_TEREHASH for non-TCP sockets. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-22selftests/bpf: Introduce verdict programs for sockmap_redirMichal Luczaj
Instead of piggybacking on test_sockmap_listen, introduce test_sockmap_redir especially for sockmap redirection tests. Suggested-by: Jiayuan Chen <mrpre@163.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20250515-selftests-sockmap-redir-v3-4-a1ea723f7e7e@rbox.co
2025-05-19cgroup: use separate rstat trees for each subsystemJP Kobryn
Different subsystems may call cgroup_rstat_updated() within the same cgroup, resulting in a tree of pending updates from multiple subsystems. When one of these subsystems is flushed via cgroup_rstat_flushed(), all other subsystems with pending updates on the tree will also be flushed. Change the paradigm of having a single rstat tree for all subsystems to having separate trees for each subsystem. This separation allows for subsystems to perform flushes without the side effects of other subsystems. As an example, flushing the cpu stats will no longer cause the memory stats to be flushed and vice versa. In order to achieve subsystem-specific trees, change the tree node type from cgroup to cgroup_subsys_state pointer. Then remove those pointers from the cgroup and instead place them on the css. Finally, change update/flush functions to make use of the different node type (css). These changes allow a specific subsystem to be associated with an update or flush. Separate rstat trees will now exist for each unique subsystem. Since updating/flushing will now be done at the subsystem level, there is no longer a need to keep track of updated css nodes at the cgroup level. The list management of these nodes done within the cgroup (rstat_css_list and related) has been removed accordingly. Conditional guards for checking validity of a given css were placed within css_rstat_updated/flush() to prevent undefined behavior occuring from kfunc usage in bpf programs. Guards were also placed within css_rstat_init/exit() in order to help consolidate calls to them. At call sites for all four functions, the existing guards were removed. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-14selftests/bpf: Relax TCPOPT_WINDOW validation in test_tcp_custom_syncookie.c.Kuniyuki Iwashima
The custom syncookie test expects TCPOPT_WINDOW to be 7 based on the kernel’s behaviour at the time, but the upcoming series [0] will bump it to 10. Let's relax the test to allow any valid TCPOPT_WINDOW value in the range 1–14. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/netdev/20250513193919.1089692-1-edumazet@google.com/ #[0] Link: https://patch.msgid.link/20250514214021.85187-1-kuniyu@amazon.com
2025-05-14tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffixSteven Rostedt
Most tracepoints in the kernel are created with TRACE_EVENT(). The TRACE_EVENT() macro (and DECLARE_EVENT_CLASS() and DEFINE_EVENT() where in reality, TRACE_EVENT() is just a helper macro that calls those other two macros), will create not only a tracepoint (the function trace_<event>() used in the kernel), it also exposes the tracepoint to user space along with defining what fields will be saved by that tracepoint. There are a few places that tracepoints are created in the kernel that are not exposed to userspace via tracefs. They can only be accessed from code within the kernel. These tracepoints are created with DEFINE_TRACE() Most of these tracepoints end with "_tp". This is useful as when the developer sees that, they know that the tracepoint is for in-kernel only (meaning it can only be accessed inside the kernel, either directly by the kernel or indirectly via modules and BPF programs) and is not exposed to user space. Instead of making this only a process to add "_tp", enforce it by making the DECLARE_TRACE() append the "_tp" suffix to the tracepoint. This requires adding DECLARE_TRACE_EVENT() macros for the TRACE_EVENT() macro to use that keeps the original name. Link: https://lore.kernel.org/all/20250418083351.20a60e64@gandalf.local.home/ Cc: netdev <netdev@vger.kernel.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Ahern <dsahern@kernel.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Breno Leitao <leitao@debian.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/20250510163730.092fad5b@gandalf.local.home Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-12selftests/bpf: introduce tests for dynptr copy kfuncsMykyta Yatsenko
Introduce selftests verifying newly-added dynptr copy kfuncs. Covering contiguous and non-contiguous memory backed dynptrs. Disable test_probe_read_user_str_dynptr that triggers bug in strncpy_from_user_nofault. Patch to fix the issue [1]. [1] https://patchwork.kernel.org/project/linux-mm/patch/20250422131449.57177-1-mykyta.yatsenko5@gmail.com/ Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20250512205348.191079-4-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09selftests/bpf: Add test to cover sockmap with ktlsJiayuan Chen
The selftest can reproduce an issue where we miss the uncharge operation when freeing msg, which will cause the following warning. We fixed the issue and added this reproducer to selftest to ensure it will not happen again. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 40 at net/ipv4/af_inet.c inet_sock_destruct+0x173/0x1d5 Tainted: [W]=WARN Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Workqueue: events sk_psock_destroy RIP: 0010:inet_sock_destruct+0x173/0x1d5 RSP: 0018:ffff8880085cfc18 EFLAGS: 00010202 RAX: 1ffff11003dbfc00 RBX: ffff88801edfe3e8 RCX: ffffffff822f5af4 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88801edfe16c RBP: ffff88801edfe184 R08: ffffed1003dbfc31 R09: 0000000000000000 R10: ffffffff822f5ab7 R11: ffff88801edfe187 R12: ffff88801edfdec0 R13: ffff888020376ac0 R14: ffff888020376ac0 R15: ffff888020376a60 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556365155830 CR3: 000000001d6aa000 CR4: 0000000000350ef0 Call Trace: <TASK> __sk_destruct+0x46/0x222 sk_psock_destroy+0x22f/0x242 process_one_work+0x504/0x8a8 ? process_one_work+0x39d/0x8a8 ? __pfx_process_one_work+0x10/0x10 ? worker_thread+0x44/0x2ae ? __list_add_valid_or_report+0x83/0xea ? srso_return_thunk+0x5/0x5f ? __list_add+0x45/0x52 process_scheduled_works+0x73/0x82 worker_thread+0x1ce/0x2ae Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250425060015.6968-3-jiayuan.chen@linux.dev
2025-05-09selftests/bpf: Enable non-arena load-acquire/store-release selftests for riscv64Peilin Ye
For riscv64, enable all BPF_{LOAD_ACQ,STORE_REL} selftests except the arena_atomics/* ones (not guarded behind CAN_USE_LOAD_ACQ_STORE_REL), since arena access is not yet supported. Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/9d878fa99a72626208a8eed3c04c4140caf77fda.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09selftests/bpf: Verify zero-extension behavior in load-acquire testsPeilin Ye
Verify that 8-, 16- and 32-bit load-acquires are zero-extending by using immediate values with their highest bit set. Do the same for the 64-bit variant to keep the style consistent. Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/11097fd515f10308b3941469ee4c86cb8872db3f.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09selftests/bpf: Avoid passing out-of-range values to __retval()Peilin Ye
Currently, we pass 0x1234567890abcdef to __retval() for the following two tests: verifier_load_acquire/load_acquire_64 verifier_store_release/store_release_64 However, the upper 32 bits of that value are being ignored, since __retval() expects an int. Actually, the tests would still pass even if I change '__retval(0x1234567890abcdef)' to e.g. '__retval(0x90abcdef)'. Restructure the tests a bit to test the entire 64-bit values properly. Do the same to their 8-, 16- and 32-bit variants as well to keep the style consistent. Fixes: ff3afe5da998 ("selftests/bpf: Add selftests for load-acquire and store-release instructions") Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/d67f4c6f6ee0d0388cbce1f4892ec4176ee2d604.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09selftests/bpf: Use CAN_USE_LOAD_ACQ_STORE_REL when appropriatePeilin Ye
Instead of open-coding the conditions, use '#ifdef CAN_USE_LOAD_ACQ_STORE_REL' to guard the following tests: verifier_precision/bpf_load_acquire verifier_precision/bpf_store_release verifier_store_release/* Note that, for the first two tests in verifier_precision.c, switching to '#ifdef CAN_USE_LOAD_ACQ_STORE_REL' means also checking if '__clang_major__ >= 18', which has already been guaranteed by the outer '#if' check. Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/45d7e025f6e390a8ff36f08fc51e31705ac896bd.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-06selftests/bpf: Add test for bpf_list_{front,back}Martin KaFai Lau
This patch adds the "list_peek" test to use the new bpf_list_{front,back} kfunc. The test_{front,back}* tests ensure that the return value is a non_own_ref node pointer and requires the spinlock to be held. Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # check non_own_ref marking Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250506015857.817950-9-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-06selftests/bpf: Add tests for bpf_rbtree_{root,left,right}Martin KaFai Lau
This patch has a much simplified rbtree usage from the kernel sch_fq qdisc. It has a "struct node_data" which can be added to two different rbtrees which are ordered by different keys. The test first populates both rbtrees. Then search for a lookup_key from the "groot0" rbtree. Once the lookup_key is found, that node refcount is taken. The node is then removed from another "groot1" rbtree. While searching the lookup_key, the test will also try to remove all rbnodes in the path leading to the lookup_key. The test_{root,left,right}_spinlock_true tests ensure that the return value of the bpf_rbtree functions is a non_own_ref node pointer. This is done by forcing an verifier error by calling a helper bpf_jiffies64() while holding the spinlock. The tests then check for the verifier message "call bpf_rbtree...R0=rcu_ptr_or_null_node..." The other test_{root,left,right}_spinlock_false tests ensure that they must be called with spinlock held. Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # Check non_own_ref marking Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250506015857.817950-6-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-06bpf: Allow refcounted bpf_rb_node used in bpf_rbtree_{remove,left,right}Martin KaFai Lau
The bpf_rbtree_{remove,left,right} requires the root's lock to be held. They also check the node_internal->owner is still owned by that root before proceeding, so it is safe to allow refcounted bpf_rb_node pointer to be used in these kfuncs. In a bpf fq implementation which is much closer to the kernel fq, https://lore.kernel.org/bpf/20250418224652.105998-13-martin.lau@linux.dev/, a networking flow (allocated by bpf_obj_new) can be added to two different rbtrees. There are cases that the flow is searched from one rbtree, held the refcount of the flow, and then removed from another rbtree: struct fq_flow { struct bpf_rb_node fq_node; struct bpf_rb_node rate_node; struct bpf_refcount refcount; unsigned long sk_long; }; int bpf_fq_enqueue(...) { /* ... */ bpf_spin_lock(&root->lock); while (can_loop) { /* ... */ if (!p) break; gc_f = bpf_rb_entry(p, struct fq_flow, fq_node); if (gc_f->sk_long == sk_long) { f = bpf_refcount_acquire(gc_f); break; } /* ... */ } bpf_spin_unlock(&root->lock); if (f) { bpf_spin_lock(&q->lock); bpf_rbtree_remove(&q->delayed, &f->rate_node); bpf_spin_unlock(&q->lock); } } bpf_rbtree_{left,right} do not need this change but are relaxed together with bpf_rbtree_remove instead of adding extra verifier logic to exclude these kfuncs. To avoid bi-sect failure, this patch also changes the selftests together. The "rbtree_api_remove_unadded_node" is not expecting verifier's error. The test now expects bpf_rbtree_remove(&groot, &m->node) to return NULL. The test uses __retval(0) to ensure this NULL return value. Some of the "only take non-owning..." failure messages are changed also. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250506015857.817950-5-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-05Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-05-02 We've added 14 non-merge commits during the last 10 day(s) which contain a total of 13 files changed, 740 insertions(+), 121 deletions(-). The main changes are: 1) Avoid skipping or repeating a sk when using a UDP bpf_iter, from Jordan Rife. 2) Fixed a crash when a bpf qdisc is set in the net.core.default_qdisc, from Amery Hung. 3) A few other fixes in the bpf qdisc, from Amery Hung. - Always call qdisc_watchdog_init() in the .init prologue such that the .reset/.destroy epilogue can always call qdisc_watchdog_cancel() without issue. - bpf_qdisc_init_prologue() was incorrectly returning an error when the bpf qdisc is set as the default_qdisc and the mq is creating the default_qdisc. It is now fixed. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Cleanup bpf qdisc selftests selftests/bpf: Test attaching a bpf qdisc with incomplete operators bpf: net_sched: Make some Qdisc_ops ops mandatory selftests/bpf: Test setting and creating bpf qdisc as default qdisc bpf: net_sched: Fix bpf qdisc init prologue when set as default qdisc selftests/bpf: Add tests for bucket resume logic in UDP socket iterators selftests/bpf: Return socket cookies from sock_iter_batch progs bpf: udp: Avoid socket skips and repeats during iteration bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items bpf: udp: Get rid of st_bucket_done bpf: udp: Make sure iter->batch always contains a full bucket snapshot bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch bpf: net_sched: Fix using bpf qdisc as default qdisc selftests/bpf: Fix compilation errors ==================== Link: https://patch.msgid.link/20250503010755.4030524-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-02selftests/bpf: Cleanup bpf qdisc selftestsAmery Hung
Some cleanups: - Remove unnecessary kfuncs declaration - Use _ns in the test name to run tests in a separate net namespace - Call skeleton __attach() instead of bpf_map__attach_struct_ops() to simplify tests. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02selftests/bpf: Test attaching a bpf qdisc with incomplete operatorsAmery Hung
Implement .destroy in bpf_fq and bpf_fifo as it is now mandatory. Test attaching a bpf qdisc with a missing operator .init. This is not allowed as bpf qdisc qdisc_watchdog_cancel() could have been called with an uninitialized timer. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02selftests/bpf: Test setting and creating bpf qdisc as default qdiscAmery Hung
First, test that bpf qdisc can be set as default qdisc. Then, attach an mq qdisc to see if bpf qdisc can be successfully created and grafted. The test is a sequential test as net.core.default_qdisc is global. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02selftests/bpf: Return socket cookies from sock_iter_batch progsJordan Rife
Extend the iter_udp_soreuse and iter_tcp_soreuse programs to write the cookie of the current socket, so that we can track the identity of the sockets that the iterator has seen so far. Update the existing do_test function to account for this change to the iterator program output. At the same time, teach both programs to work with AF_INET as well. Signed-off-by: Jordan Rife <jordan@jrife.io> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.15-rc5). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01selftests/bpf: xdp_metadata: Check XDP_REDIRCT support for dev-bound progsLorenzo Bianconi
Improve xdp_metadata bpf selftest in order to check it is possible for a XDP dev-bound program to perform XDP_REDIRECT into a DEVMAP but it is still not allowed to attach a XDP dev-bound program to a DEVMAP entry. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
2025-04-29selftests/bpf: Fix compilation errorsFeng Yang
If the CONFIG_NET_SCH_BPF configuration is not enabled, the BPF test compilation will report the following error: In file included from progs/bpf_qdisc_fq.c:39: progs/bpf_qdisc_common.h:17:51: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility] 17 | void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym; | ^ progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility] 309 | struct bpf_sk_buff_ptr *to_free) | ^ progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility] progs/bpf_qdisc_fq.c:308:5: error: conflicting types for '____bpf_fq_enqueue' Fixes: 11c701639ba9 ("selftests/bpf: Add a basic fifo qdisc test") Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250428033445.58113-1-yangfeng59949@163.com
2025-04-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc4Alexei Starovoitov
Cross-merge bpf and other fixes after downstream PRs. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-25selftests/bpf: Correct typo in __clang_major__ macroPeilin Ye
Make sure that CAN_USE_BPF_ST test (compute_live_registers/store) is enabled when __clang_major__ >= 18. Fixes: 2ea8f6a1cda7 ("selftests/bpf: test cases for compute_live_registers()") Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/20250425213712.1542077-1-yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-25selftests/bpf: add test for softlock when modifying hashmap while iteratingBrandon Kammerdiener
Add test that modifies the map while it's being iterated in such a way that hangs the kernel thread unless the _safe fix is applied to bpf_for_each_hash_elem. Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com> Link: https://lore.kernel.org/r/20250424153246.141677-3-brandon.kammerdiener@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com>
2025-04-24selftests/bpf: Fix endianness issue in __qspinlock declarationIlya Leoshkevich
Copy the big-endian field declarations from qspinlock_types.h, otherwise some properties won't hold on big-endian systems. For example, assigning lock->val = 1 should result in lock->locked == 1, which is not the case there. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250424165525.154403-4-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-24selftests/bpf: Fix arena_spin_lock.c build dependencyIlya Leoshkevich
Changing bpf_arena_spin_lock.h does not lead to recompiling arena_spin_lock.c. By convention, all BPF progs depend on all header files in progs/, so move this header file there. There are no other users besides arena_spin_lock.c. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250424165525.154403-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-23selftests/bpf: Add test to access const void pointer argument in tracing programKaFai Wan
Adding verifier test for accessing const void pointer argument in tracing programs. The test program loads 1st argument of bpf_fentry_test10 function which is const void pointer and checks that verifier allows that. Signed-off-by: KaFai Wan <mannkafai@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250423121329.3163461-3-mannkafai@gmail.com
2025-04-21Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-04-17 We've added 12 non-merge commits during the last 9 day(s) which contain a total of 18 files changed, 1748 insertions(+), 19 deletions(-). The main changes are: 1) bpf qdisc support, from Amery Hung. A qdisc can be implemented in bpf struct_ops programs and can be used the same as other existing qdiscs in the "tc qdisc" command. 2) Add xsk tail adjustment tests, from Tushar Vyavahare. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Test attaching bpf qdisc to mq and non root selftests/bpf: Add a bpf fq qdisc to selftest selftests/bpf: Add a basic fifo qdisc test libbpf: Support creating and destroying qdisc bpf: net_sched: Disable attaching bpf qdisc to non root bpf: net_sched: Support updating bstats bpf: net_sched: Add a qdisc watchdog timer bpf: net_sched: Add basic bpf qdisc kfuncs bpf: net_sched: Support implementation of Qdisc_ops in bpf bpf: Prepare to reuse get_ctx_arg_idx selftests/xsk: Add tail adjustment tests and support check selftests/xsk: Add packet stream replacement function ==================== Link: https://patch.msgid.link/20250417184338.3152168-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3Alexei Starovoitov
Cross-merge bpf and other fixes after downstream PRs. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-17selftests/bpf: Add a bpf fq qdisc to selftestAmery Hung
This test implements a more sophisticated qdisc using bpf. The bpf fair- queueing (fq) qdisc gives each flow an equal chance to transmit data. It also respects the timestamp of skb for rate limiting. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250409214606.2000194-10-ameryhung@gmail.com
2025-04-17selftests/bpf: Add a basic fifo qdisc testAmery Hung
This selftest includes a bare minimum fifo qdisc, which simply enqueues sk_buffs into the back of a bpf list and dequeues from the front of the list. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250409214606.2000194-9-ameryhung@gmail.com
2025-04-10selftests/bpf: Make res_spin_lock AA test condition strongerKumar Kartikeya Dwivedi
Let's make sure that we see a EDEADLK and ETIMEDOUT whenever checking for the AA tests (in case of simple AA and AA after exhausting 31 entries). Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250410170023.2670683-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-10selftests/xsk: Add tail adjustment tests and support checkTushar Vyavahare
Introduce tail adjustment functionality in xskxceiver using bpf_xdp_adjust_tail(). Add `xsk_xdp_adjust_tail` to modify packet sizes and drop unmodified packets. Implement `is_adjust_tail_supported` to check helper availability. Develop packet resizing tests, including shrinking and growing scenarios, with functions for both single-buffer and multi-buffer cases. Update the test framework to handle various scenarios and adjust MTU settings. These changes enhance the testing of packet tail adjustments, improving AF_XDP framework reliability. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250410033116.173617-3-tushar.vyavahare@intel.com
2025-04-09selftests/bpf: Add test case for atomic update of fd htabHou Tao
Add a test case to verify the atomic update of existing elements in the htab of maps. The test proceeds in three steps: 1) fill the outer map with keys in the range [0, 8] For each inner array map, the value of its first element is set as the key used to lookup the inner map. 2) create 16 threads to lookup these keys concurrently Each lookup thread first lookups the inner map, then it checks whether the first value of the inner array map is the same as the key used to lookup the inner map. 3) create 8 threads to overwrite these keys concurrently Each update thread first creates an inner array, it sets the first value of the array to the key used to update the outer map, then it uses the key and the inner map to update the outer map. Without atomic update support, the lookup operation may return -ENOENT during the lookup of outer map, or return -EINVAL during the comparison of the first value in the inner map and the key used for inner map, and the test will fail. After the atomic update change, both the lookup and the comparison will succeed. Given that the update of outer map is slow, the test case sets the loop number for each thread as 5 to reduce the total running time. However, the loop number could also be adjusted through FD_HTAB_LOOP_NR environment variable. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20250401062250.543403-7-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09selftest/bpf/benchs: Add benchmark for sockmap usageJiayuan Chen
Add TCP+sockmap-based benchmark. Since sockmap's own update and delete operations are generally less critical, the performance of the fast forwarding framework built upon it is the key aspect. Also with cgset/cgexec, we can observe the behavior of sockmap under memory pressure. The benchmark can be run with: ''' ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress ''' In the future, we plan to move socket_helpers.h out of the prog_tests directory to make it accessible for the benchmark. This will enable better support for various socket types. Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20250407142234.47591-5-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09selftests/bpf: add ktls selftestJiayuan Chen
add ktls selftest for sockmap Test results: sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK sockmap_ktls/tls simple offload:OK sockmap_ktls/tls tx cork:OK sockmap_ktls/tls tx cork with push:OK sockmap_ktls/tls simple offload:OK sockmap_ktls/tls tx cork:OK sockmap_ktls/tls tx cork with push:OK sockmap_ktls:OK Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20250219052015.274405-3-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09selftests/bpf: Add BTF.ext line/func info getter testsMykyta Yatsenko
Add selftests checking that line and func info retrieved by newly added libbpf APIs are the same as returned by kernel via bpf_prog_get_info_by_fd. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250408234417.452565-3-mykyta.yatsenko5@gmail.com
2025-04-09selftests/bpf: Support struct/union presets in veristatMykyta Yatsenko
Extend commit e3c9abd0d14b ("selftests/bpf: Implement setting global variables in veristat") to support applying presets to members of the global structs or unions in veristat. For example: ``` ./veristat set_global_vars.bpf.o -G "union1.struct3.var_u8_h = 0xBB" ``` Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250408104544.140317-1-mykyta.yatsenko5@gmail.com
2025-04-04cgroup: change rstat function signatures from cgroup-based to css-basedJP Kobryn
This non-functional change serves as preparation for moving to subsystem-based rstat trees. To simplify future commits, change the signatures of existing cgroup-based rstat functions to become css-based and rename them to reflect that. Though the signatures have changed, the implementations have not. Within these functions use the css->cgroup pointer to obtain the associated cgroup and allow code to function the same just as it did before this patch. At applicable call sites, pass the subsystem-specific css pointer as an argument or pass a pointer to cgroup::self if not in subsystem context. Note that cgroup_rstat_updated_list() and cgroup_rstat_push_children() are not altered yet since there would be a larger amount of css to cgroup conversions which may overcomplicate the code at this intermediate phase. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-04libbpf: Add likely/unlikely macros and use them in selftestsAnton Protopopov
A few selftests and, more importantly, consequent changes to the bpf_helpers.h file, use likely/unlikely macros, so define them here and remove duplicate definitions from existing selftests. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250331203618.1973691-3-a.s.protopopov@gmail.com
2025-04-02selftests/bpf: Fix verifier_private_stack test failureYonghong Song
Several verifier_private_stack tests failed with latest bpf-next. For example, for 'Private stack, single prog' subtest, the jitted code: func #0: 0: f3 0f 1e fa endbr64 4: 0f 1f 44 00 00 nopl (%rax,%rax) 9: 0f 1f 00 nopl (%rax) c: 55 pushq %rbp d: 48 89 e5 movq %rsp, %rbp 10: f3 0f 1e fa endbr64 14: 49 b9 58 74 8a 8f 7d 60 00 00 movabsq $0x607d8f8a7458, %r9 1e: 65 4c 03 0c 25 28 c0 48 87 addq %gs:-0x78b73fd8, %r9 27: bf 2a 00 00 00 movl $0x2a, %edi 2c: 49 89 b9 00 ff ff ff movq %rdi, -0x100(%r9) 33: 31 c0 xorl %eax, %eax 35: c9 leave 36: e9 20 5d 0f e1 jmp 0xffffffffe10f5d5b The insn 'addq %gs:-0x78b73fd8, %r9' does not match the expected regex 'addq %gs:0x{{.*}}, %r9' and this caused test failure. Fix it by changing '%gs:0x{{.*}}' to '%gs:{{.*}}' to accommodate the possible negative offset. A few other subtests are fixed in a similar way. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250331033828.365077-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>