path: root/tools
Age | Commit message | Author
2025-07-03 | libbpf: Add bpf_stream_printk() macro | Kumar Kartikeya Dwivedi
Add a convenience macro to print data to the BPF streams. BPF_STDOUT and BPF_STDERR stream IDs in the vmlinux.h can be passed to the macro to print to the respective streams. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 | bpf: Introduce BPF standard streams | Kumar Kartikeya Dwivedi
Add support for a stream API to the kernel and expose related kfuncs to BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These can be used for printing messages that can be consumed from user space, thus it's similar in spirit to the existing trace_pipe interface.
The kernel will use the BPF_STDERR stream to notify the program of any errors encountered at runtime. BPF programs themselves may use both streams for writing debug messages. BPF library-like code may use BPF_STDERR to print warnings or errors on misuse at runtime.
The implementation of a stream is as follows. Every time a message is emitted from the kernel (directly, or through a BPF program), a record is allocated by bump allocating from a per-cpu region backed by a page obtained using alloc_pages_nolock(). This ensures that we can allocate memory from any context. The eventual plan is to discard this scheme in favor of Alexei's kmalloc_nolock() [0].
This record is then locklessly inserted into a list (llist_add()) so that the printing side doesn't require holding any locks, and works in any context. Each stream has a maximum capacity of 4MB of text, and each printed message is accounted against this limit.
Messages from a program are emitted using the bpf_stream_vprintk kfunc, which takes a stream_id argument and otherwise works like bpf_trace_vprintk.
The bprintf buffer helpers are extracted out to be reused for printing the string into them before copying it into the stream, so that we can (with the defined max limit) format a string and know its true length before performing allocations of the stream element.
For consuming elements from a stream, we expose a bpf(2) syscall command named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the stream of a given prog_fd into a user space buffer. The main logic is implemented in bpf_stream_read(). The log messages are queued in bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and ordered correctly in the stream backlog.
For this purpose, we hold a lock around bpf_stream_backlog_peek(), as llist_del_first() (if we maintained a second lockless list for the backlog) wouldn't be safe from multiple threads anyway. Then, if we fail to find something in the backlog log, we splice out everything from the lockless log, and place it in the backlog log, and then return the head of the backlog. Once the full length of the element is consumed, we will pop it and free it.
The lockless list bpf_stream::log is a LIFO stack. Elements obtained using a llist_del_all() operation are in LIFO order, thus would break the chronological ordering if printed directly. Hence, this batch of messages is first reversed. Then, it is stashed into a separate list in the stream, i.e. the backlog_log. The head of this list is the actual message that should always be returned to the caller. All of this is done in bpf_stream_backlog_fill().
From the kernel side, the writing into the stream will be a bit more involved than the typical printk. First, the kernel typically may print a collection of messages into the stream, and parallel writers into the stream may suffer from interleaving of messages. To ensure each group of messages is visible atomically, we can take advantage of using a lockless list for pushing in messages.
To enable this, we add a bpf_stream_stage() macro, and require kernel users to use bpf_stream_printk statements for the passed expression to write into the stream. Underneath the macro, we have a message staging API, where a bpf_stream_stage object on the stack accumulates the messages being printed into a local llist_head, and then a commit operation splices the whole batch into the stream's lockless log list.
This is especially pertinent for rqspinlock deadlock messages printed to program streams. After this change, we see each deadlock invocation as a non-interleaving contiguous message without any confusion on the reader's part, improving their user experience in debugging the fault.
While programs cannot benefit from this staged stream writing API, they could just as well hold an rqspinlock around their print statements to serialize messages, hence this is kept kernel-internal for now.
Overall, this infrastructure provides NMI-safe, any-context printing of messages to two dedicated streams.
Later patches will add support for printing splats in case of BPF arena page faults, rqspinlock deadlocks, and cond_break timeouts, and integration of this facility into bpftool for dumping messages to user space.
[0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
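The reader-side ordering step described in the commit above (take the whole LIFO log, reverse it, serve the oldest message from the backlog) can be illustrated with a small user-space sketch. This is not the kernel code: `struct msg_node`, `log_push()`, `log_take_all()`, `batch_reverse()` and `backlog_fill()` are hypothetical names, and plain pointer updates stand in for the atomic llist_add()/llist_del_all() operations and for the lock held on the reader side.

```c
#include <stddef.h>

/* Hypothetical record type; the kernel's real stream element differs. */
struct msg_node {
	struct msg_node *next;
	const char *text;
};

/* Push at the head, as llist_add() does (without the atomics). */
static void log_push(struct msg_node **log, struct msg_node *n)
{
	n->next = *log;
	*log = n;
}

/* Detach the whole list in one step, mirroring llist_del_all(). */
static struct msg_node *log_take_all(struct msg_node **log)
{
	struct msg_node *batch = *log;

	*log = NULL;
	return batch;
}

/* Reverse a LIFO batch so that the oldest message comes first. */
static struct msg_node *batch_reverse(struct msg_node *head)
{
	struct msg_node *prev = NULL;

	while (head) {
		struct msg_node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}

/*
 * If the backlog is empty, refill it from the log in chronological
 * order. Returns the backlog head, i.e. the next message to read.
 */
static struct msg_node *backlog_fill(struct msg_node **log,
				     struct msg_node **backlog)
{
	if (!*backlog)
		*backlog = batch_reverse(log_take_all(log));
	return *backlog;
}
```

After pushing three messages in the order first, second, third, the log head is the newest message, while `backlog_fill()` returns the oldest one with the others chained behind it in emission order.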
2025-07-03 | selftests/bpf: Add test cases for bpf_dynptr_memset() | Ihor Solodrai
Add tests to verify the behavior of bpf_dynptr_memset():
* normal memset 0
* normal memset non-0
* memset with an offset
* memset in dynptr that was adjusted
* error: size overflow
* error: offset+size overflow
* error: readonly dynptr
* memset into non-linear xdp dynptr
Signed-off-by: Ihor Solodrai <isolodrai@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/bpf/20250702210309.3115903-3-isolodrai@meta.com
2025-07-03 | selftests/nolibc: use file driver for QEMU serial | Thomas Weißschuh
For the test implementation of the SuperH architecture a second serial port needs to be used. Unfortunately the currently used 'stdio' driver does not support multiple serial ports at the same time. Switch to the 'file' driver which does support multiple ports and is sufficient for the nolibc-test use case. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20250623-nolibc-sh-v2-2-0f5b4b303025@weissschuh.net
2025-07-03 | selftests/nolibc: fix EXTRACONFIG variables ordering | Thomas Weißschuh
The variable block got disordered at some point. Use the correct ordering. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20250623-nolibc-sh-v2-1-0f5b4b303025@weissschuh.net
2025-07-03 | KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t | Anshuman Khandual
Change the local variables that hold the MDSCR_EL1 register value to uint64_t, which reflects the register's true width. Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Joey Gouly <joey.gouly@arm.com> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.linux.dev Cc: linux-kernel@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20250613023646.1215700-3-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-03 | perf test: Add more test cases to sched test | Namhyung Kim
  $ sudo ./perf test -vv 92
  92: perf sched tests:
  --- start ---
  test child forked, pid 1360101
  Sched record
  pid 1360105's current affinity list: 0-3
  pid 1360105's new affinity list: 0
  pid 1360107's current affinity list: 0-3
  pid 1360107's new affinity list: 0
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 4.330 MB /tmp/__perf_test_sched.perf.data.b3319 (12246 samples) ]
  Sched latency
  Sched script
  Sched map
  Sched timehist
  Samples of sched_switch event do not have callchains.
  ---- end(0) ----
  92: perf sched tests : Ok
Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-9-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | perf sched: Fix memory leaks in 'perf sched latency' | Namhyung Kim
The work_atoms should be freed after use. Add free_work_atoms() to make sure to release all. It should use list_splice_init() when merging atoms to prevent accessing invalid pointers. Fixes: b1ffe8f3e0c96f552 ("perf sched: Finish latency => atom rename and misc cleanups") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-8-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | perf sched: Use RC_CHK_EQUAL() to compare pointers | Namhyung Kim
So that it can check two pointers to the same object properly when REFCNT_CHECKING is on. Fixes: 78c32f4cb12f9430 ("libperf rc_check: Add RC_CHK_EQUAL") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | perf sched: Fix memory leaks for evsel->priv in timehist | Namhyung Kim
It uses evsel->priv to save per-cpu timing information. It should be freed when the evsel is released. Add the priv destructor for evsel same as thread to handle that. Fixes: 49394a2a24c78ce0 ("perf sched timehist: Introduce timehist command") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | perf sched: Fix thread leaks in 'perf sched timehist' | Namhyung Kim
Add missing thread__put() after machine__findnew_thread() or timehist_get_thread(). Also idle threads' last_thread should be refcounted properly. Fixes: 699b5b920db04a6f ("perf sched timehist: Save callchain when entering idle") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | perf sched: Fix memory leaks in 'perf sched map' | Namhyung Kim
It maintains per-cpu pointers for the current thread but it doesn't release the refcounts. Fixes: 5e895278697c014e ("perf sched: Move curr_thread initialization to perf_sched__map()") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | perf sched: Free thread->priv using priv_destructor | Namhyung Kim
Many perf sched subcommands save a priv data structure in the thread but forget to free it. As it's an opaque type ('void *'), a destructor that knows how to free the data needs to be registered. In this case, just regular 'free()' is fine. Fixes: 04cb4fc4d40a5bf1 ("perf thread: Allow tools to register a thread->priv destructor") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | perf sched: Make sure it frees the usage string | Namhyung Kim
The parse_options_subcommand() allocates the usage string based on the given subcommands. So it should reach the end of the function to free the string to prevent memory leaks. Fixes: 1a5efc9e13f357ab ("libsubcmd: Don't free the usage string") Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703014942.1369397-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | perf tests make: Add NO_LIBDW=1 to minimal and add standalone test | Ian Rogers
Test coverage of NO_LIBDW=1 was missing. Add a standalone NO_LIBDW=1 test and add NO_LIBDW=1 to the minimal test configuration. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250703053622.3141424-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | perf header: Fix pipe mode header dumping | Ian Rogers
The pipe mode header dumping was accidentally removed when tracing of header feature events in pipe mode was added. Minor spelling tweak to header test failure message. Fixes: 61051f9a8452 ("perf header: In pipe mode dump features without --header/-I") Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250703042000.2740640-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-03 | selftests/sched_ext: Fix exit selftest hang on UP | Andrea Righi
On single-CPU systems, ops.select_cpu() is never called, causing the EXIT_SELECT_CPU test case to wait indefinitely. Avoid the stall by skipping this specific sub-test when only one CPU is available. Reported-by: Phil Auld <pauld@redhat.com> Fixes: a5db7817af780 ("sched_ext: Add selftests") Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Phil Auld <pauld@redhat.com> Tested-by: Phil Auld <pauld@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-07-03 | kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace | Mark Brown
Since f916dd32a943 ("arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state") we reject attempts to write to the streaming mode regset even if there is no register data supplied, causing the tests for setting vector lengths and setting SVE_VL_INHERIT in sve-ptrace to spuriously fail. Set the flag to avoid the issue; we still support not supplying register data. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250609-kselftest-arm64-ssve-fixups-v2-3-998fcfa6f240@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-03 | kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace | Mark Brown
Since f916dd32a943 ("arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state") we do not support writing FPSIMD payload data when writing NT_ARM_SSVE but the sve-ptrace test has an explicit test for this being supported which was not updated to reflect the new behaviour. Fix the test to expect a failure when writing FPSIMD data to the streaming mode register set. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250609-kselftest-arm64-ssve-fixups-v2-2-998fcfa6f240@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-03 | kselftest/arm64: Fix check for setting new VLs in sve-ptrace | Mark Brown
The check that the new vector length we set was the expected one was typoed to an assignment statement which for some reason the compilers didn't spot, most likely due to the macros involved. Fixes: a1d7111257cd ("selftests: arm64: More comprehensively test the SVE ptrace interface") Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250609-kselftest-arm64-ssve-fixups-v2-1-998fcfa6f240@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
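The class of bug described above, a comparison mistyped as an assignment inside a check, can be shown with a minimal sketch. It does not reproduce the sve-ptrace test: `struct vl_state`, `vl_check_buggy()` and `vl_check_fixed()` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state a ptrace read would fill in. */
struct vl_state {
	unsigned int vl;
};

/*
 * Buggy form: '=' overwrites the value that was read back, so the
 * check passes for any non-zero expected vector length.
 */
static bool vl_check_buggy(struct vl_state *got, unsigned int expected)
{
	return (got->vl = expected) != 0;
}

/* Fixed form: compare without modifying the value that was read. */
static bool vl_check_fixed(const struct vl_state *got, unsigned int expected)
{
	return got->vl == expected;
}
```

With a stored vector length of 16 and an expected value of 32, the fixed check fails as it should, while the buggy check reports success and silently changes the stored value to 32.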
2025-07-03 | kselftest/arm64: Convert tpidr2 test to use kselftest.h | Mark Brown
Recent work by Thomas Weißschuh means that it is now possible to use kselftest.h with nolibc. Convert the tpidr2 test which is nolibc specific to use kselftest.h, making it look more standard and ensuring it gets the benefit of any work done on kselftest.h. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250609-kselftest-arm64-nolibc-header-v1-1-16ee1c6fbfed@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02 | perf test: In forked mode add check that fds aren't leaked | Ian Rogers
When a test is forked no file descriptors should be open, however, parent ones may have been inherited - in particular those of the pipes of other forked child test processes. Add a loop to clean-up/close those file descriptors prior to running the test. At the end of the test assert that no additional file descriptors are present as this would indicate a file descriptor leak. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250624190326.2038704-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
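The close-then-assert pattern described above can be sketched as follows. This is an illustration only, not the perf implementation: `close_fds_in_range()` and `count_open_fds()` are hypothetical helpers, and probing each descriptor with fcntl(F_GETFD) over a fixed range is an assumption made to keep the sketch short.

```c
#include <fcntl.h>
#include <unistd.h>

/* Close every descriptor in [lo, hi); errors on unused slots are ignored. */
static void close_fds_in_range(int lo, int hi)
{
	for (int fd = lo; fd < hi; fd++)
		close(fd);
}

/* Count open descriptors in [lo, hi) by probing each one with F_GETFD. */
static int count_open_fds(int lo, int hi)
{
	int n = 0;

	for (int fd = lo; fd < hi; fd++) {
		if (fcntl(fd, F_GETFD) != -1)
			n++;
	}
	return n;
}
```

A forked child would call `close_fds_in_range(3, limit)` before running the test body and check that `count_open_fds(3, limit)` is still zero afterwards; a non-zero count points at a descriptor the test opened and never closed.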
2025-07-02 | perf dso: With ref count checking, avoid dso_data holding dso live | Ian Rogers
With the dso_data embedded in a dso there is a reference counted pointer to the dso rather than using container_of with reference count checking. This data can hold the dso live meaning that no dso__put ever deletes it. Add a check for this case and close the dso_data when it happens. There isn't an infinite loop as the dso_data clears the file descriptor prior to putting on the dso. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250624190326.2038704-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-02 | perf hwmon_pmu: Hold path rather than fd | Ian Rogers
Hold the path to the hwmon_pmu rather than the file descriptor. The file descriptor is somewhat problematic in that it reflects the directory state when opened, something that may vary in testing. Using a path simplifies testing and, to some extent, cleanup: because the hwmon_pmu is owned by the pmus list and is intentionally global and leaked when perf terminates, a file descriptor left open looks like a leak. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250624190326.2038704-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-02 | perf test code-reading: Avoid a leak of cpus and threads | Ian Rogers
The perf_evlist__set_maps does the necessary gets on the arguments passed, so the reference count bumping isn't necessary and creates a memory leak. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250624190326.2038704-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-02 | perf dso: Add missed dso__put to dso__load_kcore | Ian Rogers
The kcore loading creates a set of list nodes that have reference counted references to maps of the kcore. The list node freeing in the success path wasn't releasing the maps, add the missing puts. It is unclear why this leak was being missed by leak sanitizer. Fixes: 83720209961f ("perf map: Move map list node into symbol") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250624190326.2038704-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-02 | perf genelf: Fix NO_LIBDW=1 build | Ian Rogers
With NO_LIBDW=1 a new unused-parameter warning/error has appeared:
```
util/genelf.c: In function ‘jit_write_elf’:
util/genelf.c:163:32: error: unused parameter ‘load_addr’ [-Werror=unused-parameter]
  163 | jit_write_elf(int fd, uint64_t load_addr, const char *sym,
```
Fixes: e3f612c1d8f3 ("perf genelf: Remove libcrypto dependency and use built-in sha1()") Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250702175402.761818-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-02 | perf list: Add IBM z17 event descriptions | Thomas Richter
Update IBM z17 counter description using document SA23-2260-08: "The Load-Program-Parameter and the CPU-Measurement Facilities" released in May 2025 to include counter definitions for IBM z17 counter sets:
* Basic counter set
* Problem/user counter set
* Crypto counter set.
Use document SA23-2261-09: "The CPU-Measurement Facility Extended Counters Definition for z10, z196/z114, zEC12/zBC12, z13/z13s, z14, z15, z16 and z17" released in April 2025 to include counter definitions for IBM z17
* Extended counter set
* MT-Diagnostic counter set.
Use document SA22-7832-14: "z/Architecture Principles of Operation" released in April 2025 to include counter definitions for IBM z17
* PAI-Crypto counter set
* PAI-Extension counter set.
Use document "CPU MF Formulas and Updates April 2025" released in April 2025 to include metric calculations.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250623132731.899525-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-02 | perf tools: Fix use-after-free in help_unknown_cmd() | Namhyung Kim
Currently perf aborts when it finds an invalid command. I guess it depends on the environment as I have some custom commands in the path.
  $ perf bad-command
  perf: 'bad-command' is not a perf-command. See 'perf --help'.
  Aborted (core dumped)
It's because the exclude_cmds() in libsubcmd has a use-after-free when it removes some entries. After copying one entry to another, it keeps the pointer in both positions. The next copy operation then frees the later one, but that is the same entry as the previous one.
For example, let's say cmds = { A, B, C, D, E } and excludes = { B, E }.
  ci  cj  ei   cmds-name  excludes
  -----------+--------------------
   0   0   0 |     A         B       :  cmp < 0, ci == cj
   1   1   0 |     B         B       :  cmp == 0
   2   1   1 |     C         E       :  cmp < 0, ci != cj
At this point, it frees cmds->names[1] and cmds->names[1] is assigned to cmds->names[2].
   3   2   1 |     D         E       :  cmp < 0, ci != cj
Now it frees cmds->names[2] but it's the same as cmds->names[1]. So accessing cmds->names[1] will be invalid.
This change makes the subcmd tests succeed.
  $ perf test subcmd
  69: libsubcmd help tests :
  69.1: Load subcmd names : Ok
  69.2: Uniquify subcmd names : Ok
  69.3: Exclude duplicate subcmd names : Ok
Fixes: 4b96679170c6 ("libsubcmd: Avoid SEGV/use-after-free when commands aren't excluded") Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250701201027.1171561-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
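One way to avoid the double ownership described above is to clear the source slot whenever an entry is moved down, so that each pointer lives in exactly one slot. The sketch below shows that idea on plain heap-allocated strings. It is not the libsubcmd code: `exclude_sorted()` and `dup_name()` are hypothetical helpers, and both arrays are assumed to be sorted.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocate a copy of a name. */
static char *dup_name(const char *s)
{
	char *p = malloc(strlen(s) + 1);

	if (p)
		strcpy(p, s);
	return p;
}

/*
 * Remove every name listed in excl[] from names[]; both are sorted.
 * names[] owns its strings. Returns the new number of entries.
 */
static size_t exclude_sorted(char **names, size_t cnt,
			     const char *const *excl, size_t excl_cnt)
{
	size_t ci = 0, cj = 0, ei = 0;

	while (ci < cnt && ei < excl_cnt) {
		int cmp = strcmp(names[ci], excl[ei]);

		if (cmp < 0) {
			/* Keep the entry; clear the old slot after moving it. */
			if (ci != cj) {
				names[cj] = names[ci];
				names[ci] = NULL;
			}
			ci++;
			cj++;
		} else if (cmp == 0) {
			/* Excluded: free it exactly once and clear the slot. */
			free(names[ci]);
			names[ci] = NULL;
			ci++;
			ei++;
		} else {
			ei++;
		}
	}
	/* Move any remaining entries down, again leaving no duplicates. */
	while (ci < cnt) {
		if (ci != cj) {
			names[cj] = names[ci];
			names[ci] = NULL;
		}
		ci++;
		cj++;
	}
	return cj;
}
```

With names = { A, B, C, D, E } and excl = { B, E }, the result is { A, C, D } followed by two NULL slots, and no string is freed twice.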
2025-07-02 | selftests: drv-net: Add test for devlink-rate traffic class bandwidth distribution | Carolina Jubran
This test suite validates the functionality of the devlink-rate API for traffic class (TC) bandwidth allocation. It ensures that bandwidth can be distributed between different traffic classes as configured, and verifies that explicit TC-to-queue mapping is required for the allocation to be effective.
The first test (test_no_tc_mapping_bandwidth) is marked as expected failure on mlx5, since the hardware automatically enforces traffic class separation by dynamically moving queues to the correct TC scheduler, even without explicit TC-to-queue mapping configuration.
Test output on mlx5:
  1..2
  # Created VF interface: eth5
  # Created VLAN eth5.101 on eth5 with tc 3 and IP 198.51.100.2
  # Created VLAN eth5.102 on eth5 with tc 4 and IP 198.51.100.10
  # Set representor eth4 up and added to bridge
  # Bandwidth check results without TC mapping:
  # TC 3: 0.19 Gbits/sec
  # TC 4: 0.76 Gbits/sec
  # Total bandwidth: 0.95 Gbits/sec
  # TC 3 percentage: 20.0%
  # TC 4 percentage: 80.0%
  ok 1 devlink_rate_tc_bw.test_no_tc_mapping_bandwidth # XFAIL Bandwidth matched 80/20 split without TC mapping
  # Created VF interface: eth5
  # Created VLAN eth5.101 on eth5 with tc 3 and IP 198.51.100.2
  # Created VLAN eth5.102 on eth5 with tc 4 and IP 198.51.100.10
  # Set representor eth4 up and added to bridge
  # Bandwidth check results with TC mapping:
  # TC 3: 0.21 Gbits/sec
  # TC 4: 0.78 Gbits/sec
  # Total bandwidth: 0.98 Gbits/sec
  # TC 3 percentage: 21.1%
  # TC 4 percentage: 78.9%
  # Bandwidth is distributed as 80/20 with TC mapping
  ok 2 devlink_rate_tc_bw.test_tc_mapping_bandwidth
  # Totals: pass:1 fail:0 xfail:1 xpass:0 skip:0 error:0
Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250629142138.361537-9-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02 | selftest: netdevsim: Add devlink rate tc-bw test | Carolina Jubran
Test verifies that netdevsim correctly implements devlink ops callbacks that set tc-bw on leaf or node rate object. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250629142138.361537-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02 | selftest: net: extend msg_zerocopy test with forwarding | Willem de Bruijn
Zerocopy skbs are converted to regular copy skbs when data is queued to a local socket. This happens in the existing test with a sender and receiver communicating over a veth device.
Zerocopy skbs are sent without copying if egressing a device. Verify that this behavior is maintained even in the common container setup where data is forwarded over a veth to the physical device.
Update msg_zerocopy.sh to
1. Have a dummy network device to simulate a physical device.
2. Have forwarding enabled between veth and dummy.
3. Add a tx-only test that sends out dummy via the forwarding path.
4. Verify the exitcode of the sender, which signals zerocopy success.
As dummy drops all packets, this cannot be a TCP connection. Test the new case with unconnected UDP only.
Update msg_zerocopy.c to
- Accept an argument whether send with zerocopy is expected.
- Return an exitcode whether behavior matched that expectation.
Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250630194312.1571410-3-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02 | vsock/test: Add test for null ptr deref when transport changes | Luigi Leonardi
Add a new test to ensure that when the transport changes a null pointer dereference does not occur. The bug was reported upstream [1] and fixed with commit 2cb7c756f605 ("vsock/virtio: discard packets if the transport changes").
  KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
  CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted
  Workqueue: vsock-loopback vsock_loopback_work
  RIP: 0010:vsock_stream_has_data+0x44/0x70
  Call Trace:
   virtio_transport_do_close+0x68/0x1a0
   virtio_transport_recv_pkt+0x1045/0x2ae4
   vsock_loopback_work+0x27d/0x3f0
   process_one_work+0x846/0x1420
   worker_thread+0x5b3/0xf80
   kthread+0x35a/0x700
   ret_from_fork+0x2d/0x70
   ret_from_fork_asm+0x1a/0x30
Note that this test may not fail in a kernel without the fix, but it may hang on the client side if it triggers a kernel oops.
This works by creating a socket, trying to connect to a server, and then executing a second connect operation on the same socket but to a different CID (0). This triggers a transport change. If the connect operation is interrupted by a signal, this could cause a null-ptr-deref.
Since this bug is non-deterministic, we need to try several times. It is reasonable to assume that the bug will show up within the timeout period.
If there is a G2H transport loaded in the system, the bug is not triggered and this test will always pass. This is because `vsock_assign_transport`, when using CID 0, like in this case, sets vsk->transport to `transport_g2h` that is not NULL if a G2H transport is available.
[1] https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
Suggested-by: Hyunwoo Kim <v4bel@theori.io> Suggested-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250630-test_vsock-v5-2-2492e141e80b@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02 | vsock/test: Add macros to identify transports | Luigi Leonardi
Add three new macros: TRANSPORTS_G2H, TRANSPORTS_H2G and TRANSPORTS_LOCAL. They can be used to identify the type of the transport(s) loaded when using the `get_transports()` function. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250630-test_vsock-v5-1-2492e141e80b@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02 | selftests/bpf: Allow veristat compile standalone | Mykyta Yatsenko
Veristat is synced into the standalone repo, where it compiles without kernel private dependencies. This patch fixes compilation errors in standalone veristat. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250702175622.358405-1-mykyta.yatsenko5@gmail.com
2025-07-02 | kselftest/arm64/mte: Add MTE_STORE_ONLY testcases | Yeoreum Yun
Since ARMv8.9, FEAT_MTE_STORE_ONLY can be used to restrict the raising of tag check faults to store operations only. Add new test cases using the MTE_STORE_ONLY feature. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618092957.2069907-9-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02 | kselftest/arm64/mte: Preparation for mte store only test | Yeoreum Yun
Since ARMv8.9, FEAT_MTE_STORE_ONLY can be used to restrict the raising of tag check faults to store operations only. This patch is preparation for testing FEAT_MTE_STORE_ONLY. It shouldn't change test results. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618092957.2069907-8-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02 | kselftest/arm64/abi: Add MTE_STORE_ONLY feature hwcap test | Yeoreum Yun
Add the MTE_STORE_ONLY feature hwcap test. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618092957.2069907-7-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02 | selftests/bpf: Negative test case for ref_obj_id in args | Paul Chaignon
This patch adds a test case, as shown below, for the verifier error "more than one arg with ref_obj_id".
  0: (b7) r2 = 20
  1: (b7) r3 = 0
  2: (18) r1 = 0xffff92cee3cbc600
  4: (85) call bpf_ringbuf_reserve#131
  5: (55) if r0 == 0x0 goto pc+3
  6: (bf) r1 = r0
  7: (bf) r2 = r0
  8: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204
  9: (95) exit
This error is currently incorrectly reported as a verifier bug, with a warning. The next patch in this series will address that.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/3ba78e6cda47ccafd6ea70dadbc718d020154664.1751463262.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02 | selftests/bpf: null checks for rdonly_untrusted_mem should be preserved | Eduard Zingerman
Test case checking that the verifier does not assume rdonly_untrusted_mem values are non-null. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250702073620.897517-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02 | selftests/bpf: Don't call fsopen() as privileged user | Matteo Croce
In the BPF token example, the fsopen() syscall is called as a privileged user. This is unneeded because fsopen() can also be called as an unprivileged user from the user namespace. As the `fs_fd` file descriptor which was sent back and forth is still the same, keep it open instead of cloning and closing it twice via SCM_RIGHTS. cf. https://github.com/systemd/systemd/pull/36134 Signed-off-by: Matteo Croce <teknoraver@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/bpf/20250701183123.31781-1-technoboy85@gmail.com
2025-07-02kselftest/arm64/mte: Add mtefar tests on check_mmap_optionsYeoreum Yun
If FEAT_MTE_TAGGED_FAR (Armv8.9) is supported, bits 63:60 of the fault address are preserved in response to synchronous tag check faults (SEGV_MTESERR). This patch adds new test cases using address tags (bits 63:60), corresponding to each existing test in check_mmap_option. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618084513.1761345-11-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02kselftest/arm64/mte: Refactor check_mmap_option testYeoreum Yun
Before adding the mtefar test cases to check_mmap_option.c, refactor check_mmap_option:
- build a test case array from the test options (mem_type, mte_sync type, etc.) so that the tests follow a general test case pattern
- generate each test case name according to its test options.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618084513.1761345-10-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02kselftest/arm64/mte: Add verification for address tag in signal handlerYeoreum Yun
Add verification of the address tag [63:60] when a synchronous MTE fault happens. When the signal handler is registered with SA_EXPOSE_TAGBITS, the fault address includes not only the memory tag [59:56] but also the address tag. Therefore, remove both tags when verifying the fault address location. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618084513.1761345-9-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
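The tag handling described above can be sketched in plain C. This is only an illustration and not the selftest's code: the macro and function names (`MEM_TAG_SHIFT`, `ADDR_TAG_SHIFT`, `get_mem_tag()`, `get_addr_tag()`, `strip_tags()`) are hypothetical, and only the bit positions ([63:60] for the address tag, [59:56] for the memory tag) come from the commit message.

```c
#include <stdint.h>

/* Bit positions taken from the commit message; all names are hypothetical. */
#define MEM_TAG_SHIFT   56              /* memory tag lives in bits [59:56] */
#define ADDR_TAG_SHIFT  60              /* address tag lives in bits [63:60] */
#define TAG_FIELD_MASK  UINT64_C(0xf)   /* each tag is four bits wide */

/* Extract the memory tag, bits [59:56], from a reported fault address. */
static inline unsigned int get_mem_tag(uint64_t addr)
{
	return (unsigned int)((addr >> MEM_TAG_SHIFT) & TAG_FIELD_MASK);
}

/* Extract the address tag, bits [63:60], from a reported fault address. */
static inline unsigned int get_addr_tag(uint64_t addr)
{
	return (unsigned int)((addr >> ADDR_TAG_SHIFT) & TAG_FIELD_MASK);
}

/* Clear both tags (the whole top byte) so that the remaining bits can be
 * compared against the expected fault location. */
static inline uint64_t strip_tags(uint64_t addr)
{
	return addr & ~(UINT64_C(0xff) << MEM_TAG_SHIFT);
}
```

A handler that receives a tagged fault address would compare `strip_tags(fault_addr)` with the expected location and check the two tag fields separately.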
2025-07-02kselftest/arm64/mte: Add address tag related macro and functionYeoreum Yun
Add address tag related macro and function to test MTE_FAR feature. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618084513.1761345-8-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02kselftest/arm64/mte: Check MTE_FAR feature is supportedYeoreum Yun
To run the MTE_FAR tests only when the CPU supports the MTE_FAR feature, check in the MTE test whether MTE_FAR is supported. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618084513.1761345-7-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITSYeoreum Yun
To test that the address tag [63:60] and the memory tag [59:56] are preserved when a memory tag fault happens, let mte_register_signal() register the signal handler with SA_EXPOSE_TAGBITS. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618084513.1761345-6-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02kselftest/arm64: Add MTE_FAR hwcap testYeoreum Yun
Add an MTE_FAR hwcap test to kselftest. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618092957.2069907-5-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
"Some changes to the userspace selftest framework cause the iommufd tests to start failing. This turned out to be bugs in the iommufd side that were just getting uncovered.
- Deal with MAP_HUGETLB mmapping more than requested even when in MAP_FIXED mode
- Fixup missing error flow cleanup in the test
- Check that the memory allocations succeeded
- Suppress some bogus gcc 'may be used uninitialized' warnings"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd/selftest: Fix build warnings due to uninitialized mfd
  iommufd/selftest: Add asserts testing global mfd
  iommufd/selftest: Add missing close(mfd) in memfd_mmap()
  iommufd/selftest: Fix iommufd_dirty_tracking with large hugepage sizes
2025-07-02selftests/kernfs: test xattr retrievalChristian Brauner
Make sure that listxattr() returns zero and that getxattr() returns ENODATA when no extended attributes are set. Use /sys/kernel/warn_count as that always exists and is a read-only file. Link: https://lore.kernel.org/20250702-hochmoderne-abklatsch-af9c605b57b2@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
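The check described above could look roughly like the following C sketch. It is not the selftest itself: the helper name `check_no_xattrs()` and the idea of passing the attribute name as a parameter are assumptions made for illustration. It uses the file-descriptor variants `flistxattr()` and `fgetxattr()` from `<sys/xattr.h>`.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 0 if the file at 'path' reports an empty
 * xattr list and looking up 'attr_name' fails with ENODATA, -1 otherwise
 * (including when the file cannot be opened).
 */
static int check_no_xattrs(const char *path, const char *attr_name)
{
	char buf[1];
	int ret = -1;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* With a size of 0, flistxattr() returns the size of the name list. */
	if (flistxattr(fd, NULL, 0) != 0)
		goto out;

	/* The lookup must fail, and it must fail with ENODATA specifically. */
	errno = 0;
	if (fgetxattr(fd, attr_name, buf, sizeof(buf)) >= 0)
		goto out;
	if (errno != ENODATA)
		goto out;

	ret = 0;
out:
	close(fd);
	return ret;
}
```

A caller might use `check_no_xattrs("/sys/kernel/warn_count", "user.foo")`, where the attribute name `user.foo` is an arbitrary placeholder and not something the commit message specifies.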