summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2025-05-09perf pmu: Use available core PMU for raw eventsNamhyung Kim
When it finds a matching PMU for a legacy event, it should look for core PMUs. The raw events also refers to core events so it should be handled similarly. On x86, PERF_TYPE_RAW should match with the existing cpu PMU. But on ARM, there's no PMU with the matching type so it'll pick the first core PMU for it. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250507215939.54399-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf lock contention: Add -J/--inject-delay optionNamhyung Kim
This is to slow down lock acquistion (on contention locks) deliberately. A possible use case is to estimate impact on application performance by optimization of kernel locking behavior. By delaying the lock it can simulate the worse condition as a control group, and then compare with the current behavior as a optimized condition. The syntax is 'time@function' and the time can have unit suffix like "us" and "ms". For example, I ran a simple test like below. $ sudo perf lock con -abl -L tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 92 1.18 ms 199.54 us 12.79 us ffffffff8a806080 tasklist_lock (rwlock) The contention count was 92 and the average wait time was around 10 us. But if I add 100 usec of delay to the tasklist_lock, $ sudo perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 190 15.67 ms 230.10 us 82.46 us ffffffff8a806080 tasklist_lock (rwlock) The contention count increased and the average wait time was up closed to 100 usec. If I increase the delay even more, $ sudo perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1002 2.80 s 3.01 ms 2.80 ms ffffffff8a806080 tasklist_lock (rwlock) Now every sleep process had contention and the wait time was more than 1 msec. This is on my 4 CPU laptop so I guess one CPU has the lock while other 3 are waiting for it mostly. For simplicity, it only supports global locks for now. Committer testing: root@number:~# grep -m1 'model name' /proc/cpuinfo model name : AMD Ryzen 9 9950X3D 16-Core Processor root@number:~# perf lock con -abl -L tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 142 453.85 us 25.39 us 3.20 us ffffffffae808080 tasklist_lock (rwlock) root@number:~# perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1040 2.39 s 3.11 ms 2.30 ms ffffffffae808080 tasklist_lock (rwlock) root@number:~# perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1025 24.72 s 31.01 ms 24.12 ms ffffffffae808080 tasklist_lock (rwlock) root@number:~# Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250509171950.183591-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf tests: Fix 'perf report' tests installationMichael Petlan
There was a copy-paste mistake in the installation commands. Also, we need to install stderr-whitelist.txt file, which contains allowed messages that are printed on stderr and should not cause test fail. Fixes: 097fe67df1aa9cc7 ("perf testsuite: Install perf-report tests in the 'make install-tests -C tools/perf' target") Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250113182605.130719-6-vmolnaro@redhat.com Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09selftests/bpf: Enable non-arena load-acquire/store-release selftests for riscv64Peilin Ye
For riscv64, enable all BPF_{LOAD_ACQ,STORE_REL} selftests except the arena_atomics/* ones (not guarded behind CAN_USE_LOAD_ACQ_STORE_REL), since arena access is not yet supported. Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/9d878fa99a72626208a8eed3c04c4140caf77fda.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09selftests/bpf: Verify zero-extension behavior in load-acquire testsPeilin Ye
Verify that 8-, 16- and 32-bit load-acquires are zero-extending by using immediate values with their highest bit set. Do the same for the 64-bit variant to keep the style consistent. Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/11097fd515f10308b3941469ee4c86cb8872db3f.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09selftests/bpf: Avoid passing out-of-range values to __retval()Peilin Ye
Currently, we pass 0x1234567890abcdef to __retval() for the following two tests: verifier_load_acquire/load_acquire_64 verifier_store_release/store_release_64 However, the upper 32 bits of that value are being ignored, since __retval() expects an int. Actually, the tests would still pass even if I change '__retval(0x1234567890abcdef)' to e.g. '__retval(0x90abcdef)'. Restructure the tests a bit to test the entire 64-bit values properly. Do the same to their 8-, 16- and 32-bit variants as well to keep the style consistent. Fixes: ff3afe5da998 ("selftests/bpf: Add selftests for load-acquire and store-release instructions") Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/d67f4c6f6ee0d0388cbce1f4892ec4176ee2d604.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09selftests/bpf: Use CAN_USE_LOAD_ACQ_STORE_REL when appropriatePeilin Ye
Instead of open-coding the conditions, use '#ifdef CAN_USE_LOAD_ACQ_STORE_REL' to guard the following tests: verifier_precision/bpf_load_acquire verifier_precision/bpf_store_release verifier_store_release/* Note that, for the first two tests in verifier_precision.c, switching to '#ifdef CAN_USE_LOAD_ACQ_STORE_REL' means also checking if '__clang_major__ >= 18', which has already been guaranteed by the outer '#if' check. Acked-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23 Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/45d7e025f6e390a8ff36f08fc51e31705ac896bd.1746588351.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09selftests/tc-testing: Add qdisc limit trimming testsCong Wang
Added new test cases for FQ, FQ_CODEL, FQ_PIE, and HHF qdiscs to verify queue trimming behavior when the qdisc limit is dynamically reduced. Each test injects packets, reduces the qdisc limit, and checks that the new limit is enforced. This is still best effort since timing qdisc backlog is not easy. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-08selftests: net-drv: remove the nic_performance and nic_link_layer testsJakub Kicinski
Revert fbbf93556f0c ("selftests: nic_performance: Add selftest for performance of NIC driver") Revert c087dc54394b ("selftests: nic_link_layer: Add selftest case for speed and duplex states") Revert 6116075e18f7 ("selftests: nic_link_layer: Add link layer selftest for NIC driver") These tests don't clean up after themselves, don't use the disruptive annotations, don't get included in make install etc. etc. The tests were added before we have any "HW" runner, so the issues were missed. Our CI doesn't have any way of excluding broken tests, remove these for now to stop the random pollution of results due to broken env. We can always add them back once / if fixed. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250507140109.929801-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-08selftests: netfilter: fix conntrack stress test failures on debug kernelsFlorian Westphal
Jakub reports test failures on debug kernel: FAIL: proc inconsistency after uniq filter for ... This is because entries are expiring while validation is happening. Increase the timeout of ctnetlink injected entries and the icmp (ping) timeout to 1h to avoid this. To reduce run-time, add less entries via ctnetlink when KSFT_MACHINE_SLOW is set. also log of a failed run had: PASS: dump in netns had same entry count (-C 0, -L 0, -p 0, /proc 0) ... i.e. all entries already expired: add a check and set failure if this happens. While at it, include a diff when there were duplicate entries and add netns name to error messages (it tells if icmp or ctnetlink failed). Fixes: d33f889fd80c ("selftests: netfilter: add conntrack stress test") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20250506061125.1a244d12@kernel.org/ Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20250507075000.5819-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-08perf trace: Fix leaks of 'struct thread' in set_filter_loop_pids()Namhyung Kim
I've found some leaks from 'perf trace -a'. It seems there are more leaks but this is what I can find for now. Fixes: 082ab9a18e532864 ("perf trace: Filter out 'sshd' in the tracer ancestry in syswide tracing") Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250403054213.7021-1-namhyung@kernel.org [ split from a larget patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Fix leaks of 'struct thread' in fprintf_sys_enter()Namhyung Kim
I've found some leaks from 'perf trace -a'. It seems there are more leaks but this is what I can find for now. Fixes: 70351029b55677eb ("perf thread: Add support for reading the e_machine type for a thread") Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250403054213.7021-1-namhyung@kernel.org [ split from a larget patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08bpftool: Fix cgroup command to only show cgroup bpf programsMartin KaFai Lau
The netkit program is not a cgroup bpf program and should not be shown in the output of the "bpftool cgroup show" command. However, if the netkit device happens to have ifindex 3, the "bpftool cgroup show" command will output the netkit bpf program as well: > ip -d link show dev nk1 3: nk1@if2: ... link/ether ... netkit mode ... > bpftool net show tc: nk1(3) netkit/peer tw_ns_nk2phy prog_id 469447 > bpftool cgroup show /sys/fs/cgroup/... ID AttachType AttachFlags Name ... ... ... 469447 netkit_peer tw_ns_nk2phy The reason is that the target_fd (which is the cgroup_fd here) and the target_ifindex are in a union in the uapi/linux/bpf.h. The bpftool iterates all values in "enum bpf_attach_type" which includes non cgroup attach types like netkit. The cgroup_fd is usually 3 here, so the bug is triggered when the netkit ifindex just happens to be 3 as well. The bpftool's cgroup.c already has a list of cgroup-only attach type defined in "cgroup_attach_types[]". This patch fixes it by iterating over "cgroup_attach_types[]" instead of "__MAX_BPF_ATTACH_TYPE". Cc: Quentin Monnet <qmo@kernel.org> Reported-by: Takshak Chahande <ctakshak@meta.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/r/20250507203232.1420762-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.15-rc6). No conflicts. Adjacent changes: net/core/dev.c: 08e9f2d584c4 ("net: Lock netdevices during dev_shutdown") a82dc19db136 ("net: avoid potential race between netdev_get_by_index_lock() and netns switch") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-08perf parse-events: Add debug dump of evlist if reorderedIan Rogers
Add debug verbose output to show how evsels were reordered by parse_events__sort_events_and_fix_groups(). For example: ``` $ perf record -v -e '{instructions,cycles}' true Using CPUID GenuineIntel-6-B7-1 WARNING: events were regrouped to match PMUs evlist after sorting/fixing: '{cpu_atom/instructions/,cpu_atom/cycles/},{cpu_core/instructions/,cpu_core/cycles/}' ``` Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf evlist: Make groups visible in evlist__format_evsels() outputIan Rogers
Make groups visible in output: Before: {cycles,instructions} -> cpu_atom/cycles/,cpu_atom/instructions/,cpu_core/cycles/,cpu_core/instructions/ After: {cycles,instructions} -> {cpu_atom/cycles/,cpu_atom/instructions/},{cpu_core/cycles/,cpu_core/instructions/} Committer testing: Before: root@number:~# perf record -e '{cycles,instructions,cache-misses}' /tmp/bla Failed to collect 'cycles,instructions,cache-misses' for the '/tmp/bla' workload: Permission denied root@number:~# After: root@number:~# perf record -e '{cycles,instructions,cache-misses}' /tmp/bla Failed to collect '{cycles,instructions,cache-misses}' for the '/tmp/bla' workload: Permission denied root@number:~# Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf evlist: Refactor evlist__scnprintf_evsels()Ian Rogers
Switch output to using a strbuf so the storage can be resized. Add a maximum size argument to avoid too much output that may happen for uncore events. Rename as scnprintf is no longer used. Committer testing: With the patch applied: root@number:~# perf probe -x ~/bin/perf evlist__format_evsels Added new event: probe_perf:evlist_format_evsels (on evlist__format_evsels in /home/acme/bin/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:evlist_format_evsels -aR sleep 1 root@number:~# perf probe -l probe_perf:evlist_format_evsels (on evlist__format_evsels@util/evlist.c in /home/acme/bin/perf) root@number:~# perf trace -e probe_perf:*/max-stack=10/ perf record -e cycles,instructions,cache-misses /tmp/bla Failed to collect 'cycles,instructions,cache-misses' for the '/tmp/bla' workload: Permission denied 0.000 perf/3893011 probe_perf:evlist_format_evsels(__probe_ip: 6183397) evlist__format_evsels (/home/acme/bin/perf) __cmd_record (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) handle_internal_command (/home/acme/bin/perf) run_argv (/home/acme/bin/perf) main (/home/acme/bin/perf) __libc_start_call_main (/usr/lib64/libc.so.6) __libc_start_main@@GLIBC_2.34 (/usr/lib64/libc.so.6) _start (/home/acme/bin/perf) root@number:~# Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf stat: Remove print_mixed_hw_group_errorIan Rogers
print_mixed_hw_group_error will print a warning when a group of events uses different PMUs. This isn't possible to happen as parse_events__sort_events_and_fix_groups() will break groups when this happens, adding the warning at the start of perf of: WARNING: events were regrouped to match PMUs As the previous mixed group warning can never happen, remove the associated code. Committer testing: Before/after: acme@five:~$ perf stat -e '{cpu_core/cycles/,cpu_atom/cycles/}' sleep 1 WARNING: events were regrouped to match PMUs Performance counter stats for 'sleep 1': 424,895 cpu_atom/cycles/u <not counted> cpu_core/cycles/u (0.00%) 1.011862314 seconds time elapsed 0.000000000 seconds user 0.003166000 seconds sys acme@five:~$ Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08Merge tag 'net-6.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN, WiFi and netfilter. We have still a comple of regressions open due to the recent drivers locking refactor. The patches are in-flight, but not ready yet. Current release - regressions: - core: lock netdevices during dev_shutdown - sch_htb: make htb_deactivate() idempotent - eth: virtio-net: don't re-enable refill work too early Current release - new code bugs: - eth: icssg-prueth: fix kernel panic during concurrent Tx queue access Previous releases - regressions: - gre: fix again IPv6 link-local address generation. - eth: b53: fix learning on VLAN unaware bridges Previous releases - always broken: - wifi: fix out-of-bounds access during multi-link element defragmentation - can: - initialize spin lock on device probe - fix order of unregistration calls - openvswitch: fix unsafe attribute parsing in output_userspace() - eth: - virtio-net: fix total qstat values - mtk_eth_soc: reset all TX queues on DMA free - fbnic: firmware IPC mailbox fixes" * tag 'net-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits) virtio-net: fix total qstat values net: export a helper for adding up queue stats fbnic: Do not allow mailbox to toggle to ready outside fbnic_mbx_poll_tx_ready fbnic: Pull fbnic_fw_xmit_cap_msg use out of interrupt context fbnic: Improve responsiveness of fbnic_mbx_poll_tx_ready fbnic: Cleanup handling of completions fbnic: Actually flush_tx instead of stalling out fbnic: Add additional handling of IRQs fbnic: Gate AXI read/write enabling on FW mailbox fbnic: Fix initialization of mailbox descriptor rings net: dsa: b53: do not set learning and unicast/multicast on up net: dsa: b53: fix learning on VLAN unaware bridges net: dsa: b53: fix toggling vlan_filtering net: dsa: b53: do not program vlans when vlan filtering is off net: dsa: b53: do not allow to configure VLAN 0 net: dsa: b53: always rejoin default untagged VLAN on bridge leave net: dsa: b53: fix VLAN ID for untagged vlan on bridge leave net: dsa: b53: fix flushing old pvid VLAN on pvid change net: dsa: b53: fix clearing PVID of a port net: dsa: b53: keep CPU port always tagged again ...
2025-05-08perf stat: Better hybrid support for the NMI watchdog warningIan Rogers
Prior to this patch evlist__has_hybrid would return false if the processor wasn't hybrid or the evlist didn't contain any core events. If the only PMU used by events was cpu_core then it would true even though there are no cpu_atom events. For example: ``` $ perf stat --cputype=cpu_core -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}' true Performance counter stats for 'true': <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) 0.001981900 seconds time elapsed 0.002311000 seconds user 0.000000000 seconds sys ``` This patch changes evlist__has_hybrid to return true only if the evlist contains events from >1 core PMU. This means the NMI watchdog warning is shown for the case above. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Add missing thread__put() in thread__e_machine()Ian Rogers
Add missing thread__put() of the found parent thread in thread__e_machine(). Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250401202715.3493567-1-irogers@google.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Free the files.max entry in files->tableIan Rogers
The files.max is the maximum valid fd in the files array and so freeing the values needs to be inclusive of the max value. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250401202715.3493567-1-irogers@google.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08kselftest/arm64: fp-ptrace: Adjust to new inactive mode behaviourMark Rutland
In order to fix an ABI problem, we recently changed the way that reads of the NT_ARM_SVE and NT_ARM_SSVE regsets behave when their corresponding vector state is inactive. Update the fp-ptrace test for the new behaviour. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-25-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08kselftest/arm64: fp-ptrace: Adjust to new VL change behaviourMark Rutland
In order to fix an ABI problem, we recently changed the way that changing the SVE/SME vector length affects PSTATE.SM. Historically, changing the SME vector length would clear PSTATE.SM. Now, changing the SME vector length preserves PSTATE.SM. Update the fp-ptrace test for the new behaviour. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-24-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08kselftest/arm64: tpidr2: Adjust to new clone() behaviourMark Rutland
In order to fix an ABI problem, we recently changed the way that a clone() syscall manipulates TPIDR2 and PSTATE.ZA. Historically the child would inherit the parent's TPIDR2 value unless CLONE_SETTLS was set, and now the child will inherit the parent's TPIDR2 value unless CLONE_VM is set. Update the tpidr2 test for the new behaviour. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Richard Sandiford <richard.sandiford@arm.com> Cc: Sander De Smalen <sander.desmalen@arm.com> Cc: Tamas Petz <tamas.petz@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yury Khrustalev <yury.khrustalev@arm.com> Link: https://lore.kernel.org/r/20250508132644.1395904-23-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08kselftest/arm64: fp-ptrace: Fix expected FPMR value when PSTATE.SM is changedMark Rutland
The fp-ptrace test suite expects that FPMR is set to zero when PSTATE.SM is changed via ptrace, but ptrace has never altered FPMR in this way, and the test logic erroneously relies upon (and has concealed) a bug where task_fpsimd_load() would unexpectedly and non-deterministically clobber FPMR. Using ptrace, FPMR can only be altered by writing to the NT_ARM_FPMR regset. The value of PSTATE.SM can be altered by writing to the NT_ARM_SVE or NT_ARM_SSVE regsets, and/or by changing the SME vector length (when writing to the NT_ARM_SVE, NT_ARM_SSVE, or NT_ARM_ZA regsets), but none of these writes will change the value of FPMR. The task_fpsimd_load() bug was introduced with the initial FPMR support in commit: 203f2b95a882 ("arm64/fpsimd: Support FEAT_FPMR") The incorrect FPMR test code was introduced in commit: 7dbd26d0b22d ("kselftest/arm64: Add FPMR coverage to fp-ptrace") Subsequently, the task_fpsimd_load() bug was fixed in commit: e5fa85fce08b ("arm64/fpsimd: Don't corrupt FPMR when streaming mode changes") ... whereupon the fp-ptrace FPMR tests started failing reliably, e.g. | # # Mismatch in saved FPMR: 915058000 != 0 | # not ok 25 SVE write, SVE 64->64, SME 64/0->64/1 Fix this by changing the test to expect that FPMR is *NOT* changed when PSTATE.SM is changed via ptrace, matching the extant behaviour. I've chosen to update the test code rather than modifying ptrace to zero FPMR when PSTATE.SM changes. Not zeroing FPMR is simpler overall, and allows the NT_ARM_FPMR regset to be handled independently from other regsets, leaving less scope for error. Fixes: 7dbd26d0b22d ("kselftest/arm64: Add FPMR coverage to fp-ptrace") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-22-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08KVM: selftests: Add a test for x86's fastops emulationSean Christopherson
Add a test to verify KVM's fastops emulation via forced emulation. KVM's so called "fastop" infrastructure executes the to-be-emulated instruction directly on hardware instead of manually emulating the instruction in software, using various shenanigans to glue together the emulator context and CPU state, e.g. to get RFLAGS fed into the instruction and back out for the emulator. Add testcases for all instructions that are low hanging fruit. While the primary goal of the selftest is to validate the glue code, a secondary goal is to ensure "emulation" matches hardware exactly, including for arithmetic flags that are architecturally undefined. While arithmetic flags may be *architecturally* undefined, their behavior is deterministic for a given CPU (likely a given uarch, and possibly even an entire family or class of CPUs). I.e. KVM has effectively been emulating underlying hardware behavior for years. Link: https://lore.kernel.org/r/20250506011250.1089254-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-07selftests/mm: fix a build failure on powerpcNysal Jan K.A.
The compiler is unaware of the size of code generated by the ".rept" assembler directive. This results in the compiler emitting branch instructions where the offset to branch to exceeds the maximum allowed value, resulting in build failures like the following: CC protection_keys /tmp/ccypKWAE.s: Assembler messages: /tmp/ccypKWAE.s:2073: Error: operand out of range (0x0000000000020158 is not between 0xffffffffffff8000 and 0x0000000000007ffc) /tmp/ccypKWAE.s:2509: Error: operand out of range (0x0000000000020130 is not between 0xffffffffffff8000 and 0x0000000000007ffc) Fix the issue by manually adding nop instructions using the preprocessor. Link: https://lkml.kernel.org/r/20250428131937.641989-2-nysal@linux.ibm.com Fixes: 46036188ea1f ("selftests/mm: build with -O2") Reported-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Tested-by: Donet Tom <donettom@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07selftests/mm: fix build break when compiling pkey_util.cMadhavan Srinivasan
Commit 50910acd6f615 ("selftests/mm: use sys_pkey helpers consistently") added a pkey_util.c to refactor some of the protection_keys functions accessible by other tests. But this broken the build in powerpc in two ways, pkey-powerpc.h: In function `arch_is_powervm': pkey-powerpc.h:73:21: error: storage size of `buf' isn't known 73 | struct stat buf; | ^~~ pkey-powerpc.h:75:14: error: implicit declaration of function `stat'; did you mean `strcat'? [-Wimplicit-function-declaration] 75 | if ((stat("/sys/firmware/devicetree/base/ibm,partition-name", &buf) == 0) && | ^~~~ | strcat Since pkey_util.c includes pkeys-helper.h, which in turn includes pkeys-powerpc.h, stat.h including is missing for "struct stat". This is fixed by adding "sys/stat.h" in pkeys-powerpc.h Secondly, pkey-powerpc.h:55:18: warning: format `%llx' expects argument of type `long long unsigned int', but argument 3 has type `u64' {aka `long unsigned int'} [-Wformat=] 55 | dprintf4("%s() changing %016llx to %016llx\n", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 56 | __func__, __read_pkey_reg(), pkey_reg); | ~~~~~~~~~~~~~~~~~ | | | u64 {aka long unsigned int} pkey-helpers.h:63:32: note: in definition of macro `dprintf_level' 63 | sigsafe_printf(args); \ | ^~~~ These format specifier related warning are removed by adding "__SANE_USERSPACE_TYPES__" to pkeys_utils.c. Link: https://lkml.kernel.org/r/20250428131937.641989-1-nysal@linux.ibm.com Fixes: 50910acd6f61 ("selftests/mm: use sys_pkey helpers consistently") Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07tools/testing/selftests: fix guard region test tmpfs assumptionLorenzo Stoakes
The current implementation of the guard region tests assume that /tmp is mounted as tmpfs, that is shmem. This isn't always the case, and at least one instance of a spurious test failure has been reported as a result. This assumption is unsafe, rushed and silly - and easily remedied by simply using memfd, so do so. We also have to fixup the readonly_file test to explicitly only be applicable to file-backed cases. Link: https://lkml.kernel.org/r/20250425162436.564002-1-lorenzo.stoakes@oracle.com Fixes: 272f37d3e99a ("tools/selftests: expand all guard region tests to file-backed") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Ryan Roberts <ryan.roberts@arm.com> Closes: https://lore.kernel.org/linux-mm/a2d2766b-0ab4-437b-951a-8595a7506fe9@arm.com/ Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07selftests/mm: compaction_test: support platform with huge mount of memoryFeng Tang
When running mm selftest to verify mm patches, 'compaction_test' case failed on an x86 server with 1TB memory. And the root cause is that it has too much free memory than what the test supports. The test case tries to allocate 100000 huge pages, which is about 200 GB for that x86 server, and when it succeeds, it expects it's large than 1/3 of 80% of the free memory in system. This logic only works for platform with 750 GB ( 200 / (1/3) / 80% ) or less free memory, and may raise false alarm for others. Fix it by changing the fixed page number to self-adjustable number according to the real number of free memory. Link: https://lkml.kernel.org/r/20250423103645.2758-1-feng.tang@linux.alibaba.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Acked-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@inux.alibaba.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07tools: ynl-gen: move the count into a presence struct tooJakub Kicinski
While we reshuffle the presence members, move the counts as well. Previously array count members would have been place directly in the struct, so: struct family_op_req { struct { u32 a:1; u32 b:1; } _present; struct { u32 bin; } _len; u32 a; u64 b; const unsigned char *bin; u32 n_multi; << count u32 *multi; << objects }; Since len has been moved to its own presence struct move the count as well: struct family_op_req { struct { u32 a:1; u32 b:1; } _present; struct { u32 bin; } _len; struct { u32 multi; << count } _count; u32 a; u64 b; const unsigned char *bin; u32 *multi; << objects }; This improves the consistency and allows us to remove some hacks in the codegen. Unlike for len there is no known name collision with the existing scheme. Link: https://patch.msgid.link/20250505165208.248049-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-07tools: ynl-gen: split presence metadataJakub Kicinski
Each YNL struct contains the data and a sub-struct indicating which fields are valid. Something like: struct family_op_req { struct { u32 a:1; u32 b:1; u32 bin_len; } _present; u32 a; u64 b; const unsigned char *bin; }; Note that the bin object 'bin' has a length stored, and that length has a _len suffix added to the field name. This breaks if there is a explicit field called bin_len, which is the case for some TC actions. Move the length fields out of the _present struct, create a new struct called _len: struct family_op_req { struct { u32 a:1; u32 b:1; } _present; struct { u32 bin; } _len; u32 a; u64 b; const unsigned char *bin; }; This should prevent name collisions and help with the packing of the struct. Unfortunately this is a breaking change, but hopefully the migration isn't too painful. Link: https://patch.msgid.link/20250505165208.248049-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-07tools: ynl-gen: rename basic presence from 'bit' to 'present'Jakub Kicinski
Internal change to the code gen. Rename how we indicate a type has a single bit presence from using a 'bit' string to 'present'. This is a noop in terms of generated code but will make next breaking change easier. Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250505165208.248049-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-07bpf: Clarify handling of mark and tstamp by redirect_peerPaul Chaignon
When switching network namespaces with the bpf_redirect_peer helper, the skb->mark and skb->tstamp fields are not zeroed out like they can be on a typical netns switch. This patch clarifies that in the helper description. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/ccc86af26d43c5c0b776bcba2601b7479c0d46d0.1746460653.git.paul.chaignon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-07rtla: Define _GNU_SOURCE in timerlat_bpf.cTomas Glozar
Newer versions of glibc include a definition of struct sched_attr in bits/sched.h (included through sched.h which is included by rtla). Commit 0eecee340672 ("tools/rtla: fix collision with glibc sched_attr/sched_set_attr") has modified the definition of struct sched_attr in utils.h, so that it is only applied with older versions of glibc that do not define it, in order to prevent build failure. The definition in bits/sched.h depends on _GNU_SOURCE. timerlat_bpf.c does not define _GNU_SOURCE, making it fall back to the definition in utils.h. The latter has two fields less, leading to shifted offsets of struct timerlat_params in timerlat_bpf_init. Because of the shift, timerlat_bpf_init incorrectly reads params->entries as 0 for timerlat-hist and disables the creation of histogram maps, causing breakage in BPF sample collection mode: $ rtla timerlat hist -d 1s Error pulling BPF data Fix the issue by also defining _GNU_SOURCE in timerlat_bpf.c. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250430144651.621766-1-tglozar@redhat.com Fixes: e34293ddcebd ("rtla/timerlat: Add BPF skeleton to collect samples") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-07rtla: Define __NR_sched_setattr for LoongArchTiezhu Yang
When executing "make -C tools/tracing/rtla" on LoongArch, there exists the following error: src/utils.c:237:24: error: '__NR_sched_setattr' undeclared Just define __NR_sched_setattr for LoongArch if not exist. Link: https://lore.kernel.org/20250422074917.25771-1-yangtiezhu@loongson.cn Reported-by: Haiyong Sun <sunhaiyong@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-07rtla: Set distinctive exit value for failed testsCosta Shulyupin
A test is considered failed when a sample trace exceeds the threshold. Failed tests return the same exit code as passed tests, requiring test frameworks to determine the result by searching for "hit stop tracing" in the output. Assign a distinct exit code for failed tests to enable the use of shell expressions and seamless integration with testing frameworks without the need to parse output. Add enum type for return value. Update `make check`. Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Eder Zulian <ezulian@redhat.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jan Stancek <jstancek@redhat.com> Link: https://lore.kernel.org/20250417185757.2194541-1-costa.shul@redhat.com Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-07Merge remote-tracking branch 'torvalds/master' into perf-tools-nextArnaldo Carvalho de Melo
To pick up fixes from the latest perf-tools pull request from Namhyung and get perf-tools-next in line with thinngs in other areas it uses, like tools/lib/bpf, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-06tools: ynl-gen: allow noncontiguous enumsJiri Pirko
in case the enum has holes, instead of hard stop, generate a validation callback to check valid enum values. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250505114513.53370-2-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-06io_uring/zcrx: selftests: fix setting ntuple rule into rssDavid Wei
Fix ethtool syntax for setting ntuple rule into rss. It should be `context' instead of `action'. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250503043007.857215-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-06cxl/test: Address missing MODULE_DESCRIPTION warnings for cxl_testDave Jiang
Add MODULE_DESCRIPTION() to address the following warnings: WARNING: modpost: missing MODULE_DESCRIPTION() in test/cxl_test.o WARNING: modpost: missing MODULE_DESCRIPTION() in test/cxl_mock.o WARNING: modpost: missing MODULE_DESCRIPTION() in test/cxl_mock_mem.o [dj: s/CXL test/cxl_test:/ per djbw's comment] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20250429235953.4175408-1-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-05-07objtool/rust: add one more `noreturn` Rust function for Rust 1.87.0Miguel Ojeda
Starting with Rust 1.87.0 (expected 2025-05-15), `objtool` may report: rust/core.o: warning: objtool: _R..._4core9panicking9panic_fmt() falls through to next function _R..._4core9panicking18panic_nounwind_fmt() rust/core.o: warning: objtool: _R..._4core9panicking18panic_nounwind_fmt() falls through to next function _R..._4core9panicking5panic() The reason is that `rust_begin_unwind` is now mangled: _R..._7___rustc17rust_begin_unwind Thus add the mangled one to the list so that `objtool` knows it is actually `noreturn`. See commit 56d680dd23c3 ("objtool/rust: list `noreturn` Rust functions") for more details. Alternatively, we could remove the fixed one in `noreturn.h` and relax this test to cover both, but it seems best to be strict as long as we can. Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250502140237.1659624-2-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-06bpftool: Fix regression of "bpftool cgroup tree" EINVAL on older kernelsYiFei Zhu
If cgroup_has_attached_progs queries an attach type not supported by the running kernel, due to the kernel being older than the bpftool build, it would encounter an -EINVAL from BPF_PROG_QUERY syscall. Prior to commit 98b303c9bf05 ("bpftool: Query only cgroup-related attach types"), this EINVAL would be ignored by the function, allowing the function to only consider supported attach types. The commit changed so that, instead of querying all attach types, only attach types from the array `cgroup_attach_types` is queried. The assumption is that because these are only cgroup attach types, they should all be supported. Unfortunately this assumption may be false when the kernel is older than the bpftool build, where the attach types queried by bpftool is not yet implemented in the kernel. This would result in errors such as: $ bpftool cgroup tree CgroupPath ID AttachType AttachFlags Name Error: can't query bpf programs attached to /sys/fs/cgroup: Invalid argument This patch restores the logic of ignoring EINVAL from prior to that patch. Fixes: 98b303c9bf05 ("bpftool: Query only cgroup-related attach types") Reported-by: Sagarika Sharma <sharmasagarika@google.com> Reported-by: Minh-Anh Nguyen <minhanhdn@google.com> Signed-off-by: YiFei Zhu <zhuyifei@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/bpf/20250428211536.1651456-1-zhuyifei@google.com
2025-05-06selftests/bpf: Add test for bpf_list_{front,back}Martin KaFai Lau
This patch adds the "list_peek" test to use the new bpf_list_{front,back} kfunc. The test_{front,back}* tests ensure that the return value is a non_own_ref node pointer and requires the spinlock to be held. Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # check non_own_ref marking Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250506015857.817950-9-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-06selftests/bpf: Add tests for bpf_rbtree_{root,left,right}Martin KaFai Lau
This patch has a much simplified rbtree usage from the kernel sch_fq qdisc. It has a "struct node_data" which can be added to two different rbtrees which are ordered by different keys. The test first populates both rbtrees. Then search for a lookup_key from the "groot0" rbtree. Once the lookup_key is found, that node refcount is taken. The node is then removed from another "groot1" rbtree. While searching the lookup_key, the test will also try to remove all rbnodes in the path leading to the lookup_key. The test_{root,left,right}_spinlock_true tests ensure that the return value of the bpf_rbtree functions is a non_own_ref node pointer. This is done by forcing an verifier error by calling a helper bpf_jiffies64() while holding the spinlock. The tests then check for the verifier message "call bpf_rbtree...R0=rcu_ptr_or_null_node..." The other test_{root,left,right}_spinlock_false tests ensure that they must be called with spinlock held. Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # Check non_own_ref marking Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250506015857.817950-6-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-06bpf: Allow refcounted bpf_rb_node used in bpf_rbtree_{remove,left,right}Martin KaFai Lau
The bpf_rbtree_{remove,left,right} requires the root's lock to be held. They also check the node_internal->owner is still owned by that root before proceeding, so it is safe to allow refcounted bpf_rb_node pointer to be used in these kfuncs. In a bpf fq implementation which is much closer to the kernel fq, https://lore.kernel.org/bpf/20250418224652.105998-13-martin.lau@linux.dev/, a networking flow (allocated by bpf_obj_new) can be added to two different rbtrees. There are cases that the flow is searched from one rbtree, held the refcount of the flow, and then removed from another rbtree: struct fq_flow { struct bpf_rb_node fq_node; struct bpf_rb_node rate_node; struct bpf_refcount refcount; unsigned long sk_long; }; int bpf_fq_enqueue(...) { /* ... */ bpf_spin_lock(&root->lock); while (can_loop) { /* ... */ if (!p) break; gc_f = bpf_rb_entry(p, struct fq_flow, fq_node); if (gc_f->sk_long == sk_long) { f = bpf_refcount_acquire(gc_f); break; } /* ... */ } bpf_spin_unlock(&root->lock); if (f) { bpf_spin_lock(&q->lock); bpf_rbtree_remove(&q->delayed, &f->rate_node); bpf_spin_unlock(&q->lock); } } bpf_rbtree_{left,right} do not need this change but are relaxed together with bpf_rbtree_remove instead of adding extra verifier logic to exclude these kfuncs. To avoid bi-sect failure, this patch also changes the selftests together. The "rbtree_api_remove_unadded_node" is not expecting verifier's error. The test now expects bpf_rbtree_remove(&groot, &m->node) to return NULL. The test uses __retval(0) to ensure this NULL return value. Some of the "only take non-owning..." failure messages are changed also. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250506015857.817950-5-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-06arm64: tools: Resync sysreg.hMarc Zyngier
Perform a bulk resync of tools/arch/arm64/include/asm/sysreg.h. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06tools/arch/x86: Move the <asm/amd-ibs.h> header to <asm/amd/ibs.h>Ingo Molnar
Synchronize with what we did with the kernel side header in: 3846389c03a8 ("x86/platform/amd: Move the <asm/amd-ibs.h> header to <asm/amd/ibs.h>") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org
2025-05-06Merge tag 'nf-next-25-05-06' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Apparently, nf_conntrack_bridge changes the way in which fragments are handled, dealing to packet drop. From Huajian Yang. 2) Add a selftest to stress the conntrack subsystem, from Florian Westphal. 3) nft_quota depletion is off-by-one byte, Zhongqiu Duan. 4) Rewrites the procfs to read the conntrack table to speed it up, from Florian Westphal. 5) Two patches to prevent overflow in nft_pipapo lookup table and to clamp the maximum bucket size. 6) Update nft_fib selftest to check for loopback packet bypass. From Florian Westphal. netfilter pull request 25-05-06 * tag 'nf-next-25-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: selftests: netfilter: nft_fib.sh: check lo packets bypass fib lookup netfilter: nft_set_pipapo: clamp maximum map bucket size to INT_MAX netfilter: nft_set_pipapo: prevent overflow in lookup table allocation netfilter: nf_conntrack: speed up reads from nf_conntrack proc file netfilter: nft_quota: match correctly when the quota just depleted selftests: netfilter: add conntrack stress test netfilter: bridge: Move specific fragmented packet to slow_path instead of dropping it ==================== Link: https://patch.msgid.link/20250505234151.228057-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>