summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2025-02-18perf report: Add latency and parallelism profiling documentationDmitry Vyukov
Describe latency and parallelism profiling, related flags, and differences with the currently only supported CPU-consumption-centric profiling. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/a13f270ed33cedb03ce9ebf9ddbd064854ca0f19.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-18perf report: Add --latency flagDmitry Vyukov
Add record/report --latency flag that allows to capture and show latency-centric profiles rather than the default CPU-consumption-centric profiles. For latency profiles record captures context switch events, and report shows Latency as the first column. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/e9640464bcbc47dde2cb557003f421052ebc9eec.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-18perf report: Add latency output fieldDmitry Vyukov
Latency output field is similar to overhead, but represents overhead for latency rather than CPU consumption. It's re-scaled from overhead by dividing weight by the current parallelism level at the time of the sample. It effectively models profiling with 1 sample taken per unit of wall-clock time rather than unit of CPU time. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/b6269518758c2166e6ffdc2f0e24cfdecc8ef9c1.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-18perf report: Add parallelism filterDmitry Vyukov
Add parallelism filter that can be used to look at specific parallelism levels only. The format is the same as cpu lists. For example: Only single-threaded samples: --parallelism=1 Low parallelism only: --parallelism=1-4 High parallelism only: --parallelism=64-128 Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/e61348985ff0a6a14b07c39e880edbd60a8f8635.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-18perf report: Switch filtered from u8 to u16Dmitry Vyukov
We already have all u8 bits taken, adding one more filter leads to unpleasant failure mode, where code compiles w/o warnings, but the last filters silently don't work. Add a typedef and switch to u16. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/32b4ce1731126c88a2d9e191dc87e39ae4651cb7.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-18selftests/bpf: Remove test_xdp_redirect_multi.shBastien Curutchet (eBPF Foundation)
The tests done by test_xdp_redirect_multi.sh are now fully covered by the CI through test_xdp_veth.c. Remove test_xdp_redirect_multi.sh Remove xdp_redirect_multi.c that was used by the script to load and attach the BPF programs. Remove their entries in the Makefile Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250212-redirect-multi-v5-6-fd0d39fca6e6@bootlin.com
2025-02-18selftests/bpf: test_xdp_veth: Add XDP program on egress testBastien Curutchet (eBPF Foundation)
XDP programs loaded on egress is tested by test_xdp_redirect_multi.sh but not by the test_progs framework. Add a test case in test_xdp_veth.c to test the XDP program on egress. Use the same BPF program than test_xdp_redirect_multi.sh that replaces the source MAC address by one provided through a BPF map. Use a BPF program that stores the source MAC of received packets in a map to check the test results. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250212-redirect-multi-v5-5-fd0d39fca6e6@bootlin.com
2025-02-18selftests/bpf: test_xdp_veth: Add XDP broadcast redirection testsBastien Curutchet (eBPF Foundation)
XDP redirections with BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS flags are tested by test_xdp_redirect_multi.sh but not within the test_progs framework. Add a broadcast test case in test_xdp_veth.c to test them. Use the same BPF programs than the one used by test_xdp_redirect_multi.sh. Use a BPF map to select the broadcast flags. Use a BPF map with an entry per veth to check whether packets are received or not Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250212-redirect-multi-v5-4-fd0d39fca6e6@bootlin.com
2025-02-18selftests/bpf: Optionally select broadcasting flagsBastien Curutchet (eBPF Foundation)
Broadcasting flags are hardcoded for each kind for protocol. Create a redirect_flags map that allows to select the broadcasting flags to use in the bpf_redirect_map(). The protocol ID is used as a key. Set the old hardcoded values as default if the map isn't filled by the BPF caller. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250212-redirect-multi-v5-3-fd0d39fca6e6@bootlin.com
2025-02-18selftests/bpf: test_xdp_veth: Use a dedicated namespaceBastien Curutchet (eBPF Foundation)
Tests use the root network namespace, so they aren't fully independent of each other. For instance, the index of the created veth interfaces is incremented every time a new test is launched. Wrap the network topology in a network namespace to ensure full isolation. Use the append_tid() helper to ensure the uniqueness of this namespace's name during parallel runs. Remove the use of the append_tid() on the veth names as they now belong to an already unique namespace. Simplify cleanup_network() by directly deleting the namespaces Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250212-redirect-multi-v5-2-fd0d39fca6e6@bootlin.com
2025-02-18selftests/bpf: test_xdp_veth: Create struct net_configurationBastien Curutchet (eBPF Foundation)
The network configuration is defined by a table of struct veth_configuration. This isn't convenient if we want to add a network configuration that isn't linked to a veth pair. Create a struct net_configuration that holds the veth_configuration table to ease adding new configuration attributes in upcoming patch. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250212-redirect-multi-v5-1-fd0d39fca6e6@bootlin.com
2025-02-18kunit: tool: Build GDB scriptsBrendan Jackman
Following a similar rationale as commit e4835f1da425f ("kunit: tool: Build compile_commands.json"), make a common developer tool available by default for KUnit users. Compared to compile_commands.json, there is a little more work to be done to build the GDB scripts. Is it enough to affect development cycle duration? Unscientific evaluation: rm -rf .kunit; time tools/testing/kunit/kunit.py build --kunitconfig ./lib/kunit/.kunitconfig --jobs 96 Without this patch it took 14.77s, with this patch it took 14.83. So, although `make scripts_gdb` is pretty slow, presumably most of that is just the overhead of running Kbuild at all, actually building the scripts is approximately free. Note also, to actually get the GDB scripts the user needs to enable CONFIG_SCRIPTS_GDB, but building the scripts_gdb target without that is still harmless. Link: https://lore.kernel.org/r/20250121-kunit-gdb-v1-1-faedfd0653ef@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-18tools: Remove redundant quiet setupCharlie Jenkins
Q is exported from Makefile.include so it is not necessary to manually set it. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <qmo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Benjamin Tissoires <bentiss@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Mykola Lysenko <mykolal@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20250213-quiet_tools-v3-2-07de4482a581@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-02-18sched_ext: idle: Introduce node-aware idle cpu kfunc helpersAndrea Righi
Introduce a new kfunc to retrieve the node associated to a CPU: int scx_bpf_cpu_node(s32 cpu) Add the following kfuncs to provide BPF schedulers direct access to per-node idle cpumasks information: const struct cpumask *scx_bpf_get_idle_cpumask_node(int node) const struct cpumask *scx_bpf_get_idle_smtmask_node(int node) s32 scx_bpf_pick_idle_cpu_node(const cpumask_t *cpus_allowed, int node, u64 flags) s32 scx_bpf_pick_any_cpu_node(const cpumask_t *cpus_allowed, int node, u64 flags) Moreover, trigger an scx error when any of the non-node aware idle CPU kfuncs are used when SCX_OPS_BUILTIN_IDLE_PER_NODE is enabled. Cc: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-18tools: Unify top-level quiet infrastructureCharlie Jenkins
Commit f2868b1a66d4f40f ("perf tools: Expose quiet/verbose variables in Makefile.perf") moved the quiet infrastructure out of tools/build/Makefile.build and into the top-level Makefile.perf file so that the quiet infrastructure could be used throughout perf and not just in Makefile.build. Extract out the quiet infrastructure into Makefile.include so that it can be leveraged outside of perf. Fixes: f2868b1a66d4f40f ("perf tools: Expose quiet/verbose variables in Makefile.perf") Reviewed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Benjamin Tissoires <bentiss@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Mykola Lysenko <mykolal@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <qmo@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20250213-quiet_tools-v3-1-07de4482a581@rivosinc.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-02-18selftests: test_vxlan_fdb_changelink: Add a test for MC remote changePetr Machata
Changes to MC remote need to be reflected in actual group memberships. Add a test to verify that it is the case. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18selftests: test_vxlan_fdb_changelink: Convert to lib.shPetr Machata
Instead of inlining equivalents, use lib.sh-provided primitives. Use defer to manage vx lifetime. This will make it easier to extend the test in the next patch. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18selftests: forwarding: lib: Move require_command to net, generalizePetr Machata
This helper could be useful to more than just forwarding tests. Move it upstairs and port over to log_test_skip(). Split the function into two parts: the bit that actually checks and reports skip, which is in a new function check_command(). And a bit that exits the test script if the check fails. This allows users consistent checking behavior while giving an option to bail out from a single test without bailing out of the whole script. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18selftest/bpf: Add vsock test for sockmap rejecting unconnectedMichal Luczaj
Verify that for a connectible AF_VSOCK socket, merely having a transport assigned is insufficient; socket must be connected for the sockmap to accept. This does not test datagram vsocks. Even though it hardly matters. VMCI is the only transport that features VSOCK_TRANSPORT_F_DGRAM, but it has an unimplemented vsock_transport::readskb() callback, making it unsupported by BPF/sockmap. Signed-off-by: Michal Luczaj <mhal@rbox.co> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18selftest/bpf: Adapt vsock_delete_on_close to sockmap rejecting unconnectedMichal Luczaj
Commit 515745445e92 ("selftest/bpf: Add test for vsock removal from sockmap on close()") added test that checked if proto::close() callback was invoked on AF_VSOCK socket release. I.e. it verified that a close()d vsock does indeed get removed from the sockmap. It was done simply by creating a socket pair and attempting to replace a close()d one with its peer. Since, due to a recent change, sockmap does not allow updating index with a non-established connectible vsock, redo it with a freshly established one. Signed-off-by: Michal Luczaj <mhal@rbox.co> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-18Merge tag 'v6.14-rc3' into x86/core, to pick up fixesIngo Molnar
Pick up upstream x86 fixes before applying new patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-17selftests/mm: fix check for running THP testsMark Brown
When testing if we should try to compact memory or drop caches before we run the THP or HugeTLB tests we use | as an or operator. This doesn't work since run_vmtests.sh is written in shell where this is used to pipe the output of the first argument into the second. Instead use the shell's -o operator. Link: https://lkml.kernel.org/r/20250212-kselftest-mm-no-hugepages-v1-1-44702f538522@kernel.org Fixes: b433ffa8dbac ("selftests: mm: perform some system cleanup before using hugepages") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Mariano Pache <npache@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17getdelays: fix error format charactersWang Yaxin
getdelays had a compilation issue because the format string was not updated when the "delay min" was added. For example, after adding the "delay min" in printf, there were 7 strings but only 6 "%s" format specifiers. Similarly, after adding the 't->cpu_delay_total', there were 7 variables but only 6 format characters specifiers, causing compilation issues as follows. This commit fixes these issues to ensure that getdelays compiles correctly. root@xx:~/linux-next/tools/accounting$ make getdelays.c:199:9: warning: format `%llu' expects argument of type `long long unsigned int', but argument 8 has type `char *' [-Wformat=] 199 | printf("\n\nCPU %15s%15s%15s%15s%15s%15s\n" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ..... 216 | "delay total", "delay average", "delay max", "delay min", | ~~~~~~~~~~~ | | | char * getdelays.c:200:21: note: format string is defined here 200 | " %15llu%15llu%15llu%15llu%15.3fms%13.6fms\n" | ~~~~~^ | | | long long unsigned int | %15s getdelays.c:199:9: warning: format `%f' expects argument of type `double', but argument 12 has type `long long unsigned int' [-Wformat=] 199 | printf("\n\nCPU %15s%15s%15s%15s%15s%15s\n" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ..... 220 | (unsigned long long)t->cpu_delay_total, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | | long long unsigned int ..... Link: https://lkml.kernel.org/r/20250208144400544RduNRhwIpT3m2JyRBqskZ@zte.com.cn Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak") Reviewed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Peilin He <he.peilin@zte.com.cn> Cc: Qiang Tu <tu.qiang35@zte.com.cn> Cc: wangyong <wang.yong12@zte.com.cn> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17tools/mm: fix build warnings with musl-libcFlorian Fainelli
musl-libc warns about the following: /home/florian/dev/buildroot/output/arm64/rpi4-b/host/aarch64-buildroot-linux-musl/sysroot/usr/include/sys/errno.h:1:2: attention: #warning redirecting incorrect #include <sys/errno.h> to <errno.h> [-Wcpp] 1 | #warning redirecting incorrect #include <sys/errno.h> to <errno.h> | ^~~~~~~ /home/florian/dev/buildroot/output/arm64/rpi4-b/host/aarch64-buildroot-linux-musl/sysroot/usr/include/sys/fcntl.h:1:2: attention: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Wcpp] 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> | ^~~~~~~ include errno.h and fcntl.h directly. Link: https://lkml.kernel.org/r/20250210200518.1137295-1-florian.fainelli@broadcom.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17perf report: Add parallelism sort keyDmitry Vyukov
Show parallelism level in profiles if requested by user. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/7f7bb87cbaa51bf1fb008a0d68b687423ce4bad4.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-17perf report: Add machine parallelismDmitry Vyukov
Add calculation of the current parallelism level (number of threads actively running on CPUs). The parallelism level can be shown in reports on its own, and to calculate latency overheads. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/0f8c1b8eb12619029e31b3d5c0346f4616a5aeda.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-17selftests/bpf: Test returning referenced kptr from struct_ops programsAmery Hung
Test struct_ops programs returning referenced kptr. When the return type of a struct_ops operator is pointer to struct, the verifier should only allow programs that return a scalar NULL or a non-local kptr with the correct type in its unmodified form. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-6-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17selftests/bpf: Test referenced kptr arguments of struct_ops programsAmery Hung
Test referenced kptr acquired through struct_ops argument tagged with "__ref". The success case checks whether 1) a reference to the correct type is acquired, and 2) the referenced kptr argument can be accessed in multiple paths as long as it hasn't been released. In the fail cases, we first confirm that a referenced kptr acquried through a struct_ops argument is not allowed to be leaked. Then, we make sure this new referenced kptr acquiring mechanism does not accidentally allow referenced kptrs to flow into global subprograms through their arguments. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-4-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17selftests: drv-net: Test queue xsk attributeJoe Damato
Test that queues which are used for AF_XDP have the xsk nest attribute. The attribute is currently empty, but its existence means the AF_XDP is being used for the queue. Enable CONFIG_XDP_SOCKETS for selftests/drivers/net tests, as well. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-4-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17netdev-genl: Add an XSK attribute to queuesJoe Damato
Expose a new per-queue nest attribute, xsk, which will be present for queues that are being used for AF_XDP. If the queue is not being used for AF_XDP, the nest will not be present. In the future, this attribute can be extended to include more data about XSK as it is needed. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITYAnna Emese Nyiri
Introduce tests to verify the correct functionality of the SO_RCVMARK and SO_RCVPRIORITY socket options. Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250214205828.48503-1-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17selftests: net: fix grammar in reuseaddr_ports_exhausted.c log messagePranav Tyagi
This patch fixes a grammatical error in a test log message in reuseaddr_ports_exhausted.c for better clarity. Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250213152612.4434-1-pranav.tyagi03@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17selftests/powerpc: Use PKEY_UNRESTRICTED macroYury Khrustalev
Replace literal 0 with macro PKEY_UNRESTRICTED where pkey_*() functions are used in mm selftests for memory protection keys for ppc target. Signed-off-by: Yury Khrustalev <yury.khrustalev@arm.com> Suggested-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Link: https://lore.kernel.org/r/20250113170619.484698-4-yury.khrustalev@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-02-17selftests/mm: Use PKEY_UNRESTRICTED macroYury Khrustalev
Replace literal 0 with macro PKEY_UNRESTRICTED where pkey_*() functions are used in mm selftests for memory protection keys. Signed-off-by: Yury Khrustalev <yury.khrustalev@arm.com> Suggested-by: Joey Gouly <joey.gouly@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20250113170619.484698-3-yury.khrustalev@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-02-17perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernelRavi Bangoria
Sync load latency related bit fields into the tool's header copy Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250205060547.1337-4-ravi.bangoria@amd.com
2025-02-17io_uring/zcrx: add selftestDavid Wei
Add a selftest for io_uring zero copy Rx. This test cannot run locally and requires a remote host to be configured in net.config. The remote host must have hardware support for zero copy Rx as listed in the documentation page. The test will restore the NIC config back to before the test and is idempotent. liburing is required to compile the test and be installed on the remote host running the test. Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20250215000947.789731-12-dw@davidwei.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-17Merge commit '71f0dd5a3293d75d26d405ffbaedfdda4836af32' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into for-6.15/io_uring-rx-zc Merge networking zerocopy receive tree, to get the prep patches for the io_uring rx zc support. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (63 commits) net: add helpers for setting a memory provider on an rx queue net: page_pool: add memory provider helpers net: prepare for non devmem TCP memory providers net: page_pool: add a mp hook to unregister_netdevice* net: page_pool: add callback for mp info printing netdev: add io_uring memory provider info net: page_pool: create hooks for custom memory providers net: generalise net_iov chunk owners net: prefix devmem specific helpers net: page_pool: don't cast mp param to devmem tools: ynl: add all headers to makefile deps eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs eth: fbnic: add MAC address TCAM to debugfs tools: ynl-gen: support limits using definitions tools: ynl-gen: don't output external constants net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled net/mlx5e: Remove unused mlx5e_tc_flow_action struct net/mlx5: Remove stray semicolon in LAG port selection table creation net/mlx5e: Support FEC settings for 200G per lane link modes net/mlx5: Add support for 200Gbps per lane link modes ...
2025-02-17Merge 6.14-rc3 into driver-core-nextGreg Kroah-Hartman
We need the faux_device changes in here for future work. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-16Merge tag 'objtool_urgent_for_v6.14_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: - Move a warning about a lld.ld breakage into the verbose setting as said breakage has been fixed in the meantime - Teach objtool to ignore dangling jump table entries added by Clang * tag 'objtool_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Move dodgy linker warn to verbose objtool: Ignore dangling jump table entries
2025-02-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Large set of fixes for vector handling, especially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours - Fix use of kernel VAs at EL2 when emulating timers with nVHE - Small set of pKVM improvements and cleanups x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1 - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) x86/sev: Fix broken SNP support with KVM module built-in KVM: SVM: Ensure PSP module is initialized if KVM module is built-in crypto: ccp: Add external API interface for PSP module initialization KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic() KVM: arm64: timer: Drop warning on failed interrupt signalling KVM: arm64: Fix alignment of kvm_hyp_memcache allocations KVM: arm64: Convert timer offset VA when accessed in HYP code KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp() KVM: arm64: Eagerly switch ZCR_EL{1,2} KVM: arm64: Mark some header functions as inline KVM: arm64: Refactor exit handlers KVM: arm64: Refactor CPTR trap deactivation KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN KVM: arm64: Remove host FPSIMD saving for non-protected KVM KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop KVM: nSVM: Enter guest mode before initializing nested NPT MMU KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper ...
2025-02-16sched_ext: idle: Per-node idle cpumasksAndrea Righi
Using a single global idle mask can lead to inefficiencies and a lot of stress on the cache coherency protocol on large systems with multiple NUMA nodes, since all the CPUs can create a really intense read/write activity on the single global cpumask. Therefore, split the global cpumask into multiple per-NUMA node cpumasks to improve scalability and performance on large systems. The concept is that each cpumask will track only the idle CPUs within its corresponding NUMA node, treating CPUs in other NUMA nodes as busy. In this way concurrent access to the idle cpumask will be restricted within each NUMA node. The split of multiple per-node idle cpumasks can be controlled using the SCX_OPS_BUILTIN_IDLE_PER_NODE flag. By default SCX_OPS_BUILTIN_IDLE_PER_NODE is not enabled and a global host-wide idle cpumask is used, maintaining the previous behavior. NOTE: if a scheduler explicitly enables the per-node idle cpumasks (via SCX_OPS_BUILTIN_IDLE_PER_NODE), scx_bpf_get_idle_cpu/smtmask() will trigger an scx error, since there are no system-wide cpumasks. = Test = Hardware: - System: DGX B200 - CPUs: 224 SMT threads (112 physical cores) - Processor: INTEL(R) XEON(R) PLATINUM 8570 - 2 NUMA nodes Scheduler: - scx_simple [1] (so that we can focus at the built-in idle selection policy and not at the scheduling policy itself) Test: - Run a parallel kernel build `make -j $(nproc)` and measure the average elapsed time over 10 runs: avg time | stdev ---------+------ before: 52.431s | 2.895 after: 50.342s | 2.895 = Conclusion = Splitting the global cpumask into multiple per-NUMA cpumasks helped to achieve a speedup of approximately +4% with this particular architecture and test case. The same test on a DGX-1 (40 physical cores, Intel Xeon E5-2698 v4 @ 2.20GHz, 2 NUMA nodes) shows a speedup of around 1.5-3%. On smaller systems, I haven't noticed any measurable regressions or improvements with the same test (parallel kernel build) and scheduler (scx_simple). Moreover, with a modified scx_bpfland that uses the new NUMA-aware APIs I observed an additional +2-2.5% performance improvement with the same test. [1] https://github.com/sched-ext/scx/blob/main/scheds/c/scx_simple.bpf.c Cc: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-16sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODEAndrea Righi
Add the new scheduler flag SCX_OPS_BUILTIN_IDLE_PER_NODE, which allows BPF schedulers to select between using a global flat idle cpumask or multiple per-node cpumasks. This only introduces the flag and the mechanism to enable/disable this feature without affecting any scheduling behavior. Cc: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-15kunit: qemu_configs: sparc: use Zilog consoleThomas Weißschuh
The driver for the 8250 console is not used, as no port is found. Instead the prom0 bootconsole is used the whole time. The prom driver translates '\n' to '\r\n' before handing of the message off to the firmware. The firmware performs the same translation again. In the final output produced by QEMU each line ends with '\r\r\n'. This breaks the kunit parser, which can only handle '\r\n' and '\n'. Use the Zilog console instead. It works correctly, is the one documented by the QEMU manual and also saves a bit of codesize: Before=4051011, After=4023326, chg -0.68% Observed on QEMU 9.2.0. Link: https://lore.kernel.org/r/20250214-kunit-qemu-sparc-console-v1-1-ba1dfdf8f0b1@linutronix.de Fixes: 87c9c1631788 ("kunit: tool: add support for QEMU") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-15kunit: tool: Use qboot on QEMU x86_64Brendan Jackman
As noted in [0], SeaBIOS (QEMU default) makes a mess of the terminal, qboot does not. It turns out this is actually useful with kunit.py, since the user is exposed to this issue if they set --raw_output=all. qboot is also faster than SeaBIOS, but it's is marginal for this usecase. [0] https://lore.kernel.org/all/CA+i-1C0wYb-gZ8Mwh3WSVpbk-LF-Uo+njVbASJPe1WXDURoV7A@mail.gmail.com/ Both SeaBIOS and qboot are x86-specific. Link: https://lore.kernel.org/r/20250124-kunit-qboot-v1-1-815e4d4c6f7c@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-15Merge tag 'rust-fixes-6.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust fixes from Miguel Ojeda: - Fix objtool warning due to future Rust 1.85.0 (to be released in a few days) - Clean future Rust 1.86.0 (to be released 2025-04-03) Clippy warning * tag 'rust-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: rbtree: fix overindented list item objtool/rust: add one more `noreturn` Rust function
2025-02-15kernfs: Use RCU to access kernfs_node::parent.Sebastian Andrzej Siewior
kernfs_rename_lock is used to obtain stable kernfs_node::{name|parent} pointer. This is a preparation to access kernfs_node::parent under RCU and ensure that the pointer remains stable under the RCU lifetime guarantees. For a complete path, as it is done in kernfs_path_from_node(), the kernfs_rename_lock is still required in order to obtain a stable parent relationship while computing the relevant node depth. This must not change while the nodes are inspected in order to build the path. If the kernfs user never moves the nodes (changes the parent) then the kernfs_rename_lock is not required and the RCU guarantees are sufficient. This "restriction" can be set with KERNFS_ROOT_INVARIANT_PARENT. Otherwise the lock is required. Rename kernfs_node::parent to kernfs_node::__parent to denote the RCU access and use RCU accessor while accessing the node. Make cgroup use KERNFS_ROOT_INVARIANT_PARENT since the parent here can not change. Acked-by: Tejun Heo <tj@kernel.org> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250213145023.2820193-6-bigeasy@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-14selftests/bpf: add test for LDX/STX/ST relocations over array fieldAndrii Nakryiko
Add a simple repro for the issue of miscalculating LDX/STX/ST CO-RE relocation size adjustment when the CO-RE relocation target type is an ARRAY. We need to make sure that compiler generates LDX/STX/ST instruction with CO-RE relocation against entire ARRAY type, not ARRAY's element. With the code pattern in selftest, we get this: 59: 61 71 00 00 00 00 00 00 w1 = *(u32 *)(r7 + 0x0) 00000000000001d8: CO-RE <byte_off> [5] struct core_reloc_arrays::a (0:0) Where offset of `int a[5]` is embedded (through CO-RE relocation) into memory load instruction itself. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20250207014809.1573841-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-14libbpf: fix LDX/STX/ST CO-RE relocation size adjustment logicAndrii Nakryiko
Libbpf has a somewhat obscure feature of automatically adjusting the "size" of LDX/STX/ST instruction (memory store and load instructions), based on originally recorded access size (u8, u16, u32, or u64) and the actual size of the field on target kernel. This is meant to facilitate using BPF CO-RE on 32-bit architectures (pointers are always 64-bit in BPF, but host kernel's BTF will have it as 32-bit type), as well as generally supporting safe type changes (unsigned integer type changes can be transparently "relocated"). One issue that surfaced only now, 5 years after this logic was implemented, is how this all works when dealing with fields that are arrays. This isn't all that easy and straightforward to hit (see selftests that reproduce this condition), but one of sched_ext BPF programs did hit it with innocent looking loop. Long story short, libbpf used to calculate entire array size, instead of making sure to only calculate array's element size. But it's the element that is loaded by LDX/STX/ST instructions (1, 2, 4, or 8 bytes), so that's what libbpf should check. This patch adjusts the logic for arrays and fixed the issue. Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250207014809.1573841-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-14selftests/bpf: Add selftest for may_gotoJiayuan Chen
Added test cases to ensure that programs with stack sizes exceeding 512 bytes are restricted in non-JITed mode, and can be executed normally in JITed mode, even with stack sizes exceeding 512 bytes due to the presence of may_goto instructions. Test result: echo "0" > /proc/sys/net/core/bpf_jit_enable ./test_progs -t verifier_stack_ptr ... stack size 512 with may_goto with jit:SKIP stack size 512 with may_goto without jit:OK ... Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED echo "1" > /proc/sys/net/core/bpf_jit_enable ./test_progs -t verifier_stack_ptr ... stack size 512 with may_goto with jit:OK stack size 512 with may_goto without jit:SKIP ... Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED Signed-off-by: Jiayuan Chen <mrpre@163.com> Link: https://lore.kernel.org/r/20250214091823.46042-4-mrpre@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-14selftests/bpf: Introduce __load_if_JITed annotation for testsJiayuan Chen
In some cases, the verification logic under the interpreter and JIT differs, such as may_goto, and the test program behaves differently under different runtime modes, requiring separate verification logic for each result. Introduce __load_if_JITed and __load_if_no_JITed annotation for tests. Signed-off-by: Jiayuan Chen <mrpre@163.com> Link: https://lore.kernel.org/r/20250214091823.46042-3-mrpre@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>