summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2025-02-17selftests/mm: fix check for running THP testsMark Brown
When testing if we should try to compact memory or drop caches before we run the THP or HugeTLB tests we use | as an or operator. This doesn't work since run_vmtests.sh is written in shell where this is used to pipe the output of the first argument into the second. Instead use the shell's -o operator. Link: https://lkml.kernel.org/r/20250212-kselftest-mm-no-hugepages-v1-1-44702f538522@kernel.org Fixes: b433ffa8dbac ("selftests: mm: perform some system cleanup before using hugepages") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Mariano Pache <npache@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17getdelays: fix error format charactersWang Yaxin
getdelays had a compilation issue because the format string was not updated when the "delay min" was added. For example, after adding the "delay min" in printf, there were 7 strings but only 6 "%s" format specifiers. Similarly, after adding the 't->cpu_delay_total', there were 7 variables but only 6 format characters specifiers, causing compilation issues as follows. This commit fixes these issues to ensure that getdelays compiles correctly. root@xx:~/linux-next/tools/accounting$ make getdelays.c:199:9: warning: format `%llu' expects argument of type `long long unsigned int', but argument 8 has type `char *' [-Wformat=] 199 | printf("\n\nCPU %15s%15s%15s%15s%15s%15s\n" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ..... 216 | "delay total", "delay average", "delay max", "delay min", | ~~~~~~~~~~~ | | | char * getdelays.c:200:21: note: format string is defined here 200 | " %15llu%15llu%15llu%15llu%15.3fms%13.6fms\n" | ~~~~~^ | | | long long unsigned int | %15s getdelays.c:199:9: warning: format `%f' expects argument of type `double', but argument 12 has type `long long unsigned int' [-Wformat=] 199 | printf("\n\nCPU %15s%15s%15s%15s%15s%15s\n" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ..... 220 | (unsigned long long)t->cpu_delay_total, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | | long long unsigned int ..... Link: https://lkml.kernel.org/r/20250208144400544RduNRhwIpT3m2JyRBqskZ@zte.com.cn Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak") Reviewed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Peilin He <he.peilin@zte.com.cn> Cc: Qiang Tu <tu.qiang35@zte.com.cn> Cc: wangyong <wang.yong12@zte.com.cn> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17tools/mm: fix build warnings with musl-libcFlorian Fainelli
musl-libc warns about the following: /home/florian/dev/buildroot/output/arm64/rpi4-b/host/aarch64-buildroot-linux-musl/sysroot/usr/include/sys/errno.h:1:2: attention: #warning redirecting incorrect #include <sys/errno.h> to <errno.h> [-Wcpp] 1 | #warning redirecting incorrect #include <sys/errno.h> to <errno.h> | ^~~~~~~ /home/florian/dev/buildroot/output/arm64/rpi4-b/host/aarch64-buildroot-linux-musl/sysroot/usr/include/sys/fcntl.h:1:2: attention: #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> [-Wcpp] 1 | #warning redirecting incorrect #include <sys/fcntl.h> to <fcntl.h> | ^~~~~~~ include errno.h and fcntl.h directly. Link: https://lkml.kernel.org/r/20250210200518.1137295-1-florian.fainelli@broadcom.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17perf report: Add parallelism sort keyDmitry Vyukov
Show parallelism level in profiles if requested by user. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/7f7bb87cbaa51bf1fb008a0d68b687423ce4bad4.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-17perf report: Add machine parallelismDmitry Vyukov
Add calculation of the current parallelism level (number of threads actively running on CPUs). The parallelism level can be shown in reports on its own, and to calculate latency overheads. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/0f8c1b8eb12619029e31b3d5c0346f4616a5aeda.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-17selftests/bpf: Test returning referenced kptr from struct_ops programsAmery Hung
Test struct_ops programs returning referenced kptr. When the return type of a struct_ops operator is pointer to struct, the verifier should only allow programs that return a scalar NULL or a non-local kptr with the correct type in its unmodified form. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-6-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17selftests/bpf: Test referenced kptr arguments of struct_ops programsAmery Hung
Test referenced kptr acquired through struct_ops argument tagged with "__ref". The success case checks whether 1) a reference to the correct type is acquired, and 2) the referenced kptr argument can be accessed in multiple paths as long as it hasn't been released. In the fail cases, we first confirm that a referenced kptr acquried through a struct_ops argument is not allowed to be leaked. Then, we make sure this new referenced kptr acquiring mechanism does not accidentally allow referenced kptrs to flow into global subprograms through their arguments. Signed-off-by: Amery Hung <amery.hung@bytedance.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250217190640.1748177-4-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-17selftests: drv-net: Test queue xsk attributeJoe Damato
Test that queues which are used for AF_XDP have the xsk nest attribute. The attribute is currently empty, but its existence means the AF_XDP is being used for the queue. Enable CONFIG_XDP_SOCKETS for selftests/drivers/net tests, as well. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-4-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17netdev-genl: Add an XSK attribute to queuesJoe Damato
Expose a new per-queue nest attribute, xsk, which will be present for queues that are being used for AF_XDP. If the queue is not being used for AF_XDP, the nest will not be present. In the future, this attribute can be extended to include more data about XSK as it is needed. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITYAnna Emese Nyiri
Introduce tests to verify the correct functionality of the SO_RCVMARK and SO_RCVPRIORITY socket options. Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250214205828.48503-1-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17selftests: net: fix grammar in reuseaddr_ports_exhausted.c log messagePranav Tyagi
This patch fixes a grammatical error in a test log message in reuseaddr_ports_exhausted.c for better clarity. Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250213152612.4434-1-pranav.tyagi03@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17selftests/powerpc: Use PKEY_UNRESTRICTED macroYury Khrustalev
Replace literal 0 with macro PKEY_UNRESTRICTED where pkey_*() functions are used in mm selftests for memory protection keys for ppc target. Signed-off-by: Yury Khrustalev <yury.khrustalev@arm.com> Suggested-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Link: https://lore.kernel.org/r/20250113170619.484698-4-yury.khrustalev@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-02-17selftests/mm: Use PKEY_UNRESTRICTED macroYury Khrustalev
Replace literal 0 with macro PKEY_UNRESTRICTED where pkey_*() functions are used in mm selftests for memory protection keys. Signed-off-by: Yury Khrustalev <yury.khrustalev@arm.com> Suggested-by: Joey Gouly <joey.gouly@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20250113170619.484698-3-yury.khrustalev@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-02-17perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernelRavi Bangoria
Sync load latency related bit fields into the tool's header copy Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250205060547.1337-4-ravi.bangoria@amd.com
2025-02-17io_uring/zcrx: add selftestDavid Wei
Add a selftest for io_uring zero copy Rx. This test cannot run locally and requires a remote host to be configured in net.config. The remote host must have hardware support for zero copy Rx as listed in the documentation page. The test will restore the NIC config back to before the test and is idempotent. liburing is required to compile the test and be installed on the remote host running the test. Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20250215000947.789731-12-dw@davidwei.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-17Merge commit '71f0dd5a3293d75d26d405ffbaedfdda4836af32' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into for-6.15/io_uring-rx-zc Merge networking zerocopy receive tree, to get the prep patches for the io_uring rx zc support. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (63 commits) net: add helpers for setting a memory provider on an rx queue net: page_pool: add memory provider helpers net: prepare for non devmem TCP memory providers net: page_pool: add a mp hook to unregister_netdevice* net: page_pool: add callback for mp info printing netdev: add io_uring memory provider info net: page_pool: create hooks for custom memory providers net: generalise net_iov chunk owners net: prefix devmem specific helpers net: page_pool: don't cast mp param to devmem tools: ynl: add all headers to makefile deps eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs eth: fbnic: add MAC address TCAM to debugfs tools: ynl-gen: support limits using definitions tools: ynl-gen: don't output external constants net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled net/mlx5e: Remove unused mlx5e_tc_flow_action struct net/mlx5: Remove stray semicolon in LAG port selection table creation net/mlx5e: Support FEC settings for 200G per lane link modes net/mlx5: Add support for 200Gbps per lane link modes ...
2025-02-17Merge 6.14-rc3 into driver-core-nextGreg Kroah-Hartman
We need the faux_device changes in here for future work. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-16Merge tag 'objtool_urgent_for_v6.14_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: - Move a warning about a lld.ld breakage into the verbose setting as said breakage has been fixed in the meantime - Teach objtool to ignore dangling jump table entries added by Clang * tag 'objtool_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Move dodgy linker warn to verbose objtool: Ignore dangling jump table entries
2025-02-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Large set of fixes for vector handling, especially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours - Fix use of kernel VAs at EL2 when emulating timers with nVHE - Small set of pKVM improvements and cleanups x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1 - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) x86/sev: Fix broken SNP support with KVM module built-in KVM: SVM: Ensure PSP module is initialized if KVM module is built-in crypto: ccp: Add external API interface for PSP module initialization KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic() KVM: arm64: timer: Drop warning on failed interrupt signalling KVM: arm64: Fix alignment of kvm_hyp_memcache allocations KVM: arm64: Convert timer offset VA when accessed in HYP code KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp() KVM: arm64: Eagerly switch ZCR_EL{1,2} KVM: arm64: Mark some header functions as inline KVM: arm64: Refactor exit handlers KVM: arm64: Refactor CPTR trap deactivation KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN KVM: arm64: Remove host FPSIMD saving for non-protected KVM KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop KVM: nSVM: Enter guest mode before initializing nested NPT MMU KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper ...
2025-02-16sched_ext: idle: Per-node idle cpumasksAndrea Righi
Using a single global idle mask can lead to inefficiencies and a lot of stress on the cache coherency protocol on large systems with multiple NUMA nodes, since all the CPUs can create a really intense read/write activity on the single global cpumask. Therefore, split the global cpumask into multiple per-NUMA node cpumasks to improve scalability and performance on large systems. The concept is that each cpumask will track only the idle CPUs within its corresponding NUMA node, treating CPUs in other NUMA nodes as busy. In this way concurrent access to the idle cpumask will be restricted within each NUMA node. The split of multiple per-node idle cpumasks can be controlled using the SCX_OPS_BUILTIN_IDLE_PER_NODE flag. By default SCX_OPS_BUILTIN_IDLE_PER_NODE is not enabled and a global host-wide idle cpumask is used, maintaining the previous behavior. NOTE: if a scheduler explicitly enables the per-node idle cpumasks (via SCX_OPS_BUILTIN_IDLE_PER_NODE), scx_bpf_get_idle_cpu/smtmask() will trigger an scx error, since there are no system-wide cpumasks. = Test = Hardware: - System: DGX B200 - CPUs: 224 SMT threads (112 physical cores) - Processor: INTEL(R) XEON(R) PLATINUM 8570 - 2 NUMA nodes Scheduler: - scx_simple [1] (so that we can focus at the built-in idle selection policy and not at the scheduling policy itself) Test: - Run a parallel kernel build `make -j $(nproc)` and measure the average elapsed time over 10 runs: avg time | stdev ---------+------ before: 52.431s | 2.895 after: 50.342s | 2.895 = Conclusion = Splitting the global cpumask into multiple per-NUMA cpumasks helped to achieve a speedup of approximately +4% with this particular architecture and test case. The same test on a DGX-1 (40 physical cores, Intel Xeon E5-2698 v4 @ 2.20GHz, 2 NUMA nodes) shows a speedup of around 1.5-3%. On smaller systems, I haven't noticed any measurable regressions or improvements with the same test (parallel kernel build) and scheduler (scx_simple). Moreover, with a modified scx_bpfland that uses the new NUMA-aware APIs I observed an additional +2-2.5% performance improvement with the same test. [1] https://github.com/sched-ext/scx/blob/main/scheds/c/scx_simple.bpf.c Cc: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-16sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODEAndrea Righi
Add the new scheduler flag SCX_OPS_BUILTIN_IDLE_PER_NODE, which allows BPF schedulers to select between using a global flat idle cpumask or multiple per-node cpumasks. This only introduces the flag and the mechanism to enable/disable this feature without affecting any scheduling behavior. Cc: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-15kunit: qemu_configs: sparc: use Zilog consoleThomas Weißschuh
The driver for the 8250 console is not used, as no port is found. Instead the prom0 bootconsole is used the whole time. The prom driver translates '\n' to '\r\n' before handing of the message off to the firmware. The firmware performs the same translation again. In the final output produced by QEMU each line ends with '\r\r\n'. This breaks the kunit parser, which can only handle '\r\n' and '\n'. Use the Zilog console instead. It works correctly, is the one documented by the QEMU manual and also saves a bit of codesize: Before=4051011, After=4023326, chg -0.68% Observed on QEMU 9.2.0. Link: https://lore.kernel.org/r/20250214-kunit-qemu-sparc-console-v1-1-ba1dfdf8f0b1@linutronix.de Fixes: 87c9c1631788 ("kunit: tool: add support for QEMU") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-15kunit: tool: Use qboot on QEMU x86_64Brendan Jackman
As noted in [0], SeaBIOS (QEMU default) makes a mess of the terminal, qboot does not. It turns out this is actually useful with kunit.py, since the user is exposed to this issue if they set --raw_output=all. qboot is also faster than SeaBIOS, but it's is marginal for this usecase. [0] https://lore.kernel.org/all/CA+i-1C0wYb-gZ8Mwh3WSVpbk-LF-Uo+njVbASJPe1WXDURoV7A@mail.gmail.com/ Both SeaBIOS and qboot are x86-specific. Link: https://lore.kernel.org/r/20250124-kunit-qboot-v1-1-815e4d4c6f7c@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-02-15Merge tag 'rust-fixes-6.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust fixes from Miguel Ojeda: - Fix objtool warning due to future Rust 1.85.0 (to be released in a few days) - Clean future Rust 1.86.0 (to be released 2025-04-03) Clippy warning * tag 'rust-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: rbtree: fix overindented list item objtool/rust: add one more `noreturn` Rust function
2025-02-15kernfs: Use RCU to access kernfs_node::parent.Sebastian Andrzej Siewior
kernfs_rename_lock is used to obtain stable kernfs_node::{name|parent} pointer. This is a preparation to access kernfs_node::parent under RCU and ensure that the pointer remains stable under the RCU lifetime guarantees. For a complete path, as it is done in kernfs_path_from_node(), the kernfs_rename_lock is still required in order to obtain a stable parent relationship while computing the relevant node depth. This must not change while the nodes are inspected in order to build the path. If the kernfs user never moves the nodes (changes the parent) then the kernfs_rename_lock is not required and the RCU guarantees are sufficient. This "restriction" can be set with KERNFS_ROOT_INVARIANT_PARENT. Otherwise the lock is required. Rename kernfs_node::parent to kernfs_node::__parent to denote the RCU access and use RCU accessor while accessing the node. Make cgroup use KERNFS_ROOT_INVARIANT_PARENT since the parent here can not change. Acked-by: Tejun Heo <tj@kernel.org> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250213145023.2820193-6-bigeasy@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-14selftests/bpf: add test for LDX/STX/ST relocations over array fieldAndrii Nakryiko
Add a simple repro for the issue of miscalculating LDX/STX/ST CO-RE relocation size adjustment when the CO-RE relocation target type is an ARRAY. We need to make sure that compiler generates LDX/STX/ST instruction with CO-RE relocation against entire ARRAY type, not ARRAY's element. With the code pattern in selftest, we get this: 59: 61 71 00 00 00 00 00 00 w1 = *(u32 *)(r7 + 0x0) 00000000000001d8: CO-RE <byte_off> [5] struct core_reloc_arrays::a (0:0) Where offset of `int a[5]` is embedded (through CO-RE relocation) into memory load instruction itself. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20250207014809.1573841-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-14libbpf: fix LDX/STX/ST CO-RE relocation size adjustment logicAndrii Nakryiko
Libbpf has a somewhat obscure feature of automatically adjusting the "size" of LDX/STX/ST instruction (memory store and load instructions), based on originally recorded access size (u8, u16, u32, or u64) and the actual size of the field on target kernel. This is meant to facilitate using BPF CO-RE on 32-bit architectures (pointers are always 64-bit in BPF, but host kernel's BTF will have it as 32-bit type), as well as generally supporting safe type changes (unsigned integer type changes can be transparently "relocated"). One issue that surfaced only now, 5 years after this logic was implemented, is how this all works when dealing with fields that are arrays. This isn't all that easy and straightforward to hit (see selftests that reproduce this condition), but one of sched_ext BPF programs did hit it with innocent looking loop. Long story short, libbpf used to calculate entire array size, instead of making sure to only calculate array's element size. But it's the element that is loaded by LDX/STX/ST instructions (1, 2, 4, or 8 bytes), so that's what libbpf should check. This patch adjusts the logic for arrays and fixed the issue. Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250207014809.1573841-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-14selftests/bpf: Add selftest for may_gotoJiayuan Chen
Added test cases to ensure that programs with stack sizes exceeding 512 bytes are restricted in non-JITed mode, and can be executed normally in JITed mode, even with stack sizes exceeding 512 bytes due to the presence of may_goto instructions. Test result: echo "0" > /proc/sys/net/core/bpf_jit_enable ./test_progs -t verifier_stack_ptr ... stack size 512 with may_goto with jit:SKIP stack size 512 with may_goto without jit:OK ... Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED echo "1" > /proc/sys/net/core/bpf_jit_enable ./test_progs -t verifier_stack_ptr ... stack size 512 with may_goto with jit:OK stack size 512 with may_goto without jit:SKIP ... Summary: 1/27 PASSED, 25 SKIPPED, 0 FAILED Signed-off-by: Jiayuan Chen <mrpre@163.com> Link: https://lore.kernel.org/r/20250214091823.46042-4-mrpre@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-14selftests/bpf: Introduce __load_if_JITed annotation for testsJiayuan Chen
In some cases, the verification logic under the interpreter and JIT differs, such as may_goto, and the test program behaves differently under different runtime modes, requiring separate verification logic for each result. Introduce __load_if_JITed and __load_if_no_JITed annotation for tests. Signed-off-by: Jiayuan Chen <mrpre@163.com> Link: https://lore.kernel.org/r/20250214091823.46042-3-mrpre@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-14rseq/selftests: Fix riscv rseq_offset_deref_addv inline asmStafford Horne
When working on OpenRISC support for restartable sequences I noticed and fixed these two issues with the riscv support bits. 1 The 'inc' argument to RSEQ_ASM_OP_R_DEREF_ADDV was being implicitly passed to the macro. Fix this by adding 'inc' to the list of macro arguments. 2 The inline asm input constraints for 'inc' and 'off' use "er", The riscv gcc port does not have an "e" constraint, this looks to be copied from the x86 port. Fix this by just using an "r" constraint. I have compile tested this only for riscv. However, the same fixes I use in the OpenRISC rseq selftests and everything passes with no issues. Fixes: 171586a6ab66 ("selftests/rseq: riscv: Template memory ordering and percpu access mode") Signed-off-by: Stafford Horne <shorne@gmail.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250114170721.3613280-1-shorne@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-02-14perf tools: Fix compile error on sample->user_regsNamhyung Kim
It's recently changed to allocate dynamically but misses to update some arch-dependent codes to use perf_sample__user_regs(). Fixes: dc6d2bc2d893a878 ("perf sample: Make user_regs and intr_regs optional") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250214191641.756664-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-14Merge tag 'sched_ext-for-6.14-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix lock imbalance in a corner case of dispatch_to_local_dsq() - Migration disabled tasks were confusing some BPF schedulers and its handling had a bug. Fix it and simplify the default behavior by dispatching them automatically - ops.tick(), ops.disable() and ops.exit_task() were incorrectly disallowing kfuncs that require the task argument to be the rq operation is currently operating on and thus is rq-locked. Allow them. - Fix autogroup migration handling bug which was occasionally triggering a warning in the cgroup migration path - tools/sched_ext, selftest and other misc updates * tag 'sched_ext-for-6.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Use SCX_CALL_OP_TASK in task_tick_scx sched_ext: Fix the incorrect bpf_list kfunc API in common.bpf.h. sched_ext: selftests: Fix grammar in tests description sched_ext: Fix incorrect assumption about migration disabled tasks in task_can_run_on_remote_rq() sched_ext: Fix migration disabled handling in targeted dispatches sched_ext: Implement auto local dispatching of migration disabled tasks sched_ext: Fix incorrect time delta calculation in time_delta() sched_ext: Fix lock imbalance in dispatch_to_local_dsq() sched_ext: selftests/dsp_local_on: Fix selftest on UP systems tools/sched_ext: Add helper to check task migration state sched_ext: Fix incorrect autogroup migration detection sched_ext: selftests/dsp_local_on: Fix sporadic failures selftests/sched_ext: Fix enum resolution sched_ext: Include task weight in the error state dump sched_ext: Fixes typos in comments
2025-02-14perf tools: Fix compilation error on arm64Leo Yan
Since the commit dc6d2bc2d893 ("perf sample: Make user_regs and intr_regs optional"), the building for Arm64 reports error: arch/arm64/util/unwind-libdw.c: In function ‘libdw__arch_set_initial_registers’: arch/arm64/util/unwind-libdw.c:11:32: error: initialization of ‘struct regs_dump *’ from incompatible pointer type ‘struct regs_dump **’ [-Werror=incompatible-pointer-types] 11 | struct regs_dump *user_regs = &ui->sample->user_regs; | ^ cc1: all warnings being treated as errors make[6]: *** [/home/niayan01/linux/tools/build/Makefile.build:85: arch/arm64/util/unwind-libdw.o] Error 1 make[5]: *** [/home/niayan01/linux/tools/build/Makefile.build:138: util] Error 2 arch/arm64/tests/dwarf-unwind.c: In function ‘test__arch_unwind_sample’: arch/arm64/tests/dwarf-unwind.c:48:27: error: initialization of ‘struct regs_dump *’ from incompatible pointer type ‘struct regs_dump **’ [-Werror=incompatible-pointer-types] 48 | struct regs_dump *regs = &sample->user_regs; | ^ To fix the issue, use the helper perf_sample__user_regs() to retrieve the user_regs. Fixes: dc6d2bc2d893 ("perf sample: Make user_regs and intr_regs optional") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250214111025.14478-1-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-14Merge tag 'cgroup-for-6.14-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix a race window where a newly forked task could escape cgroup.kill - Remove incorrectly included steal time from cpu.stat::usage_usec - Minor update in selftest * tag 'cgroup-for-6.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Remove steal time from usage_usec selftests/cgroup: use bash in test_cpuset_v1_hp.sh cgroup: fix race between fork and cgroup.kill
2025-02-14tools/sched_ext: Sync with scx repoTejun Heo
Synchronize with https://github.com/sched-ext/scx at d384453984a0 ("kernel: Sync at ad3b301aa05a ("sched_ext: Provides a sysfs 'events' to expose core event counters")"). Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-14KVM: selftests: Add infrastructure for getting vCPU binary statsSean Christopherson
Now that the binary stats cache infrastructure is largely scope agnostic, add support for vCPU-scoped stats. Like VM stats, open and cache the stats FD when the vCPU is created so that it's guaranteed to be valid when vcpu_get_stats() is invoked. Account for the extra per-vCPU file descriptor in kvm_set_files_rlimit(), so that tests that create large VMs don't run afoul of resource limits. To sanity check that the infrastructure actually works, and to get a bit of bonus coverage, add an assert in x86's xapic_ipi_test to verify that the number of HLTs executed by the test matches the number of HLT exits observed by KVM. Tested-by: Manali Shukla <Manali.Shukla@amd.com> Link: https://lore.kernel.org/r/20250111005049.1247555-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-14KVM: selftests: Adjust number of files rlimit for all "standard" VMsSean Christopherson
Move the max vCPUs test's RLIMIT_NOFILE adjustments to common code, and use the new helper to adjust the resource limit for non-barebones VMs by default. x86's recalc_apic_map_test creates 512 vCPUs, and a future change will open the binary stats fd for all vCPUs, which will put the recalc APIC test above some distros' default limit of 1024. Link: https://lore.kernel.org/r/20250111005049.1247555-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-14KVM: selftests: Get VM's binary stats FD when opening VMSean Christopherson
Get and cache a VM's binary stats FD when the VM is opened, as opposed to waiting until the stats are first used. Opening the stats FD outside of __vm_get_stat() will allow converting it to a scope-agnostic helper. Note, this doesn't interfere with kvm_binary_stats_test's testcase that verifies a stats FD can be used after its own VM's FD is closed, as the cached FD is also closed during kvm_vm_free(). Link: https://lore.kernel.org/r/20250111005049.1247555-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-14KVM: selftests: Add struct and helpers to wrap binary stats cacheSean Christopherson
Add a struct and helpers to manage the binary stats cache, which is currently used only for VM-scoped stats. This will allow expanding the selftests infrastructure to provide support for vCPU-scoped binary stats, which, except for the ioctl to get the stats FD are identical to VM-scoped stats. Defer converting __vm_get_stat() to a scope-agnostic helper to a future patch, as getting the stats FD from KVM needs to be moved elsewhere before it can be made completely scope-agnostic. Link: https://lore.kernel.org/r/20250111005049.1247555-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-14KVM: selftests: Macrofy vm_get_stat() to auto-generate stat name stringSean Christopherson
Turn vm_get_stat() into a macro that generates a string for the stat name, as opposed to taking a string. This will allow hardening stat usage in the future to generate errors on unknown stats at compile time. No functional change intended. Link: https://lore.kernel.org/r/20250111005049.1247555-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-14KVM: selftests: Assert that __vm_get_stat() actually finds a statSean Christopherson
Fail the test if it attempts to read a stat that doesn't exist, e.g. due to a typo (hooray, strings), or because the test tried to get a stat for the wrong scope. As is, there's no indiciation of failure and @data is left untouched, e.g. holds '0' or random stack data in most cases. Fixes: 8448ec5993be ("KVM: selftests: Add NX huge pages test") Link: https://lore.kernel.org/r/20250111005049.1247555-4-seanjc@google.com [sean: fixup spelling mistake, courtesy of Colin Ian King] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-14x86/alternative: Simplify callthunk patchingPeter Zijlstra
Now that paravirt call patching is implemented using alternatives, it is possible to avoid having to patch the alternative sites by including the altinstr_replacement calls in the call_sites list. This means we're now stacking relative adjustments like so: callthunks_patch_builtin_calls(): patches all function calls to target: func() -> func()-10 since the CALL accounting lives in the CALL_PADDING. This explicitly includes .altinstr_replacement alt_replace_call(): patches: x86_BUG() -> target() this patching is done in a relative manner, and will preserve the above adjustment, meaning that with calldepth patching it will do: x86_BUG()-10 -> target()-10 apply_relocation(): does code relocation, and adjusts all RIP-relative instructions to the new location, also in a relative manner. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20250207122546.617187089@infradead.org
2025-02-14selftests/landlock: Add binaries to .gitignoreBharadwaj Raju
Building the test creates binaries 'wait-pipe' and 'sandbox-and-launch' which need to be gitignore'd. Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com> Link: https://lore.kernel.org/r/20250210161101.6024-1-bharadwaj.raju777@gmail.com [mic: Sort entries] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-02-14selftests/landlock: Test that MPTCP actions are not restrictedMikhail Ivanov
Extend protocol fixture with test suits for MPTCP protocol. Add CONFIG_MPTCP and CONFIG_MPTCP_IPV6 options in config. Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com> Link: https://lore.kernel.org/r/20250205093651.1424339-4-ivanov.mikhail1@huawei-partners.com Cc: <stable@vger.kernel.org> # 6.7.x Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-02-14selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCPMikhail Ivanov
Extend protocol_variant structure with protocol field (Cf. socket(2)). Extend protocol fixture with TCP test suits with protocol=IPPROTO_TCP which can be used as an alias for IPPROTO_IP (=0) in socket(2). Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com> Link: https://lore.kernel.org/r/20250205093651.1424339-3-ivanov.mikhail1@huawei-partners.com Cc: <stable@vger.kernel.org> # 6.7.x Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-02-14selftests/landlock: Enable the new CONFIG_AF_UNIX_OOBMickaël Salaün
Since commit 5155cbcdbf03 ("af_unix: Add a prompt to CONFIG_AF_UNIX_OOB"), the Landlock selftests's configuration is not enough to build a minimal kernel. Because scoped_signal_test checks with the MSG_OOB flag, we need to enable CONFIG_AF_UNIX_OOB for tests: # RUN fown.no_sandbox.sigurg_socket ... # scoped_signal_test.c:420:sigurg_socket:Expected 1 (1) == send(client_socket, ".", 1, MSG_OOB) (-1) # sigurg_socket: Test terminated by assertion # FAIL fown.no_sandbox.sigurg_socket ... Cc: Günther Noack <gnoack@google.com> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/r/20250211132531.1625566-1-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-02-13bpftool: Check map name length when map createRong Tao
The size of struct bpf_map::name is BPF_OBJ_NAME_LEN (16). bpf(2) { map_create() { bpf_obj_name_cpy(map->name, attr->map_name, sizeof(attr->map_name)); } } When specifying a map name using bpftool map create name, no error is reported if the name length is greater than 15. $ sudo bpftool map create /sys/fs/bpf/12345678901234567890 \ type array key 4 value 4 entries 5 name 12345678901234567890 Users will think that 12345678901234567890 is legal, but this name cannot be used to index a map. $ sudo bpftool map show name 12345678901234567890 Error: can't parse name $ sudo bpftool map show ... 1249: array name 123456789012345 flags 0x0 key 4B value 4B max_entries 5 memlock 304B $ sudo bpftool map show name 123456789012345 1249: array name 123456789012345 flags 0x0 key 4B value 4B max_entries 5 memlock 304B The map name provided in the command line is truncated, but no warning is reported. This submission checks the length of the map name. Reviewed-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Rong Tao <rongtao@cestc.cn> Link: https://lore.kernel.org/r/tencent_B44B3A95F0D7C2512DC40D831DA1FA2C9907@qq.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-13selftests/bpf: Test kfuncs that set and remove xattr from BPF programsSong Liu
Two sets of tests are added to exercise the not _locked and _locked version of the kfuncs. For both tests, user space accesses xattr security.bpf.foo on a testfile. The BPF program is triggered by user space access (on LSM hook inode_[set|get]_xattr) and sets or removes xattr security.bpf.bar. Then user space then validates that xattr security.bpf.bar is set or removed as expected. Note that, in both tests, the BPF programs use the not _locked kfuncs. The verifier picks the proper kfuncs based on the calling context. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250130213549.3353349-6-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-13selftests/bpf: Extend test fs_kfuncs to cover security.bpf. xattr namesSong Liu
Extend test_progs fs_kfuncs to cover different xattr names. Specifically: xattr name "user.kfuncs" and "security.bpf.xxx" can be read from BPF program with kfuncs bpf_get_[file|dentry]_xattr(); while "security.bpf" and "security.selinux" cannot be read. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250130213549.3353349-3-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-13selftests/bpf: Fix stdout race condition in traffic monitorAmery Hung
Fix a race condition between the main test_progs thread and the traffic monitoring thread. The traffic monitor thread tries to print a line using multiple printf and use flockfile() to prevent the line from being torn apart. Meanwhile, the main thread doing io redirection can reassign or close stdout when going through tests. A deadlock as shown below can happen. main traffic_monitor_thread ==== ====================== show_transport() -> flockfile(stdout) stdio_hijack_init() -> stdout = open_memstream(log_buf, log_cnt); ... env.subtest_state->stdout_saved = stdout; ... funlockfile(stdout) stdio_restore_cleanup() -> fclose(env.subtest_state->stdout_saved); After the traffic monitor thread lock stdout, A new memstream can be assigned to stdout by the main thread. Therefore, the traffic monitor thread later will not be able to unlock the original stdout. As the main thread tries to access the old stdout, it will hang indefinitely as it is still locked by the traffic monitor thread. The deadlock can be reproduced by running test_progs repeatedly with traffic monitor enabled: for ((i=1;i<=100;i++)); do ./test_progs -a flow_dissector_skb* -m '*' done Fix this by only calling printf once and remove flockfile()/funlockfile(). Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250213233217.553258-1-ameryhung@gmail.com