Age | Commit message (Collapse) | Author |
|
With gcc14, when building with RELEASE=1, I hit four below compilation
failure:
Error 1:
In file included from test_loader.c:6:
test_loader.c: In function ‘run_subtest’: test_progs.h:194:17:
error: ‘retval’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
194 | fprintf(stdout, ##format); \
| ^~~~~~~
test_loader.c:958:13: note: ‘retval’ was declared here
958 | int retval, err, i;
| ^~~~~~
The uninitialized var 'retval' actually could cause incorrect result.
Error 2:
In function ‘test_fd_array_cnt’:
prog_tests/fd_array.c:71:14: error: ‘btf_id’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
71 | fd = bpf_btf_get_fd_by_id(id);
| ^~~~~~~~~~~~~~~~~~~~~~~~
prog_tests/fd_array.c:302:15: note: ‘btf_id’ was declared here
302 | __u32 btf_id;
| ^~~~~~
Changing ASSERT_GE to ASSERT_EQ can fix the compilation error. Otherwise,
there is no functionality change.
Error 3:
prog_tests/tailcalls.c: In function ‘test_tailcall_hierarchy_count’:
prog_tests/tailcalls.c:1402:23: error: ‘fentry_data_fd’ may be used uninitialized
in this function [-Werror=maybe-uninitialized]
1402 | err = bpf_map_lookup_elem(fentry_data_fd, &i, &val);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The code is correct. The change intends to silence gcc errors.
Error 4: (this error only happens on arm64)
In file included from prog_tests/log_buf.c:4:
prog_tests/log_buf.c: In function ‘bpf_prog_load_log_buf’:
./test_progs.h:390:22: error: ‘log_buf’ may be used uninitialized [-Werror=maybe-uninitialized]
390 | int ___err = libbpf_get_error(___res); \
| ^~~~~~~~~~~~~~~~~~~~~~~~
prog_tests/log_buf.c:158:14: note: in expansion of macro ‘ASSERT_OK_PTR’
158 | if (!ASSERT_OK_PTR(log_buf, "log_buf_alloc"))
| ^~~~~~~~~~~~~
In file included from selftests/bpf/tools/include/bpf/bpf.h:32,
from ./test_progs.h:36:
selftests/bpf/tools/include/bpf/libbpf_legacy.h:113:17:
note: by argument 1 of type ‘const void *’ to ‘libbpf_get_error’ declared here
113 | LIBBPF_API long libbpf_get_error(const void *ptr);
| ^~~~~~~~~~~~~~~~
Adding a pragma to disable maybe-uninitialized fixed the issue.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250617044956.2686668-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
LSM hooks such as security_path_mknod() and security_inode_rename() have
access to newly allocated negative dentry, which has NULL d_inode.
Therefore, it is necessary to do the NULL pointer check for d_inode.
Also add selftests that checks the verifier enforces the NULL pointer
check.
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20250613052857.1992233-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file
callback"), the f_op->mmap() hook has been deprecated in favour of
f_op->mmap_prepare().
Additionally, commit bb666b7c2707 ("mm: add mmap_prepare() compatibility
layer for nested file systems") permits the use of the .mmap_prepare() hook
even in nested filesystems like overlayfs.
There are a number of places where we check only for f_op->mmap - this is
incorrect now mmap_prepare exists, so update all of these to use the
general helper can_mmap_file().
Most notably, this updates the elf logic to allow for the ability to
execute binaries on filesystems which have the .mmap_prepare hook, but
additionally we update nested filesystems.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/b68145b609532e62bab603dd9686faa6562046ec.1750099179.git.lorenzo.stoakes@oracle.com
Acked-by: Kees Cook <kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The call_mmap() function violates the existing convention in
include/linux/fs.h whereby invocations of virtual file system hooks is
performed by functions prefixed with vfs_xxx().
Correct this by renaming call_mmap() to vfs_mmap(). This also avoids
confusion as to the fact that f_op->mmap_prepare may be invoked here.
Also rename __call_mmap_prepare() function to vfs_mmap_prepare() and adjust
to accept a file parameter, this is useful later for nested file systems.
Finally, fix up the VMA userland tests and ensure the mmap_prepare -> mmap
shim is implemented there.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/8d389f4994fa736aa8f9172bef8533c10a9e9011.1750099179.git.lorenzo.stoakes@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In the current test topology, all the routers are connected to each
other via dedicated links with addresses of the form fcf0:0:x:y::/64.
The test configures rt-3 with an adjacency with rt-4 and rt-4 with an
adjacency with rt-1:
# ip -n rt_3-IgWSBJ -6 route show tab 90 fcbb:0:300::/48
fcbb:0:300::/48 encap seg6local action End.X nh6 fcf0:0:3:4::4 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
# ip -n rt_4-JdCunK -6 route show tab 90 fcbb:0:400::/48
fcbb:0:400::/48 encap seg6local action End.X nh6 fcf0:0:1:4::1 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
The routes are used when pinging hs-2 from hs-1 and vice-versa.
Extend the test to also cover End.X behavior with an IPv6 link-local
nexthop address and an output interface. Configure every router
interface with an IPv6 link-local address of the form fe80::x:y/64 and
before re-running the ping tests, replace the previous End.X routes with
routes that use the new IPv6 link-local addresses:
# ip -n rt_3-IgWSBJ -6 route show tab 90 fcbb:0:300::/48
fcbb:0:300::/48 encap seg6local action End.X nh6 fe80::4:3 oif veth-rt-3-4 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
# ip -n rt_4-JdCunK -6 route show tab 90 fcbb:0:400::/48
fcbb:0:400::/48 encap seg6local action End.X nh6 fe80::1:4 oif veth-rt-4-1 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
The new test cases fail without the previous patch ("seg6: Allow End.X
behavior to accept an oif"):
# ./srv6_end_x_next_csid_l3vpn_test.sh
[...]
################################################################################
TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv6), link-local
################################################################################
TEST: IPv6 Hosts connectivity: hs-1 -> hs-2 [FAIL]
TEST: IPv6 Hosts connectivity: hs-2 -> hs-1 [FAIL]
################################################################################
TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv4), link-local
################################################################################
TEST: IPv4 Hosts connectivity: hs-1 -> hs-2 [FAIL]
TEST: IPv4 Hosts connectivity: hs-2 -> hs-1 [FAIL]
Tests passed: 40
Tests failed: 4
And pass with it:
# ./srv6_end_x_next_csid_l3vpn_test.sh
[...]
################################################################################
TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv6), link-local
################################################################################
TEST: IPv6 Hosts connectivity: hs-1 -> hs-2 [ OK ]
TEST: IPv6 Hosts connectivity: hs-2 -> hs-1 [ OK ]
################################################################################
TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv4), link-local
################################################################################
TEST: IPv4 Hosts connectivity: hs-1 -> hs-2 [ OK ]
TEST: IPv4 Hosts connectivity: hs-2 -> hs-1 [ OK ]
Tests passed: 44
Tests failed: 0
Without the previous patch, rt-3 and rt-4 resolve the wrong routes for
the link-local nexthops, with the output interface being the input
interface:
# perf script
[...]
ping 1067 [001] 37.554486: fib6:fib6_table_lookup: table 254 oif 0 iif 11 proto 41 cafe::254/0 -> fe80::4:3/0 flowlabel 0xb7973 tos 0 scope 0 flags 2 ==> dev veth-rt-3-1 gw :: err 0
[...]
ping 1069 [002] 41.573360: fib6:fib6_table_lookup: table 254 oif 0 iif 12 proto 41 cafe::254/0 -> fe80::1:4/0 flowlabel 0xb7973 tos 0 scope 0 flags 2 ==> dev veth-rt-4-2 gw :: err 0
But the correct routes are resolved with the patch:
# perf script
[...]
ping 1066 [006] 30.672355: fib6:fib6_table_lookup: table 254 oif 13 iif 1 proto 41 cafe::254/0 -> fe80::4:3/0 flowlabel 0x85941 tos 0 scope 0 flags 6 ==> dev veth-rt-3-4 gw :: err 0
[...]
ping 1066 [006] 30.672411: fib6:fib6_table_lookup: table 254 oif 11 iif 1 proto 41 cafe::254/0 -> fe80::1:4/0 flowlabel 0x91de0 tos 0 scope 0 flags 6 ==> dev veth-rt-4-1 gw :: err 0
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250612122323.584113-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new selftest to verify netconsole module loading with command
line arguments. This test exercises the init_netconsole() path and
validates proper parsing of the netconsole= parameter format.
The test:
- Loads netconsole module with cmdline configuration instead of
dynamic reconfiguration
- Validates message transmission through the configured target
- Adds helper functions for cmdline string generation and module
validation
This complements existing netconsole selftests by covering the
module initialization code path that processes boot-time parameters.
This test is useful to test issues like the one described in [1].
Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorage.com/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-8-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extract the network device and namespace cleanup logic from the
cleanup() function into a new do_cleanup() helper in lib_netcons.sh.
The do_cleanup() function only unconfigure the network and
printk, while cleanup() cleans the netconsole targets plus the network
and printk.
This refactoring let this code to be reused in cases netconsole dynamic
is not being used, as in the upcoming patch.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-7-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add one test to check that the kernel rejects a negative perturb timer.
Add a second test checking that the kernel rejects
a too big perturb timer.
All test results:
1..2
ok 1 cdc1 - Check that a negative perturb timer is rejected
ok 2 a9f0 - Check that a too big perturb timer is rejected
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250613064136.3911944-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Dave Hansen:
"This is a pretty scattered set of fixes. The majority of them are
further fixups around the recent ITS mitigations.
The rest don't really have a coherent story:
- Some flavors of Xen PV guests don't support large pages, but the
set_memory.c code assumes all CPUs support them.
Avoid problems with a quick CPU feature check.
- The TDX code has some wrappers to help retry calls to the TDX
module. They use function pointers to assembly functions and the
compiler usually generates direct CALLs. But some new compilers,
plus -Os turned them in to indirect CALLs and the assembly code was
not annotated for indirect calls.
Force inlining of the helper to fix it up.
- Last, a FRED issue showed up when single-stepping. It's fine when
using an external debugger, but was getting stuck returning from a
SIGTRAP handler otherwise.
Clear the FRED 'swevent' bit to ensure that forward progress is
made"
* tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "mm/execmem: Unify early execmem_cache behaviour"
x86/its: explicitly manage permissions for ITS pages
x86/its: move its_pages array to struct mod_arch_specific
x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is set
x86/mm/pat: don't collapse pages without PSE set
x86/virt/tdx: Avoid indirect calls to TDX assembly functions
selftests/x86: Add a test to detect infinite SIGTRAP handler loop
x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler
|
|
Link: https://lore.kernel.org/20250612-work-coredump-massage-v1-8-315c0c34ba94@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"9 hotfixes. 3 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels. Only 4 are for MM"
* tag 'mm-hotfixes-stable-2025-06-13-21-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: add mmap_prepare() compatibility layer for nested file systems
init: fix build warnings about export.h
MAINTAINERS: add Barry as a THP reviewer
drivers/rapidio/rio_cm.c: prevent possible heap overwrite
mm: close theoretical race where stale TLB entries could linger
mm/vma: reset VMA iterator on commit_merge() OOM failure
docs: proc: update VmFlags documentation in smaps
scatterlist: fix extraneous '@'-sign kernel-doc notation
selftests/mm: skip failed memfd setups in gup_longterm
|
|
A test case to check if both branches of jset are explored when
computing program CFG.
At 'if r1 & 0x7 ...':
- register 'r2' is computed alive only if jump branch of jset
instruction is followed;
- register 'r0' is computed alive only if fallthrough branch of jset
instruction is followed.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250613175331.3238739-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This commit adds a new field mem_peak / "Peak memory (MiB)" field to a
set of gathered statistics. The field is intended as an estimate for
peak verifier memory consumption for processing of a given program.
Mechanically stat is collected as follows:
- At the beginning of handle_verif_mode() a new cgroup is created
and veristat process is moved into this cgroup.
- At each program load:
- bpf_object__load() is split into bpf_object__prepare() and
bpf_object__load() to avoid accounting for memory allocated for
maps;
- before bpf_object__load():
- a write to "memory.peak" file of the new cgroup is used to reset
cgroup statistics;
- updated value is read from "memory.peak" file and stashed;
- after bpf_object__load() "memory.peak" is read again and
difference between new and stashed values is used as a metric.
If any of the above steps fails veristat proceeds w/o collecting
mem_peak information for a program, reporting mem_peak as -1.
While memcg provides data in bytes (converted from pages), veristat
converts it to megabytes to avoid jitter when comparing results of
different executions.
The change has no measurable impact on veristat running time.
A correlation between "Peak states" and "Peak memory" fields provides
a sanity check for gathered statistics, e.g. a sample of data for
sched_ext programs:
Program Peak states Peak memory (MiB)
------------------------ ----------- -----------------
lavd_select_cpu 2153 44
lavd_enqueue 1982 41
lavd_dispatch 3480 28
layered_dispatch 1417 17
layered_enqueue 760 11
lavd_cpu_offline 349 6
lavd_cpu_online 349 6
lavd_init 394 6
rusty_init 350 5
layered_select_cpu 391 4
...
rusty_stopping 134 1
arena_topology_node_init 170 0
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250613072147.3938139-3-eddyz87@gmail.com
|
|
BPF_MAP_TYPE_CGRP_STORAGE doesn't allow non-zero max_entries. So veristat
should not set it to 1.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250613050001.1058733-1-song@kernel.org
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Rework of system register accessors for system registers that are
directly writen to memory, so that sanitisation of the in-memory
value happens at the correct time (after the read, or before the
write). For convenience, RMW-style accessors are also provided.
- Multiple fixes for the so-called "arch-timer-edge-cases' selftest,
which was always broken.
x86:
- Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to
pass only the "untouched" addresses and flipping the shared/private
bit in the implementation.
- Disable SEV-SNP support on initialization failure
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY
KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORY
KVM: SEV: Disable SEV-SNP support on initialization failure
KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases
KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases
KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases
KVM: arm64: selftests: Fix help text for arch_timer_edge_cases
KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand
KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg
KVM: arm64: Add RMW specific sysreg accessor
KVM: arm64: Add assignment-specific sysreg accessor
|
|
Add tests for PR_SYS_DISPATCH_INCLUSIVE_ON correct/incorrect args,
and a test that ensures that the specified range is respected
by both PR_SYS_DISPATCH_EXCLUSIVE_ON and PR_SYS_DISPATCH_INCLUSIVE_ON.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/8df4b50b176b073550bab5b3f3faff752c5f8e17.1747839857.git.dvyukov@google.com
|
|
Successful syscalls don't change errno, so checking errno is wrong
to ensure that a syscall has failed. For example for the following
sequence:
prctl(PR_SET_SYSCALL_USER_DISPATCH, op, 0x0, 0xff, 0);
EXPECT_EQ(EINVAL, errno);
prctl(PR_SET_SYSCALL_USER_DISPATCH, op, 0x0, 0x0, &sel);
EXPECT_EQ(EINVAL, errno);
only the first syscall may fail and set errno, but the second may succeed
and keep errno intact, and the check will falsely pass.
Or if errno happened to be EINVAL before, even the first check may falsely
pass.
Also use EXPECT/ASSERT consistently. Currently there is an inconsistent mix
without obvious reasons for usage of one or another.
Fixes: 179ef035992e ("selftests: Add kselftest for syscall user dispatch")
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/af6a04dbfef9af8570f5bab43e3ef1416b62699a.1747839857.git.dvyukov@google.com
|
|
Nested file systems, that is those which invoke call_mmap() within their
own f_op->mmap() handlers, may encounter underlying file systems which
provide the f_op->mmap_prepare() hook introduced by commit c84bf6dd2b83
("mm: introduce new .mmap_prepare() file callback").
We have a chicken-and-egg scenario here - until all file systems are
converted to using .mmap_prepare(), we cannot convert these nested
handlers, as we can't call f_op->mmap from an .mmap_prepare() hook.
So we have to do it the other way round - invoke the .mmap_prepare() hook
from an .mmap() one.
in order to do so, we need to convert VMA state into a struct vm_area_desc
descriptor, invoking the underlying file system's f_op->mmap_prepare()
callback passing a pointer to this, and then setting VMA state accordingly
and safely.
This patch achieves this via the compat_vma_mmap_prepare() function, which
we invoke from call_mmap() if f_op->mmap_prepare() is specified in the
passed in file pointer.
We place the fundamental logic into mm/vma.h where VMA manipulation
belongs. We also update the VMA userland tests to accommodate the
changes.
The compat_vma_mmap_prepare() function and its associated machinery is
temporary, and will be removed once the conversion of file systems is
complete.
We carefully place this code so it can be used with CONFIG_MMU and also
with cutting edge nommu silicon.
[akpm@linux-foundation.org: export compat_vma_mmap_prepare tp fix build]
[lorenzo.stoakes@oracle.com: remove unused declarations]
Link: https://lkml.kernel.org/r/ac3ae324-4c65-432a-8c6d-2af988b18ac8@lucifer.local
Link: https://lkml.kernel.org/r/20250609165749.344976-1-lorenzo.stoakes@oracle.com
Fixes: c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback").
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/linux-mm/CAG48ez04yOEVx1ekzOChARDDBZzAKwet8PEoPM4Ln3_rk91AzQ@mail.gmail.com/
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
On arm64 with 64KB page size, the selftest xdp_do_redirect failed like
below:
...
test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec
test_max_pkt_size:PASS:prog_run_max_size 0 nsec
test_max_pkt_size:FAIL:prog_run_too_big unexpected prog_run_too_big: actual -28 != expected -22
With 64KB page size, the xdp frame size will be much bigger so
the existing test will fail.
Adjust various parameters so the test can also work on 64K page size.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035042.2208630-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When running BPF selftests on arm64 with a 64K page size, I encountered
the following two test failures:
sockmap_basic/sockmap skb_verdict change tail:FAIL
tc_change_tail:FAIL
With further debugging, I identified the root cause in the following
kernel code within __bpf_skb_change_tail():
u32 max_len = BPF_SKB_MAX_LEN;
u32 min_len = __bpf_skb_min_len(skb);
int ret;
if (unlikely(flags || new_len > max_len || new_len < min_len))
return -EINVAL;
With a 4K page size, new_len = 65535 and max_len = 16064, the function
returns -EINVAL. However, With a 64K page size, max_len increases to
261824, allowing execution to proceed further in the function. This is
because BPF_SKB_MAX_LEN scales with the page size and larger page sizes
result in higher max_len values.
Updating the new_len parameter in both tests based on actual kernel
page size resolved both failures.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035037.2207911-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on
arm64 with 64KB page:
xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL
In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on
when constructing frags, with 64K page size, the frag data_len could
be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail().
To fix the failure, the xdp->frame_sz is set to be PAGE_SIZE so kernel
can test different page size properly. With the kernel change, the user
space and bpf prog needs adjustment. Currently, the MAX_SKB_FRAGS default
value is 17, so for 4K page, the maximum packet size will be less than 68K.
To test 64K page, a bigger maximum packet size than 68K is desired. So two
different functions are implemented for subtest xdp_adjust_frags_tail_grow.
Depending on different page size, different data input/output sizes are used
to adapt with different page size.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035032.2207498-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Spelling fix:
conneciton --> connection
This is a non-functional change aimed at improving code clarity.
Signed-off-by: Ankit Chauhan <ankitchauhan2065@gmail.com>
Link: https://patch.msgid.link/20250610071903.67180-1-ankitchauhan2065@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When xsend() returns -1 (error), the check 'n < sizeof(buf)' incorrectly
treats it as success due to unsigned promotion. Explicitly check for -1
first.
Fixes: a4b7193d8efd ("selftests/bpf: Add sockmap test for redirecting partial skb data")
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Link: https://lore.kernel.org/r/20250612084208.27722-1-wangfushuai@baidu.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The test case absent_mark_in_the_middle_state is equivalent of the
following C program:
1: r8 = bpf_get_prandom_u32();
2: r6 = -32;
3: bpf_iter_num_new(&fp[-8], 0, 10);
4: if (unlikely(bpf_get_prandom_u32()))
5: r6 = -31;
6: for (;;) {
7: if (!bpf_iter_num_next(&fp[-8]))
8: break;
9: if (unlikely(bpf_get_prandom_u32()))
10: *(u64 *)(fp + r6) = 7;
11: }
12: bpf_iter_num_destroy(&fp[-8]);
13: return 0;
W/o a fix that instructs verifier to ignore branches count for loop
entries verification proceeds as follows:
- 1-4, state is {r6=-32,fp-8=active};
- 6, checkpoint A is created with {r6=-32,fp-8=active};
- 7, checkpoint B is created with {r6=-32,fp-8=active},
push state {r6=-32,fp-8=active} from 7 to 9;
- 8,12,13, {r6=-32,fp-8=drained}, exit;
- pop state with {r6=-32,fp-8=active} from 7 to 9;
- 9, push state {r6=-32,fp-8=active} from 9 to 10;
- 6, checkpoint C is created with {r6=-32,fp-8=active};
- 7, checkpoint A is hit, no precision propagated for r6 to C;
- pop state {r6=-32,fp-8=active} from 9 to 10;
- 10, state is {r6=-31,fp-8=active}, r6 is marked as read and precise,
these marks are propagated to checkpoints A and B (but not C, as
it is not the parent of current state;
- 6, {r6=-31,fp-8=active} checkpoint C is hit, because r6 is not
marked precise for this checkpoint;
- the program is accepted, despite a possibility of unaligned u64
stack access at offset -31.
The test case absent_mark_in_the_middle_state2 is similar except the
following change:
r8 = bpf_get_prandom_u32();
r6 = -32;
bpf_iter_num_new(&fp[-8], 0, 10);
if (unlikely(bpf_get_prandom_u32())) {
r6 = -31;
+ jump_into_loop:
+ goto +0;
+ goto loop;
+ }
+ if (unlikely(bpf_get_prandom_u32()))
+ goto jump_into_loop;
+ loop:
for (;;) {
if (!bpf_iter_num_next(&fp[-8]))
break;
if (unlikely(bpf_get_prandom_u32()))
*(u64 *)(fp + r6) = 7;
}
bpf_iter_num_destroy(&fp[-8])
return 0
The goal is to check that read/precision marks are propagated to
checkpoint created at 'goto +0' that resides outside of the loop.
The test case absent_mark_in_the_middle_state3 is a bit different and
is equivalent to the C program below:
int absent_mark_in_the_middle_state3(void)
{
bpf_iter_num_new(&fp[-8], 0, 10)
loop1(-32, &fp[-8])
loop1_wrapper(&fp[-8])
bpf_iter_num_destroy(&fp[-8])
}
int loop1(num, iter)
{
while (bpf_iter_num_next(iter)) {
if (unlikely(bpf_get_prandom_u32()))
*(fp + num) = 7;
}
return 0
}
int loop1_wrapper(iter)
{
r6 = -32;
if (unlikely(bpf_get_prandom_u32()))
r6 = -31;
loop1(r6, iter);
return 0;
}
The unsafe state is reached in a similar manner, but the loop is
located inside a subprogram that is called from two locations in the
main subprogram. This detail is important for exercising
bpf_scc_visit->backedges memory management.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc2).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent
invalid routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
transient packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple
requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when device is configure to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set"
* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
net: ethtool: Don't check if RSS context exists in case of context 0
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
ipv6: Move fib6_config_validate() to ip6_route_add().
net: drv: netdevsim: don't napi_complete() from netpoll
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
veth: prevent NULL pointer dereference in veth_xdp_rcv
net_sched: remove qdisc_tree_flush_backlog()
net_sched: ets: fix a race in ets_qdisc_change()
net_sched: tbf: fix a race in tbf_change()
net_sched: red: fix a race in __red_change()
net_sched: prio: fix a race in prio_tune()
net_sched: sch_sfq: reject invalid perturb period
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
MAINTAINERS: Update Kuniyuki Iwashima's email address.
selftests: net: add test case for NAT46 looping back dst
net: clear the dst when changing skb protocol
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
net/mlx5e: Fix leak of Geneve TLV option object
net/mlx5: HWS, make sure the uplink is the last destination
...
|
|
context
Add test_rss_default_context_rule() to verify that ntuple rules can
correctly direct traffic to the default RSS context (context 0).
The test creates two ntuple rules with explicit location priorities:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This validates that:
1. Rules targeting the default context function properly.
2. Traffic steering works as expected when mixing default and
additional RSS contexts.
The test was written by AI, and reviewed by humans.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250612071958.1696361-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This adds extensive tests for the coredump server.
Link: https://lore.kernel.org/20250603-work-coredump-socket-protocol-v2-5-05a5f0c18ecc@kernel.org
Acked-by: Lennart Poettering <lennart@poettering.net>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make the selftests we added this cycle easier to read.
Link: https://lore.kernel.org/20250603-work-coredump-socket-protocol-v2-3-05a5f0c18ecc@kernel.org
Acked-by: Lennart Poettering <lennart@poettering.net>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fix various warnings in the selftest build.
Link: https://lore.kernel.org/20250603-work-coredump-socket-protocol-v2-2-05a5f0c18ecc@kernel.org
Acked-by: Lennart Poettering <lennart@poettering.net>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Unlike the other cases gup_longterm's memfd tests previously skipped the
test when failing to set up the file descriptor to test. Restore this
behavior to avoid hitting failures when hugetlb isn't configured.
Link: https://lkml.kernel.org/r/20250605-selftest-mm-gup-longterm-tweaks-v1-1-2fae34b05958@kernel.org
Fixes: 66bce7afbaca ("selftests/mm: fix test result reporting in gup_longterm")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reported-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Closes: https://lkml.kernel.org/r/a76fc252-0fe3-4d4b-a9a1-4a2895c2680d@lucifer.local
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Simple test for crash involving multicast loopback and stale dst.
Reuse exising NAT46 program.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250610001245.1981782-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This commit introduces a new vmtest.sh runner for vsock.
It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H,
H2G, and loopback. The testing tools from tools/testing/vsock/ are
reused. Currently, only vsock_test is used.
VMCI and hyperv support is included in the config file to be built with
the -b option, though not used in the tests.
Only tested on x86.
To run:
$ make -C tools/testing/selftests TARGETS=vsock
$ tools/testing/selftests/vsock/vmtest.sh
or
$ make -C tools/testing/selftests TARGETS=vsock run_tests
Example runs (after make -C tools/testing/selftests TARGETS=vsock):
$ ./tools/testing/selftests/vsock/vmtest.sh
1..3
ok 0 vm_server_host_client
ok 1 vm_client_host_server
ok 2 vm_loopback
SUMMARY: PASS=3 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_m7DI.log
$ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback
1..1
ok 0 vm_loopback
SUMMARY: PASS=1 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_a1IO.log
$ mkdir -p ~/scratch
$ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch
[... omitted ...]
$ cd ~/scratch
$ ./run_kselftest.sh
TAP version 13
1..1
# timeout set to 300
# selftests: vsock: vmtest.sh
# 1..3
# ok 0 vm_server_host_client
# ok 1 vm_client_host_server
# ok 2 vm_loopback
# SUMMARY: PASS=3 SKIP=0 FAIL=0
# Log: /tmp/vsock_vmtest_svEl.log
ok 1 selftests: vsock: vmtest.sh
Future work can include vsock_diag_test.
Because vsock requires a VM to test anything other than loopback, this
patch adds vmtest.sh as a kselftest itself. This is different than other
systems that have a "vmtest.sh", where it is used as a utility script to
spin up a VM to run the selftests as a guest (but isn't hooked into
kselftest).
Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250609-vsock-vmtest-v10-1-7f37198e1cd4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On arm64, the cgroup_mprog_ordering selftest failed with test_progs run
when building with clang compiler. The reason is due to socklen_t optlen
not initialized.
In kernel function do_ip_getsockopt(), we have
if (copy_from_sockptr(&len, optlen, sizeof(int)))
return -EFAULT;
if (len < 0)
return -EINVAL;
The above 'len' variable is a negative value and hence the test failed.
But the test is okay on x86_64. I checked the x86_64 asm code and I didn't
see explicit initialization of 'optlen' but its value is 0 so kernel
didn't return error. This should be a pure luck.
Fix the bug by initializing 'oplen' var properly.
Fixes: e422d5f118e4 ("selftests/bpf: Add two selftests for mprog API based cgroup progs")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250611162103.1623692-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.16, take #2
- Rework of system register accessors for system registers that are
directly writen to memory, so that sanitisation of the in-memory
value happens at the correct time (after the read, or before the
write). For convenience, RMW-style accessors are also provided.
- Multiple fixes for the so-called "arch-timer-edge-cases' selftest,
which was always broken.
|
|
The selftest can reproduce an issue where using bpf_msg_pop_data() in
ktls causes errors on the receiving end.
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20250609020910.397930-3-jiayuan.chen@linux.dev
|
|
Most of the packetdrill tests have not flaked once last week.
Add the few which did to the XFAIL list.
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250610000001.1970934-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Once the THREADED napi is disabled, the napi kthread should also be
stopped. Keeping the kthread intact after disabling THREADED napi makes
the PID of this kthread show up in the output of netlink 'napi-get' and
ps -ef output.
The is discussed in the patch below:
https://lore.kernel.org/all/20250502191548.559cc416@kernel.org
NAPI kthread should stop only if,
- There are no pending napi poll scheduled for this thread.
- There are no new napi poll scheduled for this thread while it has
stopped.
- The ____napi_schedule can correctly fallback to the softirq for napi
polling.
Since napi_schedule_prep provides mutual exclusion over STATE_SCHED bit,
it is safe to unset the STATE_THREADED when SCHED_THREADED is set or the
SCHED bit is not set. SCHED_THREADED being set means that SCHED is
already set and the kthread owns this napi.
To disable threaded napi, unset STATE_THREADED bit safely if
SCHED_THREADED is set or SCHED is unset. Once STATE_THREADED is unset
safely then wait for the kthread to unset the SCHED_THREADED bit so it
safe to stop the kthread.
Add a new test in nl_netdev to verify this behaviour.
Tested:
./tools/testing/selftests/net/nl_netdev.py
TAP version 13
1..6
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
ok 5 nl_netdev.dev_set_threaded
ok 6 nl_netdev.nsim_rxq_reset_down
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
Ran neper for 300 seconds and did enable/disable of thread napi in a
loop continuously.
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250609173015.3851695-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend the netconsole selftest to validate both basic and extended
target formats. The basic format is a simpler variant that doesn't
support userdata or release functionality.
The test now validates that netconsole works correctly in both
configurations, improving test coverage for different netconsole
deployment scenarios.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250609-netcons_ext-v3-4-5336fa670326@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the exit call from validate_result() function and move the
test exit logic to the main script. This allows the function to
be reused in scenarios where the test needs to continue execution
after validation, rather than terminating immediately.
The validate_result() function should focus on validation logic
only, while the calling script maintains control over program
flow and exit conditions. This change improves code modularity
and prepares for potential future enhancements where multiple
validations might be needed in a single test run.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250609-netcons_ext-v3-3-5336fa670326@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
nolibc only supports symbol-based stackprotectors, based on the global
variable __stack_chk_guard. Support for this differs between
architectures and toolchains. Some use the symbol mode by default, some
require a flag to enable it and some don't support it at all.
Before the nolibc test Makefile required the availability of
"-mstack-protector-guard=global" to enable stackprotectors.
While this flag makes sure that the correct mode is available it doesn't
work where the correct mode is the only supported one and therefore the
flag is not implemented.
Switch to a more dynamic probing mechanism.
This correctly enables stack protectors for mips, loongarch and m68k.
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20250609-nolibc-stackprotector-robust-v1-1-a1cfc92a568a@weissschuh.net
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
This is based on the gadget from the description of commit 9183671af6db
("bpf: Fix leakage under speculation on mispredicted branches").
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250603212814.338867-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This implements the core of the series and causes the verifier to fall
back to mitigating Spectre v1 using speculation barriers. The approach
was presented at LPC'24 [1] and RAID'24 [2].
If we find any forbidden behavior on a speculative path, we insert a
nospec (e.g., lfence speculation barrier on x86) before the instruction
and stop verifying the path. While verifying a speculative path, we can
furthermore stop verification of that path whenever we encounter a
nospec instruction.
A minimal example program would look as follows:
A = true
B = true
if A goto e
f()
if B goto e
unsafe()
e: exit
There are the following speculative and non-speculative paths
(`cur->speculative` and `speculative` referring to the value of the
push_stack() parameters):
- A = true
- B = true
- if A goto e
- A && !cur->speculative && !speculative
- exit
- !A && !cur->speculative && speculative
- f()
- if B goto e
- B && cur->speculative && !speculative
- exit
- !B && cur->speculative && speculative
- unsafe()
If f() contains any unsafe behavior under Spectre v1 and the unsafe
behavior matches `state->speculative &&
error_recoverable_with_nospec(err)`, do_check() will now add a nospec
before f() instead of rejecting the program:
A = true
B = true
if A goto e
nospec
f()
if B goto e
unsafe()
e: exit
Alternatively, the algorithm also takes advantage of nospec instructions
inserted for other reasons (e.g., Spectre v4). Taking the program above
as an example, speculative path exploration can stop before f() if a
nospec was inserted there because of Spectre v4 sanitization.
In this example, all instructions after the nospec are dead code (and
with the nospec they are also dead code speculatively).
For this, it relies on the fact that speculation barriers generally
prevent all later instructions from executing if the speculation was not
correct:
* On Intel x86_64, lfence acts as full speculation barrier, not only as
a load fence [3]:
An LFENCE instruction or a serializing instruction will ensure that
no later instructions execute, even speculatively, until all prior
instructions complete locally. [...] Inserting an LFENCE instruction
after a bounds check prevents later operations from executing before
the bound check completes.
This was experimentally confirmed in [4].
* On AMD x86_64, lfence is dispatch-serializing [5] (requires MSR
C001_1029[1] to be set if the MSR is supported, this happens in
init_amd()). AMD further specifies "A dispatch serializing instruction
forces the processor to retire the serializing instruction and all
previous instructions before the next instruction is executed" [8]. As
dispatch is not specific to memory loads or branches, lfence therefore
also affects all instructions there. Also, if retiring a branch means
it's PC change becomes architectural (should be), this means any
"wrong" speculation is aborted as required for this series.
* ARM's SB speculation barrier instruction also affects "any instruction
that appears later in the program order than the barrier" [6].
* PowerPC's barrier also affects all subsequent instructions [7]:
[...] executing an ori R31,R31,0 instruction ensures that all
instructions preceding the ori R31,R31,0 instruction have completed
before the ori R31,R31,0 instruction completes, and that no
subsequent instructions are initiated, even out-of-order, until
after the ori R31,R31,0 instruction completes. The ori R31,R31,0
instruction may complete before storage accesses associated with
instructions preceding the ori R31,R31,0 instruction have been
performed
Regarding the example, this implies that `if B goto e` will not execute
before `if A goto e` completes. Once `if A goto e` completes, the CPU
should find that the speculation was wrong and continue with `exit`.
If there is any other path that leads to `if B goto e` (and therefore
`unsafe()`) without going through `if A goto e`, then a nospec will
still be needed there. However, this patch assumes this other path will
be explored separately and therefore be discovered by the verifier even
if the exploration discussed here stops at the nospec.
This patch furthermore has the unfortunate consequence that Spectre v1
mitigations now only support architectures which implement BPF_NOSPEC.
Before this commit, Spectre v1 mitigations prevented exploits by
rejecting the programs on all architectures. Because some JITs do not
implement BPF_NOSPEC, this patch therefore may regress unpriv BPF's
security to a limited extent:
* The regression is limited to systems vulnerable to Spectre v1, have
unprivileged BPF enabled, and do NOT emit insns for BPF_NOSPEC. The
latter is not the case for x86 64- and 32-bit, arm64, and powerpc
64-bit and they are therefore not affected by the regression.
According to commit a6f6a95f2580 ("LoongArch, bpf: Fix jit to skip
speculation barrier opcode"), LoongArch is not vulnerable to Spectre
v1 and therefore also not affected by the regression.
* To the best of my knowledge this regression may therefore only affect
MIPS. This is deemed acceptable because unpriv BPF is still disabled
there by default. As stated in a previous commit, BPF_NOSPEC could be
implemented for MIPS based on GCC's speculation_barrier
implementation.
* It is unclear which other architectures (besides x86 64- and 32-bit,
ARM64, PowerPC 64-bit, LoongArch, and MIPS) supported by the kernel
are vulnerable to Spectre v1. Also, it is not clear if barriers are
available on these architectures. Implementing BPF_NOSPEC on these
architectures therefore is non-trivial. Searching GCC and the kernel
for speculation barrier implementations for these architectures
yielded no result.
* If any of those regressed systems is also vulnerable to Spectre v4,
the system was already vulnerable to Spectre v4 attacks based on
unpriv BPF before this patch and the impact is therefore further
limited.
As an alternative to regressing security, one could still reject
programs if the architecture does not emit BPF_NOSPEC (e.g., by removing
the empty BPF_NOSPEC-case from all JITs except for LoongArch where it
appears justified). However, this will cause rejections on these archs
that are likely unfounded in the vast majority of cases.
In the tests, some are now successful where we previously had a
false-positive (i.e., rejection). Change them to reflect where the
nospec should be inserted (using __xlated_unpriv) and modify the error
message if the nospec is able to mitigate a problem that previously
shadowed another problem (in that case __xlated_unpriv does not work,
therefore just add a comment).
Define SPEC_V1 to avoid duplicating this ifdef whenever we check for
nospec insns using __xlated_unpriv, define it here once. This also
improves readability. PowerPC can probably also be added here. However,
omit it for now because the BPF CI currently does not include a test.
Limit it to EPERM, EACCES, and EINVAL (and not everything except for
EFAULT and ENOMEM) as it already has the desired effect for most
real-world programs. Briefly went through all the occurrences of EPERM,
EINVAL, and EACCESS in verifier.c to validate that catching them like
this makes sense.
Thanks to Dustin for their help in checking the vendor documentation.
[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html
("Managed Runtime Speculative Execution Side Channel Mitigations")
[4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a
tool to analyze speculative execution attacks and mitigations" -
Section 4.6 "Stopping Speculative Execution")
[5] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf
("White Paper - SOFTWARE TECHNIQUES FOR MANAGING SPECULATION ON AMD
PROCESSORS - REVISION 5.09.23")
[6] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/SB--Speculation-Barrier-
("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set
Architecture (2020-12)")
[7] https://wiki.raptorcs.com/w/images/5/5f/OPF_PowerISA_v3.1C.pdf
("Power ISA™ - Version 3.1C - May 26, 2024 - Section 9.2.1 of Book
III")
[8] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
("AMD64 Architecture Programmer’s Manual Volumes 1–5 - Revision 4.08
- April 2024 - 7.6.4 Serializing Instructions")
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Dustin Nguyen <nguyen@cs.fau.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603212428.338473-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Adding tests for getting cookie with fill_link_info for tracing.
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-2-chen.dylane@linux.dev
|
|
A test requires the following to happen:
* CONST_PTR_TO_MAP value is checked for null
* the code in the null branch fails verification
Add test cases:
* direct global map_ptr comparison to null
* lookup inner map, then two checks (the first transforms
map_value_or_null into map_ptr)
* lookup inner map, spill-fill it, then check for null
* use an array of ringbufs to recreate a common coding pattern [1]
[1] https://lore.kernel.org/bpf/CAEf4BzZNU0gX_sQ8k8JaLe1e+Veth3Rk=4x7MDhv=hQxvO8EDw@mail.gmail.com/
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-4-isolodrai@meta.com
|
|
Add a test for CONST_PTR_TO_MAP comparison with a non-0 constant. A
BPF program with this code must not pass verification in unpriv.
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-3-isolodrai@meta.com
|
|
When reg->type is CONST_PTR_TO_MAP, it can not be null. However the
verifier explores the branches under rX == 0 in check_cond_jmp_op()
even if reg->type is CONST_PTR_TO_MAP, because it was not checked for
in reg_not_null().
Fix this by adding CONST_PTR_TO_MAP to the set of types that are
considered non nullable in reg_not_null().
An old "unpriv: cmp map pointer with zero" selftest fails with this
change, because now early out correctly triggers in
check_cond_jmp_op(), making the verification to pass.
In practice verifier may allow pointer to null comparison in unpriv,
since in many cases the relevant branch and comparison op are removed
as dead code. So change the expected test result to __success_unpriv.
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-2-isolodrai@meta.com
|
|
Two tests are added:
- cgroup_mprog_opts, which mimics tc_opts.c ([1]). Both prog and link
attach are tested. Some negative tests are also included.
- cgroup_mprog_ordering, which actually runs the program with some mprog
API flags.
[1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/tc_opts.c
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163156.2429955-1-yonghong.song@linux.dev
|
|
Move static inline functions id_from_prog_fd() and id_from_link_fd()
from prog_tests/tc_helpers.h to test_progs.h so these two functions
can be reused for later cgroup mprog selftests.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163151.2429325-1-yonghong.song@linux.dev
|
|
When FRED is enabled, if the Trap Flag (TF) is set without an external
debugger attached, it can lead to an infinite loop in the SIGTRAP
handler. To avoid this, the software event flag in the augmented SS
must be cleared, ensuring that no single-step trap remains pending when
ERETU completes.
This test checks for that specific scenario—verifying whether the kernel
correctly prevents an infinite SIGTRAP loop in this edge case when FRED
is enabled.
The test should _always_ pass with IDT event delivery, thus no need to
disable the test even when FRED is not enabled.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250609084054.2083189-3-xin%40zytor.com
|