Add selftests for cgroup1 hierarchy.
The results are as follows:
$ tools/testing/selftests/bpf/test_progs --name=cgroup1_hierarchy
#36/1 cgroup1_hierarchy/test_cgroup1_hierarchy:OK
#36/2 cgroup1_hierarchy/test_root_cgid:OK
#36/3 cgroup1_hierarchy/test_invalid_level:OK
#36/4 cgroup1_hierarchy/test_invalid_cgid:OK
#36/5 cgroup1_hierarchy/test_invalid_hid:OK
#36/6 cgroup1_hierarchy/test_invalid_cgrp_name:OK
#36/7 cgroup1_hierarchy/test_invalid_cgrp_name2:OK
#36/8 cgroup1_hierarchy/test_sleepable_prog:OK
#36 cgroup1_hierarchy:OK
Summary: 1/8 PASSED, 0 SKIPPED, 0 FAILED
Besides, I also ran some stress tests similar to those in patch #2 of this
series, as follows (with CONFIG_PROVE_RCU_LIST enabled):
- Continuously mounting and unmounting named cgroups in some tasks,
for example:
cgrp_name=$1
while true
do
mount -t cgroup -o none,name=$cgrp_name none /$cgrp_name
umount /$cgrp_name
done
- Continuously running this selftest concurrently:
while true; do ./test_progs --name=cgroup1_hierarchy; done
They all ran successfully without any RCU warnings in dmesg.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20231111090034.4248-7-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
A new cgroup helper function, get_cgroup1_hierarchy_id(), has been
introduced to obtain the ID of a cgroup1 hierarchy based on the provided
cgroup name. This cgroup name can be obtained from the /proc/self/cgroup
file.
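A minimal sketch of the parsing such a helper implies, assuming
/proc/self/cgroup lines of the form "<hierarchy-id>:<controllers>:<path>"
(names and error handling are illustrative, not the real implementation in
cgroup_helpers.c):
#include <stdio.h>
#include <string.h>

int get_cgroup1_hierarchy_id(const char *cgrp_name)
{
	char line[256], expected[64];
	FILE *f = fopen("/proc/self/cgroup", "r");
	int id, found = -1;

	if (!f)
		return -1;
	snprintf(expected, sizeof(expected), "name=%s", cgrp_name);
	while (fgets(line, sizeof(line), f)) {
		char subsys[128];

		/* e.g. "5:name=mycgrp:/" for a named cgroup1 hierarchy */
		if (sscanf(line, "%d:%127[^:]:", &id, subsys) != 2)
			continue;
		if (!strcmp(subsys, expected)) {
			found = id;
			break;
		}
	}
	fclose(f);
	return found;
}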
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20231111090034.4248-6-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce a new helper function to retrieve the cgroup ID from a net_cls
cgroup directory.
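A hedged sketch of how a cgroup ID can be read from a cgroup directory:
the ID is the inode number that name_to_handle_at() encodes in the file
handle (the helper name and error handling here are illustrative):
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

unsigned long long get_cgroup_id_from_dir(const char *path)
{
	unsigned long long cgid = 0;
	struct file_handle *fh;
	int mount_id;

	/* cgroup file handles carry the 64-bit cgroup ID as their payload */
	fh = malloc(sizeof(*fh) + sizeof(cgid));
	if (!fh)
		return 0;
	fh->handle_bytes = sizeof(cgid);
	if (!name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0))
		memcpy(&cgid, fh->f_handle, sizeof(cgid));
	free(fh);
	return cgid;
}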
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20231111090034.4248-5-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Include the current pid in the classid cgroup path. This way, different
testers relying on classid-based configurations will have distinct classid
cgroup directories, enabling them to run concurrently. Additionally, we
leverage the current pid as the classid, ensuring unique identification.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20231111090034.4248-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
If the net_cls subsystem is already mounted, attempting to mount it again
in setup_classid_environment() will result in a failure with the error code
EBUSY. Despite this, tmpfs will have been successfully mounted at
/sys/fs/cgroup/net_cls. Consequently, the /sys/fs/cgroup/net_cls directory
will be empty, causing subsequent setup operations to fail.
Here's an error log excerpt illustrating the issue when net_cls has already
been mounted at /sys/fs/cgroup/net_cls prior to running
setup_classid_environment():
- Before that change
$ tools/testing/selftests/bpf/test_progs --name=cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
(cgroup_helpers.c:248: errno: No such file or directory) Opening Cgroup Procs: /sys/fs/cgroup/net_cls/cgroup.procs
(cgroup_helpers.c:540: errno: No such file or directory) Opening cgroup classid: /sys/fs/cgroup/net_cls/cgroup-test-work-dir/net_cls.classid
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
(cgroup_helpers.c:248: errno: No such file or directory) Opening Cgroup Procs: /sys/fs/cgroup/net_cls/cgroup-test-work-dir/cgroup.procs
run_test:FAIL:join_classid unexpected error: 1 (errno 2)
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 2)
(cgroup_helpers.c:248: errno: No such file or directory) Opening Cgroup Procs: /sys/fs/cgroup/net_cls/cgroup.procs
#44 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
- After that change
$ tools/testing/selftests/bpf/test_progs --name=cgroup_v1v2
#44 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
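A hedged sketch of the direction of the fix (the real change lives in
setup_classid_environment(); this helper, including its name, is
illustrative):
#include <errno.h>
#include <sys/mount.h>

/* Treat an already-mounted net_cls hierarchy as usable instead of
 * leaving an empty tmpfs directory behind. */
static int mount_net_cls(const char *path)
{
	if (!mount("net_cls", path, "cgroup", 0, "net_cls"))
		return 0;
	return errno == EBUSY ? 0 : -1;
}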
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20231111090034.4248-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce several new KVM uAPIs to ultimately create a guest-first memory
subsystem within KVM, a.k.a. guest_memfd. Guest-first memory allows KVM
to provide features, enhancements, and optimizations that are kludgy
or outright impossible to implement in a generic memory subsystem.
The core KVM ioctl() for guest_memfd is KVM_CREATE_GUEST_MEMFD, which,
similar to the generic memfd_create(), creates an anonymous file and
returns a file descriptor that refers to it. Again like "regular"
memfd files, guest_memfd files live in RAM, have volatile storage,
and are automatically released when the last reference is dropped.
The key difference from memfd files (and every other memory subsystem)
is that guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be resized.
guest_memfd files do however support PUNCH_HOLE, which can be used to
convert a guest memory area between the shared and guest-private states.
A second KVM ioctl(), KVM_SET_MEMORY_ATTRIBUTES, allows userspace to
specify attributes for a given page of guest memory. In the long term,
it will likely be extended to allow userspace to specify per-gfn RWX
protections, including allowing memory to be writable in the guest
without it also being writable in host userspace.
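A hedged sketch of the userspace flow for the two ioctls (struct layouts
follow the uAPI described here; consult <linux/kvm.h> for the
authoritative definitions):
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

static int create_private_range(int vm_fd, __u64 gpa, __u64 size)
{
	struct kvm_create_guest_memfd gmem = { .size = size };
	struct kvm_memory_attributes attrs = {
		.address = gpa,
		.size = size,
		.attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
	};
	int gmem_fd;

	/* anonymous, unmappable guest-first memory bound to this VM */
	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
	if (gmem_fd < 0)
		return -1;

	/* mark the GPA range as private */
	if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs)) {
		close(gmem_fd);
		return -1;
	}
	return gmem_fd;
}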
The immediate and driving use case for guest_memfd is Confidential
Computing (CoCo) VMs, specifically AMD's SEV-SNP, Intel's TDX, and KVM's
own pKVM.
For such use cases, being able to map memory into KVM guests without
requiring said memory to be mapped into the host is a hard requirement.
While SEV+ and TDX prevent untrusted software from reading guest private
data by encrypting guest memory, pKVM provides confidentiality and
integrity *without* relying on memory encryption. In addition, with
SEV-SNP and especially TDX, accessing guest private memory can be fatal
to the host, i.e. KVM must prevent host userspace from accessing
guest memory irrespective of hardware behavior.
Long term, guest_memfd may be useful for use cases beyond CoCo VMs,
for example hardening userspace against unintentional accesses to guest
memory. As mentioned earlier, KVM's ABI uses userspace VMA protections to
define the allowed guest protections (with an exception granted for mapping
guest memory executable), and similarly KVM currently requires the guest
mapping size to be a strict subset of the host userspace mapping size.
Decoupling the mapping sizes would allow userspace to precisely map
only what is needed and with the required permissions, without impacting
guest performance.
A guest-first memory subsystem also provides clearer line of sight to
things like a dedicated memory pool (for slice-of-hardware VMs) and
elimination of "struct page" (for offload setups where userspace _never_
needs to DMA from or into guest memory).
guest_memfd is the result of 3+ years of development and exploration;
taking on memory management responsibilities in KVM was not the first,
second, or even third choice for supporting CoCo VMs. But after many
failed attempts to avoid KVM-specific backing memory, and looking at
where things ended up, it is quite clear that of all approaches tried,
guest_memfd is the simplest, most robust, and most extensible, and the
right thing to do for KVM and the kernel at-large.
The "development cycle" for this version is going to be very short;
ideally, next week I will merge it as is in kvm/next, taking this through
the KVM tree for 6.8 immediately after the end of the merge window.
The series is still based on 6.6 (plus KVM changes for 6.7) so it
will require a small fixup for changes to get_file_rcu() introduced in
6.7 by commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU").
The fixup will be done as part of the merge commit, and most of the text
above will become the commit message for the merge.
Pending post-merge work includes:
- hugepage support
- looking into using the restrictedmem framework for guest memory
- introducing a testing mechanism to poison memory, possibly using
the same memory attributes introduced here
- SNP and TDX support
There are two non-KVM patches buried in the middle of this series:
fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
The first is small and mostly suggested-by Christian Brauner; the second
is a bit less so, but it was written by an mm person (Vlastimil Babka).
|
|
Add a subtest to set_memory_region_test to verify that KVM rejects invalid
flags and combinations with -EINVAL. KVM might or might not fail with
EINVAL anyways, but we can at least try.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231031002049.3915752-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
"Testing private access when memslot gets deleted" tests the behavior
of KVM when a private memslot gets deleted while the VM is using the
private memslot. When KVM looks up the deleted (slot = NULL) memslot,
KVM should exit to userspace with KVM_EXIT_MEMORY_FAULT.
In the second test, upon a private access to non-private memslot, KVM
should also exit to userspace with KVM_EXIT_MEMORY_FAULT.
Intentionally don't take a requirement on KVM_CAP_GUEST_MEMFD,
KVM_CAP_MEMORY_FAULT_INFO, KVM_MEMORY_ATTRIBUTE_PRIVATE, etc., as it's a
KVM bug to advertise KVM_X86_SW_PROTECTED_VM without its prerequisites.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
[sean: call out the similarities with set_memory_region_test]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-36-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a selftest to verify the basic functionality of guest_memfd():
+ file descriptor created with the guest_memfd() ioctl does not allow
read/write/mmap operations
+ file size and block size as returned from fstat are as expected
+ fallocate(FALLOC_FL_PUNCH_HOLE) on the fd requires the offset/length
to be page aligned
+ invalid inputs (misaligned size, invalid flags) are rejected
+ file size and inode are unique (the innocuous-sounding
anon_inode_getfile() backs all files with a single inode...)
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-35-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Expand set_memory_region_test to exercise various positive and negative
testcases for private memory.
- Non-guest_memfd() file descriptor for private memory
- guest_memfd() from different VM
- Overlapping bindings
- Unaligned bindings
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
[sean: trim the testcases to remove duplicate coverage]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-34-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add helpers to invoke KVM_SET_USER_MEMORY_REGION2 directly so that tests
can validate features that are unique to "version 2" of "set user
memory region", e.g. do negative testing on gmem_fd and gmem_offset.
Provide a raw version as well as an assert-success version to reduce
the amount of boilerplate code needed for basic usage.
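A hedged sketch of what a direct invocation looks like (field and flag
names follow this series; double-check the uAPI header):
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

static int set_region2(int vm_fd, uint32_t slot, uint32_t flags,
		       uint64_t gpa, uint64_t size, void *hva,
		       uint32_t gmem_fd, uint64_t gmem_offset)
{
	struct kvm_userspace_memory_region2 region = {
		.slot = slot,
		.flags = flags,		/* e.g. KVM_MEM_PRIVATE */
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = (uintptr_t)hva,
		.guest_memfd = gmem_fd,
		.guest_memfd_offset = gmem_offset,
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}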
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-33-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a selftest to exercise implicit/explicit conversion functionality
within KVM and verify:
- Shared memory is visible to host userspace
- Private memory is not visible to host userspace
- Host userspace and guest can communicate over shared memory
- Data in shared backing is preserved across conversions (test's
host userspace doesn't free the data)
- Private memory is bound to the lifetime of the VM
Ideally, KVM's selftests infrastructure would be reworked to allow backing
a single region of guest memory with multiple memslots for _all_ backing
types and shapes, i.e. ideally the code for using a single backing fd
across multiple memslots would work for "regular" memory as well. But
sadly, support for KVM_CREATE_GUEST_MEMFD has languished for far too long,
and overhauling selftests' memslots infrastructure would likely open a can
of worms, i.e. delay things even further.
In addition to the more obvious tests, verify that PUNCH_HOLE actually
frees memory. Directly verifying that KVM frees memory is impractical, if
it's even possible, so instead indirectly verify memory is freed by
asserting that the guest reads zeroes after a PUNCH_HOLE. E.g. if KVM
zaps SPTEs but doesn't actually punch a hole in the inode, the subsequent
read will still see the previous value. And obviously punching a hole
shouldn't cause explosions.
Let the user specify the number of memslots in the private mem conversion
test, i.e. don't require the number of memslots to be '1' or "nr_vcpus".
Creating more memslots than vCPUs is particularly interesting, e.g. it can
result in a single KVM_SET_MEMORY_ATTRIBUTES spanning multiple memslots.
To keep the math reasonable, align each vCPU's chunk to at least 2MiB (the
size is 2MiB+4KiB), and require the total size to be cleanly divisible by
the number of memslots. The goal is to be able to validate that KVM plays
nice with multiple memslots; being able to create a truly arbitrary number
of memslots doesn't add meaningful value, i.e. isn't worth the cost.
Intentionally don't take a requirement on KVM_CAP_GUEST_MEMFD,
KVM_CAP_MEMORY_FAULT_INFO, KVM_MEMORY_ATTRIBUTE_PRIVATE, etc., as it's a
KVM bug to advertise KVM_X86_SW_PROTECTED_VM without its prerequisites.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-32-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add GUEST_SYNC[1-6]() so that tests can pass the maximum amount of
information supported via ucall(), without needing to resort to shared
memory.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-31-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a "vm_shape" structure to encapsulate the selftests-defined "mode",
along with the KVM-defined "type" for use when creating a new VM. "mode"
tracks physical and virtual address properties, as well as the preferred
backing memory type, while "type" corresponds to the VM type.
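A hedged sketch of the shape (see the selftests' kvm_util.h for the real
definition):
/* encapsulates how to create a VM: selftests "mode" + KVM "type" */
struct vm_shape {
	enum vm_guest_mode mode;	/* PA/VA bits, backing memory type */
	unsigned int type;		/* KVM VM type, e.g. KVM_X86_SW_PROTECTED_VM */
};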
Taking the VM type will allow adding tests for KVM_CREATE_GUEST_MEMFD
without needing an entirely separate set of helpers. At this time,
guest_memfd is effectively usable only by confidential VM types in the
form of guest private memory, and it's expected that x86 will double down
and require unique VM types for TDX and SNP guests.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-30-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add helpers for x86 guests to invoke the KVM_HC_MAP_GPA_RANGE hypercall,
which KVM will forward to userspace and thus can be used by tests to
coordinate private<=>shared conversions between host userspace code and
guest code.
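A hedged sketch of a guest-side request (KVM_HC_MAP_GPA_RANGE and the
flag name come from <linux/kvm_para.h>; kvm_hypercall() is assumed to be
the selftests' x86 hypercall wrapper):
#include <stdint.h>
#include <linux/kvm_para.h>

/* assumed selftests wrapper around vmcall/vmmcall */
uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1,
		       uint64_t a2, uint64_t a3);

static uint64_t guest_map_private(uint64_t gpa, uint64_t npages)
{
	/* ask the host to convert [gpa, gpa + npages * 4KiB) to private */
	return kvm_hypercall(KVM_HC_MAP_GPA_RANGE, gpa, npages,
			     KVM_MAP_GPA_RANGE_ENCRYPTED, 0);
}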
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
[sean: drop shared/private helpers (let tests specify flags)]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-29-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add helpers to convert memory between private and shared via KVM's
memory attributes, as well as helpers to free/allocate guest_memfd memory
via fallocate(). Userspace, i.e. tests, is NOT required to do fallocate()
when converting memory, as the attributes are the single source of truth.
Provide allocate() helpers so that tests can mimic a userspace that frees
private memory on conversion, e.g. to prioritize memory usage over
performance.
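For reference, freeing guest_memfd backing boils down to a hole punch; a
sketch of what such a free() helper wraps (the function name is
illustrative):
#define _GNU_SOURCE
#include <fcntl.h>

/* free the backing pages for a range; memory attributes, not fallocate
 * state, remain the source of truth for private vs. shared */
static int gmem_punch_hole(int gmem_fd, off_t offset, off_t len)
{
	return fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}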
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-28-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add support for creating "private" memslots via KVM_CREATE_GUEST_MEMFD and
KVM_SET_USER_MEMORY_REGION2. Make vm_userspace_mem_region_add() a wrapper
to its effective replacement, vm_mem_add(), so that private memslots are
fully opt-in, i.e. don't require updating all tests that add memory regions.
Pivot on the KVM_MEM_PRIVATE flag instead of the validity of the "gmem"
file descriptor so that simple tests can let vm_mem_add() do the heavy
lifting of creating the guest memfd, but also allow the caller to pass in
an explicit fd+offset so that fancier tests can do things like back
multiple memslots with a single file. If the caller passes in a fd, dup()
the fd so that (a) __vm_mem_region_delete() can close the fd associated
with the memory region without needing yet another flag, and (b) so that
the caller can safely close its copy of the fd without having to first
destroy memslots.
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-27-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use KVM_SET_USER_MEMORY_REGION2 throughout KVM's selftests library so that
support for guest private memory can be added without needing an entirely
separate set of helpers.
Note, this obviously makes selftests backwards-incompatible with older KVM
versions from this point forward.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-26-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop kvm_userspace_memory_region_find(), it's unused and a terrible API
(probably why it's unused). If anything outside of kvm_util.c needs to
get at the memslot, userspace_mem_region_find() can be exposed to give
others full access to all memory region/slot information.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-25-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This is a follow up to:
commit b8e3a87a627b ("bpf: Add crosstask check to __bpf_get_stack").
This test ensures that the task iterator only gets a single
user stack (for the current task).
Signed-off-by: Jordan Rome <linux@jordanrome.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20231112023010.144675-1-linux@jordanrome.com
|
|
When the kernel code has changed, the build may ask for configuration
input and hang. Prevent this and instead use the default settings.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Add selftests for the three system calls supporting the LSM
infrastructure. This set of tests is limited by the differences
in access policy enforced by the existing security modules.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Mickaël Salaün <mic@digikod.net>
Tested-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Wire up the lsm_get_self_attr, lsm_set_self_attr and lsm_list_modules
system calls.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-api@vger.kernel.org
Reviewed-by: Mickaël Salaün <mic@digikod.net>
[PM: forward ported beyond v6.6 due to merge window changes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
To make CPUs in an isolated cpuset partition closer in isolation to
the boot-time isolated CPUs specified in the "isolcpus" boot command
line option, we need to take those CPUs out of the workqueue unbound
cpumask so that work functions from the unbound workqueues won't run
on those CPUs. Otherwise, they will interfere with the user tasks running
on those isolated CPUs.
With the introduction of the workqueue_unbound_exclude_cpumask() helper
function in an earlier commit, those isolated CPUs can now be taken
out from the workqueue unbound cpumask.
This patch also updates cgroup-v2.rst to mention that isolated
CPUs will be excluded from the unbound workqueue cpumask, and updates
test_cpuset_prs.sh to verify the correctness of the new
cpuset.cpus.isolated file, if available via the cgroup_debug option.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Minor cleanup of test matrix and relocation of test_isolated() function
to prepare for the next patch. There is no functional change.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- support PREEMPT_DYNAMIC with static keys
- relax memory ordering for atomic operations
- support BPF CPU v4 instructions for LoongArch
- some build and runtime warning fixes
* tag 'loongarch-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
selftests/bpf: Enable cpu v4 tests for LoongArch
LoongArch: BPF: Support signed mod instructions
LoongArch: BPF: Support signed div instructions
LoongArch: BPF: Support 32-bit offset jmp instructions
LoongArch: BPF: Support unconditional bswap instructions
LoongArch: BPF: Support sign-extension mov instructions
LoongArch: BPF: Support sign-extension load instructions
LoongArch: Add more instruction opcodes and emit_* helpers
LoongArch/smp: Call rcutree_report_cpu_starting() earlier
LoongArch: Relax memory ordering for atomic operations
LoongArch: Mark __percpu functions as always inline
LoongArch: Disable module from accessing external data directly
LoongArch: Support PREEMPT_DYNAMIC with static keys
|
|
With the latest clang18 (main branch of the llvm-project repo), when building bpf selftests,
[~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
The following compilation error happens:
fatal error: error in backend: Branch target out of insn range
...
Stack dump:
0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
-I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
/home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
-c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
1. <eof> parser at end of file
2. Code generation
...
The compilation failure only happens for cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports 32-bit branch target offsets.
The above failure is due to upstream llvm patch [1], which changed some
inlining behavior in clang18.
Previously, all 180 loop iterations were fully unrolled. To work around
the issue, the bpf macro __BPF_CPU_VERSION__ (implemented in clang18
recently) is used to avoid changing the unrolling if cpu=v4. If
__BPF_CPU_VERSION__ is not available and the compiler is clang18, the
unrolling amount is unconditionally reduced.
[1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a566eebb16e
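A hedged sketch of the kind of guard described (the macro name and the
reduced iteration count are illustrative, not the exact values in
pyperf180.c):
#if defined(__BPF_CPU_VERSION__) && __BPF_CPU_VERSION__ >= 4
#define STACK_MAX_LEN 180	/* cpu=v4 has 32-bit branch offsets */
#elif defined(__clang__) && __clang_major__ >= 18
#define STACK_MAX_LEN 90	/* clang18 without the macro: reduce unrolling */
#else
#define STACK_MAX_LEN 180
#endif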
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
|
|
Some small fixes:
- Let's make sure we are not adding ipv4 addresses in the ipv6 section of
the keyfile and vice versa.
- ADDR_FAMILY_IPV6 is a bit in addr_family. Test that bit instead of
checking the whole value of addr_family.
- Some trivial fixes in hv_set_ifconfig.sh.
These fixes are proposed after doing some internal testing at Red Hat.
CC: Shradha Gupta <shradhagupta@linux.microsoft.com>
CC: Saurabh Sengar <ssengar@linux.microsoft.com>
Fixes: 42999c904612 ("hv/hv_kvp_daemon:Support for keyfile based connection profile")
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Shradha Gupta <Shradhagupta@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20231016133122.2419537-1-anisinha@redhat.com>
|
|
Currently get_perf_callchain only supports user stack walking for
the current task. With the correct *crosstask* param passed, it returns
0 frames if the task passed to __bpf_get_stack isn't the current one,
instead of a single incorrect frame/address. This change passes the
correct *crosstask* param and also adds a preemptive check in
__bpf_get_stack that returns -EOPNOTSUPP if the task is not current.
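A hedged model of the added check (approximate; not the exact code in
kernel/bpf/stackmap.c):
	/* in __bpf_get_stack(), before walking the stack */
	bool crosstask = task && task != current;

	if (user && crosstask)
		return -EOPNOTSUPP;	/* user stacks: current task only */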
This issue was found using bpf_get_task_stack inside a BPF
iterator ("iter/task"), which iterates over all tasks.
bpf_get_task_stack works fine for fetching kernel stacks
but because get_perf_callchain relies on the caller to know
if the requested *task* is the current one (via *crosstask*)
it was failing in a confusing way.
It might be possible to get user stacks for all tasks utilizing
something like access_process_vm but that requires the bpf
program calling bpf_get_task_stack to be sleepable and would
therefore be a breaking change.
Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
Signed-off-by: Jordan Rome <jordalgo@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231108112334.3433136-1-jordalgo@meta.com
|
|
Now that there is a feature check for dwarf_getcfi(), use it and convert
the code to check the HAVE_DWARF_CFI_SUPPORT definition.
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
dwarf_getcfi() is available in libdw 0.142+. Instead of just
checking the version number, it'd be nice to have a config item to check
for the feature at build time.
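A hedged sketch of such a build-time feature test (the real probe would
live under tools/build/feature/; only successful compilation and linking
matter):
#include <elfutils/libdw.h>
#include <stddef.h>

int main(void)
{
	Dwarf *dwarf = NULL;

	return dwarf_getcfi(dwarf) == NULL;
}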
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
die_find_variable_by_reg() searches for a variable or a parameter
sub-DIE in the given scope DIE whose location matches the given
register.
For the simplest and most common case, memory access usually happens
with a base register and an offset to the field, so the register holds a
pointer to a variable or function parameter. Then we can find one if it
has a matching location expression at the (instruction) address. This
function only handles such a simple case for now.
In this case, the expression has a DW_OP_regN operation where N < 32.
If the register index (N) is greater than or equal to 32, a DW_OP_regx
operation with an operand holding the value of N is used instead.
Expressions with more operations are rejected.
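A hedged sketch of the matching logic using libdw (illustrative; the real
implementation is in dwarf-aux.c):
#include <elfutils/libdw.h>
#include <dwarf.h>
#include <stdbool.h>

/* does this DIE's location at 'addr' reduce to "lives in register reg"? */
static bool die_location_is_reg(Dwarf_Die *die, Dwarf_Addr addr, int reg)
{
	Dwarf_Attribute attr;
	Dwarf_Op *ops;
	size_t nops;

	if (dwarf_attr(die, DW_AT_location, &attr) == NULL)
		return false;
	if (dwarf_getlocation_addr(&attr, addr, &ops, &nops, 1) <= 0 ||
	    nops != 1)
		return false;
	if (reg < 32)
		return ops[0].atom == DW_OP_reg0 + reg;	/* DW_OP_regN */
	return ops[0].atom == DW_OP_regx && ops[0].number == (Dwarf_Word)reg;
}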
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
die_get_scopes() returns the number of enclosing DIEs for the given
address and fills an array of DIEs, like dwarf_getscopes(). But it
doesn't follow the abstract origin of inlined functions, as we want
information about the concrete instance. This is needed to check the
location of parameters and local variables properly. Users can check
the origin separately if needed.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Move the conditional code to the header file.
It's a usual convention that conditional code is handled in a header
file. As I'm planning to add some more of it, let's move the current
code to the header first.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
die_get_typename() returns a C-like type name from a DWARF debug
entry, following the type attribute if the target entry is a pointer
type. But I found that void pointers don't have a type attribute to
follow, so the function returns an error in that case. This results in a
broken type string for void pointer types.
For example, the following type entries are pointer types.
<1><48c>: Abbrev Number: 4 (DW_TAG_pointer_type)
<48d> DW_AT_byte_size : 8
<48d> DW_AT_type : <0x481>
<1><491>: Abbrev Number: 211 (DW_TAG_pointer_type)
<493> DW_AT_byte_size : 8
<1><494>: Abbrev Number: 4 (DW_TAG_pointer_type)
<495> DW_AT_byte_size : 8
<495> DW_AT_type : <0x49e>
The first one at offset 48c and the third one at offset 494 have type
information, so they are pointer types for the referenced types. But
the second one at offset 491 doesn't have the type attribute.
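A hedged sketch of the check this implies (the actual fix is in
die_get_typename(); the helper name here is illustrative):
#include <elfutils/libdw.h>
#include <dwarf.h>
#include <stdbool.h>

/* a pointer DIE without DW_AT_type (like offset 491 above) points to void */
static bool die_is_void_pointer(Dwarf_Die *die)
{
	Dwarf_Attribute attr;

	return dwarf_tag(die) == DW_TAG_pointer_type &&
	       dwarf_attr_integrate(die, DW_AT_type, &attr) == NULL;
}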
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Split the debuginfo data structure and related functions into a separate
file so that they can be used by components other than the probe-finder.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Move two fields into the union in 'struct ins_operands'.
Those two fields are used only for the jump_ops, so move them into the
union to save some bytes. Also add a jump__delete() callback to avoid
freeing the fields, as they didn't allocate new strings.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: WANG Rui <wangrui@loongson.cn>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The "-l" option is to print line numbers in the objdump output. perf
annotate TUI only can show the line numbers later but it causes big slow
downs for the kernel binary.
Similarly, showing source code also takes a long time and it already has
an option to control it.
$ time objdump ... -d -S -C vmlinux > /dev/null
real 0m3.474s
user 0m3.047s
sys 0m0.428s
$ time objdump ... -d -l -C vmlinux > /dev/null
real 0m1.796s
user 0m1.459s
sys 0m0.338s
$ time objdump ... -d -C vmlinux > /dev/null
real 0m0.051s
user 0m0.036s
sys 0m0.016s
As it's not needed for data type profiling, let's make it conditional so
that it can skip the unnecessary work.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The x86 core PMU exposes the supported maximum precision level via the
max_precise PMU capability. Although the AMD core PMU does not support
precise mode, certain core PMU events with precise_ip > 0 are allowed
and forwarded to the IBS OP PMU.
Display a note about this in the 'perf report' header output and
document the details in the perf-list man page.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231107083331.901-2-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a few more simple cases to validate proper privileged vs unprivileged
loop detection behavior. conditional_loop2 is the one reported by Hao
Sun that triggered this set of fixes.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231110061412.2995786-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When a BPF program is verified in privileged mode, the BPF verifier allows
bounded loops. This means that from the CFG point of view there are
definitely some back-edges. The original commit adjusted check_cfg() logic
to not detect back-edges in the control flow graph if they result
from conditional jumps, with the idea that the subsequent full BPF
verification process will determine whether such loops are bounded or
not, and either accept or reject the BPF program. At least that's my
reading of the intent.
Unfortunately, the implementation of this idea doesn't work correctly in
all possible situations. A conditional jump might not result in an
immediate back-edge; just a few unconditional instructions later we can
arrive at a back-edge. In such situations check_cfg() would reject the BPF
program even in privileged mode, despite it possibly being a bounded loop.
The next patch adds a simple program demonstrating such a scenario.
To keep things simple, instead of trying to detect back edges in
privileged mode, just assume every back edge is valid and let subsequent
BPF verification prove or reject bounded loops.
Note a few test changes. For unknown reasons, we have a few tests that
are specified to detect a back-edge in privileged mode, but looking at
their code it seems like the right outcome is passing check_cfg() and
letting subsequent verification make a decision about bounded or not
bounded looping.
The bounded recursion case is also interesting. The example should pass,
as the recursion is limited to just a few levels and so we never reach the
maximum number of nested frames and never exhaust the maximum stack depth.
But the way the max stack depth logic works today, it falsely detects this
as exceeding the max nested frame count. This patch series doesn't attempt
to fix this orthogonal problem, so we just adjust the expected verifier
failure.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 2589726d12a1 ("bpf: introduce bounded loops")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231110061412.2995786-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a dedicated selftest that tries to set up conditions to have a state
with the same first and last instruction index, but which actually is a
loop 3->4->1->2->3. This confuses mark_chain_precision() if the verifier
doesn't take jump history into account.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231110002638.4168352-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
ldimm64 instructions are 16 bytes long and so have to be handled
appropriately in check_cfg(), just like the rest of the BPF verifier does.
This has implications in three places:
- when determining the next instruction for non-jump instructions;
- when determining the next instruction for callback address ldimm64
instructions (in visit_func_call_insn());
- when checking for unreachable instructions, where the second half of
an ldimm64 is expected to be unreachable.
We take this also as an opportunity to report jumps into the middle of
an ldimm64. And adjust a few test_verifier tests accordingly.
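A hedged sketch of the successor computation (illustrative; check_cfg()
does the equivalent internally):
#include <linux/bpf.h>

/* the fallthrough successor must skip the pseudo second half of a
 * 16-byte ldimm64 (opcode BPF_LD | BPF_IMM | BPF_DW) */
static int next_insn(const struct bpf_insn *insns, int i)
{
	return i + (insns[i].code == (BPF_LD | BPF_IMM | BPF_DW) ? 2 : 1);
}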
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Fixes: 475fb78fbf48 ("bpf: verifier (add branch/goto checks)")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231110002638.4168352-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Cross-building selftests/bpf for the arm64 architecture, format-specifier
type errors show up like:
xskxceiver.c:912:34: error: format specifies type 'int' but the argument
has type '__u64' (aka 'unsigned long long') [-Werror,-Wformat]
ksft_print_msg("[%s] expected meta_count [%d], got meta_count [%d]\n",
~~
%llu
__func__, pkt->pkt_nb, meta->count);
^~~~~~~~~~~
xskxceiver.c:929:55: error: format specifies type 'unsigned long long' but
the argument has type 'u64' (aka 'unsigned long') [-Werror,-Wformat]
ksft_print_msg("Frag invalid addr: %llx len: %u\n", addr, len);
~~~~ ^~~~
Fix the issues by casting to (unsigned long long) and changing the
specifiers from %d and %u to %llu, since with u64s it might be %llx
or %lx, depending on the architecture.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20231109174328.1774571-1-anders.roxell@linaro.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Martin and Vadim reported a verifier failure with bpf_dynptr usage.
The issue was mentioned, and Vadim worked around it with a source change
([1]). The below describes what the issue is and why there is a
verification failure.
int BPF_PROG(skb_crypto_setup) {
struct bpf_dynptr algo, key;
...
bpf_dynptr_from_mem(..., ..., 0, &algo);
...
}
The bpf program is using vmlinux.h, so we have the following definition in
vmlinux.h:
struct bpf_dynptr {
long: 64;
long: 64;
};
Note that in uapi header bpf.h, we have
struct bpf_dynptr {
long: 64;
long: 64;
} __attribute__((aligned(8)));
So we lost alignment information for struct bpf_dynptr by using vmlinux.h.
Let us take a look at a simple program below:
$ cat align.c
typedef unsigned long long __u64;
struct bpf_dynptr_no_align {
__u64 :64;
__u64 :64;
};
struct bpf_dynptr_yes_align {
__u64 :64;
__u64 :64;
} __attribute__((aligned(8)));
void bar(void *, void *);
int foo() {
struct bpf_dynptr_no_align a;
struct bpf_dynptr_yes_align b;
bar(&a, &b);
return 0;
}
$ clang --target=bpf -O2 -S -emit-llvm align.c
Look at the generated IR file align.ll:
...
%a = alloca %struct.bpf_dynptr_no_align, align 1
%b = alloca %struct.bpf_dynptr_yes_align, align 8
...
The compiler dictates that the alignment for struct bpf_dynptr_no_align
is 1 and the alignment for struct bpf_dynptr_yes_align is 8. So
theoretically the compiler could allocate variable %a with alignment 1,
although in reality it may choose a different alignment by considering
other local variables.
In [1], the verification failure happens because variable 'algo' is
allocated on the stack with alignment 4 (fp-28). But the verifier wants
its alignment to be 8.
To fix the issue, the RFC patch ([2]) tried to add '__attribute__((aligned(8)))'
to struct bpf_dynptr plus other similar structs. Andrii suggested that
we could directly modify uapi struct with named fields like struct 'bpf_iter_num':
struct bpf_iter_num {
/* opaque iterator state; having __u64 here allows to preserve correct
* alignment requirements in vmlinux.h, generated from BTF
*/
__u64 __opaque[1];
} __attribute__((aligned(8)));
Indeed, adding named fields for those affected structs in this patch can
preserve alignment when a bpf program references them in vmlinux.h. With
this patch, the verification failure in [1] can also be resolved.
[1] https://lore.kernel.org/bpf/1b100f73-7625-4c1f-3ae5-50ecf84d3ff0@linux.dev/
[2] https://lore.kernel.org/bpf/20231103055218.2395034-1-yonghong.song@linux.dev/
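The resulting uapi shape for struct bpf_dynptr is then (a sketch of the
change described above; see include/uapi/linux/bpf.h for the final form):
struct bpf_dynptr {
	/* opaque storage; the named __u64 field preserves the 8-byte
	 * alignment requirement in BTF-generated vmlinux.h */
	__u64 __opaque[2];
} __attribute__((aligned(8)));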
Cc: Vadim Fedorenko <vadfed@meta.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231104024900.1539182-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch demonstrates that verifier changes earlier in this series
result in bpf_refcount_acquire(mapval->stashed_kptr) passing
verification. The added test additionally validates that stashing a kptr
in mapval and - in a separate BPF program - refcount_acquiring the kptr
without unstashing works as expected at runtime.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20231107085639.3016113-7-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The test added in this patch exercises the logic fixed in the previous
patch in this series. Before the previous patch's changes,
bpf_refcount_acquire accepted MAYBE_NULL local kptrs; after the change
the verifier correctly rejects such a call.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20231107085639.3016113-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add the ability to filter the top N results, both in replay/verifier mode
and comparison mode. Just adding `-n10` will emit only the first 10 rows,
or fewer if there are not enough rows.
This is not just a shortcut instead of passing veristat output through
`head`, though. Filtering out all the other rows influences the final
table formatting, as table column widths are calculated based on the
actually emitted text.
To demonstrate the difference, compare the two "equivalent" forms below,
one using head and another using the -n argument.
TOP N FEATURE
=============
[vmuser@archvm bpf]$ sudo ./veristat -C ~/baseline-results-selftests.csv ~/sanity2-results-selftests.csv -e file,prog,insns,states -s '|insns_diff|' -n10
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
---------------------------------------- --------------------- --------- --------- ------------ ---------- ---------- -------------
test_seg6_loop.bpf.linked3.o __add_egr_x 12440 12360 -80 (-0.64%) 364 357 -7 (-1.92%)
async_stack_depth.bpf.linked3.o async_call_root_check 145 145 +0 (+0.00%) 3 3 +0 (+0.00%)
async_stack_depth.bpf.linked3.o pseudo_call_check 139 139 +0 (+0.00%) 3 3 +0 (+0.00%)
atomic_bounds.bpf.linked3.o sub 7 7 +0 (+0.00%) 0 0 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o kmalloc 5 5 +0 (+0.00%) 0 0 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o sched_process_fork 22 22 +0 (+0.00%) 2 2 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o socket_post_create 23 23 +0 (+0.00%) 2 2 +0 (+0.00%)
bind4_prog.bpf.linked3.o bind_v4_prog 358 358 +0 (+0.00%) 33 33 +0 (+0.00%)
bind6_prog.bpf.linked3.o bind_v6_prog 429 429 +0 (+0.00%) 37 37 +0 (+0.00%)
bind_perm.bpf.linked3.o bind_v4_prog 15 15 +0 (+0.00%) 1 1 +0 (+0.00%)
PIPING TO HEAD
==============
[vmuser@archvm bpf]$ sudo ./veristat -C ~/baseline-results-selftests.csv ~/sanity2-results-selftests.csv -e file,prog,insns,states -s '|insns_diff|' | head -n12
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
----------------------------------------------------- ---------------------------------------------------- --------- --------- ------------ ---------- ---------- -------------
test_seg6_loop.bpf.linked3.o __add_egr_x 12440 12360 -80 (-0.64%) 364 357 -7 (-1.92%)
async_stack_depth.bpf.linked3.o async_call_root_check 145 145 +0 (+0.00%) 3 3 +0 (+0.00%)
async_stack_depth.bpf.linked3.o pseudo_call_check 139 139 +0 (+0.00%) 3 3 +0 (+0.00%)
atomic_bounds.bpf.linked3.o sub 7 7 +0 (+0.00%) 0 0 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o kmalloc 5 5 +0 (+0.00%) 0 0 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o sched_process_fork 22 22 +0 (+0.00%) 2 2 +0 (+0.00%)
bench_local_storage_create.bpf.linked3.o socket_post_create 23 23 +0 (+0.00%) 2 2 +0 (+0.00%)
bind4_prog.bpf.linked3.o bind_v4_prog 358 358 +0 (+0.00%) 33 33 +0 (+0.00%)
bind6_prog.bpf.linked3.o bind_v6_prog 429 429 +0 (+0.00%) 37 37 +0 (+0.00%)
bind_perm.bpf.linked3.o bind_v4_prog 15 15 +0 (+0.00%) 1 1 +0 (+0.00%)
Note all the wasted whitespace in the "PIPING TO HEAD" variant.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231108051430.1830950-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add the ability to sort results by the absolute values of specified stats.
This is especially useful for finding the biggest deviations in comparison
mode. When comparing a verifier change's effect against a large base of
BPF object files, it's necessary to see big changes in both positive and
negative directions, as both might be a signal for regressions or bugs.
The syntax is natural, e.g., adding `-s '|insns_diff|'^` will instruct
veristat to sort by the absolute value of the instructions difference in
ascending order.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231108051430.1830950-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|