Age | Commit message (Collapse) | Author |
|
will not crash on any platforms
With sampling, --intr-regs option is used for capturing
interrupt regs. When --intr-regs option is used, PMU code
uses is_sier_available() function which uses PMU flags in
the code. In environment where platform specific PMU is
not registered, PMU flags is not defined. A fix was added
in kernel to address crash while accessing is_sier_available()
function when pmu is not set. commit f75e7d73bdf7 ("powerpc/perf:
Fix crash with is_sier_available when pmu is not set").
Add perf sampling test to exercise this code and make sure
enabling intr_regs shouldn't crash in any platform. Testcase
uses software event cycles since software event will work even
in cases without PMU.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-12-atrajeev@linux.vnet.ibm.com
|
|
not crash on any platforms
While enabling branch stack for an event, BHRB (Branch History
Rolling Buffer) filter is set using bhrb_filter_map() callback.
This callback is not defined for cases like generic_compat_pmu
or in case where there is no PMU registered. A fix was added
in kernel to address a crash issue observed while enabling branch
stack for environments which doesn't have this callback.
commit b460b512417a ("powerpc/perf: Fix crashes with
generic_compat_pmu & BHRB").
Add perf sampling test to exercise this code path and make
sure enabling branch stack shouldn't crash in any platform.
Testcase uses software event cycles since software event is
available and can be used even in cases without PMU.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-11-atrajeev@linux.vnet.ibm.com
|
|
array size/PVR
The platform check for selftest support "check_pvr_for_sampling_tests"
is specific to sampling tests which includes PVR check, presence of
PMU and extended regs support. Extended regs support is needed for
sampling tests which tests whether PMU registers are programmed
correctly. There could be other sampling tests which may not need
extended regs, example, bhrb filter tests which only needs validity
check via event open.
Hence refactor the platform check to have a common function
"platform_check_for_tests" that checks only for PVR check
and presence of PMU. The existing function
"check_pvr_for_sampling_tests" will invoke the common function
and also will include checks for extended regs specific for
sampling. The common function can also be used by tests other
than sampling like event code tests.
Add macro to find array size ("ARRAY_SIZE") to sampling
tests "misc.h" file. This can be used in next tests to
find event array size. Also update "include/reg.h" to
add macros to find minor and major version from PVR which
will be used in testcases.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-10-atrajeev@linux.vnet.ibm.com
|
|
Libbpf 1.0 stops support legacy-style BPF map definitions. Selftests has
been migrated away from using legacy BPF map definitions except for two
selftests, to make sure that legacy functionality still worked in
pre-1.0 libbpf. Now it's time to let those tests go as libbpf 1.0 is
imminent.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-14-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Remove deprecated xsk APIs from libbpf. But given we have selftests
relying on this, move those files (with minimal adjustments to make them
compilable) under selftests/bpf.
We also remove all the removed APIs from libbpf.map, while overall
keeping version inheritance chain, as most APIs are backwards
compatible so there is no need to reassign them as LIBBPF_1.0.0 versions.
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220627211527.2245459-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add error messages when the module test-drm_mm is not found or could
not be removed to make tests output more readable.
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Due to CreatePrimary commands which need to create RSA keys of
increasing size, the timeout value need to be raised, as well.
Default is 45s.
Fixed git am white space warns:
Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Johannes Holland <johannes.holland@infineon.com>
Signed-off-by: Stefan Mahnke-Hartmann <stefan.mahnke-hartmann@infineon.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
The testcase uses "instructions" event to generate the
samples and fetch Monitor Mode Control Register A (MMCRA)
when overflow. Branch History Rolling Buffer(bhrb) disable bit
is part of MMCRA which need to be verified by perf interface.
Testcase checks if the bhrb disable bit of MMCRA register is
programmed correctly via perf interface for ISA v3.1 platform
Also make get_mmcra_ifm return type as u64.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-9-atrajeev@linux.vnet.ibm.com
|
|
conditional branch type
The testcase uses "instructions" event to check if the
Instruction filtering mode(IFM) bits are programmed correctly
for conditional branch type. Testcase checks if IFM bits is
programmed correctly to Monitor Mode Control Register A (MMCRA)
via perf interface for ISA v3.1 platform.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-8-atrajeev@linux.vnet.ibm.com
|
|
type
The testcase uses "instructions" event to check if the
Instruction filtering mode(IFM) bits are programmed correctly
for type any branch. Testcase checks if IFM bits is
programmed correctly to Monitor Mode Control Register A (MMCRA)
via perf interface.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-7-atrajeev@linux.vnet.ibm.com
|
|
call type
The testcase uses "instructions" event to check if the
Instruction filtering mode(IFM) bits are programmed correctly
for indirect branch type. Testcase checks if IFM bits are
programmed correctly to Monitor Mode Control Register A (MMCRA)
via perf interface for ISA v3.1 platform.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-6-atrajeev@linux.vnet.ibm.com
|
|
Add support for sample type as PERF_SAMPLE_BRANCH_STACK in sampling
tests. This change is a precursor/helper for sampling testcases, that
test branck stack feature in perf interface.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-5-atrajeev@linux.vnet.ibm.com
|
|
The testcase uses event code 0x35340401e0 for load only sampling, to
verify the settings of thresh compare field in Monitor Mode Control
Register A (MMCRA: 9-18 bits for power9 and MMCRA: 8-18 bits for
power10). Testcase checks if the thresh compare field is programmed
correctly via perf interface to MMCRA register.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-4-atrajeev@linux.vnet.ibm.com
|
|
auxv to detect platform.
The /proc/self/auxv contains information about "platform" on any system.
Also "base platform" which is an indication about platform string
corresponding to the real PVR. When systems are booted in compat mode,
say, power10 booted in power9 mode, "platform" will point to power9
whereas base platform will point to power10. Incase, if the distro
doesn't support platform indicated by real PVR, base platform will have
a default value.
The mismatch of platform/base platform is an indication of system booted
in compat mode. In such cases, distro will have a Generic Compat
registered which supports basic features for performance monitoring.
Some of the selftest needs to be handled differently ( ex: generic
events, alternative events, bhrb filter map) in Generic Compat PMU.
Hence selftest framework needs utility functions to identify such cases.
One way is make sure of auxv information. Below condition can be used to
detect if Generic Compat PMU is registered. ie:
if ((AT_PLATFORM != AT_BASE_PLATFORM) && (AT_BASE_PLATFORM != PVR))
this indicates Generic Compat PMU.
Add utility function in "include/utils.h" to return:
AT_PLATFORM and AT_BASE_PLATFORM from auxv. Also update misc.c in
"sampling_tests" folder to add function to use above check to determine
presence of generic compat pmu.
In other architecture ( like x86 ), pmu_name is exposed via
"/sys/bus/event_source/devices/cpu/caps". The same could be used in
powerpc in future. Since currently we don't have the "caps" support in
powerpc, patch uses auxv information to detect platform type and compat
mode. But as placeholder utility function is added considering
possiblity of getting "caps" information via sysfs. If that doesn't
exist, fallback to using auxv information.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-3-atrajeev@linux.vnet.ibm.com
|
|
field
In power10, threshold compare field is not part of the raw event code
and provided via event attribute config1. Hence add the mask and shift
bits based on event attribute config1, to extract the threshold compare
value for power10
Also add a new function called get_thresh_cmp_val to compute and return
the threshold compare field for a given platform, since incase of
power10, threshold compare value provided is decimal.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610134113.62991-2-atrajeev@linux.vnet.ibm.com
|
|
When checking for libc rseq support in the library constructor, don't
only depend on the symbols presence, check that the registration was
completed.
This targets a scenario where the libc has rseq support but it is not
wired for the current architecture in 'bits/rseq.h', we want to fallback
to our internal registration mechanism.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/20220614154830.1367382-4-mjeanson@efficios.com
|
|
This header is also used in librseq where it can be included in C++
code, add a space between literals and string macros.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/20220614154830.1367382-3-mjeanson@efficios.com
|
|
Make the RISC-V rseq selftests compatible with glibc-2.35 by using the
rseq_get_abi() helper.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/20220614154830.1367382-2-mjeanson@efficios.com
|
|
Add tdc test cases to verify new flush behaviour is correct, which do
the following:
- Try to flush only one action which is being referenced by a filter
- Try to flush three actions where the last one (index 3) is being
referenced by a filter
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kernel is in lockdown mode when secureboot is enabled and hence
debugfs cannot be used. Add support for this and other general cases
where debugfs cannot be read and communicate the same to the user before
running tests.
Signed-off-by: Gautam <gautammenghani201@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Add a colon in the "Optional" test usage message to ensure consistency
with the "Default" test usage message.
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
In the install section of the main Makefile of kselftests, the echo
command is used with -n flag, which disables the printing of new line
due to which the output contains "\n" chars as follows:
Emit Tests for alsa\nSkipping non-existent dir: arm64
Emit Tests for breakpoints\nEmit Tests for capabilities\n
This patch fixes the above bug by using the -e flag.
Signed-off-by: Gautam <gautammenghani201@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Delete the redundant word 'in'.
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
|
|
|
|
Older machines don't have the firmware feature that enables the code
this test is testing. Skip the test if the sysfs directory doesn't
exist. Also use the FAIL_IF() macro to provide more verbose error
reporting if an error is encountered.
Fixes: 57201d657eb7 ("selftest/powerpc: Add PAPR sysfs attributes sniff test")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220619233103.2666171-1-mpe@ellerman.id.au
|
|
when we modfying kernel, commit it to our environment building. we find a error
that is "tools/testing/selftests/tc-testing/plugins" failed: No such file or directory"
we find plugins directory is ignored in
"tools/testing/selftests/tc-testing/.gitignore", but the plugins directory
is need in "tools/testing/selftests/tc-testing/Makefile"
Signed-off-by: liujing <liujing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20220622121237.5832-1-liujing@cmss.chinamobile.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix a regression with pKVM when kmemleak is enabled
- Add Oliver Upton as an official KVM/arm64 reviewer
selftests:
- deal with compiler optimizations around hypervisor exits
x86:
- MAINTAINERS reorganization
- Two SEV fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV: Init target VMCBs in sev_migrate_from
KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user()
MAINTAINERS: Reorganize KVM/x86 maintainership
selftests: KVM: Handle compiler optimizations in ucall
KVM: arm64: Add Oliver as a reviewer
KVM: arm64: Prevent kmemleak from accessing pKVM memory
tools/kvm_stat: fix display of error when multiple processes are found
|
|
Cover the scenario when we cannot insert a socket into the sockmap, because
it has it is using ULP. Failed insert should not have any effect on the ULP
state. This is a regression test.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220623091231.417138-1-jakub@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This test verifies that bpf_loop() inlining works as expected when
address of `env->prog` is updated. This address is updated upon BPF
program reallocation.
Reallocation is handled by bpf_prog_realloc(), which reuses old memory
if page boundary is not crossed. The value of `len` in the test is
chosen to cross this boundary on bpf_loop() patching.
Verify that the use-after-free bug in inline_bpf_loop() reported by
Dan Carpenter is fixed.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220624020613.548108-3-eddyz87@gmail.com
|
|
Hardware would directly write x2APIC ICR register instead of software
emulation in some circumstances, e.g when Intel IPI virtualization is
enabled. This behavior requires normal reserved bits checking to ensure
them input as zero, otherwise it will cause #GP. So we need mask out
those reserved bits from the data written to vICR register.
Remove Delivery Status bit emulation in test case as this flag
is invalid and not needed in x2APIC mode. KVM may ignore clearing
it during interrupt dispatch which will lead to fake test failure.
Opportunistically correct vector number for test sending IPI to
non-existent vCPUs.
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Message-Id: <20220623094511.26066-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This patch add a self test that verifies user space can inject
UnCorrectable No Action required (UCNA) memory errors to the guest.
It also verifies that incorrectly configured MSRs for Corrected
Machine Check Interrupt (CMCI) emulation will result in #GP.
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220610171134.772566-9-juew@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In order to improve performance across multiple reads of VM stats, cache
the stats metadata in the VM struct.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-11-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add an argument to the NX huge pages test to test disabling the feature
on a VM using the new capability.
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-10-bgardon@google.com>
[Handle failure of sudo or setcap more gracefully. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There's currently no test coverage of NX hugepages in KVM selftests, so
add a basic test to ensure that the feature works as intended.
The test creates a VM with a data slot backed with huge pages. The
memory in the data slot is filled with op-codes for the return
instruction. The guest then executes a series of accesses on the memory,
some reads, some instruction fetches. After each operation, the guest
exits and the test performs some checks on the backing page counts to
ensure that NX page splitting an reclaim work as expected.
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-7-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the code to read the binary stats data to the KVM selftests
library. It will be re-used by other tests to check KVM behavior.
Also opportunistically remove an unnecessary calculation with
"size_data" in stats_test.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-6-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fix a variety of code style violations and/or inconsistencies in the
binary stats test. The 80 char limit is a soft limit and can and should
be ignored/violated if doing so improves the overall code readability.
Specifically, provide consistent indentation and don't split expressions
at arbitrary points just to honor the 80 char limit.
Opportunistically expand/add comments to call out the more subtle aspects
of the code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-5-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the code to read the binary stats descriptors to the KVM selftests
library. It will be re-used by other tests to check KVM behavior.
No functional change intended.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-4-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the code to read the binary stats header to the KVM selftests
library. It will be re-used by other tests to check KVM behavior.
No functional change intended.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-3-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There's no need to allocate dynamic memory for the stats header since
its size is known at compile time.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit b433a52aa28733e0 ("selftests/kexec: update get_secureboot_mode")
refactored the code that discovers the EFI secure boot mode so it only
depends on either the efivars pseudo filesystem or the efivars sysfs
interface, but never both.
However, the latter version was not implemented correctly, given the
fact that the local 'efi_vars' variable never assumes a value. This
means the fallback has been dead code ever since it was introduced.
So let's drop the fallback altogether. The sysfs interface has been
deprecated for ~10 years now, and is only enabled on x86 to begin with,
so it is time to get rid of it entirely.
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
udpgso_bench.sh has been running its IPv6 TCP test with IPv4 arguments
since its initial conmit. Looks like a typo.
Fixes: 3a687bef148d ("selftests: udp gso benchmark")
Cc: willemb@google.com
Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20220623000234.61774-1-dmichail@fungible.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
test_sock_fields__detach() got called with a null pointer here when one
of the CHECKs or ASSERTs up to the test_sock_fields__open_and_load()
call resulted in a jump to the "done" label.
A skeletons *__detach() is not safe to call with a null pointer, though.
This led to a segfault.
Go the easy route and only call test_sock_fields__destroy() which is
null-pointer safe and includes detaching.
Came across this while looking[1] to introduce the usage of
bpf_tcp_helpers.h (included in progs/test_sock_fields.c) together with
vmlinux.h.
[1] https://lore.kernel.org/bpf/629bc069dd807d7ac646f836e9dca28bbc1108e2.camel@mailbox.tu-berlin.de/
Fixes: 8f50f16ff39d ("selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads")
Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220621070116.307221-1-jthinz@mailbox.tu-berlin.de
|
|
Test whether a TCP CC implemented in BPF providing get_info() is
rejected correctly. get_info() is unsupported in a BPF CC. The check for
required functions in a BPF CC has moved, this test ensures unsupported
functions are still rejected correctly.
Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220622191227.898118-6-jthinz@mailbox.tu-berlin.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Test whether a TCP CC implemented in BPF providing neither cong_avoid()
nor cong_control() is correctly rejected. This check solely depends on
tcp_register_congestion_control() now, which is invoked during
bpf_map__attach_struct_ops().
Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Link: https://lore.kernel.org/r/20220622191227.898118-5-jthinz@mailbox.tu-berlin.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Test whether a TCP CC implemented in BPF is allowed to write
sk_pacing_rate and sk_pacing_status in struct sock. This is needed when
cong_control() is implemented and used.
Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Link: https://lore.kernel.org/r/20220622191227.898118-4-jthinz@mailbox.tu-berlin.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The selftests, when built with newer versions of clang, is found
to have over optimized guests' ucall() function, and eliminating
the stores for uc.cmd (perhaps due to no immediate readers). This
resulted in the userspace side always reading a value of '0', and
causing multiple test failures.
As a result, prevent the compiler from optimizing the stores in
ucall() with WRITE_ONCE().
Suggested-by: Ricardo Koller <ricarkol@google.com>
Suggested-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Message-Id: <20220615185706.1099208-1-rananta@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- netfilter: cttimeout: fix slab-out-of-bounds read in
cttimeout_net_exit
Current release - new code bugs:
- bpf: ftrace: keep address offset in ftrace_lookup_symbols
- bpf: force cookies array to follow symbols sorting
Previous releases - regressions:
- ipv4: ping: fix bind address validity check
- tipc: fix use-after-free read in tipc_named_reinit
- eth: veth: add updating of trans_start
Previous releases - always broken:
- sock: redo the psock vs ULP protection check
- netfilter: nf_dup_netdev: fix skb_under_panic
- bpf: fix request_sock leak in sk lookup helpers
- eth: igb: fix a use-after-free issue in igb_clean_tx_ring
- eth: ice: prohibit improper channel config for DCB
- eth: at803x: fix null pointer dereference on AR9331 phy
- eth: virtio_net: fix xdp_rxq_info bug after suspend/resume
Misc:
- eth: hinic: replace memcpy() with direct assignment"
* tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: openvswitch: fix parsing of nw_proto for IPv6 fragments
sock: redo the psock vs ULP protection check
Revert "net/tls: fix tls_sk_proto_close executed repeatedly"
virtio_net: fix xdp_rxq_info bug after suspend/resume
igb: Make DMA faster when CPU is active on the PCIe link
net: dsa: qca8k: reduce mgmt ethernet timeout
net: dsa: qca8k: reset cpu port on MTU change
MAINTAINERS: Add a maintainer for OCP Time Card
hinic: Replace memcpy() with direct assignment
Revert "drivers/net/ethernet/neterion/vxge: Fix a use-after-free bug in vxge-main.c"
net: phy: smsc: Disable Energy Detect Power-Down in interrupt mode
ice: ethtool: Prohibit improper channel config for DCB
ice: ethtool: advertise 1000M speeds properly
ice: Fix switchdev rules book keeping
ice: ignore protocol field in GTP offload
netfilter: nf_dup_netdev: add and use recursion counter
netfilter: nf_dup_netdev: do not push mac header a second time
selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh
net/tls: fix tls_sk_proto_close executed repeatedly
erspan: do not assume transport header is always set
...
|
|
Add a benchmarks to demonstrate the performance cliff for local_storage
get as the number of local_storage maps increases beyond current
local_storage implementation's cache size.
"sequential get" and "interleaved get" benchmarks are added, both of
which do many bpf_task_storage_get calls on sets of task local_storage
maps of various counts, while considering a single specific map to be
'important' and counting task_storage_gets to the important map
separately in addition to normal 'hits' count of all gets. Goal here is
to mimic scenario where a particular program using one map - the
important one - is running on a system where many other local_storage
maps exist and are accessed often.
While "sequential get" benchmark does bpf_task_storage_get for map 0, 1,
..., {9, 99, 999} in order, "interleaved" benchmark interleaves 4
bpf_task_storage_gets for the important map for every 10 map gets. This
is meant to highlight performance differences when important map is
accessed far more frequently than non-important maps.
A "hashmap control" benchmark is also included for easy comparison of
standard bpf hashmap lookup vs local_storage get. The benchmark is
similar to "sequential get", but creates and uses BPF_MAP_TYPE_HASH
instead of local storage. Only one inner map is created - a hashmap
meant to hold tid -> data mapping for all tasks. Size of the hashmap is
hardcoded to my system's PID_MAX_LIMIT (4,194,304). The number of these
keys which are actually fetched as part of the benchmark is
configurable.
Addition of this benchmark is inspired by conversation with Alexei in a
previous patchset's thread [0], which highlighted the need for such a
benchmark to motivate and validate improvements to local_storage
implementation. My approach in that series focused on improving
performance for explicitly-marked 'important' maps and was rejected
with feedback to make more generally-applicable improvements while
avoiding explicitly marking maps as important. Thus the benchmark
reports both general and important-map-focused metrics, so effect of
future work on both is clear.
Regarding the benchmark results. On a powerful system (Skylake, 20
cores, 256gb ram):
Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 20.900 ± 0.334 M ops/s, hits latency: 47.847 ns/op, important_hits throughput: 20.900 ± 0.334 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 13.758 ± 0.219 M ops/s, hits latency: 72.683 ns/op, important_hits throughput: 13.758 ± 0.219 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 6.995 ± 0.034 M ops/s, hits latency: 142.959 ns/op, important_hits throughput: 6.995 ± 0.034 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 4.452 ± 0.371 M ops/s, hits latency: 224.635 ns/op, important_hits throughput: 4.452 ± 0.371 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 3.043 ± 0.033 M ops/s, hits latency: 328.587 ns/op, important_hits throughput: 3.043 ± 0.033 M ops/s
Local Storage
=============
num_maps: 1
local_storage cache sequential get: hits throughput: 47.298 ± 0.180 M ops/s, hits latency: 21.142 ns/op, important_hits throughput: 47.298 ± 0.180 M ops/s
local_storage cache interleaved get: hits throughput: 55.277 ± 0.888 M ops/s, hits latency: 18.091 ns/op, important_hits throughput: 55.277 ± 0.888 M ops/s
num_maps: 10
local_storage cache sequential get: hits throughput: 40.240 ± 0.802 M ops/s, hits latency: 24.851 ns/op, important_hits throughput: 4.024 ± 0.080 M ops/s
local_storage cache interleaved get: hits throughput: 48.701 ± 0.722 M ops/s, hits latency: 20.533 ns/op, important_hits throughput: 17.393 ± 0.258 M ops/s
num_maps: 16
local_storage cache sequential get: hits throughput: 44.515 ± 0.708 M ops/s, hits latency: 22.464 ns/op, important_hits throughput: 2.782 ± 0.044 M ops/s
local_storage cache interleaved get: hits throughput: 49.553 ± 2.260 M ops/s, hits latency: 20.181 ns/op, important_hits throughput: 15.767 ± 0.719 M ops/s
num_maps: 17
local_storage cache sequential get: hits throughput: 38.778 ± 0.302 M ops/s, hits latency: 25.788 ns/op, important_hits throughput: 2.284 ± 0.018 M ops/s
local_storage cache interleaved get: hits throughput: 43.848 ± 1.023 M ops/s, hits latency: 22.806 ns/op, important_hits throughput: 13.349 ± 0.311 M ops/s
num_maps: 24
local_storage cache sequential get: hits throughput: 19.317 ± 0.568 M ops/s, hits latency: 51.769 ns/op, important_hits throughput: 0.806 ± 0.024 M ops/s
local_storage cache interleaved get: hits throughput: 24.397 ± 0.272 M ops/s, hits latency: 40.989 ns/op, important_hits throughput: 6.863 ± 0.077 M ops/s
num_maps: 32
local_storage cache sequential get: hits throughput: 13.333 ± 0.135 M ops/s, hits latency: 75.000 ns/op, important_hits throughput: 0.417 ± 0.004 M ops/s
local_storage cache interleaved get: hits throughput: 16.898 ± 0.383 M ops/s, hits latency: 59.178 ns/op, important_hits throughput: 4.717 ± 0.107 M ops/s
num_maps: 100
local_storage cache sequential get: hits throughput: 6.360 ± 0.107 M ops/s, hits latency: 157.233 ns/op, important_hits throughput: 0.064 ± 0.001 M ops/s
local_storage cache interleaved get: hits throughput: 7.303 ± 0.362 M ops/s, hits latency: 136.930 ns/op, important_hits throughput: 1.907 ± 0.094 M ops/s
num_maps: 1000
local_storage cache sequential get: hits throughput: 0.452 ± 0.010 M ops/s, hits latency: 2214.022 ns/op, important_hits throughput: 0.000 ± 0.000 M ops/s
local_storage cache interleaved get: hits throughput: 0.542 ± 0.007 M ops/s, hits latency: 1843.341 ns/op, important_hits throughput: 0.136 ± 0.002 M ops/s
Looking at the "sequential get" results, it's clear that as the
number of task local_storage maps grows beyond the current cache size
(16), there's a significant reduction in hits throughput. Note that
current local_storage implementation assigns a cache_idx to maps as they
are created. Since "sequential get" is creating maps 0..n in order and
then doing bpf_task_storage_get calls in the same order, the benchmark
is effectively ensuring that a map will not be in cache when the program
tries to access it.
For "interleaved get" results, important-map hits throughput is greatly
increased as the important map is more likely to be in cache by virtue
of being accessed far more frequently. Throughput still reduces as #
maps increases, though.
To get a sense of the overhead of the benchmark program, I
commented out bpf_task_storage_get/bpf_map_lookup_elem in
local_storage_bench.c and ran the benchmark on the same host as the
'real' run. Results:
Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 54.288 ± 0.655 M ops/s, hits latency: 18.420 ns/op, important_hits throughput: 54.288 ± 0.655 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 52.913 ± 0.519 M ops/s, hits latency: 18.899 ns/op, important_hits throughput: 52.913 ± 0.519 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 53.480 ± 1.235 M ops/s, hits latency: 18.699 ns/op, important_hits throughput: 53.480 ± 1.235 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 54.982 ± 1.902 M ops/s, hits latency: 18.188 ns/op, important_hits throughput: 54.982 ± 1.902 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 50.858 ± 0.707 M ops/s, hits latency: 19.662 ns/op, important_hits throughput: 50.858 ± 0.707 M ops/s
Local Storage
=============
num_maps: 1
local_storage cache sequential get: hits throughput: 110.990 ± 4.828 M ops/s, hits latency: 9.010 ns/op, important_hits throughput: 110.990 ± 4.828 M ops/s
local_storage cache interleaved get: hits throughput: 161.057 ± 4.090 M ops/s, hits latency: 6.209 ns/op, important_hits throughput: 161.057 ± 4.090 M ops/s
num_maps: 10
local_storage cache sequential get: hits throughput: 112.930 ± 1.079 M ops/s, hits latency: 8.855 ns/op, important_hits throughput: 11.293 ± 0.108 M ops/s
local_storage cache interleaved get: hits throughput: 115.841 ± 2.088 M ops/s, hits latency: 8.633 ns/op, important_hits throughput: 41.372 ± 0.746 M ops/s
num_maps: 16
local_storage cache sequential get: hits throughput: 115.653 ± 0.416 M ops/s, hits latency: 8.647 ns/op, important_hits throughput: 7.228 ± 0.026 M ops/s
local_storage cache interleaved get: hits throughput: 138.717 ± 1.649 M ops/s, hits latency: 7.209 ns/op, important_hits throughput: 44.137 ± 0.525 M ops/s
num_maps: 17
local_storage cache sequential get: hits throughput: 112.020 ± 1.649 M ops/s, hits latency: 8.927 ns/op, important_hits throughput: 6.598 ± 0.097 M ops/s
local_storage cache interleaved get: hits throughput: 128.089 ± 1.960 M ops/s, hits latency: 7.807 ns/op, important_hits throughput: 38.995 ± 0.597 M ops/s
num_maps: 24
local_storage cache sequential get: hits throughput: 92.447 ± 5.170 M ops/s, hits latency: 10.817 ns/op, important_hits throughput: 3.855 ± 0.216 M ops/s
local_storage cache interleaved get: hits throughput: 128.844 ± 2.808 M ops/s, hits latency: 7.761 ns/op, important_hits throughput: 36.245 ± 0.790 M ops/s
num_maps: 32
local_storage cache sequential get: hits throughput: 102.042 ± 1.462 M ops/s, hits latency: 9.800 ns/op, important_hits throughput: 3.194 ± 0.046 M ops/s
local_storage cache interleaved get: hits throughput: 126.577 ± 1.818 M ops/s, hits latency: 7.900 ns/op, important_hits throughput: 35.332 ± 0.507 M ops/s
num_maps: 100
local_storage cache sequential get: hits throughput: 111.327 ± 1.401 M ops/s, hits latency: 8.983 ns/op, important_hits throughput: 1.113 ± 0.014 M ops/s
local_storage cache interleaved get: hits throughput: 131.327 ± 1.339 M ops/s, hits latency: 7.615 ns/op, important_hits throughput: 34.302 ± 0.350 M ops/s
num_maps: 1000
local_storage cache sequential get: hits throughput: 101.978 ± 0.563 M ops/s, hits latency: 9.806 ns/op, important_hits throughput: 0.102 ± 0.001 M ops/s
local_storage cache interleaved get: hits throughput: 141.084 ± 1.098 M ops/s, hits latency: 7.088 ns/op, important_hits throughput: 35.430 ± 0.276 M ops/s
Adjusting for overhead, latency numbers for "hashmap control" and
"sequential get" are:
hashmap_control_1k: ~53.8ns
hashmap_control_10k: ~124.2ns
hashmap_control_100k: ~206.5ns
sequential_get_1: ~12.1ns
sequential_get_10: ~16.0ns
sequential_get_16: ~13.8ns
sequential_get_17: ~16.8ns
sequential_get_24: ~40.9ns
sequential_get_32: ~65.2ns
sequential_get_100: ~148.2ns
sequential_get_1000: ~2204ns
Clearly demonstrating a cliff.
In the discussion for v1 of this patch, Alexei noted that local_storage
was 2.5x faster than a large hashmap when initially implemented [1]. The
benchmark results show that local_storage is 5-10x faster: a
long-running BPF application putting some pid-specific info into a
hashmap for each pid it sees will probably see on the order of 10-100k
pids. Bench numbers for hashmaps of this size are ~10x slower than
sequential_get_16, but as the number of local_storage maps grows far
past local_storage cache size the performance advantage shrinks and
eventually reverses.
When running the benchmarks it may be necessary to bump 'open files'
ulimit for a successful run.
[0]: https://lore.kernel.org/all/20220420002143.1096548-1-davemarchevsky@fb.com
[1]: https://lore.kernel.org/bpf/20220511173305.ftldpn23m4ski3d3@MBP-98dd607d3435.dhcp.thefacebook.com/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20220620222554.270578-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|