linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-06-28	selftests/powerpc/pmu: Add interface test for mmcra_thresh_cmp fields	Kajol Jain
	The testcase uses event code 0x35340401e0 for load only sampling, to verify the settings of thresh compare field in Monitor Mode Control Register A (MMCRA: 9-18 bits for power9 and MMCRA: 8-18 bits for power10). Testcase checks if the thresh compare field is programmed correctly via perf interface to MMCRA register. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220610134113.62991-4-atrajeev@linux.vnet.ibm.com
2022-06-28	selftests/powerpc: Add support to fetch "platform" and "base platform" from ↵	Athira Rajeev
	auxv to detect platform. The /proc/self/auxv contains information about "platform" on any system. Also "base platform" which is an indication about platform string corresponding to the real PVR. When systems are booted in compat mode, say, power10 booted in power9 mode, "platform" will point to power9 whereas base platform will point to power10. Incase, if the distro doesn't support platform indicated by real PVR, base platform will have a default value. The mismatch of platform/base platform is an indication of system booted in compat mode. In such cases, distro will have a Generic Compat registered which supports basic features for performance monitoring. Some of the selftest needs to be handled differently ( ex: generic events, alternative events, bhrb filter map) in Generic Compat PMU. Hence selftest framework needs utility functions to identify such cases. One way is make sure of auxv information. Below condition can be used to detect if Generic Compat PMU is registered. ie: if ((AT_PLATFORM != AT_BASE_PLATFORM) && (AT_BASE_PLATFORM != PVR)) this indicates Generic Compat PMU. Add utility function in "include/utils.h" to return: AT_PLATFORM and AT_BASE_PLATFORM from auxv. Also update misc.c in "sampling_tests" folder to add function to use above check to determine presence of generic compat pmu. In other architecture ( like x86 ), pmu_name is exposed via "/sys/bus/event_source/devices/cpu/caps". The same could be used in powerpc in future. Since currently we don't have the "caps" support in powerpc, patch uses auxv information to detect platform type and compat mode. But as placeholder utility function is added considering possiblity of getting "caps" information via sysfs. If that doesn't exist, fallback to using auxv information. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220610134113.62991-3-atrajeev@linux.vnet.ibm.com
2022-06-28	selftests/powerpc/pmu: Add mask/shift bits for extracting threshold compare ↵	Kajol Jain
	field In power10, threshold compare field is not part of the raw event code and provided via event attribute config1. Hence add the mask and shift bits based on event attribute config1, to extract the threshold compare value for power10 Also add a new function called get_thresh_cmp_val to compute and return the threshold compare field for a given platform, since incase of power10, threshold compare value provided is decimal. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220610134113.62991-2-atrajeev@linux.vnet.ibm.com
2022-06-28	selftests/rseq: check if libc rseq support is registered	Michael Jeanson
	When checking for libc rseq support in the library constructor, don't only depend on the symbols presence, check that the registration was completed. This targets a scenario where the libc has rseq support but it is not wired for the current architecture in 'bits/rseq.h', we want to fallback to our internal registration mechanism. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20220614154830.1367382-4-mjeanson@efficios.com
2022-06-28	selftests/rseq: riscv: fix 'literal-suffix' warning	Michael Jeanson
	This header is also used in librseq where it can be included in C++ code, add a space between literals and string macros. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20220614154830.1367382-3-mjeanson@efficios.com
2022-06-28	selftests/rseq: riscv: use rseq_get_abi() helper	Michael Jeanson
	Make the RISC-V rseq selftests compatible with glibc-2.35 by using the rseq_get_abi() helper. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20220614154830.1367382-2-mjeanson@efficios.com
2022-06-27	selftests: tc-testing: Add testcases to test new flush behaviour	Victor Nogueira
	Add tdc test cases to verify new flush behaviour is correct, which do the following: - Try to flush only one action which is being referenced by a filter - Try to flush three actions where the last one (index 3) is being referenced by a filter Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-27	kselftests/damon: add support for cases where debugfs cannot be read	Gautam
	The kernel is in lockdown mode when secureboot is enabled and hence debugfs cannot be used. Add support for this and other general cases where debugfs cannot be read and communicate the same to the user before running tests. Signed-off-by: Gautam <gautammenghani201@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-06-27	selftests: Make the usage formatting consistent in kselftest_deps.sh	Gautam Menghani
	Add a colon in the "Optional" test usage message to ensure consistency with the "Default" test usage message. Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-06-27	kselftests: Enable the echo command to print newlines in Makefile	Gautam
	In the install section of the main Makefile of kselftests, the echo command is used with -n flag, which disables the printing of new line due to which the output contains "\n" chars as follows: Emit Tests for alsa\nSkipping non-existent dir: arm64 Emit Tests for breakpoints\nEmit Tests for capabilities\n This patch fixes the above bug by using the -e flag. Signed-off-by: Gautam <gautammenghani201@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-06-27	userfaultfd/selftests: Fix typo in comment	Xiang wangx
	Delete the redundant word 'in'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-06-27	Merge branch 'master' into mm-nonmm-stable	akpm

2022-06-27	Merge branch 'master' into mm-stable	akpm

2022-06-26	selftests/powerpc: Skip energy_scale_info test on older firmware	Michael Ellerman
	Older machines don't have the firmware feature that enables the code this test is testing. Skip the test if the sysfs directory doesn't exist. Also use the FAIL_IF() macro to provide more verbose error reporting if an error is encountered. Fixes: 57201d657eb7 ("selftest/powerpc: Add PAPR sysfs attributes sniff test") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220619233103.2666171-1-mpe@ellerman.id.au
2022-06-24	tc-testing: gitignore, delete plugins directory	liujing
	when we modfying kernel, commit it to our environment building. we find a error that is "tools/testing/selftests/tc-testing/plugins" failed: No such file or directory" we find plugins directory is ignored in "tools/testing/selftests/tc-testing/.gitignore", but the plugins directory is need in "tools/testing/selftests/tc-testing/Makefile" Signed-off-by: liujing <liujing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20220622121237.5832-1-liujing@cmss.chinamobile.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-24	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm fixes from Paolo Bonzini: "ARM64: - Fix a regression with pKVM when kmemleak is enabled - Add Oliver Upton as an official KVM/arm64 reviewer selftests: - deal with compiler optimizations around hypervisor exits x86: - MAINTAINERS reorganization - Two SEV fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SEV: Init target VMCBs in sev_migrate_from KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user() MAINTAINERS: Reorganize KVM/x86 maintainership selftests: KVM: Handle compiler optimizations in ucall KVM: arm64: Add Oliver as a reviewer KVM: arm64: Prevent kmemleak from accessing pKVM memory tools/kvm_stat: fix display of error when multiple processes are found
2022-06-24	selftests/bpf: Test sockmap update when socket has ULP	Jakub Sitnicki
	Cover the scenario when we cannot insert a socket into the sockmap, because it has it is using ULP. Failed insert should not have any effect on the ULP state. This is a regression test. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20220623091231.417138-1-jakub@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-24	selftest/bpf: Test for use-after-free bug fix in inline_bpf_loop	Eduard Zingerman
	This test verifies that bpf_loop() inlining works as expected when address of `env->prog` is updated. This address is updated upon BPF program reallocation. Reallocation is handled by bpf_prog_realloc(), which reuses old memory if page boundary is not crossed. The value of `len` in the test is chosen to cross this boundary on bpf_loop() patching. Verify that the use-after-free bug in inline_bpf_loop() reported by Dan Carpenter is fixed. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220624020613.548108-3-eddyz87@gmail.com
2022-06-24	KVM: selftests: Enhance handling WRMSR ICR register in x2APIC mode	Zeng Guang
	Hardware would directly write x2APIC ICR register instead of software emulation in some circumstances, e.g when Intel IPI virtualization is enabled. This behavior requires normal reserved bits checking to ensure them input as zero, otherwise it will cause #GP. So we need mask out those reserved bits from the data written to vICR register. Remove Delivery Status bit emulation in test case as this flag is invalid and not needed in x2APIC mode. KVM may ignore clearing it during interrupt dispatch which will lead to fake test failure. Opportunistically correct vector number for test sending IPI to non-existent vCPUs. Signed-off-by: Zeng Guang <guang.zeng@intel.com> Message-Id: <20220623094511.26066-1-guang.zeng@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	KVM: selftests: Add a self test for CMCI and UCNA emulations.	Jue Wang
	This patch add a self test that verifies user space can inject UnCorrectable No Action required (UCNA) memory errors to the guest. It also verifies that incorrectly configured MSRs for Corrected Machine Check Interrupt (CMCI) emulation will result in #GP. Signed-off-by: Jue Wang <juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220610171134.772566-9-juew@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	KVM: selftests: Cache binary stats metadata for duration of test	Ben Gardon
	In order to improve performance across multiple reads of VM stats, cache the stats metadata in the VM struct. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-11-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	KVM: selftests: Test disabling NX hugepages on a VM	Ben Gardon
	Add an argument to the NX huge pages test to test disabling the feature on a VM using the new capability. Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-10-bgardon@google.com> [Handle failure of sudo or setcap more gracefully. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	KVM: selftests: Add NX huge pages test	Ben Gardon
	There's currently no test coverage of NX hugepages in KVM selftests, so add a basic test to ensure that the feature works as intended. The test creates a VM with a data slot backed with huge pages. The memory in the data slot is filled with op-codes for the return instruction. The guest then executes a series of accesses on the memory, some reads, some instruction fetches. After each operation, the guest exits and the test performs some checks on the backing page counts to ensure that NX page splitting an reclaim work as expected. Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-7-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	KVM: selftests: Read binary stat data in lib	Ben Gardon
	Move the code to read the binary stats data to the KVM selftests library. It will be re-used by other tests to check KVM behavior. Also opportunistically remove an unnecessary calculation with "size_data" in stats_test. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-6-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	KVM: selftests: Clean up coding style in binary stats test	Sean Christopherson
	Fix a variety of code style violations and/or inconsistencies in the binary stats test. The 80 char limit is a soft limit and can and should be ignored/violated if doing so improves the overall code readability. Specifically, provide consistent indentation and don't split expressions at arbitrary points just to honor the 80 char limit. Opportunistically expand/add comments to call out the more subtle aspects of the code. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-5-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	KVM: selftests: Read binary stats desc in lib	Ben Gardon
	Move the code to read the binary stats descriptors to the KVM selftests library. It will be re-used by other tests to check KVM behavior. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-4-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	KVM: selftests: Read binary stats header in lib	Ben Gardon
	Move the code to read the binary stats header to the KVM selftests library. It will be re-used by other tests to check KVM behavior. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-3-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	KVM: selftests: Remove dynamic memory allocation for stats header	Ben Gardon
	There's no need to allocate dynamic memory for the stats header since its size is known at compile time. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-24	selftests/kexec: remove broken EFI_VARS secure boot fallback check	Ard Biesheuvel
	Commit b433a52aa28733e0 ("selftests/kexec: update get_secureboot_mode") refactored the code that discovers the EFI secure boot mode so it only depends on either the efivars pseudo filesystem or the efivars sysfs interface, but never both. However, the latter version was not implemented correctly, given the fact that the local 'efi_vars' variable never assumes a value. This means the fallback has been dead code ever since it was introduced. So let's drop the fallback altogether. The sysfs interface has been deprecated for ~10 years now, and is only enabled on x86 to begin with, so it is time to get rid of it entirely. Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-23	selftests/net: pass ipv6_args to udpgso_bench's IPv6 TCP test	Dimitris Michailidis
	udpgso_bench.sh has been running its IPv6 TCP test with IPv4 arguments since its initial conmit. Looks like a typo. Fixes: 3a687bef148d ("selftests: udp gso benchmark") Cc: willemb@google.com Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20220623000234.61774-1-dmichail@fungible.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-23	selftests/bpf: Fix rare segfault in sock_fields prog test	Jörn-Thorben Hinz
	test_sock_fields__detach() got called with a null pointer here when one of the CHECKs or ASSERTs up to the test_sock_fields__open_and_load() call resulted in a jump to the "done" label. A skeletons *__detach() is not safe to call with a null pointer, though. This led to a segfault. Go the easy route and only call test_sock_fields__destroy() which is null-pointer safe and includes detaching. Came across this while looking[1] to introduce the usage of bpf_tcp_helpers.h (included in progs/test_sock_fields.c) together with vmlinux.h. [1] https://lore.kernel.org/bpf/629bc069dd807d7ac646f836e9dca28bbc1108e2.camel@mailbox.tu-berlin.de/ Fixes: 8f50f16ff39d ("selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads") Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220621070116.307221-1-jthinz@mailbox.tu-berlin.de
2022-06-23	selftests/bpf: Test a BPF CC implementing the unsupported get_info()	Jörn-Thorben Hinz
	Test whether a TCP CC implemented in BPF providing get_info() is rejected correctly. get_info() is unsupported in a BPF CC. The check for required functions in a BPF CC has moved, this test ensures unsupported functions are still rejected correctly. Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220622191227.898118-6-jthinz@mailbox.tu-berlin.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-23	selftests/bpf: Test an incomplete BPF CC	Jörn-Thorben Hinz
	Test whether a TCP CC implemented in BPF providing neither cong_avoid() nor cong_control() is correctly rejected. This check solely depends on tcp_register_congestion_control() now, which is invoked during bpf_map__attach_struct_ops(). Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de> Link: https://lore.kernel.org/r/20220622191227.898118-5-jthinz@mailbox.tu-berlin.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-23	selftests/bpf: Test a BPF CC writing sk_pacing_*	Jörn-Thorben Hinz
	Test whether a TCP CC implemented in BPF is allowed to write sk_pacing_rate and sk_pacing_status in struct sock. This is needed when cong_control() is implemented and used. Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de> Link: https://lore.kernel.org/r/20220622191227.898118-4-jthinz@mailbox.tu-berlin.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-23	selftests: KVM: Handle compiler optimizations in ucall	Raghavendra Rao Ananta
	The selftests, when built with newer versions of clang, is found to have over optimized guests' ucall() function, and eliminating the stores for uc.cmd (perhaps due to no immediate readers). This resulted in the userspace side always reading a value of '0', and causing multiple test failures. As a result, prevent the compiler from optimizing the stores in ucall() with WRITE_ONCE(). Suggested-by: Ricardo Koller <ricarkol@google.com> Suggested-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Message-Id: <20220615185706.1099208-1-rananta@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-23	Merge tag 'net-5.19-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - netfilter: cttimeout: fix slab-out-of-bounds read in cttimeout_net_exit Current release - new code bugs: - bpf: ftrace: keep address offset in ftrace_lookup_symbols - bpf: force cookies array to follow symbols sorting Previous releases - regressions: - ipv4: ping: fix bind address validity check - tipc: fix use-after-free read in tipc_named_reinit - eth: veth: add updating of trans_start Previous releases - always broken: - sock: redo the psock vs ULP protection check - netfilter: nf_dup_netdev: fix skb_under_panic - bpf: fix request_sock leak in sk lookup helpers - eth: igb: fix a use-after-free issue in igb_clean_tx_ring - eth: ice: prohibit improper channel config for DCB - eth: at803x: fix null pointer dereference on AR9331 phy - eth: virtio_net: fix xdp_rxq_info bug after suspend/resume Misc: - eth: hinic: replace memcpy() with direct assignment" * tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits) net: openvswitch: fix parsing of nw_proto for IPv6 fragments sock: redo the psock vs ULP protection check Revert "net/tls: fix tls_sk_proto_close executed repeatedly" virtio_net: fix xdp_rxq_info bug after suspend/resume igb: Make DMA faster when CPU is active on the PCIe link net: dsa: qca8k: reduce mgmt ethernet timeout net: dsa: qca8k: reset cpu port on MTU change MAINTAINERS: Add a maintainer for OCP Time Card hinic: Replace memcpy() with direct assignment Revert "drivers/net/ethernet/neterion/vxge: Fix a use-after-free bug in vxge-main.c" net: phy: smsc: Disable Energy Detect Power-Down in interrupt mode ice: ethtool: Prohibit improper channel config for DCB ice: ethtool: advertise 1000M speeds properly ice: Fix switchdev rules book keeping ice: ignore protocol field in GTP offload netfilter: nf_dup_netdev: add and use recursion counter netfilter: nf_dup_netdev: do not push mac header a second time selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh net/tls: fix tls_sk_proto_close executed repeatedly erspan: do not assume transport header is always set ...
2022-06-22	selftests/bpf: Add benchmark for local_storage get	Dave Marchevsky
	Add a benchmarks to demonstrate the performance cliff for local_storage get as the number of local_storage maps increases beyond current local_storage implementation's cache size. "sequential get" and "interleaved get" benchmarks are added, both of which do many bpf_task_storage_get calls on sets of task local_storage maps of various counts, while considering a single specific map to be 'important' and counting task_storage_gets to the important map separately in addition to normal 'hits' count of all gets. Goal here is to mimic scenario where a particular program using one map - the important one - is running on a system where many other local_storage maps exist and are accessed often. While "sequential get" benchmark does bpf_task_storage_get for map 0, 1, ..., {9, 99, 999} in order, "interleaved" benchmark interleaves 4 bpf_task_storage_gets for the important map for every 10 map gets. This is meant to highlight performance differences when important map is accessed far more frequently than non-important maps. A "hashmap control" benchmark is also included for easy comparison of standard bpf hashmap lookup vs local_storage get. The benchmark is similar to "sequential get", but creates and uses BPF_MAP_TYPE_HASH instead of local storage. Only one inner map is created - a hashmap meant to hold tid -> data mapping for all tasks. Size of the hashmap is hardcoded to my system's PID_MAX_LIMIT (4,194,304). The number of these keys which are actually fetched as part of the benchmark is configurable. Addition of this benchmark is inspired by conversation with Alexei in a previous patchset's thread [0], which highlighted the need for such a benchmark to motivate and validate improvements to local_storage implementation. My approach in that series focused on improving performance for explicitly-marked 'important' maps and was rejected with feedback to make more generally-applicable improvements while avoiding explicitly marking maps as important. Thus the benchmark reports both general and important-map-focused metrics, so effect of future work on both is clear. Regarding the benchmark results. On a powerful system (Skylake, 20 cores, 256gb ram): Hashmap Control =============== num keys: 10 hashmap (control) sequential get: hits throughput: 20.900 ± 0.334 M ops/s, hits latency: 47.847 ns/op, important_hits throughput: 20.900 ± 0.334 M ops/s num keys: 1000 hashmap (control) sequential get: hits throughput: 13.758 ± 0.219 M ops/s, hits latency: 72.683 ns/op, important_hits throughput: 13.758 ± 0.219 M ops/s num keys: 10000 hashmap (control) sequential get: hits throughput: 6.995 ± 0.034 M ops/s, hits latency: 142.959 ns/op, important_hits throughput: 6.995 ± 0.034 M ops/s num keys: 100000 hashmap (control) sequential get: hits throughput: 4.452 ± 0.371 M ops/s, hits latency: 224.635 ns/op, important_hits throughput: 4.452 ± 0.371 M ops/s num keys: 4194304 hashmap (control) sequential get: hits throughput: 3.043 ± 0.033 M ops/s, hits latency: 328.587 ns/op, important_hits throughput: 3.043 ± 0.033 M ops/s Local Storage ============= num_maps: 1 local_storage cache sequential get: hits throughput: 47.298 ± 0.180 M ops/s, hits latency: 21.142 ns/op, important_hits throughput: 47.298 ± 0.180 M ops/s local_storage cache interleaved get: hits throughput: 55.277 ± 0.888 M ops/s, hits latency: 18.091 ns/op, important_hits throughput: 55.277 ± 0.888 M ops/s num_maps: 10 local_storage cache sequential get: hits throughput: 40.240 ± 0.802 M ops/s, hits latency: 24.851 ns/op, important_hits throughput: 4.024 ± 0.080 M ops/s local_storage cache interleaved get: hits throughput: 48.701 ± 0.722 M ops/s, hits latency: 20.533 ns/op, important_hits throughput: 17.393 ± 0.258 M ops/s num_maps: 16 local_storage cache sequential get: hits throughput: 44.515 ± 0.708 M ops/s, hits latency: 22.464 ns/op, important_hits throughput: 2.782 ± 0.044 M ops/s local_storage cache interleaved get: hits throughput: 49.553 ± 2.260 M ops/s, hits latency: 20.181 ns/op, important_hits throughput: 15.767 ± 0.719 M ops/s num_maps: 17 local_storage cache sequential get: hits throughput: 38.778 ± 0.302 M ops/s, hits latency: 25.788 ns/op, important_hits throughput: 2.284 ± 0.018 M ops/s local_storage cache interleaved get: hits throughput: 43.848 ± 1.023 M ops/s, hits latency: 22.806 ns/op, important_hits throughput: 13.349 ± 0.311 M ops/s num_maps: 24 local_storage cache sequential get: hits throughput: 19.317 ± 0.568 M ops/s, hits latency: 51.769 ns/op, important_hits throughput: 0.806 ± 0.024 M ops/s local_storage cache interleaved get: hits throughput: 24.397 ± 0.272 M ops/s, hits latency: 40.989 ns/op, important_hits throughput: 6.863 ± 0.077 M ops/s num_maps: 32 local_storage cache sequential get: hits throughput: 13.333 ± 0.135 M ops/s, hits latency: 75.000 ns/op, important_hits throughput: 0.417 ± 0.004 M ops/s local_storage cache interleaved get: hits throughput: 16.898 ± 0.383 M ops/s, hits latency: 59.178 ns/op, important_hits throughput: 4.717 ± 0.107 M ops/s num_maps: 100 local_storage cache sequential get: hits throughput: 6.360 ± 0.107 M ops/s, hits latency: 157.233 ns/op, important_hits throughput: 0.064 ± 0.001 M ops/s local_storage cache interleaved get: hits throughput: 7.303 ± 0.362 M ops/s, hits latency: 136.930 ns/op, important_hits throughput: 1.907 ± 0.094 M ops/s num_maps: 1000 local_storage cache sequential get: hits throughput: 0.452 ± 0.010 M ops/s, hits latency: 2214.022 ns/op, important_hits throughput: 0.000 ± 0.000 M ops/s local_storage cache interleaved get: hits throughput: 0.542 ± 0.007 M ops/s, hits latency: 1843.341 ns/op, important_hits throughput: 0.136 ± 0.002 M ops/s Looking at the "sequential get" results, it's clear that as the number of task local_storage maps grows beyond the current cache size (16), there's a significant reduction in hits throughput. Note that current local_storage implementation assigns a cache_idx to maps as they are created. Since "sequential get" is creating maps 0..n in order and then doing bpf_task_storage_get calls in the same order, the benchmark is effectively ensuring that a map will not be in cache when the program tries to access it. For "interleaved get" results, important-map hits throughput is greatly increased as the important map is more likely to be in cache by virtue of being accessed far more frequently. Throughput still reduces as # maps increases, though. To get a sense of the overhead of the benchmark program, I commented out bpf_task_storage_get/bpf_map_lookup_elem in local_storage_bench.c and ran the benchmark on the same host as the 'real' run. Results: Hashmap Control =============== num keys: 10 hashmap (control) sequential get: hits throughput: 54.288 ± 0.655 M ops/s, hits latency: 18.420 ns/op, important_hits throughput: 54.288 ± 0.655 M ops/s num keys: 1000 hashmap (control) sequential get: hits throughput: 52.913 ± 0.519 M ops/s, hits latency: 18.899 ns/op, important_hits throughput: 52.913 ± 0.519 M ops/s num keys: 10000 hashmap (control) sequential get: hits throughput: 53.480 ± 1.235 M ops/s, hits latency: 18.699 ns/op, important_hits throughput: 53.480 ± 1.235 M ops/s num keys: 100000 hashmap (control) sequential get: hits throughput: 54.982 ± 1.902 M ops/s, hits latency: 18.188 ns/op, important_hits throughput: 54.982 ± 1.902 M ops/s num keys: 4194304 hashmap (control) sequential get: hits throughput: 50.858 ± 0.707 M ops/s, hits latency: 19.662 ns/op, important_hits throughput: 50.858 ± 0.707 M ops/s Local Storage ============= num_maps: 1 local_storage cache sequential get: hits throughput: 110.990 ± 4.828 M ops/s, hits latency: 9.010 ns/op, important_hits throughput: 110.990 ± 4.828 M ops/s local_storage cache interleaved get: hits throughput: 161.057 ± 4.090 M ops/s, hits latency: 6.209 ns/op, important_hits throughput: 161.057 ± 4.090 M ops/s num_maps: 10 local_storage cache sequential get: hits throughput: 112.930 ± 1.079 M ops/s, hits latency: 8.855 ns/op, important_hits throughput: 11.293 ± 0.108 M ops/s local_storage cache interleaved get: hits throughput: 115.841 ± 2.088 M ops/s, hits latency: 8.633 ns/op, important_hits throughput: 41.372 ± 0.746 M ops/s num_maps: 16 local_storage cache sequential get: hits throughput: 115.653 ± 0.416 M ops/s, hits latency: 8.647 ns/op, important_hits throughput: 7.228 ± 0.026 M ops/s local_storage cache interleaved get: hits throughput: 138.717 ± 1.649 M ops/s, hits latency: 7.209 ns/op, important_hits throughput: 44.137 ± 0.525 M ops/s num_maps: 17 local_storage cache sequential get: hits throughput: 112.020 ± 1.649 M ops/s, hits latency: 8.927 ns/op, important_hits throughput: 6.598 ± 0.097 M ops/s local_storage cache interleaved get: hits throughput: 128.089 ± 1.960 M ops/s, hits latency: 7.807 ns/op, important_hits throughput: 38.995 ± 0.597 M ops/s num_maps: 24 local_storage cache sequential get: hits throughput: 92.447 ± 5.170 M ops/s, hits latency: 10.817 ns/op, important_hits throughput: 3.855 ± 0.216 M ops/s local_storage cache interleaved get: hits throughput: 128.844 ± 2.808 M ops/s, hits latency: 7.761 ns/op, important_hits throughput: 36.245 ± 0.790 M ops/s num_maps: 32 local_storage cache sequential get: hits throughput: 102.042 ± 1.462 M ops/s, hits latency: 9.800 ns/op, important_hits throughput: 3.194 ± 0.046 M ops/s local_storage cache interleaved get: hits throughput: 126.577 ± 1.818 M ops/s, hits latency: 7.900 ns/op, important_hits throughput: 35.332 ± 0.507 M ops/s num_maps: 100 local_storage cache sequential get: hits throughput: 111.327 ± 1.401 M ops/s, hits latency: 8.983 ns/op, important_hits throughput: 1.113 ± 0.014 M ops/s local_storage cache interleaved get: hits throughput: 131.327 ± 1.339 M ops/s, hits latency: 7.615 ns/op, important_hits throughput: 34.302 ± 0.350 M ops/s num_maps: 1000 local_storage cache sequential get: hits throughput: 101.978 ± 0.563 M ops/s, hits latency: 9.806 ns/op, important_hits throughput: 0.102 ± 0.001 M ops/s local_storage cache interleaved get: hits throughput: 141.084 ± 1.098 M ops/s, hits latency: 7.088 ns/op, important_hits throughput: 35.430 ± 0.276 M ops/s Adjusting for overhead, latency numbers for "hashmap control" and "sequential get" are: hashmap_control_1k: ~53.8ns hashmap_control_10k: ~124.2ns hashmap_control_100k: ~206.5ns sequential_get_1: ~12.1ns sequential_get_10: ~16.0ns sequential_get_16: ~13.8ns sequential_get_17: ~16.8ns sequential_get_24: ~40.9ns sequential_get_32: ~65.2ns sequential_get_100: ~148.2ns sequential_get_1000: ~2204ns Clearly demonstrating a cliff. In the discussion for v1 of this patch, Alexei noted that local_storage was 2.5x faster than a large hashmap when initially implemented [1]. The benchmark results show that local_storage is 5-10x faster: a long-running BPF application putting some pid-specific info into a hashmap for each pid it sees will probably see on the order of 10-100k pids. Bench numbers for hashmaps of this size are ~10x slower than sequential_get_16, but as the number of local_storage maps grows far past local_storage cache size the performance advantage shrinks and eventually reverses. When running the benchmarks it may be necessary to bump 'open files' ulimit for a successful run. [0]: https://lore.kernel.org/all/20220420002143.1096548-1-davemarchevsky@fb.com [1]: https://lore.kernel.org/bpf/20220511173305.ftldpn23m4ski3d3@MBP-98dd607d3435.dhcp.thefacebook.com/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20220620222554.270578-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-22	KVM: selftests: Add MONITOR/MWAIT quirk test	Sean Christopherson
	Add a test to verify the "MONITOR/MWAIT never fault" quirk, and as a bonus, also verify the related "MISC_ENABLES ignores ENABLE_MWAIT" quirk. If the "never fault" quirk is enabled, MONITOR/MWAIT should always be emulated as NOPs, even if they're reported as disabled in guest CPUID. Use the MISC_ENABLES quirk to coerce KVM into toggling the MWAIT CPUID enable, as KVM now disallows manually toggling CPUID bits after running the vCPU. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220608224516.3788274-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-22	Merge tag 'linux-kselftest-fixes-5.19-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "Compile time fixes and run-time resources leaks: - Fix clang cross compilation - Fix resource leak when return error - fix compile error for dma_map_benchmark - Fix regression - make use of GUP_TEST_FILE macro" * tag 'linux-kselftest-fixes-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: make use of GUP_TEST_FILE macro selftests: vm: Fix resource leak when return error selftests dma: fix compile error for dma_map_benchmark selftests: Fix clang cross compilation
2022-06-21	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf	Jakub Kicinski
	Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Use get_random_u32() instead of prandom_u32_state() in nft_meta and nft_numgen, from Florian Westphal. 2) Incorrect list head in nfnetlink_cttimeout in recent update coming from previous development cycle. Also from Florian. 3) Incorrect path to pktgen scripts for nft_concat_range.sh selftest. From Jie2x Zhou. 4) Two fixes for the for nft_fwd and nft_dup egress support, from Florian. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_dup_netdev: add and use recursion counter netfilter: nf_dup_netdev: do not push mac header a second time selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh netfilter: cttimeout: fix slab-out-of-bounds read typo in cttimeout_net_exit netfilter: use get_random_u32 instead of prandom ==================== Link: https://lore.kernel.org/r/20220621085618.3975-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-21	torture: Create kvm-check-branches.sh output in proper location	Paul E. McKenney
	Currently, kvm-check-branches.sh causes each kvm.sh invocation create a separate date-stamped directory, then after that invocation completes, moves it into the *-group/NNNN directory. This works, but makes it more difficult to monitor an ongoing run. This commit therefore uses the kvm.sh --datestamp argument to make kvm.sh put the output in the right place to start with, and also dispenses with the additional level of datestamping. (Those wanting datestamps can find them in the log files.) Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-06-21	torture: Adjust to again produce debugging information	Paul E. McKenney
	A recent change to the DEBUG_INFO Kconfig option means that simply adding CONFIG_DEBUG_INFO=y to the .config file and running "make oldconfig" no longer works. It is instead necessary to add CONFIG_DEBUG_INFO_NONE=n and (for example) CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y. This combination will then result in CONFIG_DEBUG_INFO being selected. This commit therefore updates the Kconfig options produced in response to the kvm.sh --gdb, --kasan, and --kcsan Kconfig options. Fixes: f9b3cd245784 ("Kconfig.debug: make DEBUG_INFO selectable from a choice") Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-06-21	selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh	Jie2x Zhou
	Before change: make -C netfilter TEST: performance net,port [SKIP] perf not supported port,net [SKIP] perf not supported net6,port [SKIP] perf not supported port,proto [SKIP] perf not supported net6,port,mac [SKIP] perf not supported net6,port,mac,proto [SKIP] perf not supported net,mac [SKIP] perf not supported After change: net,mac [ OK ] baseline (drop from netdev hook): 2061098pps baseline hash (non-ranged entries): 1606741pps baseline rbtree (match on first field only): 1191607pps set with 1000 full, ranged entries: 1639119pps ok 8 selftests: netfilter: nft_concat_range.sh Fixes: 611973c1e06f ("selftests: netfilter: Introduce tests for sets with range concatenation") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jie2x Zhou <jie2x.zhou@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-20	selftests/bpf: BPF test_prog selftests for bpf_loop inlining	Eduard Zingerman
	Two new test BPF programs for test_prog selftests checking bpf_loop behavior. Both are corner cases for bpf_loop inlinig transformation: - check that bpf_loop behaves correctly when callback function is not a compile time constant - check that local function variables are not affected by allocating additional stack storage for registers spilled by loop inlining Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/20220620235344.569325-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20	selftests/bpf: BPF test_verifier selftests for bpf_loop inlining	Eduard Zingerman
	A number of test cases for BPF selftests test_verifier to check how bpf_loop inline transformation rewrites the BPF program. The following cases are covered: - happy path - no-rewrite when flags is non-zero - no-rewrite when callback is non-constant - subprogno in insn_aux is updated correctly when dead sub-programs are removed - check that correct stack offsets are assigned for spilling of R6-R8 registers Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/20220620235344.569325-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20	selftests/bpf: allow BTF specs and func infos in test_verifier tests	Eduard Zingerman
	The BTF and func_info specification for test_verifier tests follows the same notation as in prog_tests/btf.c tests. E.g.: ... .func_info = { { 0, 6 }, { 8, 7 } }, .func_info_cnt = 2, .btf_strings = "\0int\0", .btf_types = { BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), BTF_PTR_ENC(1), }, ... The BTF specification is loaded only when specified. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/20220620235344.569325-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20	selftests/bpf: specify expected instructions in test_verifier tests	Eduard Zingerman
	Allows to specify expected and unexpected instruction sequences in test_verifier test cases. The instructions are requested from kernel after BPF program loading, thus allowing to check some of the transformations applied by BPF verifier. - `expected_insn` field specifies a sequence of instructions expected to be found in the program; - `unexpected_insn` field specifies a sequence of instructions that are not expected to be found in the program; - `INSN_OFF_MASK` and `INSN_IMM_MASK` values could be used to mask `off` and `imm` fields. - `SKIP_INSNS` could be used to specify that some instructions in the (un)expected pattern are not important (behavior similar to usage of `\t` in `errstr` field). The intended usage is as follows: { "inline simple bpf_loop call", .insns = { /* main / BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, 1), BPF_RAW_INSN(BPF_LD \| BPF_IMM \| BPF_DW, BPF_REG_2, BPF_PSEUDO_FUNC, 0, 6), ... BPF_EXIT_INSN(), / callback */ BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 1), BPF_EXIT_INSN(), }, .expected_insns = { BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, 1), SKIP_INSNS(), BPF_RAW_INSN(BPF_JMP \| BPF_CALL, 0, BPF_PSEUDO_CALL, 8, 1) }, .unexpected_insns = { BPF_RAW_INSN(BPF_JMP \| BPF_CALL, 0, 0, INSN_OFF_MASK, INSN_IMM_MASK), }, .prog_type = BPF_PROG_TYPE_TRACEPOINT, .result = ACCEPT, .runs = 0, }, Here it is expected that move of 1 to register 1 would remain in place and helper function call instruction would be replaced by a relative call instruction. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/20220620235344.569325-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20	KVM: selftests: Use exception fixup for #UD/#GP Hyper-V MSR/hcall tests	Sean Christopherson
	Use exception fixup to verify VMCALL/RDMSR/WRMSR fault as expected in the Hyper-V Features test. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220608224516.3788274-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-20	torture: Make kvm-remote.sh announce which system is being waited on	Paul E. McKenney
	If a remote system fails in certain ways, for example, if it is rebooted without removing the contents of the /tmp directory, its remote.run file never will be removed and the kvm-remote.sh script will loop waiting forever. The manual workaround for this (hopefully!) rare event is to manually remove the file, which will cause the results up to the reboot to be collected and evaluated. Unfortunately, to work out which system is holding things up, the user must refer to the name of the last system whose results were collected, then look up the name of the next system in sequence, then manually remove the remote.run file. Even more unfortunately, this procedure can be fooled in runs where each system handles more than one batch should a given system take longer than expected, causing the systems to be handled out of order. This commit therefore causes kvm-remote.sh to print out the name of the system it will wait on next, allowing the user to refer directly to that name. Making the kvm-remote.sh script automatically handle unscheduled termination of the qemu processes is left as future work. Quite possibly deep future work. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>