linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-11-11	selftests/bpf: fix veristat's singular file-or-prog filter	Andrii Nakryiko
	Fix the bug of filtering out filename too early, before we know the program name, if using unified file-or-prog filter (i.e., -f <any-glob>). Because we try to filter BPF object file early without opening and parsing it, if any_glob (file-or-prog) filter is used we have to accept any filename just to get program name, which might match any_glob. Fixes: 10b1b3f3e56a ("selftests/bpf: consolidate and improve file/prog filtering in veristat") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221111181242.2101192-1-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-11-11	Merge tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linux	Linus Torvalds
	Pull io_uring fixes from Jens Axboe: "Nothing major, just a few minor tweaks: - Tweak for the TCP zero-copy io_uring self test (Pavel) - Rather than use our internal cached value of number of CQ events available, use what the user can see (Dylan) - Fix a typo in a comment, added in this release (me) - Don't allow wrapping while adding provided buffers (me) - Fix a double poll race, and add a lockdep assertion for it too (Pavel)" * tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linux: io_uring/poll: lockdep annote io_poll_req_insert_locked io_uring/poll: fix double poll req->flags races io_uring: check for rollover of buffer ID when providing buffers io_uring: calculate CQEs from the user visible value io_uring: fix typo in io_uring.h comment selftests/net: don't tests batched TCP io_uring zc
2022-11-11	selftests/bpf: Test skops->skb_hwtstamp	Martin KaFai Lau
	This patch tests reading the skops->skb_hwtstamp field. A local test was also done such that the shinfo hwtstamp was temporary set to a non zero value in the kernel bpf_skops_parse_hdr() and the same value can be read by the skops test. An adjustment is needed to the btf_dump selftest because the changes in the 'struct bpf_sock_ops'. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221107230420.4192307-4-martin.lau@linux.dev
2022-11-11	selftests/bpf: Fix incorrect ASSERT in the tcp_hdr_options test	Martin KaFai Lau
	This patch fixes the incorrect ASSERT test in tcp_hdr_options during the CHECK to ASSERT macro cleanup. Fixes: 3082f8cd4ba3 ("selftests/bpf: Convert tcp_hdr_options test to ASSERT_* macros") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Wang Yufen <wangyufen@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221107230420.4192307-3-martin.lau@linux.dev
2022-11-11	bpf: Add hwtstamp field for the sockops prog	Martin KaFai Lau
	The bpf-tc prog has already been able to access the skb_hwtstamps(skb)->hwtstamp. This patch extends the same hwtstamp access to the sockops prog. In sockops, the skb is also available to the bpf prog during the BPF_SOCK_OPS_PARSE_HDR_OPT_CB event. There is a use case that the hwtstamp will be useful to the sockops prog to better measure the one-way-delay when the sender has put the tx timestamp in the tcp header option. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221107230420.4192307-2-martin.lau@linux.dev
2022-11-11	selftests/bpf: Fix xdp_synproxy compilation failure in 32-bit arch	Yang Jihong
	xdp_synproxy fails to be compiled in the 32-bit arch, log is as follows: xdp_synproxy.c: In function 'parse_options': xdp_synproxy.c:175:36: error: left shift count >= width of type [-Werror=shift-count-overflow] 175 \| *tcpipopts = (mss6 << 32) \| (ttl << 24) \| (wscale << 16) \| mss4; \| ^~ xdp_synproxy.c: In function 'syncookie_open_bpf_maps': xdp_synproxy.c:289:28: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] 289 \| .map_ids = (__u64)map_ids, \| ^ Fix it. Fixes: fb5cd0ce70d4 ("selftests/bpf: Add selftests for raw syncookie helpers") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221111030836.37632-1-yangjihong1@huawei.com
2022-11-11	selftests: bpf: Add a test when bpf_probe_read_kernel_str() returns EFAULT	Alban Crequy
	This commit tests previous fix of bpf_probe_read_kernel_str(). The BPF helper bpf_probe_read_kernel_str should return -EFAULT when given a bad source pointer and the target buffer should only be modified to make the string NULL terminated. bpf_probe_read_kernel_str() was previously inserting a NULL before the beginning of the dst buffer. This test should ensure that the implementation stays correct for now on. Without the fix, this test will fail as follows: $ cd tools/testing/selftests/bpf $ make $ sudo ./test_progs --name=varlen ... test_varlen:FAIL:check got 0 != exp 66 Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com> Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221110085614.111213-3-albancrequy@linux.microsoft.com Changes v1 to v2: - add ack tag - fix my email - rebase on bpf tree and tag for bpf tree
2022-11-11	libbpf: Hashmap.h update to fix build issues using LLVM14	Eduard Zingerman
	A fix for the LLVM compilation error while building bpftool. Replaces the expression: _Static_assert((p) == NULL \|\| ...) by expression: _Static_assert((__builtin_constant_p((p)) ? (p) == NULL : 0) \|\| ...) When "p" is not a constant the former is not considered to be a constant expression by LLVM 14. The error was introduced in the following patch-set: [1]. The error was reported here: [2]. [1] https://lore.kernel.org/bpf/20221109142611.879983-1-eddyz87@gmail.com/ [2] https://lore.kernel.org/all/202211110355.BcGcbZxP-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Fixes: c302378bc157 ("libbpf: Hashmap interface update to allow both long and void* keys/values") Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221110223240.1350810-1-eddyz87@gmail.com
2022-11-11	Merge tag 'perf-tools-fixes-for-v6.1-2-2022-11-10' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix 'perf stat' crash with --per-node --metric-only in CSV mode, due to the AGGR_NODE slot in the 'aggr_header_csv' array not being set. - Fix printing prefix in CSV output of 'perf stat' metrics in interval mode (-I), where an extra separator was being added to the start of some lines. - Fix skipping branch stack sampling 'perf test' entry, that was using both --branch-any and --branch-filter, which can't be used together. * tag 'perf-tools-fixes-for-v6.1-2-2022-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf tools: Add the include/perf/ directory to .gitignore perf test: Fix skipping branch stack sampling test perf stat: Fix printing os->prefix in CSV metrics output perf stat: Fix crash with --per-node --metric-only in CSV mode
2022-11-11	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm "This is a pretty large diffstat for this time of the release. The main culprit is a reorganization of the AMD assembly trampoline, allowing percpu variables to be accessed early. This is needed for the return stack depth tracking retbleed mitigation that will be in 6.2, but it also makes it possible to tighten the IBRS restore on vmexit. The latter change is a long tail of the spectrev2/retbleed patches (the corresponding Intel change was simpler and went in already last June), which is why I am including it right now instead of sharing a topic branch with tip. Being assembly and being rich in comments makes the line count balloon a bit, but I am pretty confident in the change (famous last words) because the reorganization actually makes everything simpler and more understandable than before. It has also had external review and has been tested on the aforementioned 6.2 changes, which explode quite brutally without the fix. Apart from this, things are pretty normal. s390: - PCI fix - PV clock fix x86: - Fix clash between PMU MSRs and other MSRs - Prepare SVM assembly trampoline for 6.2 retbleed mitigation and for... - ... tightening IBRS restore on vmexit, moving it before the first RET or indirect branch - Fix log level for VMSA dump - Block all page faults during kvm_zap_gfn_range() Tools: - kvm_stat: fix incorrect detection of debugfs - kvm_stat: update vmexit definitions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Block all page faults during kvm_zap_gfn_range() KVM: x86/pmu: Limit the maximum number of supported AMD GP counters KVM: x86/pmu: Limit the maximum number of supported Intel GP counters KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yet KVM: SVM: Only dump VMSA to klog at KERN_DEBUG level tools/kvm_stat: update exit reasons for vmx/svm/aarch64/userspace tools/kvm_stat: fix incorrect detection of debugfs x86, KVM: remove unnecessary argument to x86_virt_spec_ctrl and callers KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly KVM: SVM: restore host save area from assembly KVM: SVM: move guest vmsave/vmload back to assembly KVM: SVM: do not allocate struct svm_cpu_data dynamically KVM: SVM: remove dead field from struct svm_cpu_data KVM: SVM: remove unused field from struct vcpu_svm KVM: SVM: retrieve VMCB from assembly KVM: SVM: adjust register allocation for __svm_vcpu_run() KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm KVM: x86: use a separate asm-offsets.c file KVM: s390: pci: Fix allocation size of aift kzdev elements KVM: s390: pv: don't allow userspace to set the clock under PV
2022-11-10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	drivers/net/can/pch_can.c ae64438be192 ("can: dev: fix skb drop check") 1dd1b521be85 ("can: remove obsolete PCH CAN driver") https://lore.kernel.org/all/20221110102509.1f7d63cc@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10	Merge tag 'net-6.1-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wifi, can and bpf. Current release - new code bugs: - can: af_can: can_exit(): add missing dev_remove_pack() of canxl_packet Previous releases - regressions: - bpf, sockmap: fix the sk->sk_forward_alloc warning - wifi: mac80211: fix general-protection-fault in ieee80211_subif_start_xmit() - can: af_can: fix NULL pointer dereference in can_rx_register() - can: dev: fix skb drop check, avoid o-o-b access - nfnetlink: fix potential dead lock in nfnetlink_rcv_msg() Previous releases - always broken: - bpf: fix wrong reg type conversion in release_reference() - gso: fix panic on frag_list with mixed head alloc types - wifi: brcmfmac: fix buffer overflow in brcmf_fweh_event_worker() - wifi: mac80211: set TWT Information Frame Disabled bit as 1 - eth: macsec offload related fixes, make sure to clear the keys from memory - tun: fix memory leaks in the use of napi_get_frags - tun: call napi_schedule_prep() to ensure we own a napi - tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent - ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network - tipc: fix a msg->req tlv length check - sctp: clear out_curr if all frag chunks of current msg are pruned, avoid list corruption - mctp: fix an error handling path in mctp_init(), avoid leaks" * tag 'net-6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits) eth: sp7021: drop free_netdev() from spl2sw_init_netdev() MAINTAINERS: Move Vivien to CREDITS net: macvlan: fix memory leaks of macvlan_common_newlink ethernet: tundra: free irq when alloc ring failed in tsi108_open() net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open() ethernet: s2io: disable napi when start nic failed in s2io_card_up() net: atlantic: macsec: clear encryption keys from the stack net: phy: mscc: macsec: clear encryption keys when freeing a flow stmmac: dwmac-loongson: fix missing of_node_put() while module exiting stmmac: dwmac-loongson: fix missing pci_disable_device() in loongson_dwmac_probe() stmmac: dwmac-loongson: fix missing pci_disable_msi() while module exiting cxgb4vf: shut down the adapter when t4vf_update_port_info() failed in cxgb4vf_open() mctp: Fix an error handling path in mctp_init() stmmac: intel: Update PCH PTP clock rate from 200MHz to 204.8MHz net: cxgb3_main: disable napi when bind qsets failed in cxgb_up() net: cpsw: disable napi in cpsw_ndo_open() iavf: Fix VF driver counting VLAN 0 filters ice: Fix spurious interrupt during removal of trusted VF net/mlx5e: TC, Fix slab-out-of-bounds in parse_tc_actions net/mlx5e: E-Switch, Fix comparing termination table instance ...
2022-11-10	KVM: selftests: aarch64: Add mix of tests into page_fault_test	Ricardo Koller
	Add some mix of tests into page_fault_test: memory regions with all the pairwise combinations of read-only, userfaultfd, and dirty-logging. For example, writing into a read-only region which has a hole handled with userfaultfd. Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-15-ricarkol@google.com
2022-11-10	KVM: selftests: aarch64: Add readonly memslot tests into page_fault_test	Ricardo Koller
	Add some readonly memslot tests into page_fault_test. Mark the data and/or page-table memory regions as readonly, perform some accesses, and check that the right fault is triggered when expected (e.g., a store with no write-back should lead to an mmio exit). Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-14-ricarkol@google.com
2022-11-10	KVM: selftests: aarch64: Add dirty logging tests into page_fault_test	Ricardo Koller
	Add some dirty logging tests into page_fault_test. Mark the data and/or page-table memory regions for dirty logging, perform some accesses, and check that the dirty log bits are set or clean when expected. Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-13-ricarkol@google.com
2022-11-10	KVM: selftests: aarch64: Add userfaultfd tests into page_fault_test	Ricardo Koller
	Add some userfaultfd tests into page_fault_test. Punch holes into the data and/or page-table memslots, perform some accesses, and check that the faults are taken (or not taken) when expected. Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-12-ricarkol@google.com
2022-11-10	KVM: selftests: aarch64: Add aarch64/page_fault_test	Ricardo Koller
	Add a new test for stage 2 faults when using different combinations of guest accesses (e.g., write, S1PTW), backing source type (e.g., anon) and types of faults (e.g., read on hugetlbfs with a hole). The next commits will add different handling methods and more faults (e.g., uffd and dirty logging). This first commit starts by adding two sanity checks for all types of accesses: AF setting by the hw, and accessing memslots with holes. Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-11-ricarkol@google.com
2022-11-10	KVM: selftests: Use the right memslot for code, page-tables, and data ↵	Ricardo Koller
	allocations Now that kvm_vm allows specifying different memslots for code, page tables, and data, use the appropriate memslot when making allocations in common/libraty code. Change them accordingly: - code (allocated by lib/elf) use the CODE memslot - stacks, exception tables, and other core data pages (like the TSS in x86) use the DATA memslot - page tables and the PGD use the PT memslot - test data (anything allocated with vm_vaddr_alloc()) uses the TEST_DATA memslot No functional change intended. All allocators keep using memslot #0. Cc: Sean Christopherson <seanjc@google.com> Cc: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-10-ricarkol@google.com
2022-11-10	KVM: selftests: Fix alignment in virt_arch_pgd_alloc() and vm_vaddr_alloc()	Ricardo Koller
	Refactor virt_arch_pgd_alloc() and vm_vaddr_alloc() in both RISC-V and aarch64 to fix the alignment of parameters in a couple of calls. This will make it easier to fix the alignment in a future commit that adds an extra parameter (that happens to be very long). No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-9-ricarkol@google.com
2022-11-10	KVM: selftests: Add vm->memslots[] and enum kvm_mem_region_type	Ricardo Koller
	The vm_create() helpers are hardcoded to place most page types (code, page-tables, stacks, etc) in the same memslot #0, and always backed with anonymous 4K. There are a couple of issues with that. First, tests willing to differ a bit, like placing page-tables in a different backing source type must replicate much of what's already done by the vm_create() functions. Second, the hardcoded assumption of memslot #0 holding most things is spread everywhere; this makes it very hard to change. Fix the above issues by having selftests specify how they want memory to be laid out. Start by changing ____vm_create() to not create memslot #0; a test (to come) will specify all memslots used by the VM. Then, add the vm->memslots[] array to specify the right memslot for different memory allocators, e.g.,: lib/elf should use the vm->[MEM_REGION_CODE] memslot. This will be used as a way to specify the page-tables memslots (to be backed by huge pages for example). There is no functional change intended. The current commit lays out memory exactly as before. A future commit will change the allocators to get the region they should be using, e.g.,: like the page table allocators using the pt memslot. Cc: Sean Christopherson <seanjc@google.com> Cc: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-8-ricarkol@google.com
2022-11-10	KVM: selftests: Stash backing_src_type in struct userspace_mem_region	Ricardo Koller
	Add the backing_src_type into struct userspace_mem_region. This struct already stores a lot of info about memory regions, except the backing source type. This info will be used by a future commit in order to determine the method for punching a hole. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-7-ricarkol@google.com
2022-11-10	tools: Copy bitfield.h from the kernel sources	Ricardo Koller
	Copy bitfield.h from include/linux/bitfield.h. A subsequent change will make use of some FIELD_{GET,PREP} macros defined in this header. The header was copied as-is, no changes needed. Cc: Jakub Kicinski <kuba@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-6-ricarkol@google.com
2022-11-10	KVM: selftests: aarch64: Construct DEFAULT_MAIR_EL1 using sysreg.h macros	Ricardo Koller
	Define macros for memory type indexes and construct DEFAULT_MAIR_EL1 with macros from asm/sysreg.h. The index macros can then be used when constructing PTEs (instead of using raw numbers). Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-5-ricarkol@google.com
2022-11-10	KVM: selftests: Add missing close and munmap in __vm_mem_region_delete()	Ricardo Koller
	Deleting a memslot (when freeing a VM) is not closing the backing fd, nor it's unmapping the alias mapping. Fix by adding the missing close and munmap. Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-4-ricarkol@google.com
2022-11-10	KVM: selftests: aarch64: Add virt_get_pte_hva() library function	Ricardo Koller
	Add a library function to get the PTE (a host virtual address) of a given GVA. This will be used in a future commit by a test to clear and check the access flag of a particular page. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-3-ricarkol@google.com
2022-11-10	KVM: selftests: Add a userfaultfd library	Ricardo Koller
	Move the generic userfaultfd code out of demand_paging_test.c into a common library, userfaultfd_util. This library consists of a setup and a stop function. The setup function starts a thread for handling page faults using the handler callback function. This setup returns a uffd_desc object which is then used in the stop function (to wait and destroy the threads). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221017195834.2295901-2-ricarkol@google.com
2022-11-10	KVM: arm64: selftests: Test with every breakpoint/watchpoint	Reiji Watanabe
	Currently, the debug-exceptions test always uses only {break,watch}point#0 and the highest numbered context-aware breakpoint. Modify the test to use all {break,watch}points and context-aware breakpoints supported on the system. Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020054202.2119018-10-reijiw@google.com
2022-11-10	KVM: arm64: selftests: Add a test case for a linked watchpoint	Reiji Watanabe
	Currently, the debug-exceptions test doesn't have a test case for a linked watchpoint. Add a test case for the linked watchpoint to the test. The new test case uses the highest numbered context-aware breakpoint (for Context ID match), and the watchpoint#0, which is linked to the context-aware breakpoint. Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020054202.2119018-9-reijiw@google.com
2022-11-10	KVM: arm64: selftests: Add a test case for a linked breakpoint	Reiji Watanabe
	Currently, the debug-exceptions test doesn't have a test case for a linked breakpoint. Add a test case for the linked breakpoint to the test. The new test case uses a pair of breakpoints. One is the higiest numbered context-aware breakpoint (for Context ID match), and the other one is the breakpoint#0 (for Address Match), which is linked to the context-aware breakpoint. Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020054202.2119018-8-reijiw@google.com
2022-11-10	KVM: arm64: selftests: Change debug_version() to take ID_AA64DFR0_EL1	Reiji Watanabe
	Change debug_version() to take the ID_AA64DFR0_EL1 value instead of vcpu as an argument, and change its callsite to read ID_AA64DFR0_EL1 (and pass it to debug_version()). Subsequent patches will reuse the register value in the callsite. No functional change intended. Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020054202.2119018-7-reijiw@google.com
2022-11-10	KVM: arm64: selftests: Stop unnecessary test stage tracking of debug-exceptions	Reiji Watanabe
	Currently, debug-exceptions test unnecessarily tracks some test stages using GUEST_SYNC(). The code for it needs to be updated as test cases are added or removed. Stop doing the unnecessary stage tracking, as they are not so useful and are a bit pain to maintain. Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020054202.2119018-6-reijiw@google.com
2022-11-10	KVM: arm64: selftests: Add helpers to enable debug exceptions	Reiji Watanabe
	Add helpers to enable breakpoint and watchpoint exceptions. No functional change intended. Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020054202.2119018-5-reijiw@google.com
2022-11-10	KVM: arm64: selftests: Remove the hard-coded {b,w}pn#0 from debug-exceptions	Reiji Watanabe
	Remove the hard-coded {break,watch}point #0 from the guest_code() in debug-exceptions to allow {break,watch}point number to be specified. Change reset_debug_state() to zeroing all dbg{b,w}{c,v}r_el0 registers so that guest_code() can use the function to reset those registers even when non-zero {break,watch}points are specified for guest_code(). Subsequent patches will add test cases for non-zero {break,watch}points. Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020054202.2119018-4-reijiw@google.com
2022-11-10	KVM: arm64: selftests: Add write_dbg{b,w}{c,v}r helpers in debug-exceptions	Reiji Watanabe
	Introduce helpers in the debug-exceptions test to write to dbg{b,w}{c,v}r registers. Those helpers will be useful for test cases that will be added to the test in subsequent patches. No functional change intended. Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020054202.2119018-3-reijiw@google.com
2022-11-10	KVM: arm64: selftests: Use FIELD_GET() to extract ID register fields	Reiji Watanabe
	Use FIELD_GET() macro to extract ID register fields for existing aarch64 selftests code. No functional change intended. Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020054202.2119018-2-reijiw@google.com
2022-11-10	KVM: selftests: memslot_perf_test: Report optimal memory slots	Gavin Shan
	The memory area in each slot should be aligned to host page size. Otherwise, the test will fail. For example, the following command fails with the following messages with 64KB-page-size-host and 4KB-pae-size-guest. It's not user friendly to abort the test. Lets do something to report the optimal memory slots, instead of failing the test. # ./memslot_perf_test -v -s 1000 Number of memory slots: 999 Testing map performance with 1 runs, 5 seconds each Adding slots 1..999, each slot with 8 pages + 216 extra pages last ==== Test Assertion Failure ==== lib/kvm_util.c:824: vm_adjust_num_guest_pages(vm->mode, npages) == npages pid=19872 tid=19872 errno=0 - Success 1 0x00000000004065b3: vm_userspace_mem_region_add at kvm_util.c:822 2 0x0000000000401d6b: prepare_vm at memslot_perf_test.c:273 3 (inlined by) test_execute at memslot_perf_test.c:756 4 (inlined by) test_loop at memslot_perf_test.c:994 5 (inlined by) main at memslot_perf_test.c:1073 6 0x0000ffff7ebb4383: ?? ??:0 7 0x00000000004021ff: _start at :? Number of guest pages is not compatible with the host. Try npages=16 Report the optimal memory slots instead of failing the test when the memory area in each slot isn't aligned to host page size. With this applied, the optimal memory slots is reported. # ./memslot_perf_test -v -s 1000 Number of memory slots: 999 Testing map performance with 1 runs, 5 seconds each Memslot count too high for this test, decrease the cap (max is 514) Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020071209.559062-7-gshan@redhat.com
2022-11-10	KVM: selftests: memslot_perf_test: Consolidate memory	Gavin Shan
	The addresses and sizes passed to vm_userspace_mem_region_add() and madvise() should be aligned to host page size, which can be 64KB on aarch64. So it's wrong by passing additional fixed 4KB memory area to various tests. Fix it by passing additional fixed 64KB memory area to various tests. We also add checks to ensure that none of host/guest page size exceeds 64KB. MEM_TEST_MOVE_SIZE is fixed up to 192KB either. With this, the following command works fine on 64KB-page-size-host and 4KB-page-size-guest. # ./memslot_perf_test -v -s 512 Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020071209.559062-6-gshan@redhat.com
2022-11-10	KVM: selftests: memslot_perf_test: Support variable guest page size	Gavin Shan
	The test case is obviously broken on aarch64 because non-4KB guest page size is supported. The guest page size on aarch64 could be 4KB, 16KB or 64KB. This supports variable guest page size, mostly for aarch64. - The host determines the guest page size when virtual machine is created. The value is also passed to guest through the synchronization area. - The number of guest pages are unknown until the virtual machine is to be created. So all the related macros are dropped. Instead, their values are dynamically calculated based on the guest page size. - The static checks on memory sizes and pages becomes dependent on guest page size, which is unknown until the virtual machine is about to be created. So all the static checks are converted to dynamic checks, done in check_memory_sizes(). - As the address passed to madvise() should be aligned to host page, the size of page chunk is automatically selected, other than one page. - MEM_TEST_MOVE_SIZE has fixed and non-working 64KB. It will be consolidated in next patch. However, the comments about how it's calculated has been correct. - All other changes included in this patch are almost mechanical replacing '4096' with 'guest_page_size'. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020071209.559062-5-gshan@redhat.com
2022-11-10	KVM: selftests: memslot_perf_test: Probe memory slots for once	Gavin Shan
	prepare_vm() is called in every iteration and run. The allowed memory slots (KVM_CAP_NR_MEMSLOTS) are probed for multiple times. It's not free and unnecessary. Move the probing logic for the allowed memory slots to parse_args() for once, which is upper layer of prepare_vm(). No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020071209.559062-4-gshan@redhat.com
2022-11-10	KVM: selftests: memslot_perf_test: Consolidate loop conditions in prepare_vm()	Gavin Shan
	There are two loops in prepare_vm(), which have different conditions. 'slot' is treated as meory slot index in the first loop, but index of the host virtual address array in the second loop. It makes it a bit hard to understand the code. Change the usage of 'slot' in the second loop, to treat it as the memory slot index either. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020071209.559062-3-gshan@redhat.com
2022-11-10	KVM: selftests: memslot_perf_test: Use data->nslots in prepare_vm()	Gavin Shan
	In prepare_vm(), 'data->nslots' is assigned with 'max_mem_slots - 1' at the beginning, meaning they are interchangeable. Use 'data->nslots' isntead of 'max_mem_slots - 1'. With this, it becomes easier to move the logic of probing number of slots into upper layer in subsequent patches. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221020071209.559062-2-gshan@redhat.com
2022-11-10	perf vendor events: Add Arm Neoverse V2 PMU events	James Clark
	Rename the neoverse-n2 folder to make it clear that it includes V2, and add V2 to mapfile.csv. V2 has the same events as N2, visible by running the following command in the ARM-software/data github repo [1]: diff pmu/neoverse-v2.json pmu/neoverse-n2.json \| grep code Testing: $ perf test pmu 10: PMU events : 10.1: PMU event table sanity : Ok 10.2: PMU event map aliases : Ok 10.3: Parsing of PMU event table metrics : Ok 10.4: Parsing of PMU event table metrics with fake PMUs : Ok [1]: https://github.com/ARM-software/data Reviewed-by: Nick Forrington <nick.forrington@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20221020134512.1345013-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-10	perf print-events: Remove redundant comparison with zero	Kang Minchul
	Since variable npmus is unsigned int, comparing with 0 is unnecessary. Signed-off-by: Kang Minchul <tegongkang@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221105135932.81612-1-tegongkang@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-10	perf data: Add tracepoint fields when converting to JSON	Dmitrii Dolgov
	When converting recorded data into JSON format, perf data omits probe variables. Add them to the output in the format "field name": "field value" using tep_print_field: $ perf data convert --to-json output.json // output.json { "linux-perf-json-version": 1, "headers": { ... }, "samples": [ { "timestamp": 29182079082999, "pid": 309194, [...] "__probe_ip": "0x93ee35", "query_string_string": "select 2;", "nxids": "0" } ] } Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20221109103932.65675-1-9erthalion6@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-10	perf lock: Allow concurrent record and report	Namhyung Kim
	To support live monitoring of kernel lock contention without BPF, it should support something like below: # perf lock record -a -o- sleep 1 \| perf lock contention -i- contended total wait max wait avg wait type caller 2 10.27 us 6.17 us 5.13 us spinlock load_balance+0xc03 1 5.29 us 5.29 us 5.29 us rwlock:W ep_scan_ready_list+0x54 1 4.12 us 4.12 us 4.12 us spinlock smpboot_thread_fn+0x116 1 3.28 us 3.28 us 3.28 us mutex pipe_read+0x50 To do that, it needs to handle HEAD_ATTR, HEADER_EVENT_UPDATE and HEADER_TRACING_DATA which are generated only for the pipe mode. And setting event handler also should be delayed until it gets the event information. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221104051440.220989-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-10	perf trace: Add augmenter for clock_gettime's rqtp timespec arg	Arnaldo Carvalho de Melo
	One more before going the BTF way: # perf trace -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o,*nanosleep ? pool-gsd-smart/2893 ... [continued]: clock_nanosleep()) = 0 ? gpm/1042 ... [continued]: clock_nanosleep()) = 0 1.232 pool-gsd-smart/2893 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f64d7ffec50) ... 1.232 pool-gsd-smart/2893 ... [continued]: clock_nanosleep()) = 0 327.329 gpm/1042 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffddfd1cf20) ... 1002.482 pool-gsd-smart/2893 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f64d7ffec50) = 0 327.329 gpm/1042 ... [continued]: clock_nanosleep()) = 0 2003.947 pool-gsd-smart/2893 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f64d7ffec50) ... 2003.947 pool-gsd-smart/2893 ... [continued]: clock_nanosleep()) = 0 2327.858 gpm/1042 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffddfd1cf20) ... ? crond/1384 ... [continued]: clock_nanosleep()) = 0 3005.382 pool-gsd-smart/2893 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f64d7ffec50) ... 3005.382 pool-gsd-smart/2893 ... [continued]: clock_nanosleep()) = 0 3675.633 crond/1384 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffcc02b66b0) ... ^C# Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-10	KVM: selftests: Automate choosing dirty ring size in dirty_log_test	Gavin Shan
	In the dirty ring case, we rely on vcpu exit due to full dirty ring state. On ARM64 system, there are 4096 host pages when the host page size is 64KB. In this case, the vcpu never exits due to the full dirty ring state. The similar case is 4KB page size on host and 64KB page size on guest. The vcpu corrupts same set of host pages, but the dirty page information isn't collected in the main thread. This leads to infinite loop as the following log shows. # ./dirty_log_test -M dirty-ring -c 65536 -m 5 Setting log mode to: 'dirty-ring' Test iterations: 32, interval: 10 (ms) Testing guest mode: PA-bits:40, VA-bits:48, 4K pages guest physical test memory offset: 0xffbffe0000 vcpu stops because vcpu is kicked out... Notifying vcpu to continue vcpu continues now. Iteration 1 collected 576 pages <No more output afterwards> Fix the issue by automatically choosing the best dirty ring size, to ensure vcpu exit due to full dirty ring state. The option '-c' becomes a hint to the dirty ring count, instead of the value of it. Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110104914.31280-8-gshan@redhat.com
2022-11-10	KVM: selftests: Clear dirty ring states between two modes in dirty_log_test	Gavin Shan
	There are two states, which need to be cleared before next mode is executed. Otherwise, we will hit failure as the following messages indicate. - The variable 'dirty_ring_vcpu_ring_full' shared by main and vcpu thread. It's indicating if the vcpu exit due to full ring buffer. The value can be carried from previous mode (VM_MODE_P40V48_4K) to current one (VM_MODE_P40V48_64K) when VM_MODE_P40V48_16K isn't supported. - The current ring buffer index needs to be reset before next mode (VM_MODE_P40V48_64K) is executed. Otherwise, the stale value is carried from previous mode (VM_MODE_P40V48_4K). # ./dirty_log_test -M dirty-ring Setting log mode to: 'dirty-ring' Test iterations: 32, interval: 10 (ms) Testing guest mode: PA-bits:40, VA-bits:48, 4K pages guest physical test memory offset: 0xffbfffc000 : Dirtied 995328 pages Total bits checked: dirty (1012434), clear (7114123), track_next (966700) Testing guest mode: PA-bits:40, VA-bits:48, 64K pages guest physical test memory offset: 0xffbffc0000 vcpu stops because vcpu is kicked out... vcpu continues now. Notifying vcpu to continue Iteration 1 collected 0 pages vcpu stops because dirty ring is full... vcpu continues now. vcpu stops because dirty ring is full... vcpu continues now. vcpu stops because dirty ring is full... ==== Test Assertion Failure ==== dirty_log_test.c:369: cleared == count pid=10541 tid=10541 errno=22 - Invalid argument 1 0x0000000000403087: dirty_ring_collect_dirty_pages at dirty_log_test.c:369 2 0x0000000000402a0b: log_mode_collect_dirty_pages at dirty_log_test.c:492 3 (inlined by) run_test at dirty_log_test.c:795 4 (inlined by) run_test at dirty_log_test.c:705 5 0x0000000000403a37: for_each_guest_mode at guest_modes.c:100 6 0x0000000000401ccf: main at dirty_log_test.c:938 7 0x0000ffff9ecd279b: ?? ??:0 8 0x0000ffff9ecd286b: ?? ??:0 9 0x0000000000401def: _start at ??:? Reset dirty pages (0) mismatch with collected (35566) Fix the issues by clearing 'dirty_ring_vcpu_ring_full' and the ring buffer index before next new mode is to be executed. Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110104914.31280-7-gshan@redhat.com
2022-11-10	KVM: selftests: Use host page size to map ring buffer in dirty_log_test	Gavin Shan
	In vcpu_map_dirty_ring(), the guest's page size is used to figure out the offset in the virtual area. It works fine when we have same page sizes on host and guest. However, it fails when the page sizes on host and guest are different on arm64, like below error messages indicates. # ./dirty_log_test -M dirty-ring -m 7 Setting log mode to: 'dirty-ring' Test iterations: 32, interval: 10 (ms) Testing guest mode: PA-bits:40, VA-bits:48, 64K pages guest physical test memory offset: 0xffbffc0000 vcpu stops because vcpu is kicked out... Notifying vcpu to continue vcpu continues now. ==== Test Assertion Failure ==== lib/kvm_util.c:1477: addr == MAP_FAILED pid=9000 tid=9000 errno=0 - Success 1 0x0000000000405f5b: vcpu_map_dirty_ring at kvm_util.c:1477 2 0x0000000000402ebb: dirty_ring_collect_dirty_pages at dirty_log_test.c:349 3 0x00000000004029b3: log_mode_collect_dirty_pages at dirty_log_test.c:478 4 (inlined by) run_test at dirty_log_test.c:778 5 (inlined by) run_test at dirty_log_test.c:691 6 0x0000000000403a57: for_each_guest_mode at guest_modes.c:105 7 0x0000000000401ccf: main at dirty_log_test.c:921 8 0x0000ffffb06ec79b: ?? ??:0 9 0x0000ffffb06ec86b: ?? ??:0 10 0x0000000000401def: _start at ??:? Dirty ring mapped private Fix the issue by using host's page size to map the ring buffer. Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110104914.31280-6-gshan@redhat.com
2022-11-10	tools/vm/slabinfo: indicates the cause of the EACCES error	Rong Tao
	If you don't run slabinfo with a superuser, return 0 when read_slab_dir() reads get_obj_and_str("slabs", &t), because fopen() fails (sometimes EACCES), causing slabcache() to return directly, without any error during this time, we should tell the user about the EACCES problem instead of running successfully($?=0) without any error printing. For example: $ ./slabinfo Permission denied, Try using superuser <== What this submission did $ sudo ./slabinfo Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg Acpi-Namespace 5950 48 286.7K 65/0/5 85 0 0 99 Acpi-Operand 13664 72 999.4K 231/0/13 56 0 0 98 ... Signed-off-by: Rong Tao <rongtao@cestc.cn> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>