linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-01-20	bpf: selftests: Get rid of CHECK macro in xdp_adjust_tail.c	Lorenzo Bianconi
	Rely on ASSERT* macros and get rid of deprecated CHECK ones in xdp_adjust_tail bpf selftest. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/c0ab002ffa647a20ec9e584214bf0d4373142b54.1642679130.git.lorenzo@kernel.org
2022-01-20	selftests: kvm/x86: Fix the warning in lib/x86_64/processor.c	Jinrong Liang
	The following warning appears when executing make -C tools/testing/selftests/kvm include/x86_64/processor.h:290:2: warning: 'ecx' may be used uninitialized in this function [-Wmaybe-uninitialized] asm volatile("cpuid" ^~~ lib/x86_64/processor.c:1523:21: note: 'ecx' was declared here uint32_t eax, ebx, ecx, edx, max_ext_leaf; Just initialize ecx to remove this warning. Fixes: c8cc43c1eae2 ("selftests: KVM: avoid failures due to reserved HyperTransport region") Signed-off-by: Jinrong Liang <cloudliang@tencent.com> Message-Id: <20220119140325.59369-1-cloudliang@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-20	selftests: kvm/x86: Fix the warning in pmu_event_filter_test.c	Jinrong Liang
	The following warning appears when executing make -C tools/testing/selftests/kvm x86_64/pmu_event_filter_test.c: In function 'vcpu_supports_intel_br_retired': x86_64/pmu_event_filter_test.c:241:28: warning: variable 'cpuid' set but not used [-Wunused-but-set-variable] 241 \| struct kvm_cpuid2 cpuid; \| ^~~~~ x86_64/pmu_event_filter_test.c: In function 'vcpu_supports_amd_zen_br_retired': x86_64/pmu_event_filter_test.c:258:28: warning: variable 'cpuid' set but not used [-Wunused-but-set-variable] 258 \| struct kvm_cpuid2 cpuid; \| ^~~~~ Just delete the unused variables to stay away from warnings. Fixes: dc7e75b3b3ee ("selftests: kvm/x86: Add test for KVM_SET_PMU_EVENT_FILTER") Signed-off-by: Jinrong Liang <cloudliang@tencent.com> Message-Id: <20220119133910.56285-1-cloudliang@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-20	tools headers UAPI: Sync files changed by new set_mempolicy_home_node syscall	Arnaldo Carvalho de Melo
	To pick the changes in these csets: 21b084fdf2a49ca1 ("mm/mempolicy: wire up syscall set_mempolicy_home_node") That add support for this new syscall in tools such as 'perf trace'. For instance, this is now possible: [root@five ~]# perf trace -e set_mempolicy_home_node ^C[root@five ~]# [root@five ~]# perf trace -v -e set_mempolicy_home_node Using CPUID AuthenticAMD-25-21-0 event qualifier tracepoint filter: (common_pid != 253729 && common_pid != 3585) && (id == 450) mmap size 528384B ^C[root@five ~] [root@five ~]# perf trace -v -e set* --max-events 5 Using CPUID AuthenticAMD-25-21-0 event qualifier tracepoint filter: (common_pid != 253734 && common_pid != 3585) && (id == 38 \|\| id == 54 \|\| id == 105 \|\| id == 106 \|\| id == 109 \|\| id == 112 \|\| id == 113 \|\| id == 114 \|\| id == 116 \|\| id == 117 \|\| id == 119 \|\| id == 122 \|\| id == 123 \|\| id == 141 \|\| id == 160 \|\| id == 164 \|\| id == 170 \|\| id == 171 \|\| id == 188 \|\| id == 205 \|\| id == 218 \|\| id == 238 \|\| id == 273 \|\| id == 308 \|\| id == 450) mmap size 528384B 0.000 ( 0.008 ms): bash/253735 setpgid(pid: 253735 (bash), pgid: 253735 (bash)) = 0 6849.011 ( 0.008 ms): bash/16046 setpgid(pid: 253736 (bash), pgid: 253736 (bash)) = 0 6849.080 ( 0.005 ms): bash/253736 setpgid(pid: 253736 (bash), pgid: 253736 (bash)) = 0 7437.718 ( 0.009 ms): gnome-shell/253737 set_robust_list(head: 0x7f34b527e920, len: 24) = 0 13445.986 ( 0.010 ms): bash/16046 setpgid(pid: 253738 (bash), pgid: 253738 (bash)) = 0 [root@five ~]# That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints. $ find tools/perf/arch/ -name "syscall*tbl" \| xargs grep -w set_mempolicy_home_node tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node tools/perf/arch/s390/entry/syscalls/syscall.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node $ $ grep -w set_mempolicy_home_node /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c [450] = "set_mempolicy_home_node", $ This addresses these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h' diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl' diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl' diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl' diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-20	Merge tag 'net-5.17-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, bpf. Quite a handful of old regression fixes but most of those are pre-5.16. Current release - regressions: - fix memory leaks in the skb free deferral scheme if upper layer protocols are used, i.e. in-kernel TCP readers like TLS Current release - new code bugs: - nf_tables: fix NULL check typo in _clone() functions - change the default to y for Vertexcom vendor Kconfig - a couple of fixes to incorrect uses of ref tracking - two fixes for constifying netdev->dev_addr Previous releases - regressions: - bpf: - various verifier fixes mainly around register offset handling when passed to helper functions - fix mount source displayed for bpffs (none -> bpffs) - bonding: - fix extraction of ports for connection hash calculation - fix bond_xmit_broadcast return value when some devices are down - phy: marvell: add Marvell specific PHY loopback - sch_api: don't skip qdisc attach on ingress, prevent ref leak - htb: restore minimal packet size handling in rate control - sfp: fix high power modules without diagnostic monitoring - mscc: ocelot: - don't let phylink re-enable TX PAUSE on the NPI port - don't dereference NULL pointers with shared tc filters - smsc95xx: correct reset handling for LAN9514 - cpsw: avoid alignment faults by taking NET_IP_ALIGN into account - phy: micrel: use kszphy_suspend/_resume for irq aware devices, avoid races with the interrupt Previous releases - always broken: - xdp: check prog type before updating BPF link - smc: resolve various races around abnormal connection termination - sit: allow encapsulated IPv6 traffic to be delivered locally - axienet: fix init/reset handling, add missing barriers, read the right status words, stop queues correctly - add missing dev_put() in sock_timestamping_bind_phc() Misc: - ipv4: prevent accidentally passing RTO_ONLINK to ip_route_output_key_hash() by sanitizing flags - ipv4: avoid quadratic behavior in netns dismantle - stmmac: dwmac-oxnas: add support for OX810SE - fsl: xgmac_mdio: add workaround for erratum A-009885" * tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits) ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys ipv4: avoid quadratic behavior in netns dismantle net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses dt-bindings: net: Document fsl,erratum-a009885 net/fsl: xgmac_mdio: Add workaround for erratum A-009885 net: mscc: ocelot: fix using match before it is set net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind() net: axienet: increase default TX ring size to 128 net: axienet: fix for TX busy handling net: axienet: fix number of TX ring slots for available check net: axienet: Fix TX ring slot available check net: axienet: limit minimum TX ring size net: axienet: add missing memory barriers net: axienet: reset core on initialization prior to MDIO access net: axienet: Wait for PhyRstCmplt after core reset net: axienet: increase reset timeout bpf, selftests: Add ringbuf memory type confusion test ...
2022-01-20	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds
	Merge more updates from Andrew Morton: "55 patches. Subsystems affected by this patch series: percpu, procfs, sysctl, misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2, hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits) lib: remove redundant assignment to variable ret ubsan: remove CONFIG_UBSAN_OBJECT_SIZE kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB btrfs: use generic Kconfig option for 256kB page size limit arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB configs: introduce debug.config for CI-like setup delayacct: track delays from memory compact Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact delayacct: cleanup flags in struct task_delay_info and functions use it delayacct: fix incomplete disable operation when switch enable to disable delayacct: support swapin delay accounting for swapping without blkio panic: remove oops_id panic: use error_report_end tracepoint on warnings fs/adfs: remove unneeded variable make code cleaner FAT: use io_schedule_timeout() instead of congestion_wait() hfsplus: use struct_group_attr() for memcpy() region nilfs2: remove redundant pointer sbufs fs/binfmt_elf: use PT_LOAD p_align values for static PIE const_structs.checkpatch: add frequently used ops structs ...
2022-01-20	delayacct: track delays from memory compact	wangyong
	Delay accounting does not track the delay of memory compact. When there is not enough free memory, tasks can spend a amount of their time waiting for compact. To get the impact of tasks in direct memory compact, measure the delay when allocating memory through memory compact. Also update tools/accounting/getdelays.c: / # ./getdelays_next -di -p 304 print delayacct stats ON printing IO accounting PID 304 CPU count real total virtual total delay total delay average 277 780000000 849039485 18877296 0.068ms IO count delay total delay average 0 0 0ms SWAP count delay total delay average 0 0 0ms RECLAIM count delay total delay average 5 11088812685 2217ms THRASHING count delay total delay average 0 0 0ms COMPACT count delay total delay average 3 72758 0ms watch: read=0, write=0, cancelled_write=0 Link: https://lkml.kernel.org/r/1638619795-71451-1-git-send-email-wang.yong12@zte.com.cn Signed-off-by: wangyong <wang.yong12@zte.com.cn> Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn> Reviewed-by: Zhang Wenya <zhang.wenya1@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20	hash.h: remove unused define directive	Isabella Basso
	Patch series "test_hash.c: refactor into KUnit", v3. We refactored the lib/test_hash.c file into KUnit as part of the student group LKCAMP [1] introductory hackathon for kernel development. This test was pointed to our group by Daniel Latypov [2], so its full conversion into a pure KUnit test was our goal in this patch series, but we ran into many problems relating to it not being split as unit tests, which complicated matters a bit, as the reasoning behind the original tests is quite cryptic for those unfamiliar with hash implementations. Some interesting developments we'd like to highlight are: - In patch 1/5 we noticed that there was an unused define directive that could be removed. - In patch 4/5 we noticed how stringhash and hash tests are all under the lib/test_hash.c file, which might cause some confusion, and we also broke those kernel config entries up. Overall KUnit developments have been made in the other patches in this series: In patches 2/5, 3/5 and 5/5 we refactored the lib/test_hash.c file so as to make it more compatible with the KUnit style, whilst preserving the original idea of the maintainer who designed it (i.e. George Spelvin), which might be undesirable for unit tests, but we assume it is enough for a first patch. This patch (of 5): Currently, there exist hash_32() and __hash_32() functions, which were introduced in a patch [1] targeting architecture specific optimizations. These functions can be overridden on a per-architecture basis to achieve such optimizations. They must set their corresponding define directive (HAVE_ARCH_HASH_32 and HAVE_ARCH__HASH_32, respectively) so that header files can deal with these overrides properly. As the supported 32-bit architectures that have their own hash function implementation (i.e. m68k, Microblaze, H8/300, pa-risc) have only been making use of the (more general) __hash_32() function (which only lacks a right shift operation when compared to the hash_32() function), remove the define directive corresponding to the arch-specific hash_32() implementation. [1] https://lore.kernel.org/lkml/20160525073311.5600.qmail@ns.sciencehorizons.net/ [akpm@linux-foundation.org: hash_32_generic() becomes hash_32()] Link: https://lkml.kernel.org/r/20211208183711.390454-1-isabbasso@riseup.net Link: https://lkml.kernel.org/r/20211208183711.390454-2-isabbasso@riseup.net Reviewed-by: David Gow <davidgow@google.com> Tested-by: David Gow <davidgow@google.com> Co-developed-by: Augusto Durães Camargo <augusto.duraes33@gmail.com> Signed-off-by: Augusto Durães Camargo <augusto.duraes33@gmail.com> Co-developed-by: Enzo Ferreira <ferreiraenzoa@gmail.com> Signed-off-by: Enzo Ferreira <ferreiraenzoa@gmail.com> Signed-off-by: Isabella Basso <isabbasso@riseup.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20	tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN	Yafang Shao
	As the sched:sched_switch tracepoint args are derived from the kernel, we'd better make it same with the kernel. So the macro TASK_COMM_LEN is converted to type enum, then all the BPF programs can get it through BTF. The BPF program which wants to use TASK_COMM_LEN should include the header vmlinux.h. Regarding the test_stacktrace_map and test_tracepoint, as the type defined in linux/bpf.h are also defined in vmlinux.h, so we don't need to include linux/bpf.h again. Link: https://lkml.kernel.org/r/20211120112738.45980-8-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20	tools/bpf/bpftool/skeleton: replace bpf_probe_read_kernel with ↵	Yafang Shao
	bpf_probe_read_kernel_str to get task comm bpf_probe_read_kernel_str() will add a nul terminator to the dst, then we don't care about if the dst size is big enough. Link: https://lkml.kernel.org/r/20211120112738.45980-7-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-19	selftests/bpf: Update sockopt_sk test to the use bpf_set_retval	YiFei Zhu
	The tests would break without this patch, because at one point it calls getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen) This getsockopt receives the kernel-set -EINVAL. Prior to this patch series, the eBPF getsockopt hook's -EPERM would override kernel's -EINVAL, however, after this patch series, return 0's automatic -EPERM will not; the eBPF prog has to explicitly bpf_set_retval(-EPERM) if that is wanted. I also removed the explicit mentions of EPERM in the comments in the prog. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/4f20b77cb46812dbc2bdcd7e3fa87c7573bde55e.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-19	selftests/bpf: Test bpf_{get,set}_retval behavior with cgroup/sockopt	YiFei Zhu
	The tests checks how different ways of interacting with the helpers (getting retval, setting EUNATCH, EISCONN, and legacy reject returning 0 without setting retval), produce different results in both the setsockopt syscall and the retval returned by the helper. A few more tests verify the interaction between the retval of the helper and the retval in getsockopt context. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/43ec60d679ae3f4f6fd2460559c28b63cb93cd12.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-19	bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value	YiFei Zhu
	The helpers continue to use int for retval because all the hooks are int-returning rather than long-returning. The return value of bpf_set_retval is int for future-proofing, in case in the future there may be errors trying to set the retval. After the previous patch, if a program rejects a syscall by returning 0, an -EPERM will be generated no matter if the retval is already set to -err. This patch change it being forced only if retval is not -err. This is because we want to support, for example, invoking bpf_set_retval(-EINVAL) and return 0, and have the syscall return value be -EINVAL not -EPERM. For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is that, if the return value is NET_XMIT_DROP, the packet is silently dropped. We preserve this behavior for backward compatibility reasons, so even if an errno is set, the errno does not return to caller. However, setting a non-err to retval cannot propagate so this is not allowed and we return a -EFAULT in that case. Signed-off-by: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/b4013fd5d16bed0b01977c1fafdeae12e1de61fb.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-19	libbpf: Improve btf__add_btf() with an additional hashmap for strings.	Kui-Feng Lee
	Add a hashmap to map the string offsets from a source btf to the string offsets from a target btf to reduce overheads. btf__add_btf() calls btf__add_str() to add strings from a source to a target btf. It causes many string comparisons, and it is a major hotspot when adding a big btf. btf__add_str() uses strcmp() to check if a hash entry is the right one. The extra hashmap here compares offsets of strings, that are much cheaper. It remembers the results of btf__add_str() for later uses to reduce the cost. We are parallelizing BTF encoding for pahole by creating separated btf instances for worker threads. These per-thread btf instances will be added to the btf instance of the main thread by calling btf__add_str() to deduplicate and write out. With this patch and -j4, the running time of pahole drops to about 6.0s from 6.6s. The following lines are the summary of 'perf stat' w/o the change. 6.668126396 seconds time elapsed 13.451054000 seconds user 0.715520000 seconds sys The following lines are the summary w/ the change. 5.986973919 seconds time elapsed 12.939903000 seconds user 0.724152000 seconds sys V4 fixes a bug of error checking against the pointer returned by hashmap__new(). [v3] https://lore.kernel.org/bpf/20220118232053.2113139-1-kuifeng@fb.com/ [v2] https://lore.kernel.org/bpf/20220114193713.461349-1-kuifeng@fb.com/ Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220119180214.255634-1-kuifeng@fb.com
2022-01-19	kvm: selftests: Do not indent with spaces	Paolo Bonzini
	Some indentation with spaces crept in, likely due to terminal-based cut and paste. Clean it up. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	kvm: selftests: sync uapi/linux/kvm.h with Linux header	Paolo Bonzini
	KVM_CAP_XSAVE2 is out of sync due to a conflict. Copy the whole file while at it. Reported-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	uapi/bpf: Add missing description and returns for helper documentation	Usama Arif
	Both description and returns section will become mandatory for helpers and syscalls in a later commit to generate man pages. This commit also adds in the documentation that BPF_PROG_RUN is an alias for BPF_PROG_TEST_RUN for anyone searching for the syscall in the generated man pages. Signed-off-by: Usama Arif <usama.arif@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220119114442.1452088-1-usama.arif@bytedance.com
2022-01-19	bpftool: Adding support for BTF program names	Raman Shukhau
	`bpftool prog list` and other bpftool subcommands that show BPF program names currently get them from bpf_prog_info.name. That field is limited to 16 (BPF_OBJ_NAME_LEN) chars which leads to truncated names since many progs have much longer names. The idea of this change is to improve all bpftool commands that output prog name so that bpftool uses info from BTF to print program names if available. It tries bpf_prog_info.name first and fall back to btf only if the name is suspected to be truncated (has 15 chars length). Right now `bpftool p show id <id>` returns capped prog name <id>: kprobe name example_cap_cap tag 712e... ... With this change it would return <id>: kprobe name example_cap_capable tag 712e... ... Note, other commands that print prog names (e.g. "bpftool cgroup tree") are also addressed in this change. Signed-off-by: Raman Shukhau <ramasha@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220119100255.1068997-1-ramasha@fb.com
2022-01-19	tools headers UAPI: Sync x86 arch prctl headers with the kernel sources	Arnaldo Carvalho de Melo
	To pick the changes in this cset: 980fe2fddcff2193 ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions") This picks these new prctls: $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/before $ cp arch/x86/include/uapi/asm/prctl.h tools/arch/x86/include/uapi/asm/prctl.h $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/after $ diff -u /tmp/before /tmp/after --- /tmp/before 2022-01-19 14:40:05.049394977 -0300 +++ /tmp/after 2022-01-19 14:40:35.628154565 -0300 @@ -9,6 +9,8 @@ [0x1021 - 0x1001]= "GET_XCOMP_SUPP", [0x1022 - 0x1001]= "GET_XCOMP_PERM", [0x1023 - 0x1001]= "REQ_XCOMP_PERM", + [0x1024 - 0x1001]= "GET_XCOMP_GUEST_PERM", + [0x1025 - 0x1001]= "REQ_XCOMP_GUEST_PERM", }; #define x86_arch_prctl_codes_2_offset 0x2001 $ With this 'perf trace' can translate those numbers into strings and use the strings in filter expressions: # perf trace -e prctl 0.000 ( 0.011 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9c014b7df5) = 0 0.032 ( 0.002 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9bb6b51580) = 0 5.452 ( 0.003 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfeb70) = 0 5.468 ( 0.002 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfea70) = 0 24.494 ( 0.009 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f562a32ae28) = 0 24.540 ( 0.002 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f563c6d4b30) = 0 670.281 ( 0.008 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30805c8) = 0 670.293 ( 0.002 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30800f0) = 0 ^C# This addresses these perf build warnings: Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/prctl.h' differs from latest version at 'arch/x86/include/uapi/asm/prctl.h' diff -u tools/arch/x86/include/uapi/asm/prctl.h arch/x86/include/uapi/asm/prctl.h Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-19	selftests: kvm: add amx_test to .gitignore	Muhammad Usama Anjum
	amx_test's binary should be present in the .gitignore file for the git to ignore it. Fixes: bf70636d9443 ("selftest: kvm: Add amx selftest") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Message-Id: <20220118122053.1941915-1-usama.anjum@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	Merge branch 'kvm-pi-raw-spinlock' into HEAD	Paolo Bonzini
	Bring in fix for VT-d posted interrupts before further changing the code in 5.17. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	KVM: selftests: Add a test to force emulation with a pending exception	Sean Christopherson
	Add a VMX specific test to verify that KVM doesn't explode if userspace attempts KVM_RUN when emulation is required with a pending exception. KVM VMX's emulation support for !unrestricted_guest punts exceptions to userspace instead of attempting to synthesize the exception with all the correct state (and stack switching, etc...). Punting is acceptable as there's never been a request to support injecting exceptions when emulating due to invalid state, but KVM has historically assumed that userspace will do the right thing and either clear the exception or kill the guest. Deliberately do the opposite and attempt to re-enter the guest with a pending exception and emulation required to verify KVM continues to punt the combination to userspace, e.g. doesn't explode, WARN, etc... Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211228232437.1875318-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	selftests: kvm/x86: Add test for KVM_SET_PMU_EVENT_FILTER	Jim Mattson
	Verify that the PMU event filter works as expected. Note that the virtual PMU doesn't work as expected on AMD Zen CPUs (an intercepted rdmsr is counted as a retired branch instruction), but the PMU event filter does work. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-7-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	selftests: kvm/x86: Introduce x86_model()	Jim Mattson
	Extract the x86 model number from CPUID.01H:EAX. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-6-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	selftests: kvm/x86: Export x86_family() for use outside of processor.c	Jim Mattson
	Move this static inline function to processor.h, so that it can be used in individual tests, as needed. Opportunistically replace the bare 'unsigned' with 'unsigned int.' Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-5-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	selftests: kvm/x86: Introduce is_amd_cpu()	Jim Mattson
	Replace the one ad hoc "AuthenticAMD" CPUID vendor string comparison with a new function, is_amd_cpu(). Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-4-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	selftests: kvm/x86: Parameterize the CPUID vendor string check	Jim Mattson
	Refactor is_intel_cpu() to make it easier to reuse the bulk of the code for other vendors in the future. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220115052431.447232-3-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	kvm: selftests: conditionally build vm_xsave_req_perm()	Wei Wang
	vm_xsave_req_perm() is currently defined and used by x86_64 only. Make it compiled into vm_create_with_vcpus() only when on x86_64 machines. Otherwise, it would cause linkage errors, e.g. on s390x. Fixes: 415a3c33e8 ("kvm: selftests: Add support for KVM_CAP_XSAVE2") Reported-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Tested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Message-Id: <20220118014817.30910-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19	perf machine: Use path__join() to compose a path instead of snprintf(dir, ↵	Arnaldo Carvalho de Melo
	'/', filename) Its more intention revealing, and if we're interested in the odd cases where this may end up truncating we can do debug checks at one centralized place. Motivation, of all the container builds, fedora rawhide started complaining of: util/machine.c: In function ‘machine__create_modules’: util/machine.c:1419:50: error: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=] 1419 \| snprintf(path, sizeof(path), "%s/%s", dir_name, dent->d_name); \| ^~ In file included from /usr/include/stdio.h:894, from util/branch.h:9, from util/callchain.h:8, from util/machine.c:7: In function ‘snprintf’, inlined from ‘maps__set_modules_path_dir’ at util/machine.c:1419:3, inlined from ‘machine__set_modules_path’ at util/machine.c:1473:9, inlined from ‘machine__create_modules’ at util/machine.c:1519:7: /usr/include/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 2 and 4352 bytes into a destination of size 4096 There are other places where we should use path__join(), but lets get rid of this one first. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: Link: https://lore.kernel.org/r/YebZKjwgfdOz0lAs@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18	libbpf: Define BTF_KIND_* constants in btf.h to avoid compilation errors	Toke Høiland-Jørgensen
	The btf.h header included with libbpf contains inline helper functions to check for various BTF kinds. These helpers directly reference the BTF_KIND_* constants defined in the kernel header, and because the header file is included in user applications, this happens in the user application compile units. This presents a problem if a user application is compiled on a system with older kernel headers because the constants are not available. To avoid this, add #defines of the constants directly in btf.h before using them. Since the kernel header moved to an enum for BTF_KIND_*, the #defines can shadow the enum values without any errors, so we only need #ifndef guards for the constants that predates the conversion to enum. We group these so there's only one guard for groups of values that were added together. [0] Closes: https://github.com/libbpf/libbpf/issues/436 Fixes: 223f903e9c83 ("bpf: Rename BTF_KIND_TAG to BTF_KIND_DECL_TAG") Fixes: 5b84bd10363e ("libbpf: Add support for BTF_KIND_TAG") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/bpf/20220118141327.34231-1-toke@redhat.com
2022-01-18	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Jakub Kicinski
	Daniel Borkmann says: ==================== pull-request: bpf 2022-01-19 We've added 12 non-merge commits during the last 8 day(s) which contain a total of 12 files changed, 262 insertions(+), 64 deletions(-). The main changes are: 1) Various verifier fixes mainly around register offset handling when passed to helper functions, from Daniel Borkmann. 2) Fix XDP BPF link handling to assert program type, from Toke Høiland-Jørgensen. 3) Fix regression in mount parameter handling for BPF fs, from Yafang Shao. 4) Fix incorrect integer literal when marking scratched stack slots in verifier, from Christy Lee. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, selftests: Add ringbuf memory type confusion test bpf, selftests: Add various ringbuf tests with invalid offset bpf: Fix ringbuf memory type confusion when passing to helpers bpf: Fix out of bounds access for ringbuf helpers bpf: Generally fix helper register offset check bpf: Mark PTR_TO_FUNC register initially with zero offset bpf: Generalize check_ctx_reg for reuse with other types bpf: Fix incorrect integer literal used for marking scratched stack. bpf/selftests: Add check for updating XDP bpf_link with wrong program type bpf/selftests: convert xdp_link test to ASSERT_* macros xdp: check prog type before updating BPF link bpf: Fix mount source show for bpffs ==================== Link: https://lore.kernel.org/r/20220119011825.9082-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-19	bpf, selftests: Add ringbuf memory type confusion test	Daniel Borkmann
	Add two tests, one which asserts that ring buffer memory can be passed to other helpers for populating its entry area, and another one where verifier rejects different type of memory passed to bpf_ringbuf_submit(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2022-01-19	bpf, selftests: Add various ringbuf tests with invalid offset	Daniel Borkmann
	Assert that the verifier is rejecting invalid offsets on the ringbuf entries: # ./test_verifier \| grep ring #947/u ringbuf: invalid reservation offset 1 OK #947/p ringbuf: invalid reservation offset 1 OK #948/u ringbuf: invalid reservation offset 2 OK #948/p ringbuf: invalid reservation offset 2 OK Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2022-01-18	selftest/bpf: Fix a stale comment.	Kuniyuki Iwashima
	The commit b8a58aa6fccc ("af_unix: Cut unix_validate_addr() out of unix_mkname().") moved the bound test part into unix_validate_addr(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/r/20220113002849.4384-6-kuniyu@amazon.co.jp Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-18	selftest/bpf: Test batching and bpf_(get\|set)sockopt in bpf unix iter.	Kuniyuki Iwashima
	This patch adds a test for the batching and bpf_(get\|set)sockopt in bpf unix iter. It does the following. 1. Creates an abstract UNIX domain socket 2. Call bpf_setsockopt() 3. Call bpf_getsockopt() and save the value 4. Call setsockopt() 5. Call getsockopt() and save the value 6. Compare the saved values Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/r/20220113002849.4384-5-kuniyu@amazon.co.jp Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-18	selftests/bpf: Add test for race in btf_try_get_module	Kumar Kartikeya Dwivedi
	This adds a complete test case to ensure we never take references to modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also ensures we never access btf->kfunc_set_tab in an inconsistent state. The test uses userfaultfd to artificially widen the race. When run on an unpatched kernel, it leads to the following splat: [root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym [ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b [ 55.499206] #PF: supervisor read access in kernel mode [ 55.499855] #PF: error_code(0x0000) - not-present page [ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0 [ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151 [ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014 [ 55.504777] Workqueue: events bpf_prog_free_deferred [ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0 [ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282 [ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba [ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458 [ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b [ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598 [ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400 [ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000 [ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0 [ 55.518672] PKRU: 55555554 [ 55.519022] Call Trace: [ 55.519483] <TASK> [ 55.519884] module_put.part.0+0x2a/0x180 [ 55.520642] bpf_prog_free_deferred+0x129/0x2e0 [ 55.521478] process_one_work+0x4fa/0x9e0 [ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100 [ 55.522878] ? rwlock_bug.part.0+0x60/0x60 [ 55.523551] worker_thread+0x2eb/0x700 [ 55.524176] ? __kthread_parkme+0xd8/0xf0 [ 55.524853] ? process_one_work+0x9e0/0x9e0 [ 55.525544] kthread+0x23a/0x270 [ 55.526088] ? set_kthread_struct+0x80/0x80 [ 55.526798] ret_from_fork+0x1f/0x30 [ 55.527413] </TASK> [ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod] [ 55.530846] CR2: fffffbfff802548b [ 55.531341] ---[ end trace 1af41803c054ad6d ]--- [ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0 [ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282 [ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba [ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458 [ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b [ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598 [ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400 [ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000 [ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0 [ 55.545317] PKRU: 55555554 [ 55.545671] note: kworker/0:2[83] exited with preempt_count 1 Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-18	selftests/bpf: Extend kfunc selftests	Kumar Kartikeya Dwivedi
	Use the prog_test kfuncs to test the referenced PTR_TO_BTF_ID kfunc support, and PTR_TO_CTX, PTR_TO_MEM argument passing support. Also testing the various failure cases for invalid kfunc prototypes. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-18	selftests/bpf: Add test_verifier support to fixup kfunc call insns	Kumar Kartikeya Dwivedi
	This allows us to add tests (esp. negative tests) where we only want to ensure the program doesn't pass through the verifier, and also verify the error. The next commit will add the tests making use of this. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-18	selftests/bpf: Add test for unstable CT lookup API	Kumar Kartikeya Dwivedi
	This tests that we return errors as documented, and also that the kfunc calls work from both XDP and TC hooks. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-18	bpf: Remove check_kfunc_call callback and old kfunc BTF ID API	Kumar Kartikeya Dwivedi
	Completely remove the old code for check_kfunc_call to help it work with modules, and also the callback itself. The previous commit adds infrastructure to register all sets and put them in vmlinux or module BTF, and concatenates all related sets organized by the hook and the type. Once populated, these sets remain immutable for the lifetime of the struct btf. Also, since we don't need the 'owner' module anywhere when doing check_kfunc_call, drop the 'btf_modp' module parameter from find_kfunc_desc_btf. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-18	perf evlist: No need to setup affinities when disabling events for pid targets	Arnaldo Carvalho de Melo
	When the target is a pid, not started by 'perf stat' we need to disable the events, and in that case there is no need to setup affinities as we use a dummy CPU map, with just one entry set to -1. So stop doing it to avoid this needless call to sched_getaffinity(): # strace -ke sched_getaffinity perf stat -e cycles -p 241957 sleep 1 <SNIP> sched_getaffinity(0, 512, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]) = 8 > /usr/lib64/libc-2.33.so(sched_getaffinity@@GLIBC_2.3.4+0x1a) [0xe6eea] > /var/home/acme/bin/perf(affinity__setup+0x6a) [0x532a2a] > /var/home/acme/bin/perf(__evlist__disable.constprop.0+0x27) [0x4b9827] > /var/home/acme/bin/perf(cmd_stat+0x29b5) [0x431725] > /var/home/acme/bin/perf(run_builtin+0x6a) [0x4a2cfa] > /var/home/acme/bin/perf(main+0x612) [0x40f8c2] > /usr/lib64/libc-2.33.so(__libc_start_main+0xd4) [0x27b74] > /var/home/acme/bin/perf(_start+0x2d) [0x40fadd] <SNIP> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117160931.1191712-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18	perf evlist: No need to setup affinities when enabling events for pid targets	Arnaldo Carvalho de Melo
	When the target is a pid, not started by 'perf stat' we need to enable the events, and in that case there is no need to setup affinities as we use a dummy CPU map, with just one entry set to -1. So stop doing it to avoid this needless call to sched_getaffinity(): # strace -ke sched_getaffinity perf stat -e cycles -p 241957 sleep 1 <SNIP> sched_getaffinity(0, 512, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]) = 8 > /usr/lib64/libc-2.33.so(sched_getaffinity@@GLIBC_2.3.4+0x1a) [0xe6eea] > /var/home/acme/bin/perf(affinity__setup+0x6a) [0x5329ca] > /var/home/acme/bin/perf(__evlist__enable.constprop.0+0x23) [0x4b9693] > /var/home/acme/bin/perf(enable_counters+0x14d) [0x42de5d] > /var/home/acme/bin/perf(cmd_stat+0x2358) [0x4310c8] > /var/home/acme/bin/perf(run_builtin+0x6a) [0x4a2cfa] > /var/home/acme/bin/perf(main+0x612) [0x40f8c2] > /usr/lib64/libc-2.33.so(__libc_start_main+0xd4) [0x27b74] > /var/home/acme/bin/perf(_start+0x2d) [0x40fadd] <SNIP> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117160931.1191712-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18	perf stat: No need to setup affinities when starting a workload	Arnaldo Carvalho de Melo
	I.e. the simple: $ perf stat sleep 1 Uses a dummy CPU map and thus there is no need to setup/cleanup affinities to avoid IPIs, etc. With this we're down to a sched_getaffinity() call, in the libnuma initialization, that probably can be removed in a followup patch. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117160931.1191712-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18	perf affinity: Allow passing a NULL arg to affinity__cleanup()	Arnaldo Carvalho de Melo
	Just like with free(), NULL is checked to avoid having all callers do it. Its convenient for when not using affinity setup/cleanup for dummy CPU maps, i.e. CPU maps for pid targets. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117160931.1191712-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18	perf probe: Fix ppc64 'perf probe add events failed' case	Zechuan Chen
	Because of commit bf794bf52a80c627 ("powerpc/kprobes: Fix kallsyms lookup across powerpc ABIv1 and ABIv2"), in ppc64 ABIv1, our perf command eliminates the need to use the prefix "." at the symbol name. But when the command "perf probe -a schedule" is executed on ppc64 ABIv1, it obtains two symbol address information through /proc/kallsyms, for example: cat /proc/kallsyms \| grep -w schedule c000000000657020 T .schedule c000000000d4fdb8 D schedule The symbol "D schedule" is not a function symbol, and perf will print: "p:probe/schedule _text+13958584"Failed to write event: Invalid argument Therefore, when searching symbols from map and adding probe point for them, a symbol type check is added. If the type of symbol is not a function, skip it. Fixes: bf794bf52a80c627 ("powerpc/kprobes: Fix kallsyms lookup across powerpc ABIv1 and ABIv2") Signed-off-by: Zechuan Chen <chenzechuan1@huawei.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jianlin Lv <Jianlin.Lv@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20211228111338.218602-1-chenzechuan1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18	Merge tag 'acpi-5.17-rc1-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "The most significant item here is the Platform Firmware Runtime Update and Telemetry (PFRUT) support designed to allow certain pieces of the platform firmware to be updated on the fly, among other things. Also important is the e820 handling change on x86 that should work around PCI BAR allocation issues on some systems shipping since 2019. The rest is just a handful of assorted fixes and cleanups on top of the ACPI material merged previously. Specifics: - Add support for the the Platform Firmware Runtime Update and Telemetry (PFRUT) interface based on ACPI to allow certain pieces of the platform firmware to be updated without restarting the system and to provide a mechanism for collecting platform firmware telemetry data (Chen Yu, Dan Carpenter, Yang Yingliang). - Ignore E820 reservations covering PCI host bridge windows on sufficiently recent x86 systems to avoid issues with allocating PCI BARs on systems where the E820 reservations cover the entire PCI host bridge memory window returned by the _CRS object in the system's ACPI tables (Hans de Goede). - Fix and clean up acpi_scan_init() (Rafael Wysocki). - Add more sanity checking to ACPI SPCR tables parsing (Mark Langsdorf). - Fix up ACPI APD (AMD Soc) driver initialization (Jiasheng Jiang). - Drop unnecessary "static" from the ACPI PCC address space handling driver added recently (kernel test robot)" * tag 'acpi-5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PCC: pcc_ctx can be static ACPI: scan: Rename label in acpi_scan_init() ACPI: scan: Simplify initialization of power and sleep buttons ACPI: scan: Change acpi_scan_init() return value type to void ACPI: SPCR: check if table->serial_port.access_width is too wide ACPI: APD: Check for NULL pointer after calling devm_ioremap() x86/PCI: Ignore E820 reservations for bridge windows on newer systems ACPI: pfr_telemetry: Fix info leak in pfrt_log_ioctl() ACPI: pfr_update: Fix return value check in pfru_write() ACPI: tools: Introduce utility for firmware updates/telemetry ACPI: Introduce Platform Firmware Runtime Telemetry driver ACPI: Introduce Platform Firmware Runtime Update device driver efi: Introduce EFI_FIRMWARE_MANAGEMENT_CAPSULE_HEADER and corresponding structures
2022-01-18	Merge tag 'perf-tools-for-v5.17-2022-01-16' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tool updates from Arnaldo Carvalho de Melo: "New features: - Add 'trace' subcommand for 'perf ftrace', setting the stage for more 'perf ftrace' subcommands. Not using a subcommand yields the previous behaviour of 'perf ftrace'. - Add 'latency' subcommand to 'perf ftrace', that can use the function graph tracer or a BPF optimized one, via the -b/--use-bpf option. E.g.: $ sudo perf ftrace latency -a -T mutex_lock sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 4596 \| ######################## \| 1 - 2 us \| 1680 \| ######### \| 2 - 4 us \| 1106 \| ##### \| 4 - 8 us \| 546 \| ## \| 8 - 16 us \| 562 \| ### \| 16 - 32 us \| 1 \| \| 32 - 64 us \| 0 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 0 \| \| 256 - 512 us \| 0 \| \| 512 - 1024 us \| 0 \| \| 1 - 2 ms \| 0 \| \| 2 - 4 ms \| 0 \| \| 4 - 8 ms \| 0 \| \| 8 - 16 ms \| 0 \| \| 16 - 32 ms \| 0 \| \| 32 - 64 ms \| 0 \| \| 64 - 128 ms \| 0 \| \| 128 - 256 ms \| 0 \| \| 256 - 512 ms \| 0 \| \| 512 - 1024 ms \| 0 \| \| 1 - ... s \| 0 \| \| The original implementation of this command was in the bcc tool. - Support --cputype option for hybrid events in 'perf stat'. Improvements: - Call chain improvements for ARM64. - No need to do any affinity setup when profiling pids. - Reduce multiplexing with duration_time in 'perf stat' metrics. - Improve error message for uncore events, stating that some event groups are can only be used in system wide (-a) mode. - perf stat metric group leader fixes/improvements, including arch specific changes to better support Intel topdown events. - Probe non-deprecated sysfs path first, i.e. try the path /sys/devices/system/cpu/cpuN/topology/thread_siblings first, then the old /sys/devices/system/cpu/cpuN/topology/core_cpus. - Disable debuginfod by default in 'perf record', to avoid stalls on distros such as Fedora 35. - Use unbuffered output in 'perf bench' when pipe/tee'ing to a file. - Enable ignore_missing_thread in 'perf trace' Fixes: - Avoid TUI crash when navigating in the annotation of recursive functions. - Fix hex dump character output in 'perf script'. - Fix JSON indentation to 4 spaces standard in the ARM vendor event files. - Fix use after free in metric__new(). - Fix IS_ERR_OR_NULL() usage in the perf BPF loader. - Fix up cross-arch register support, i.e. when printing register names take into account the architecture where the perf.data file was collected. - Fix SMT fallback with large core counts. - Don't lower case MetricExpr when parsing JSON files so as not to lose info such as the ":G" event modifier in metrics. perf test: - Add basic stress test for sigtrap handling to 'perf test'. - Fix 'perf test' failures on s/390 - Enable system wide for metricgroups test in 'perf test´. - Use 3 digits for test numbering now we can have more tests. Arch specific: - Add events for Arm Neoverse N2 in the ARM JSON vendor event files - Support PERF_MEM_LVLNUM encodings in powerpc, that came from a single patch series, where I incorrectly merged the kernel bits, that were then reverted after coordination with Michael Ellerman and Stephen Rothwell. - Add ARM SPE total latency as PERF_SAMPLE_WEIGHT. - Update AMD documentation, with info on raw event encoding. - Add support for global and local variants of the "p_stage_cyc" sort key, applicable to perf.data files collected on powerpc. - Remove duplicate and incorrect aux size checks in the ARM CoreSight ETM code. Refactorings: - Add a perf_cpu abstraction to disambiguate CPUs and CPU map indexes, fixing problems along the way. - Document CPU map methods. UAPI sync: - Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy' - Sync UAPI files with the kernel sources: drm, msr-index, cpufeatures. Build system - Enable warnings through HOSTCFLAGS. - Drop requirement for libstdc++.so for libopencsd check libperf: - Make libperf adopt perf_counts_values__scale() from tools/perf/util/. - Add a stat multiplexing test to libperf" * tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (115 commits) perf record: Disable debuginfod by default perf evlist: No need to do any affinity setup when profiling pids perf cpumap: Add is_dummy() method perf metric: Fix metric_leader perf cputopo: Fix CPU topology reading on s/390 perf metricgroup: Fix use after free in metric__new() libperf tests: Update a use of the new cpumap API perf arm: Fix off-by-one directory path tools arch x86: Sync the msr-index.h copy with the kernel sources tools headers cpufeatures: Sync with the kernel sources tools headers UAPI: Update tools's copy of drm.h header tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy' perf pmu-events: Don't lower case MetricExpr perf expr: Add debug logging for literals perf tools: Probe non-deprecated sysfs path 1st perf tools: Fix SMT fallback with large core counts perf cpumap: Give CPUs their own type perf stat: Correct first_shadow_cpu to return index perf script: Fix flipped index and cpu perf c2c: Use more intention revealing iterator ...
2022-01-17	KVM: selftests: Test KVM_SET_CPUID2 after KVM_RUN	Vitaly Kuznetsov
	KVM forbids KVM_SET_CPUID2 after KVM_RUN was performed on a vCPU unless the supplied CPUID data is equal to what was previously set. Test this. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220117150542.2176196-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-17	KVM: selftests: Rename 'get_cpuid_test' to 'cpuid_test'	Vitaly Kuznetsov
	In preparation to reusing the existing 'get_cpuid_test' for testing "KVM_SET_CPUID{,2} after KVM_RUN" rename it to 'cpuid_test' to avoid the confusion. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220117150542.2176196-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-17	Merge branch 'acpi-pfrut'	Rafael J. Wysocki
	Merge support for the Platform Firmware Runtime Update and Telemetry interface based on ACPI. The interface provided here allows updating certain pieces of the platform firmware without restarting the system and collecting platform firmware telemetry data. This also includes a utility for accesing the new interface from user space. * acpi-pfrut: ACPI: pfr_telemetry: Fix info leak in pfrt_log_ioctl() ACPI: pfr_update: Fix return value check in pfru_write() ACPI: tools: Introduce utility for firmware updates/telemetry ACPI: Introduce Platform Firmware Runtime Telemetry driver ACPI: Introduce Platform Firmware Runtime Update device driver efi: Introduce EFI_FIRMWARE_MANAGEMENT_CAPSULE_HEADER and corresponding structures