linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-04-03	perf annotate: Add and use ins__is_nop()	Namhyung Kim
	Likewise, add ins__is_nop() to check if the current instruction is NOP. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03	perf annotate: Use ins__is_xxx() if possible	Namhyung Kim
	This is to prepare separation of disasm related code. Use the public ins API instead of checking the internal data structure. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03	perf evsel: Use evsel__name_is() helper	Yang Jihong
	Code cleanup, replace strcmp(evsel__name(evsel, {NAME})) with evsel__name_is() helper. No functional change. Committer notes: Fix this build error: trace.syscalls.events.bpf_output = evlist__last(trace.evlist); - assert(evsel__name_is(trace.syscalls.events.bpf_output), "__augmented_syscalls__"); + assert(evsel__name_is(trace.syscalls.events.bpf_output, "__augmented_syscalls__")); Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240401062724.1006010-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03	perf sched timehist: Fix -g/--call-graph option failure	Yang Jihong
	When 'perf sched' enables the call-graph recording, sample_type of dummy event does not have PERF_SAMPLE_CALLCHAIN, timehist_check_attr() checks that the evsel does not have a callchain, and set show_callchain to 0. Currently 'perf sched timehist' only saves callchain when processing the 'sched:sched_switch event', timehist_check_attr() only needs to determine whether the event has PERF_SAMPLE_CALLCHAIN. Before: # perf sched record -g true [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 4.153 MB perf.data (7536 samples) ] # perf sched timehist Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 147851.826019 [0000] perf[285035] 0.000 0.000 0.000 147851.826029 [0000] migration/0[15] 0.000 0.003 0.009 147851.826063 [0001] perf[285035] 0.000 0.000 0.000 147851.826069 [0001] migration/1[21] 0.000 0.003 0.006 <SNIP> After: # perf sched record -g true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.572 MB perf.data (822 samples) ] # perf sched timehist time cpu task name waittime sch delay runtime [tid/pid] (msec) (msec) (msec) ----------- --- --------------- -------- -------- ----- 4193.035164 [0] perf[277062] 0.000 0.000 0.000 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion 4193.035174 [0] migration/0[15] 0.000 0.003 0.009 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork 4193.035207 [1] perf[277062] 0.000 0.000 0.000 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion 4193.035214 [1] migration/1[21] 0.000 0.003 0.007 __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork <SNIP> Fixes: 9c95e4ef06572349 ("perf evlist: Add evlist__findnew_tracking_event() helper") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240401062724.1006010-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03	perf annotate: Honor output options with --data-type	Namhyung Kim
	For data type profiling output, it should be in sync with normal output so make it display percentage for each field. Also use coloring scheme for users to identify fields with big overhead easily. Users can use --show-total-period or --show-nr-samples to change the output style like in the normal perf annotate output. Before: $ perf annotate --data-type Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples): ============================================================================ samples offset size field 34 0 9792 struct task_struct { 2 0 24 struct thread_info thread_info { 0 0 8 long unsigned int flags; 1 8 8 long unsigned int syscall_work; 0 16 4 u32 status; 1 20 4 u32 cpu; }; After: $ perf annotate --data-type Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples): ============================================================================ Percent offset size field 100.00 0 9792 struct task_struct { 3.55 0 24 struct thread_info thread_info { 0.00 0 8 long unsigned int flags; 1.63 8 8 long unsigned int syscall_work; 0.00 16 4 u32 status; 1.91 20 4 u32 cpu; }; Committer testing: First collect a suitable perf.data file for use with 'perf annotate --data-type': root@number:~# perf mem record -a sleep 1s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 11.047 MB perf.data (3466 samples) ] root@number:~# Then, before: root@number:~# perf annotate --data-type Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples): ============================================================================ samples offset size field 6 0 40 union { 6 0 40 struct __pthread_mutex_s __data { 2 0 4 int __lock; 0 4 4 unsigned int __count; 0 8 4 int __owner; 1 12 4 unsigned int __nusers; 2 16 4 int __kind; 1 20 2 short int __spins; 0 22 2 short int __elision; 0 24 16 __pthread_list_t __list { 0 24 8 struct __pthread_internal_list* __prev; 0 32 8 struct __pthread_internal_list* __next; }; }; 0 0 0 char* __size; 2 0 8 long int __align; }; <SNIP> And after: Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples): ============================================================================ Percent offset size field 100.00 0 40 union { 100.00 0 40 struct __pthread_mutex_s __data { 31.27 0 4 int __lock; 0.00 4 4 unsigned int __count; 0.00 8 4 int __owner; 7.67 12 4 unsigned int __nusers; 53.10 16 4 int __kind; 7.96 20 2 short int __spins; 0.00 22 2 short int __elision; 0.00 24 16 __pthread_list_t __list { 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0 0 char* __size; 31.27 0 8 long int __align; }; <SNIP> The lines with percentages >= 7.67 have its percentages red colored. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240322224313.423181-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03	perf annotate: Get rid of duplicate --group option item	Namhyung Kim
	The options array in cmd_annotate() has duplicate --group options. It only needs one and let's get rid of the other. $ perf annotate -h 2>&1 \| grep group --group Show event group information together --group Show event group information together Fixes: 7ebaf4890f63eb90 ("perf annotate: Support '--group' option") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240322224313.423181-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03	selftests/xsk: Add new test case for AF_XDP under max ring sizes	Tushar Vyavahare
	Introduce a test case to evaluate AF_XDP's robustness by pushing hardware and software ring sizes to their limits. This test ensures AF_XDP's reliability amidst potential producer/consumer throttling due to maximum ring utilization. The testing strategy includes: 1. Configuring rings to their maximum allowable sizes. 2. Executing a series of tests across diverse batch sizes to assess system's behavior under different configurations. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-8-tushar.vyavahare@intel.com
2024-04-03	selftests/xsk: Test AF_XDP functionality under minimal ring configurations	Tushar Vyavahare
	Add a new test case that stresses AF_XDP and the driver by configuring small hardware and software ring sizes. This verifies that AF_XDP continues to function properly even with insufficient ring space that could lead to frequent producer/consumer throttling. The test procedure involves: 1. Set the minimum possible ring configuration(tx 64 and rx 128). 2. Run tests with various batch sizes(1 and 63) to validate the system's behavior under different configurations. Update Makefile to include network_helpers.o in the build process for xskxceiver. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-7-tushar.vyavahare@intel.com
2024-04-03	selftests/xsk: Introduce set_ring_size function with a retry mechanism for ↵	Tushar Vyavahare
	handling AF_XDP socket closures Introduce a new function, set_ring_size(), to manage asynchronous AF_XDP socket closure. Retry set_hw_ring_size up to SOCK_RECONF_CTR times if it fails due to an active AF_XDP socket. Return an error immediately for non-EBUSY errors. This enhances robustness against asynchronous AF_XDP socket closures during ring size changes. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-6-tushar.vyavahare@intel.com
2024-04-03	selftests/bpf: Implement set_hw_ring_size function to configure interface ↵	Tushar Vyavahare
	ring size Introduce a new function called set_hw_ring_size that allows for the dynamic configuration of the ring size within the interface. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-5-tushar.vyavahare@intel.com
2024-04-03	selftests/bpf: Implement get_hw_ring_size function to retrieve current and ↵	Tushar Vyavahare
	max interface size Introduce a new function called get_hw_size that retrieves both the current and maximum size of the interface and stores this information in the 'ethtool_ringparam' structure. Remove ethtool_channels struct from xdp_hw_metadata.c due to redefinition error. Remove unused linux/if.h include from flow_dissector BPF test to address CI pipeline failure. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-4-tushar.vyavahare@intel.com
2024-04-03	selftests/xsk: Make batch size variable	Tushar Vyavahare
	Convert the constant BATCH_SIZE into a variable named batch_size to allow dynamic modification at runtime. This is required for the forthcoming changes to support testing different hardware ring sizes. While running these tests, a bug was identified when the batch size is roughly the same as the NIC ring size. This has now been addressed by Maciej's fix in commit 913eda2b08cc ("i40e: xsk: remove count_mask"). Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-3-tushar.vyavahare@intel.com
2024-04-03	tools: Add ethtool.h header to tooling infra	Tushar Vyavahare
	This commit duplicates the ethtool.h file from the include/uapi/linux directory in the kernel source to the tools/include/uapi/linux directory. This action ensures that the ethtool.h file used in the tools directory is in sync with the kernel's version, maintaining consistency across the codebase. There are some checkpatch warnings in this file that could be cleaned up, but I preferred to move it over as-is for now to avoid disrupting the code. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240402114529.545475-2-tushar.vyavahare@intel.com
2024-04-02	bpf: Add arm64 JIT support for bpf_addr_space_cast instruction.	Puranjay Mohan
	LLVM generates bpf_addr_space_cast instruction while translating pointers between native (zero) address space and __attribute__((address_space(N))). The addr_space=0 is reserved as bpf_arena address space. rY = addr_space_cast(rX, 0, 1) is processed by the verifier and converted to normal 32-bit move: wX = wY. rY = addr_space_cast(rX, 1, 0) : used to convert a bpf arena pointer to a pointer in the userspace vma. This has to be converted by the JIT. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240325150716.4387-3-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-02	tools: ynl: add ynl_dump_empty() helper	Jakub Kicinski
	Checking if dump is empty requires a couple of casts. Add a convenient wrapper. Add an example use in the netdev sample, loopback is always present so an empty dump is an error. Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240329181651.319326-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02	selftests/bpf: Add pid limit for mptcpify prog	Geliang Tang
	In order to prevent mptcpify prog from affecting the running results of other BPF tests, a pid limit was added to restrict it from only modifying its own program. Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/8987e2938e15e8ec390b85b5dcbee704751359dc.1712054986.git.tanggeliang@kylinos.cn
2024-04-02	tools/power turbostat: Add proper re-initialization for perf file descriptors	Patryk Wlazlyn
	Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	tools/power turbostat: Clear added counters when in no-msr mode	Patryk Wlazlyn
	If user request --no-msr or is not able to access the MSRs, turbostat should clear all the counters added with --add. Because MSR access permission checks are done after the cmdline is parsed, the decision has to be defered up until the transition into no-msr mode happen. Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	tools/power turbostat: add early exits for permission checks	Patryk Wlazlyn
	Checking early if the permissions are even needed gets rid of the warnings about some of them missing. Earlier we issued a warning in case of missing MSR and/or perf permissions, even when user never asked for counters that require those. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	tools/power turbostat: detect and disable unavailable BICs at runtime	Patryk Wlazlyn
	To allow unprivileged user to run turbostat seamlessly. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	tools/power turbostat: Add reading aperf and mperf via perf API	Patryk Wlazlyn
	By using the perf API we spend less time in between the reads of the counters, resulting in more accurate calculations of the dependent metrics. Using perf API is also usually faster overall, although cache miss, if we get one, is more costly when using perf vs MSR driver. We would fallback to the msr reads if the sysfs isn't there or when in --no-perf mode. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	tools/power turbostat: Add --no-perf option	Patryk Wlazlyn
	Add the --no-perf option to allow users to run turbostat without accessing perf. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	tools/power turbostat: Add --no-msr option	Patryk Wlazlyn
	Add --no-msr option to allow users to run turbostat without accessing MSRs via the MSR driver. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	tools/power turbostat: enhance -D (debug counter dump) output	Len Brown
	Eliminate redundant debug output for core and package scope counters. Include name and path for all "ADDED" counters. Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	tools/power turbostat: Fix warning upon failed /dev/cpu_dma_latency read	Len Brown
	Previously a failed read of /dev/cpu_dma_latency erroneously complained turbostat: capget(CAP_SYS_ADMIN) failed, try "# setcap cap_sys_admin=ep ./turbostat This went unnoticed because this file is typically visible to root, and turbostat was typically run as root. Going forward, when a non-root user can run turbostat... Complain about failed read access to this file only if --debug is used. Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	tools/power turbostat: Read base_hz and bclk from CPUID.16H if available	Patryk Wlazlyn
	If MSRs cannot be read, values can be obtained from cpuid. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2024-04-02	Merge tag 'kvm-riscv-fixes-6.9-1' of https://github.com/kvm-riscv/linux into ↵	Paolo Bonzini
	HEAD KVM/riscv fixes for 6.9, take #1 - Fix spelling mistake in arch_timer selftest - Remove redundant semicolon in num_isa_ext_regs() - Fix APLIC setipnum_le/be write emulation - Fix APLIC in_clrip[x] read emulation
2024-04-02	libbpf: Use local bpf_helpers.h include	Tobias Böhm
	Commit 20d59ee55172fdf6 ("libbpf: add bpf_core_cast() macro") added a bpf_helpers include in bpf_core_read.h as a system include. Usually, the includes are local, though, like in bpf_tracing.h. This commit adjusts the include to be local as well. Signed-off-by: Tobias Böhm <tobias@aibor.de> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/q5d5bgc6vty2fmaazd5e73efd6f5bhiru2le6fxn43vkw45bls@fhlw2s5ootdb
2024-04-02	Merge tag 'kvmarm-fixes-6.9-1' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.9, part #1 - Ensure perf events programmed to count during guest execution are actually enabled before entering the guest in the nVHE configuration. - Restore out-of-range handler for stage-2 translation faults. - Several fixes to stage-2 TLB invalidations to avoid stale translations, possibly including partial walk caches. - Fix early handling of architectural VHE-only systems to ensure E2H is appropriately set. - Correct a format specifier warning in the arch_timer selftest. - Make the KVM banner message correctly handle all of the possible configurations.
2024-04-02	selftests/bpf: Skip test when perf_event_open returns EOPNOTSUPP	Pu Lehui
	When testing send_signal and stacktrace_build_id_nmi using the riscv sbi pmu driver without the sscofpmf extension or the riscv legacy pmu driver, then failures as follows are encountered: test_send_signal_common:FAIL:perf_event_open unexpected perf_event_open: actual -1 < expected 0 #272/3 send_signal/send_signal_nmi:FAIL test_stacktrace_build_id_nmi:FAIL:perf_event_open err -1 errno 95 #304 stacktrace_build_id_nmi:FAIL The reason is that the above pmu driver or hardware does not support sampling events, that is, PERF_PMU_CAP_NO_INTERRUPT is set to pmu capabilities, and then perf_event_open returns EOPNOTSUPP. Since PERF_PMU_CAP_NO_INTERRUPT is not only set in the riscv-related pmu driver, it is better to skip testing when this capability is set. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240402073029.1299085-1-pulehui@huaweicloud.com
2024-04-02	bpftool: Use __typeof__() instead of typeof() in BPF skeleton	Andrii Nakryiko
	When generated BPF skeleton header is included in C++ code base, some compiler setups will emit warning about using language extensions due to typeof() usage, resulting in something like: error: extension used [-Werror,-Wlanguage-extension-token] obj->struct_ops.empty_tcp_ca = (typeof(obj->struct_ops.empty_tcp_ca)) ^ It looks like __typeof__() is a preferred way to do typeof() with better C++ compatibility behavior, so switch to that. With __typeof__() we get no such warning. Fixes: c2a0257c1edf ("bpftool: Cast pointers for shadow types explicitly.") Fixes: 00389c58ffe9 ("bpftool: Add support for subskeletons") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Quentin Monnet <qmo@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240401170713.2081368-1-andrii@kernel.org
2024-04-02	selftests/bpf: Using llvm may_goto inline asm for cond_break macro	Yonghong Song
	Currently, cond_break macro uses bytes to encode the may_goto insn. Patch [1] in llvm implemented may_goto insn in BPF backend. Replace byte-level encoding with llvm inline asm for better usability. Using llvm may_goto insn is controlled by macro __BPF_FEATURE_MAY_GOTO. [1] https://github.com/llvm/llvm-project/commit/0e0bfacff71859d1f9212205f8f873d47029d3fb Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240402025446.3215182-1-yonghong.song@linux.dev
2024-04-02	bpf: Fix typo in uapi doc comments	David Lechner
	In a few places in the bpf uapi headers, EOPNOTSUPP is missing a "P" in the doc comments. This adds the missing "P". Signed-off-by: David Lechner <dlechner@baylibre.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240329152900.398260-2-dlechner@baylibre.com
2024-04-02	bpftool: Clean-up typos, punctuation, list formatting in docs	Rameez Rehman
	Improve the formatting of the attach flags for cgroup programs in the relevant man page, and fix typos ("can be on of", "an userspace inet socket") when introducing that list. Also fix a couple of other trivial issues in docs. [ Quentin: Fixed trival issues in bpftool-gen.rst and bpftool-iter.rst ] Signed-off-by: Rameez Rehman <rameezrehman408@hotmail.com> Signed-off-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240331200346.29118-4-qmo@kernel.org
2024-04-02	bpftool: Remove useless emphasis on command description in man pages	Rameez Rehman
	As it turns out, the terms in definition lists in the rST file are already rendered with bold-ish formatting when generating the man pages; all double-star sequences we have in the commands for the command description are unnecessary, and can be removed to make the documentation easier to read. The rST files were automatically processed with: sed -i '/DESCRIPTION/,/OPTIONS/ { /^\/ s/\\//g }' b.rst Signed-off-by: Rameez Rehman <rameezrehman408@hotmail.com> Signed-off-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240331200346.29118-3-qmo@kernel.org
2024-04-02	bpftool: Use simpler indentation in source rST for documentation	Rameez Rehman
	The rST manual pages for bpftool would use a mix of tabs and spaces for indentation. While this is the norm in C code, this is rather unusual for rST documents, and over time we've seen many contributors use a wrong level of indentation for documentation update. Let's fix bpftool's indentation in docs once and for all: - Let's use spaces, that are more common in rST files. - Remove one level of indentation for the synopsis, the command description, and the "see also" section. As a result, all sections start with the same indentation level in the generated man page. - Rewrap the paragraphs after the changes. There is no content change in this patch, only indentation and rewrapping changes. The wrapping in the generated source files for the manual pages is changed, but the pages displayed with "man" remain the same, apart from the adjusted indentation level on relevant sections. [ Quentin: rebased on bpf-next, removed indent level for command description and options, updated synopsis, command summary, and "see also" sections. ] Signed-off-by: Rameez Rehman <rameezrehman408@hotmail.com> Signed-off-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240331200346.29118-2-qmo@kernel.org
2024-04-01	doc: netlink: Add hyperlinks to generated Netlink docs	Donald Hunter
	Update ynl-gen-rst to generate hyperlinks to definitions, attribute sets and sub-messages from all the places that reference them. Note that there is a single label namespace for all of the kernel docs. Hyperlinks within a single netlink doc need to be qualified by the family name to avoid collisions. The label format is 'family-type-name' which gives, for example, 'rt-link-attribute-set-link-attrs' as the link id. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240329135021.52534-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01	doc: netlink: Change generated docs to limit TOC to depth 3	Donald Hunter
	The tables of contents in the generated Netlink docs include individual attribute definitions. This can make the contents exceedingly long and repeats a lot of what is on the rest of the pages. See for example: https://docs.kernel.org/networking/netlink_spec/tc.html Add a depth limit to the contents directive in generated .rst files to limit the contents depth to 3 levels. This reduces the contents to: - Family - Summary - Operations - op-one - op-two - ... - Definitions - struct-one - struct-two - enum-one - ... - Attribute sets - attrs-one - attrs-two - ... Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240329135021.52534-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01	selftests: mptcp: join: fix dev in check_endpoint	Geliang Tang
	There's a bug in pm_nl_check_endpoint(), 'dev' didn't be parsed correctly. If calling it in the 2nd test of endpoint_tests() too, it fails with an error like this: creation [FAIL] expected '10.0.2.2 id 2 subflow dev dev' \ found '10.0.2.2 id 2 subflow dev ns2eth2' The reason is '$2' should be set to 'dev', not '$1'. This patch fixes it. Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-2-324a8981da48@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01	mptcp: don't account accept() of non-MPC client as fallback to TCP	Davide Caratti
	Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they accept non-MPC connections. As reported by Christoph, this is "surprising" because the counter might become greater than MPTcpExtMPCapableSYNRX. MPTcpExtMPCapableFallbackACK counter's name suggests it should only be incremented when a connection was seen using MPTCP options, then a fallback to TCP has been done. Let's do that by incrementing it when the subflow context of an inbound MPC connection attempt is dropped. Also, update mptcp_connect.sh kselftest, to ensure that the above MIB does not increment in case a pure TCP client connects to a MPTCP server. Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01	selftests: reuseaddr_conflict: add missing new line at the end of the output	Jakub Kicinski
	The netdev CI runs in a VM and captures serial, so stdout and stderr get combined. Because there's a missing new line in stderr the test ends up corrupting KTAP: # Successok 1 selftests: net: reuseaddr_conflict which should have been: # Success ok 1 selftests: net: reuseaddr_conflict Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test") Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240329160559.249476-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-01	bitmap: introduce generic optimized bitmap_size()	Alexander Lobakin
	The number of times yet another open coded `BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge. Some generic helper is long overdue. Add one, bitmap_size(), but with one detail. BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both divident and divisor are compile-time constants or when the divisor is not a pow-of-2. When it is however, the compilers sometimes tend to generate suboptimal code (GCC 13): 48 83 c0 3f add $0x3f,%rax 48 c1 e8 06 shr $0x6,%rax 48 8d 14 c5 00 00 00 00 lea 0x0(,%rax,8),%rdx %BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does full division of `nbits + 63` by it and then multiplication by 8. Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC: 8d 50 3f lea 0x3f(%rax),%edx c1 ea 03 shr $0x3,%edx 81 e2 f8 ff ff 1f and $0x1ffffff8,%edx Now it shifts `nbits + 63` by 3 positions (IOW performs fast division by 8) and then masks bits[2:0]. bloat-o-meter: add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617) Clang does it better and generates the same code before/after starting from -O1, except that with the ALIGN() approach it uses %edx and thus still saves some bytes: add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520) Note that we can't expand DIV_ROUND_UP() by adding a check and using this approach there, as it's used in array declarations where expressions are not allowed. Add this helper to tools/ as well. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	tools: move alignment-related macros to new <linux/align.h>	Alexander Lobakin
	Currently, tools have ALIGN() macros scattered across the unrelated headers, as there are only 3 of them and they were added separately each time on an as-needed basis. Anyway, let's make it more consistent with the kernel headers and allow using those macros outside of the mentioned headers. Create <linux/align.h> inside the tools/ folder and include it where needed. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01	bitops: make BYTES_TO_BITS() treewide-available	Alexander Lobakin
	Avoid open-coding that simple expression each time by moving BYTES_TO_BITS() from the probes code to <linux/bitops.h> to export it to the rest of the kernel. Simplify the macro while at it. `BITS_PER_LONG / sizeof(long)` always equals to %BITS_PER_BYTE, regardless of the target architecture. Do the same for the tools ecosystem as well (incl. its version of bitops.h). The previous implementation had its implicit type of long, while the new one is int, so adjust the format literal accordingly in the perf code. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-30	objtool: Fix compile failure when using the x32 compiler	Mikulas Patocka
	When compiling the v6.9-rc1 kernel with the x32 compiler, the following errors are reported. The reason is that we take an "unsigned long" variable and print it using "PRIx64" format string. In file included from check.c:16: check.c: In function ‘add_dead_ends’: /usr/src/git/linux-2.6/tools/objtool/include/objtool/warn.h:46:17: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘long unsigned int’ [-Werror=format=] 46 \| "%s: warning: objtool: " format "\n", \ \| ^~~~~~~~~~~~~~~~~~~~~~~~ check.c:613:33: note: in expansion of macro ‘WARN’ 613 \| WARN("can't find unreachable insn at %s+0x%" PRIx64, \| ^~~~ ... Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: linux-kernel@vger.kernel.org
2024-03-29	selftests/bpf: make multi-uprobe tests work in RELEASE=1 mode	Andrii Nakryiko
	When BPF selftests are built in RELEASE=1 mode with -O2 optimization level, uprobe_multi binary, called from multi-uprobe tests is optimized to the point that all the thousands of target uprobe_multi_func_XXX functions are eliminated, breaking tests. So ensure they are preserved by using weak attribute. But, actually, compiling uprobe_multi binary with -O2 takes a really long time, and is quite useless (it's not a benchmark). So in addition to ensuring that uprobe_multi_func_XXX functions are preserved, opt-out of -O2 explicitly in Makefile and stick to -O0. This saves a lot of compilation time. With -O2, just recompiling uprobe_multi: $ touch uprobe_multi.c $ time make RELEASE=1 -j90 make RELEASE=1 -j90 291.66s user 2.54s system 99% cpu 4:55.52 total With -O0: $ touch uprobe_multi.c $ time make RELEASE=1 -j90 make RELEASE=1 -j90 22.40s user 1.91s system 99% cpu 24.355 total 5 minutes vs (still slow, but...) 24 seconds. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240329190410.4191353-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-29	Merge tag 'linux_kselftest-fixes-6.9-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "Fixes to seccomp and ftrace tests and a change to add config file for dmabuf-heap test to increase coverage" * tag 'linux_kselftest-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: dmabuf-heap: add config file for the test selftests/seccomp: Try to fit runtime of benchmark into timeout selftests/ftrace: Fix event filter target_func selection
2024-03-29	Merge tag 'linux_kselftest-kunit-fixes-6.9-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fixes from Shuah Khan: "One urgent fix for --alltests build failure related to renaming of CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED to the missing config option" * tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
2024-03-29	selftest: tcp: Add bind() tests for SO_REUSEADDR/SO_REUSEPORT.	Kuniyuki Iwashima
	This patch adds two tests using SO_REUSEADDR and SO_REUSEPORT and defines errno for each test case. SO_REUSEADDR/SO_REUSEPORT is set for the per-fixture two bind() calls. The notable pattern is the pair of v6only [::] and plain [::]. The two sockets are put into the same tb2, where per-bucket v6only flag would be useless to detect bind() conflict. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29	selftest: tcp: Add bind() tests for IPV6_V6ONLY.	Kuniyuki Iwashima
	bhash2 was not well tested for IPv6-only sockets. This patch adds test cases where we set IPV6_V6ONLY for per-fixture bind() calls if variant->ipv6_only[i] is true. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240326204251.51301-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>