Age | Commit message (Collapse) | Author |
|
If cgroup_has_attached_progs queries an attach type not supported
by the running kernel, due to the kernel being older than the bpftool
build, it would encounter an -EINVAL from BPF_PROG_QUERY syscall.
Prior to commit 98b303c9bf05 ("bpftool: Query only cgroup-related
attach types"), this EINVAL would be ignored by the function, allowing
the function to only consider supported attach types. The commit
changed so that, instead of querying all attach types, only attach
types from the array `cgroup_attach_types` is queried. The assumption
is that because these are only cgroup attach types, they should all
be supported. Unfortunately this assumption may be false when the
kernel is older than the bpftool build, where the attach types queried
by bpftool is not yet implemented in the kernel. This would result in
errors such as:
$ bpftool cgroup tree
CgroupPath
ID AttachType AttachFlags Name
Error: can't query bpf programs attached to /sys/fs/cgroup: Invalid argument
This patch restores the logic of ignoring EINVAL from prior to that patch.
Fixes: 98b303c9bf05 ("bpftool: Query only cgroup-related attach types")
Reported-by: Sagarika Sharma <sharmasagarika@google.com>
Reported-by: Minh-Anh Nguyen <minhanhdn@google.com>
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20250428211536.1651456-1-zhuyifei@google.com
|
|
This patch adds the "list_peek" test to use the new
bpf_list_{front,back} kfunc.
The test_{front,back}* tests ensure that the return value
is a non_own_ref node pointer and requires the spinlock to be held.
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # check non_own_ref marking
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250506015857.817950-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch has a much simplified rbtree usage from the
kernel sch_fq qdisc. It has a "struct node_data" which can be
added to two different rbtrees which are ordered by different keys.
The test first populates both rbtrees. Then search for a lookup_key
from the "groot0" rbtree. Once the lookup_key is found, that node
refcount is taken. The node is then removed from another "groot1"
rbtree.
While searching the lookup_key, the test will also try to remove
all rbnodes in the path leading to the lookup_key.
The test_{root,left,right}_spinlock_true tests ensure that the
return value of the bpf_rbtree functions is a non_own_ref node pointer.
This is done by forcing an verifier error by calling a helper
bpf_jiffies64() while holding the spinlock. The tests then
check for the verifier message
"call bpf_rbtree...R0=rcu_ptr_or_null_node..."
The other test_{root,left,right}_spinlock_false tests ensure that
they must be called with spinlock held.
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # Check non_own_ref marking
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250506015857.817950-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The bpf_rbtree_{remove,left,right} requires the root's lock to be held.
They also check the node_internal->owner is still owned by that root
before proceeding, so it is safe to allow refcounted bpf_rb_node
pointer to be used in these kfuncs.
In a bpf fq implementation which is much closer to the kernel fq,
https://lore.kernel.org/bpf/20250418224652.105998-13-martin.lau@linux.dev/,
a networking flow (allocated by bpf_obj_new) can be added to two different
rbtrees. There are cases that the flow is searched from one rbtree,
held the refcount of the flow, and then removed from another rbtree:
struct fq_flow {
struct bpf_rb_node fq_node;
struct bpf_rb_node rate_node;
struct bpf_refcount refcount;
unsigned long sk_long;
};
int bpf_fq_enqueue(...)
{
/* ... */
bpf_spin_lock(&root->lock);
while (can_loop) {
/* ... */
if (!p)
break;
gc_f = bpf_rb_entry(p, struct fq_flow, fq_node);
if (gc_f->sk_long == sk_long) {
f = bpf_refcount_acquire(gc_f);
break;
}
/* ... */
}
bpf_spin_unlock(&root->lock);
if (f) {
bpf_spin_lock(&q->lock);
bpf_rbtree_remove(&q->delayed, &f->rate_node);
bpf_spin_unlock(&q->lock);
}
}
bpf_rbtree_{left,right} do not need this change but are relaxed together
with bpf_rbtree_remove instead of adding extra verifier logic
to exclude these kfuncs.
To avoid bi-sect failure, this patch also changes the selftests together.
The "rbtree_api_remove_unadded_node" is not expecting verifier's error.
The test now expects bpf_rbtree_remove(&groot, &m->node) to return NULL.
The test uses __retval(0) to ensure this NULL return value.
Some of the "only take non-owning..." failure messages are changed also.
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250506015857.817950-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Perform a bulk resync of tools/arch/arm64/include/asm/sysreg.h.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Synchronize with what we did with the kernel side header in:
3846389c03a8 ("x86/platform/amd: Move the <asm/amd-ibs.h> header to <asm/amd/ibs.h>")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Apparently, nf_conntrack_bridge changes the way in which fragments
are handled, dealing to packet drop. From Huajian Yang.
2) Add a selftest to stress the conntrack subsystem, from Florian Westphal.
3) nft_quota depletion is off-by-one byte, Zhongqiu Duan.
4) Rewrites the procfs to read the conntrack table to speed it up,
from Florian Westphal.
5) Two patches to prevent overflow in nft_pipapo lookup table and to
clamp the maximum bucket size.
6) Update nft_fib selftest to check for loopback packet bypass.
From Florian Westphal.
netfilter pull request 25-05-06
* tag 'nf-next-25-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
selftests: netfilter: nft_fib.sh: check lo packets bypass fib lookup
netfilter: nft_set_pipapo: clamp maximum map bucket size to INT_MAX
netfilter: nft_set_pipapo: prevent overflow in lookup table allocation
netfilter: nf_conntrack: speed up reads from nf_conntrack proc file
netfilter: nft_quota: match correctly when the quota just depleted
selftests: netfilter: add conntrack stress test
netfilter: bridge: Move specific fragmented packet to slow_path instead of dropping it
====================
Link: https://patch.msgid.link/20250505234151.228057-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In commit 2e044911be75 ("x86/traps: Decode 0xEA instructions as #UD")
FineIBT starts using 0xEA as an invalid instruction like UD2. But
insn decoder always returns the length of "0xea" instruction as 7
because it does not check the (i64) superscript.
The x86 instruction decoder should also decode 0xEA on x86-64 as
a one-byte invalid instruction by decoding the "(i64)" superscript tag.
This stops decoding instruction which has (i64) but does not have (o64)
superscript in 64-bit mode at opcode and skips other fields.
With this change, insn_decoder_test says 0xea is 1 byte length if
x86-64 (-y option means 64-bit):
$ printf "0:\tea\t\n" | insn_decoder_test -y -v
insn_decoder_test: success: Decoded and checked 1 instructions
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/174580490000.388420.5225447607417115496.stgit@devnote2
|
|
Commit:
159039af8c07 ("x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map")
added (!REX2) superscript with a space, but the correct format requires ','
for concatination with other superscript tags.
Add ',' to generate correct insn attribute tables.
I confirmed with following command:
arch/x86/lib/x86-opcode-map.txt | grep e8 | head -n 1
[0xe8] = INAT_MAKE_IMM(INAT_IMM_VWORD32) | INAT_FORCE64 | INAT_NO_REX2,
Fixes: 159039af8c07 ("x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/174580489027.388420.15539375184727726142.stgit@devnote2
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
ksft runner sends 2 SIGTERMs in a row if a test runs out of time.
Handle this in a similar way we handle SIGINT - cleanup and stop
running further tests.
Because we get 2 signals we need a bit of logic to ignore
the subsequent one, they come immediately one after the other
(due to commit 9616cb34b08e ("kselftest/runner.sh: Propagate SIGTERM
to runner child")).
This change makes sure we run cleanup (scheduled defer()s)
and also print a stack trace on SIGTERM, which doesn't happen
by default. Tests occasionally hang in NIPA and it's impossible
to tell what they are waiting from or doing.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250503011856.46308-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Until recently, the kernel could unexpectedly discard SVE state for a
period after a KVM_RUN ioctl, when the guest did not execute any
FPSIMD/SVE/SME instructions. We fixed that issue in commit:
fbc7e61195e2 ("KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state")
Add a test which tries to provoke that issue by manipulating SVE state
before/after running a guest which does not execute any FPSIMD/SVE/SME
instructions. The test executes a handful of iterations to miminize
the risk that the issue is masked by preemption.
Signed-off--by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250417-kvm-selftest-sve-signal-v1-1-6330c2f3da0c@kernel.org
[maz: Restored MR's SoB, fixed commit message according to MR's write-up]
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Update kcpuid's CSV file to version 2.4, as generated by x86-cpuid-db.
Summary of the v2.4 changes:
* Mark CPUID(0x80000001) EDX:23 bit, 'e_mmx', as not exclusive to
Transmeta since it is supported by AMD as well.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://gitlab.com/x86-cpuid.org/x86-cpuid-db/-/blob/v2.4/CHANGELOG.rst
Link: https://lore.kernel.org/r/20250506050437.10264-2-darwi@linutronix.de
|
|
Conflicts:
tools/arch/x86/include/asm/cpufeatures.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Strings from the kernel are guaranteed to be null terminated and
ynl_attr_validate() checks for this. But it doesn't check if the string
has a len of 0, which would cause problems when trying to access
data[len - 1]. Fix this by checking that len is positive.
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250503043050.861238-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the test result does not differentiate between the cases when
either one of the address families are configured or if both the address
families are configured. Ideally, the result should report if a
particular case was skipped.
./drivers/net/ping.py
TAP version 13
1..7
ok 1 ping.test_default_v4 # SKIP Test requires IPv4 connectivity
ok 2 ping.test_default_v6
ok 3 ping.test_xdp_generic_sb
ok 4 ping.test_xdp_generic_mb
ok 5 ping.test_xdp_native_sb
ok 6 ping.test_xdp_native_mb
ok 7 ping.test_xdp_offload # SKIP device does not support offloaded XDP
Totals: pass:5 fail:0 xfail:0 xpass:0 skip:2 error:0
Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250503013518.1722913-4-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On a system with either of the ipv4 or ipv6 information missing, tests
are currently skipped. Ideally, the test should run as long as at least
one address family is present. This patch make test run whenever
possible.
Before:
./drivers/net/ping.py
TAP version 13
1..6
ok 1 ping.test_default # SKIP Test requires IPv4 connectivity
ok 2 ping.test_xdp_generic_sb # SKIP Test requires IPv4 connectivity
ok 3 ping.test_xdp_generic_mb # SKIP Test requires IPv4 connectivity
ok 4 ping.test_xdp_native_sb # SKIP Test requires IPv4 connectivity
ok 5 ping.test_xdp_native_mb # SKIP Test requires IPv4 connectivity
ok 6 ping.test_xdp_offload # SKIP device does not support offloaded XDP
Totals: pass:0 fail:0 xfail:0 xpass:0 skip:6 error:0
After:
./drivers/net/ping.py
TAP version 13
1..6
ok 1 ping.test_default
ok 2 ping.test_xdp_generic_sb
ok 3 ping.test_xdp_generic_mb
ok 4 ping.test_xdp_native_sb
ok 5 ping.test_xdp_native_mb
ok 6 ping.test_xdp_offload # SKIP device does not support offloaded XDP
Totals: pass:5 fail:0 xfail:0 xpass:0 skip:1 error:0
Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250503013518.1722913-3-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The `get_interface_info` call has ip version hard-coded which leads to
failures on an IPV6 system. The NetDrvEnv class already gathers
information about remote interface, so instead of fixing the local
implementation switch to using cfg.remote_ifname.
Before:
./drivers/net/ping.py
Traceback (most recent call last):
File "/new_tests/./drivers/net/ping.py", line 217, in <module>
main()
File "/new_tests/./drivers/net/ping.py", line 204, in main
get_interface_info(cfg)
File "/new_tests/./drivers/net/ping.py", line 128, in get_interface_info
raise KsftFailEx('Can not get remote interface')
net.lib.py.ksft.KsftFailEx: Can not get remote interface
After:
./drivers/net/ping.py
TAP version 13
1..6
ok 1 ping.test_default # SKIP Test requires IPv4 connectivity
ok 2 ping.test_xdp_generic_sb # SKIP Test requires IPv4 connectivity
ok 3 ping.test_xdp_generic_mb # SKIP Test requires IPv4 connectivity
ok 4 ping.test_xdp_native_sb # SKIP Test requires IPv4 connectivity
ok 5 ping.test_xdp_native_mb # SKIP Test requires IPv4 connectivity
ok 6 ping.test_xdp_offload # SKIP device does not support offloaded XDP
Totals: pass:0 fail:0 xfail:0 xpass:0 skip:6 error:0
Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250503013518.1722913-2-mohsin.bashr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
GRE devices have their special code for IPv6 link-local address
generation that has been the source of several regressions in the past.
Add selftest to check that all gre, ip6gre, gretap and ip6gretap get an
IPv6 link-link local address in accordance with the
net.ipv6.conf.<dev>.addr_gen_mode sysctl.
Note: This patch was originally applied as commit 6f50175ccad4 ("selftests:
Add IPv6 link-local address generation tests for GRE devices.").
However, it was then reverted by commit 355d940f4d5a ("Revert "selftests:
Add IPv6 link-local address generation tests for GRE devices."")
because the commit it depended on was going to be reverted. Now that
the situation is resolved, we can add this selftest again (no changes
since original patch, appart from context update in
tools/testing/selftests/net/Makefile).
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/2c3a5733cb3a6e3119504361a9b9f89fda570a2d.1746225214.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since we added --off-cpu-thresh, add tests for when a sample's off-cpu
time is above the threshold, and when it's below the threshold.
Note that the basic test performed in test_offcpu_basic() collects a
direct sample now, since sleep 1 has duration of 1000ms, higher than the
default value of --off-cpu-thresh of 500ms, resulting in a direct
sample.
An example:
$ sudo perf test offcpu
124: perf record offcpu profiling tests : Ok
$
Committer testing:
root@number:~# perf test offcpu
126: perf record offcpu profiling tests : Ok
root@number:~# perf test -v offcpu
126: perf record offcpu profiling tests : Ok
root@number:~# perf test -vv offcpu
126: perf record offcpu profiling tests:
--- start ---
test child forked, pid 1410791
Checking off-cpu privilege
Basic off-cpu test
Basic off-cpu test [Success]
Child task off-cpu test
Child task off-cpu test [Success]
Threshold test (above threshold)
Threshold test (above threshold) [Success]
Threshold test (below threshold)
Threshold test (below threshold) [Success]
---- end(0) ----
126: perf record offcpu profiling tests : Ok
root@number:~#
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250501022809.449767-11-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Specify the threshold for dumping offcpu samples with --off-cpu-thresh,
the unit is milliseconds. Default value is 500ms.
Example:
perf record --off-cpu --off-cpu-thresh 824
The example above collects direct off-cpu samples where the off-cpu time
is longer than 824ms.
Committer testing:
After commenting out the end off-cpu dump to have just the ones that are
added right after the task is scheduled back, and using a threshould of
1000ms, we see some periods (the 5th column, just before "offcpu-time"
in the 'perf script' output) that are over 1000.000.000 nanoseconds:
root@number:~# perf record --off-cpu --off-cpu-thresh 10000
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.902 MB perf.data (34335 samples) ]
root@number:~# perf script
<SNIP>
Isolated Web Co 59932 [028] 63839.594437: 1000049427 offcpu-time:
7fe63c7976c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6)
7fe63c78c04c __futex_abstimed_wait_common+0x7c (/usr/lib64/libc.so.6)
7fe63c78e928 pthread_cond_timedwait@@GLIBC_2.3.2+0x178 (/usr/lib64/libc.so.6)
5599974a9fe7 mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)+0xe7 (/usr/lib64/fir>
100000000 [unknown] ([unknown])
swapper 0 [025] 63839.594459: 195724 cycles:P: ffffffffac328270 read_tsc+0x0 ([kernel.kallsyms])
Isolated Web Co 59932 [010] 63839.594466: 1000055278 offcpu-time:
7fe63c7976c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6)
7fe63c78ba24 __syscall_cancel+0x14 (/usr/lib64/libc.so.6)
7fe63c804c4e __poll+0x1e (/usr/lib64/libc.so.6)
7fe633b0d1b8 PollWrapper(_GPollFD*, unsigned int, int) [clone .lto_priv.0]+0xf8 (/usr/lib64/firefox/libxul.so)
10000002c [unknown] ([unknown])
swapper 0 [027] 63839.594475: 134433 cycles:P: ffffffffad4c45d9 irqentry_enter+0x19 ([kernel.kallsyms])
swapper 0 [028] 63839.594499: 215838 cycles:P: ffffffffac39199a switch_mm_irqs_off+0x10a ([kernel.kallsyms])
MediaPD~oder #1 1407676 [027] 63839.594514: 134433 cycles:P: 7f982ef5e69f dct_IV(int*, int, int*)+0x24f (/usr/lib64/libfdk-aac.so.2.0.0)
swapper 0 [024] 63839.594524: 267411 cycles:P: ffffffffad4c6ee6 poll_idle+0x56 ([kernel.kallsyms])
MediaSu~sor #75 1093827 [026] 63839.594555: 332652 cycles:P: 55be753ad030 moz_xmalloc+0x200 (/usr/lib64/firefox/firefox)
swapper 0 [027] 63839.594616: 160548 cycles:P: ffffffffad144840 menu_select+0x570 ([kernel.kallsyms])
Isolated Web Co 14019 [027] 63839.595120: 1000050178 offcpu-time:
7fc9537cc6c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6)
7fc9537c104c __futex_abstimed_wait_common+0x7c (/usr/lib64/libc.so.6)
7fc9537c3928 pthread_cond_timedwait@@GLIBC_2.3.2+0x178 (/usr/lib64/libc.so.6)
7fc95372a3c8 pt_TimedWait+0xb8 (/usr/lib64/libnspr4.so)
7fc95372a8d8 PR_WaitCondVar+0x68 (/usr/lib64/libnspr4.so)
7fc94afb1f7c WatchdogMain(void*)+0xac (/usr/lib64/firefox/libxul.so)
7fc947498660 [unknown] ([unknown])
7fc9535fce88 [unknown] ([unknown])
7fc94b620e60 WatchdogManager::~WatchdogManager()+0x0 (/usr/lib64/firefox/libxul.so)
fff8548387f8b48 [unknown] ([unknown])
swapper 0 [003] 63839.595712: 212948 cycles:P: ffffffffacd5b865 acpi_os_read_port+0x55 ([kernel.kallsyms])
<SNIP>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Suggested-by: Ian Rogers <irogers@google.com>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-2-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-10-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
BPF's stack trace map
Dump the remaining PERF_SAMPLE_ data, as if it is dumping a direct
sample.
Put the stack trace, tid, off-cpu time and cgroup id into the raw_data
section, just like a direct off-cpu sample coming from BPF's
bpf_perf_event_output().
This ensures that evsel__parse_sample() correctly parses both direct
samples and accumulated samples.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-10-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-9-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
No PERF_SAMPLE_CALLCHAIN in sample_type, but 'perf script' needs to
display a callchain, have to specify manually.
Also, prefer displaying a callchain:
gvfs-afc-volume 2267 [001] 3829232.955656: 1001115340 offcpu-time:
77f05292603f __pselect+0xbf (/usr/lib/x86_64-linux-gnu/libc.so.6)
77f052a1801c [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0)
77f052a18d45 [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0)
77f05289ca94 start_thread+0x384 (/usr/lib/x86_64-linux-gnu/libc.so.6)
77f052929c3c clone3+0x2c (/usr/lib/x86_64-linux-gnu/libc.so.6)
to a raw binary BPF output:
BPF output: 0000: dd 08 00 00 db 08 00 00 <DD>...<DB>...
0008: cc ce ab 3b 00 00 00 00 <CC>Ϋ;....
0010: 06 00 00 00 00 00 00 00 ........
0018: 00 fe ff ff ff ff ff ff .<FE><FF><FF><FF><FF><FF><FF>
0020: 3f 60 92 52 f0 77 00 00 ?`.R<F0>w..
0028: 1c 80 a1 52 f0 77 00 00 ..<A1>R<F0>w..
0030: 45 8d a1 52 f0 77 00 00 E.<A1>R<F0>w..
0038: 94 ca 89 52 f0 77 00 00 .<CA>.R<F0>w..
0040: 3c 9c 92 52 f0 77 00 00 <..R<F0>w..
0048: 00 00 00 00 00 00 00 00 ........
0050: 00 00 00 00 ....
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-9-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-8-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There is a check in evsel.c that does this:
if (evsel__is_offcpu_event(evsel))
evsel->core.attr.sample_type &= OFFCPU_SAMPLE_TYPES;
This along with:
#define OFFCPU_SAMPLE_TYPES (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \
PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \
PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
PERF_SAMPLE_CGROUP)
will tell perf_event to collect callchain.
We don't need the callchain from perf_event when collecting off-cpu
samples, because it's prev's callchain, not next's callchain.
(perf_event) (task_storage) (needed)
prev next
| |
---sched_switch---->
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-8-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-7-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Use the data in bpf-output samples, to assemble off-cpu samples.
In evsel__is_offcpu_event(), check if sample_type is PERF_SAMPLE_RAW to
support off-cpu sample data created by an older version of perf.
Testing compatibility on off-cpu samples collected by perf before this patch series:
See below, the sample_type still uses PERF_SAMPLE_CALLCHAIN
$ perf script --header -i ./perf.data.ptn | grep "event : name = offcpu-time"
# event : name = offcpu-time, , id = { 237917, 237918, 237919, 237920 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1
The output is correct.
$ perf script -i ./perf.data.ptn | grep offcpu-time
gmain 2173 [000] 18446744069.414584: 100102015 offcpu-time:
NetworkManager 901 [000] 18446744069.414584: 5603579 offcpu-time:
Web Content 1183550 [000] 18446744069.414584: 46278 offcpu-time:
gnome-control-c 2200559 [000] 18446744069.414584: 11998247014 offcpu-time:
<SNIP>
$
And after this patch series:
$ perf script --header -i ./perf.data.off-cpu-v9 | grep "event : name = offcpu-time"
# event : name = offcpu-time, , id = { 237959, 237960, 237961, 237962 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1
$ ./perf script -i ./perf.data.off-cpu-v9 | grep offcpu-time
gnome-shell 1875 [001] 4789616.361225: 100097057 offcpu-time:
gnome-shell 1875 [001] 4789616.461419: 100107463 offcpu-time:
firefox 2206821 [002] 4789616.475690: 255257245 offcpu-time:
$
Committer testing:
The command to record those samples:
root@number:~# perf record --off-cpu -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.092 MB perf.data (1552 samples) ]
root@number:~#
Then, before this patch series, the sample_type for the "offcpu-time" event is:
root@number:~# perf evlist -v | grep offcpu-time
offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, freq: 1, sample_id_all: 1
root@number:~#
And after it, after recording it again:
root@number:~# perf record --off-cpu -a sleep 1 ; perf evlist -v | grep offcpu-time
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.151 MB perf.data (2843 samples) ]
offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, sample_id_all: 1
root@number:~#
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-7-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-6-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Collect tid, period, callchain, and cgroup id and dump them when off-cpu
time threshold is reached.
We don't collect the off-cpu time twice (the delta), it's either in
direct samples, or accumulated samples that are dumped at the end of
perf.data.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-6-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-5-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Set the perf_event map in BPF for dumping off-cpu samples, and set the
offcpu_thresh to specify the threshold.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-5-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-4-howardchu95@gmail.com
[ Added some missing iteration variables to off_cpu_config() and fixed up
a manually edited patch hunk line boundary line ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Parse the off-cpu event using parse_event(), as bpf-output.
Call evlist__enable_evsel() on off-cpu event. This fixes the inability
to collect direct off-cpu samples on a workload, as reported by Arnaldo
Carvalho de Melo <acme@redhat.com>.
The reason being, workload sets enable_on_exec instead of calling
evlist__enable(), but off-cpu event does not attach to an executable and
execve won't be called, so the fds from perf_event_open() are not
enabled.
no-inherit should be set to 1, here's the reason:
We update the BPF perf_event map for direct off-cpu sample dumping (in
following patches), it executes as follows:
bpf_map_update_value()
bpf_fd_array_map_update_elem()
perf_event_fd_array_get_ptr()
perf_event_read_local()
In perf_event_read_local(), there is:
int perf_event_read_local(struct perf_event *event, u64 *value,
u64 *enabled, u64 *running)
{
...
/*
* It must not be an event with inherit set, we cannot read
* all child counters from atomic context.
*/
if (event->attr.inherit) {
ret = -EOPNOTSUPP;
goto out;
}
Which means no-inherit has to be true for updating the BPF perf_event
map.
Moreover, for bpf-output events, we primarily want a system-wide event
instead of a per-task event.
The reason is that in BPF's bpf_perf_event_output(), BPF uses the CPU
index to retrieve the perf_event file descriptor it outputs to.
Making a bpf-output event system-wide naturally satisfies this
requirement by mapping CPU appropriately.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-4-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Expose evsel__is_offcpu_event() so it can be used in off_cpu_config(),
evsel__parse_sample() and 'perf script'.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241108204137.2444151-3-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20250501022809.449767-2-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Clean up two build warnings:
[1]
iou-zcrx.c: In function ‘process_recvzc’:
iou-zcrx.c:263:37: warning: too many arguments for format [-Wformat-extra-args]
263 | error(1, 0, "payload mismatch at ", i);
| ^~~~~~~~~~~~~~~~~~~~~~
[2] Use "%zd" for ssize_t type as better
iou-zcrx.c: In function ‘run_client’:
iou-zcrx.c:357:47: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘ssize_t’ {aka ‘long int’} [-Wformat=]
357 | error(1, 0, "send(): %d", sent);
| ~^ ~~~~
| | |
| int ssize_t {aka long int}
| %ld
Signed-off-by: Haiyue Wang <haiyuewa@163.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250502175136.1122-1-haiyuewa@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch aims to add chk_dump_subflow in diag.sh. The subflow's
info can be obtained through "ss -tin", then use the 'mptcp_diag'
to verify the token in subflow_info.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/524
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-7-68eec95898fb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds 'get_subflow_info' in 'mptcp_diag', which can check whether
a TCP connection is an MPTCP subflow based on the "INET_ULP_INFO_MPTCP"
with tcp_diag method.
The helper 'print_subflow_info' in 'mptcp_diag' can print the subflow_filed
of an MPTCP subflow for further checking the 'subflow_info' through
inet_diag method.
The example of the whole output should be:
$ ./mptcp_diag -s "127.0.0.1:10000 127.0.0.1:38984"
127.0.0.1:10000 -> 127.0.0.1:38984
It's a mptcp subflow, the subflow info:
flags:Mec token:0000(id:0)/4278e77e(id:0) seq:9288466187236176036 \
sfseq:1 ssnoff:2317083055 maplen:215
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-6-68eec95898fb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch introduces the '__u32 proto' variable to the 'send_query' and
'recv_nlmsg' functions for further extending function.
In the 'send_query' function, the inclusion of this variable makes the
structure clearer and more readable.
In the 'recv_nlmsg' function, the '__u32 proto' variable ensures that
the 'diag_info' field remains unmodified when processing IPPROTO_TCP data,
thereby preventing unintended transformation into 'mptcp_info' format.
While at it, increment iovlen directly when an item is added to simplify
this portion of the code and improve its readaility.
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-5-68eec95898fb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch use 'inet_diag_req_v2' instead of 'token' as parameters of
send_query, and construct the req in 'get_mptcpinfo'.
This modification enhances the clarity of the code, and prepare for the
dump_subflow_info.
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-4-68eec95898fb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a struct named 'params' to save 'target_token' and other
future parameters. This structure facilitates future function expansions.
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-3-68eec95898fb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
getaddrinfo MPTCP is recently supported in glibc and IPPROTO_MPTCP for
getaddrinfo is used in mptcp_connect.c. But in mptcp_sockopt.c and
mptcp_inq.c, IPPROTO_TCP are still used for getaddrinfo, So this patch
updates them.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-2-68eec95898fb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mptcp_lib_get_info_value() will only print the first entry that match
the filter because of the ';q' at the end. As a consequence, the 'sed'
command could finish before the previous 'grep' one and print a 'write
error' warning because it is trying to write data to the closed pipe.
Such warnings are not interesting, they can be hidden by muting stderr
here for grep.
While at it, clearly indicate that mptcp_lib_get_info_value() will only
print the first matched entry to avoid confusions later on.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250502-net-next-mptcp-sft-inc-cover-v1-1-68eec95898fb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Building kublk currently fails (with a "could not find linux/ublk_cmd.h"
error message) if kernel headers are not installed in a system-global
location (i.e. somewhere in the compiler's default include search path).
This failure is unnecessary, as make kselftest installs kernel headers
in the build tree - kublk's build just isn't looking for them properly.
There is an include path in kublk's CFLAGS which is probably intended to
find the kernel headers installed in the build tree; fix it so that it
can actually find them.
This introduces some macro redefinition issues between glibc-provided
headers and kernel headers; fix those by eliminating one include in
kublk.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250429-ublk_selftests-v2-3-e970b6d9e4f4@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Convention dictates that tests should not log anything on success. Make
test_generic_06 follow this convention.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250429-ublk_selftests-v2-2-e970b6d9e4f4@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Compiler warnings can catch bugs at compile time; thus, heeding them is
usually a good idea. Turn warnings into errors by default for the kublk
build so that anyone making changes is forced to heed them. Compiler
warnings can also sometimes produce annoying false positives, so provide
a flag WERROR that the developer can use as follows to have the build
and selftests run go through even if there are warnings:
make WERROR=0 TARGETS=ublk kselftest
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250429-ublk_selftests-v2-1-e970b6d9e4f4@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
BTF dedup has a strong assumption that compiler with deduplicate identical
types within any given compilation unit (i.e., .c file). This property
is used when establishing equilvalence of two subgraphs of types.
Unfortunately, this property doesn't always holds in practice. We've
seen cases of having truly identical structs, unions, array definitions,
and, most recently, even pointers to the same type being duplicated
within CU.
Previously, we mitigated this on a case-by-case basis, adding a few
simple heuristics for validating that two BTF types (having two
different type IDs) are structurally the same. But this approach scales
poorly, and we can have more weird cases come up in the future.
So let's take a half-step back, and implement a bit more generic
structural equivalence check, recursively. We still limit it to
reasonable depth to avoid long reference loops. Depth-wise limiting of
potentially cyclical graph isn't great, but as I mentioned below doesn't
seem to be detrimental performance-wise. We can always improve this in
the future with per-type visited markers, if necessary.
Performance-wise this doesn't seem too affect vmlinux BTF dedup, which
makes sense because this logic kicks in not so frequently and only if we
already established a canonical candidate type match, but suddenly find
a different (but probably identical) type.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20250501235231.1339822-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
With the latest LLVM bpf selftests build will fail with
the following error message:
progs/profiler.inc.h:710:31: error: default initialization of an object of type 'typeof ((parent_task)->real_cred->uid.val)' (aka 'const unsigned int') leaves the object uninitialized and is incompatible with C++ [-Werror,-Wdefault-const-init-unsafe]
710 | proc_exec_data->parent_uid = BPF_CORE_READ(parent_task, real_cred, uid.val);
| ^
tools/testing/selftests/bpf/tools/include/bpf/bpf_core_read.h:520:35: note: expanded from macro 'BPF_CORE_READ'
520 | ___type((src), a, ##__VA_ARGS__) __r; \
| ^
This happens because BPF_CORE_READ (and other macro) declare the
variable __r using the ___type macro which can inherit const modifier
from intermediate types.
Fix this by using __typeof_unqual__, when supported. (And when it
is not supported, the problem shouldn't appear, as older compilers
haven't complained.)
Fixes: 792001f4f7aa ("libbpf: Add user-space variants of BPF_CORE_READ() family of macros")
Fixes: a4b09a9ef945 ("libbpf: Add non-CO-RE variants of BPF_CORE_READ() macro family")
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250502193031.3522715-1-a.s.protopopov@gmail.com
|
|
"sockmap_ktls disconnect_after_delete" is effectively moot after
disconnect has been disabled for TLS [1][2]. Remove the test
completely.
[1] https://lore.kernel.org/bpf/20250416170246.2438524-1-ihor.solodrai@linux.dev/
[2] https://lore.kernel.org/netdev/20250404180334.3224206-1-kuba@kernel.org/
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250502185221.1556192-1-isolodrai@meta.com
|
|
Integrate the reproducer from Alan into TC selftests and use scapy to
generate TCP traffic instead of relying on ping command.
Cc: Alan J. Wylie <alan@wylie.me.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250428232955.1740419-3-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-05-02
We've added 14 non-merge commits during the last 10 day(s) which contain
a total of 13 files changed, 740 insertions(+), 121 deletions(-).
The main changes are:
1) Avoid skipping or repeating a sk when using a UDP bpf_iter,
from Jordan Rife.
2) Fixed a crash when a bpf qdisc is set in
the net.core.default_qdisc, from Amery Hung.
3) A few other fixes in the bpf qdisc, from Amery Hung.
- Always call qdisc_watchdog_init() in the .init prologue such that
the .reset/.destroy epilogue can always call qdisc_watchdog_cancel()
without issue.
- bpf_qdisc_init_prologue() was incorrectly returning an error
when the bpf qdisc is set as the default_qdisc and the mq is creating
the default_qdisc. It is now fixed.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Cleanup bpf qdisc selftests
selftests/bpf: Test attaching a bpf qdisc with incomplete operators
bpf: net_sched: Make some Qdisc_ops ops mandatory
selftests/bpf: Test setting and creating bpf qdisc as default qdisc
bpf: net_sched: Fix bpf qdisc init prologue when set as default qdisc
selftests/bpf: Add tests for bucket resume logic in UDP socket iterators
selftests/bpf: Return socket cookies from sock_iter_batch progs
bpf: udp: Avoid socket skips and repeats during iteration
bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items
bpf: udp: Get rid of st_bucket_done
bpf: udp: Make sure iter->batch always contains a full bucket snapshot
bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch
bpf: net_sched: Fix using bpf qdisc as default qdisc
selftests/bpf: Fix compilation errors
====================
Link: https://patch.msgid.link/20250503010755.4030524-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Trying to cut the branch you are sat on is pretty dumb. And so is
trying to disable the instruction set you are executing on.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250429114117.3618800-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Merge 6.15 block fixes in, once again, to resolve conflicts with the
fixes for ublk that went into mainline and the 6.16 ublk updates.
* block-6.15:
nvmet-auth: always free derived key data
nvmet-tcp: don't restore null sk_state_change
nvmet-tcp: select CONFIG_TLS from CONFIG_NVME_TARGET_TCP_TLS
nvme-tcp: select CONFIG_TLS from CONFIG_NVME_TCP_TLS
nvme-tcp: fix premature queue removal and I/O failover
nvme-pci: add quirks for WDC Blue SN550 15b7:5009
nvme-pci: add quirks for device 126f:1001
nvme-pci: fix queue unquiesce check on slot_reset
ublk: remove the check of ublk_need_req_ref() from __ublk_check_and_get_req
ublk: enhance check for register/unregister io buffer command
ublk: decouple zero copy from user copy
selftests: ublk: fix UBLK_F_NEED_GET_DATA
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
With reverted fix:
PASS: fib expression did not cause unwanted packet drops
[ 37.285169] ns1-KK76Kt nft_rpfilter: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=32287 DF PROTO=ICMP TYPE=8 CODE=0 ID=1818 SEQ=1
FAIL: rpfilter did drop packets
FAIL: ns1-KK76Kt cannot reach 127.0.0.1, ret 0
Check for this.
Link: https://lore.kernel.org/netfilter/20250422114352.GA2092@breakpoint.cc/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add a new test case to check:
- conntrack_max limit is effective
- conntrack_max limit cannot be exceeded from within a netns
- resizing the hash table while packets are inflight works
- removal of all conntrack rules disables conntrack in netns
- conntrack tool dump (conntrack -L) returns expected number
of (unique) entries
- procfs interface - if available - has same number of entries
as conntrack -L dump
Expected output with selftest framework:
selftests: net/netfilter: conntrack_resize.sh
PASS: got 1 connections: netns conntrack_max is pernet bound
PASS: got 100 connections: netns conntrack_max is init_net bound
PASS: dump in netns had same entry count (-C 1778, -L 1778, -p 1778, /proc 0)
PASS: dump in netns had same entry count (-C 2000, -L 2000, -p 2000, /proc 0)
PASS: test parallel conntrack dumps
PASS: resize+flood
PASS: got 0 connections: conntrack disabled
PASS: got 1 connections: conntrack enabled
ok 1 selftests: net/netfilter: conntrack_resize.sh
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"Just a couple of build fixes on arm64"
* tag 'perf-tools-fixes-for-v6.15-2025-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf tools: Fix in-source libperf build
perf tools: Fix arm64 build by generating unistd_64.h
|