Both glibc and musl provide the RB_ flags for reboot() via <sys/reboot.h>,
so programs don't need to include <linux/reboot.h>. Let nolibc provide the
RB_ flags too.
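A minimal sketch of what providing these flags in a libc header can look like, assuming they simply alias the kernel's LINUX_REBOOT_CMD_* values. The numeric values are those of <linux/reboot.h>, spelled out so the fragment compiles without kernel headers; the `#ifndef` guard is an illustrative choice, not necessarily how nolibc writes it.

```c
/* RB_* flags for reboot(), equal to <linux/reboot.h>'s LINUX_REBOOT_CMD_* values. */
#ifndef RB_AUTOBOOT
#define RB_AUTOBOOT	0x01234567U	/* LINUX_REBOOT_CMD_RESTART */
#define RB_HALT_SYSTEM	0xcdef0123U	/* LINUX_REBOOT_CMD_HALT */
#define RB_ENABLE_CAD	0x89abcdefU	/* LINUX_REBOOT_CMD_CAD_ON */
#define RB_DISABLE_CAD	0x00000000U	/* LINUX_REBOOT_CMD_CAD_OFF */
#define RB_POWER_OFF	0x4321fedcU	/* LINUX_REBOOT_CMD_POWER_OFF */
#define RB_SW_SUSPEND	0xd000fce2U	/* LINUX_REBOOT_CMD_SW_SUSPEND */
#define RB_KEXEC	0x45584543U	/* LINUX_REBOOT_CMD_KEXEC */
#endif
```

With such definitions, a call like reboot(RB_POWER_OFF) reads the same whether the program is built against nolibc, glibc or musl.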
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
musl limits the fast signed integer types (e.g. int_fast16_t and
int_fast32_t) to 32 bits, but glibc and nolibc don't. To let such test
cases work on musl, provide the type-based SINT_MAX_OF_TYPE(type) and
SINT_MIN_OF_TYPE(type) macros.
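One way to write such type-based limits, sketched here under the assumption of two's-complement types without padding bits, is to build the maximum from two halves, 2^(N-2) - 1 + 2^(N-2), so that no intermediate value overflows. A test comparing against INT_FAST32_MAX then holds whether the libc makes int_fast32_t 32 or 64 bits wide.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Largest value of a signed integer type. Computed as
 * 2^(N-2) - 1 + 2^(N-2) so no intermediate result overflows
 * (assumes two's complement and no padding bits).
 */
#define SINT_MAX_OF_TYPE(type) \
	(((type)1 << (sizeof(type) * CHAR_BIT - 2)) - (type)1 + \
	 ((type)1 << (sizeof(type) * CHAR_BIT - 2)))

/* Smallest value of the same type: -MAX - 1. */
#define SINT_MIN_OF_TYPE(type) (-SINT_MAX_OF_TYPE(type) - 1)
```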
Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/bc635c4f-67fe-4e86-bfdf-bcb4879b928d@t-8ch.de/
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
_GNU_SOURCE implies _LARGEFILE64_SOURCE in glibc, but musl's default
configuration doesn't enable _LARGEFILE64_SOURCE.
In musl's include/dirent.h, getdents64 is only provided, as an alias of
getdents, when _LARGEFILE64_SOURCE is defined:
#if defined(_LARGEFILE64_SOURCE)
...
#define getdents64 getdents
#endif
Let's define _LARGEFILE64_SOURCE to fix up this compile error:
tools/testing/selftests/nolibc/nolibc-test.c: In function ‘test_getdents64’:
tools/testing/selftests/nolibc/nolibc-test.c:453:8: warning: implicit declaration of function ‘getdents64’; did you mean ‘getdents’? [-Wimplicit-function-declaration]
453 | ret = getdents64(fd, (void *)buffer, sizeof(buffer));
| ^~~~~~~~~~
| getdents
/usr/bin/ld: /tmp/ccKILm5u.o: in function `test_getdents64':
nolibc-test.c:(.text+0xe3e): undefined reference to `getdents64'
collect2: error: ld returned 1 exit status
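A minimal sketch of a caller that builds on both musl and glibc (2.30 or later, where getdents64() is declared under _GNU_SOURCE), assuming both feature macros are defined before any header is included. The function count_dirents() is hypothetical; it walks the returned records by their d_reclen field.

```c
#define _GNU_SOURCE
#define _LARGEFILE64_SOURCE	/* musl only maps getdents64 to getdents with this */
#include <dirent.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the entries of a directory with getdents64(); -1 on error. */
static int count_dirents(const char *path)
{
	union {
		struct dirent64 ent;	/* only here to force suitable alignment */
		char raw[4096];
	} buf;
	int count = 0;
	int fd = open(path, O_RDONLY | O_DIRECTORY);

	if (fd < 0)
		return -1;

	for (;;) {
		ssize_t n = getdents64(fd, (void *)&buf, sizeof(buf));
		ssize_t off = 0;

		if (n < 0) {
			close(fd);
			return -1;
		}
		if (n == 0)	/* end of directory */
			break;
		while (off < n) {
			struct dirent64 *d = (struct dirent64 *)(buf.raw + off);

			count++;
			off += d->d_reclen;
		}
	}
	close(fd);
	return count;
}
```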
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
As the gettid(2) man page [1] shows, glibc has provided gettid() since
version 2.30, so enable the test for glibc >= 2.30.
gettid() works on musl too.
[1]: https://man7.org/linux/man-pages/man2/gettid.2.html
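The version gate can be expressed with glibc's __GLIBC__ and __GLIBC_MINOR__ macros. This is a sketch, not the nolibc-test code: HAVE_GETTID and current_tid() are hypothetical names, and any non-glibc libc (musl, nolibc) is simply assumed to provide gettid().

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

/* glibc declares gettid() only since 2.30; other libcs are assumed to have it. */
#if defined(__GLIBC__)
#define HAVE_GETTID (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 30))
#else
#define HAVE_GETTID 1
#endif

/* Return the caller's thread ID, or -1 when gettid() is not available. */
static pid_t current_tid(void)
{
#if HAVE_GETTID
	return gettid();
#else
	return -1;
#endif
}
```

In a single-threaded process the thread ID equals the process ID, which gives a simple check.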
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
Use another invalid address (void *)1 instead of NULL to silence this
compile warning with glibc:
$ make libc-test
CC libc-test
nolibc-test.c: In function ‘run_syscall’:
nolibc-test.c:622:49: warning: null argument where non-null required (argument 1) [-Wnonnull]
622 | CASE_TEST(stat_fault); EXPECT_SYSER(1, stat(NULL, &stat_buf), -1, EFAULT); break;
| ^~~~
nolibc-test.c:304:79: note: in definition of macro ‘EXPECT_SYSER2’
304 | do { if (!cond) pad_spc(llen, 64, "[SKIPPED]\n"); else ret += expect_syserr2(expr, expret, experr1, experr2, llen); } while (0)
| ^~~~
nolibc-test.c:622:33: note: in expansion of macro ‘EXPECT_SYSER’
622 | CASE_TEST(stat_fault); EXPECT_SYSER(1, stat(NULL, &stat_buf), -1, EFAULT); break;
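The idea behind the fix, sketched outside the test macros: (void *)1 is not NULL, so -Wnonnull stays quiet, but it lies in the first page, which is normally never mapped, so the kernel still cannot read the path and should report EFAULT. The helper name below is hypothetical.

```c
#include <errno.h>
#include <sys/stat.h>

/* Return 1 if stat() on an invalid, non-NULL path pointer fails with EFAULT. */
static int stat_faults_on_bad_path(void)
{
	struct stat st;

	errno = 0;
	return stat((const char *)1, &st) == -1 && errno == EFAULT;
}
```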
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
Allow running and reporting a glibc- or musl-based libc-test.
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
mmap() a file with a valid offset and then munmap() it. A non-zero offset
is passed to exercise the 6th argument of my_syscall6().
Note that it is not easy to find a single file suitable for mmap() in
every environment, so a list of files is searched for a usable one:
- /dev/zero: commonly used to allocate anonymous memory, so it is likely
present and readable
- /proc/1/exe: for the 'run' and 'run-user' targets, since 'run-user'
cannot find '/proc/self/exe'
- /proc/self/exe: for the 'libc-test' target, since the normal program
'libc-test' has no permission to access '/proc/1/exe'
- argv0: the path of the program itself, which lets the test pass even in
the worst case, with neither procfs nor /dev/zero
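The file search described above might be sketched as follows. This is an illustration under assumptions, not the nolibc-test code: mmap_munmap_first() is a hypothetical name, /dev/zero (a character device whose st_size is 0) is given an arbitrary three-page size, and the offset is the page-aligned start of the page holding the last byte, which is non-zero whenever the file is larger than one page.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open the first readable file of a NULL-terminated list, mmap() one page
 * at a page-aligned offset, then munmap() it. 0 on success, -1 on failure.
 */
static int mmap_munmap_first(const char *const files[])
{
	long page_size = sysconf(_SC_PAGESIZE);
	struct stat st;
	off_t file_size, offset;
	void *mem;
	int fd = -1, i, ret = -1;

	if (page_size <= 0)
		return -1;

	for (i = 0; files[i] != NULL; i++) {
		fd = open(files[i], O_RDONLY);
		if (fd >= 0)
			break;
	}
	if (fd < 0)
		return -1;

	if (fstat(fd, &st) == -1)
		goto out;

	/* Character devices such as /dev/zero report size 0: pick 3 pages. */
	file_size = S_ISCHR(st.st_mode) ? 3 * page_size : st.st_size;
	if (file_size <= 0)
		goto out;

	/* Page-aligned offset of the page that holds the last byte. */
	offset = (file_size - 1) & ~((off_t)page_size - 1);

	mem = mmap(NULL, (size_t)page_size, PROT_READ, MAP_SHARED, fd, offset);
	if (mem == MAP_FAILED)
		goto out;

	ret = munmap(mem, (size_t)page_size);
out:
	close(fd);
	return ret;
}
```

Trying candidates in order means the first one that opens wins, so the list should go from the most to the least preferred file.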
Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230702193306.GK16233@1wt.eu/
Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/bff82ea6-610b-4471-a28b-6c76c28604a6@t-8ch.de/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
The addr argument of munmap() must be a multiple of the page size, so
passing the invalid address (void *)1 is expected to fail with -EINVAL.
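The same check written with the libc wrapper, as a hypothetical stand-alone helper: the libc reports the kernel's -EINVAL as a -1 return with errno set to EINVAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return 1 if munmap() rejects an address that is not page-aligned. */
static int munmap_rejects_unaligned(void)
{
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0)
		return 0;
	errno = 0;
	return munmap((void *)1, (size_t)page_size) == -1 && errno == EINVAL;
}
```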
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
The length argument of mmap() must be greater than 0, so passing a zero
length is expected to fail with -EINVAL.
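A stand-alone sketch of the zero-length case. It assumes an anonymous private mapping for simplicity rather than a file descriptor; the length check fails before the mapping type matters, and the helper name is hypothetical.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS with glibc in strict modes */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Return 1 if mmap() rejects a zero length with MAP_FAILED and EINVAL. */
static int mmap_rejects_zero_length(void)
{
	void *p;

	errno = 0;
	p = mmap(NULL, 0, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED && errno == EINVAL;
}
```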
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
From musl 0.9.14 (up to the latest version, 1.2.3), sbrk() and brk()
have been almost entirely disabled because they conflict with malloc; only
sbrk(0) is still permitted as a way to get the current location of the
program break. Support that case.
EXPECT_PTRNE() is used to check that sbrk() always succeeds in getting
the current break.
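The portable subset described above boils down to querying the break with an increment of 0 and only checking for the (void *)-1 error value. A hypothetical helper:

```c
#define _DEFAULT_SOURCE	/* sbrk() is not in POSIX; glibc hides it in strict modes */
#include <stddef.h>
#include <unistd.h>

/* Return the current program break, or NULL if sbrk(0) reports an error. */
static void *current_break(void)
{
	void *brk_now = sbrk(0);	/* an increment of 0 only queries the break */

	return brk_now == (void *)-1 ? NULL : brk_now;
}
```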
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
Syscalls such as sbrk() and mmap() return pointers. Testing them requires
more pointer comparison test macros, so add them:
- EXPECT_PTREQ() expects two equal pointers.
- EXPECT_PTRNE() expects two non-equal pointers.
- EXPECT_PTRER() expects failure with a specified errno.
- EXPECT_PTRER2() expects failure with one of two specified errnos.
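The semantics of the four macros can be sketched as plain functions. These helpers are hypothetical stand-ins, not the nolibc-test macros (which also print results and handle skipping); each returns the number of failures, 0 or 1, so results can be summed.

```c
#include <errno.h>

/* Expect two equal pointers. */
static int expect_ptreq(const void *got, const void *expected)
{
	return got == expected ? 0 : 1;
}

/* Expect two different pointers. */
static int expect_ptrne(const void *got, const void *unexpected)
{
	return got != unexpected ? 0 : 1;
}

/* Expect got == expret and errno equal to experr1 or experr2. */
static int expect_ptrer2(const void *got, const void *expret, int experr1, int experr2)
{
	int err = errno;	/* capture errno before anything can clobber it */

	if (got != expret)
		return 1;
	return (err == experr1 || err == experr2) ? 0 : 1;
}

/* Expect got == expret and errno equal to experr. */
static int expect_ptrer(const void *got, const void *expret, int experr)
{
	return expect_ptrer2(got, expret, experr, experr);
}
```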
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
/dev/zero is commonly used to allocate anonymous memory, which makes it a
very good file for tests, so prepare it.
Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230702193306.GK16233@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
argv0 is the path to the nolibc-test program itself, which makes it a
good, always-present, readable file for some tests, so export it.
Note that the path may be absolute or relative, so make sure the tests
work with both. If it is relative, the current directory must be the one
specified by the PWD environment variable.
Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/ZKKbS3cwKcHgnGwu@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
Fix up the error reported by scripts/checkpatch.pl:
ERROR: do not use assignment in if condition
#95: FILE: tools/include/nolibc/sys.h:95:
+ if ((ret = sys_brk(0)) && (sys_brk(ret + inc) == ret + inc))
Apply the new generic __sysret() to merge the SET_ERRNO() and return
lines.
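The shape of the fix can be shown against a simulated break. Everything here is hypothetical: fake_sys_brk() stands in for sys_brk() over a 4 KiB static arena, errno is set directly instead of through __sysret(), and the sketch returns the previous break, following the usual sbrk() convention. The point is that the first query is a plain assignment before the if, not inside its condition.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's brk(): a fixed 4 KiB arena. */
static char fake_heap[4096];
static uintptr_t fake_break;	/* 0 means "not initialised yet" */

/* Like sys_brk(): try to move the break to addr, always return the current one. */
static void *fake_sys_brk(void *addr)
{
	uintptr_t lo = (uintptr_t)fake_heap, hi = lo + sizeof(fake_heap);
	uintptr_t want = (uintptr_t)addr;

	if (fake_break == 0)
		fake_break = lo;
	if (want != 0 && want >= lo && want <= hi)
		fake_break = want;
	return (void *)fake_break;
}

/* sbrk() without an assignment inside the if condition. */
static void *sbrk_sketch(intptr_t inc)
{
	void *ret = fake_sys_brk(NULL);	/* query the current break first */
	uintptr_t want = (uintptr_t)ret + (uintptr_t)inc;

	if (ret && fake_sys_brk((void *)want) == (void *)want)
		return ret;		/* previous break */

	errno = ENOMEM;
	return (void *)-1;
}
```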
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
Do several cleanups together:
- Since all supported architectures have my_syscall6() now, remove the
#ifdef check.
- Move the mmap() related macros to tools/include/nolibc/types.h and
reuse most of them from <linux/mman.h>
- Apply the new generic __sysret() to turn the call to sys_mmap() into
one line of code
Note that since MAP_FAILED is -1 on Linux, we can use the generic
__sysret(), which returns -1 upon error and still satisfies user-space
code that checks for MAP_FAILED.
Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230702192347.GJ16233@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
No official reference states the errno range. Align with musl and glibc
and use [-MAX_ERRNO, -1] instead of all negative values, as they do in:
- musl: src/internal/syscall_ret.c
- glibc: sysdeps/unix/sysv/linux/sysdep.h
The MAX_ERRNO used by musl and glibc is 4095, the same value nolibc
defines in tools/include/nolibc/errno.h.
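The range check can be sketched as a small helper; sysret_sketch() and SKETCH_MAX_ERRNO are hypothetical names, not nolibc's own. Only raw results in [-4095, -1] become errors, so a large "negative" value such as a high address returned by mmap() passes through unchanged, and an error still yields (void *)-1, which is MAP_FAILED on Linux.

```c
#include <errno.h>

#define SKETCH_MAX_ERRNO 4095	/* same value as nolibc's MAX_ERRNO */

/*
 * Map a raw syscall result to the libc convention: values in
 * [-MAX_ERRNO, -1] set errno and return -1, anything else is returned as is.
 */
static long sysret_sketch(unsigned long ret)
{
	if (ret >= (unsigned long)-SKETCH_MAX_ERRNO) {
		errno = (int)-(long)ret;
		return -1;
	}
	return (long)ret;
}
```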
Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@1wt.eu/
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.aculab.com/
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
On mips, the 6th argument can be passed via the stack just like the 5th
one, so add a new my_syscall6(); see [1] for details:
The mips/o32 system call convention passes arguments 5 through 8 on
the user stack.
Both mmap() and pselect6() require my_syscall6().
[1]: https://man7.org/linux/man-pages/man2/syscall.2.html
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
The my_syscall<N> macros share the same long clobber list; define a macro for them.
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
The my_syscall<N> macros share the same long clobber list; define a macro for them.
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
Replace "__asm__ volatile" with "__asm__ volatile" and insert the necessary
whitespace before "\" to keep the lines aligned.
$ sed -i -e 's/__asm__ volatile ( /__asm__ volatile ( /g' tools/include/nolibc/*.h
Note that arch-s390.h uses a tab instead of spaces after the parenthesis,
so we must avoid inserting whitespace just before the tabs:
$ sed -i -e 's/__asm__ volatile (\t/__asm__ volatile (\t/g' tools/include/nolibc/arch-*.h
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
Code indentation of more than 8 spaces is converted to "tab + spaces" to
fix errors like these reported by scripts/checkpatch.pl:
ERROR: code indent should use tabs where possible
#64: FILE: tools/include/nolibc/arch-mips.h:64:
+^I \$
ERROR: code indent should use tabs where possible
#72: FILE: tools/include/nolibc/arch-mips.h:72:
+^I "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \$
This command is used:
$ sed -i -e '/^\t* /{s/ /\t/g}' tools/include/nolibc/arch-*.h
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
Since commit 53fcfafa8c5c ("tools/nolibc/unistd: add syscall()") nolibc
has support for syscall(2).
Use it to get rid of some ifdef-ery.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter updates for net-next
First patch resolves a fortify warning by wrapping the to-be-copied
members via struct_group.
Second patch replaces array[0] with array[] in ebtables uapi.
Both changes from GONG Ruiqi.
The largest chunk is replacement of strncpy with strscpy_pad()
in netfilter, from Justin Stitt.
Last patch, from myself, aborts ruleset validation if a fatal
signal is pending, this speeds up process exit.
* tag 'nf-next-23-08-22' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: allow loop termination for pending fatal signal
netfilter: xtables: refactor deprecated strncpy
netfilter: x_tables: refactor deprecated strncpy
netfilter: nft_meta: refactor deprecated strncpy
netfilter: nft_osf: refactor deprecated strncpy
netfilter: nf_tables: refactor deprecated strncpy
netfilter: nf_tables: refactor deprecated strncpy
netfilter: ipset: refactor deprecated strncpy
netfilter: ebtables: replace zero-length array members
netfilter: ebtables: fix fortify warnings in size_entry_mwt()
====================
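Unlike strncpy(), which leaves the destination unterminated on truncation, strscpy_pad() always NUL-terminates and zero-fills the rest of the buffer. The userspace sketch below mirrors those semantics as commonly documented for the kernel helper (return the copied length, or -E2BIG when the source does not fit); it is not the kernel implementation, and the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dest (count bytes), always NUL-terminate, zero the unused
 * tail, and return the copied length or -E2BIG on truncation.
 */
static long strscpy_pad_sketch(char *dest, const char *src, size_t count)
{
	size_t len;

	if (count == 0)
		return -E2BIG;

	len = strnlen(src, count);
	if (len == count) {
		/* src does not fit: copy what does and terminate */
		memcpy(dest, src, count - 1);
		dest[count - 1] = '\0';
		return -E2BIG;
	}

	memcpy(dest, src, len);
	memset(dest + len, 0, count - len);	/* includes the terminating NUL */
	return (long)len;
}
```

The zero padding matters when the whole buffer is later copied to user space or compared byte by byte, since no stale bytes survive after the string.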
Link: https://lore.kernel.org/r/20230822154336.12888-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update fprobe event example with BTF data structure field specification.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add test cases for accessing the data structure fields using BTF info.
This includes the field access from parameters and retval, and accessing
string information.
Link: https://lore.kernel.org/all/169272161265.160970.14048619786574971276.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Assume the fprobe event is a return event if $retval is used in the
probe's arguments without %return, e.g.:
echo 'f:myevent vfs_read $retval' >> dynamic_events
then 'myevent' is a return probe event.
Link: https://lore.kernel.org/all/169272160261.160970.13613040161560998787.stgit@devnote2/
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add string type checking with BTF information if possible.
This checks whether the given BTF argument (and field) is a signed char
array or a pointer to signed char. If not, it rejects the 'string' type.
If it is a pointer to signed char, it adds a dereference operation so
that it can correctly fetch the string data from memory.
# echo 'f getname_flags%return retval->name:string' >> dynamic_events
# echo 't sched_switch next->comm:string' >> dynamic_events
In the above cases, 'struct filename::name' is 'char *' and
'struct task_struct::comm' is 'char []'. In both cases, the user can
specify ':string' to fetch the string data.
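Why the pointer case needs an extra dereference can be shown in plain C. The structs below are simplified, hypothetical stand-ins, not the kernel's: for a char array the string bytes start at the field itself, while for a char pointer the field only holds the address of the string.

```c
#include <stddef.h>
#include <string.h>

struct task_like {
	int pid;
	char comm[16];		/* like task_struct::comm, a char array */
};

struct filename_like {
	const char *name;	/* like filename::name, a char pointer */
};

/* char []: the string starts at base + offset. */
static const char *fetch_array_string(const void *base, size_t offset)
{
	return (const char *)base + offset;
}

/* char *: read the pointer stored at base + offset, then follow it. */
static const char *fetch_pointer_string(const void *base, size_t offset)
{
	const char *p;

	memcpy(&p, (const char *)base + offset, sizeof(p));
	return p;
}
```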
Link: https://lore.kernel.org/all/169272159250.160970.1881112937198526188.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Support BTF argument on '$retval' for function return events including
kretprobe and fprobe for accessing the return value.
This also allows users to access its fields if the return value is a
pointer to a data structure.
E.g.
# echo 'f getname_flags%return +0($retval->name):string' \
> dynamic_events
# echo 1 > events/fprobes/getname_flags__exit/enable
# ls > /dev/null
# head -n 40 trace | tail
ls-87 [000] ...1. 8067.616101: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./function_profile_enabled"
ls-87 [000] ...1. 8067.616108: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./trace_stat"
ls-87 [000] ...1. 8067.616115: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_graph_notrace"
ls-87 [000] ...1. 8067.616122: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_graph_function"
ls-87 [000] ...1. 8067.616129: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_ftrace_notrace"
ls-87 [000] ...1. 8067.616135: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_ftrace_filter"
ls-87 [000] ...1. 8067.616143: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./touched_functions"
ls-87 [000] ...1. 8067.616237: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./enabled_functions"
ls-87 [000] ...1. 8067.616245: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./available_filter_functions"
ls-87 [000] ...1. 8067.616253: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_ftrace_notrace_pid"
Link: https://lore.kernel.org/all/169272158234.160970.2446691104240645205.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Use BTF to access the fields of a data structure. Fields can be accessed
with the '->' or '.' operator on a BTF argument.
# echo 't sched_switch next=next->pid vruntime=next->se.vruntime' \
> dynamic_events
# echo 1 > events/tracepoints/sched_switch/enable
# head -n 40 trace | tail
<idle>-0 [000] d..3. 272.565382: sched_switch: (__probestub_sched_switch+0x4/0x10) next=26 vruntime=956533179
kcompactd0-26 [000] d..3. 272.565406: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
<idle>-0 [000] d..3. 273.069441: sched_switch: (__probestub_sched_switch+0x4/0x10) next=9 vruntime=956533179
kworker/0:1-9 [000] d..3. 273.069464: sched_switch: (__probestub_sched_switch+0x4/0x10) next=26 vruntime=956579181
kcompactd0-26 [000] d..3. 273.069480: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
<idle>-0 [000] d..3. 273.141434: sched_switch: (__probestub_sched_switch+0x4/0x10) next=22 vruntime=956533179
kworker/u2:1-22 [000] d..3. 273.141461: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
<idle>-0 [000] d..3. 273.480872: sched_switch: (__probestub_sched_switch+0x4/0x10) next=22 vruntime=956585857
kworker/u2:1-22 [000] d..3. 273.480905: sched_switch: (__probestub_sched_switch+0x4/0x10) next=70 vruntime=959533179
sh-70 [000] d..3. 273.481102: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
Link: https://lore.kernel.org/all/169272157251.160970.9318175874130965571.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add a btf_find_struct_member() API to search for a member of a given data
structure or union by the member's name.
Link: https://lore.kernel.org/all/169272156248.160970.8868479822371129043.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
trace_btf
Move the generic function-proto lookup API and the function-parameter API
from trace_probe.c to BTF library code. This avoids duplicated effort
across different features.
Link: https://lore.kernel.org/all/169272155255.160970.719426926348706349.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the btf returned from bpf_get_btf_vmlinux() only covers functions in
the vmlinux, BTF argument is not available on the functions in the modules.
Use bpf_find_btf_id() instead of bpf_get_btf_vmlinux()+btf_find_name_kind()
so that BTF argument can find the correct struct btf and btf_type in it.
With this fix, fprobe events can use `$arg*` on module functions as below
# grep nf_log_ip_packet /proc/kallsyms
ffffffffa0005c00 t nf_log_ip_packet [nf_log_syslog]
ffffffffa0005bf0 t __pfx_nf_log_ip_packet [nf_log_syslog]
# echo 'f nf_log_ip_packet $arg*' > dynamic_events
# cat dynamic_events
f:fprobes/nf_log_ip_packet__entry nf_log_ip_packet net=net pf=pf hooknum=hooknum skb=skb in=in out=out loginfo=loginfo prefix=prefix
To support a module's BTF, which can be removed, the struct btf needs to
be ref-counted. So this also records the btf in the
traceprobe_parse_context and releases the reference when parsing is done.
Link: https://lore.kernel.org/all/169272154223.160970.3507930084247934031.stgit@devnote2/
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
As described in [1], we can skip the "container_of()" that follows
"list_for_each_entry()" by using "list_for_each_entry()" directly with
"struct trace_eprobe" and its "tp.list" member.
Also, this patch defines "for_each_trace_eprobe_tp" to simplify code that
follows the same logic.
[1] https://lore.kernel.org/all/CAHk-=wjakjw6-rDzDDBsuMoDCqd+9ogifR_EE1F0K-jYek1CdA@mail.gmail.com/
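This works because offsetof() accepts a nested member designator such as tp.list, so the iterator can recover the outer struct directly. The sketch below re-creates a minimal userspace list with the same macro shapes as the kernel's; struct trace_probe and struct trace_eprobe are simplified stand-ins, and for_each_eprobe() and sum_eprobe_ids() are hypothetical helpers in the spirit of for_each_trace_eprobe_tp. It relies on the GCC/Clang __typeof__ extension, as the kernel does.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the kernel macro; member may be a nested path like tp.list. */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member);	\
	     &pos->member != (head);						\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Simplified stand-ins: the list node lives inside an embedded struct. */
struct trace_probe {
	struct list_head list;
	int id;
};

struct trace_eprobe {
	unsigned long flags;
	struct trace_probe tp;
};

#define for_each_eprobe(ep, head) list_for_each_entry(ep, head, tp.list)

static int sum_eprobe_ids(struct list_head *head)
{
	struct trace_eprobe *ep;
	int sum = 0;

	for_each_eprobe(ep, head)
		sum += ep->tp.id;	/* no container_of() needed in the body */
	return sum;
}
```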
Link: https://lore.kernel.org/all/20230822022433.262478-1-nashuiliang@gmail.com/
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Use struct_size() instead of hand-writing it, when allocating a structure
with a flex array.
This is less verbose.
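Besides being shorter, the helper guards the size computation against overflow. The sketch below is a userspace stand-in, not the kernel macro: STRUCT_SIZE(), size_with_flex() and struct probe_args are hypothetical, and the result saturates to SIZE_MAX on overflow, so an allocation such as malloc(STRUCT_SIZE(p, args, n)) fails instead of returning a buffer that is too small.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure ending in a flexible array member. */
struct probe_args {
	int nr_args;
	long args[];
};

/* header + n * elem, saturating to SIZE_MAX on overflow. */
static size_t size_with_flex(size_t header, size_t elem, size_t n)
{
	if (elem != 0 && n > (SIZE_MAX - header) / elem)
		return SIZE_MAX;
	return header + elem * n;
}

/* Userspace stand-in for struct_size(p, member, n); p is never dereferenced. */
#define STRUCT_SIZE(p, member, n) \
	size_with_flex(sizeof(*(p)), sizeof(*(p)->member), (n))
```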
Link: https://lore.kernel.org/all/20230725195424.3469242-1-ruanjinjie@huawei.com/
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM
requests. This field is six bits wide, but bit five was missing from the
mask. This patch corrects the typo in the IGC_PTM_CTRL_SHRT_CYC macro.
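To see what a missing top bit does, here is a sketch with a hypothetical field shift of 2 (the real register layout is defined in the igc driver headers, not here): a 5-bit mask (0x1f) silently drops bit five, so a value of 40 turns into 8, while a 6-bit mask (0x3f) keeps every value from 0 to 63.

```c
#include <stdint.h>

#define SHRT_CYC_SHIFT 2	/* hypothetical field position */

/* Buggy: 0x1f only covers bits 0-4 of a six-bit field. */
#define SHRT_CYC_5BIT(usec) ((((uint32_t)(usec)) & 0x1fU) << SHRT_CYC_SHIFT)
/* Fixed: 0x3f covers bits 0-5. */
#define SHRT_CYC_6BIT(usec) ((((uint32_t)(usec)) & 0x3fU) << SHRT_CYC_SHIFT)
```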
Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mat Martineau says:
====================
mptcp: Prepare MPTCP packet scheduler for BPF extension
The kernel's MPTCP packet scheduler has, to date, been a one-size-fits-all
algorithm that is hard-coded. It attempts to balance latency and
throughput when transmitting data across multiple TCP subflows, and has
some limited tunability through sysctls. It has been a long-term goal of
the Linux MPTCP community to support customizable packet schedulers for
use cases that need to make different trade-offs regarding latency,
throughput, redundancy, and other metrics. BPF is well-suited for
configuring customized, per-packet scheduling decisions without having
to modify the kernel or manage out-of-tree kernel modules.
The first steps toward implementing BPF packet schedulers are to update
the existing MPTCP transmit loops to allow more flexible scheduling
decisions, and to add infrastructure for swappable packet schedulers.
The existing scheduling algorithm remains the default. BPF-related
changes will be in a future patch series.
This code has been in the MPTCP development tree for quite a while,
undergoing testing in our CI and community.
Patches 1 and 2 refactor the transmit code and do some related cleanup.
Patches 3-9 add infrastructure for registering and calling multiple
schedulers.
Patch 10 connects the in-kernel default scheduler to the new
infrastructure.
====================
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-0-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch defines the default packet scheduler mptcp_sched_default.
Register it in mptcp_sched_init(), which is invoked in mptcp_proto_init().
Skip deleting this default scheduler in mptcp_unregister_scheduler().
Set msk->sched to the default scheduler when the input parameter of
mptcp_init_sched() is NULL.
Invoke mptcp_sched_default_get_subflow() in get_send() and get_retrans()
if the default scheduler is set or msk->sched is NULL.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-10-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds multiple-subflow support to __mptcp_retrans(). Use the
get_retrans() wrapper instead of mptcp_subflow_get_retrans() in it.
Check the subflows' scheduled flags to find which subflow or subflows the
scheduler picked, and use them to send data.
Move the msk_owned_by_me() and fallback checks from
mptcp_subflow_get_retrans() into the get_retrans() wrapper.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-9-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds multiple-subflow support to __mptcp_push_pending() and
__mptcp_subflow_push_pending(). Use the get_send() wrapper instead of
mptcp_subflow_get_send() in them.
Check the subflows' scheduled flags to find which subflow or subflows the
scheduler picked, and use them to send data.
Move the msk_owned_by_me() and fallback checks from
mptcp_subflow_get_send() into the get_send() wrapper.
This commit allows the scheduler to set the subflow->scheduled bit on
multiple subflows, but it does not allow sending redundant data.
Multiple scheduled subflows will send sequential data on each subflow.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-8-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch defines two packet scheduler wrappers, mptcp_sched_get_send()
and mptcp_sched_get_retrans(), which invoke the get_subflow() callback of
msk->sched.
Set data->reinject to true in mptcp_sched_get_retrans() and to false in
mptcp_sched_get_send().
If msk->sched is NULL, use default functions mptcp_subflow_get_send()
and mptcp_subflow_get_retrans() to send data.
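The wrapper pattern can be sketched with simplified, hypothetical types. In this sketch get_subflow() returns a subflow index directly, whereas the patch series has the scheduler mark subflows through their scheduled flag; the default pickers are placeholders that stand in for mptcp_subflow_get_send() and mptcp_subflow_get_retrans().

```c
#include <stdbool.h>
#include <stddef.h>

struct sched_data {
	bool reinject;	/* true when picking a subflow for retransmission */
};

struct sched_ops {
	int (*get_subflow)(struct sched_data *data);
};

struct msk_like {
	const struct sched_ops *sched;	/* NULL: fall back to the built-in logic */
};

/* Placeholder default pickers, returning a subflow index. */
static int default_get_send(void) { return 0; }
static int default_get_retrans(void) { return 1; }

static int sched_get_send(struct msk_like *msk)
{
	struct sched_data data = { .reinject = false };

	if (!msk->sched)
		return default_get_send();
	return msk->sched->get_subflow(&data);
}

static int sched_get_retrans(struct msk_like *msk)
{
	struct sched_data data = { .reinject = true };

	if (!msk->sched)
		return default_get_retrans();
	return msk->sched->get_subflow(&data);
}

/* Example scheduler: subflow 2 for new data, subflow 3 for reinjections. */
static int demo_get_subflow(struct sched_data *data)
{
	return data->reinject ? 3 : 2;
}
```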
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-7-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a new member, scheduled, to struct mptcp_subflow_context,
which is set in the MPTCP scheduler context when the scheduler picks this
subflow to send data.
Add a new helper mptcp_subflow_set_scheduled() to set this flag using
WRITE_ONCE().
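WRITE_ONCE() has no direct userspace equivalent; a relaxed C11 atomic store is a rough analogue, giving a single untorn write without adding ordering constraints. The sketch below uses a hypothetical, simplified subflow struct to show the setter and a matching reader.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for struct mptcp_subflow_context. */
struct subflow_like {
	atomic_bool scheduled;
};

/* Rough analogue of a WRITE_ONCE()-based setter. */
static void subflow_set_scheduled(struct subflow_like *subflow, bool scheduled)
{
	atomic_store_explicit(&subflow->scheduled, scheduled, memory_order_relaxed);
}

/* Rough analogue of reading the flag with READ_ONCE(). */
static bool subflow_is_scheduled(struct subflow_like *subflow)
{
	return atomic_load_explicit(&subflow->scheduled, memory_order_relaxed);
}
```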
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-6-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a new member, sched, to struct mptcp_sock, along with two
helpers, mptcp_init_sched() and mptcp_release_sched(), to initialize and
release it.
Initialize it with the sysctl scheduler in mptcp_init_sock(), copy the
scheduler from the parent in mptcp_sk_clone(), and release it in
__mptcp_destroy_sock().
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-5-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a new sysctl, named scheduler, to support selecting
different schedulers. Export the mptcp_get_scheduler() helper to get this
sysctl.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-4-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch defines struct mptcp_sched_ops, which has three data members,
name, owner and list, and three function pointers: init(), release() and
get_subflow().
The scheduler function get_subflow() takes a struct mptcp_sched_data
parameter, which contains a reinject flag indicating whether this is a
retransmission, the number of subflows and a mptcp_subflow_context array.
Add the scheduler registering, unregistering and finding functions to add,
delete and find a packet scheduler on the global list mptcp_sched_list.
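A registry of this kind can be sketched with a singly linked list. The types and names below are hypothetical simplifications (no locking, no RCU, no module reference counting); the sketch rejects ops without a get_subflow() callback and duplicate names, which is the kind of validation such a register function typically performs.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SCHED_NAME_MAX 16	/* hypothetical limit */

struct sched_ops {
	char name[SCHED_NAME_MAX];
	int (*get_subflow)(void *data);	/* simplified signature */
	struct sched_ops *next;		/* stands in for struct list_head */
};

static struct sched_ops *sched_list;	/* global list of registered schedulers */

static struct sched_ops *sched_find(const char *name)
{
	struct sched_ops *s;

	for (s = sched_list; s; s = s->next)
		if (strncmp(s->name, name, SCHED_NAME_MAX) == 0)
			return s;
	return NULL;
}

/* Refuse incomplete ops and duplicate names, then add to the list. */
static int sched_register(struct sched_ops *sched)
{
	if (!sched->get_subflow)
		return -EINVAL;
	if (sched_find(sched->name))
		return -EEXIST;
	sched->next = sched_list;
	sched_list = sched;
	return 0;
}

static void sched_unregister(struct sched_ops *sched)
{
	struct sched_ops **pp;

	for (pp = &sched_list; *pp; pp = &(*pp)->next) {
		if (*pp == sched) {
			*pp = sched->next;
			return;
		}
	}
}

/* Example callback: always pick the first subflow. */
static int first_subflow(void *data)
{
	(void)data;
	return 0;
}
```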
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-3-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the burst check conditions have moved out of
mptcp_subflow_get_send(), all uses of msk->last_snd are now useless.
This patch drops them as well as the MPTCP_RESET_SCHEDULER macro.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-2-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To support redundant packet schedulers more easily, this patch refactors
the __mptcp_push_pending() logic from:
For each dfrag:
    While sends succeed:
        Call the scheduler (selects subflow and msk->snd_burst)
        Update subflow locks (push/release/acquire as needed)
        Send the dfrag data with mptcp_sendmsg_frag()
        Update already_sent, snd_nxt, snd_burst
    Update msk->first_pending
Push/release on final subflow
->
While first_pending isn't empty:
    Call the scheduler (selects subflow and msk->snd_burst)
    Update subflow locks (push/release/acquire as needed)
    For each pending dfrag:
        While sends succeed:
            Send the dfrag data with mptcp_sendmsg_frag()
            Update already_sent, snd_nxt, snd_burst
        Update msk->first_pending
        Break if required by msk->snd_burst / etc
Push/release on final subflow
It also refactors the __mptcp_subflow_push_pending() logic from:
For each dfrag:
    While sends succeed:
        Call the scheduler (selects subflow and msk->snd_burst)
        Send the dfrag data with mptcp_subflow_delegate(), break
        Send the dfrag data with mptcp_sendmsg_frag()
        Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst
    Update msk->first_pending
->
While first_pending isn't empty:
    Call the scheduler (selects subflow and msk->snd_burst)
    Send the dfrag data with mptcp_subflow_delegate(), break
    Send the dfrag data with mptcp_sendmsg_frag()
    For each pending dfrag:
        While sends succeed:
            Send the dfrag data with mptcp_sendmsg_frag()
            Update already_sent, snd_nxt, snd_burst
        Update msk->first_pending
        Break if required by msk->snd_burst / etc
Move the duplicate code from __mptcp_push_pending() and
__mptcp_subflow_push_pending() into a new helper function, named
__subflow_push_pending(). Simplify __mptcp_push_pending() and
__mptcp_subflow_push_pending() by invoking this helper.
Also move the burst check conditions out of the function
mptcp_subflow_get_send(), check them in __subflow_push_pending() in
the inner "for each pending dfrag" loop.
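To make the new loop shape concrete, the toy model below replays it with sizes in abstract units and no locking. Everything in it (the struct names, the two-subflow round-robin schedule() and its fixed 4-unit burst) is hypothetical; it only shows that the scheduler now runs once per burst, and that the inner loop keeps draining pending dfrags until the burst is used up.

```c
#include <stddef.h>

struct dfrag_model {
	int remaining;			/* units not yet sent */
	struct dfrag_model *next;
};

struct msk_model {
	struct dfrag_model *first_pending;
	int snd_burst;			/* budget set by the scheduler */
	int next_subflow;		/* round-robin state of the toy scheduler */
};

/* Toy scheduler: alternate between two subflows, 4-unit burst each time. */
static int schedule(struct msk_model *msk)
{
	msk->snd_burst = 4;
	return msk->next_subflow++ % 2;
}

/* New structure: schedule once, then drain pending dfrags until the burst ends. */
static void push_pending(struct msk_model *msk, int sent[2])
{
	while (msk->first_pending) {
		int ssk = schedule(msk);
		struct dfrag_model *d;

		for (d = msk->first_pending; d; d = msk->first_pending) {
			while (d->remaining > 0 && msk->snd_burst > 0) {
				d->remaining--;		/* "send" one unit */
				msk->snd_burst--;
				sent[ssk]++;
			}
			if (d->remaining == 0)
				msk->first_pending = d->next;
			if (msk->snd_burst == 0)
				break;			/* burst used up: reschedule */
		}
	}
}
```

With dfrags of 3, 2 and 5 units, the model sends 4 units on subflow 0, then 4 on subflow 1, then the last 2 on subflow 0.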
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-1-0c860fb256a8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The automatic recalculation of the maximum allowed MTU is usually triggered
by code sections which are already rtnl lock protected by callers outside
of batman-adv. But when the fragmentation setting is changed via
batman-adv's own batadv genl family, the rtnl lock is not held.
However, dev_set_mtu requires the caller to hold the rtnl lock because it
uses netdevice notifiers, so this code then fails the check for this
lock:
RTNL: assertion failed at net/core/dev.c (1953)
Cc: stable@vger.kernel.org
Reported-by: syzbot+f8812454d9b3ac00d282@syzkaller.appspotmail.com
Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7bfe861e@narfation.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting
PTP-related workqueues.
This fixes the following crash:
BUG: unable to handle page fault for address: ffffc9000440b6f8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0
Oops: 0000 [#1] PREEMPT SMP
[...]
Workqueue: events igb_ptp_overflow_check
RIP: 0010:igb_rd32+0x1f/0x60
[...]
Call Trace:
igb_ptp_read_82580+0x20/0x50
timecounter_read+0x15/0x60
igb_ptp_overflow_check+0x1a/0x50
process_one_work+0x1cb/0x3c0
worker_thread+0x53/0x3f0
? rescuer_thread+0x370/0x370
kthread+0x142/0x160
? kthread_associate_blkcg+0xc0/0xc0
ret_from_fork+0x1f/0x30
Fixes: 1f6e8178d685 ("igb: Prevent dropped Tx timestamps via work items and interrupts.")
Fixes: d339b1331616 ("igb: add PTP Hardware Clock code")
Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-08-21 (ice)
This series contains updates to ice driver only.
Jesse fixes an issue with calculating the receive buffer size.
Petr Oros reverts a commit that does not fully resolve VF reset issues
and implements one that provides a fuller fix.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix NULL pointer deref during VF reset
Revert "ice: Fix ice VF reset during iavf initialization"
ice: fix receive buffer size miscalculation
====================
Link: https://lore.kernel.org/r/20230821171633.2203505-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Oliver Hartkopp says:
====================
CAN fixes for 6.5-rc7
The isotp fix removes an unnecessary check which leads to delays and/or
a wrong error notification.
The fix for the CAN_RAW socket solves the last issue that has been
introduced with commit ee8b94c8510c ("can: raw: fix receiver memory leak")
in this upstream cycle (detected by Eric Dumazet).
====================
Link: https://lore.kernel.org/r/20230821144547.6658-1-socketcan@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|