summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-23selftests/nolibc: add chmod_argv0 testZhangjin Wu
argv0 is readable and chmodable, let's use it for chmod test, but a safe umask should be used, the readable and executable modes should be reserved. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: chroot_exe: remove procfs dependencyZhangjin Wu
Since argv0 also works for CONFIG_PROC_FS=n, let's use it instead of '/proc/self/exe'. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: stat_timestamps: remove procfs dependencyZhangjin Wu
'/proc/self/' is a good path which doesn't have stale time info but it is only available for CONFIG_PROC_FS=y. When CONFIG_PROC_FS=n, use argv0 instead of '/proc/self', use '/' for the worst case. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: chdir_root: restore current path after testZhangjin Wu
The PWD environment variable has the path of the nolibc-test program, the current path must be the same as it, otherwise, the test cases will fail with relative path (e.g. ./nolibc-test). Since only chdir_root really changes the current path, let's restore it with the PWD environment variable. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: vfprintf: remove MEMFD_CREATE dependencyZhangjin Wu
The vfprintf test case require to open a temporary file to write, the old memfd_create() method is perfect but has strong dependency on MEMFD_CREATE and also TMPFS or HUGETLBFS (see fs/Kconfig): config MEMFD_CREATE def_bool TMPFS || HUGETLBFS And from v6.2, MFD_NOEXEC_SEAL must be passed for the non-executable memfd, otherwise, The kernel warning will be output to the test result like this: Running test 'vfprintf' 0 emptymemfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=1 'init' "" = "" [OK] To avoid such warning and also to remove the MEMFD_CREATE dependency, let's open a file from tmpfs directly. The /tmp directory is used to detect the existing of tmpfs, if not there, skip instead of fail. And further, for pid == 1, the initramfs is loaded as ramfs, which can be used as tmpfs, so, it is able to further remove TMPFS dependency too. Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/9ad51430-b7c0-47dc-80af-20c86539498d@t-8ch.de Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: prepare /tmp for tests that need to writeZhangjin Wu
create a /tmp directory. If it succeeds, the directory is writable, which is normally the case when booted from an initramfs anyway. This will be used instead of procfs for some tests. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Link: https://lore.kernel.org/lkml/20230710050600.9697-1-falcon@tinylab.org/ [wt: removed the unneeded mount() call] Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: fix up failures when CONFIG_PROC_FS=nZhangjin Wu
For CONFIG_PROC_FS=n, the /proc is not mountable, but the /proc directory has been created in the prepare() stage whenever /proc is there or not. so, the checking of /proc in the run_syscall() stage will be always true and at last it will fail all of the procfs dependent test cases, which deviates from the 'cond' check design of the EXPECT_xx macros, without procfs, these test cases should be skipped instead of failed. To solve this issue, one method is checking /proc/self instead of /proc, another method is removing the /proc directory completely for CONFIG_PROC_FS=n, we apply the second method to avoid misleading the users. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add a new rmdir() test caseZhangjin Wu
A new rmdir_blah test case is added to remove a non-existing /blah, which expects failure with ENOENT errno. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: add rmdir() supportZhangjin Wu
a reverse operation of mkdir() is meaningful, add rmdir() here. required by nolibc-test to remove /proc while CONFIG_PROC_FS is not enabled. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: link_cross: use /proc/self/cmdlineZhangjin Wu
For CONFIG_NET=n, there would be no /proc/self/net, so, use /proc/self/cmdline instead. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: fix up kernel parameters supportZhangjin Wu
kernel parameters allow pass two types of strings, one type is like 'noapic', another type is like 'panic=5', the first type is passed as arguments of the init program, the second type is passed as environment variables of the init program. when users pass kernel parameters like this: noapic NOLIBC_TEST=syscall our nolibc-test program will use the test setting from argv[1] and ignore the one from NOLIBC_TEST environment variable, and at last, it will print the following line and ignore the whole test setting. Ignoring unknown test name 'noapic' reversing the parsing order does solve the above issue: test = getenv("NOLIBC_TEST"); if (test) test = argv[1]; but it still doesn't work with such kernel parameters (without NOLIBC_TEST environment variable): noapic FOO=bar To support all of the potential kernel parameters, let's verify the test setting from both of argv[1] and NOLIBC_TEST environment variable. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: prefer <sys/reboot.h> to <linux/reboot.h>Zhangjin Wu
Since both glibc and musl provide RB_ flags via <sys/reboot.h>, and we just add RB_ flags for nolibc, let's use RB_ flags instead of LINUX_REBOOT_ flags and only reserve the required <sys/reboot.h> header. This allows compile libc-test for musl libc without the linux headers. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: types.h: add RB_ flags for reboot()Zhangjin Wu
Both glibc and musl provide RB_ flags via <sys/reboot.h> for reboot(), they don't need to include <linux/reboot.h>, let nolibc provide RB_ flags too. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: fix up int_fast16/32_t test cases for muslZhangjin Wu
musl limits the fast signed int in 32bit, but glibc and nolibc don't, to let such test cases work on musl, let's provide the type based SINT_MAX_OF_TYPE(type) and SINT_MIN_OF_TYPE(type). Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/bc635c4f-67fe-4e86-bfdf-bcb4879b928d@t-8ch.de/ Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add _LARGEFILE64_SOURCE for muslZhangjin Wu
_GNU_SOURCE Implies _LARGEFILE64_SOURCE in glibc, but in musl, the default configuration doesn't enable _LARGEFILE64_SOURCE. >From include/dirent.h of musl, getdents64 is provided as getdents when _LARGEFILE64_SOURCE is defined. #if defined(_LARGEFILE64_SOURCE) ... #define getdents64 getdents #endif Let's define _LARGEFILE64_SOURCE to fix up this compile error: tools/testing/selftests/nolibc/nolibc-test.c: In function ‘test_getdents64’: tools/testing/selftests/nolibc/nolibc-test.c:453:8: warning: implicit declaration of function ‘getdents64’; did you mean ‘getdents’? [-Wimplicit-function-declaration] 453 | ret = getdents64(fd, (void *)buffer, sizeof(buffer)); | ^~~~~~~~~~ | getdents /usr/bin/ld: /tmp/ccKILm5u.o: in function `test_getdents64': nolibc-test.c:(.text+0xe3e): undefined reference to `getdents64' collect2: error: ld returned 1 exit status Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: gettid: restore for glibc and muslZhangjin Wu
As the gettid manpage [1] shows, glibc 2.30 has gettid support, so, let's enable the test for glibc >= 2.30. gettid works on musl too. [1]: https://man7.org/linux/man-pages/man2/gettid.2.html Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: stat_fault: silence NULL argument warning with glibcZhangjin Wu
Use another invalid address (void *)1 instead of NULL to silence this compile warning with glibc: $ make libc-test CC libc-test nolibc-test.c: In function ‘run_syscall’: nolibc-test.c:622:49: warning: null argument where non-null required (argument 1) [-Wnonnull] 622 | CASE_TEST(stat_fault); EXPECT_SYSER(1, stat(NULL, &stat_buf), -1, EFAULT); break; | ^~~~ nolibc-test.c:304:79: note: in definition of macro ‘EXPECT_SYSER2’ 304 | do { if (!cond) pad_spc(llen, 64, "[SKIPPED]\n"); else ret += expect_syserr2(expr, expret, experr1, experr2, llen); } while (0) | ^~~~ nolibc-test.c:622:33: note: in expansion of macro ‘EXPECT_SYSER’ 622 | CASE_TEST(stat_fault); EXPECT_SYSER(1, stat(NULL, &stat_buf), -1, EFAULT); break; Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add run-libc-test targetZhangjin Wu
allow run and report glibc or musl based libc-test. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add mmap_munmap_good test caseZhangjin Wu
mmap() a file with a good offset and then munmap() it. a non-zero offset is passed to test the 6th argument of my_syscall6(). Note, it is not easy to find a unique file for mmap() in different scenes, so, a file list is used to search the right one: - /dev/zero: is commonly used to allocate anonymous memory and is likely present and readable - /proc/1/exe: for 'run' and 'run-user' target, 'run-user' can not find '/proc/self/exe' - /proc/self/exe: for 'libc-test' target, normal program 'libc-test' has no permission to access '/proc/1/exe' - argv0: the path of the program itself, let it pass even with worst case scene: no procfs and no /dev/zero Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230702193306.GK16233@1wt.eu/ Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/bff82ea6-610b-4471-a28b-6c76c28604a6@t-8ch.de/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add munmap_bad test caseZhangjin Wu
The addr argument of munmap() must be a multiple of the page size, passing invalid (void *)1 addr expects failure with -EINVAL. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add mmap_bad test caseZhangjin Wu
The length argument of mmap() must be greater than 0, passing a zero length argument expects failure with -EINVAL. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add sbrk_0 to test current brk gettingZhangjin Wu
>From musl 0.9.14 (to the latest version 1.2.3), both sbrk() and brk() have almost been disabled for they conflict with malloc, only sbrk(0) is still permitted as a way to get the current location of the program break, let's support such case. EXPECT_PTRNE() is used to expect sbrk() always successfully getting the current break. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRERZhangjin Wu
The syscalls like sbrk() and mmap() return pointers, to test them, more pointer compare test macros are required, add them: - EXPECT_PTREQ() expects two equal pointers. - EXPECT_PTRNE() expects two non-equal pointers. - EXPECT_PTRER() expects failure with a specified errno. - EXPECT_PTRER2() expects failure with one of two specified errnos. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: prepare: create /dev/zeroZhangjin Wu
/dev/zero is commonly used to allocate anonymous memory, it is a very good file for tests, let's prepare it. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230702193306.GK16233@1wt.eu/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: export argv0 for some testsZhangjin Wu
argv0 is the path to nolibc-test program itself, which is a very good always existing readable file for some tests, let's export it. Note, the path may be absolute or relative, please make sure the tests work with both of them. If it is relative, we must make sure the current path is the one specified by the PWD environment variable. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/ZKKbS3cwKcHgnGwu@1wt.eu/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: clean up sbrk() routineZhangjin Wu
Fix up the error reported by scripts/checkpatch.pl: ERROR: do not use assignment in if condition #95: FILE: tools/include/nolibc/sys.h:95: + if ((ret = sys_brk(0)) && (sys_brk(ret + inc) == ret + inc)) Apply the new generic __sysret() to merge the SET_ERRNO() and return lines. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: clean up mmap() routineZhangjin Wu
Do several cleanups together: - Since all supported architectures have my_syscall6() now, remove the #ifdef check. - Move the mmap() related macros to tools/include/nolibc/types.h and reuse most of them from <linux/mman.h> - Apply the new generic __sysret() to convert the calling of sys_map() to oneline code Note, since MAP_FAILED is -1 on Linux, so we can use the generic __sysret() which returns -1 upon error and still satisfy user land that checks for MAP_FAILED. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230702192347.GJ16233@1wt.eu/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: __sysret: support syscalls who return a pointerZhangjin Wu
No official reference states the errno range, here aligns with musl and glibc and uses [-MAX_ERRNO, -1] instead of all negative ones. - musl: src/internal/syscall_ret.c - glibc: sysdeps/unix/sysv/linux/sysdep.h The MAX_ERRNO used by musl and glibc is 4095, just like the one nolibc defined in tools/include/nolibc/errno.h. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@1wt.eu/ Suggested-by: David Laight <David.Laight@ACULAB.COM> Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.aculab.com/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: add missing my_syscall6() for mipsZhangjin Wu
It is able to pass the 6th argument like the 5th argument via the stack for mips, let's add a new my_syscall6() now, see [1] for details: The mips/o32 system call convention passes arguments 5 through 8 on the user stack. Both mmap() and pselect6() require my_syscall6(). [1]: https://man7.org/linux/man-pages/man2/syscall.2.html Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: arch-mips.h: shrink with _NOLIBC_SYSCALL_CLOBBERLISTZhangjin Wu
my_syscall<N> share the same long clobber list, define a macro for them. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: arch-loongarch.h: shrink with _NOLIBC_SYSCALL_CLOBBERLISTZhangjin Wu
my_syscall<N> share the same long clobber list, define a macro for them. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23toolc/nolibc: arch-*.h: clean up whitespaces after __asm__Zhangjin Wu
replace "__asm__ volatile" with "__asm__ volatile" and insert necessary whitespace before "\" to make sure the lines are aligned. $ sed -i -e 's/__asm__ volatile ( /__asm__ volatile ( /g' tools/include/nolibc/*.h Note, arch-s390.h uses post-tab instead of post-whitespaces, must avoid insert whitespace just before the tabs: $ sed -i -e 's/__asm__ volatile (\t/__asm__ volatile (\t/g' tools/include/nolibc/arch-*.h Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: arch-*.h: fix up code indent errorsZhangjin Wu
More than 8 whitespaces of the code indent are replaced with "tab + whitespaces" to fix up such errors reported by scripts/checkpatch.pl: ERROR: code indent should use tabs where possible #64: FILE: tools/include/nolibc/arch-mips.h:64: +^I \$ ERROR: code indent should use tabs where possible #72: FILE: tools/include/nolibc/arch-mips.h:72: +^I "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \$ This command is used: $ sed -i -e '/^\t* /{s/ /\t/g}' tools/include/nolibc/arch-*.h Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: simplify call to iopermThomas Weißschuh
Since commit 53fcfafa8c5c ("tools/nolibc/unistd: add syscall()") nolibc has support for syscall(2). Use it to get rid of some ifdef-ery. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-22Merge tag 'nf-next-23-08-22' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next First patch resolves a fortify warning by wrapping the to-be-copied members via struct_group. Second patch replaces array[0] with array[] in ebtables uapi. Both changes from GONG Ruiqi. The largest chunk is replacement of strncpy with strscpy_pad() in netfilter, from Justin Stitt. Last patch, from myself, aborts ruleset validation if a fatal signal is pending, this speeds up process exit. * tag 'nf-next-23-08-22' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: allow loop termination for pending fatal signal netfilter: xtables: refactor deprecated strncpy netfilter: x_tables: refactor deprecated strncpy netfilter: nft_meta: refactor deprecated strncpy netfilter: nft_osf: refactor deprecated strncpy netfilter: nf_tables: refactor deprecated strncpy netfilter: nf_tables: refactor deprecated strncpy netfilter: ipset: refactor deprecated strncpy netfilter: ebtables: replace zero-length array members netfilter: ebtables: fix fortify warnings in size_entry_mwt() ==================== Link: https://lore.kernel.org/r/20230822154336.12888-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22igc: Fix the typo in the PTM Control macroSasha Neftin
The IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM requests. The bit resolution of this field is six bits. That bit five was missing in the mask. This patch comes to correct the typo in the IGC_PTM_CTRL_SHRT_CYC macro. Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22Merge branch 'mptcp-prepare-mptcp-packet-scheduler-for-bpf-extension'Jakub Kicinski
Mat Martineau says: ==================== mptcp: Prepare MPTCP packet scheduler for BPF extension The kernel's MPTCP packet scheduler has, to date, been a one-size-fits all algorithm that is hard-coded. It attempts to balance latency and throughput when transmitting data across multiple TCP subflows, and has some limited tunability through sysctls. It has been a long-term goal of the Linux MPTCP community to support customizable packet schedulers for use cases that need to make different trade-offs regarding latency, throughput, redundancy, and other metrics. BPF is well-suited for configuring customized, per-packet scheduling decisions without having to modify the kernel or manage out-of-tree kernel modules. The first steps toward implementing BPF packet schedulers are to update the existing MPTCP transmit loops to allow more flexible scheduling decisions, and to add infrastructure for swappable packet schedulers. The existing scheduling algorithm remains the default. BPF-related changes will be in a future patch series. This code has been in the MPTCP development tree for quite a while, undergoing testing in our CI and community. Patches 1 and 2 refactor the transmit code and do some related cleanup. Patches 3-9 add infrastructure for registering and calling multiple schedulers. Patch 10 connects the in-kernel default scheduler to the new infrastructure. ==================== Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-0-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: register default schedulerGeliang Tang
This patch defines the default packet scheduler mptcp_sched_default. Register it in mptcp_sched_init(), which is invoked in mptcp_proto_init(). Skip deleting this default scheduler in mptcp_unregister_scheduler(). Set msk->sched to the default scheduler when the input parameter of mptcp_init_sched() is NULL. Invoke mptcp_sched_default_get_subflow in get_send() and get_retrans() if the defaut scheduler is set or msk->sched is NULL. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-10-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: use get_retrans wrapperGeliang Tang
This patch adds the multiple subflows support for __mptcp_retrans(). Use get_retrans() wrapper instead of mptcp_subflow_get_retrans() in it. Check the subflow scheduled flags to test which subflow or subflows are picked by the scheduler, use them to send data. Move msk_owned_by_me() and fallback checks into get_retrans() wrapper from mptcp_subflow_get_retrans(). Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-9-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: use get_send wrapperGeliang Tang
This patch adds the multiple subflows support for __mptcp_push_pending and __mptcp_subflow_push_pending. Use get_send() wrapper instead of mptcp_subflow_get_send() in them. Check the subflow scheduled flags to test which subflow or subflows are picked by the scheduler, use them to send data. Move msk_owned_by_me() and fallback checks into get_send() wrapper from mptcp_subflow_get_send(). This commit allows the scheduler to set the subflow->scheduled bit in multiple subflows, but it does not allow for sending redundant data. Multiple scheduled subflows will send sequential data on each subflow. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-8-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: add scheduler wrappersGeliang Tang
This patch defines two packet scheduler wrappers mptcp_sched_get_send() and mptcp_sched_get_retrans(), invoke get_subflow() of msk->sched in them. Set data->reinject to true in mptcp_sched_get_retrans(), set it false in mptcp_sched_get_send(). If msk->sched is NULL, use default functions mptcp_subflow_get_send() and mptcp_subflow_get_retrans() to send data. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-7-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: add scheduled in mptcp_subflow_contextGeliang Tang
This patch adds a new member scheduled in struct mptcp_subflow_context, which will be set in the MPTCP scheduler context when the scheduler picks this subflow to send data. Add a new helper mptcp_subflow_set_scheduled() to set this flag using WRITE_ONCE(). Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-6-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: add sched in mptcp_sockGeliang Tang
This patch adds a new struct member sched in struct mptcp_sock. And two helpers mptcp_init_sched() and mptcp_release_sched() to init and release it. Init it with the sysctl scheduler in mptcp_init_sock(), copy the scheduler from the parent in mptcp_sk_clone(), and release it in __mptcp_destroy_sock(). Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-5-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: add a new sysctl schedulerGeliang Tang
This patch adds a new sysctl, named scheduler, to support for selection of different schedulers. Export mptcp_get_scheduler helper to get this sysctl. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-4-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: add struct mptcp_sched_opsGeliang Tang
This patch defines struct mptcp_sched_ops, which has three struct members, name, owner and list, and four function pointers: init(), release() and get_subflow(). The scheduler function get_subflow() have a struct mptcp_sched_data parameter, which contains a reinject flag for retrans or not, a subflows number and a mptcp_subflow_context array. Add the scheduler registering, unregistering and finding functions to add, delete and find a packet scheduler on the global list mptcp_sched_list. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-3-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: drop last_snd and MPTCP_RESET_SCHEDULERGeliang Tang
Since the burst check conditions have moved out of the function mptcp_subflow_get_send(), it makes all msk->last_snd useless. This patch drops them as well as the macro MPTCP_RESET_SCHEDULER. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-2-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: refactor push_pending logicGeliang Tang
To support redundant package schedulers more easily, this patch refactors __mptcp_push_pending() logic from: For each dfrag: While sends succeed: Call the scheduler (selects subflow and msk->snd_burst) Update subflow locks (push/release/acquire as needed) Send the dfrag data with mptcp_sendmsg_frag() Update already_sent, snd_nxt, snd_burst Update msk->first_pending Push/release on final subflow -> While first_pending isn't empty: Call the scheduler (selects subflow and msk->snd_burst) Update subflow locks (push/release/acquire as needed) For each pending dfrag: While sends succeed: Send the dfrag data with mptcp_sendmsg_frag() Update already_sent, snd_nxt, snd_burst Update msk->first_pending Break if required by msk->snd_burst / etc Push/release on final subflow Refactors __mptcp_subflow_push_pending logic from: For each dfrag: While sends succeed: Call the scheduler (selects subflow and msk->snd_burst) Send the dfrag data with mptcp_subflow_delegate(), break Send the dfrag data with mptcp_sendmsg_frag() Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst Update msk->first_pending -> While first_pending isn't empty: Call the scheduler (selects subflow and msk->snd_burst) Send the dfrag data with mptcp_subflow_delegate(), break Send the dfrag data with mptcp_sendmsg_frag() For each pending dfrag: While sends succeed: Send the dfrag data with mptcp_sendmsg_frag() Update already_sent, snd_nxt, snd_burst Update msk->first_pending Break if required by msk->snd_burst / etc Move the duplicate code from __mptcp_push_pending() and __mptcp_subflow_push_pending() into a new helper function, named __subflow_push_pending(). Simplify __mptcp_push_pending() and __mptcp_subflow_push_pending() by invoking this helper. Also move the burst check conditions out of the function mptcp_subflow_get_send(), check them in __subflow_push_pending() in the inner "for each pending dfrag" loop. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-1-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22batman-adv: Hold rtnl lock during MTU update via netlinkSven Eckelmann
The automatic recalculation of the maximum allowed MTU is usually triggered by code sections which are already rtnl lock protected by callers outside of batman-adv. But when the fragmentation setting is changed via batman-adv's own batadv genl family, then the rtnl lock is not yet taken. But dev_set_mtu requires that the caller holds the rtnl lock because it uses netdevice notifiers. And this code will then fail the check for this lock: RTNL: assertion failed at net/core/dev.c (1953) Cc: stable@vger.kernel.org Reported-by: syzbot+f8812454d9b3ac00d282@syzkaller.appspotmail.com Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU") Signed-off-by: Sven Eckelmann <sven@narfation.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7bfe861e@narfation.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22igb: Avoid starting unnecessary workqueuesAlessio Igor Bogani
If ptp_clock_register() fails or CONFIG_PTP isn't enabled, avoid starting PTP related workqueues. In this way we can fix this: BUG: unable to handle page fault for address: ffffc9000440b6f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 100000067 P4D 100000067 PUD 1001e0067 PMD 107dc5067 PTE 0 Oops: 0000 [#1] PREEMPT SMP [...] Workqueue: events igb_ptp_overflow_check RIP: 0010:igb_rd32+0x1f/0x60 [...] Call Trace: igb_ptp_read_82580+0x20/0x50 timecounter_read+0x15/0x60 igb_ptp_overflow_check+0x1a/0x50 process_one_work+0x1cb/0x3c0 worker_thread+0x53/0x3f0 ? rescuer_thread+0x370/0x370 kthread+0x142/0x160 ? kthread_associate_blkcg+0xc0/0xc0 ret_from_fork+0x1f/0x30 Fixes: 1f6e8178d685 ("igb: Prevent dropped Tx timestamps via work items and interrupts.") Fixes: d339b1331616 ("igb: add PTP Hardware Clock code") Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821171927.2203644-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-08-21 (ice) This series contains updates to ice driver only. Jesse fixes an issue on calculating buffer size. Petr Oros reverts a commit that does not fully resolve VF reset issues and implements one that provides a fuller fix. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix NULL pointer deref during VF reset Revert "ice: Fix ice VF reset during iavf initialization" ice: fix receive buffer size miscalculation ==================== Link: https://lore.kernel.org/r/20230821171633.2203505-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>