Age  Commit message  Author
2023-05-30  net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs  Peilin Ye
Currently it is possible to add e.g. an HTB Qdisc under ffff:fff1 (TC_H_INGRESS, TC_H_CLSACT): $ ip link add name ifb0 type ifb $ tc qdisc add dev ifb0 parent ffff:fff1 htb $ tc qdisc add dev ifb0 clsact Error: Exclusivity flag on, cannot modify. $ drgn ... >>> ifb0 = netdev_get_by_name(prog, "ifb0") >>> qdisc = ifb0.ingress_queue.qdisc_sleeping >>> print(qdisc.ops.id.string_().decode()) htb >>> qdisc.flags.value_() # TCQ_F_INGRESS 2 Only allow ingress and clsact Qdiscs under ffff:fff1. Return -EINVAL for everything else. Make TCQ_F_INGRESS a static flag of ingress and clsact Qdiscs. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Fixes: 1f211a1b929c ("net, sched: add clsact qdisc") Tested-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
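As a rough illustration of the rule described above, the following Python sketch models the handle check outside the kernel. TC_H_INGRESS (0xFFFFFFF1, written ffff:fff1 in tc notation) and TC_H_CLSACT are the UAPI values; `parse_handle` and `check_parent` are hypothetical helper names used only for this example, not kernel functions.

```python
import errno

TC_H_INGRESS = 0xFFFFFFF1   # "ffff:fff1" in tc notation
TC_H_CLSACT = TC_H_INGRESS  # clsact shares the ingress handle


def parse_handle(text):
    """Turn a tc-style 'major:minor' hex handle into a 32-bit value."""
    major, minor = text.split(":")
    return (int(major, 16) << 16) | int(minor or "0", 16)


def check_parent(parent, qdisc_kind):
    """Hypothetical model of the new rule.

    Only ingress and clsact Qdiscs may sit under ffff:fff1;
    anything else is rejected with -EINVAL.
    """
    if parent == TC_H_INGRESS and qdisc_kind not in ("ingress", "clsact"):
        return -errno.EINVAL
    return 0


# The HTB case from the commit message is rejected in this model.
print(check_parent(parse_handle("ffff:fff1"), "htb"))
print(check_parent(parse_handle("ffff:fff1"), "clsact"))
```

The model covers only the rule from this commit; the -EOPNOTSUPP checks added to sch_ingress and sch_clsact themselves are separate changes.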
2023-05-30  net/sched: sch_clsact: Only create under TC_H_CLSACT  Peilin Ye
clsact Qdiscs are only supposed to be created under TC_H_CLSACT (which equals TC_H_INGRESS). Return -EOPNOTSUPP if 'parent' is not TC_H_CLSACT. Fixes: 1f211a1b929c ("net, sched: add clsact qdisc") Tested-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30  net/sched: sch_ingress: Only create under TC_H_INGRESS  Peilin Ye
ingress Qdiscs are only supposed to be created under TC_H_INGRESS. Return -EOPNOTSUPP if 'parent' is not TC_H_INGRESS, similar to mq_init(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/ Tested-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30  Merge tag 'for-6.4-rc4-tag' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "One bug fix and two build warning fixes: - call proper end bio callback for metadata RAID0 in a rare case of an unaligned block - fix uninitialized variable (reported by gcc 10.2) - fix warning about potential access beyond array bounds on mips64 with 64k pages (runtime check would not allow that)" * tag 'for-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix csum_tree_block page iteration to avoid tripping on -Werror=array-bounds btrfs: fix an uninitialized variable warning in btrfs_log_inode btrfs: call btrfs_orig_bbio_end_io in btrfs_end_bio_work
2023-05-30  Merge tag 'perf-tools-fixes-for-v6.4-2-2023-05-30' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix BPF CO-RE naming convention for checking the availability of fields on 'union perf_mem_data_src' on the running kernel - Remove the use of llvm-strip on BPF skel object files, not needed, fixes a build breakage when the llvm package, that contains it in most distros, isn't installed - Fix tools that use both evsel->{bpf_counter_list,bpf_filters}, removing them from a union - Remove extra "--" from the 'perf ftrace latency' --use-nsec option, previously it was working only when using the '-n' alternative - Don't stop building when both binutils-devel and a C++ compiler isn't available to compile the alternative C++ demangle support code, disable that feature instead - Sync the linux/in.h and coresight-pmu.h header copies with the kernel sources - Fix relative include path to cs-etm.h * tag 'perf-tools-fixes-for-v6.4-2-2023-05-30' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf evsel: Separate bpf_counter_list and bpf_filters, can be used at the same time tools headers UAPI: Sync the linux/in.h with the kernel sources perf cs-etm: Copy kernel coresight-pmu.h header perf bpf: Do not use llvm-strip on BPF binary perf build: Don't compile demangle-cxx.cpp if not necessary perf arm: Fix include path to cs-etm.h perf bpf filter: Fix a broken perf sample data naming for BPF CO-RE perf ftrace latency: Remove unnecessary "--" from --use-nsec option
2023-05-30  Merge tag 'regmap-fix-v6.4-rc4' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fixes from Mark Brown: "The most important fix here is for missing dropping of the RCU read lock when syncing maple tree register caches, the physical devices I have that use the code don't do any syncing so I'd only ever tested this with virtual devices and missed the fact that we need to drop the lock in order to write to buses that need to sleep. Otherwise there's a fix for an edge case when splitting up large batch writes which has been lurking for a long time, a check to make sure nobody writes new drivers with a bug that was found in several SoundWire drivers and a tweak to the way the new kunit tests are enabled to ensure they don't cause regmap to be enabled when it wouldn't otherwise be" * tag 'regmap-fix-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: maple: Drop the RCU read lock while syncing registers regmap: sdw: check for invalid multi-register writes config regmap: Account for register length when chunking regmap: REGMAP_KUNIT should not select REGMAP
2023-05-30  Merge tag 'modules-6.4-rc5' of ↵  Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules fix from Luis Chamberlain: "A fix is provided for ia64. Even though ia64 is on life support it helps to fix issues if we can. Thanks to Linus for doing tons of the ia64 debugging" * tag 'modules-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: fix module load for ia64
2023-05-30  ext4: enable the lazy init thread when remounting read/write  Theodore Ts'o
In commit a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled") we defer clearing the SB_RDONLY flag in struct super. However, we didn't defer when we checked sb_rdonly() to determine whether the lazy itable init thread should be enabled, with the net result that the lazy inode table initialization would not be properly started. This can cause generic/231 to fail in ext4's nojournal mode. Fix this by moving when we decide to start or stop the lazy itable init thread to after we clear the SB_RDONLY flag when we are remounting the file system read/write. Fixes: a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until...") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230527035729.1001605-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30  ext4: fix fsync for non-directories  Jan Kara
Commit e360c6ed7274 ("ext4: Drop special handling of journalled data from ext4_sync_file()") simplified ext4_sync_file() by dropping special handling of journalled data mode as it was not needed anymore. However that branch was also used for directories and symlinks and since the fastcommit code does not track metadata changes to non-regular files, the change has caused e.g. fsync(2) on directories to not commit transaction as it should. Fix the problem by adding handling for non-regular files. Fixes: e360c6ed7274 ("ext4: Drop special handling of journalled data from ext4_sync_file()") Reported-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/all/ZFqO3xVnmhL7zv1x@debian-BULLSEYE-live-builder-AMD64 Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20230524104453.8734-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
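For readers less familiar with why fsync(2) on a directory matters, here is a minimal userspace sketch in Python of the usual pattern: sync the file, then sync the directory that holds its entry. The helper names `fsync_directory` and `create_file_durably` are made up for this example, and the sketch assumes a Linux system where `os.O_DIRECTORY` is available.

```python
import os
import tempfile


def fsync_directory(path):
    """Open a directory read-only and fsync it so its entries are committed."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def create_file_durably(directory, name, data):
    """Write a file, fsync its contents, then fsync the parent directory."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Without this step the new directory entry may not be on stable storage.
    fsync_directory(directory)
    return path


# Small usage example in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    create_file_durably(tmp, "example.txt", b"data")
```

With the regression described above, the directory fsync in this pattern could return without committing the transaction, which is what the fix restores.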
2023-05-30  ext4: add lockdep annotations for i_data_sem for ea_inode's  Theodore Ts'o
Treat i_data_sem for ea_inodes as being in their own lockdep class to avoid lockdep complaints about ext4_setattr's use of inode_lock() on normal inodes potentially causing lock ordering with i_data_sem on ea_inodes in ext4_xattr_inode_write(). However, ea_inodes will never be operated on by ext4_setattr(), so this isn't a problem. Cc: stable@kernel.org Link: https://syzkaller.appspot.com/bug?extid=298c5d8fb4a128bc27b0 Reported-by: syzbot+298c5d8fb4a128bc27b0@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230524034951.779531-5-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30  ext4: disallow ea_inodes with extended attributes  Theodore Ts'o
An ea_inode stores the value of an extended attribute; it can not have extended attributes itself, or this will cause recursive nightmares. Add a check in ext4_iget() to make sure this is the case. Cc: stable@kernel.org Reported-by: syzbot+e44749b6ba4d0434cd47@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230524034951.779531-4-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30  ext4: set lockdep subclass for the ea_inode in ext4_xattr_inode_cache_find()  Theodore Ts'o
If the ea_inode has been pushed out of the inode cache while there is still a reference in the mb_cache, the lockdep subclass will not be set on the inode, which can lead to some lockdep false positives. Fixes: 33d201e0277b ("ext4: fix lockdep warning about recursive inode locking") Cc: stable@kernel.org Reported-by: syzbot+d4b971e744b1f5439336@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230524034951.779531-3-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-30  fbdev: bw2: Convert to platform remove callback returning void  Uwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-30  fbdev: broadsheetfb: Convert to platform remove callback returning void  Uwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-30  fbdev: au1200fb: Convert to platform remove callback returning void  Uwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-30  fbdev: au1100fb: Convert to platform remove callback returning void  Uwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-30  fbdev: arcfb: Convert to platform remove callback returning void  Uwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-30  fbdev: au1100fb: Drop if with an always false condition  Uwe Kleine-König
The driver core never calls a remove callback with the platform_device pointer being NULL. So the check for this condition can just be dropped. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-30  module: fix module load for ia64  Song Liu
Frank reported boot regression in ia64 as: ELILO v3.16 for EFI/IA-64 .. Uncompressing Linux... done Loading file AC100221.initrd.img...done [ 0.000000] Linux version 6.4.0-rc3 (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP Thu May 25 15:52:20 CEST 2023 [ 0.000000] efi: EFI v1.1 by HP [ 0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fe2a000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fe28000 [ 0.000000] PCDP: v3 at 0x3fe28000 [ 0.000000] earlycon: uart8250 at MMIO 0x00000000f4050000 (options '9600n8') [ 0.000000] printk: bootconsole [uart8250] enabled [ 0.000000] ACPI: Early table checksum verification disabled [ 0.000000] ACPI: RSDP 0x000000003FE2A000 000028 (v02 HP ) [ 0.000000] ACPI: XSDT 0x000000003FE2A02C 0000CC (v01 HP rx2620 00000000 HP 00000000) [...] [ 3.793350] Run /init as init process Loading, please wait... Starting systemd-udevd version 252.6-1 [ 3.951100] ------------[ cut here ]------------ [ 3.951100] WARNING: CPU: 6 PID: 140 at kernel/module/main.c:1547 __layout_sections+0x370/0x3c0 [ 3.949512] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.951100] Modules linked in: [ 3.951100] CPU: 6 PID: 140 Comm: (udev-worker) Not tainted 6.4.0-rc3 #1 [ 3.956161] (udev-worker)[142]: Oops 11003706212352 [1] [ 3.951774] Hardware name: hp server rx2620 , BIOS 04.29 11/30/2007 [ 3.951774] [ 3.951774] Call Trace: [ 3.958339] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.956161] Modules linked in: [ 3.951774] [<a0000001000156d0>] show_stack.part.0+0x30/0x60 [ 3.951774] sp=e000000183a67b20 bsp=e000000183a61628 [ 3.956161] [ 3.956161] which bisect to module_memory change [1]. 
Debug showed that ia64 uses some special sections: __layout_sections: section .got (sh_flags 10000002) matched to MOD_INVALID __layout_sections: section .sdata (sh_flags 10000003) matched to MOD_INVALID __layout_sections: section .sbss (sh_flags 10000003) matched to MOD_INVALID All these sections are loaded to module core memory before [1]. Fix ia64 boot by loading these sections to MOD_DATA (core rw data). [1] commit ac3b43283923 ("module: replace module_layout with module_memory") Fixes: ac3b43283923 ("module: replace module_layout with module_memory") Reported-by: Frank Scheiner <frank.scheiner@web.de> Closes: https://lists.debian.org/debian-ia64/2023/05/msg00010.html Closes: https://marc.info/?l=linux-ia64&m=168509859125505 Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Song Liu <song@kernel.org> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
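To make the sh_flags values in the debug output easier to read, the sketch below decodes them in Python. It assumes the values were printed in hexadecimal. SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR are the standard ELF section flags, and 0x10000000 is taken to be ia64's SHF_IA_64_SHORT from elf.h; `decode_sh_flags` is a hypothetical helper, not part of the kernel.

```python
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4
SHF_IA_64_SHORT = 0x10000000  # ia64 "small data" section flag (assumed from elf.h)

KNOWN_FLAGS = (
    ("SHF_WRITE", SHF_WRITE),
    ("SHF_ALLOC", SHF_ALLOC),
    ("SHF_EXECINSTR", SHF_EXECINSTR),
    ("SHF_IA_64_SHORT", SHF_IA_64_SHORT),
)


def decode_sh_flags(flags):
    """Return the names of the flag bits set in an ELF sh_flags value."""
    names = []
    for name, bit in KNOWN_FLAGS:
        if flags & bit:
            names.append(name)
            flags &= ~bit
    if flags:
        # Keep any bits this sketch does not know about visible.
        names.append(hex(flags))
    return names


# .got from the report: allocated, read-only, small-data section
print(decode_sh_flags(0x10000002))
# .sdata / .sbss from the report: allocated, writable, small-data sections
print(decode_sh_flags(0x10000003))
```

Under that reading, all three sections carry the architecture-specific small-data bit, which is consistent with the commit's choice to place them in MOD_DATA.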
2023-05-30  Merge branch 'selftests-mptcp-skip-tests-not-supported-by-old-kernels-part-1'  Paolo Abeni
Matthieu Baerts says: ==================== selftests: mptcp: skip tests not supported by old kernels (part 1) After a few years of increasing test coverage in the MPTCP selftests, we realised [1] the last version of the selftests is supposed to run on old kernels without issues. Supporting older versions is not that easy for this MPTCP case: these selftests are often validating the internals by checking packets that are exchanged, when some MIB counters are incremented after some actions, how connections are getting opened and closed in some cases, etc. In other words, it is not limited to the socket interface between the userspace and the kernelspace. In addition, the current selftests run a lot of different sub-tests but the TAP13 protocol used in the selftests doesn't support sub-tests: in other words, one failure in sub-tests implies that the whole selftest is seen as failed at the end because sub-tests are not tracked. It is then important to skip sub-tests not supported by old kernels. To minimise the modifications and reduce the complexity to support old versions, the idea is to look at external signs and skip the whole selftests or just some sub-tests before starting them. This first part focuses on marking the different selftests as skipped if MPTCP is not even supported. That's what is done in patches 2 to 8. Patch 2/8 introduces a new file (mptcp_lib.sh) to be able to re-use some helpers in the different selftests. The first MPTCP selftest has been introduced in v5.6. Patch 1/8 is a bit different but still linked: it modifies mptcp_join.sh selftest not to use 'cmp --bytes' which is not supported by the BusyBox implementation. It is apparently quite common to use BusyBox in CI environments. This tool is needed for a subtest introduced in v6.1. 
Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrbOMsDZLg@mail.gmail.com/ [1] Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 ==================== Link: https://lore.kernel.org/r/20230528-upstream-net-20230528-mptcp-selftests-support-old-kernels-part-1-v1-0-a32d85577fc6@tessares.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  selftests: mptcp: userspace pm: skip if MPTCP is not supported  Matthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting MPTCP. A new check is then added to make sure MPTCP is supported. If not, the test stops and is marked as "skipped". Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 259a834fadda ("selftests: mptcp: functional tests for the userspace PM type") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  selftests: mptcp: sockopt: skip if MPTCP is not supported  Matthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting MPTCP. A new check is then added to make sure MPTCP is supported. If not, the test stops and is marked as "skipped". Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: dc65fe82fb07 ("selftests: mptcp: add packet mark test case") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  selftests: mptcp: simult flows: skip if MPTCP is not supported  Matthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting MPTCP. A new check is then added to make sure MPTCP is supported. If not, the test stops and is marked as "skipped". Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  selftests: mptcp: diag: skip if MPTCP is not supported  Matthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting MPTCP. A new check is then added to make sure MPTCP is supported. If not, the test stops and is marked as "skipped". Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: df62f2ec3df6 ("selftests/mptcp: add diag interface tests") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  selftests: mptcp: join: skip if MPTCP is not supported  Matthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting MPTCP. A new check is then added to make sure MPTCP is supported. If not, the test stops and is marked as "skipped". Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  selftests: mptcp: pm nl: skip if MPTCP is not supported  Matthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting MPTCP. A new check is then added to make sure MPTCP is supported. If not, the test stops and is marked as "skipped". Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: eedbc685321b ("selftests: add PM netlink functional tests") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  selftests: mptcp: connect: skip if MPTCP is not supported  Matthieu Baerts
Selftests are supposed to run on any kernels, including the old ones not supporting MPTCP. A new check is then added to make sure MPTCP is supported. If not, the test stops and is marked as "skipped". Note that this check can also mark the test as failed if 'SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES' env var is set to 1: by doing that, we can make sure a test is not being skipped by mistake. A new shared file is added here to be able to re-use the same check in the different selftests we have. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
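The skip-or-fail decision described above can be modelled in a few lines of Python. This is only a sketch of the decision logic: the real helper is a shell function in the new shared file, how MPTCP support is detected is left as an input here, and `mptcp_precheck` is a hypothetical name. The exit codes follow the kselftest convention (4 means "skipped").

```python
KSFT_PASS = 0
KSFT_FAIL = 1
KSFT_SKIP = 4  # kselftest convention for a skipped test


def mptcp_precheck(mptcp_supported, environ):
    """Return None to carry on with the test, or an exit code to stop with.

    mptcp_supported: result of whatever probe the test uses (assumed input).
    environ: mapping of environment variables, e.g. os.environ.
    """
    if mptcp_supported:
        return None
    if environ.get("SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES") == "1":
        # The caller asked for every feature: a missing one is a failure.
        return KSFT_FAIL
    return KSFT_SKIP


# On an old kernel without MPTCP the test is skipped by default...
print(mptcp_precheck(False, {}))
# ...but fails when all features are expected to be present.
print(mptcp_precheck(False, {"SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES": "1"}))
```

Setting the variable in CI for recent kernels is what guards against a test being skipped by mistake.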
2023-05-30  selftests: mptcp: join: avoid using 'cmp --bytes'  Matthieu Baerts
BusyBox's 'cmp' command doesn't support the '--bytes' parameter. Some CIs -- i.e. LKFT -- use BusyBox and have the mptcp_join.sh test failing [1] because their 'cmp' command doesn't support this '--bytes' option: cmp: unrecognized option '--bytes=1024' BusyBox v1.35.0 () multi-call binary. Usage: cmp [-ls] [-n NUM] FILE1 [FILE2] Instead, 'head --bytes' can be used as this option is supported by BusyBox. A temporary file is needed for this operation. Because it is apparently quite common to use BusyBox, it is certainly better to backport this fix to impacted kernels. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Link: https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.3-rc5-5-g148341f0a2f5/testrun/16088933/suite/kselftest-net-mptcp/test/net_mptcp_userspace_pm_sh/log [1] Suggested-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
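The fix itself is a shell change. Purely to illustrate the underlying idea (compare only a fixed-size prefix of two files instead of relying on a tool-specific option), here is a Python sketch. `first_bytes_equal` and `write_temp` are hypothetical helpers invented for this example.

```python
import os
import tempfile


def first_bytes_equal(path_a, path_b, count):
    """Compare only the first `count` bytes of two files."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return a.read(count) == b.read(count)


def write_temp(data):
    """Write bytes to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Two files that share their first 1024 bytes but differ afterwards.
left = write_temp(b"A" * 1024 + b"tail-one")
right = write_temp(b"A" * 1024 + b"tail-two")
print(first_bytes_equal(left, right, 1024))
os.remove(left)
os.remove(right)
```

The shell version in the commit reaches the same result by truncating with 'head --bytes' into a temporary file and then running a plain 'cmp'.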
2023-05-30  net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters  Haiyang Zhang
The apc->eth_stats.rx_cqes is one per NIC (vport), and it's on the frequent and parallel code path of all queues. So, r/w into this single shared variable by many threads on different CPUs creates a lot of caching and memory overhead, hence the perf regression. And, it's not accurate due to the high volume of concurrent r/w. For example, with a workload of iperf with 128 threads and RPS enabled, we saw a perf regression of 25% with the previous patch adding the counters. And this patch eliminates the regression. Since the error path of mana_poll_rx_cq() already has warnings, keeping the counter and converting it to a per-queue variable is not necessary. So, just remove this counter from this high frequency code path. Also, remove the tx_cqes counter for the same reason. We have warnings & other counters for errors on that path, and don't need to count every normal cqe processing. Cc: stable@vger.kernel.org Fixes: bd7fc6e1957c ("net: mana: Add new MANA VF performance counters for easier troubleshooting") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1685115537-31675-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  Merge branch 'two-fixes-for-smcrv2'  Paolo Abeni
Wen Gu says: ==================== Two fixes for SMCRv2 This patch set includes two bugfixes for SMCRv2. ==================== Link: https://lore.kernel.org/r/1685101741-74826-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  net/smc: Don't use RMBs not mapped to new link in SMCRv2 ADD LINK  Wen Gu
We encountered a crash when using SMCRv2. It is caused by a logical error in smc_llc_fill_ext_v2(). BUG: kernel NULL pointer dereference, address: 0000000000000014 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 453 Comm: kworker/7:4 Kdump: loaded Tainted: G W E 6.4.0-rc3+ #44 Workqueue: events smc_llc_add_link_work [smc] RIP: 0010:smc_llc_fill_ext_v2+0x117/0x280 [smc] RSP: 0018:ffffacb5c064bd88 EFLAGS: 00010282 RAX: ffff9a6bc1c3c02c RBX: ffff9a6be3558000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000002 RDI: 000000000000000a RBP: ffffacb5c064bdb8 R08: 0000000000000040 R09: 000000000000000c R10: ffff9a6bc0910300 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000002 R14: ffff9a6bc1c3c02c R15: ffff9a6be3558250 FS: 0000000000000000(0000) GS:ffff9a6eefdc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000014 CR3: 000000010b078003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> smc_llc_send_add_link+0x1ae/0x2f0 [smc] smc_llc_srv_add_link+0x2c9/0x5a0 [smc] ? cc_mkenc+0x40/0x60 smc_llc_add_link_work+0xb8/0x140 [smc] process_one_work+0x1e5/0x3f0 worker_thread+0x4d/0x2f0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> When an alternate RNIC is available in the system, SMC will try to add a new link based on the RNIC for resilience. All the RMBs in use will be mapped to the new link. Then the RMBs' MRs corresponding to the new link will be filled into SMCRv2 LLC ADD LINK messages. However, smc_llc_fill_ext_v2() mistakenly accesses unused RMBs which haven't been mapped to the new link and have no valid MRs, thus causing a crash. So this patch fixes the logic. 
Fixes: b4ba4652b3f8 ("net/smc: extend LLC layer for SMC-Rv2") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  net/smc: Scan from current RMB list when no position specified  Wen Gu
When finding the first RMB of link group, it should start from the current RMB list whose index is 0. So fix it. Fixes: b4ba4652b3f8 ("net/smc: extend LLC layer for SMC-Rv2") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30  rxrpc: Truncate UTS_RELEASE for rxrpc version  David Howells
UTS_RELEASE has a maximum length of 64 which can cause rxrpc_version to exceed the 65 byte message limit. Per the rx spec[1]: "If a server receives a packet with a type value of 13, and the client-initiated flag set, it should respond with a 65-byte payload containing a string that identifies the version of AFS software it is running." The current implementation causes a compile error when WERROR is turned on and/or UTS_RELEASE exceeds the length of 49 (making the version string more than 64 characters). Fix this by generating the string during module initialisation and limiting the UTS_RELEASE segment of the string to at most 49 chars. We need to make sure that the 64 bytes includes "linux-" at the front and " AF_RXRPC" at the back as this may be used in pattern matching. Fixes: 44ba06987c0b ("RxRPC: Handle VERSION Rx protocol packets") Reported-by: Kenny Ho <Kenny.Ho@amd.com> Link: https://lore.kernel.org/r/20230523223944.691076-1-Kenny.Ho@amd.com/ Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Kenny Ho <Kenny.Ho@amd.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Andrew Lunn <andrew@lunn.ch> cc: David Laight <David.Laight@ACULAB.COM> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Link: https://web.mit.edu/kolya/afs/rx/rx-spec [1] Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Link: https://lore.kernel.org/r/654974.1685100894@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
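The length arithmetic behind the 49-character limit can be checked with a small Python sketch: "linux-" is 6 characters and " AF_RXRPC" is 9, so 6 + 49 + 9 = 64 characters, which leaves one byte for the terminating NUL in the 65-byte payload. `build_rxrpc_version` is a hypothetical name for this illustration, and the sketch assumes an ASCII release string; the kernel builds the string in C at module initialisation.

```python
PREFIX = "linux-"      # 6 characters
SUFFIX = " AF_RXRPC"   # 9 characters, including the leading space
MAX_RELEASE = 49       # 64 - 6 - 9
PAYLOAD_SIZE = 65      # 64 characters plus a terminating NUL


def build_rxrpc_version(uts_release):
    """Return (version string, 65-byte NUL-padded payload)."""
    version = PREFIX + uts_release[:MAX_RELEASE] + SUFFIX
    payload = version.encode("ascii").ljust(PAYLOAD_SIZE, b"\0")
    return version, payload


# A short release string is used unchanged.
print(build_rxrpc_version("6.4.0-rc4")[0])

# A maximum-length release string is truncated so the result still fits.
version, payload = build_rxrpc_version("x" * 64)
print(len(version), len(payload))
```

Truncating only the release segment keeps both the prefix and the suffix intact, which is what matters for anything that pattern-matches on them.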
2023-05-29  tcp: Return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set  Cambda Zhu
This patch replaces the tp->mss_cache check in getting TCP_MAXSEG with tp->rx_opt.user_mss check for CLOSE/LISTEN sock. Since tp->mss_cache is initialized with TCP_MSS_DEFAULT, checking if it's zero is probably a bug. With this change, getting TCP_MAXSEG before connecting will return default MSS normally, and return user_mss if user_mss is set. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Jack Yang <mingliang@linux.alibaba.com> Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/CANn89i+3kL9pYtkxkwxwNMzvC_w3LNUum_2=3u+UyLBmGmifHA@mail.gmail.com/#t Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Link: https://lore.kernel.org/netdev/14D45862-36EA-4076-974C-EA67513C92F6@linux.alibaba.com/ Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230527040317.68247-1-cambda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
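The behavioural difference can be shown with a small Python model of the getsockopt(TCP_MAXSEG) decision, based only on the description above. TCP_MSS_DEFAULT is 536 in the kernel; `maxseg_old` and `maxseg_new` are hypothetical names, and the state strings stand in for the kernel's socket states.

```python
TCP_MSS_DEFAULT = 536  # initial value of tp->mss_cache

CLOSE, LISTEN, ESTABLISHED = "CLOSE", "LISTEN", "ESTABLISHED"


def maxseg_old(mss_cache, user_mss, state):
    """Old logic: fall back to user_mss only when mss_cache is zero."""
    val = mss_cache
    if not val and state in (CLOSE, LISTEN):
        val = user_mss
    return val


def maxseg_new(mss_cache, user_mss, state):
    """New logic: prefer user_mss in CLOSE/LISTEN whenever it is set."""
    val = mss_cache
    if user_mss and state in (CLOSE, LISTEN):
        val = user_mss
    return val


# A socket that set TCP_MAXSEG to 1000 and has not connected yet.
print(maxseg_old(TCP_MSS_DEFAULT, 1000, CLOSE))  # user setting is ignored
print(maxseg_new(TCP_MSS_DEFAULT, 1000, CLOSE))  # user setting is returned
```

Because mss_cache starts at TCP_MSS_DEFAULT rather than zero, the old fallback branch was effectively never taken, which is the bug the commit message points at.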
2023-05-29  tcp: deny tcp_disconnect() when threads are waiting  Eric Dumazet
Historically connect(AF_UNSPEC) has been abused by syzkaller and other fuzzers to trigger various bugs.

A recent one triggers a divide-by-zero [1], and Paolo Abeni was able to diagnose the issue.

tcp_recvmsg_locked() has tests about sk_state being not TCP_LISTEN and TCP REPAIR mode being not used.

Then later if socket lock is released in sk_wait_data(), another thread can call connect(AF_UNSPEC), then make this socket a TCP listener.

When recvmsg() is resumed, it can eventually call tcp_cleanup_rbuf() and attempt a divide by 0 in tcp_rcv_space_adjust() [1]

This patch adds a new socket field, counting number of threads blocked in sk_wait_event() and inet_wait_for_connect().

If this counter is not zero, tcp_disconnect() returns an error.

This patch adds code in blocking socket system calls, thus should not hurt performance of non blocking ones.

Note that we probably could revert commit 499350a5a6e7 ("tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0") to restore original tcpi_rcv_mss meaning (was 0 if no payload was ever received on a socket)

[1]
divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 13832 Comm: syz-executor.5 Not tainted 6.3.0-rc4-syzkaller-00224-g00c7b5f4ddc5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
RIP: 0010:tcp_rcv_space_adjust+0x36e/0x9d0 net/ipv4/tcp_input.c:740
Code: 00 00 00 00 fc ff df 4c 89 64 24 48 8b 44 24 04 44 89 f9 41 81 c7 80 03 00 00 c1 e1 04 44 29 f0 48 63 c9 48 01 e9 48 0f af c1 <49> f7 f6 48 8d 04 41 48 89 44 24 40 48 8b 44 24 30 48 c1 e8 03 48
RSP: 0018:ffffc900033af660 EFLAGS: 00010206
RAX: 4a66b76cbade2c48 RBX: ffff888076640cc0 RCX: 00000000c334e4ac
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000001
RBP: 00000000c324e86c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880766417f8
R13: ffff888028fbb980 R14: 0000000000000000 R15: 0000000000010344
FS: 00007f5bffbfe700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32f25000 CR3: 000000007ced0000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_recvmsg_locked+0x100e/0x22e0 net/ipv4/tcp.c:2616
tcp_recvmsg+0x117/0x620 net/ipv4/tcp.c:2681
inet6_recvmsg+0x114/0x640 net/ipv6/af_inet6.c:670
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg+0xe2/0x160 net/socket.c:1038
____sys_recvmsg+0x210/0x5a0 net/socket.c:2720
___sys_recvmsg+0xf2/0x180 net/socket.c:2762
do_recvmmsg+0x25e/0x6e0 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0x20f/0x260 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5c0108c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5bffbfe168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00007f5c011ac050 RCX: 00007f5c0108c0f9
RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000003
RBP: 00007f5c010e7b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5c012cfb1f R14: 00007f5bffbfe300 R15: 0000000000022000
</TASK>

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Diagnosed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230526163458.2880232-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
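The waiter-counting guard described in the tcp_disconnect() commit above can be illustrated with a small user-space sketch. This is not the kernel patch: `struct sock_sketch`, `wait_begin()`, `wait_end()` and `disconnect_sketch()` are hypothetical names, and the choice of -EBUSY is an assumption, since the commit message only says that "an error" is returned. In the kernel the counter would be updated with the socket lock held; the sketch is single-threaded and ignores locking.

```c
#include <errno.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct sock. */
struct sock_sketch {
	int wait_pending;	/* number of threads blocked in a wait helper */
	int state;		/* nonzero means "connected" in this sketch */
};

/* Called by a blocking helper just before it sleeps. */
static void wait_begin(struct sock_sketch *sk)
{
	sk->wait_pending++;
}

/* Called by the same helper once it has woken up. */
static void wait_end(struct sock_sketch *sk)
{
	sk->wait_pending--;
}

/*
 * Refuse to disconnect while any thread is still blocked on the socket.
 * The error code (-EBUSY) is an assumption made for this sketch.
 */
static int disconnect_sketch(struct sock_sketch *sk)
{
	if (sk->wait_pending)
		return -EBUSY;

	sk->state = 0;
	return 0;
}
```

With this shape, a recvmsg()-like caller that has already validated the socket state cannot find the socket turned into a listener when it resumes, because the disconnect is rejected for as long as the caller is counted as waiting.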
2023-05-29af_packet: do not use READ_ONCE() in packet_bind()Eric Dumazet
A recent patch added READ_ONCE() in packet_bind() and packet_bind_spkt(). This is better handled by reading pkt_sk(sk)->num later in packet_do_bind() while the appropriate lock is held. READ_ONCE() in a writer is often evidence that something is wrong. Fixes: 822b5a1c17df ("af_packet: Fix data-races of pkt_sk(sk)->num.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230526154342.2533026-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29netlink: specs: correct types of legacy arraysJakub Kicinski
ethtool has some attrs which dump multiple scalars into an attribute. The spec currently expects one attr per entry. Fixes: a353318ebf24 ("tools: ynl: populate most of the ethtool spec") Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230526220653.65538-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-29net: usb: qmi_wwan: Set DTR quirk for BroadMobi BM818Sebastian Krzyszkowiak
BM818 is based on Qualcomm MDM9607 chipset. Fixes: 9a07406b00cd ("net: usb: qmi_wwan: Add the BroadMobi BM818 card") Cc: stable@vger.kernel.org Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm> Acked-by: Bjørn Mork <bjorn@mork.no> Link: https://lore.kernel.org/r/20230526-bm818-dtr-v1-1-64bbfa6ba8af@puri.sm Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30ata: libata-scsi: Use correct device no in ata_find_dev()Damien Le Moal
For devices not attached to a port multiplier and managed directly by libata, the device number passed to ata_find_dev() must always be lower than the maximum number of devices returned by ata_link_max_devices(). That is 1 for SATA devices or 2 for an IDE link with master+slave devices. This device number is the SCSI device ID, which matches these constraints as the IDs are generated per port and so never exceed the maximum number of devices for the link being used.

However, for libsas managed devices, SCSI device IDs are assigned per struct scsi_host, leading to device IDs for SATA devices that can be well in excess of the libata per-link maximum number of devices. This results in ata_find_dev() always returning NULL for libsas managed devices, except for the first device of the target scsi_host with ID (device number) equal to 0. This issue is visible when executing the hdparm utility, which fails. E.g.:

hdparm -i /dev/sdX
/dev/sdX:
HDIO_GET_IDENTITY failed: No message of desired type

Fix this by rewriting ata_find_dev() to ignore the device number for non-PMP attached devices with a link with at most 1 device, that is SATA devices. For these, the device number 0 is always used to return the correct pointer to the struct ata_device of the port link. This change excludes IDE master/slave setups (maximum number of devices per link is 2) and port-multiplier attached devices. Also, to be consistent with the fact that SCSI device IDs and channel numbers used as device numbers are both unsigned int, change the devno argument of ata_find_dev() to unsigned int.

Reported-by: Xingui Yang <yangxingui@huawei.com>
Fixes: 41bda9c98035 ("libata-link: update hotplug to handle PMP links")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
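The lookup rule described in the ata_find_dev() commit above can be sketched as follows. This is an illustration only, not the libata code: `struct port_sketch`, `struct dev_sketch` and `find_dev_sketch()` are hypothetical names, and the port-multiplier branch is an assumption about how the unchanged PMP case might look, since the commit message only says that case is excluded from the change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ata_device and struct ata_port. */
struct dev_sketch {
	int id;
};

struct port_sketch {
	bool pmp_attached;		/* port multiplier present? */
	unsigned int max_devices;	/* 1 for SATA, 2 for IDE master+slave */
	struct dev_sketch link_dev[2];	/* devices on the port's own link */
	unsigned int nr_pmp_links;	/* only meaningful with a PMP */
	struct dev_sketch *pmp_devs;	/* one device per PMP link (assumed) */
};

static struct dev_sketch *find_dev_sketch(struct port_sketch *ap,
					  unsigned int devno)
{
	if (!ap->pmp_attached) {
		/*
		 * Single-device link (SATA): ignore devno, which may be a
		 * large host-wide SCSI ID, and always use device 0.
		 */
		if (ap->max_devices == 1)
			return &ap->link_dev[0];

		/* IDE master/slave: devno must be a valid index. */
		if (devno < ap->max_devices)
			return &ap->link_dev[devno];

		return NULL;
	}

	/* PMP case (assumed shape): devno selects the PMP link. */
	if (devno < ap->nr_pmp_links)
		return &ap->pmp_devs[devno];

	return NULL;
}
```

For a SATA port, a lookup with a host-wide ID such as 37 then resolves to the single device on the link instead of NULL, which is the case that made hdparm fail before the fix.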
2023-05-29RDMA/irdma: Fix Local Invalidate fencingMustafa Ismail
If the local invalidate fence is indicated in the WR, only the read fence is currently being set in WQE. Fix this to set both the read and local fence in the WQE. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20230522155654.1309-4-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-29RDMA/irdma: Prevent QP use after freeMustafa Ismail
There is a window where the poll cq may use a QP that has been freed. This can happen if a CQE is polled before irdma_clean_cqes() can clear the CQEs related to the QP and the destroy QP races to free the QP memory. The freed QP structures are then used in irdma_poll_cq. Fix this by moving the clearing of CQEs before the reference is removed and the QP is destroyed. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20230522155654.1309-3-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-29MAINTAINERS: Update maintainer of Amazon EFA driverMichael Margolin
Change EFA driver maintainer from Gal Pressman to myself. Keep Gal as a reviewer at his request. Link: https://lore.kernel.org/r/20230525094444.12570-1-mrgolin@amazon.com Signed-off-by: Michael Margolin <mrgolin@amazon.com> Acked-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-29Merge tag 'trace-v6.4-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:
"User events:

- Use long instead of int for storing the enable set/clear bit, as it was found that big endian machines could end up using the wrong bits.

- Split allocating mm and attaching it. This keeps the allocation separate from the registration and avoids various races.

- Remove RCU locking around pin_user_pages_remote() as that can schedule. The RCU protection is no longer needed with the above split of mm allocation and attaching.

- Rename the "link" fields of the various structs to something more meaningful.

- Add comments around user_event_mm struct usage and locking requirements.

Timerlat tracer:

- Fix missed wakeup of timerlat thread caused by the timerlat interrupt triggering when tracing is off. The timer interrupt handler needs to always wake up the timerlat thread regardless if tracing is enabled or not, otherwise, it will never wake up.

Histograms:

- Fix regression of breaking the "stacktrace" modifier for variables. That modifier cannot be used for values, but can be used for variables that are passed from one histogram to the next. This was broken when adding the restriction to values as the variable logic used the same code.

- Rename the special field "stacktrace" to "common_stacktrace". Special fields (that are not actually part of the event, but can act just like event fields, like 'comm' and 'timestamp') should be prefixed with 'common_' for consistency. To keep backward compatibility, 'stacktrace' can still be used (as with the special field 'cpu'), but can be overridden if the event has a field called 'stacktrace'.

- Update the synthetic event selftests to use the new name (synthetic events are created by histograms)

Tracing bootup selftests:

- Reorganize the code to keep artifacts of the selftests not compiled in when selftests are not configured.

- Add various cond_resched() around the selftest code, as the softlock watchdog was triggering much more often. It appears that the kernel runs slower now with full debugging enabled.

- While debugging ftrace with ftrace (using an instance ring buffer instead of the top level one), I found that the selftests were disabling prints to the debug instance. This should not happen, as the selftests only disable printing to the main buffer as the selftests examine the main buffer to see if it has what it expects, and prints can make the tests fail. Make the selftests only disable printing to the toplevel buffer, and leave the instance buffers alone"

* tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have function_graph selftest call cond_resched()
tracing: Only make selftest conditionals affect the global_trace
tracing: Make tracing_selftest_running/delete nops when not used
tracing: Have tracer selftests call cond_resched() before running
tracing: Move setting of tracing_selftest_running out of register_tracer()
tracing/selftests: Update synthetic event selftest to use common_stacktrace
tracing: Rename stacktrace field to common_stacktrace
tracing/histograms: Allow variables to have some modifiers
tracing/user_events: Document user_event_mm one-shot list usage
tracing/user_events: Rename link fields for clarity
tracing/user_events: Remove RCU lock while pinning pages
tracing/user_events: Split up mm alloc and attach
tracing/timerlat: Always wakeup the timerlat thread
tracing/user_events: Use long vs int for atomic bit ops
2023-05-29Merge tag 'v6.4-p3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix an alignment crash in x86/aria" * tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors
2023-05-29Revert "module: error out early on concurrent load of the same module file"Linus Torvalds
This reverts commit 9828ed3f695a138f7add89fa2a186ababceb8006. Sadly, it does seem to cause failures to load modules. Johan Hovold reports: "This change breaks module loading during boot on the Lenovo Thinkpad X13s (aarch64). Specifically it results in indefinite probe deferral of the display and USB (ethernet) which makes it a pain to debug. Typing in the dark to acquire some logs reveals that other modules are missing as well" Since this was applied late as a "let's try this", I'm reverting it asap, and we can try to figure out what goes wrong later. The excessive parallel module loading problem is annoying, but not noticeable in normal situations, and this was only meant as an optimistic workaround for a user-space bug. One possible solution may be to do the optimistic exclusive open first, and then use a lock to serialize loading if that fails. Reported-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/lkml/ZHRpH-JXAxA6DnzR@hovoldconsulting.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-28tracing: Have function_graph selftest call cond_resched()Steven Rostedt (Google)
When all kernel debugging is enabled (lockdep, KASAN, etc.), enabling and disabling the function graph tracer can take several seconds to complete. The function_graph selftest enables and disables function graph tracing several times. With full debugging enabled, the soft lockup watchdog was triggering because the selftest was running without ever scheduling. Add cond_resched() throughout the test to make sure it does not trigger the soft lockup detector. Link: https://lkml.kernel.org/r/20230528051742.1325503-6-rostedt@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-28tracing: Only make selftest conditionals affect the global_traceSteven Rostedt (Google)
The tracing_selftest_running and tracing_selftest_disabled variables were to keep trace_printk() and other writes from affecting the tracing selftests, as the tracing selftests would examine the ring buffer to see if it contained what it expected or not. trace_printk() and friends could add to the ring buffer and cause the selftests to fail (and then disable the tracer that was being tested). To keep that from happening, these variables were added and would keep trace_printk() and friends from writing to the ring buffer while the tests were going on. But this was only the top level ring buffer (owned by the global_trace instance). There is no reason to prevent writing into ring buffers of other instances via the trace_array_printk() and friends. For the functions that could be used by other instances, check if the global_trace is the tracer instance that is being written to before deciding to not allow the write. Link: https://lkml.kernel.org/r/20230528051742.1325503-5-rostedt@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
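The check described in the commit above, suppressing writes only when the target is the top level buffer, can be sketched like this. The names `struct trace_array_sketch`, `global_trace_sketch`, `selftest_running_sketch` and `array_printk_sketch()` are hypothetical and only mirror the roles of the kernel's global_trace, tracing_selftest_running and trace_array_printk(); the real functions have different signatures and do far more work.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a tracing instance and its ring buffer. */
struct trace_array_sketch {
	int writes;	/* number of records "written" to this instance */
};

/* The top level instance, examined by the selftests. */
static struct trace_array_sketch global_trace_sketch;

/* True while a startup selftest is inspecting the top level buffer. */
static bool selftest_running_sketch;

/*
 * Write one record into the given instance. The write is dropped only
 * when a selftest is running AND the target is the top level instance;
 * other instances are always written to. Returns 1 if written, else 0.
 */
static int array_printk_sketch(struct trace_array_sketch *tr)
{
	if (tr == &global_trace_sketch && selftest_running_sketch)
		return 0;

	tr->writes++;
	return 1;
}
```

The point of comparing against the global instance first is that a debug instance keeps receiving output while the selftests run, which is what makes debugging ftrace with ftrace possible.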
2023-05-28tracing: Make tracing_selftest_running/delete nops when not usedSteven Rostedt (Google)
There's no reason to test the condition variables tracing_selftest_running or tracing_selftest_delete when tracing selftests are not enabled. Define them as 0 when the selftests are not configured in. Link: https://lkml.kernel.org/r/20230528051742.1325503-4-rostedt@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-28tracing: Have tracer selftests call cond_resched() before runningSteven Rostedt (Google)
As there are more and more internal selftests being added to the Linux kernel (KASAN, lockdep, etc.), the selftests are taking longer to run when these are enabled. Add a cond_resched() to the calling of do_run_tracer_selftest() to force a schedule if NEED_RESCHED is set, otherwise the soft lockup watchdog may trigger on boot up. Link: https://lkml.kernel.org/r/20230528051742.1325503-3-rostedt@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-28tracing: Move setting of tracing_selftest_running out of register_tracer()Steven Rostedt (Google)
The variables tracing_selftest_running and tracing_selftest_disabled are only used when CONFIG_FTRACE_STARTUP_TEST is enabled. Make them only visible within the selftest code. Those variables are currently set in the register_tracer() call, in a location where they do not need to be. Create a wrapper around run_tracer_selftest() called do_run_tracer_selftest() which sets those variables, and have register_tracer() call that instead. Having those variables only set within the CONFIG_FTRACE_STARTUP_TEST scope gets rid of them (and makes it possible to remove the tests against them) when the startup tests are not enabled (most cases). Link: https://lkml.kernel.org/r/20230528051742.1325503-2-rostedt@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
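The wrapper pattern described in the commit above can be sketched as below. The function and variable names carry a `_sketch` suffix because they are hypothetical: the real run_tracer_selftest() and do_run_tracer_selftest() take a tracer argument and live under CONFIG_FTRACE_STARTUP_TEST, and `observed_during_test` exists only so the sketch can show what the selftest sees.

```c
#include <stdbool.h>

/* Flag owned by the selftest code only (hypothetical name). */
static bool selftest_running_sketch;

/* Records what the selftest observed; for illustration only. */
static bool observed_during_test;

/* Stand-in for the selftest body: it runs with the flag already set. */
static int run_selftest_sketch(void)
{
	observed_during_test = selftest_running_sketch;
	return 0;	/* 0 means the selftest passed in this sketch */
}

/*
 * Wrapper that owns the flag: set it, run the test, clear it. The caller
 * (the registration path) no longer touches the flag itself.
 */
static int do_run_selftest_sketch(void)
{
	int ret;

	selftest_running_sketch = true;
	ret = run_selftest_sketch();
	selftest_running_sketch = false;

	return ret;
}
```

Keeping the set and clear inside one wrapper means the flag and every test against it can be compiled out together when the startup tests are not configured.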