summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-08-13net: kcm: Fix race condition in kcm_unattach()Sven Stegemann
syzbot found a race condition when kcm_unattach(psock) and kcm_release(kcm) are executed at the same time. kcm_unattach() is missing a check of the flag kcm->tx_stopped before calling queue_work(). If the kcm has a reserved psock, kcm_unattach() might get executed between cancel_work_sync() and unreserve_psock() in kcm_release(), requeuing kcm->tx_work right before kcm gets freed in kcm_done(). Remove kcm->tx_stopped and replace it by the less error-prone disable_work_sync(). Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+e62c9db591c30e174662@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e62c9db591c30e174662 Reported-by: syzbot+d199b52665b6c3069b94@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d199b52665b6c3069b94 Reported-by: syzbot+be6b1fdfeae512726b4e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=be6b1fdfeae512726b4e Signed-off-by: Sven Stegemann <sven@stegemann.de> Link: https://patch.msgid.link/20250812191810.27777-1-sven@stegemann.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-13Merge branch 'ets-use-old-nbands-while-purging-unused-classes'Jakub Kicinski
Davide Caratti says: ==================== ets: use old 'nbands' while purging unused classes - patch 1/2 fixes a NULL dereference in the control path of sch_ets qdisc - patch 2/2 extends kselftests to verify effectiveness of the above fix ==================== Link: https://patch.msgid.link/cover.1755016081.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-13selftests: net/forwarding: test purge of active DWRR classesDavide Caratti
Extend sch_ets.sh to add a reproducer for problematic list deletions when active DWRR class are purged by ets_qdisc_change() [1] [2]. [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/ [2] https://lore.kernel.org/netdev/f3b9bacc73145f265c19ab80785933da5b7cbdec.1754581577.git.dcaratti@redhat.com/ Suggested-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/489497cb781af7389011ca1591fb702a7391f5e7.1755016081.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-13net/sched: ets: use old 'nbands' while purging unused classesDavide Caratti
Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify() after recent changes from Lion [2]. The problem is: in ets_qdisc_change() we purge unused DWRR queues; the value of 'q->nbands' is the new one, and the cleanup should be done with the old one. The problem is here since my first attempts to fix ets_qdisc_change(), but it surfaced again after the recent qdisc len accounting fixes. Fix it purging idle DWRR queues before assigning a new value of 'q->nbands', so that all purge operations find a consistent configuration: - old 'q->nbands' because it's needed by ets_class_find() - old 'q->nstrict' because it's needed by ets_class_is_strict() BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary) Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021 RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80 Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab RSP: 0018:ffffba186009f400 EFLAGS: 00010202 RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004 RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004 R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000 R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000 FS: 00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ets_class_qlen_notify+0x65/0x90 [sch_ets] qdisc_tree_reduce_backlog+0x74/0x110 ets_qdisc_change+0x630/0xa40 [sch_ets] __tc_modify_qdisc.constprop.0+0x216/0x7f0 tc_modify_qdisc+0x7c/0x120 rtnetlink_rcv_msg+0x145/0x3f0 netlink_rcv_skb+0x53/0x100 netlink_unicast+0x245/0x390 netlink_sendmsg+0x21b/0x470 ____sys_sendmsg+0x39d/0x3d0 ___sys_sendmsg+0x9a/0xe0 __sys_sendmsg+0x7a/0xd0 do_syscall_64+0x7d/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f2155114084 Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89 RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084 RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003 RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0 R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0 </TASK> [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/ [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/ Cc: stable@vger.kernel.org Fixes: 103406b38c60 ("net/sched: Always pass notifications when child class becomes empty") Fixes: c062f2a0b04d ("net/sched: sch_ets: don't remove idle classes from the round-robin list") Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc") Reported-by: Li Shuang <shuali@redhat.com> Closes: https://issues.redhat.com/browse/RHEL-108026 Reviewed-by: Petr Machata <petrm@nvidia.com> Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-13Merge branch '10GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== ixgbe: bypass devlink phys_port_name generation Jedrzej adds option to skip phys_port_name generation and opts ixgbe into it as some configurations rely on pre-devlink naming which could end up broken as a result. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ixgbe: prevent from unwanted interface name changes devlink: let driver opt out of automatic phys_port_name generation ==================== Link: https://patch.msgid.link/20250812205226.1984369-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-13bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZEDavid Wei
The data page pool always fills the HW rx ring with pages. On arm64 with 64K pages, this will waste _at least_ 32K of memory per entry in the rx ring. Fix by fragmenting the pages if PAGE_SIZE > BNXT_RX_PAGE_SIZE. This makes the data page pool the same as the header pool. Tested with iperf3 with a small (64 entries) rx ring to encourage buffer circulation. Fixes: cd1fafe7da1f ("eth: bnxt: add support rx side device memory TCP") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250812182907.1540755-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-13netdevsim: Fix wild pointer access in nsim_queue_free().Kuniyuki Iwashima
syzbot reported the splat below. [0] When nsim_queue_uninit() is called from nsim_init_netdevsim(), register_netdevice() has not been called, thus dev->dstats has not been allocated. Let's not call dev_dstats_rx_dropped_add() in such a case. [0] BUG: unable to handle page fault for address: ffff88809782c020 PF: supervisor write access in kernel mode PF: error_code(0x0002) - not-present page PGD 1b401067 P4D 1b401067 PUD 0 Oops: Oops: 0002 [#1] SMP KASAN NOPTI CPU: 3 UID: 0 PID: 8476 Comm: syz.1.251 Not tainted 6.16.0-syzkaller-06699-ge8d780dcd957 #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:local_add arch/x86/include/asm/local.h:33 [inline] RIP: 0010:u64_stats_add include/linux/u64_stats_sync.h:89 [inline] RIP: 0010:dev_dstats_rx_dropped_add include/linux/netdevice.h:3027 [inline] RIP: 0010:nsim_queue_free+0xba/0x120 drivers/net/netdevsim/netdev.c:714 Code: 07 77 6c 4a 8d 3c ed 20 7e f1 8d 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 46 4a 03 1c ed 20 7e f1 8d <4c> 01 63 20 be 00 02 00 00 48 8d 3d 00 00 00 00 e8 61 2f 58 fa 48 RSP: 0018:ffffc900044af150 EFLAGS: 00010286 RAX: dffffc0000000000 RBX: ffff88809782c000 RCX: 00000000000079c3 RDX: 1ffffffff1be2fc7 RSI: ffffffff8c15f380 RDI: ffffffff8df17e38 RBP: ffff88805f59d000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000003 R14: ffff88806ceb3d00 R15: ffffed100dfd308e FS: 0000000000000000(0000) GS:ffff88809782c000(0063) knlGS:00000000f505db40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: ffff88809782c020 CR3: 000000006fc6a000 CR4: 0000000000352ef0 Call Trace: <TASK> nsim_queue_uninit drivers/net/netdevsim/netdev.c:993 [inline] nsim_init_netdevsim drivers/net/netdevsim/netdev.c:1049 [inline] nsim_create+0xd0a/0x1260 drivers/net/netdevsim/netdev.c:1101 __nsim_dev_port_add+0x435/0x7d0 drivers/net/netdevsim/dev.c:1438 nsim_dev_port_add_all drivers/net/netdevsim/dev.c:1494 [inline] nsim_dev_reload_create drivers/net/netdevsim/dev.c:1546 [inline] nsim_dev_reload_up+0x5b8/0x860 drivers/net/netdevsim/dev.c:1003 devlink_reload+0x322/0x7c0 net/devlink/dev.c:474 devlink_nl_reload_doit+0xe31/0x1410 net/devlink/dev.c:584 genl_family_rcv_msg_doit+0x206/0x2f0 net/netlink/genetlink.c:1115 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0x55c/0x800 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x155/0x420 net/netlink/af_netlink.c:2552 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1346 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:714 [inline] __sock_sendmsg net/socket.c:729 [inline] ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 __sys_sendmsg+0x16d/0x220 net/socket.c:2700 do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline] __do_fast_syscall_32+0x7c/0x3a0 arch/x86/entry/syscall_32.c:306 do_fast_syscall_32+0x32/0x80 arch/x86/entry/syscall_32.c:331 entry_SYSENTER_compat_after_hwframe+0x84/0x8e RIP: 0023:0xf708e579 Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 RSP: 002b:00000000f505d55c EFLAGS: 00000296 ORIG_RAX: 0000000000000172 RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 0000000080000080 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: CR2: ffff88809782c020 Fixes: 2a68a22304f9 ("netdevsim: account dropped packet length in stats on queue free") Reported-by: syzbot+8aa80c6232008f7b957d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/688bb9ca.a00a0220.26d0e1.0050.GAE@google.com/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250812162130.4129322-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-13net: mctp: Fix bad kfree_skb in bind lookup testMatt Johnston
The kunit test's skb_pkt is consumed by mctp_dst_input() so shouldn't be freed separately. Fixes: e6d8e7dbc5a3 ("net: mctp: Add bind lookup test") Reported-by: Alexandre Ghiti <alex@ghiti.fr> Closes: https://lore.kernel.org/all/734b02a3-1941-49df-a0da-ec14310d41e4@ghiti.fr/ Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://patch.msgid.link/20250812-fix-mctp-bind-test-v1-1-5e2128664eb3@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-14rust: devres: fix leaking call to devm_add_action()Danilo Krummrich
When the data argument of Devres::new() is Err(), we leak the preceding call to devm_add_action(). In order to fix this, call devm_add_action() in a unit type initializer in try_pin_init!() after the initializers of all other fields. Fixes: f5d3ef25d238 ("rust: devres: get rid of Devres' inner Arc") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Benno Lossin <lossin@kernel.org> Link: https://lore.kernel.org/r/20250812130928.11075-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-08-14Merge tag 'amd-drm-fixes-6.17-2025-08-13' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.17-2025-08-13: amdgpu: - PSP fix - VRAM reservation fix - CSA fix - Process kill fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250813151905.2040816-1-alexander.deucher@amd.com
2025-08-13Merge tag 'nf-25-08-13' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for *net*: 1) I managed to add a null dereference crash in nft_set_pipapo in the current development cycle, was not caught by CI because the avx2 implementation is fine, but selftest splats when run on non-avx2 host. 2) Fix the ipvs estimater kthread affinity, was incorrect since 6.14. From Frederic Weisbecker. 3) nf_tables should not allow to add a device to a flowtable or netdev chain more than once -- reject this. From Pablo Neira Ayuso. This has been broken for long time, blamed commit dates from v5.8. * tag 'nf-25-08-13' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: reject duplicate device on updates ipvs: Fix estimator kthreads preferred affinity netfilter: nft_set_pipapo: fix null deref for empty set ==================== Link: https://patch.msgid.link/20250813113800.20775-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-14Merge tag 'drm-misc-next-fixes-2025-08-12' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: bridge: - fix OF-node leak - fix documentation fbdev-emulation: - pass correct format info to drm_helper_mode_fill_fb_struct() panfrost: - print correct RSS size Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250812064712.GA14554@2a02-2454-fd5e-fd00-2c49-c639-c55f-a125.dyn6.pyur.net
2025-08-13Merge tag 'erofs-for-6.17-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Align FSDAX enablement among multiple devices - Fix EROFS_FS_ZIP_ACCEL build dependency again to prevent forcing CRYPTO{,_DEFLATE}=y even if EROFS=m - Fix atomic context detection to properly launch kworkers on demand - Fix block count statistics for 48-bit addressing support * tag 'erofs-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix block count report when 48-bit layout is on erofs: fix atomic context detection when !CONFIG_DEBUG_LOCK_ALLOC erofs: Do not select tristate symbols from bool symbols erofs: Fallback to normal access if DAX is not supported on extra device
2025-08-13jbd2: prevent softlockup in jbd2_log_do_checkpoint()Baokun Li
Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list() periodically release j_list_lock after processing a batch of buffers to avoid long hold times on the j_list_lock. However, since both functions contend for j_list_lock, the combined time spent waiting and processing can be significant. jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when need_resched() is true to avoid softlockups during prolonged operations. But jbd2_log_do_checkpoint() only exits its loop when need_resched() is true, relying on potentially sleeping functions like __flush_batch() or wait_on_buffer() to trigger rescheduling. If those functions do not sleep, the kernel may hit a softlockup. watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373] CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ #10 Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017 Workqueue: writeback wb_workfn (flush-7:2) pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : native_queued_spin_lock_slowpath+0x358/0x418 lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] Call trace: native_queued_spin_lock_slowpath+0x358/0x418 jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] __jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2] add_transaction_credits+0x3bc/0x418 [jbd2] start_this_handle+0xf8/0x560 [jbd2] jbd2__journal_start+0x118/0x228 [jbd2] __ext4_journal_start_sb+0x110/0x188 [ext4] ext4_do_writepages+0x3dc/0x740 [ext4] ext4_writepages+0xa4/0x190 [ext4] do_writepages+0x94/0x228 __writeback_single_inode+0x48/0x318 writeback_sb_inodes+0x204/0x590 __writeback_inodes_wb+0x54/0xf8 wb_writeback+0x2cc/0x3d8 wb_do_writeback+0x2e0/0x2f8 wb_workfn+0x80/0x2a8 process_one_work+0x178/0x3e8 worker_thread+0x234/0x3b8 kthread+0xf0/0x108 ret_from_fork+0x10/0x20 So explicitly call cond_resched() in jbd2_log_do_checkpoint() to avoid softlockup. Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250812063752.912130-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-08-13ext4: fix incorrect function name in commentBaolin Liu
Since commit 6b730a405037 “ext4: hoist ext4_block_write_begin and replace the __block_write_begin”, the comment should be updated accordingly from "__block_write_begin" to "ext4_block_write_begin". Fixes: 6b730a405037 (“ext4: hoist ext4_block_write_begin and replace...") Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20250812021709.1120716-1-liubaolin12138@163.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-08-13Merge tag 'rcu.fixes.6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU fix from Neeraj Upadhyay: "Fix a regression introduced by commit b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work") which results in boot hang as reported by kernel test bot at [1]. This issue happens because RCU re-initializes the deferred QS IRQ work everytime it is queued. With commit b41642c87716, the IRQ work re-initialization can happen while it is already queued. This results in IRQ work being requeued to itself. When IRQ work finally fires, as it is requeued to itself, it is repeatedly executed and results in hang. Fix this with initializing the IRQ work only once before the CPU boots" Link: https://lore.kernel.org/rcu/202508071303.c1134cce-lkp@intel.com/ [1] * tag 'rcu.fixes.6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: rcu: Fix racy re-initialization of irq_work causing hangs
2025-08-13smb: client: remove redundant lstrp update in negotiate protocolWang Zhaolong
Commit 34331d7beed7 ("smb: client: fix first command failure during re-negotiation") addressed a race condition by updating lstrp before entering negotiate state. However, this approach may have some unintended side effects. The lstrp field is documented as "when we got last response from this server", and updating it before actually receiving a server response could potentially affect other mechanisms that rely on this timestamp. For example, the SMB echo detection logic also uses lstrp as a reference point. In scenarios with frequent user operations during reconnect states, the repeated calls to cifs_negotiate_protocol() might continuously update lstrp, which could interfere with the echo detection timing. Additionally, commit 266b5d02e14f ("smb: client: fix race condition in negotiate timeout by using more precise timing") introduced a dedicated neg_start field specifically for tracking negotiate start time. This provides a more precise solution for the original race condition while preserving the intended semantics of lstrp. Since the race condition is now properly handled by the neg_start mechanism, the lstrp update in cifs_negotiate_protocol() is no longer necessary and can be safely removed. Fixes: 266b5d02e14f ("smb: client: fix race condition in negotiate timeout by using more precise timing") Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-13cifs: update internal version numberSteve French
to 2.56 Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-13smb: client: don't wait for info->send_pending == 0 on errorStefan Metzmacher
We already called ib_drain_qp() before and that makes sure send_done() was called with IB_WC_WR_FLUSH_ERR, but didn't called atomic_dec_and_test(&sc->send_io.pending.count) So we may never reach the info->send_pending == 0 condition. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 5349ae5e05fa ("smb: client: let send_done() cleanup before calling smbd_disconnect_rdma_connection()") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-13smb: client: fix mid_q_entry memleak leak with per-mid lockingWang Zhaolong
This is step 4/4 of a patch series to fix mid_q_entry memory leaks caused by race conditions in callback execution. In compound_send_recv(), when wait_for_response() is interrupted by signals, the code attempts to cancel pending requests by changing their callbacks to cifs_cancelled_callback. However, there's a race condition between signal interruption and network response processing that causes both mid_q_entry and server buffer leaks: ``` User foreground process cifsd cifs_readdir open_cached_dir cifs_send_recv compound_send_recv smb2_setup_request smb2_mid_entry_alloc smb2_get_mid_entry smb2_mid_entry_alloc mempool_alloc // alloc mid kref_init(&temp->refcount); // refcount = 1 mid[0]->callback = cifs_compound_callback; mid[1]->callback = cifs_compound_last_callback; smb_send_rqst rc = wait_for_response wait_event_state TASK_KILLABLE cifs_demultiplex_thread allocate_buffers server->bigbuf = cifs_buf_get() standard_receive3 ->find_mid() smb2_find_mid __smb2_find_mid kref_get(&mid->refcount) // +1 cifs_handle_standard handle_mid /* bigbuf will also leak */ mid->resp_buf = server->bigbuf server->bigbuf = NULL; dequeue_mid /* in for loop */ mids[0]->callback cifs_compound_callback /* Signal interrupts wait: rc = -ERESTARTSYS */ /* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) *? midQ[0]->callback = cifs_cancelled_callback; cancelled_mid[i] = true; /* The change comes too late */ mid->mid_state = MID_RESPONSE_READY release_mid // -1 /* cancelled_mid[i] == true causes mid won't be released in compound_send_recv cleanup */ /* cifs_cancelled_callback won't executed to release mid */ ``` The root cause is that there's a race between callback assignment and execution. Fix this by introducing per-mid locking: - Add spinlock_t mid_lock to struct mid_q_entry - Add mid_execute_callback() for atomic callback execution - Use mid_lock in cancellation paths to ensure atomicity This ensures that either the original callback or the cancellation callback executes atomically, preventing reference count leaks when requests are interrupted by signals. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404 Fixes: ee258d79159a ("CIFS: Move credit processing to mid callbacks for SMB3") Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-13smb3: fix for slab out of bounds on mount to ksmbdSteve French
With KASAN enabled, it is possible to get a slab out of bounds during mount to ksmbd due to missing check in parse_server_interfaces() (see below): BUG: KASAN: slab-out-of-bounds in parse_server_interfaces+0x14ee/0x1880 [cifs] Read of size 4 at addr ffff8881433dba98 by task mount/9827 CPU: 5 UID: 0 PID: 9827 Comm: mount Tainted: G OE 6.16.0-rc2-kasan #2 PREEMPT(voluntary) Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.13.1 06/14/2019 Call Trace: <TASK> dump_stack_lvl+0x9f/0xf0 print_report+0xd1/0x670 __virt_addr_valid+0x22c/0x430 ? parse_server_interfaces+0x14ee/0x1880 [cifs] ? kasan_complete_mode_report_info+0x2a/0x1f0 ? parse_server_interfaces+0x14ee/0x1880 [cifs] kasan_report+0xd6/0x110 parse_server_interfaces+0x14ee/0x1880 [cifs] __asan_report_load_n_noabort+0x13/0x20 parse_server_interfaces+0x14ee/0x1880 [cifs] ? __pfx_parse_server_interfaces+0x10/0x10 [cifs] ? trace_hardirqs_on+0x51/0x60 SMB3_request_interfaces+0x1ad/0x3f0 [cifs] ? __pfx_SMB3_request_interfaces+0x10/0x10 [cifs] ? SMB2_tcon+0x23c/0x15d0 [cifs] smb3_qfs_tcon+0x173/0x2b0 [cifs] ? __pfx_smb3_qfs_tcon+0x10/0x10 [cifs] ? cifs_get_tcon+0x105d/0x2120 [cifs] ? do_raw_spin_unlock+0x5d/0x200 ? cifs_get_tcon+0x105d/0x2120 [cifs] ? __pfx_smb3_qfs_tcon+0x10/0x10 [cifs] cifs_mount_get_tcon+0x369/0xb90 [cifs] ? dfs_cache_find+0xe7/0x150 [cifs] dfs_mount_share+0x985/0x2970 [cifs] ? check_path.constprop.0+0x28/0x50 ? save_trace+0x54/0x370 ? __pfx_dfs_mount_share+0x10/0x10 [cifs] ? __lock_acquire+0xb82/0x2ba0 ? __kasan_check_write+0x18/0x20 cifs_mount+0xbc/0x9e0 [cifs] ? __pfx_cifs_mount+0x10/0x10 [cifs] ? do_raw_spin_unlock+0x5d/0x200 ? cifs_setup_cifs_sb+0x29d/0x810 [cifs] cifs_smb3_do_mount+0x263/0x1990 [cifs] Reported-by: Namjae Jeon <linkinjeon@kernel.org> Tested-by: Namjae Jeon <linkinjeon@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-13Revert "ALSA: hda: Add ASRock X670E Taichi to denylist"Mario Limonciello (AMD)
On a motherboard with an AMD Granite Ridge CPU there is a report that 3.5mm microphone and headphones aren't working. In the log it's observed: snd_hda_intel 0000:02:00.6: Skipping the device on the denylist This was because of commit df42ee7e22f03 ("ALSA: hda: Add ASRock X670E Taichi to denylist"). Reverting this commit allows the microphone and headphones to work again. As at least some combinations of this motherboard do have applicable devices, revert so that they can be probed. Cc: Richard Gong <richard.gong@amd.com> Cc: Juan Martinez <juan.martinez@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Link: https://patch.msgid.link/20250813140427.1577172-1-superm1@kernel.org Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-08-13ALSA: azt3328: Put __maybe_unused for inline functions for gameportTakashi Iwai
Some inline functions are unused depending on kconfig, and the recent change for clang builds made those handled as errors with W=1. For avoiding pitfalls, mark those with __maybe_unused attributes. Link: https://patch.msgid.link/20250813153628.12303-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-08-13rust: faux: fix C header linkMiguel Ojeda
Starting with Rust 1.91.0 (expected 2025-10-30), `rustdoc` has improved some false negatives around intra-doc links [1], and it found a broken intra-doc link we currently have: error: unresolved link to `include/linux/device/faux.h` --> rust/kernel/faux.rs:7:17 | 7 | //! C header: [`include/linux/device/faux.h`] | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ no item named `include/linux/device/faux.h` in scope | = help: to escape `[` and `]` characters, add '\' before them like `\[` or `\]` = note: `-D rustdoc::broken-intra-doc-links` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(rustdoc::broken_intra_doc_links)]` Our `srctree/` C header links are not intra-doc links, thus they need the link destination. Thus fix it. Cc: stable <stable@kernel.org> Link: https://github.com/rust-lang/rust/pull/132748 [1] Fixes: 78418f300d39 ("rust/kernel: Add faux device bindings") Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Benno Lossin <lossin@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250804171311.1186538-1-ojeda@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13Merge tag 'mm-hotfixes-stable-2025-08-12-20-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "12 hotfixes. 5 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 10 of these fixes are for MM" * tag 'mm-hotfixes-stable-2025-08-12-20-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: proc: proc_maps_open allow proc_mem_open to return NULL mm/mremap: avoid expensive folio lookup on mremap folio pte batch userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry mm: pass page directly instead of using folio_page selftests/proc: fix string literal warning in proc-maps-race.c fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_stats mm/smaps: fix race between smaps_hugetlb_range and migration mm: fix the race between collapse and PT_RECLAIM under per-vma lock mm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup() MAINTAINERS: add Masami as a reviewer of hung task detector mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock kasan/test: fix protection against compiler elision
2025-08-13usb: storage: realtek_cr: Use correct byte order for bcs->ResidueThorsten Blum
Since 'bcs->Residue' has the data type '__le32', convert it to the correct byte order of the CPU using this driver when assigning it to the local variable 'residue'. Cc: stable <stable@kernel.org> Fixes: 50a6cb932d5c ("USB: usb_storage: add ums-realtek driver") Suggested-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20250813145247.184717-3-thorsten.blum@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13usb: chipidea: imx: improve usbmisc_imx7d_pullup()Xu Yang
When add workaround for ERR051725, the usbmisc will put PHY to Non-driving mode (OPMODE = 01) after stopping the device controller and put PHY back to Normal mode (OPMODE = 00) after starting the device controller. However, this will bring issue for host controller. Because the PHY may stay in Non-driving mode after switching the role from device to host. Then the port will not work if USB device is attached. To fix this issue, improving the workaround by putting PHY to Non-driving mode for a certain period and back to Normal mode finally. To make host detect a disconnect signal, the period should be at least 125us (a micro-frame time) for high-speed link. And only working as high-speed mode will need workaround for ERR051725. So this will also filter the pullup event for high-speed. Fixes: 11992b410083 ("usb: chipidea: imx: implement workaround for ERR051725") Reviewed-by: Jun Li <jun.li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Peter Chen <peter.chen@kernel.org> Link: https://lore.kernel.org/r/20250811100833.862876-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13kcov, usb: Don't disable interrupts in kcov_remote_start_usb_softirq()Sebastian Andrzej Siewior
kcov_remote_start_usb_softirq() the begin of urb's completion callback. HCDs marked HCD_BH will invoke this function from the softirq and in_serving_softirq() will detect this properly. Root-HUB (RH) requests will not be delayed to softirq but complete immediately in IRQ context. This will confuse kcov because in_serving_softirq() will report true if the softirq is served after the hardirq and if the softirq got interrupted by the hardirq in which currently runs. This was addressed by simply disabling interrupts in kcov_remote_start_usb_softirq() which avoided the interruption by the RH while a regular completion callback was invoked. This not only changes the behaviour while kconv is enabled but also breaks PREEMPT_RT because now sleeping locks can no longer be acquired. Revert the previous fix. Address the issue by invoking kcov_remote_start_usb() only if the context is just "serving softirqs" which is identified by checking in_serving_softirq() and in_hardirq() must be false. Fixes: f85d39dd7ed89 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq") Cc: stable <stable@kernel.org> Reported-by: Yunseong Kim <ysk@kzalloc.com> Closes: https://lore.kernel.org/all/20250725201400.1078395-2-ysk@kzalloc.com/ Tested-by: Yunseong Kim <ysk@kzalloc.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250811082745.ycJqBXMs@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13usb: dwc3: pci: add support for the Intel Wildcat LakeHeikki Krogerus
This patch adds the necessary PCI ID for Intel Wildcat Lake devices. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/20250812131101.2930199-1-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13usb: dwc3: Ignore late xferNotReady event to prevent halt timeoutKuen-Han Tsai
During a device-initiated disconnect, the End Transfer command resets the event filter, allowing a new xferNotReady event to be generated before the controller is fully halted. Processing this late event incorrectly triggers a Start Transfer, which prevents the controller from halting and results in a DSTS.DEVCTLHLT bit polling timeout. Ignore the late xferNotReady event if the controller is already in a disconnected state. Fixes: 72246da40f37 ("usb: Introduce DesignWare USB3 DRD Driver") Cc: stable <stable@kernel.org> Signed-off-by: Kuen-Han Tsai <khtsai@google.com> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/20250807090700.2397190-1-khtsai@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13USB: storage: Add unusual-devs entry for Novatek NTK96550-based cameraMael GUERIN
Add the US_FL_BULK_IGNORE_TAG quirk for Novatek NTK96550-based camera to fix USB resets after sending SCSI vendor commands due to CBW and CSW tags difference, leading to undesired slowness while communicating with the device. Please find below the copy of /sys/kernel/debug/usb/devices with my device plugged in (listed as TechSys USB mass storage here, the underlying chipset being the Novatek NTK96550-based camera): T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=0603 ProdID=8611 Rev= 0.01 S: Manufacturer=TechSys S: Product=USB Mass Storage S: SerialNumber=966110000000100 C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=100mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Mael GUERIN <mael.guerin@murena.io> Cc: stable <stable@kernel.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20250806164406.43450-1-mael.guerin@murena.io Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13usb: core: hcd: fix accessing unmapped memory in SINGLE_STEP_SET_FEATURE testXu Yang
The USB core will unmap urb->transfer_dma after SETUP stage completes. Then the USB controller will access unmapped memory when it received device descriptor. If iommu is equipped, the entire test can't be completed due to the memory accessing is blocked. Fix it by calling map_urb_for_dma() again for IN stage. To reduce redundant map for urb->transfer_buffer, this will also set URB_NO_TRANSFER_DMA_MAP flag before first map_urb_for_dma() to skip dma map for urb->transfer_buffer and clear URB_NO_TRANSFER_DMA_MAP flag before second map_urb_for_dma(). Fixes: 216e0e563d81 ("usb: core: hcd: use map_urb_for_dma for single step set feature urb") Cc: stable <stable@kernel.org> Reviewed-by: Jun Li <jun.li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20250806083955.3325299-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13usb: renesas-xhci: Fix External ROM access timeoutsMarek Vasut
Increase the External ROM access timeouts to prevent failures during programming of External SPI EEPROM chips. The current timeouts are too short for some SPI EEPROMs used with uPD720201 controllers. The current timeout for Chip Erase in renesas_rom_erase() is 100 ms , the current timeout for Sector Erase issued by the controller before Page Program in renesas_fw_download_image() is also 100 ms. Neither timeout is sufficient for e.g. the Macronix MX25L5121E or MX25V5126F. MX25L5121E reference manual [1] page 35 section "ERASE AND PROGRAMMING PERFORMANCE" and page 23 section "Table 8. AC CHARACTERISTICS (Temperature = 0°C to 70°C for Commercial grade, VCC = 2.7V ~ 3.6V)" row "tCE" indicate that the maximum time required for Chip Erase opcode to complete is 2 s, and for Sector Erase it is 300 ms . MX25V5126F reference manual [2] page 47 section "13. ERASE AND PROGRAMMING PERFORMANCE (2.3V - 3.6V)" and page 42 section "Table 8. AC CHARACTERISTICS (Temperature = -40°C to 85°C for Industrial grade, VCC = 2.3V - 3.6V)" row "tCE" indicate that the maximum time required for Chip Erase opcode to complete is 3.2 s, and for Sector Erase it is 400 ms . Update the timeouts such, that Chip Erase timeout is set to 5 seconds, and Sector Erase timeout is set to 500 ms. Such lengthy timeouts ought to be sufficient for majority of SPI EEPROM chips. [1] https://www.macronix.com/Lists/Datasheet/Attachments/8634/MX25L5121E,%203V,%20512Kb,%20v1.3.pdf [2] https://www.macronix.com/Lists/Datasheet/Attachments/8750/MX25V5126F,%202.5V,%20512Kb,%20v1.1.pdf Fixes: 2478be82de44 ("usb: renesas-xhci: Add ROM loader for uPD720201") Cc: stable <stable@kernel.org> Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org> Link: https://lore.kernel.org/r/20250802225526.25431-1-marek.vasut+renesas@mailbox.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13usb: gadget: tegra-xudc: fix PM use count underflowRussell King (Oracle)
Upon resume from system suspend, the PM runtime core issues the following warning: tegra-xudc 3550000.usb: Runtime PM usage count underflow! This is because tegra_xudc_resume() unconditionally calls schedule_work(&xudc->usb_role_sw_work) whether or not anything has changed, which causes tegra_xudc_device_mode_off() to be called even when we're already in that mode. Keep track of the current state of "device_mode", and only schedule this work if it has changed from the hardware state on resume. Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1uhtkH-007KDZ-JT@rmk-PC.armlinux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13usb: quirks: Add DELAY_INIT quick for another SanDisk 3.2Gen1 Flash DriveMiao Li
Another SanDisk 3.2Gen1 Flash Drive also need DELAY_INIT quick, or it will randomly work incorrectly on Huawei hisi platforms when doing reboot test. Signed-off-by: Miao Li <limiao@kylinos.cn> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/20250801082728.469406-1-limiao870622@163.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-13ASoC: tas2781: Normalize the volume kcontrol nameBaojun Xu
Change the name of the kcontrol from "Gain" to "Volume". Signed-off-by: Baojun Xu <baojun.xu@ti.com> Link: https://patch.msgid.link/20250813100708.12197-1-baojun.xu@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-08-13io_uring/io-wq: add check free worker before create new workerFengnan Chang
After commit 0b2b066f8a85 ("io_uring/io-wq: only create a new worker if it can make progress"), in our produce environment, we still observe that part of io_worker threads keeps creating and destroying. After analysis, it was confirmed that this was due to a more complex scenario involving a large number of fsync operations, which can be abstracted as frequent write + fsync operations on multiple files in a single uring instance. Since write is a hash operation while fsync is not, and fsync is likely to be suspended during execution, the action of checking the hash value in io_wqe_dec_running cannot handle such scenarios. Similarly, if hash-based work and non-hash-based work are sent at the same time, similar issues are likely to occur. Returning to the starting point of the issue, when a new work arrives, io_wq_enqueue may wake up free worker A, while io_wq_dec_running may create worker B. Ultimately, only one of A and B can obtain and process the task, leaving the other in an idle state. In the end, the issue is caused by inconsistent logic in the checks performed by io_wq_enqueue and io_wq_dec_running. Therefore, the problem can be resolved by checking for available workers in io_wq_dec_running. Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> Reviewed-by: Diangang Li <lidiangang@bytedance.com> Link: https://lore.kernel.org/r/20250813120214.18729-1-changfengnan@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-13btrfs: fix printing of mount info messages for NODATACOW/NODATASUMKyoji Ogasawara
The NODATASUM message was printed twice by mistake and the NODATACOW was missing from the 'unset' part. Fix the duplication and make the output look the same. Fixes: eddb1a433f26 ("btrfs: add reconfigure callback for fs_context") CC: stable@vger.kernel.org # 6.8+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-13btrfs: restore mount option info messages during mountKyoji Ogasawara
After the fsconfig migration in 6.8, mount option info messages are no longer displayed during mount operations because btrfs_emit_options() is only called during remount, not during initial mount. Fix this by calling btrfs_emit_options() in btrfs_fill_super() after open_ctree() succeeds. Additionally, prevent log duplication by ensuring btrfs_check_options() handles validation with warn-level and err-level messages, while btrfs_emit_options() provides info-level messages. Fixes: eddb1a433f26 ("btrfs: add reconfigure callback for fs_context") CC: stable@vger.kernel.org # 6.8+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-13btrfs: fix incorrect log message for nobarrier mount optionKyoji Ogasawara
Fix a wrong log message that appears when the "nobarrier" mount option is unset. When "nobarrier" is unset, barrier is actually enabled. However, the log incorrectly stated "turning off barriers". Fixes: eddb1a433f26 ("btrfs: add reconfigure callback for fs_context") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Kyoji Ogasawara <sawara04.o@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-13btrfs: fix buffer index in wait_eb_writebacks()Naohiro Aota
The commit f2cb97ee964a ("btrfs: index buffer_tree using node size") changed the index of buffer_tree from "start >> sectorsize_bits" to "start >> nodesize_bits". However, the change is not applied for wait_eb_writebacks() and caused IO failures by writing in a full zone. Use the index properly. Fixes: f2cb97ee964a ("btrfs: index buffer_tree using node size") Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-13btrfs: subpage: keep TOWRITE tag until folio is cleanedNaohiro Aota
btrfs_subpage_set_writeback() calls folio_start_writeback() the first time a folio is written back, and it also clears the PAGECACHE_TAG_TOWRITE tag even if there are still dirty blocks in the folio. This can break ordering guarantees, such as those required by btrfs_wait_ordered_extents(). That ordering breakage leads to a real failure. For example, running generic/464 on a zoned setup will hit the following ASSERT. This happens because the broken ordering fails to flush existing dirty pages before the file size is truncated. assertion failed: !list_empty(&ordered->list) :: 0, in fs/btrfs/zoned.c:1899 ------------[ cut here ]------------ kernel BUG at fs/btrfs/zoned.c:1899! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 2 UID: 0 PID: 1906169 Comm: kworker/u130:2 Kdump: loaded Not tainted 6.16.0-rc6-BTRFS-ZNS+ #554 PREEMPT(voluntary) Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] RIP: 0010:btrfs_finish_ordered_zoned.cold+0x50/0x52 [btrfs] RSP: 0018:ffffc9002efdbd60 EFLAGS: 00010246 RAX: 000000000000004c RBX: ffff88811923c4e0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff827e38b1 RDI: 00000000ffffffff RBP: ffff88810005d000 R08: 00000000ffffdfff R09: ffffffff831051c8 R10: ffffffff83055220 R11: 0000000000000000 R12: ffff8881c2458c00 R13: ffff88811923c540 R14: ffff88811923c5e8 R15: ffff8881c1bd9680 FS: 0000000000000000(0000) GS:ffff88a04acd0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f907c7a918c CR3: 0000000004024000 CR4: 0000000000350ef0 Call Trace: <TASK> ? srso_return_thunk+0x5/0x5f btrfs_finish_ordered_io+0x4a/0x60 [btrfs] btrfs_work_helper+0xf9/0x490 [btrfs] process_one_work+0x204/0x590 ? srso_return_thunk+0x5/0x5f worker_thread+0x1d6/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x118/0x230 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x205/0x260 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Consider process A calling writepages() with WB_SYNC_NONE. In zoned mode or for compressed writes, it locks several folios for delalloc and starts writing them out. Let's call the last locked folio folio X. Suppose the write range only partially covers folio X, leaving some pages dirty. Process A calls btrfs_subpage_set_writeback() when building a bio. This function call clears the TOWRITE tag of folio X, whose size = 8K and the block size = 4K. It is following state. 0 4K 8K |/////|/////| (flag: DIRTY, tag: DIRTY) <-----> Process A will write this range. Now suppose process B concurrently calls writepages() with WB_SYNC_ALL. It calls tag_pages_for_writeback() to tag dirty folios with PAGECACHE_TAG_TOWRITE. Since folio X is still dirty, it gets tagged. Then, B collects tagged folios using filemap_get_folios_tag() and must wait for folio X to be written before returning from writepages(). 0 4K 8K |/////|/////| (flag: DIRTY, tag: DIRTY|TOWRITE) However, between tagging and collecting, process A may call btrfs_subpage_set_writeback() and clear folio X's TOWRITE tag. 0 4K 8K | |/////| (flag: DIRTY|WRITEBACK, tag: DIRTY) As a result, process B won't see folio X in its batch, and returns without waiting for it. This breaks the WB_SYNC_ALL ordering requirement. Fix this by using btrfs_subpage_set_writeback_keepwrite(), which retains the TOWRITE tag. We now manually clear the tag only after the folio becomes clean, via the xas operation. Fixes: 3470da3b7d87 ("btrfs: subpage: introduce helpers for writeback status") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-13btrfs: clear TAG_TOWRITE from buffer tree when submitting a tree blockQu Wenruo
[POSSIBLE BUG] After commit 5e121ae687b8 ("btrfs: use buffer xarray for extent buffer writeback operations"), we have a dedicated xarray for extent buffers, and a lot of tags are migrated to that buffer tree, like PAGECACHE_TAG_TOWRITE/DIRTY/WRITEBACK. This frees us from the limits of page flags, but there is a new asymmetric behavior, we call buffer_tree_tag_for_writeback() to set PAGECACHE_TAG_TOWRITE for the involved ranges, but there is no one to clear that tag. Before that rework, we relied on the page cache tag which was cleared when folio_start_writeback() was called. Although this has its own problems (e.g. the first one calling folio_start_writeback() will clear the tag for the whole page), it at least cleared the tag. But now our real tags are stored in the buffer tree, no one is really clearing the PAGECACHE_TAG_TOWRITE tag now. [FIX] Thankfully this is not going to cause any real bug, but just some inefficiency iterating the extent buffers. As if we hit an extent buffer which is not dirty but still has the PAGECACHE_TAG_TOWRITE tag, lock_extent_buffer_for_io() will skip it so we won't writeback the extent buffer again. To properly fix the inefficiency, just clear the PAGECACHE_TAG_TOWRITE inside lock_extent_buffer_for_io(). There is no error path between lock_extent_buffer_for_io() and write_one_eb(), so we're safe to clear the tag there. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-13btrfs: do not set mtime/ctime to current time when unlinking for log replayFilipe Manana
If we are doing an unlink for log replay, we are updating the directory's mtime and ctime to the current time, and this is incorrect since it should stay with the mtime and ctime that were set when the directory was logged. This is the same as when adding a link to an inode during log replay (with btrfs_add_link()), where we want the mtime and ctime to be the values that were in place when the inode was logged. This was found with generic/547 using LOAD_FACTOR=20 and TIME_FACTOR=20, where due to large log trees we have longer log replay times and fssum could detect a mismatch of the mtime and ctime of a directory. Fix this by skipping the mtime and ctime update at __btrfs_unlink_inode() if we are in log replay context (just like btrfs_add_link()). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-13btrfs: clear block dirty if btrfs_writepage_cow_fixup() failedQu Wenruo
[BUG] If btrfs_writepage_cow_fixup() failed (returning value -EUCLEAN), the block will be kept dirty, but with its corresponding range finished in the ordered extent. Currently that error pattern is only possible for experimental builds, which places extra check to ensure we shouldn't hit a dirty block without a corresponding ordered extent. This means if later a writeback happens again, we can hit the following problems: - ASSERT(block_start != EXTENT_MAP_HOLE) in submit_one_sector() If the original extent map is a hole, then we can hit this case, as the new ordered extent failed, we will drop the new extent map and re-read one from the disk. - DEBUG_WARN() in btrfs_writepage_cow_fixup() This is because we no longer have an ordered extent for those dirty blocks. The original for them is already finished with error. [CAUSE] The function btrfs_writepage_cow_fixup() is not following the regular error handling of writeback. The common practice is to clear the folio dirty, start and finish the writeback for the block. This is normally done by extent_clear_unlock_delalloc() with PAGE_START_WRITEBACK | PAGE_END_WRITEBACK flags during run_delalloc_range(). So if we keep those failed blocks dirty, they will stay in the page cache and wait for the next writeback. And since the original ordered extent is already finished and removed, depending on the original extent map, we either hit the ASSERT() inside submit_one_sector(), or hit the DEBUG_WARN() in btrfs_writepage_cow_fixup() again (and very ironic). [FIX] Follow the regular error handling to clear the dirty flag for the block range, start and finish writeback for that block range instead. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-13btrfs: clear block dirty if submit_one_sector() failedQu Wenruo
[BUG] If submit_one_sector() failed, the block will be kept dirty, but with their corresponding range finished in the ordered extent. This means if a writeback happens later again, we can hit the following problems: - ASSERT(block_start != EXTENT_MAP_HOLE) in submit_one_sector() If the original extent map is a hole, then we can hit this case, as the new ordered extent failed, we will drop the new extent map and re-read one from the disk. - DEBUG_WARN() in btrfs_writepage_cow_fixup() This is because we no longer have an ordered extent for those dirty blocks. The original for them is already finished with error. [CAUSE] The function submit_one_sector() is not following the regular error handling of writeback. The common practice is to clear the folio dirty, start and finish the writeback for the block. This is normally done by extent_clear_unlock_delalloc() with PAGE_START_WRITEBACK | PAGE_END_WRITEBACK flags during run_delalloc_range(). So if we keep those failed blocks dirty, they will stay in the page cache and wait for the next writeback. And since the original ordered extent is already finished and removed, depending on the original extent map, we either hit the ASSERT() inside submit_one_sector(), or hit the DEBUG_WARN() in btrfs_writepage_cow_fixup(). [FIX] Follow the regular error handling to clear the dirty flag for the block, start and finish writeback for that block instead. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-13md: add legacy_async_del_gendisk modeXiao Ni
commit 9e59d609763f ("md: call del_gendisk in control path") changes the async way to sync way of calling del_gendisk. But it breaks mdadm --assemble command. The assemble command runs like this: 1. create the array 2. stop the array 3. access the sysfs files after stopping The sync way calls del_gendisk in step 2, so all sysfs files are removed. Now to avoid breaking mdadm assemble command, this patch adds the parameter legacy_async_del_gendisk that can be used to choose which way. The default is async way. In future, we plan to change default to sync way in kernel 7.0. Then users need to upgrade to mdadm 4.5+ which removes step 2. Fixes: 9e59d609763f ("md: call del_gendisk in control path") Reported-by: Mikulas Patocka <mpatocka@redhat.com> Closes: https://lore.kernel.org/linux-raid/CAMw=ZnQ=ET2St-+hnhsuq34rRPnebqcXqP1QqaHW5Bh4aaaZ4g@mail.gmail.com/T/#t Suggested-and-reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Xiao Ni <xni@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://lore.kernel.org/linux-raid/20250813032929.54978-1-xni@redhat.com Signed-off-by: Yu Kuai <yukuai3@huawei.com>
2025-08-13block: restore default wbt enablementJulian Sun
The commit 245618f8e45f ("block: protect wbt_lat_usec using q->elevator_lock") protected wbt_enable_default() with q->elevator_lock; however, it also placed wbt_enable_default() before blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);, resulting in wbt failing to be enabled. Moreover, the protection of wbt_enable_default() by q->elevator_lock was removed in commit 78c271344b6f ("block: move wbt_enable_default() out of queue freezing from sched ->exit()"), so we can directly fix this issue by placing wbt_enable_default() after blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);. Additionally, this issue also causes the inability to read the wbt_lat_usec file, and the scenario is as follows: root@q:/sys/block/sda/queue# cat wbt_lat_usec cat: wbt_lat_usec: Invalid argument root@q:/data00/sjc/linux# ls /sys/kernel/debug/block/sda/rqos cannot access '/sys/kernel/debug/block/sda/rqos': No such file or directory root@q:/data00/sjc/linux# find /sys -name wbt /sys/kernel/debug/tracing/events/wbt After testing with this patch, wbt can be enabled normally. Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Cc: stable@vger.kernel.org Fixes: 245618f8e45f ("block: protect wbt_lat_usec using q->elevator_lock") Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250812154257.57540-1-sunjunchao@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-13Docs: admin-guide: Correct spelling mistakeErick Karanja
Fix spelling mistake directoy to directory Reported-by: codespell Signed-off-by: Erick Karanja <karanja99erick@gmail.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250813071837.668613-1-karanja99erick@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-13RDMA/hns: Fix dip entries leak on devices newer than hip09Junxian Huang
DIP algorithm is also supported on devices newer than hip09, so free dip entries too. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250812122602.3524602-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>