summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-06-27xfs: avoid dquot buffer pin deadlockDave Chinner
On shutdown when quotas are enabled, the shutdown can deadlock trying to unpin the dquot buffer buf_log_item like so: [ 3319.483590] task:kworker/20:0H state:D stack:14360 pid:1962230 tgid:1962230 ppid:2 task_flags:0x4208060 flags:0x00004000 [ 3319.493966] Workqueue: xfs-log/dm-6 xlog_ioend_work [ 3319.498458] Call Trace: [ 3319.500800] <TASK> [ 3319.502809] __schedule+0x699/0xb70 [ 3319.512672] schedule+0x64/0xd0 [ 3319.515573] schedule_timeout+0x30/0xf0 [ 3319.528125] __down_common+0xc3/0x200 [ 3319.531488] __down+0x1d/0x30 [ 3319.534186] down+0x48/0x50 [ 3319.540501] xfs_buf_lock+0x3d/0xe0 [ 3319.543609] xfs_buf_item_unpin+0x85/0x1b0 [ 3319.547248] xlog_cil_committed+0x289/0x570 [ 3319.571411] xlog_cil_process_committed+0x6d/0x90 [ 3319.575590] xlog_state_shutdown_callbacks+0x52/0x110 [ 3319.580017] xlog_force_shutdown+0x169/0x1a0 [ 3319.583780] xlog_ioend_work+0x7c/0xb0 [ 3319.587049] process_scheduled_works+0x1d6/0x400 [ 3319.591127] worker_thread+0x202/0x2e0 [ 3319.594452] kthread+0x20c/0x240 The CIL push has seen the deadlock, so it has aborted the push and is running CIL checkpoint completion to abort all the items in the checkpoint. This calls ->iop_unpin(remove = true) to clean up the log items in the checkpoint. When a buffer log item is unpined like this, it needs to lock the buffer to run io completion to correctly fail the buffer and run all the required completions to fail attached log items as well. In this case, the attempt to lock the buffer on unpin is hanging because the buffer is already locked. I suspected a leaked XFS_BLI_HOLD state because of XFS_BLI_STALE handling changes I was testing, so I went looking for pin events on HOLD buffers and unpin events on locked buffer. That isolated this one buffer with these two events: xfs_buf_item_pin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 2 pincount 0 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags HOLD|DIRTY|LOGGED liflags DIRTY .... xfs_buf_item_unpin: dev 251:6 daddr 0xa910 bbcount 0x2 hold 4 pincount 1 lock 0 flags DONE|KMEM recur 0 refcount 1 bliflags DIRTY liflags ABORTED Firstly, bbcount = 0x2, which means it is not a single sector structure. That rules out every xfs_trans_bhold() case except one: dquot buffers. Then hung task dumping gave this trace: [ 3197.312078] task:fsync-tester state:D stack:12080 pid:2051125 tgid:2051125 ppid:1643233 task_flags:0x400000 flags:0x00004002 [ 3197.323007] Call Trace: [ 3197.325581] <TASK> [ 3197.327727] __schedule+0x699/0xb70 [ 3197.334582] schedule+0x64/0xd0 [ 3197.337672] schedule_timeout+0x30/0xf0 [ 3197.350139] wait_for_completion+0xbd/0x180 [ 3197.354235] __flush_workqueue+0xef/0x4e0 [ 3197.362229] xlog_cil_force_seq+0xa0/0x300 [ 3197.374447] xfs_log_force+0x77/0x230 [ 3197.378015] xfs_qm_dqunpin_wait+0x49/0xf0 [ 3197.382010] xfs_qm_dqflush+0x55/0x460 [ 3197.385663] xfs_qm_dquot_isolate+0x29e/0x4d0 [ 3197.389977] __list_lru_walk_one+0x141/0x220 [ 3197.398867] list_lru_walk_one+0x10/0x20 [ 3197.402713] xfs_qm_shrink_scan+0x6a/0x100 [ 3197.406699] do_shrink_slab+0x18a/0x350 [ 3197.410512] shrink_slab+0xf7/0x430 [ 3197.413967] drop_slab+0x97/0xf0 [ 3197.417121] drop_caches_sysctl_handler+0x59/0xc0 [ 3197.421654] proc_sys_call_handler+0x18b/0x280 [ 3197.426050] proc_sys_write+0x13/0x20 [ 3197.429750] vfs_write+0x2b8/0x3e0 [ 3197.438532] ksys_write+0x7e/0xf0 [ 3197.441742] __x64_sys_write+0x1b/0x30 [ 3197.445363] x64_sys_call+0x2c72/0x2f60 [ 3197.449044] do_syscall_64+0x6c/0x140 [ 3197.456341] entry_SYSCALL_64_after_hwframe+0x76/0x7e Yup, another test run by check-parallel is running drop_caches concurrently and the dquot shrinker for the hung filesystem is running. That's trying to flush a dirty dquot from reclaim context, and it waiting on a log force to complete. xfs_qm_dqflush is called with the dquot buffer held locked, and so we've called xfs_log_force() with that buffer locked. Now the log force is waiting for a workqueue flush to complete, and that workqueue flush is waiting of CIL checkpoint processing to finish. The CIL checkpoint processing is aborting all the log items it has, and that requires locking aborted buffers to cancel them. Now, normally this isn't a problem if we are issuing a log force to unpin an object, because the ->iop_unpin() method wakes pin waiters first. That results in the pin waiter finishing off whatever it was doing, dropping the lock and then xfs_buf_item_unpin() can lock the buffer and fail it. However, xfs_qm_dqflush() is waiting on the -dquot- unpin event, not the dquot buffer unpin event, and so it never gets woken and so does not drop the buffer lock. Inodes do not have this problem, as they can only be written from one spot (->iop_push) whilst dquots can be written from multiple places (memory reclaim, ->iop_push, xfs_dq_dqpurge, and quotacheck). The reason that the dquot buffer has an attached buffer log item is that it has been recently allocated. Initialisation of the dquot buffer logs the buffer directly, thereby pinning it in memory. We then modify the dquot in a separate operation, and have memory reclaim racing with a shutdown and we trigger this deadlock. check-parallel reproduces this reliably on 1kB FSB filesystems with quota enabled because it does all of these things concurrently without having to explicitly write tests to exercise these corner case conditions. xfs_qm_dquot_logitem_push() doesn't have this deadlock because it checks if the dquot is pinned before locking the dquot buffer and skipping it if it is pinned. This means the xfs_qm_dqunpin_wait() log force in xfs_qm_dqflush() never triggers and we unlock the buffer safely allowing a concurrent shutdown to fail the buffer appropriately. xfs_qm_dqpurge() could have this problem as it is called from quotacheck and we might have allocated dquot buffers when recording the quota updates. This can be fixed by calling xfs_qm_dqunpin_wait() before we lock the dquot buffer. Because we hold the dquot locked, nothing will be able to add to the pin count between the unpin_wait and the dqflush callout, so this now makes xfs_qm_dqpurge() safe against this race. xfs_qm_dquot_isolate() can also be fixed this same way but, quite frankly, we shouldn't be doing IO in memory reclaim context. If the dquot is pinned or dirty, simply rotate it and let memory reclaim come back to it later, same as we do for inodes. This then gets rid of the nasty issue in xfs_qm_flush_one() where quotacheck writeback races with memory reclaim flushing the dquots. We can lift xfs_qm_dqunpin_wait() up into this code, then get rid of the "can't get the dqflush lock" buffer write to cycle the dqlfush lock and enable it to be flushed again. checking if the dquot is pinned and returning -EAGAIN so that the dquot walk will revisit the dquot again later. Finally, with xfs_qm_dqunpin_wait() lifted into all the callers, we can remove it from the xfs_qm_dqflush() code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27xfs: catch stale AGF/AGF metadataDave Chinner
There is a race condition that can trigger in dmflakey fstests that can result in asserts in xfs_ialloc_read_agi() and xfs_alloc_read_agf() firing. The asserts look like this: XFS: Assertion failed: pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks), file: fs/xfs/libxfs/xfs_alloc.c, line: 3440 ..... Call Trace: <TASK> xfs_alloc_read_agf+0x2ad/0x3a0 xfs_alloc_fix_freelist+0x280/0x720 xfs_alloc_vextent_prepare_ag+0x42/0x120 xfs_alloc_vextent_iterate_ags+0x67/0x260 xfs_alloc_vextent_start_ag+0xe4/0x1c0 xfs_bmapi_allocate+0x6fe/0xc90 xfs_bmapi_convert_delalloc+0x338/0x560 xfs_map_blocks+0x354/0x580 iomap_writepages+0x52b/0xa70 xfs_vm_writepages+0xd7/0x100 do_writepages+0xe1/0x2c0 __writeback_single_inode+0x44/0x340 writeback_sb_inodes+0x2d0/0x570 __writeback_inodes_wb+0x9c/0xf0 wb_writeback+0x139/0x2d0 wb_workfn+0x23e/0x4c0 process_scheduled_works+0x1d4/0x400 worker_thread+0x234/0x2e0 kthread+0x147/0x170 ret_from_fork+0x3e/0x50 ret_from_fork_asm+0x1a/0x30 I've seen the AGI variant from scrub running on the filesysetm after unmount failed due to systemd interference: XFS: Assertion failed: pag->pagi_freecount == be32_to_cpu(agi->agi_freecount) || xfs_is_shutdown(pag->pag_mount), file: fs/xfs/libxfs/xfs_ialloc.c, line: 2804 ..... Call Trace: <TASK> xfs_ialloc_read_agi+0xee/0x150 xchk_perag_drain_and_lock+0x7d/0x240 xchk_ag_init+0x34/0x90 xchk_inode_xref+0x7b/0x220 xchk_inode+0x14d/0x180 xfs_scrub_metadata+0x2e2/0x510 xfs_ioc_scrub_metadata+0x62/0xb0 xfs_file_ioctl+0x446/0xbf0 __se_sys_ioctl+0x6f/0xc0 __x64_sys_ioctl+0x1d/0x30 x64_sys_call+0x1879/0x2ee0 do_syscall_64+0x68/0x130 ? exc_page_fault+0x62/0xc0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Essentially, it is the same problem. When _flakey_drop_and_remount() loads the drop-writes table, it makes all writes silently fail. Writes are reported to the fs as completed successfully, but they are not issued to the backing store. The filesystem sees the successful write completion and marks the metadata buffer clean and removes it from the AIL. If this happens at the same time as memory pressure is occuring, the now-clean AGF and/or AGI buffers can be reclaimed from memory. Shortly afterwards, but before _flakey_drop_and_remount() runs unmount, background writeback is kicked and it tries to allocate blocks for the dirty pages in memory. This then tries to access the AGF buffer we just turfed out of memory. It's not found, so it gets read in from disk. This is all fine, except for the fact that the last writeback of the AGF did not actually reach disk. The AGF on disk is stale compared to the in-memory state held by the perag, and so they don't match and the assert fires. Then other operations on that inode hang because the task was killed whilst holding inode locks. e.g: Workqueue: xfs-conv/dm-12 xfs_end_io Call Trace: <TASK> __schedule+0x650/0xb10 schedule+0x6d/0xf0 schedule_preempt_disabled+0x15/0x30 rwsem_down_write_slowpath+0x31a/0x5f0 down_write+0x43/0x60 xfs_ilock+0x1a8/0x210 xfs_trans_alloc_inode+0x9c/0x240 xfs_iomap_write_unwritten+0xe3/0x300 xfs_end_ioend+0x90/0x130 xfs_end_io+0xce/0x100 process_scheduled_works+0x1d4/0x400 worker_thread+0x234/0x2e0 kthread+0x147/0x170 ret_from_fork+0x3e/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> and it's all down hill from there. Memory pressure is one way to trigger this, another is to run "echo 3 > /proc/sys/vm/drop_caches" randomly while tests are running. Regardless of how it is triggered, this effectively takes down the system once umount hangs because it's holding a sb->s_umount lock exclusive and now every sync(1) call gets stuck on it. Fix this by replacing the asserts with a corruption detection check and a shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27xfs: xfs_ifree_cluster vs xfs_iflush_shutdown_abort deadlockDave Chinner
Lock order of xfs_ifree_cluster() is cluster buffer -> try ILOCK -> IFLUSHING, except for the last inode in the cluster that is triggering the free. In that case, the lock order is ILOCK -> cluster buffer -> IFLUSHING. xfs_iflush_cluster() uses cluster buffer -> try ILOCK -> IFLUSHING, so this can safely run concurrently with xfs_ifree_cluster(). xfs_inode_item_precommit() uses ILOCK -> cluster buffer, but this cannot race with xfs_ifree_cluster() so being in a different order will not trigger a deadlock. xfs_reclaim_inode() during a filesystem shutdown uses ILOCK -> IFLUSHING -> cluster buffer via xfs_iflush_shutdown_abort(), and this deadlocks against xfs_ifree_cluster() like so: sysrq: Show Blocked State task:kworker/10:37 state:D stack:12560 pid:276182 tgid:276182 ppid:2 flags:0x00004000 Workqueue: xfs-inodegc/dm-3 xfs_inodegc_worker Call Trace: <TASK> __schedule+0x650/0xb10 schedule+0x6d/0xf0 schedule_timeout+0x8b/0x180 schedule_timeout_uninterruptible+0x1e/0x30 xfs_ifree+0x326/0x730 xfs_inactive_ifree+0xcb/0x230 xfs_inactive+0x2c8/0x380 xfs_inodegc_worker+0xaa/0x180 process_scheduled_works+0x1d4/0x400 worker_thread+0x234/0x2e0 kthread+0x147/0x170 ret_from_fork+0x3e/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> task:fsync-tester state:D stack:12160 pid:2255943 tgid:2255943 ppid:3988702 flags:0x00004006 Call Trace: <TASK> __schedule+0x650/0xb10 schedule+0x6d/0xf0 schedule_timeout+0x31/0x180 __down_common+0xbe/0x1f0 __down+0x1d/0x30 down+0x48/0x50 xfs_buf_lock+0x3d/0xe0 xfs_iflush_shutdown_abort+0x51/0x1e0 xfs_icwalk_ag+0x386/0x690 xfs_reclaim_inodes_nr+0x114/0x160 xfs_fs_free_cached_objects+0x19/0x20 super_cache_scan+0x17b/0x1a0 do_shrink_slab+0x180/0x350 shrink_slab+0xf8/0x430 drop_slab+0x97/0xf0 drop_caches_sysctl_handler+0x59/0xc0 proc_sys_call_handler+0x189/0x280 proc_sys_write+0x13/0x20 vfs_write+0x33d/0x3f0 ksys_write+0x7c/0xf0 __x64_sys_write+0x1b/0x30 x64_sys_call+0x271d/0x2ee0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x76/0x7e We can't change the lock order of xfs_ifree_cluster() - XFS_ISTALE and XFS_IFLUSHING are serialised through to journal IO completion by the cluster buffer lock being held. There's quite a few asserts in the code that check that XFS_ISTALE does not occur out of sync with buffer locking (e.g. in xfs_iflush_cluster). There's also a dependency on the inode log item being removed from the buffer before XFS_IFLUSHING is cleared, also with asserts that trigger on this. Further, we don't have a requirement for the inode to be locked when completing or aborting inode flushing because all the inode state updates are serialised by holding the cluster buffer lock across the IO to completion. We can't check for XFS_IRECLAIM in xfs_ifree_mark_inode_stale() and skip the inode, because there is no guarantee that the inode will be reclaimed. Hence it *must* be marked XFS_ISTALE regardless of whether reclaim is preparing to free that inode. Similarly, we can't check for IFLUSHING before locking the inode because that would result in dirty inodes not being marked with ISTALE in the event of racing with XFS_IRECLAIM. Hence we have to address this issue from the xfs_reclaim_inode() side. It is clear that we cannot hold the inode locked here when calling xfs_iflush_shutdown_abort() because it is the inode->buffer lock order that causes the deadlock against xfs_ifree_cluster(). Hence we need to drop the ILOCK before aborting the inode in the shutdown case. Once we've aborted the inode, we can grab the ILOCK again and then immediately reclaim it as it is now guaranteed to be clean. Note that dropping the ILOCK in xfs_reclaim_inode() means that it can now be locked by xfs_ifree_mark_inode_stale() and seen whilst in this state. This is safe because we have left the XFS_IFLUSHING flag on the inode and so xfs_ifree_mark_inode_stale() will simply set XFS_ISTALE and move to the next inode. An ASSERT check in this path needs to be tweaked to take into account this new shutdown interaction. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27i2c: scx200_acb: depends on HAS_IOPORTJohannes Berg
It already depends on X86_32, but that's also set for ARCH=um. Recent changes made UML no longer have IO port access since it's not needed, but this driver uses it. Build it only for HAS_IOPORT. This is pretty much the same as depending on X86, but on the off-chance that HAS_IOPORT will ever be optional on x86 HAS_IOPORT is the real prerequisite. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-06-27LoongArch: KVM: Disable updating of "num_cpu" and "feature"Bibo Mao
Property "num_cpu" and "feature" are read-only once eiointc is created, which are set with KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL attr group before device creation. Attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS is to update register and software state for migration and reset usage, property "num_cpu" and "feature" can not be update again if it is created already. Here discard write operation with property "num_cpu" and "feature" in attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL. Cc: stable@vger.kernel.org Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27LoongArch: KVM: Check validity of "num_cpu" from user spaceBibo Mao
The maximum supported cpu number is EIOINTC_ROUTE_MAX_VCPUS about irqchip EIOINTC, here add validation about cpu number to avoid array pointer overflow. Cc: stable@vger.kernel.org Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27LoongArch: KVM: Check interrupt route from physical CPUBibo Mao
With EIOINTC interrupt controller, physical CPU ID is set for irq route. However the function kvm_get_vcpu() is used to get destination vCPU when delivering irq. With API kvm_get_vcpu(), the logical CPU ID is used. With API kvm_get_vcpu_by_cpuid(), vCPU ID can be searched from physical CPU ID. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27LoongArch: KVM: Fix interrupt route update with EIOINTCBibo Mao
With function eiointc_update_sw_coremap(), there is forced assignment like val = *(u64 *)pvalue. Parameter pvalue may be pointer to char type or others, there is problem with forced assignment with u64 type. Here the detailed value is passed rather address pointer. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27LoongArch: KVM: Add address alignment check for IOCSR emulationBibo Mao
IOCSR instruction supports 1/2/4/8 bytes access, the address should be naturally aligned with its access size. Here address alignment check is added in the EIOINTC kernel emulation. Cc: stable@vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-06-27Merge tag 'i2c-host-fixes-6.16-rc4' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current i2c-host fixes for v6.16-rc4 - imx: fix SMBus protocol compliance during block read - omap: fix error handling path in probe - robotfuzz, tiny-usb: prevent zero-length reads - x86, designware, amdisp: fix build error when modules are disabled
2025-06-27tg3: spelling correctionsSimon Horman
Correct spelling as flagged by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-27net: mdio: Add MDIO bus controller for Airoha AN7583Christian Marangi
Airoha AN7583 SoC have 2 dedicated MDIO bus controller in the SCU register map. To driver register an MDIO controller based on the DT reg property and access the register by accessing the parent syscon. The MDIO bus logic is similar to the MT7530 internal MDIO bus but deviates of some setting and some HW bug. On Airoha AN7583 the MDIO clock is set to 25MHz by default and needs to be correctly setup to 2.5MHz to correctly work (by setting the divisor to 10x). There seems to be Hardware bug where AN7583_MII_RWDATA is not wiped in the context of unconnected PHY and the previous read value is returned. Example: (only one PHY on the BUS at 0x1f) - read at 0x1f report at 0x2 0x7500 - read at 0x0 report 0x7500 on every address To workaround this, we reset the Mdio BUS at every read to have consistent values on read operation. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-27dt-bindings: net: Document support for Airoha AN7583 MDIO ControllerChristian Marangi
Airoha AN7583 SoC have 3 different MDIO Controller. One comes from the intergated Switch based on MT7530. The other 2 live under the SCU register and expose 2 dedicated MDIO controller. Document the schema for the 2 dedicated MDIO controller. Each MDIO controller can be independently reset with the SoC reset line. Each MDIO controller have a dedicated clock configured to 2.5MHz by default to follow MDIO bus IEEE 802.3 standard. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-26Merge tag 'v6.16-p6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a regression where wp512 can no longer be used with hmac" * tag 'v6.16-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: wp512 - Use API partial block handling
2025-06-26Merge tag 'bcachefs-2025-06-26' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: - Lots of small check/repair fixes, primarily in subvol loop and directory structure loop (when involving snapshots). - Fix a few 6.16 regressions: rare UAF in the foreground allocator path when taking a transaction restart from the transaction bump allocator, and some small fallout from the change to log the error being corrected in the journal when repairing errors, also some fallout from the btree node read error logging improvements. (Alan, Bharadwaj) - New option: journal_rewind This lets the entire filesystem be reset to an earlier point in time. Note that this is only a disaster recovery tool, and right now there are major caveats to using it (discards should be disabled, in particular), but it successfully restored the filesystem of one of the users who was bit by the subvolume deletion bug and didn't have backups. I'll likely be making some changes to the discard path in the future to make this a reliable recovery tool. - Some new btree iterator tracepoints, for tracking down some livelock-ish behaviour we've been seeing in the main data write path. * tag 'bcachefs-2025-06-26' of git://evilpiepirate.org/bcachefs: (51 commits) bcachefs: Plumb correct ip to trans_relock_fail tracepoint bcachefs: Ensure we rewind to run recovery passes bcachefs: Ensure btree node scan runs before checking for scanned nodes bcachefs: btree_root_unreadable_and_scan_found_nothing should not be autofix bcachefs: fix bch2_journal_keys_peek_prev_min() underflow bcachefs: Use wait_on_allocator() when allocating journal bcachefs: Check for bad write buffer key when moving from journal bcachefs: Don't unlock the trans if ret doesn't match BCH_ERR_operation_blocked bcachefs: Fix range in bch2_lookup_indirect_extent() error path bcachefs: fix spurious error_throw bcachefs: Add missing bch2_err_class() to fileattr_set() bcachefs: Add missing key type checks to check_snapshot_exists() bcachefs: Don't log fsck err in the journal if doing repair elsewhere bcachefs: Fix *__bch2_trans_subbuf_alloc() error path bcachefs: Fix missing newlines before ero bcachefs: fix spurious error in read_btree_roots() bcachefs: fsck: Fix oops in key_visible_in_snapshot() bcachefs: fsck: fix unhandled restart in topology repair bcachefs: fsck: Fix check_directory_structure when no check_dirents bcachefs: Fix restart handling in btree_node_scrub_work() ...
2025-06-26Merge branch 'ptp-belated-spring-cleaning-of-the-chardev-driver'Jakub Kicinski
Thomas Gleixner says: ==================== ptp: Belated spring cleaning of the chardev driver When looking into supporting auxiliary clocks in the PTP ioctl, the inpenetrable ptp_ioctl() letter soup bothered me enough to clean it up. The code (~400 lines!) is really hard to follow due to a gazillion of local variables, which are only used in certain case scopes, and a mixture of gotos, breaks and direct error return paths. Clean it up by splitting out the IOCTL functionality into seperate functions, which contain only the required local variables and are trivial to follow. Complete the cleanup by converting the code to lock guards and get rid of all gotos. That reduces the code size by 48 lines and also the binary text size is 80 bytes smaller than the current maze. The series is split up into one patch per IOCTL command group for easy review. v1: https://lore.kernel.org/20250620130144.351492917@linutronix.de ==================== Link: https://patch.msgid.link/20250625114404.102196103@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Simplify ptp_read()Thomas Gleixner
The mixture of gotos and direct return codes is inconsistent and just makes the code harder to read. Let it consistently return error codes directly and tidy the code flow up accordingly. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20250625115133.486953538@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Convert chardev code to lock guardsThomas Gleixner
Convert the various spin_lock_irqsave() protected critical regions to scoped guards. Use spinlock_irq instead of spinlock_irqsave as all the functions are invoked in thread context with interrupts enabled. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20250625115133.425029269@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_MASK_EN_SINGLE ioctl codeThomas Gleixner
Finish the ptp_ioctl() cleanup by splitting out the PTP_MASK_EN_SINGLE ioctl code and removing the remaining local variables and return statements. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115133.364422719@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_MASK_CLEAR_ALL ioctl codeThomas Gleixner
Continue the ptp_ioctl() cleanup by splitting out the PTP_MASK_CLEAR_ALL ioctl code into a helper function. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115133.302755618@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_PIN_SETFUNC ioctl codeThomas Gleixner
Continue the ptp_ioctl() cleanup by splitting out the PTP_PIN_SETFUNC ioctl code into a helper function. Convert to lock guard while at it and remove the pointless memset of the pd::rsv because nothing uses it. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115133.241503804@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_PIN_GETFUNC ioctl codeThomas Gleixner
Continue the ptp_ioctl() cleanup by splitting out the PTP_PIN_GETFUNC ioctl code into a helper function. Convert to lock guard while at it and remove the pointless memset of the pd::rsv because nothing uses it. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115133.177265865@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_SYS_OFFSET ioctl codeThomas Gleixner
Continue the ptp_ioctl() cleanup by splitting out the PTP_SYS_OFFSET ioctl code into a helper function. Convert it to __free() to avoid gotos. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20250625115133.113841216@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_SYS_OFFSET_EXTENDED ioctl codeThomas Gleixner
Continue the ptp_ioctl() cleanup by splitting out the PTP_SYS_OFFSET_EXTENDED ioctl code into a helper function. Convert it to __free() to avoid gotos. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115133.050445505@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_SYS_OFFSET_PRECISE ioctl codeThomas Gleixner
Continue the ptp_ioctl() cleanup by splitting out the PTP_SYS_OFFSET_PRECISE ioctl code into a helper function. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115132.986897454@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_ENABLE_PPS ioctl codeThomas Gleixner
Continue the ptp_ioctl() cleanup by splitting out the PTP_ENABLE_PPS ioctl code into a helper function. Convert to a lock guard while at it. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115132.923803136@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_PEROUT_REQUEST ioctl codeThomas Gleixner
Continue the ptp_ioctl() cleanup by splitting out the PTP_PEROUT_REQUEST ioctl code into a helper function. Convert to a lock guard while at it. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115132.860150473@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_EXTTS_REQUEST ioctl codeThomas Gleixner
Continue the ptp_ioctl() cleanup by splitting out the PTP_EXTTS_REQUEST ioctl code into a helper function. Convert to a lock guard while at it. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115132.797588258@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ptp: Split out PTP_CLOCK_GETCAPS ioctl codeThomas Gleixner
ptp_ioctl() is an inpenetrable letter soup with a gazillion of case (scope) specific variables defined at the top of the function and pointless breaks and gotos. Start cleaning it up by splitting out the PTP_CLOCK_GETCAPS ioctl code into a helper function. Use a argument pointer with a single sparse compliant type cast instead of proliferating the type cast all over the place. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250625115132.733409073@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26selftests: forwarding: lib: Split setup_wait()Petr Machata
setup_wait() takes an optional argument and then is called from the top level of the test script. That confuses shellcheck, which thinks that maybe the intention is to pass $1 of the script to the function, which is never the case. To avoid having to annotate every single new test with a SC disable, split the function in two: one that takes a mandatory argument, and one that takes no argument at all. Convert the two existing users of that optional argument, both in Spectrum resource selftest, to use the new form. Clean up vxlan_bridge_1q_mc_ul.sh to not pass a now-unused argument. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/8e13123236fe3912ae29bc04a1528bdd8551da1f.1750847794.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26net: Remove unused function first_net_device_rcu()Yue Haibing
This is unused since commit f04565ddf52e ("dev: use name hash for dev_seq_ops") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250625102155.483570-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26ipv4: fib: Remove unnecessary encap_type checkYue Haibing
lwtunnel_build_state() has check validity of encap_type, so no need to do this before call it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250625022059.3958215-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2025-06-27 We've added 6 non-merge commits during the last 8 day(s) which contain a total of 6 files changed, 120 insertions(+), 20 deletions(-). The main changes are: 1) Fix RCU usage in task_cls_state() for BPF programs using helpers like bpf_get_cgroup_classid_curr() outside of networking, from Charalampos Mitrodimas. 2) Fix a sockmap race between map_update and a pending workqueue from an earlier map_delete freeing the old psock where both pointed to the same psock->sk, from Jiayuan Chen. 3) Fix a data corruption issue when using bpf_msg_pop_data() in kTLS which failed to recalculate the ciphertext length, also from Jiayuan Chen. 4) Remove xdp_redirect_map{,_err} trace events since they are unused and also hide XDP trace events under CONFIG_BPF_SYSCALL, from Steven Rostedt. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: xdp: tracing: Hide some xdp events under CONFIG_BPF_SYSCALL xdp: Remove unused events xdp_redirect_map and xdp_redirect_map_err net, bpf: Fix RCU usage in task_cls_state() for BPF programs selftests/bpf: Add test to cover ktls with bpf_msg_pop_data bpf, ktls: Fix data corruption when using bpf_msg_pop_data() in ktls bpf, sockmap: Fix psock incorrectly pointing to sk ==================== Link: https://patch.msgid.link/20250626230111.24772-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26net: airoha: Get rid of dma_sync_single_for_device() in ↵Lorenzo Bianconi
airoha_qdma_fill_rx_queue() Since the page_pool for airoha_eth driver is created with PP_FLAG_DMA_SYNC_DEV flag, we do not need to sync_for_device each page received from the pool since it is already done by the page_pool codebase. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250625-airoha-sync-for-device-v1-1-923741deaabf@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26net: mana: Fix build errors when CONFIG_NET_SHAPER is disabledErni Sri Satya Vennela
Fix build errors when CONFIG_NET_SHAPER is disabled, including: drivers/net/ethernet/microsoft/mana/mana_en.c:804:10: error: 'const struct net_device_ops' has no member named 'net_shaper_ops' 804 | .net_shaper_ops = &mana_shaper_ops, drivers/net/ethernet/microsoft/mana/mana_en.c:804:35: error: initialization of 'int (*)(struct net_device *, struct neigh_parms *)' from incompatible pointer type 'const struct net_shaper_ops *' [-Werror=incompatible-pointer-types] 804 | .net_shaper_ops = &mana_shaper_ops, Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Fixes: 75cabb46935b ("net: mana: Add support for net_shaper_ops") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506230625.bfUlqb8o-lkp@intel.com/ Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1750851355-8067-1-git-send-email-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26Merge tag 'hid-for-linus-2025062701' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - fix for stalls during suspend/resume cycles with hid-nintendo (Daniel J. Ogorchock) - memory leak and reference count fixes in hid-wacom and in-appletb-kdb (Qasim Ijaz) - race condition (leading to kernel crash) fix during device removal in hid-wacom (Thomas Zeitlhofer) - fix for missed interrupt in intel-thc-hid (Intel-thc-hid:) - support for a bunch of new device IDs * tag 'hid-for-linus-2025062701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: lenovo: Add support for ThinkPad X1 Tablet Thin Keyboard Gen2 HID: appletb-kbd: fix "appletb_backlight" backlight device reference counting HID: wacom: fix crash in wacom_aes_battery_handler() HID: intel-ish-hid: ipc: Add Wildcat Lake PCI device ID hid: intel-ish-hid: Use PCI_DEVICE_DATA() macro for ISH device table HID: lenovo: Restrict F7/9/11 mode to compact keyboards only HID: Add IGNORE quirk for SMARTLINKTECHNOLOGY HID: input: lower message severity of 'No inputs registered, leaving' to debug HID: quirks: Add quirk for 2 Chicony Electronics HP 5MP Cameras HID: Intel-thc-hid: Intel-quicki2c: Enhance QuickI2C reset flow HID: nintendo: avoid bluetooth suspend/resume stalls HID: wacom: fix kobject reference count leak HID: wacom: fix memory leak on sysfs attribute creation failure HID: wacom: fix memory leak on kobject creation failure
2025-06-26dt-bindings: net: Rename renesas,r9a09g057-gbeth.yamlGeert Uytterhoeven
The DT bindings file "renesas,r9a09g057-gbeth.yaml" applies to a whole family of SoCs, and uses "renesas,rzv2h-gbeth" as a fallback compatible value. Hence rename it to the more generic "renesas,rzv2h-gbeth.yaml". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/721f6e0e09777e0842ecaca4578bc50c953d2428.1750838954.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-27Merge tag 'drm-xe-fixes-2025-06-26' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: Driver Changes: - Missing error check (Haoxiang Li) - Fix xe_hwmon_power_max_write (Karthik) - Move flushes (Maarten and Matthew Auld) - Explicitly exit CT safe mode on unwind (Michal) - Process deferred GGTT node removals on device unwind (Michal) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/aF1T6EzzC3xj4K4H@fedora
2025-06-27Merge tag 'drm-intel-fixes-2025-06-26' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix for SNPS PHY HDMI for 1080p@120Hz - Correct DP AUX DPCD probe address - Followup build fix for GCOV and AutoFDO enabled config Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://lore.kernel.org/r/aFzsHR9WLYsxg8jy@jlahtine-mobl
2025-06-27Merge tag 'amd-drm-fixes-6.16-2025-25-25' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.16-2025-25-25: amdgpu: - Cleaner shader support for additional GFX9 GPUs - MES firmware compatibility fixes - Discovery error reporting fixes - SDMA6/7 userq fixes - Backlight fix - EDID sanity check Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250625151734.11537-1-alexander.deucher@amd.com
2025-06-26Merge tag 'devicetree-fixes-for-6.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Convert altr,uart-1.0 and altr,juart-1.0 to DT schema. These were applied for nios2, but never sent upstream. - Fix extra '/' in fsl,ls1028a-reset '$id' path - Fix warnings in ti,sn65dsi83 schema due to unnecessary $ref. * tag 'devicetree-fixes-for-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: serial: Convert altr,uart-1.0 to DT schema dt-bindings: serial: Convert altr,juart-1.0 to DT schema dt-bindings: soc: fsl,ls1028a-reset: Drop extra "/" in $id dt-bindings: drm/bridge: ti-sn65dsi83: drop $ref to fix lvds-vod* warnings
2025-06-26io_uring/kbuf: flag partial buffer mappingsJens Axboe
A previous commit aborted mapping more for a non-incremental ring for bundle peeking, but depending on where in the process this peeking happened, it would not necessarily prevent a retry by the user. That can create gaps in the received/read data. Add struct buf_sel_arg->partial_map, which can pass this information back. The networking side can then map that to internal state and use it to gate retry as well. Since this necessitates a new flag, change io_sr_msg->retry to a retry_flags member, and store both the retry and partial map condition in there. Cc: stable@vger.kernel.org Fixes: 26ec15e4b0c1 ("io_uring/kbuf: don't truncate end buffer for multiple buffer peeks") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-26NFSv4/flexfiles: Fix handling of NFS level errors in I/OTrond Myklebust
Allow the flexfiles error handling to recognise NFS level errors (as opposed to RPC level errors) and handle them separately. The main motivator is the NFSERR_PERM errors that get returned if the NFS client connects to the data server through a port number that is lower than 1024. In that case, the client should disconnect and retry a READ on a different data server, or it should retry a WRITE after reconnecting. Reviewed-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-06-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc4). Conflicts: Documentation/netlink/specs/mptcp_pm.yaml 9e6dd4c256d0 ("netlink: specs: mptcp: replace underscores with dashes in names") ec362192aa9e ("netlink: specs: fix up indentation errors") https://lore.kernel.org/20250626122205.389c2cd4@canb.auug.org.au Adjacent changes: Documentation/netlink/specs/fou.yaml 791a9ed0a40d ("netlink: specs: fou: replace underscores with dashes in names") 880d43ca9aa4 ("netlink: specs: clean up spaces in brackets") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26Merge tag 'net-6.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and wireless. Current release - regressions: - bridge: fix use-after-free during router port configuration Current release - new code bugs: - eth: wangxun: fix the creation of page_pool Previous releases - regressions: - netpoll: initialize UDP checksum field before checksumming - wifi: mac80211: finish link init before RCU publish - bluetooth: fix use-after-free in vhci_flush() - eth: - ionic: fix DMA mapping test - bnxt: properly flush XDP redirect lists Previous releases - always broken: - netlink: specs: enforce strict naming of properties - unix: don't leave consecutive consumed OOB skbs. - vsock: fix linux/vm_sockets.h userspace compilation errors - selftests: fix TCP packet checksum" * tag 'net-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) net: libwx: fix the creation of page_pool net: selftests: fix TCP packet checksum atm: Release atm_dev_mutex after removing procfs in atm_dev_deregister(). netlink: specs: enforce strict naming of properties netlink: specs: tc: replace underscores with dashes in names netlink: specs: rt-link: replace underscores with dashes in names netlink: specs: mptcp: replace underscores with dashes in names netlink: specs: ovs_flow: replace underscores with dashes in names netlink: specs: devlink: replace underscores with dashes in names netlink: specs: dpll: replace underscores with dashes in names netlink: specs: ethtool: replace underscores with dashes in names netlink: specs: fou: replace underscores with dashes in names netlink: specs: nfsd: replace underscores with dashes in names net: enetc: Correct endianness handling in _enetc_rd_reg64 atm: idt77252: Add missing `dma_map_error()` bnxt: properly flush XDP redirect lists vsock/uapi: fix linux/vm_sockets.h userspace compilation errors wifi: mac80211: finish link init before RCU publish wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version selftest: af_unix: Add tests for -ECONNRESET. ...
2025-06-26cifs: Fix reading into an ITER_FOLIOQ from the smbdirect codeDavid Howells
When performing a file read from RDMA, smbd_recv() prints an "Invalid msg type 4" error and fails the I/O. This is due to the switch-statement there not handling the ITER_FOLIOQ handed down from netfslib. Fix this by collapsing smbd_recv_buf() and smbd_recv_page() into smbd_recv() and just using copy_to_iter() instead of memcpy(). This future-proofs the function too, in case more ITER_* types are added. Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Talpey <tom@talpey.com> cc: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-26cifs: Fix the smbd_response slab to allow usercopyDavid Howells
The handling of received data in the smbdirect client code involves using copy_to_iter() to copy data from the smbd_reponse struct's packet trailer to a folioq buffer provided by netfslib that encapsulates a chunk of pagecache. If, however, CONFIG_HARDENED_USERCOPY=y, this will result in the checks then performed in copy_to_iter() oopsing with something like the following: CIFS: Attempting to mount //172.31.9.1/test CIFS: VFS: RDMA transport established usercopy: Kernel memory exposure attempt detected from SLUB object 'smbd_response_0000000091e24ea1' (offset 81, size 63)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102! ... RIP: 0010:usercopy_abort+0x6c/0x80 ... Call Trace: <TASK> __check_heap_object+0xe3/0x120 __check_object_size+0x4dc/0x6d0 smbd_recv+0x77f/0xfe0 [cifs] cifs_readv_from_socket+0x276/0x8f0 [cifs] cifs_read_from_socket+0xcd/0x120 [cifs] cifs_demultiplex_thread+0x7e9/0x2d50 [cifs] kthread+0x396/0x830 ret_from_fork+0x2b8/0x3b0 ret_from_fork_asm+0x1a/0x30 The problem is that the smbd_response slab's packet field isn't marked as being permitted for usercopy. Fix this by passing parameters to kmem_slab_create() to indicate that copy_to_iter() is permitted from the packet region of the smbd_response slab objects, less the header space. Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Reported-by: Stefan Metzmacher <metze@samba.org> Link: https://lore.kernel.org/r/acb7f612-df26-4e2a-a35d-7cd040f513e1@samba.org/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Stefan Metzmacher <metze@samba.org> Tested-by: Stefan Metzmacher <metze@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-26smb: client: fix potential deadlock when reconnecting channelsPaulo Alcantara
Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order and prevent the following deadlock from happening ====================================================== WARNING: possible circular locking dependency detected 6.16.0-rc3-build2+ #1301 Tainted: G S W ------------------------------------------------------ cifsd/6055 is trying to acquire lock: ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200 but task is already holding lock: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&ret_buf->chan_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_setup_session+0x81/0x4b0 cifs_get_smb_ses+0x771/0x900 cifs_mount_get_session+0x7e/0x170 cifs_mount+0x92/0x2d0 cifs_smb3_do_mount+0x161/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&ret_buf->ses_lock){+.+.}-{3:3}: validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_match_super+0x101/0x320 sget+0xab/0x270 cifs_smb3_do_mount+0x1e0/0x460 smb3_get_tree+0x55/0x90 vfs_get_tree+0x46/0x180 do_new_mount+0x1b0/0x2e0 path_mount+0x6ee/0x740 do_mount+0x98/0xe0 __do_sys_mount+0x148/0x180 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}: check_noncircular+0x95/0xc0 check_prev_add+0x115/0x2f0 validate_chain+0x1cf/0x270 __lock_acquire+0x60e/0x780 lock_acquire.part.0+0xb4/0x1f0 _raw_spin_lock+0x2f/0x40 cifs_signal_cifsd_for_reconnect+0x134/0x200 __cifs_reconnect+0x8f/0x500 cifs_handle_standard+0x112/0x280 cifs_demultiplex_thread+0x64d/0xbc0 kthread+0x2f7/0x310 ret_from_fork+0x2a/0x230 ret_from_fork_asm+0x1a/0x30 other info that might help us debug this: Chain exists of: &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ret_buf->chan_lock); lock(&ret_buf->ses_lock); lock(&ret_buf->chan_lock); lock(&tcp_ses->srv_lock); *** DEADLOCK *** 3 locks held by cifsd/6055: #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200 #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200 #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 Cc: linux-cifs@vger.kernel.org Reported-by: David Howells <dhowells@redhat.com> Fixes: d7d7a66aacd6 ("cifs: avoid use of global locks for high contention data") Reviewed-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-26ice: default to TIME_REF instead of TXCO on E825-CJacob Keller
The driver currently defaults to the internal oscillator as the clock source for E825-C hardware. While this clock source is labeled TCXO, indicating a temperature compensated oscillator, this is only true for some board designs. Many board designs have a less capable oscillator. The E825-C hardware may also have its clock source set to the TIME_REF pin. This pin is connected to the DPLL and is often a more stable clock source. The choice of the internal oscillator is not suitable for all systems, especially those which want to enable SyncE support. There is currently no interface available for users to configure the clock source. Other variants of the E82x board have the clock source configured in the NVM, but E825-C lacks this capability, so different board designs cannot select a different default clock via firmware. In most setups, the TIME_REF is a suitable default clock source. Additionally, we now fall back to the internal oscillator automatically if the TIME_REF clock source cannot be locked. Change the default clock source for E825-C to TIME_REF. Note that the driver logs a dev_dbg message upon configuring the TSPLL which includes the clock source and frequency. This can be enabled to confirm which clock source is in use. Longterm a proper interface to dynamically introspect and change the clock source will be designed (perhaps some extension of the DPLL subsystem?) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-06-26ice: move TSPLL init calls to ice_ptp.cKarol Kolacinski
Initialize TSPLL after initializing PHC in ice_ptp.c instead of calling for each product in PHC init in ice_ptp_hw.c. Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>