summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-08-06smb: smbdirect: introduce smbdirect_socket.recv_io.expectedStefan Metzmacher
The expected message type can be global as they never change during the after negotiation process. This will replace smbd_response->type and smb_direct_recvmsg->type in future. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: client: remove unused smbd_connection->fragment_reassembly_remainingStefan Metzmacher
Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: client: let recv_done() avoid touching data_transfer after cleanup/moveStefan Metzmacher
Calling enqueue_reassembly() and wake_up_interruptible(&info->wait_reassembly_queue) or put_receive_buffer() means the response/data_transfer pointer might get re-used by another thread, which means these should be the last operations before calling return. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: client: let recv_done() cleanup before notifying the callers.Stefan Metzmacher
We should call put_receive_buffer() before waking up the callers. For the internal error case of response->type being unexpected, we now also call smbd_disconnect_rdma_connection() instead of not waking up the callers at all. Note that the SMBD_TRANSFER_DATA case still has problems, which will be addressed in the next commit in order to make it easier to review this one. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: client: make sure we call ib_dma_unmap_single() only if we called ↵Stefan Metzmacher
ib_dma_map_single already In case of failures either ib_dma_map_single() might not be called yet or ib_dma_unmap_single() was already called. We should make sure put_receive_buffer() only calls ib_dma_unmap_single() if needed. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: client: remove separate empty_packet_queueStefan Metzmacher
There's no need to maintain two lists, we can just have a single list of receive buffers, which are free to use. It just added unneeded complexity and resulted in ib_dma_unmap_single() not being called from recv_done() for empty keepalive packets. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: client: let send_done() cleanup before calling ↵Stefan Metzmacher
smbd_disconnect_rdma_connection() We should call ib_dma_unmap_single() and mempool_free() before calling smbd_disconnect_rdma_connection(). And smbd_disconnect_rdma_connection() needs to be the last function to call as all other state might already be gone after it returns. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06cifs: Fix null-ptr-deref by static initializing global lockYunseong Kim
A kernel panic can be triggered by reading /proc/fs/cifs/debug_dirs. The crash is a null-ptr-deref inside spin_lock(), caused by the use of the uninitialized global spinlock cifs_tcp_ses_lock. init_cifs() └── cifs_proc_init() └── // User can access /proc/fs/cifs/debug_dirs here └── cifs_debug_dirs_proc_show() └── spin_lock(&cifs_tcp_ses_lock); // Uninitialized! KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [dfff800000000000] address between user and kernel address ranges Internal error: Oops: 0000000096000005 [#1] SMP Modules linked in: CPU: 3 UID: 0 PID: 16435 Comm: stress-ng-procf Not tainted 6.16.0-10385-g79f14b5d84c6 #37 PREEMPT Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8ubuntu1 06/11/2025 pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : do_raw_spin_lock+0x84/0x2cc lr : _raw_spin_lock+0x24/0x34 sp : ffff8000966477e0 x29: ffff800096647860 x28: ffff800096647b88 x27: ffff0001c0c22070 x26: ffff0003eb2b60c8 x25: ffff0001c0c22018 x24: dfff800000000000 x23: ffff0000f624e000 x22: ffff0003eb2b6020 x21: ffff0000f624e768 x20: 0000000000000004 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: ffff8000804b9600 x15: ffff700012cc8f04 x14: 1ffff00012cc8f04 x13: 0000000000000004 x12: ffffffffffffffff x11: 1ffff00012cc8f00 x10: ffff80008d9af0d2 x9 : f3f3f304f1f1f1f1 x8 : 0000000000000000 x7 : 7365733c203e6469 x6 : 20656572743c2023 x5 : ffff0000e0ce0044 x4 : ffff80008a4deb6e x3 : ffff8000804b9718 x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: do_raw_spin_lock+0x84/0x2cc (P) _raw_spin_lock+0x24/0x34 cifs_debug_dirs_proc_show+0x1ac/0x4c0 seq_read_iter+0x3b0/0xc28 proc_reg_read_iter+0x178/0x2a8 vfs_read+0x5f8/0x88c ksys_read+0x120/0x210 __arm64_sys_read+0x7c/0x90 invoke_syscall+0x98/0x2b8 el0_svc_common+0x130/0x23c do_el0_svc+0x48/0x58 el0_svc+0x40/0x140 el0t_64_sync_handler+0x84/0x12c el0t_64_sync+0x1ac/0x1b0 Code: aa0003f3 f9000feb f2fe7e69 f8386969 (38f86908) ---[ end trace 0000000000000000 ]--- The root cause is an initialization order problem. The lock is declared as a global variable and intended to be initialized during module startup. However, the procfs entry that uses this lock can be accessed by userspace before the spin_lock_init() call has run. This creates a race window where reading the proc file will attempt to use the lock before it is initialized, leading to the crash. For a global lock with a static lifetime, the correct and robust approach is to use compile-time initialization. Fixes: 844e5c0eb176 ("smb3 client: add way to show directory leases for improved debugging") Signed-off-by: Yunseong Kim <ysk@kzalloc.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: server: let recv_done() avoid touching data_transfer after cleanup/moveStefan Metzmacher
Calling enqueue_reassembly() and wake_up_interruptible(&t->wait_reassembly_queue) or put_receive_buffer() means the recvmsg/data_transfer pointer might get re-used by another thread, which means these should be the last operations before calling return. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: server: let recv_done() consistently call ↵Stefan Metzmacher
put_recvmsg/smb_direct_disconnect_rdma_connection We should call put_recvmsg() before smb_direct_disconnect_rdma_connection() in order to call it before waking up the callers. In all error cases we should call smb_direct_disconnect_rdma_connection() in order to avoid stale connections. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: server: make sure we call ib_dma_unmap_single() only if we called ↵Stefan Metzmacher
ib_dma_map_single already In case of failures either ib_dma_map_single() might not be called yet or ib_dma_unmap_single() was already called. We should make sure put_recvmsg() only calls ib_dma_unmap_single() if needed. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06smb: server: remove separate empty_recvmsg_queueStefan Metzmacher
There's no need to maintain two lists, we can just have a single list of receive buffers, which are free to use. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06cifs: Move the SMB1 transport code out of transport.cSteve French
Shrink the size of cifs.ko when SMB1 is not enabled in the config by moving the SMB1 transport code to different file. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-06Merge tag 'for-6.17-fix-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A single btrfs commit. It fixes a problem that people started to hit since 6.15.3 during log replay (e.g. after a crash). The bug is old but got more likely to happen since commit 5e85262e542d ("btrfs: fix fsync of files with no hard links not persisting deletion") got backported to stable (6.15 only)" * tag 'for-6.17-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix log tree replay failure due to file with 0 links and extents
2025-08-06btrfs: fix log tree replay failure due to file with 0 links and extentsFilipe Manana
If we log a new inode (not persisted in a past transaction) that has 0 links and extents, then log another inode with an higher inode number, we end up with failing to replay the log tree with -EINVAL. The steps for this are: 1) create new file A 2) write some data to file A 3) open an fd on file A 4) unlink file A 5) fsync file A using the previously open fd 6) create file B (has higher inode number than file A) 7) fsync file B 8) power fail before current transaction commits Now when attempting to mount the fs, the log replay will fail with -ENOENT at replay_one_extent() when attempting to replay the first extent of file A. The failure comes when trying to open the inode for file A in the subvolume tree, since it doesn't exist. Before commit 5f61b961599a ("btrfs: fix inode lookup error handling during log replay"), the returned error was -EIO instead of -ENOENT, since we converted any errors when attempting to read an inode during log replay to -EIO. The reason for this is that the log replay procedure fails to ignore the current inode when we are at the stage LOG_WALK_REPLAY_ALL, our current inode has 0 links and last inode we processed in the previous stage has a non 0 link count. In other words, the issue is that at replay_one_extent() we only update wc->ignore_cur_inode if the current replay stage is LOG_WALK_REPLAY_INODES. Fix this by updating wc->ignore_cur_inode whenever we find an inode item regardless of the current replay stage. This is a simple solution and easy to backport, but later we can do other alternatives like avoid logging extents or inode items other than the inode item for inodes with a link count of 0. The problem with the wc->ignore_cur_inode logic has been around since commit f2d72f42d5fa ("Btrfs: fix warning when replaying log after fsync of a tmpfile") but it only became frequent to hit since the more recent commit 5e85262e542d ("btrfs: fix fsync of files with no hard links not persisting deletion"), because we stopped skipping inodes with a link count of 0 when logging, while before the problem would only be triggered if trying to replay a log tree created with an older kernel which has a logged inode with 0 links. A test case for fstests will be submitted soon. Reported-by: Peter Jung <ptr1337@cachyos.org> Link: https://lore.kernel.org/linux-btrfs/fce139db-4458-4788-bb97-c29acf6cb1df@cachyos.org/ Reported-by: burneddi <burneddi@protonmail.com> Link: https://lore.kernel.org/linux-btrfs/lh4W-Lwc0Mbk-QvBhhQyZxf6VbM3E8VtIvU3fPIQgweP_Q1n7wtlUZQc33sYlCKYd-o6rryJQfhHaNAOWWRKxpAXhM8NZPojzsJPyHMf2qY=@protonmail.com/#t Reported-by: Russell Haley <yumpusamongus@gmail.com> Link: https://lore.kernel.org/linux-btrfs/598ecc75-eb80-41b3-83c2-f2317fbb9864@gmail.com/ Fixes: f2d72f42d5fa ("Btrfs: fix warning when replaying log after fsync of a tmpfile") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-08-05NFS/localio: nfs_uuid_put() fix the wake up after unlinking the fileTrond Myklebust
Use store_release_wake_up() instead of wake_up_var_locked(), because the waiter cannot retake the nfs_uuid->lock. Acked-by: Mike Snitzer <snitzer@kernel.org> Tested-by: Mike Snitzer <snitzer@kernel.org> Suggested-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/all/175262948827.2234665.1891349021754495573@noble.neil.brown.name/ Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-05NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()Trond Myklebust
In order for the wait in nfs_uuid_put() to be safe, it is necessary to ensure that nfs_uuid_add_file() doesn't add a new entry once the nfs_uuid->net has been NULLed out. Also fix up the wake_up_var_locked() / wait_var_event_spinlock() to both use the nfs_uuid address, since nfl, and &nfl->uuid could be used elsewhere. Acked-by: Mike Snitzer <snitzer@kernel.org> Tested-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/all/175262893035.2234665.1735173020338594784@noble.neil.brown.name/ Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-05NFS/localio: nfs_close_local_fh() fix check for file closedTrond Myklebust
If the struct nfs_file_localio is closed, its list entry will be empty, but the nfs_uuid->files list might still contain other entries. Acked-by: Mike Snitzer <snitzer@kernel.org> Tested-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Fixes: 21fb44034695 ("nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-05fs/proc/task_mmu: hold PTL in pagemap_hugetlb_range and gather_hugetlb_statsJinjiang Tu
Hold PTL in pagemap_hugetlb_range() and gather_hugetlb_stats() to avoid operating on stale page, as pagemap_pmd_range() and gather_pte_stats() have done. Link: https://lkml.kernel.org/r/20250724090958.455887-3-tujinjiang@huawei.com Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brahmajit Das <brahmajit.xyz@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Rientjes <rientjes@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joern Engel <joern@logfs.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-05mm/smaps: fix race between smaps_hugetlb_range and migrationJinjiang Tu
smaps_hugetlb_range() handles the pte without holdling ptl, and may be concurrenct with migration, leaing to BUG_ON in pfn_swap_entry_to_page(). The race is as follows. smaps_hugetlb_range migrate_pages huge_ptep_get remove_migration_ptes folio_unlock pfn_swap_entry_folio BUG_ON To fix it, hold ptl lock in smaps_hugetlb_range(). Link: https://lkml.kernel.org/r/20250724090958.455887-1-tujinjiang@huawei.com Link: https://lkml.kernel.org/r/20250724090958.455887-2-tujinjiang@huawei.com Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brahmajit Das <brahmajit.xyz@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Rientjes <rientjes@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joern Engel <joern@logfs.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-05smb: client: smb: client: eliminate mid_flags fieldWang Zhaolong
This is step 3/4 of a patch series to fix mid_q_entry memory leaks caused by race conditions in callback execution. Replace the mid_flags bitmask with dedicated boolean fields to simplify locking logic and improve code readability: - Replace MID_DELETED with bool deleted_from_q - Replace MID_WAIT_CANCELLED with bool wait_cancelled - Remove mid_flags field entirely The new boolean fields have clearer semantics: - deleted_from_q: whether mid has been removed from pending_mid_q - wait_cancelled: whether request was cancelled during wait This change reduces memory usage (from 4-byte bitmask to 2 boolean flags) and eliminates confusion about which lock protects which flag bits, preparing for per-mid locking in the next patch. Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Acked-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-05smb: client: add mid_counter_lock to protect the mid counter counterWang Zhaolong
This is step 2/4 of a patch series to fix mid_q_entry memory leaks caused by race conditions in callback execution. Add a dedicated mid_counter_lock to protect current_mid counter, separating it from mid_queue_lock which protects pending_mid_q operations. This reduces lock contention and prepares for finer- grained locking in subsequent patches. Changes: - Add TCP_Server_Info->mid_counter_lock spinlock - Rename CurrentMid to current_mid for consistency - Use mid_counter_lock to protect current_mid access - Update locking documentation in cifsglob.h This separation allows mid allocation to proceed without blocking queue operations, improving performance under heavy load. Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Acked-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-05smb: client: rename server mid_lock to mid_queue_lockWang Zhaolong
This is step 1/4 of a patch series to fix mid_q_entry memory leaks caused by race conditions in callback execution. The current mid_lock name is somewhat ambiguous about what it protects. To prepare for splitting this lock into separate, more granular locks, this patch renames mid_lock to mid_queue_lock to clearly indicate its specific responsibility for protecting the pending_mid_q list and related queue operations. No functional changes are made in this patch - it only prepares the codebase for the lock splitting that follows. - mid_queue_lock for queue operations - mid_counter_lock for mid counter operations - per-mid locks for individual mid state management Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Acked-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-05nfsd: avoid ref leak in nfsd_open_local_fh()NeilBrown
If two calls to nfsd_open_local_fh() race and both successfully call nfsd_file_acquire_local(), they will both get an extra reference to the net to accompany the file reference stored in *pnf. One of them will fail to store (using xchg()) the file reference in *pnf and will drop that reference but WON'T drop the accompanying reference to the net. This leak means that when the nfs server is shut down it will hang in nfsd_shutdown_net() waiting for &nn->nfsd_net_free_done. This patch adds the missing nfsd_net_put(). Reported-by: Mike Snitzer <snitzer@kernel.org> Fixes: e6f7e1487ab5 ("nfs_localio: simplify interface to nfsd for getting nfsd_file") Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neil@brown.name> Tested-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-08-05nfsd: don't set the ctime on delegated atime updatesJeff Layton
Clients will typically precede a DELEGRETURN for a delegation with delegated timestamp with a SETATTR to set the timestamps on the server to match what the client has. knfsd implements this by using the nfsd_setattr() infrastructure, which will set ATTR_CTIME on any update that goes to notify_change(). This is problematic as it means that the client will get a spurious ctime update when updating the atime. POSIX unfortunately doesn't phrase it succinctly, but updating the atime due to reads should not update the ctime. In this case, the client is sending a SETATTR to update the atime on the server to match its latest value. The ctime should not be advanced in this case as that would incorrectly indicate a change to the inode. Fix this by not implicitly setting ATTR_CTIME when ATTR_DELEG is set in __nfsd_setattr(). The decoder for FATTR4_WORD2_TIME_DELEG_MODIFY already sets ATTR_CTIME, so this is sufficient to make it skip setting the ctime on atime-only updates. Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR") Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-08-05Merge tag 'exfat-for-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Use generic_write_sync instead of vfs_fsync_range in exfat_file_write_iter. It will fix an issue where fdatasync would be set incorrectly. - Fix potential infinite loop by the self-linked chain. * tag 'exfat-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: add cluster chain loop check for dir exfat: fdatasync flag should be same like generic_write_sync()
2025-08-04smb: client: fix creating symlinks under POSIX mountsPaulo Alcantara
SMB3.1.1 POSIX mounts support native symlinks that are created with IO_REPARSE_TAG_SYMLINK reparse points, so skip the checking of FILE_SUPPORTS_REPARSE_POINTS as some servers might not have it set. Cc: linux-cifs@vger.kernel.org Cc: Ralph Boehme <slow@samba.org> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Reported-by: Matthew Richardson <m.richardson@ed.ac.uk> Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-04smb: client: default to nonativesocket under POSIX mountsPaulo Alcantara
SMB3.1.1 POSIX mounts require sockets to be created with NFS reparse points. Cc: linux-cifs@vger.kernel.org Cc: Ralph Boehme <slow@samba.org> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Reported-by: Matthew Richardson <m.richardson@ed.ac.uk> Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-04Merge tag 'f2fs-for-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "Three main updates: folio conversion by Matthew, switch to a new mount API by Hongbo and Eric, and several sysfs entries to tune GCs for ZUFS with finer granularity by Daeho. There are also patches to address bugs and issues in the existing features such as GCs, file pinning, write-while-dio-read, contingous block allocation, and memory access violations. Enhancements: - switch to new mount API and folio conversion - add sysfs nodes to controle F2FS GCs for ZUFS - improve performance on the nat entry cache - drop inode from the donation list when the last file is closed - avoid splitting bio when reading multiple pages Bug fixes: - fix to trigger foreground gc during f2fs_map_blocks() in lfs mode - make sure zoned device GC to use FG_GC in shortage of free section - fix to calculate dirty data during has_not_enough_free_secs() - fix to update upper_p in __get_secs_required() correctly - wait for inflight dio completion, excluding pinned files read using dio - don't break allocation when crossing contiguous sections - vm_unmap_ram() may be called from an invalid context - fix to avoid out-of-boundary access in dnode page - fix to avoid panic in f2fs_evict_inode - fix to avoid UAF in f2fs_sync_inode_meta() - fix to use f2fs_is_valid_blkaddr_raw() in do_write_page() - fix UAF of f2fs_inode_info in f2fs_free_dic - fix to avoid invalid wait context issue - fix bio memleak when committing super block - handle nat.blkaddr corruption in f2fs_get_node_info() In addition, there are also clean-ups and minor bug fixes" * tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (109 commits) f2fs: drop inode from the donation list when the last file is closed f2fs: add gc_boost_gc_greedy sysfs node f2fs: add gc_boost_gc_multiple sysfs node f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode f2fs: fix to calculate dirty data during has_not_enough_free_secs() f2fs: fix to update upper_p in __get_secs_required() correctly f2fs: directly add newly allocated pre-dirty nat entry to dirty set list f2fs: avoid redundant clean nat entry move in lru list f2fs: zone: wait for inflight dio completion, excluding pinned files read using dio f2fs: ignore valid ratio when free section count is low f2fs: don't break allocation when crossing contiguous sections f2fs: remove unnecessary tracepoint enabled check f2fs: merge the two conditions to avoid code duplication f2fs: vm_unmap_ram() may be called from an invalid context f2fs: fix to avoid out-of-boundary access in dnode page f2fs: switch to the new mount api f2fs: introduce fs_context_operation structure f2fs: separate the options parsing and options checking f2fs: Add f2fs_fs_context to record the mount options f2fs: Allow sbi to be NULL in f2fs_printk ...
2025-08-04NFSv4: Remove duplicate lookups, capability probes and fsinfo callsTrond Myklebust
When crossing into a new filesystem, the NFSv4 client will look up the new directory, and then call nfs4_server_capabilities() as well as nfs4_do_fsinfo() at least twice. This patch removes the duplicate calls, and reduces the initial lookup to retrieve just a minimal set of attributes. Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-04NFS: Fix the setting of capabilities when automounting a new filesystemTrond Myklebust
Capabilities cannot be inherited when we cross into a new filesystem. They need to be reset to the minimal defaults, and then probed for again. Fixes: 54ceac451598 ("NFS: Share NFS superblocks per-protocol per-server per-FSID") Cc: stable@vger.kernel.org Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-03Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Significant patch series in this pull request: - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets us closer to being able to remove page->mapping - "relayfs: misc changes" (Jason Xing) does some maintenance and minor feature addition work in relayfs - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches us from static preallocation of the kdump crashkernel's working memory over to dynamic allocation. So the difficulty of a-priori estimation of the second kernel's needs is removed and the first kernel obtains extra memory - "generalize panic_print's dump function to be used by other kernel parts" (Feng Tang) implements some consolidation and rationalization of the various ways in which a failing kernel splats information at the operator * tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits) tools/getdelays: add backward compatibility for taskstats version kho: add test for kexec handover delaytop: enhance error logging and add PSI feature description samples: Kconfig: fix spelling mistake "instancess" -> "instances" fat: fix too many log in fat_chain_add() scripts/spelling.txt: add notifer||notifier to spelling.txt xen/xenbus: fix typo "notifer" net: mvneta: fix typo "notifer" drm/xe: fix typo "notifer" cxl: mce: fix typo "notifer" KVM: x86: fix typo "notifer" MAINTAINERS: add maintainers for delaytop ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below() ucount: fix atomic_long_inc_below() argument type kexec: enable CMA based contiguous allocation stackdepot: make max number of pools boot-time configurable lib/xxhash: remove unused functions init/Kconfig: restore CONFIG_BROKEN help text lib/raid6: update recov_rvv.c zero page usage docs: update docs after introducing delaytop ...
2025-08-03nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()Li RongQing
The usage of read_seqbegin_or_lock() in nfs_copy_boot_verifier() is wrong. "seq" is always even and thus "or_lock" has no effect. nfs_copy_boot_verifier() just copies 8 bytes and is supposed to be very rare operation, so we do not need the adaptive locking in this case. Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/r/20250731081038.3478-1-lirongqing@baidu.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-08-02fat: fix too many log in fat_chain_add()OGAWA Hirofumi
This log was excessive for a serial console. So use the ratelimited version instead. Link: https://lkml.kernel.org/r/87qzy611d9.fsf@mail.parknet.co.jp Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reported-by: syzbot+fa7ef54f66c189c04b73@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fa7ef54f66c189c04b73 Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-01smb: client: set symlink type as native for POSIX mountsPaulo Alcantara
SMB3.1.1 POSIX mounts require symlinks to be created natively with IO_REPARSE_TAG_SYMLINK reparse point. Cc: linux-cifs@vger.kernel.org Cc: Ralph Boehme <slow@samba.org> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Reported-by: Matthew Richardson <m.richardson@ed.ac.uk> Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-07-31Merge tag 'v6.17-rc-part1-smb3-client-fixes' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull smb client updates from Steve French: - Fix network namespace refcount leak - Multichannel reconnect fix - Perf improvement to not do unneeded EA query on native symlinks - Performance improvement for directory leases to allow extending lease for actively queried directories - Improve debugging of directory leases by adding pseudofile to show them - Five minor mount cleanup patches - Minor directory lease cleanup patch - Allow creating special files via reparse points over SMB1 - Two minor improvements to FindFirst over SMB1 - Two NTLMSSP session setup fixes * tag 'v6.17-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3 client: add way to show directory leases for improved debugging smb: client: get rid of kstrdup() when parsing iocharset mount option smb: client: get rid of kstrdup() when parsing domain mount option smb: client: get rid of kstrdup() when parsing pass2 mount option smb: client: get rid of kstrdup() when parsing pass mount option smb: client: get rid of kstrdup() when parsing user mount option cifs: Add support for creating reparse points over SMB1 cifs: Do not query WSL EAs for native SMB symlink cifs: Optimize CIFSFindFirst() response when not searching cifs: Fix calling CIFSFindFirst() for root path without msearch smb: client: fix session setup against servers that require SPN smb: client: allow parsing zero-length AV pairs cifs: add new field to track the last access time of cfid smb: change return type of cached_dir_lease_break() to bool cifs: reset iface weights when we cannot find a candidate smb: client: fix netns refcount leak after net_passive changes
2025-08-01exfat: add cluster chain loop check for dirYuezhang Mo
An infinite loop may occur if the following conditions occur due to file system corruption. (1) Condition for exfat_count_dir_entries() to loop infinitely. - The cluster chain includes a loop. - There is no UNUSED entry in the cluster chain. (2) Condition for exfat_create_upcase_table() to loop infinitely. - The cluster chain of the root directory includes a loop. - There are no UNUSED entry and up-case table entry in the cluster chain of the root directory. (3) Condition for exfat_load_bitmap() to loop infinitely. - The cluster chain of the root directory includes a loop. - There are no UNUSED entry and bitmap entry in the cluster chain of the root directory. (4) Condition for exfat_find_dir_entry() to loop infinitely. - The cluster chain includes a loop. - The unused directory entries were exhausted by some operation. (5) Condition for exfat_check_dir_empty() to loop infinitely. - The cluster chain includes a loop. - The unused directory entries were exhausted by some operation. - All files and sub-directories under the directory are deleted. This commit adds checks to break the above infinite loop. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-08-01exfat: fdatasync flag should be same like generic_write_sync()Zhengxu Zhang
Test: androbench by default setting, use 64GB sdcard. the random write speed: without this patch 3.5MB/s with this patch 7MB/s After patch "11a347fb6cef", the random write speed decreased significantly. the .write_iter() interface had been modified, and check the differences with generic_file_write_iter(), when calling generic_write_sync() and exfat_file_write_iter() to call vfs_fsync_range(), the fdatasync flag is wrong, and make not use the fdatasync mode, and make random write speed decreased. So use generic_write_sync() instead of vfs_fsync_range(). Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Signed-off-by: Zhengxu Zhang <zhengxu.zhang@unisoc.com> Acked-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-07-31Merge tag 'mm-stable-2025-07-30-15-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
2025-07-31Merge tag 'fsnotify_for_v6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "A couple of small improvements for fsnotify subsystem. The most interesting is probably Amir's change modifying the meaning of fsnotify fmode bits (and I spell it out specifically because I know you care about those). There's no change for the common cases of no fsnotify watches or no permission event watches. But when there are permission watches (either for open or for pre-content events) but no FAN_ACCESS_PERM watch (which nobody uses in practice) we are now able optimize away unnecessary cache loads from the read path" * tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: optimize FMODE_NONOTIFY_PERM for the common cases fsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hook samples: fix building fs-monitor on musl systems fanotify: sanitize handle_type values when reporting fid
2025-07-31Merge tag 'jfs-6.17' of github.com:kleikamp/linux-shaggyLinus Torvalds
Pull jfs updates from Dave Kleikamp: "Fixes and cleanups for JFS filesystem" * tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy: jfs: fix metapage reference count leak in dbAllocCtl jfs: stop using write_cache_pages jfs: truncate good inode pages when hard link is 0 jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage() jfs: Regular file corruption check jfs: upper bound check of tree index in dbAllocAG
2025-07-31Merge tag 'for-linus-6.17-ofs1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Fixes for string handling in debugfs and sysfs: - change scnprintf to sysfs_emit in sysfs code. - change sprintf to scnprintf in debugfs code. - refactor debugfs mask-to-string code for readability and slightly improved functionality" * tag 'for-linus-6.17-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: fs/orangefs: Allow 2 more characters in do_c_string() fs: orangefs: replace scnprintf() with sysfs_emit() fs/orangefs: use snprintf() instead of sprintf()
2025-07-31Merge tag 'ubifs-for-linus-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: "UBIFS: - No longer use write_cache_pages() UBI: - Remove an unused function" * tag 'ubifs-for-linus-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: stop using write_cache_pages mtd: ubi: Remove unused ubi_flush
2025-07-31Merge tag 'ext4_for_linus_6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major ext4 changes for 6.17: - Better scalability for ext4 block allocation - Fix insufficient credits when writing back large folios Miscellaneous bug fixes, especially when handling exteded attriutes, inline data, and fast commit" * tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (39 commits) ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr ext4: implement linear-like traversal across order xarrays ext4: refactor choose group to scan group ext4: convert free groups order lists to xarrays ext4: factor out ext4_mb_scan_group() ext4: factor out ext4_mb_might_prefetch() ext4: factor out __ext4_mb_scan_group() ext4: fix largest free orders lists corruption on mb_optimize_scan switch ext4: fix zombie groups in average fragment size lists ext4: merge freed extent with existing extents before insertion ext4: convert sbi->s_mb_free_pending to atomic_t ext4: fix typo in CR_GOAL_LEN_SLOW comment ext4: get rid of some obsolete EXT4_MB_HINT flags ext4: utilize multiple global goals to reduce contention ext4: remove unnecessary s_md_lock on update s_mb_last_group ext4: remove unnecessary s_mb_last_start ext4: separate stream goal hits from s_bal_goals for better tracking ext4: add ext4_try_lock_group() to skip busy groups ext4: initialize superblock fields in the kballoc-test.c kunit tests ext4: refactor the inline directory conversion and new directory codepaths ...
2025-07-31smb3 client: add way to show directory leases for improved debuggingSteve French
When looking at performance issues around directory caching, or debugging directory lease issues, it is helpful to be able to display the current directory leases (as we can e.g. or open files). Create pseudo-file /proc/fs/cifs/open_dirs that displays current directory leases. Here is sample output: cat /proc/fs/cifs/open_dirs Version:1 Format: <tree id> <sess id> <persistent fid> <path> Num entries: 3 0xce4c1c68 0x7176aa54 0xd95ef58e \dira valid file info, valid dirents 0xce4c1c68 0x7176aa54 0xd031e211 \dir5 valid file info, valid dirents 0xce4c1c68 0x7176aa54 0x96533a90 \dir1 valid file info Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-07-30f2fs: drop inode from the donation list when the last file is closedJaegeuk Kim
Let's drop the inode from the donation list when there is no other open file. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-30Merge tag 'bpf-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Remove usermode driver (UMD) framework (Thomas Weißschuh) - Introduce Strongly Connected Component (SCC) in the verifier to detect loops and refine register liveness (Eduard Zingerman) - Allow 'void *' cast using bpf_rdonly_cast() and corresponding '__arg_untrusted' for global function parameters (Eduard Zingerman) - Improve precision for BPF_ADD and BPF_SUB operations in the verifier (Harishankar Vishwanathan) - Teach the verifier that constant pointer to a map cannot be NULL (Ihor Solodrai) - Introduce BPF streams for error reporting of various conditions detected by BPF runtime (Kumar Kartikeya Dwivedi) - Teach the verifier to insert runtime speculation barrier (lfence on x86) to mitigate speculative execution instead of rejecting the programs (Luis Gerhorst) - Various improvements for 'veristat' (Mykyta Yatsenko) - For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to improve bug detection by syzbot (Paul Chaignon) - Support BPF private stack on arm64 (Puranjay Mohan) - Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's node (Song Liu) - Introduce kfuncs for read-only string opreations (Viktor Malik) - Implement show_fdinfo() for bpf_links (Tao Chen) - Reduce verifier's stack consumption (Yonghong Song) - Implement mprog API for cgroup-bpf programs (Yonghong Song) * tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits) selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite selftests/bpf: Add selftest for attaching tracing programs to functions in deny list bpf: Add log for attaching tracing programs to functions in deny list bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions bpf: Fix various typos in verifier.c comments bpf: Add third round of bounds deduction selftests/bpf: Test invariants on JSLT crossing sign selftests/bpf: Test cross-sign 64bits range refinement selftests/bpf: Update reg_bound range refinement logic bpf: Improve bounds when s64 crosses sign boundary bpf: Simplify bounds refinement from s32 selftests/bpf: Enable private stack tests for arm64 bpf, arm64: JIT support for private stack bpf: Move bpf_jit_get_prog_name() to core.c bpf, arm64: Fix fp initialization for exception boundary umd: Remove usermode driver framework bpf/preload: Don't select USERMODE_DRIVER selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure selftests/bpf: Increase xdp data size for arm64 64K page size ...
2025-07-30Merge tag 'net-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Wrap datapath globals into net_aligned_data, to avoid false sharing - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container) - Add SO_INQ and SCM_INQ support to AF_UNIX - Add SIOCINQ support to AF_VSOCK - Add TCP_MAXSEG sockopt to MPTCP - Add IPv6 force_forwarding sysctl to enable forwarding per interface - Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB - Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users - Convert TCP send queue handling from tasklet to BH workque - Improve BPF iteration over TCP sockets to see each socket exactly once - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code - Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization - Allow multicast routing to take effect on locally-generated packets - Add output interface argument for End.X in segment routing - MCTP: add support for gateway routing, improve bind() handling - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink - Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed - Support NUD_PERMANENT for proxy neighbor entries - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM - Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup - Support inspecting ref_tracker state via DebugFS - Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links - Allow providing upcall pid for the 'execute' command in openvswitch - Remove DCCP support from Netfilter's conntrack - Disallow multiple packet duplications in the queuing layer - Prevent use of deprecated iptables code on PREEMPT_RT Driver API: - Support RSS and hashing configuration over ethtool Netlink - Add dedicated ethtool callbacks for getting and setting hashing fields - Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc - Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs - Support traffic classes in devlink rate API for bandwidth management - Remove rtnl_lock dependency from UDP tunnel port configuration Device drivers: - Add a new Broadcom driver for 800G Ethernet (bnge) - Add a standalone driver for Microchip ZL3073x DPLL - Remove IBM's NETIUCV device driver - Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access) - Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping - Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583 - Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x - CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info - WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling - WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device - Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS - Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading" * tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
2025-07-29Merge tag 'driver-core-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Remove unneeded debugfs_file_{get,put}() instances - Remove last remnants of debugfs_real_fops() - Allow storing non-const void * in struct debugfs_inode_info::aux sysfs: - Switch back to attribute_group::bin_attrs (treewide) - Switch back to bin_attribute::read()/write() (treewide) - Constify internal references to 'struct bin_attribute' Support cache-ids for device-tree systems: - Add arch hook arch_compact_of_hwid() - Use arch_compact_of_hwid() to compact MPIDR values on arm64 Rust: - Device: - Introduce CoreInternal device context (for bus internal methods) - Provide generic drvdata accessors for bus devices - Provide Driver::unbind() callbacks - Use the infrastructure above for auxiliary, PCI and platform - Implement Device::as_bound() - Rename Device::as_ref() to Device::from_raw() (treewide) - Implement fwnode and device property abstractions - Implement example usage in the Rust platform sample driver - Devres: - Remove the inner reference count (Arc) and use pin-init instead - Replace Devres::new_foreign_owned() with devres::register() - Require T to be Send in Devres<T> - Initialize the data kept inside a Devres last - Provide an accessor for the Devres associated Device - Device ID: - Add support for ACPI device IDs and driver match tables - Split up generic device ID infrastructure - Use generic device ID infrastructure in net::phy - DMA: - Implement the dma::Device trait - Add DMA mask accessors to dma::Device - Implement dma::Device for PCI and platform devices - Use DMA masks from the DMA sample module - I/O: - Implement abstraction for resource regions (struct resource) - Implement resource-based ioremap() abstractions - Provide platform device accessors for I/O (remap) requests - Misc: - Support fallible PinInit types in Revocable - Implement Wrapper<T> for Opaque<T> - Merge pin-init blanket dependencies (for Devres) Misc: - Fix OF node leak in auxiliary_device_create() - Use util macros in device property iterators - Improve kobject sample code - Add device_link_test() for testing device link flags - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits - Hint to prefer container_of_const() over container_of()" * tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits) rust: io: fix broken intra-doc links to `platform::Device` rust: io: fix broken intra-doc link to missing `flags` module rust: io: mem: enable IoRequest doc-tests rust: platform: add resource accessors rust: io: mem: add a generic iomem abstraction rust: io: add resource abstraction rust: samples: dma: set DMA mask rust: platform: implement the `dma::Device` trait rust: pci: implement the `dma::Device` trait rust: dma: add DMA addressing capabilities rust: dma: implement `dma::Device` trait rust: net::phy Change module_phy_driver macro to use module_device_table macro rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id rust: device_id: split out index support into a separate trait device: rust: rename Device::as_ref() to Device::from_raw() arm64: cacheinfo: Provide helper to compress MPIDR value into u32 cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id cacheinfo: Set cache 'id' based on DT data container_of: Document container_of() is not to be used in new code driver core: auxiliary bus: fix OF node leak ...
2025-07-29f2fs: add gc_boost_gc_greedy sysfs nodeDaeho Jeong
Add this to control GC algorithm for boost GC. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>