path: root/fs
2025-03-11xfs: trigger zone GC when out of available rt blocksHans Holmberg
We periodically check the available rt blocks when filling up zones and start GC if needed, but we may run completely out in between filling zones, so start GC (unless already running) if we can't reserve writable space. This should only happen as a corner case in setups with very few backing zones. Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
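A minimal sketch of the added check; every identifier below is illustrative, not the actual xfs zoned-allocator API:

    /* if we cannot reserve writable space, kick GC unless it already runs */
    static int reserve_or_gc(struct xfs_mount *mp, xfs_filblks_t count)
    {
            if (reserve_rt_blocks(mp, count))
                    return 0;                /* got writable space */
            if (!zone_gc_running(mp))
                    start_zone_gc(mp);       /* corner case: very few backing zones */
            return -ENOSPC;                  /* caller retries after GC makes progress */
    }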
2025-03-11Revert "f2fs: rebuild nat_bits during umount"Chao Yu
This reverts commit 94c821fb286b545d37549ff30a0c341e066f0d6c. There are reports of potential corruption in the node footer; the most suspicious feature is nat_bits, so let's revert the recovery-related code. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11f2fs: fix to avoid accessing uninitialized cursegChao Yu
syzbot reports a f2fs bug as below:

F2FS-fs (loop3): Stopped filesystem due to reason: 7
kworker/u8:7: attempt to access beyond end of device
BUG: unable to handle page fault for address: ffffed1604ea3dfa
RIP: 0010:get_ckpt_valid_blocks fs/f2fs/segment.h:361 [inline]
RIP: 0010:has_curseg_enough_space fs/f2fs/segment.h:570 [inline]
RIP: 0010:__get_secs_required fs/f2fs/segment.h:620 [inline]
RIP: 0010:has_not_enough_free_secs fs/f2fs/segment.h:633 [inline]
RIP: 0010:has_enough_free_secs+0x575/0x1660 fs/f2fs/segment.h:649
<TASK>
f2fs_is_checkpoint_ready fs/f2fs/segment.h:671 [inline]
f2fs_write_inode+0x425/0x540 fs/f2fs/inode.c:791
write_inode fs/fs-writeback.c:1525 [inline]
__writeback_single_inode+0x708/0x10d0 fs/fs-writeback.c:1745
writeback_sb_inodes+0x820/0x1360 fs/fs-writeback.c:1976
wb_writeback+0x413/0xb80 fs/fs-writeback.c:2156
wb_do_writeback fs/fs-writeback.c:2303 [inline]
wb_workfn+0x410/0x1080 fs/fs-writeback.c:2343
process_one_work kernel/workqueue.c:3236 [inline]
process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
worker_thread+0x870/0xd30 kernel/workqueue.c:3398
kthread+0x7a9/0x920 kernel/kthread.c:464
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Commit 8b10d3653735 ("f2fs: introduce FAULT_NO_SEGMENT") allows triggering a no-free-segment fault in the allocator, which updates curseg->segno to NULL_SEGNO. Even though CP_ERROR_FLAG has been set, f2fs_write_inode() misses checking the flag and accesses the invalid curseg->segno directly in the call path below, resulting in a panic:

- f2fs_write_inode
 - f2fs_is_checkpoint_ready
  - has_enough_free_secs
   - has_not_enough_free_secs
    - __get_secs_required
     - has_curseg_enough_space
      - get_ckpt_valid_blocks : access invalid curseg->segno

To avoid this issue, let's: - check the CP_ERROR_FLAG flag prior to f2fs_is_checkpoint_ready() in f2fs_write_inode(); - in has_curseg_enough_space(), save curseg->segno into a temporary variable and verify its validity before use. Fixes: 8b10d3653735 ("f2fs: introduce FAULT_NO_SEGMENT") Reported-by: syzbot+b6b347b7a4ea1b2e29b6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67973c2b.050a0220.11b1bb.0089.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
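A hedged sketch of the second fix, snapshotting curseg->segno before use (CURSEG_I() and NULL_SEGNO are the f2fs names from the trace above; the surrounding signature is simplified):

    static bool curseg_valid(struct f2fs_sb_info *sbi, int type)
    {
            /* read the segment number once so a concurrent update to
             * NULL_SEGNO cannot slip in between the check and the use */
            unsigned int segno = CURSEG_I(sbi, type)->segno;

            return segno != NULL_SEGNO;
    }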
2025-03-11f2fs: introduce FAULT_INCONSISTENT_FOOTERChao Yu
To simulate inconsistent node footer error. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11f2fs: do sanity check on xattr node footer in f2fs_get_xnode_page()Chao Yu
This patch introduces a new wrapper f2fs_get_xnode_page(), then, caller can use it to load xattr block to page cache, meanwhile it will do sanity check on xattr node footer. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11f2fs: do sanity check on inode footer in f2fs_get_inode_page()Chao Yu
This patch introduces a new wrapper f2fs_get_inode_page(), then, caller can use it to load inode block to page cache, meanwhile it will do sanity check on inode footer. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-10ksmbd: prevent connection release during oplock break notificationNamjae Jeon
ksmbd_work could be freed after connection release. Increment r_count of ksmbd_conn to indicate that requests are not finished yet and to not release the connection. Cc: stable@vger.kernel.org Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-03-10ksmbd: fix use-after-free in ksmbd_free_work_structNamjae Jeon
->interim_entry of ksmbd_work could be deleted after the oplock is freed. We don't need to manage it with a linked list. The interim request could be immediately sent whenever an oplock break wait is needed. Cc: stable@vger.kernel.org Reported-by: Norbert Szetei <norbert@doyensec.com> Tested-by: Norbert Szetei <norbert@doyensec.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-03-10gfs2: Convert gfs2_meta_read_endio() to use a folioMatthew Wilcox (Oracle)
Switch from bio_for_each_segment_all() to bio_for_each_folio_all() which removes a call to page_buffers(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
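The conversion follows the standard folio iteration pattern; a hedged sketch, where process_folio() stands in for the real per-folio endio work:

    static void meta_read_endio_sketch(struct bio *bio)
    {
            struct folio_iter fi;

            /* iterate folios directly instead of pages plus page_buffers() */
            bio_for_each_folio_all(fi, bio)
                    process_folio(fi.folio);        /* hypothetical helper */
            bio_put(bio);
    }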
2025-03-10gfs2: Convert gfs2_end_log_write_bh() to work on a folioMatthew Wilcox (Oracle)
gfs2_end_log_write() has to handle bios which consist of both pages which belong to folios and pages which were allocated from a mempool and do not belong to a folio. It would be cleaner to have separate endio handlers which handle each type, but it's not clear to me whether that's even possible. This patch is slightly forward-looking in that page_folio() cannot currently return NULL, but it will return NULL in the future for pages which do not belong to a folio. This was the last user of page_has_buffers(), so remove it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Convert gfs2_find_jhead() to use a folioMatthew Wilcox (Oracle)
Remove a call to grab_cache_page() by using a folio throughout this function. [agruenba@redhat.com: Adjust to return value difference between bio_add_page() and bio_add_folio().] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
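The bracketed adjustment refers to the different return conventions: bio_add_page() returns the number of bytes added, while bio_add_folio() returns a bool. A hedged before/after fragment (submit_and_restart() is a hypothetical fallback):

    /* page API: success means the full length was added */
    if (bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE)
            submit_and_restart();

    /* folio API: the result is a plain bool */
    if (!bio_add_folio(bio, folio, folio_size(folio), 0))
            submit_and_restart();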
2025-03-10gfs2: Convert gfs2_jhead_pg_srch() to gfs2_jhead_folio_search()Matthew Wilcox (Oracle)
Pass in the folio instead of the page. Add an assert that this is not a large folio as we'd need a more complex solution if we wanted to kmap() each page out of a large folio. Removes a use of folio->page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> [agruenba@redhat.com: Rename gfs2_jhead_folio_srch() to gfs2_jhead_folio_search().] Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Use b_folio in gfs2_check_magic()Matthew Wilcox (Oracle)
We are preparing to remove bh->b_page. Use kmap_local_folio() instead of kmap_local_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
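A hedged sketch of the conversion (it assumes the buffer data sits where bh_offset() points within the folio; GFS2_MAGIC comes from gfs2_ondisk.h):

    static bool check_magic_sketch(struct buffer_head *bh)
    {
            __be32 *magic = kmap_local_folio(bh->b_folio, bh_offset(bh));
            bool match = (*magic == cpu_to_be32(GFS2_MAGIC));

            kunmap_local(magic);
            return match;
    }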
2025-03-10gfs2: Use b_folio in gfs2_submit_bhs()Matthew Wilcox (Oracle)
Remove a reference to bh->b_page which is going to be removed soon. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Use b_folio in gfs2_trans_add_meta()Matthew Wilcox (Oracle)
The lock bit is maintained on the folio, not on the page. Saves two calls to compound_head() as well as removing two references to bh->b_page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Use b_folio in gfs2_log_write_bh()Matthew Wilcox (Oracle)
We are preparing to remove bh->b_page. gfs2_log_write() should continue to operate on pages as some of the memory being logged does not come from folios, so convert from folio to page in this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: skip if we cannot defer deleteAndreas Gruenbacher
In gfs2_evict_inode(), in the unlikely case that we cannot defer deleting the inode, it is not safe to fall back to deleting the inode; the only valid choice we have is to skip the delete. In addition, in evict_should_delete(), if we cannot lock the inode glock exclusively, we are in a bad enough state that skipping the delete is likely a better choice than trying to recover from the failure later. Fixes: c5b7a2400edc ("gfs2: Only defer deletes when we have an iopen glock") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: remove redundant warningsAndreas Gruenbacher
In glock_set_object() and glock_clear_object(), there is no need to print the glock type and number when we dump the entire glock, anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: minor evict fixAndreas Gruenbacher
In evict_should_delete(), when gfs2_upgrade_iopen_glock() fails, we detach the iopen glock from the inode without calling glock_clear_object(). This leads to a warning in glock_set_object() when the same inode is recreated and the glock is reused. Fix that by only detaching the iopen glock in gfs2_evict_inode(). In addition, remove the dequeue code from evict_should_delete(); we already perform a conditional dequeue in gfs2_evict_inode(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Prevent inode creation race (2)Andreas Gruenbacher
In gfs2_try_evict(), we try grabbing the inode to evict, we try to evict it, and then we try grabbing it again to see if it still exists. There is no guarantee that we will end up with the same inode both times; the inode validity check that commit ffd1cf0443a2 ("gfs2: Prevent inode creation race") added to the first grab is actually needed both times. (To avoid code duplication, add a grab_existing_inode() helper.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Fix additional unlikely request cancelation raceAndreas Gruenbacher
In gfs2_glock_dq(), we must drop the glock spin lock before calling ->lm_cancel, but this means that in the meantime, the operation we are trying to cancel could complete. If the operation completes unsuccessfully, another holder can end up at the head of the queue and another ->lm_lock operation can get started. In this case, we would end up canceling that second operation by accident. To prevent that, introduce a new GLF_CANCELING flag. Set that flag in gfs2_glock_dq() when trying to cancel an operation. When seeing that flag, finish_xmote() will then keep the GLF_LOCK flag set to prevent other glock operations from taking place. gfs2_glock_dq() then completes the cancelation attempt by clearing GLF_LOCK and GLF_CANCELING. In addition, add a missing GLF_DEMOTE_IN_PROGRESS check in gfs2_glock_dq() to make sure that we won't accidentally cancel a demote request. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Fix request cancelation bugAndreas Gruenbacher
In finish_xmote(), when a locking request is canceled, the corresponding holder is moved to the tail of the holders list instead of being dequeued immediately. When there is only a single holder, the canceled locking request is then immediately repeated. This makes no sense; it looks like another remnant of LM_FLAG_PRIORITY support. Instead, dequeue canceled holders and proceed with the next holder in finish_xmote(). We can then easily detect in gfs2_glock_dq() when a holder has been canceled. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Check for empty queue in run_queueAndreas Gruenbacher
In run_queue(), check if the queue of pending requests is empty instead of blindly assuming that it won't be. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Remove more dead code in add_to_queueAndreas Gruenbacher
Remove some more dead code in add_to_queue() that commit 0b93bac2271e ("gfs2: Remove LM_FLAG_PRIORITY flag") has rendered obsolete. This is a continuation of commit 3302764610057 ("gfs2: remove dead code in add_to_queue"); no functional change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Replace GIF_DEFER_DELETE with GLF_DEFER_DELETEAndreas Gruenbacher
Having this flag attached to the iopen glock instead of the inode is much simpler; it eliminates a potential weird race in gfs2_try_evict(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: glock holder GL_NOPID fixAndreas Gruenbacher
Glocks are always actively acquired by processes, but as indicated by the GL_NOPID holder flag, some of them are then associated with objects like cached inodes rather than the process that acquired them. As such, for those glock holders, it makes little sense to dump which processes originally acquired them. Therefore, gfs2 is trying to hide the identity of the processes that acquired those glocks. The code for doing that is incorrect though, so fix it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Add GLF_PENDING_REPLY flagAndreas Gruenbacher
Introduce a new GLF_PENDING_REPLY flag to indicate that a reply from DLM is expected. Include that flag in glock dumps to show more clearly what's going on. (When the GLF_PENDING_REPLY flag is set, the GLF_LOCK flag will also be set but the GLF_LOCK flag alone isn't sufficient to tell that we are waiting for a DLM reply.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Decode missing glock flags in tracepointsAndreas Gruenbacher
A number of glock flags are currently not shown in the text form of glock tracepoints; add them. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10Merge 6.14-rc6 into driver-core-nextGreg Kroah-Hartman
We need the driver core fix in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-10xfs: trace what memory backs a bufferChristoph Hellwig
Add three trace points for the different backing memory allocators for buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10xfs: cleanup mapping tmpfs folios into the buffer cacheChristoph Hellwig
Directly assign b_addr based on the tmpfs folios without a detour through pages, reuse the folio_put path used for non-tmpfs buffers and replace all references to pages in comments with folios. Partially based on a patch from Dave Chinner <dchinner@redhat.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10xfs: use vmalloc instead of vm_map_area for buffer backing memoryChristoph Hellwig
The fallback buffer allocation path currently open codes a suboptimal version of vmalloc to allocate pages that are then mapped into vmalloc space. Switch to using vmalloc instead, which uses all the optimizations in the common vmalloc code, and removes the need to track the backing pages in the xfs_buf structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
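A hedged sketch of the simplified fallback: one __vmalloc() call replaces the open-coded allocate-pages-then-map sequence, so no page array needs to be tracked in the buffer (field use simplified):

    static int buf_alloc_vmalloc_sketch(struct xfs_buf *bp, size_t size, gfp_t gfp)
    {
            bp->b_addr = __vmalloc(size, gfp);      /* common vmalloc path */
            return bp->b_addr ? 0 : -ENOMEM;
    }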
2025-03-10xfs: buffer items don't straddle pages anymoreDave Chinner
Unmapped buffers don't exist anymore, so the page straddling detection and slow path code can go away now. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10xfs: kill XBF_UNMAPPEDChristoph Hellwig
Unmapped buffer access is a pain, so kill it. The switch to large folios means we rarely pay a vmap penalty for large buffers, so this functionality is largely unnecessary now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10xfs: convert buffer cache to use high order foliosChristoph Hellwig
Now that we have the buffer cache using the folio API, we can extend the use of folios to allocate high order folios for multi-page buffers rather than an array of single pages that are then vmapped into a contiguous range. This creates a new type of single folio buffers that can have arbitrary order in addition to the existing multi-folio buffers made up of many single page folios that get vmapped. The single folio is for now stashed into the existing b_pages array, but that will go away entirely later in the series and remove the temporary page vs folio typing issues, which only work because the two structures can currently be used largely interchangeably. The code that allocates buffers will optimistically attempt a high order folio allocation as a fast path if the buffer size is a power of two and thus fits into a folio. If this high order allocation fails, then we fall back to the existing multi-folio allocation code. This now forms the slow allocation path, and hopefully will be largely unused in normal conditions except for buffers with sizes that are not a power of two like larger remote xattrs. This should improve performance of large buffer operations (e.g. large directory block sizes) as we should now mostly avoid the expense of vmapping large buffers (and the vmap lock contention that can occur) as well as avoid the runtime pressure that frequently accessing kernel vmapped pages puts on the TLBs. Based on a patch from Dave Chinner <dchinner@redhat.com>, but mutilated beyond recognition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
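A hedged sketch of the fast-path decision (the helper name is illustrative): sizes that are a power of two fit one high-order folio; anything else, or an allocation failure, takes the existing multi-folio plus vmap slow path:

    static struct folio *buf_alloc_folio_fast(size_t size, gfp_t gfp)
    {
            if (!is_power_of_2(size))
                    return NULL;    /* e.g. larger remote xattrs: slow path */
            /* optimistic single high-order folio covering the whole buffer */
            return folio_alloc(gfp, get_order(size));
    }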
2025-03-10xfs: remove the kmalloc to page allocator fallbackChristoph Hellwig
Since commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), kmalloc and friends guarantee that power-of-two sized allocations are naturally aligned. Limit our use of kmalloc for buffers to these power-of-two sizes and remove the fallback to the page allocator for this case, but keep a check in addition to trusting the slab allocator to get the alignment right. Also refactor the kmalloc path to reuse various calculations for the size and gfp flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
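A hedged sketch of the resulting kmalloc path, keeping the extra alignment check on top of the slab guarantee (the helper name is illustrative):

    static void *buf_alloc_kmem_sketch(size_t size, gfp_t gfp)
    {
            void *p;

            if (!is_power_of_2(size))
                    return NULL;            /* caller falls back to folios */
            p = kmalloc(size, gfp);
            /* natural alignment check: don't just trust the slab allocator */
            if (p && !IS_ALIGNED((unsigned long)p, size)) {
                    kfree(p);
                    return NULL;
            }
            return p;
    }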
2025-03-10xfs: refactor backing memory allocations for buffersChristoph Hellwig
Lift handling of shmem and slab backed buffers into xfs_buf_alloc_pages and rename the result to xfs_buf_alloc_backing_mem. This shares more code and ensures uncached buffers can also use slab, which slightly reduces the memory usage of growfs on 512 byte sector size file systems, but more importantly means the allocation invariants are the same for cached and uncached buffers. Document these new invariants with a big fat comment mostly stolen from a patch by Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10xfs: remove xfs_buf_is_vmappedChristoph Hellwig
No need to look at the page count if we can simply call is_vmalloc_addr on bp->b_addr. This prepares for eventually removing the b_page_count field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
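The replacement test is a one-liner; a hedged sketch:

    static inline bool buf_is_vmapped_sketch(const struct xfs_buf *bp)
    {
            return is_vmalloc_addr(bp->b_addr);     /* no page count needed */
    }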
2025-03-10xfs: remove xfs_buf.b_offsetChristoph Hellwig
b_offset is only set for slab backed buffers and always set to offset_in_page(bp->b_addr), which can be done just as easily in the only user of b_offset. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10xfs: add a fast path to xfs_buf_zero when b_addr is setChristoph Hellwig
No need to walk the page list if bp->b_addr is valid. That also means b_offset doesn't need to be taken into account in the unmapped loop as b_offset is only set for kmem backed buffers which are always mapped. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
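A hedged fragment of the added fast path, using xfs_buf_zero()'s parameter names:

    if (bp->b_addr) {
            memset(bp->b_addr + boff, 0, bsize);    /* mapped: one memset */
            return;
    }
    /* otherwise fall through to the per-page loop */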
2025-03-10xfs: unmapped buffer item size straddling mismatchDave Chinner
We never log large contiguous regions of unmapped buffers, so this bug is never triggered by the current code. However, the slowpath for formatting buffer straddling regions is broken. That is, the size and shape of the log vector calculated across a straddle does not match how the formatting code formats a straddle. This results in a log vector with an uninitialised iovec and this causes a crash when xlog_write_full() goes to copy the iovec into the journal. Whilst touching this code, don't bother checking mapped or single folio buffers for discontiguous regions because they don't have them. This significantly reduces the overhead of this check when logging large buffers as calling xfs_buf_offset() is not free and it occurs a *lot* in those cases. Fixes: 929f8b0deb83 ("xfs: optimise xfs_buf_item_size/format for contiguous regions") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-03-10sysctl: Fixes nsm_local_state boundsNicolas Bouchinet
Bound nsm_local_state sysctl writes between SYSCTL_ZERO and SYSCTL_INT_MAX. The proc_handler has thus been updated to proc_dointvec_minmax. Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr> [ cel: updated to handle zero - UINT_MAX instead ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
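A hedged sketch of a bounded entry as the message describes it; per the bracketed note, the final version handles 0..UINT_MAX via the unsigned handler variant instead:

    static struct ctl_table nsm_sysctl_sketch[] = {
            {
                    .procname       = "nsm_local_state",
                    .data           = &nsm_local_state,
                    .maxlen         = sizeof(int),
                    .mode           = 0644,
                    .proc_handler   = proc_dointvec_minmax,
                    .extra1         = SYSCTL_ZERO,          /* lower bound */
                    .extra2         = SYSCTL_INT_MAX,       /* upper bound */
            },
    };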
2025-03-10nfsd: use a long for the count in nfsd4_state_shrinker_count()Jeff Layton
If there are no courtesy clients then the return value from the atomic_long_read() could overflow an int. Use a long to store the value instead. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
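A hedged sketch of the width fix, reduced to the truncation point (the counter argument is illustrative); shrinker count callbacks return unsigned long, so the intermediate must not narrow to int:

    static unsigned long shrinker_count_sketch(atomic_long_t *counter)
    {
            long count = atomic_long_read(counter); /* full width, no int truncation */

            return count > 0 ? (unsigned long)count : 0;
    }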
2025-03-10nfsd: remove obsolete comment from nfs4_alloc_stidJeff Layton
idr_alloc_cyclic() is what guarantees that now, not this long-gone trick. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()Jeff Layton
This isn't needed. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10nfsd: reorganize struct nfs4_delegation for better packingJeff Layton
Move dl_type field above dl_time, which shaves 8 bytes off this struct. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10nfsd: handle errors from rpc_call_async()Jeff Layton
It's possible for rpc_call_async() to fail (mainly due to memory allocation failure). If it does, there isn't much recourse other than to requeue the callback and try again later. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
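A hedged fragment of the error handling (nfsd4_cb_ops and the flag names follow the neighboring entries; exact types and signatures are assumptions):

    if (rpc_call_async(clnt, &msg, RPC_TASK_SOFT, &nfsd4_cb_ops, cb))
            set_bit(NFSD4_CALLBACK_REQUEUE, &cb->cb_flags); /* retry later */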
2025-03-10nfsd: move cb_need_restart flag into cb_flagsJeff Layton
Since there is now a cb_flags word, use a new NFSD4_CALLBACK_REQUEUE flag in that instead of a separate boolean. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNINGJeff Layton
These flags serve essentially the same purpose and get set and cleared at the same time. Drop CB_GETATTR_BUSY and just use NFSD4_CALLBACK_RUNNING instead. For this to work, we must use clear_and_wake_up_bit(), but doing that for other types of callbacks is wasteful. Declare a new NFSD4_CALLBACK_WAKE flag in cb_flags to indicate that wake_up is needed, and only set that for CB_GETATTRs. Also, make the wait use a TASK_UNINTERRUPTIBLE sleep. This is done in the context of an nfsd thread, and it should never need to deal with signals. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
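A hedged sketch of both sides of the pattern, using the flag and field names given above (surrounding structure simplified):

    /* completion side: only CB_GETATTR-style callbacks pay for a wake-up */
    static void cb_done_sketch(struct nfsd4_callback *cb)
    {
            if (test_bit(NFSD4_CALLBACK_WAKE, &cb->cb_flags))
                    clear_and_wake_up_bit(NFSD4_CALLBACK_RUNNING, &cb->cb_flags);
            else
                    clear_bit(NFSD4_CALLBACK_RUNNING, &cb->cb_flags);
    }

    /* waiter side runs in an nfsd thread, so it sleeps uninterruptibly */
    wait_on_bit(&cb->cb_flags, NFSD4_CALLBACK_RUNNING, TASK_UNINTERRUPTIBLE);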
2025-03-10nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANYJeff Layton
deleg_reaper() will walk the client_lru list and put any suitable entries onto "cblist" using the cl_ra_cblist pointer. It then walks the objects outside the spinlock and queues callbacks for them. None of the operations that deleg_reaper() does outside the nn->client_lock are blocking operations. Just queue their workqueue jobs under the nn->client_lock instead. Also, the NFSD4_CLIENT_CB_RECALL_ANY and NFSD4_CALLBACK_RUNNING flags serve an identical purpose now. Drop the NFSD4_CLIENT_CB_RECALL_ANY flag and just use the one in the callback. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>