summaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)Author
2024-07-04xfs: AIL doesn't need manual pushingDave Chinner
We have a mechanism that checks the amount of log space remaining available every time we make a transaction reservation. If the amount of space is below a threshold (25% free) we push on the AIL to tell it to do more work. To do this, we end up calculating the LSN that the AIL needs to push to on every reservation and updating the push target for the AIL with that new target LSN. This is silly and expensive. The AIL is perfectly capable of calculating the push target itself, and it will always be running when the AIL contains objects. What the target does is determine if the AIL needs to do any work before it goes back to sleep. If we haven't run out of reservation space or memory (or some other push all trigger), it will simply go back to sleep for a while if there is more than 25% of the journal space free without doing anything. If there are items in the AIL at a lower LSN than the target, it will try to push up to the target or to the point of getting stuck before going back to sleep and trying again soon after.` Hence we can modify the AIL to calculate it's own 25% push target before it starts a push using the same reserve grant head based calculation as is currently used, and remove all the places where we ask the AIL to push to a new 25% free target. We can also drop the minimum free space size of 256BBs from the calculation because the 25% of a minimum sized log is *always going to be larger than 256BBs. This does still require a manual push in certain circumstances. These circumstances arise when the AIL is not full, but the reservation grants consume the entire of the free space in the log. In this case, we still need to push on the AIL to free up space, so when we hit this condition (i.e. reservation going to sleep to wait on log space) we do a single push to tell the AIL it should empty itself. This will keep the AIL moving as new reservations come in and want more space, rather than keep queuing them and having to push the AIL repeatedly. The reason for using the "push all" when grant space runs out is that we can run out of grant space when there is more than 25% of the log free. Small logs are notorious for this, and we have a hack in the log callback code (xlog_state_set_callback()) where we push the AIL because the *head* moved) to ensure that we kick the AIL when we consume space in it because that can push us over the "less than 25% available" available that starts tail pushing back up again. Hence when we run out of grant space and are going to sleep, we have to consider that the grant space may be consuming almost all the log space and there is almost nothing in the AIL. In this situation, the AIL pins the tail and moving the tail forwards is the only way the grant space will come available, so we have to force the AIL to push everything to guarantee grant space will eventually be returned. Hence triggering a "push all" just before sleeping removes all the nasty corner cases we have in other parts of the code that work around the "we didn't ask the AIL to push enough to free grant space" condition that leads to log space hangs... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-07-04xfs: move and rename xfs_trans_committed_bulkDave Chinner
Ever since the CIL and delayed logging was introduced, xfs_trans_committed_bulk() has been a purely CIL checkpoint completion function and not a transaction commit completion function. Now that we are adding log specific updates to this function, it really does not have anything to do with the transaction subsystem - it is really log and log item level functionality. This should be part of the CIL code as it is the callback that moves log items from the CIL checkpoint to the AIL. Move it and rename it to xlog_cil_ail_insert(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-07-04xfs: Avoid races with cnt_btree lastrec updatesZizhi Wo
A concurrent file creation and little writing could unexpectedly return -ENOSPC error since there is a race window that the allocator could get the wrong agf->agf_longest. Write file process steps: 1) Find the entry that best meets the conditions, then calculate the start address and length of the remaining part of the entry after allocation. 2) Delete this entry and update the -current- agf->agf_longest. 3) Insert the remaining unused parts of this entry based on the calculations in 1), and update the agf->agf_longest again if necessary. Create file process steps: 1) Check whether there are free inodes in the inode chunk. 2) If there is no free inode, check whether there has space for creating inode chunks, perform the no-lock judgment first. 3) If the judgment succeeds, the judgment is performed again with agf lock held. Otherwire, an error is returned directly. If the write process is in step 2) but not go to 3) yet, the create file process goes to 2) at this time, it may be mistaken for no space, resulting in the file system still has space but the file creation fails. We have sent two different commits to the community in order to fix this problem[1][2]. Unfortunately, both solutions have flaws. In [2], I discussed with Dave and Darrick, realized that a better solution to this problem requires the "last cnt record tracking" to be ripped out of the generic btree code. And surprisingly, Dave directly provided his fix code. This patch includes appropriate modifications based on his tmp-code to address this issue. The entire fix can be roughly divided into two parts: 1) Delete the code related to lastrec-update in the generic btree code. 2) Place the process of updating longest freespace with cntbt separately to the end of the cntbt modifications. Move the cursor to the rightmost firstly, and update the longest free extent based on the record. Note that we can not update the longest with xfs_alloc_get_rec() after find the longest record, as xfs_verify_agbno() may not pass because pag->block_count is updated on the outside. Therefore, use xfs_btree_get_rec() as a replacement. [1] https://lore.kernel.org/all/20240419061848.1032366-2-yebin10@huawei.com [2] https://lore.kernel.org/all/20240604071121.3981686-1-wozizhi@huawei.com Reported by: Ye Bin <yebin10@huawei.com> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-07-02xfs: move xfs_refcount_update_defer_add to xfs_refcount_item.cDarrick J. Wong
Move the code that adds the incore xfs_refcount_update_item deferred work data to a transaction live with the CUI log item code. This means that the refcount code no longer has to know about the inner workings of the CUI log items. As a consequence, we can get rid of the _{get,put}_group helpers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: simplify usage of the rcur local variable in xfs_refcount_finish_oneDarrick J. Wong
Only update rcur when we know the final *pcur value. Inspired-by: Christoph Hellwig <hch@lst.de> [djwong: don't leave the caller with a dangling ref] Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: don't bother calling xfs_refcount_finish_one_cleanup in ↵Darrick J. Wong
xfs_refcount_finish_one In xfs_refcount_finish_one we know the cursor is non-zero when calling xfs_refcount_finish_one_cleanup and we pass a 0 error variable. This means xfs_refcount_finish_one_cleanup is just doing a xfs_btree_del_cursor. Open code that and move xfs_refcount_finish_one_cleanup to fs/xfs/xfs_refcount_item.c. Inspired-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: reuse xfs_refcount_update_cancel_itemDarrick J. Wong
Reuse xfs_refcount_update_cancel_item to put the AG/RTG and free the item in a few places that currently open code the logic. Inspired-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: add a ci_entry helperDarrick J. Wong
Add a helper to translate from the item list head to the refcount_intent_item structure and use it so shorten assignments and avoid the need for extra local variables. Inspired-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: remove xfs_trans_set_refcount_flagsDarrick J. Wong
Remove this single-use helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: clean up refcount log intent item tracepoint callsitesDarrick J. Wong
Pass the incore refcount intent structure to the tracepoints instead of open-coding the argument passing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: pass btree cursors to refcount btree tracepointsDarrick J. Wong
Prepare the rest of refcount btree tracepoints for use with realtime reflink by making them take the btree cursor object as a parameter. This will save us a lot of trouble later on. Remove the xfs_refcount_recover_extent tracepoint since it's already covered by other refcount tracepoints. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: create specialized classes for refcount tracepointsDarrick J. Wong
The only user of the "ag" tracepoint event classes is the refcount btree, so rename them to make that obvious and make them take the btree cursor to simplify the arguments. This will save us a lot of trouble later on. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: give refcount btree cursor error tracepoints their own classDarrick J. Wong
Convert all the refcount tracepoints to use the btree error tracepoint class. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: move xfs_rmap_update_defer_add to xfs_rmap_item.cDarrick J. Wong
Move the code that adds the incore xfs_rmap_update_item deferred work data to a transaction to live with the RUI log item code. This means that the rmap code no longer has to know about the inner workings of the RUI log items. As a consequence, we can get rid of the _{get,put}_group helpers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: simplify usage of the rcur local variable in xfs_rmap_finish_oneChristoph Hellwig
Only update rcur when we know the final *pcur value. Signed-off-by: Christoph Hellwig <hch@lst.de> [djwong: don't leave the caller with a dangling ref] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: don't bother calling xfs_rmap_finish_one_cleanup in xfs_rmap_finish_oneChristoph Hellwig
In xfs_rmap_finish_one we known the cursor is non-zero when calling xfs_rmap_finish_one_cleanup and we pass a 0 error variable. This means xfs_rmap_finish_one_cleanup is just doing a xfs_btree_del_cursor. Open code that and move xfs_rmap_finish_one_cleanup to fs/xfs/xfs_rmap_item.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: minor porting changes] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: reuse xfs_rmap_update_cancel_itemChristoph Hellwig
Reuse xfs_rmap_update_cancel_item to put the AG/RTG and free the item in a few places that currently open code the logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: add a ri_entry helperChristoph Hellwig
Add a helper to translate from the item list head to the rmap_intent_item structure and use it so shorten assignments and avoid the need for extra local variables. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: remove xfs_trans_set_rmap_flagsDarrick J. Wong
Remove this single-use helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: clean up rmap log intent item tracepoint callsitesDarrick J. Wong
Pass the incore rmap structure to the tracepoints instead of open-coding the argument passing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: pass btree cursors to rmap btree tracepointsDarrick J. Wong
Prepare the rmap btree tracepoints for use with realtime rmap btrees by making them take the btree cursor object as a parameter. This will save us a lot of trouble later on. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: give rmap btree cursor error tracepoints their own classDarrick J. Wong
Create a new tracepoint class for btree-related errors, then convert all the rmap tracepoints to use it. Also fix the one tracepoint that was abusing the old class by making it a separate tracepoint. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: move xfs_extent_free_defer_add to xfs_extfree_item.cDarrick J. Wong
Move the code that adds the incore xfs_extent_free_item deferred work data to a transaction to live with the EFI log item code. This means that the allocator code no longer has to know about the inner workings of the EFI log items. As a consequence, we can get rid of the _{get,put}_group helpers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: remove xfs_defer_agfl_blockChristoph Hellwig
xfs_free_extent_later can handle the extra AGFL special casing with very little extra logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: remove duplicate asserts in xfs_defer_extent_freeChristoph Hellwig
The bno/len verification is already done by the calls to xfs_verify_rtbext / xfs_verify_fsbext, and reporting a corruption error seem like the better handling than tripping an assert anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: factor out a xfs_efd_add_extent helperChristoph Hellwig
Factor out a helper to add an extent to and EFD instead of duplicating the logic in two places. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: reuse xfs_extent_free_cancel_itemChristoph Hellwig
Reuse xfs_extent_free_cancel_item to put the AG/RTG and free the item in a few places that currently open code the logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: add a xefi_entry helperChristoph Hellwig
Add a helper to translate from the item list head to the xfs_extent_free_item structure and use it so shorten assignments and avoid the need for extra local variables. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: pass the fsbno to xfs_perag_intent_getChristoph Hellwig
All callers of xfs_perag_intent_get have a fsbno and need boilerplate code to turn that into an agno. Just pass the fsbno to xfs_perag_intent_get and look up the agno there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-07-02xfs: convert "skip_discard" to a proper flags bitsetDarrick J. Wong
Convert the boolean to skip discard on free into a proper flags field so that we can add more flags in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: clean up extent free log intent item tracepoint callsitesDarrick J. Wong
Pass the incore EFI structure to the tracepoints instead of open-coding the argument passing. This cleans up the call sites a bit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsbDarrick J. Wong
Currently, the XFS_SB_CRC_OFF macro uses the incore superblock struct (xfs_sb) to compute the address of sb_crc within the ondisk superblock struct (xfs_dsb). This is a landmine if we ever change the layout of the incore superblock (as we're about to do), so redefine the macro to use xfs_dsb to compute the layout of xfs_dsb. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: get rid of trivial rename helpersDarrick J. Wong
Get rid of the largely pointless xfs_cross_rename and xfs_finish_rename now that we've refactored its parent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: move dirent update hooks to xfs_dir2.cDarrick J. Wong
Move the directory entry update hook code to xfs_dir2 so that it is mostly consolidated with the higher level directory functions. Retain the exports so that online fsck can still send notifications through the hooks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: create libxfs helper to rename two directory entriesDarrick J. Wong
Create a new libxfs function to rename two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: create libxfs helper to exchange two directory entriesDarrick J. Wong
Create a new libxfs function to exchange two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: create libxfs helper to remove an existing inode/name from a directoryDarrick J. Wong
Create a new libxfs function to remove a (name, inode) entry from a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: hoist inode free function to libxfsDarrick J. Wong
Create a libxfs helper function that marks an inode free on disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: create libxfs helper to link an existing inode into a directoryDarrick J. Wong
Create a new libxfs function to link an existing inode into a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: create libxfs helper to link a new inode into a directoryDarrick J. Wong
Create a new libxfs function to link a newly created inode into a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: separate the icreate logic around INIT_XATTRSDarrick J. Wong
INIT_XATTRS is overloaded here -- it's set during the creat process when we think that we're immediately going to set some ACL xattrs to save time. However, it's also used by the parent pointers code to enable the attr fork in preparation to receive ppptr xattrs. This results in xfs_has_parent() branches scattered around the codebase to turn on INIT_XATTRS. Linkable files are created far more commonly than unlinkable temporary files or directory tree roots, so we should centralize this logic in xfs_inode_init. For the three callers that don't want parent pointers (online repiar tempfiles, unlinkable tempfiles, rootdir creation) we provide an UNLINKABLE flag to skip attr fork initialization. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: hoist xfs_{bump,drop}link to libxfsDarrick J. Wong
Move xfs_bumplink and xfs_droplink to libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: hoist xfs_iunlink to libxfsDarrick J. Wong
Move xfs_iunlink and xfs_iunlink_remove to libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: wrap inode creation dqalloc callsDarrick J. Wong
Create a helper that calls dqalloc to allocate and grab a reference to dquots for the user, group, and project ids listed in an icreate structure. This simplifies the creat-related dqalloc callsites scattered around the code base. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: push xfs_icreate_args creation out of xfs_create*Darrick J. Wong
Move the initialization of the xfs_icreate_args structure out of xfs_create and xfs_create_tempfile into their callers so that we can set the new inode's attributes in one place and pass that through instead of open coding the collection of attributes all over the code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: hoist new inode initialization functions to libxfsDarrick J. Wong
Move all the code that initializes a new inode's attributes from the icreate_args structure and the parent directory into libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: split new inode creation into two piecesDarrick J. Wong
There are two parts to initializing a newly allocated inode: setting up the incore structures, and initializing the new inode core based on the parent inode and the current user's environment. The initialization code is not specific to the kernel, so we would like to share that with userspace by hoisting it to libxfs. Therefore, split xfs_icreate into separate functions to prepare for the next few patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: use xfs_trans_ichgtime to set times when allocating inodeDarrick J. Wong
Use xfs_trans_ichgtime to set the inode times when allocating an inode, instead of open-coding them here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: implement atime updates in xfs_trans_ichgtimeDarrick J. Wong
Enable xfs_trans_ichgtime to change the inode access time so that we can use this function to set inode times when allocating inodes instead of open-coding it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-07-02xfs: pack icreate initialization parameters into a separate structureDarrick J. Wong
Callers that want to create an inode currently pass all possible file attribute values for the new inode into xfs_init_new_inode as ten separate parameters. This causes two code maintenance issues: first, we have large multi-line call sites which programmers must read carefully to make sure they did not accidentally invert a value. Second, all three file id parameters must be passed separately to the quota functions; any discrepancy results in quota count errors. Clean this up by creating a new icreate_args structure to hold all this information, some helpers to initialize them properly, and make the callers pass this structure through to the creation function, whose name we shorten to xfs_icreate. This eliminates the issues, enables us to keep the inode init code in sync with userspace via libxfs, and is needed for future metadata directory tree management. (A subsequent cleanup will also fix the quota alloc calls.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>