summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-04-23xfs: use helpers to extract xattr op from opflagsDarrick J. Wong
Create helper functions to extract the xattr op from the ondisk xattri log item and the incore attr intent item. These will get more use in the patches that follow. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: restructure xfs_attr_complete_op a bitDarrick J. Wong
Eliminate the local variable from this function so that we can streamline things a bit later when we add the PPTR_REPLACE op code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: check shortform attr entry flags specificallyDarrick J. Wong
While reviewing flag checking in the attr scrub functions, we noticed that the shortform attr scanner didn't catch entries that have the LOCAL or INCOMPLETE bits set. Neither of these flags can ever be set on a shortform attr, so we need to check this narrower set of valid flags. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: fix missing check for invalid attr flagsDarrick J. Wong
The xattr scrubber doesn't check for undefined flags in shortform attr entries. Therefore, define a mask XFS_ATTR_ONDISK_MASK that has all possible XFS_ATTR_* flags in it, and use that to check for unknown bits in xchk_xattr_actor. Refactor the check in the dabtree scanner function to use the new mask as well. The redundant checks need to be in place because the dabtree check examines the hash mappings and therefore needs to decode the attr leaf entries to compute the namehash. This happens before the walk of the xattr entries themselves. Fixes: ae0506eba78fd ("xfs: check used space of shortform xattr structures") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2Darrick J. Wong
Check that the number of recovered log iovecs is what is expected for the xattri opcode is expecting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are availableDarrick J. Wong
Per reviewer request, use an OPSTATE flag (+ helpers) to decide if logged xattrs are enabled, instead of querying the xfs_sb. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recoveryDarrick J. Wong
The XFS_SB_FEAT_INCOMPAT_LOG_XATTRS feature bit protects a filesystem from old kernels that do not know how to recover extended attribute log intent items. Make this check mandatory instead of a debugging assert. Fixes: fd920008784ea ("xfs: Set up infrastructure for log attribute replay") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: attr fork iext must be loaded before calling xfs_attr_is_leafDarrick J. Wong
Christoph noticed that the xfs_attr_is_leaf in xfs_attr_get_ilocked can access the incore extent tree of the attr fork, but nothing in the xfs_attr_get path guarantees that the incore tree is actually loaded. Most of the time it is, but seeing as xfs_attr_is_leaf ignores the return value of xfs_iext_get_extent I guess we've been making choices based on random stack contents and nobody's complained? Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: rearrange xfs_da_args a bit to use less spaceDarrick J. Wong
A few notes about struct xfs_da_args: The XFS_ATTR_* flags only go up as far as XFS_ATTR_INCOMPLETE, which means that attr_filter could be a u8 field. I've reduced the number of XFS_DA_OP_* flags down to the point where op_flags would also fit into a u8. filetype has 7 bytes of slack after it, which is wasteful. namelen will never be greater than MAXNAMELEN, which is 256. This field could be reduced to a short. Rearrange the fields in xfs_da_args to waste less space. This reduces the structure size from 136 bytes to 128. Later when we add extra fields to support parent pointer replacement, this will only bloat the structure to 144 bytes, instead of 168. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: make attr removal an explicit operationDarrick J. Wong
Parent pointers match attrs on name+value, unlike everything else which matches on only the name. Therefore, we cannot keep using the heuristic that !value means remove. Make this an explicit operation code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: remove xfs_da_args.attr_flagsDarrick J. Wong
This field only ever contains XATTR_{CREATE,REPLACE}, and it only goes as deep as xfs_attr_set. Remove the field from the structure and replace it with an enum specifying exactly what kind of change we want to make to the xattr structure. Upsert is the name that we'll give to the flags==0 operation, because we're either updating an existing value or inserting it, and the caller doesn't care. Note: The "UPSERTR" name created here is to make userspace porting easier. It will be removed in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: remove XFS_DA_OP_NOTIMEDarrick J. Wong
The only user of this flag sets it prior to an xfs_attr_get_ilocked call, which doesn't update anything. Get rid of the flag. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23xfs: remove XFS_DA_OP_REMOVEDarrick J. Wong
Nobody checks this flag, so get rid of it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-23udf: Use a folio in udf_write_end()Matthew Wilcox (Oracle)
Convert the page to a folio and use the folio APIs. Replaces three calls to compound_head() with one. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240417150416.752929-8-willy@infradead.org>
2024-04-23udf: Convert udf_page_mkwrite() to use a folioMatthew Wilcox (Oracle)
Convert the vm_fault page to a folio, then use it throughout. Replaces five calls to compound_head() with one. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240417150416.752929-7-willy@infradead.org>
2024-04-23udf: Convert udf_symlink_getattr() to use a folioMatthew Wilcox (Oracle)
We're getting this from the page cache, so it's definitely a folio. Saves a call to compound_head() hidden in put_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240417150416.752929-6-willy@infradead.org>
2024-04-23udf: Convert udf_adinicb_readpage() to udf_adinicb_read_folio()Matthew Wilcox (Oracle)
Now that all three callers have a folio, convert this function to take a folio, and use the folio APIs. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240417150416.752929-5-willy@infradead.org>
2024-04-23udf: Convert udf_expand_file_adinicb() to use a folioMatthew Wilcox (Oracle)
Use the folio APIs throughout this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Fixes: 1eeceaec794e ("udf: Convert udf_expand_file_adinicb() to avoid kmap_atomic()") Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240417150416.752929-4-willy@infradead.org>
2024-04-23udf: Convert udf_write_begin() to use a folioMatthew Wilcox (Oracle)
Use the folio APIs throughout instead of the deprecated page APIs. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240417150416.752929-3-willy@infradead.org>
2024-04-23udf: Convert udf_symlink_filler() to use a folioMatthew Wilcox (Oracle)
Remove the conversion to struct page and use folio APIs throughout. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240417150416.752929-2-willy@infradead.org>
2024-04-23Merge 6.9-rc5 into driver-core-nextGreg Kroah-Hartman
We want the kernfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-23netfs: Fix writethrough-mode error handlingDavid Howells
Fix the error return in netfs_perform_write() acting in writethrough-mode to return any cached error in the case that netfs_end_writethrough() returns 0. This can affect the use of O_SYNC/O_DSYNC/RWF_SYNC/RWF_DSYNC in 9p and afs. Fixes: 41d8e7673a77 ("netfs: Implement a write-through caching option") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/6736.1713343639@warthog.procyon.org.uk Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-23ntfs3: add legacy ntfs file operationsChristian Brauner
To ensure that ioctl()s can't be used to circumvent write restrictions. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-23ntfs3: enforce read-only when used as legacy ntfs driverChristian Brauner
Ensure that ntfs3 is mounted read-only when it is used to provide the legacy ntfs driver. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-23fs/ntfs3: Mark volume as dirty if xattr is brokenKonstantin Komarov
Mark a volume as corrupted if the name length exceeds the space occupied by ea. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-04-23fs/ntfs3: Always make file nonresident on fallocate callKonstantin Komarov
xfstest 438 is starting to pass with this change. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-04-23fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inodeKonstantin Komarov
As Al Viro correctly pointed out, there is no need to return the whole structure to check the error. https://lore.kernel.org/ntfs3/20240322023515.GK538574@ZenIV/ Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-04-23fs/ntfs3: Use variable length array instead of fixed sizeKonstantin Komarov
Should fix smatch warning: ntfs_set_label() error: __builtin_memcpy() 'uni->name' too small (20 vs 256) Fixes: 4534a70b7056f ("fs/ntfs3: Add headers and misc files") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202401091421.3RJ24Mn3-lkp@intel.com/ Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-04-23fs/ntfs3: Use 64 bit variable to avoid 32 bit overflowKonstantin Komarov
For example, in the expression: vbo = 2 * vbo + skip Fixes: b46acd6a6a627 ("fs/ntfs3: Add NTFS journal") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-04-23fs/ntfs3: Check 'folio' pointer for NULLKonstantin Komarov
It can be NULL if bmap is called. Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-04-23fs/ntfs3: Missed le32_to_cpu conversionKonstantin Komarov
NTFS data structure fields are stored in little-endian, it is necessary to take this into account when working on big-endian architectures. Fixes: 1b7dd28e14c47("fs/ntfs3: Correct function is_rst_area_valid") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-04-23fs/ntfs3: Remove max link count info display during driver initKonstantin Komarov
Removes the output of this purely informational message from the kernel buffer: "ntfs3: Max link count 4000" Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: stable@vger.kernel.org
2024-04-23fs/ntfs3: Taking DOS names into account during link countingKonstantin Komarov
When counting and checking hard links in an ntfs file record, struct MFT_REC { struct NTFS_RECORD_HEADER rhdr; // 'FILE' __le16 seq; // 0x10: Sequence number for this record. >> __le16 hard_links; // 0x12: The number of hard links to record. __le16 attr_off; // 0x14: Offset to attributes. ... the ntfs3 driver ignored short names (DOS names), causing the link count to be reduced by 1 and messages to be output to dmesg. For Windows, such a situation is a minor error, meaning chkdsk does not report errors on such a volume, and in the case of using the /f switch, it silently corrects them, reporting that no errors were found. This does not affect the consistency of the file system. Nevertheless, the behavior in the ntfs3 driver is incorrect and changes the content of the file system. This patch should fix that. PS: most likely, there has been a confusion of concepts MFT_REC::hard_links and inode::__i_nlink. Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: stable@vger.kernel.org
2024-04-22Merge tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Five ksmbd server fixes, most also for stable: - rename fix - two fixes for potential out of bounds - fix for connections from MacOS (padding in close response) - fix for when to enable persistent handles" * tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: add continuous availability share parameter ksmbd: common: use struct_group_attr instead of struct_group for network_open_info ksmbd: clear RENAME_NOREPLACE before calling vfs_rename ksmbd: validate request buffer size in smb2_allocate_rsp_buf() ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
2024-04-22Merge tag 'bcachefs-2024-04-22' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Nothing too crazy in this one, and it looks like (fingers crossed) the recovery and repair issues are settling down - although there's going to be a long tail there, as we've still yet to really ramp up on error injection or syzbot. - fix a few more deadlocks in recovery - fix u32/u64 issues in mi_btree_bitmap - btree key cache shrinker now actually frees, with more instrumentation coming so we can verify that it's working correctly more easily in the future" * tag 'bcachefs-2024-04-22' of https://evilpiepirate.org/git/bcachefs: bcachefs: If we run merges at a lower watermark, they must be nonblocking bcachefs: Fix inode early destruction path bcachefs: Fix deadlock in journal write path bcachefs: Tweak btree key cache shrinker so it actually frees bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulong bcachefs: Fix missing call to bch2_fs_allocator_background_exit() bcachefs: Check for journal entries overruning end of sb clean section bcachefs: Fix bio alloc in check_extent_checksum() bcachefs: fix leak in bch2_gc_write_reflink_key bcachefs: KEY_TYPE_error is allowed for reflink bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shift bcachefs: make sure to release last journal pin in replay bcachefs: node scan: ignore multiple nodes with same seq if interior bcachefs: Fix format specifier in validate_bset_keys() bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINE
2024-04-22cifs: reinstate original behavior again for forceuid/forcegidTakayuki Nagata
forceuid/forcegid should be enabled by default when uid=/gid= options are specified, but commit 24e0a1eff9e2 ("cifs: switch to new mount api") changed the behavior. Due to the change, a mounted share does not show intentional uid/gid for files and directories even though uid=/gid= options are specified since forceuid/forcegid are not enabled. This patch reinstates original behavior that overrides uid/gid with specified uid/gid by the options. Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Signed-off-by: Takayuki Nagata <tnagata@redhat.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-22 fs/9p: mitigate inode collisionsEric Van Hensbergen
Detect and mitigate inode collsions that now occur since we fixed 9p generating duplicate inode structures. Underlying cause of these appears to be a race condition between reuse of inode numbers in underlying file system and cleanup of inode numbers in the client. Enabling caching makes this much more likely to happen as it increases cleanup latency due to writebacks. Reported-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-04-22fuse: verify zero padding in fuse_backing_mapAmir Goldstein
To allow us extending the interface in the future. Fixes: 44350256ab94 ("fuse: implement ioctls to manage backing files") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-04-22xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1)Christoph Hellwig
Commit aff3a9edb708 ("xfs: Use preallocation for inodes with extsz hints") disabled delayed allocation for all inodes with extent size hints due a data exposure problem. It turns out we fixed this data exposure problem since by always creating unwritten extents for delalloc conversions due to more data exposure problems, but the writeback path doesn't actually support extent size hints when converting delalloc these days, which probably isn't a problem given that people using the hints know what they get. However due to the way how xfs_get_extsz_hint is implemented, it always claims an extent size hint for RT inodes even if the RT extent size is a single FSB. Due to that the above commit effectively disabled delalloc support for RT inodes. Switch xfs_get_extsz_hint to return 0 for this case and work around that in a few places to reinstate delalloc support for RT inodes on file systems with an sb_rextsize of 1. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: stop the steal (of data blocks for RT indirect blocks)Christoph Hellwig
When xfs_bmap_del_extent_delay has to split an indirect block it tries to steal blocks from the the part that gets unmapped to increase the indirect block reservation that now needs to cover for two extents instead of one. This works perfectly fine on the data device, where the data and indirect blocks come from the same pool. It has no chance of working when the inode sits on the RT device. To support re-enabling delalloc for inodes on the RT device, make this behavior conditional on not being for rt extents. Note that split of delalloc extents should only happen on writeback failure, as for other kinds of hole punching we first write back all data and thus convert the delalloc reservations covering the hole to a real allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: rework splitting of indirect block reservationsChristoph Hellwig
Move the check if we have enough indirect blocks and the stealing of the deleted extent blocks out of xfs_bmap_split_indlen and into the caller to prepare for handling delayed allocation of RT extents that can't easily be stolen. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocationsChristoph Hellwig
Add a check for files on the RT subvolume and use m_frextents instead of m_fdblocks to adjust the preallocation size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: support RT inodes in xfs_mod_delallocChristoph Hellwig
To prepare for re-enabling delalloc on RT devices, track the data blocks (which use the RT device when the inode sits on it) and the indirect blocks (which don't) separately to xfs_mod_delalloc, and add a new percpu counter to also track the RT delalloc blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delayChristoph Hellwig
The code to account fdblocks and frextents in xfs_bmap_del_extent_delay is a bit weird in that it accounts frextents before the iext tree manipulations and fdblocks after it. Given that the iext tree manipulations cannot fail currently that's not really a problem, but still odd. Move the frextent manipulation to the end, and use a fdblocks variable to account of the unconditional indirect blocks and the data blocks only freed for !RT. This prepares for following updates in the area and already makes the code more readable. Also remove the !isrt assert given that this code clearly handles rt extents correctly, and we'll soon reinstate delalloc support for RT inodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: reinstate RT support in xfs_bmapi_reserve_delallocChristoph Hellwig
Allocate data blocks for RT inodes using xfs_dec_frextents. While at it optimize the data device case by doing only a single xfs_dec_fdblocks call for the extent itself and the indirect blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: split xfs_mod_freecounterChristoph Hellwig
xfs_mod_freecounter has two entirely separate code paths for adding or subtracting from the free counters. Only the subtract case looks at the rsvd flag and can return an error. Split xfs_mod_freecounter into separate helpers for subtracting or adding the freecounter, and remove all the impossible to reach error handling for the addition case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positiveChristoph Hellwig
And to make that more clear, rearrange the code a bit and add asserts and a comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: move RT inode locking out of __xfs_bunmapiChristoph Hellwig
__xfs_bunmapi is a bit of an odd place to lock the rtbitmap and rtsummary inodes given that it is very high level code. While this only looks ugly right now, it will become a problem when supporting delayed allocations for RT inodes as __xfs_bunmapi might end up deleting only delalloc extents and thus never unlock the rt inodes. Move the locking into xfs_bmap_del_extent_real just before the call to xfs_rtfree_blocks instead and use a new flag in the transaction to ensure that the locking happens only once. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: free RT extents after updating the bmap btreeChristoph Hellwig
Currently xfs_bmap_del_extent_real frees RT extents before updating the bmap btree, while it frees regular blocks after performing the bmap btree update for convoluted historic reasons. Switch to free the RT blocks in the same place as the regular data blocks instead to simply the code and fix a very theoretical bug. A short history of this code researched by Dave Chiner below: The truncate for data device extents was originally a two-phase operation. First it removed the bmapbt record, but because this can free BMBT extents, it can use up all the free space tree reservation space. So the transaction gets rolled to commit the BMBT change and the xfs_bmap_finish() call that frees the data extent runs with a new transaction reservation that allows different free space btrees to be logged without overrun. However, on crash, this could lose the free space because there was nothing to tell recovery about the extents removed from the BMBT, hence EFIs were introduced. They tie the extent free operation to the bmapbt record removal commit for recovery of the second phase of the extent removal process. Then RT extents came along. RT extent freeing does not require a free space btree reservation because the free space metadata is static and transaction size is bound. Hence we don't need to care if the BMBT record removal modifies the per-ag free space trees and we don't need a two-phase extent remove transaction. The only thing we have to care about is not losing space on crash. Hence instead of recording the extent for freeing in the bmap list for xfs_bmap_finish() to process in a new transaction, it simply freed the rtextent directly. So the original code (from 1994) simply replaced the "free AG extent later" queueing with a direct free. This code was originally at the start of xfs_dmap_del_extent(), but the xfs_bmap_add_free() got moved to the end of the function via the "do_fx" flag (the current code logic) in 1997 (commit c4fac74eaa58 in the historic xfs-import tree) because there was a shutdown occurring because of a case where splitting the extent record failed because the BMBT split and the filesystem didn't have enough space for the split to be done. (FWIW, I'm not sure this can happen anymore.) The commit backed out the BMBT change on ENOSPC error, and in doing so I think this actually breaks RT free space tracking. However, it then returns an ENOSPC error, and we have a dirty transaction in the RT case so this will shut down the filesysetm when the transaction is cancelled. Hence the corrupted "bmbt now points at freed rt dev space" condition never make it to disk, but it's still the wrong way to handle the issue. IOWs, this proposed change fixes that "shutdown at ENOSPC on rt devices" situation that was introduced by the above commit back in 1997. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: refactor realtime inode lockingChristoph Hellwig
Create helper functions to deal with locking realtime metadata inodes. This enables us to maintain correct locking order once we start adding the realtime rmap and refcount btree inodes. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>