summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_trans.c
AgeCommit message (Collapse)Author
2018-09-29xfs: avoid lockdep false positives in xfs_trans_allocDave Chinner
We've had a few reports of lockdep tripping over memory reclaim context vs filesystem freeze "deadlocks". They all have looked to be false positives on analysis, but it seems that they are being tripped because we take freeze references before we run a GFP_KERNEL allocation for the struct xfs_trans. We can avoid this false positive vector just by re-ordering the operations in xfs_trans_alloc(). That is. we need allocate the structure before we take the freeze reference and enter the GFP_NOFS allocation context that follows the xfs_trans around. This prevents lockdep from seeing the GFP_KERNEL allocation inside the transaction context, and that prevents it from triggering the freeze level vs alloc context vs reclaim warnings. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-08-02xfs: fold dfops into the transactionBrian Foster
struct xfs_defer_ops has now been reduced to a single list_head. The external dfops mechanism is unused and thus everywhere a (permanent) transaction is accessible the associated dfops structure is as well. Remove the xfs_defer_ops structure and fold the list_head into the transaction. Also remove the last remnant of external dfops in xfs_trans_dup(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02xfs: replace xfs_defer_ops ->dop_pending with on-stack listBrian Foster
The xfs_defer_ops ->dop_pending list is used to track active deferred operations once intents are logged. These items must be aborted in the event of an error. The list is populated as intents are logged and items are removed as they complete (or are aborted). Now that xfs_defer_finish() cancels on error, there is no need to ever access ->dop_pending outside of xfs_defer_finish(). The list is only ever populated after xfs_defer_finish() begins and is either completed or cancelled before it returns. Remove ->dop_pending from xfs_defer_ops and replace it with a local list in the xfs_defer_finish() path. Pass the local list to the various helpers now that it is not accessible via dfops. Note that we have to check for NULL in the abort case as the final tx roll occurs outside of the scope of the new local list (once the dfops has completed and thus drained the list). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02xfs: cancel dfops on xfs_defer_finish() errorBrian Foster
The current semantics of xfs_defer_finish() require the caller to call xfs_defer_cancel() on error. This is slightly inconsistent with transaction commit error handling where a failed commit cleans up the transaction before returning. More significantly, the only requirement for exposure of ->dop_pending outside of xfs_defer_finish() is so that xfs_defer_cancel() can drain it on error. Since the only recourse of xfs_defer_finish() errors is cancellation, mirror the transaction logic and cancel remaining dfops before returning from xfs_defer_finish() with an error. Beside simplifying xfs_defer_finish() semantics, this ensures that xfs_defer_finish() always returns with an empty ->dop_pending and thus facilitates removal of the list from xfs_defer_ops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02xfs: clean out superfluous dfops dop params/varsBrian Foster
The dfops code still passes around the xfs_defer_ops pointer superfluously in a few places. Clean this up wherever the transaction will suffice. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02xfs: pass transaction to dfops reset/move helpersBrian Foster
All callers pass ->t_dfops of the associated transactions. Refactor the helpers to receive the transactions and facilitate further cleanups between xfs_defer_ops and xfs_trans. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02xfs: remove unused __xfs_defer_cancel() internal helperBrian Foster
With no more external dfops users, there is no need for an xfs_defer_ops cancel wrapper. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02xfs: refactor internal dfops initializationBrian Foster
The current transaction allocation code conditionally initializes the ->t_dfops indirection pointer. Transaction commit/cancel check the validity of the pointer to determine whether to finish/cancel the internal dfops. This disallows the ability to use the internal dfops list as a temporary container (via xfs_trans_alloc_empty()). Refactor transaction allocation to always initialize ->t_dfops and check permanent reservation state on transaction commit/cancel. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-26xfs: bypass final dfops roll in trans commit pathBrian Foster
Once xfs_defer_finish() has completed all deferred operations, it checks the dirty state of the transaction and rolls it once more to return a clean transaction for the caller. This primarily to cover the case where repeated xfs_defer_finish() calls are made in a loop and we need to make sure that the caller starts the next iteration with a clean transaction. Otherwise we risk transaction reservation overrun. This final transaction roll is not required in the transaction commit path, however, because the transaction is immediately committed and freed after dfops completion. Refactor the final roll into a separate helper such that we can avoid it in the transaction commit path. Lift the dfops reset as well so dfops remains valid until after the last call to xfs_defer_trans_roll(). The reset is also unnecessary in the transaction commit path because the transaction is about to complete. This eliminates unnecessary regrants of transactions where the associated transaction roll can be replaced by a transaction commit. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-26xfs: drop unnecessary xfs_defer_finish() dfops parameterBrian Foster
Every caller of xfs_defer_finish() now passes the transaction and its associated ->t_dfops. The xfs_defer_ops parameter is therefore no longer necessary and can be removed. Since most xfs_defer_finish() callers also have to consider xfs_defer_cancel() on error, update the latter to also receive the transaction for consistency. The log recovery code contains an outlier case that cancels a dfops directly without an available transaction. Retain an internal wrapper to support this outlier case for the time being. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-26xfs: support embedded dfops in transactionBrian Foster
The dfops structure used by multi-transaction operations is typically stored on the stack and carried around by the associated transaction. The lifecycle of dfops does not quite match that of the transaction, but they are tightly related in that the former depends on the latter. The relationship of these objects is tight enough that we can avoid the cumbersome boilerplate code required in most cases to manage them separately by just embedding an xfs_defer_ops in the transaction itself. This means that a transaction allocation returns with an initialized dfops, a transaction commit finishes pending deferred items before the tx commit, a transaction cancel cancels the dfops before the transaction and a transaction dup operation transfers the current dfops state to the new transaction. The dup operation is slightly complicated by the fact that we can no longer just copy a dfops pointer from the old transaction to the new transaction. This is solved through a dfops move helper that transfers the pending items and other dfops state across the transactions. This also requires that transaction rolling code always refer to the transaction for the current dfops reference. Finally, to facilitate incremental conversion to the internal dfops and continue to support the current external dfops mode of operation, create the new ->t_dfops_internal field with a layer of indirection. On allocation, ->t_dfops points to the internal dfops. This state is overridden by callers who re-init a local dfops on the transaction. Once ->t_dfops is overridden, the external dfops reference is maintained as the transaction rolls. This patch adds the fundamental ability to support an internal dfops. All codepaths that perform deferred processing continue to override the internal dfops until they are converted over in subsequent patches. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: add firstblock field to xfs_transBrian Foster
A firstblock var is typically allocated and initialized along with xfs_defer_ops structures and passed around independent from the associated transaction. To facilitate combining the two, add an optional ->t_firstblock field to xfs_trans that can be used in place of an on-stack variable. The firstblock value follows the lifetime of the transaction, so initialize it on allocation and when a transaction rolls. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: rename xfs_trans ->t_agfl_dfops to ->t_dfopsBrian Foster
The ->t_agfl_dfops field is currently used to defer agfl block frees from associated transaction contexts. While all known problematic contexts have already been updated to use ->t_agfl_dfops, the broader goal is defer agfl frees from all callers that already use a deferred operations structure. Further, the transaction field facilitates a good amount of code clean up where the transaction and dfops have historically been passed down through the stack separately. Rename the field to something more generic to prepare to use it as such throughout XFS. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-24xfs: allow empty transactions while frozenDarrick J. Wong
In commit e89c041338ed6ef ("xfs: implement the GETFSMAP ioctl") we created the ability to obtain empty transactions. These transactions have no log or block reservations and therefore can't modify anything. Since they're also NO_WRITECOUNT they can run while the fs is frozen, so we don't need to WARN_ON about that usage. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-06xfs: convert to SPDX license tagsDave Chinner
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: get rid of the log item descriptorDave Chinner
It's just a connector between a transaction and a log item. There's a 1:1 relationship between a log item descriptor and a log item, and a 1:1 relationship between a log item descriptor and a transaction. Both relationships are created and terminated at the same time, so why do we even have the descriptor? Replace it with a specific list_head in the log item and a new log item dirtied flag to replace the XFS_LID_DIRTY flag. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix up deferred agfl intent finish_item use of LID_DIRTY] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: add tracing to high level transaction operationsDave Chinner
Because currently we have no idea what the transaction context we are operating in is, and I need to know that information to track down bugs in multiple log item joins to transactions. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: log item flags are racyDave Chinner
The log item flags contain a field that is protected by the AIL lock - the XFS_LI_IN_AIL flag. We use non-atomic RMW operations to set and clear these flags, but most of the updates and checks are not done with the AIL lock held and so are susceptible to update races. Fix this by changing the log item flags to use atomic bitops rather than be reliant on the AIL lock for update serialisation. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: defer agfl block frees when dfops is availableBrian Foster
The AGFL fixup code executes before every block allocation/free and rectifies the AGFL based on the current, dynamic allocation requirements of the fs. The AGFL must hold a minimum number of blocks to satisfy a worst case split of the free space btrees caused by the impending allocation operation. The AGFL is also updated to maintain the implicit requirement for a minimum number of free slots to satisfy a worst case join of the free space btrees. Since the AGFL caches individual blocks, AGFL reduction typically involves multiple, single block frees. We've had reports of transaction overrun problems during certain workloads that boil down to AGFL reduction freeing multiple blocks and consuming more space in the log than was reserved for the transaction. Since the objective of freeing AGFL blocks is to ensure free AGFL free slots are available for the upcoming allocation, one way to address this problem is to release surplus blocks from the AGFL immediately but defer the free of those blocks (similar to how file-mapped blocks are unmapped from the file in one transaction and freed via a deferred operation) until the transaction is rolled. This turns AGFL reduction into an operation with predictable log reservation consumption. Add the capability to defer AGFL block frees when a deferred ops list is available to the AGFL fixup code. Add a dfops pointer to the transaction to carry dfops through various contexts to the allocator context. Deferring AGFL frees is conditional behavior based on whether the transaction pointer is populated. The long term objective is to reuse the transaction pointer to clean up all unrelated callchains that pass dfops on the stack along with a transaction and in doing so, consistently defer AGFL blocks from the allocator. A bit of customization is required to handle deferred completion processing because AGFL blocks are accounted against a per-ag reservation pool and AGFL blocks are not inserted into the extent busy list when freed (they are inserted when used and released back to the AGFL). Reuse the majority of the existing deferred extent free infrastructure and customize it appropriately to handle AGFL blocks. Note that this patch only adds infrastructure. It does not change behavior because no callers have been updated to pass ->t_agfl_dfops into the allocation code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-14xfs: merge _xfs_log_force_lsn and xfs_log_force_lsnChristoph Hellwig
Switch to a single interface for flushing the log to a specific LSN, which gives consistent trace point coverage and a less confusing interface. The was only a single user of the previous xfs_log_force_lsn function, which now also passes a NULL log_flushed argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-11xfs: shutdown if block allocation overruns tx reservationBrian Foster
The ->t_blk_res_used field tracks how many blocks have been used in the current transaction. This should never exceed the block reservation (->t_blk_res) for a particular transaction. We currently assert this condition in the transaction block accounting code, but otherwise take no additional action should this situation occur. The overrun generally has no effect if space ends up being available and the associated transaction commits. If the transaction is duplicated, however, the current block usage is used to determine the remaining block reservation to be transferred to the new transaction. If usage exceeds reservation, this calculation underflows and creates a transaction with an invalid and excessive reservation. When the second transaction commits, the release of unused blocks corrupts the in-core free space counters. With lazy superblock accounting enabled, this inconsistency eventually trickles to the on-disk superblock and corrupts the filesystem. Replace the transaction block usage accounting assert with an explicit overrun check. If the transaction overruns the reservation, shutdown the filesystem immediately to prevent corruption. Add a new assert to xfs_trans_dup() to catch any callers that might induce this invalid state in the future. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-11xfs: Rename xa_ elements to ail_Matthew Wilcox
This is a simple rename, except that xa_ail becomes ail_head. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-08xfs: trace log reservations at mount timeDarrick J. Wong
At each mount, emit the transaction reservation type information via tracepoints. This makes it easier to compare the log reservation info calculated by the kernel and xfsprogs so that we can more easily diagnose minimum log size failures on freshly formatted filesystems. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-09-01xfs: refactor xfs_trans_rollChristoph Hellwig
Split xfs_trans_roll into a low-level helper that just rolls the actual transaction and a new higher level xfs_trans_roll_inode that takes care of logging and rejoining the inode. This gets rid of the NULL inode case, and allows to simplify the special cases in the deferred operation code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-06Merge tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "Here are the XFS changes for 4.12. The big new feature for this release is the new space mapping ioctl that we've been discussing since LSF2016, but other than that most of the patches are larger bug fixes, memory corruption prevention, and other cleanups. Summary: - various code cleanups - introduce GETFSMAP ioctl - various refactoring - avoid dio reads past eof - fix memory corruption and other errors with fragmented directory blocks - fix accidental userspace memory corruptions - publish fs uuid in superblock - make fstrim terminatable - fix race between quotaoff and in-core inode creation - avoid use-after-free when finishing up w/ buffer heads - reserve enough space to handle bmap tree resizing during cow remap" * tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits) xfs: fix use-after-free in xfs_finish_page_writeback xfs: reserve enough blocks to handle btree splits when remapping xfs: wait on new inodes during quotaoff dquot release xfs: update ag iterator to support wait on new inodes xfs: support ability to wait on new inodes xfs: publish UUID in struct super_block xfs: Allow user to kill fstrim process xfs: better log intent item refcount checking xfs: fix up quotacheck buffer list error handling xfs: remove xfs_trans_ail_delete_bulk xfs: don't use bool values in trace buffers xfs: fix getfsmap userspace memory corruption while setting OF_LAST xfs: fix __user annotations for xfs_ioc_getfsmap xfs: corruption needs to respect endianess too! xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap xfs: simplify validation of the unwritten extent bit xfs: remove unused values from xfs_exntst_t xfs: remove the unused XFS_MAXLINK_1 define xfs: more do_div cleanups ...
2017-05-03xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFSMichal Hocko
xfs has defined PF_FSTRANS to declare a scope GFP_NOFS semantic quite some time ago. We would like to make this concept more generic and use it for other filesystems as well. Let's start by giving the flag a more generic name PF_MEMALLOC_NOFS which is in line with an exiting PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO contexts. Replace all PF_FSTRANS usage from the xfs code in the first step before we introduce a full API for it as xfs uses the flag directly anyway. This patch doesn't introduce any functional change. Link: http://lkml.kernel.org/r/20170306131408.9828-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-06xfs: fold __xfs_trans_roll into xfs_trans_rollChristoph Hellwig
No one cares about the low-level helper anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03xfs: implement the GETFSMAP ioctlDarrick J. Wong
Introduce a new ioctl that uses the reverse mapping btree to return information about the physical layout of the filesystem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2016-10-03Merge branch 'xfs-4.9-reflink-prep' into for-nextDave Chinner
2016-09-19xfs: set up per-AG free space reservationsDarrick J. Wong
One unfortunate quirk of the reference count and reverse mapping btrees -- they can expand in size when blocks are written to *other* allocation groups if, say, one large extent becomes a lot of tiny extents. Since we don't want to start throwing errors in the middle of CoWing, we need to reserve some blocks to handle future expansion. The transaction block reservation counters aren't sufficient here because we have to have a reserve of blocks in every AG, not just somewhere in the filesystem. Therefore, create two per-AG block reservation pools. One feeds the AGFL so that rmapbt expansion always succeeds, and the other feeds all other metadata so that refcountbt expansion never fails. Use the count of how many reserved blocks we need to have on hand to create a virtual reservation in the AG. Through selective clamping of the maximum length of allocation requests and of the length of the longest free extent, we can make it look like there's less free space in the AG unless the reservation owner is asking for blocks. In other words, play some accounting tricks in-core to make sure that we always have blocks available. On the plus side, there's nothing to clean up if we crash, which is contrast to the strategy that the rough draft used (actually removing extents from the freespace btrees). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-14xfs: undo block reservation correctly in xfs_trans_reserve()Eryu Guan
"blocks" should be added back to fdblocks at undo time, not taken away, i.e. the minus sign should not be used. This is a regression introduced by commit 0d485ada404b ("xfs: use generic percpu counters for free block counter"). And it's found by code inspection, I didn't it in real world, so there's no reproducer. Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-04-06xfs: remove transaction typesChristoph Hellwig
These aren't used for CIL-style logging and can be dropped. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-04-06xfs: better xfs_trans_alloc interfaceChristoph Hellwig
Merge xfs_trans_reserve and xfs_trans_alloc into a single function call that returns a transaction with all the required log and block reservations, and which allows passing transaction flags directly to avoid the cumbersome _xfs_trans_alloc interface. While we're at it we also get rid of the transaction type argument that has been superflous since we stopped supporting the non-CIL logging mode. The guts of it will be removed in another patch. [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-03-15xfs: ensure committed is initialized in xfs_trans_rollEric Sandeen
__xfs_trans_roll() can return without setting the *committed argument; this was a problem for xfs_bmap_finish(): int committed;/* xact committed or not */ ... error = __xfs_trans_roll(tp, ip, &committed); if (error) { ... if (committed) { and we tested an uninitialized "committed" variable on the error path. No caller is preserving "committed" state across calls to __xfs_trans_roll(), so just initialize committed inside the function to avoid future errors like this. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-10-12xfs: per-filesystem stats counter implementationBill O'Donnell
This patch modifies the stats counting macros and the callers to those macros to properly increment, decrement, and add-to the xfs stats counts. The counts for global and per-fs stats are correctly advanced, and cleared by writing a "1" to the corresponding clear file. global counts: /sys/fs/xfs/stats/stats per-fs counts: /sys/fs/xfs/sda*/stats/stats global clear: /sys/fs/xfs/stats/stats_clear per-fs clear: /sys/fs/xfs/sda*/stats/stats_clear [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around stats structures and macros. ] Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-08-19xfs: return committed status from xfs_trans_roll()Brian Foster
Some callers need to make error handling decisions based on whether the current transaction successfully committed or not. Rename xfs_trans_roll(), add a new parameter and provide a wrapper to preserve existing callers. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-04xfs: fix xfs_log_done interfaceChristoph Hellwig
Instead of the confusing flags argument pass a boolean flag to indicate if we want to release or regrant a log reservation. Also ensure that xfs_log_done always drop the reference on the log ticket, to both simplify the code and make the logic in xfs_trans_roll easier to understand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-04xfs: saner xfs_trans_commit interfaceChristoph Hellwig
The flags argument to xfs_trans_commit is not useful for most callers, as a commit of a transaction without a permanent log reservation must pass 0 here, and all callers for a transaction with a permanent log reservation except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES. So remove the flags argument from the public xfs_trans_commit interfaces, and introduce low-level __xfs_trans_commit variant just for xfs_trans_roll that regrants a log reservation instead of releasing it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-04xfs: remove the flags argument to xfs_trans_cancelChristoph Hellwig
xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and XFS_TRANS_ABORT. Both of them are a direct product of the transaction state, and can be deducted: - any dirty transaction needs XFS_TRANS_ABORT to be properly canceled, and XFS_TRANS_ABORT is a noop for a transaction that is not dirty. - any transaction with a permanent log reservation needs XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent log reservation is invalid. So just remove the flags argument and do the right thing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-04xfs: pass a boolean flag to xfs_trans_free_itemsChristoph Hellwig
The flags value always was 0 or XFS_TRANS_ABORT. Switch to a bool parameter to allow further cleanups. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-04xfs: switch remaining xfs_trans_dup users to xfs_trans_rollChristoph Hellwig
We have three remaining callers of xfs_trans_dup: - xfs_itruncate_extents which open codes xfs_trans_roll - xfs_bmap_finish doesn't have an xfs_inode argument and thus leaves attaching them to it's callers, but otherwise is identical to xfs_trans_roll - xfs_dir_ialloc looks at the log reservations in the old xfs_trans structure instead of the log reservation parameters, but otherwise is identical to xfs_trans_roll. By allowing a NULL xfs_inode argument to xfs_trans_roll we can switch these three remaining users over to xfs_trans_roll and mark xfs_trans_dup static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-23xfs: replace xfs_mod_incore_sb_batchedDave Chinner
Introduce helper functions for modifying fields in the superblock into xfs_trans.c, the only caller of xfs_mod_incore_sb_batch(). We can then use these directly in xfs_trans_unreserve_and_mod_sb() and so remove another user of the xfs_mode_incore_sb() API without losing any functionality or scalability of the transaction commit code.. Based on a patch from Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-23xfs: introduce xfs_mod_frextentsDave Chinner
Add a new helper to modify the incore counter of free realtime extents. This matches the helpers used for inode and data block counters, and removes a significant users of the xfs_mod_incore_sb() interface. Based on a patch originally from Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-23xfs: use generic percpu counters for free block counterDave Chinner
XFS has hand-rolled per-cpu counters for the superblock since before there was any generic implementation. The free block counter is special in that it is used for ENOSPC detection outside transaction contexts for for delayed allocation. This means that the counter needs to be accurate at zero. The current per-cpu counter code jumps through lots of hoops to ensure we never run past zero, but we don't need to make all those jumps with the generic counter implementation. The generic counter implementation allows us to pass a "batch" threshold at which the addition/subtraction to the counter value will be folded back into global value under lock. We can use this feature to reduce the batch size as we approach 0 in a very similar manner to the existing counters and their rebalance algorithm. If we use a batch size of 1 as we approach 0, then every addition and subtraction will be done against the global value and hence allow accurate detection of zero threshold crossing. Hence we can replace the handrolled, accurate-at-zero counters with generic percpu counters. Note: this removes just enough of the icsb infrastructure to compile without warnings. The rest will go in subsequent commits. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-23xfs: use generic percpu counters for free inode counterDave Chinner
XFS has hand-rolled per-cpu counters for the superblock since before there was any generic implementation. The free inode counter is not used for any limit enforcement - the per-AG free inode counters are used during allocation to determine if there are inode available for allocation. Hence we don't need any of the complexity of the hand-rolled counters and we can simply replace them with generic per-cpu counters similar to the inode counter. This version introduces a xfs_mod_ifree() helper function from Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-23xfs: use generic percpu counters for inode counterDave Chinner
XFS has hand-rolled per-cpu counters for the superblock since before there was any generic implementation. There are some warts around the use of them for the inode counter as the hand rolled counter is designed to be accurate at zero, but has no specific accurracy at any other value. This design causes problems for the maximum inode count threshold enforcement, as there is no trigger that balances the counters as they get close tothe maximum threshold. Instead of designing new triggers for balancing, just replace the handrolled per-cpu counter with a generic counter. This enables us to update the counter through the normal superblock modification funtions, but rather than do that we add a xfs_mod_icount() helper function (from Christoph Hellwig) and keep the percpu counter outside the superblock in the struct xfs_mount. This means we still need to initialise the per-cpu counter specifically when we read the superblock, and vice versa when we log/write it, but it does mean that we don't need to change any other code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22xfs: set superblock buffer type correctlyDave Chinner
When the superblock is modified in a transaction, the commonly modified fields are not actually copied to the superblock buffer to avoid the buffer lock becoming a serialisation point. However, there are some other operations that modify the superblock fields within the transaction that don't directly log to the superblock but rely on the changes to be applied during the transaction commit (to minimise the buffer lock hold time). When we do this, we fail to mark the buffer log item as being a superblock buffer and that can lead to the buffer not being marked with the corect type in the log and hence causing recovery issues. Fix it by setting the type correctly, similar to xfs_mod_sb()... cc: <stable@vger.kernel.org> # 3.10 to current Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-11-28xfs: move most of xfs_sb.h to xfs_format.hChristoph Hellwig
More on-disk format consolidation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-11-28xfs: merge xfs_ag.h into xfs_format.hChristoph Hellwig
More on-disk format consolidation. A few declarations that weren't on-disk format related move into better suitable spots. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-06-25xfs: global error sign conversionDave Chinner
Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>