summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_iomap.c
diff options
context:
space:
mode:
authorBrian Foster <bfoster@redhat.com>2019-02-01 09:14:23 -0800
committerDarrick J. Wong <darrick.wong@oracle.com>2019-02-11 16:07:01 -0800
commitd9252d526ba66a8f95ad2830ae1b62825ef3dbd5 (patch)
treef6aebf395726e3ecd5906805fee481e9448c0d66 /fs/xfs/xfs_iomap.c
parent9f9bc034b84958523689347ee2bdd9c660008e5e (diff)
xfs: validate writeback mapping using data fork seq counter
The writeback code caches the current extent mapping across multiple xfs_do_writepage() calls to avoid repeated lookups for sequential pages backed by the same extent. This is known to be slightly racy with extent fork changes in certain difficult to reproduce scenarios. The cached extent is trimmed to within EOF to help avoid the most common vector for this problem via speculative preallocation management, but this is a band-aid that does not address the fundamental problem. Now that we have an xfs_ifork sequence counter mechanism used to facilitate COW writeback, we can use the same mechanism to validate consistency between the data fork and cached writeback mappings. On its face, this is somewhat of a big hammer approach because any change to the data fork invalidates any mapping currently cached by a writeback in progress regardless of whether the data fork change overlaps with the range under writeback. In practice, however, the impact of this approach is minimal in most cases. First, data fork changes (delayed allocations) caused by sustained sequential buffered writes are amortized across speculative preallocations. This means that a cached mapping won't be invalidated by each buffered write of a common file copy workload, but rather only on less frequent allocation events. Second, the extent tree is always entirely in-core so an additional lookup of a usable extent mostly costs a shared ilock cycle and in-memory tree lookup. This means that a cached mapping reval is relatively cheap compared to the I/O itself. Third, spurious invalidations don't impact ioend construction. This means that even if the same extent is revalidated multiple times across multiple writepage instances, we still construct and submit the same size ioend (and bio) if the blocks are physically contiguous. Update struct xfs_writepage_ctx with a new field to hold the sequence number of the data fork associated with the currently cached mapping. Check the wpc seqno against the data fork when the mapping is validated and reestablish the mapping whenever the fork has changed since the mapping was cached. This ensures that writeback always uses a valid extent mapping and thus prevents lost writebacks and stale delalloc block problems. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Diffstat (limited to 'fs/xfs/xfs_iomap.c')
-rw-r--r--fs/xfs/xfs_iomap.c5
1 files changed, 2 insertions, 3 deletions
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 27c93b5f029d..ab69caa685b4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -681,7 +681,7 @@ xfs_iomap_write_allocate(
int whichfork,
xfs_off_t offset,
xfs_bmbt_irec_t *imap,
- unsigned int *cow_seq)
+ unsigned int *seq)
{
xfs_mount_t *mp = ip->i_mount;
struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
@@ -797,8 +797,7 @@ xfs_iomap_write_allocate(
if (error)
goto error0;
- if (whichfork == XFS_COW_FORK)
- *cow_seq = READ_ONCE(ifp->if_seq);
+ *seq = READ_ONCE(ifp->if_seq);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
}