summaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)Author
2018-02-05Merge tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull more xfs updates from Darrick Wong: "As promised, here's a (much smaller) second pull request for the second week of the merge cycle. This time around we have a couple patches shutting off unsupported fs configurations, and a couple of cleanups. Last, we turn off EXPERIMENTAL for the reverse mapping btree, since the primary downstream user of that information (online fsck) is now upstream and I haven't seen any major failures in a few kernel releases. Summary: - Print scrub build status in the xfs build info. - Explicitly call out the remaining two scenarios where we don't support reflink and never have. - Remove EXPERIMENTAL tag from reverse mapping btree!" * tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove experimental tag for reverse mapping xfs: don't allow reflink + realtime filesystems xfs: don't allow DAX on reflink filesystems xfs: add scrub to XFS_BUILD_OPTIONS xfs: fix u32 type usage in sb validation function
2018-02-03Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Only miscellaneous cleanups and bug fixes for ext4 this cycle" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: create ext4_kset dynamically ext4: create ext4_feat kobject dynamically ext4: release kobject/kset even when init/register fail ext4: fix incorrect indentation of if statement ext4: correct documentation for grpid mount option ext4: use 'sbi' instead of 'EXT4_SB(sb)' ext4: save error to disk in __ext4_grp_locked_error() jbd2: fix sphinx kernel-doc build warnings ext4: fix a race in the ext4 shutdown path mbcache: make sure c_entry_count is not decremented past zero ext4: no need flush workqueue before destroying it ext4: fixed alignment and minor code cleanup in ext4.h ext4: fix ENOSPC handling in DAX page fault handler dax: pass detailed error code from dax_iomap_fault() mbcache: revert "fs/mbcache.c: make count_objects() more robust" mbcache: initialize entry->e_referenced in mb_cache_entry_create() ext4: fix up remaining files with SPDX cleanups
2018-02-01xfs: remove experimental tag for reverse mappingDarrick J. Wong
Reverse mapping has had a while to soak, so remove the experimental tag. Now that we've landed space metadata cross-referencing in scrub, the feature actually has a purpose. Reject rmap filesystems with an rt device until the code to support it is actually implemented. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-02-01xfs: don't allow reflink + realtime filesystemsDarrick J. Wong
We don't support realtime filesystems with reflink either, so fail those mounts. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-02-01xfs: don't allow DAX on reflink filesystemsDarrick J. Wong
Now that reflink is no longer experimental, reject attempts to mount with DAX until that whole mess gets sorted out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-02-01xfs: add scrub to XFS_BUILD_OPTIONSEric Sandeen
Advertise this config option along with the others. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-31xfs: fix u32 type usage in sb validation functionDarrick J. Wong
Don't use u32, use uint32_t, because this won't work in xfsprogs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-01-31Merge tag 'docs-4.16' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "Documentation updates for 4.16. New stuff includes refcount_t documentation, errseq documentation, kernel-doc support for nested structure definitions, the removal of lots of crufty kernel-doc support for unused formats, SPDX tag documentation, the beginnings of a manual for subsystem maintainers, and lots of fixes and updates. As usual, some of the changesets reach outside of Documentation/ to effect kerneldoc comment fixes. It also adds the new LICENSES directory, of which Thomas promises I do not need to be the maintainer" * tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits) linux-next: docs-rst: Fix typos in kfigure.py linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt Documentation: Fix misconversion of #if docs: add index entry for networking/msg_zerocopy Documentation: security/credentials.rst: explain need to sort group_list LICENSES: Add MPL-1.1 license LICENSES: Add the GPL 1.0 license LICENSES: Add Linux syscall note exception LICENSES: Add the MIT license LICENSES: Add the BSD-3-clause "Clear" license LICENSES: Add the BSD 3-clause "New" or "Revised" License LICENSES: Add the BSD 2-clause "Simplified" license LICENSES: Add the LGPL-2.1 license LICENSES: Add the LGPL 2.0 license LICENSES: Add the GPL 2.0 license Documentation: Add license-rules.rst to describe how to properly identify file licenses scripts: kernel_doc: better handle show warnings logic fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at doc: md: Fix a file name to md-fault.c in fault-injection.txt errseq: Add to documentation tree ...
2018-01-31Merge tag 'xfs-4.16-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "This merge cycle, we're again some substantive changes to XFS. Metadata verifiers have been restructured to provide more detail about which part of a metadata structure failed checks, and we've enhanced the new online fsck feature to cross-reference extent allocation information with the other metadata structures. With this pull, the metadata verification part of online fsck is more or less finished, though the feature is still experimental and still disabled by default. We're also preparing to remove the EXPERIMENTAL tag from a couple of features this cycle. This week we're committing a bunch of space accounting fixes for reflink and removing the EXPERIMENTAL tag from reflink; I anticipate that we'll be ready to do the same for the reverse mapping feature next week. (I don't have any pending fixes for rmap; however I wish to remove the tags one at a time.) This giant pile of patches has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported. Let me know if there's any merge problems -- git merge reported that one of our patches touched the same function as the i_version series, but it resolved things cleanly. Summary: - Log faulting code locations when verifiers fail, for improved diagnosis of corrupt filesystems. - Implement metadata verifiers for local format inode fork data. - Online scrub now cross-references metadata records with other metadata. - Refactor the fs geometry ioctl generation functions. - Harden various metadata verifiers. - Fix various accounting problems. - Fix uncancelled transactions leaking when xattr functions fail. - Prevent the copy-on-write speculative preallocation garbage collector from racing with writeback. - Emit log reservation type information as trace data so that we can compare against xfsprogs. - Fix some erroneous asserts in the online scrub code. - Clean up the transaction reservation calculations. - Fix various minor bugs in online scrub. - Log complaints about mixed dio/buffered writes once per day and less noisily than before. - Refactor buffer log item lists to use list_head. - Break PNFS leases before reflinking blocks. - Reduce lock contention on reflink source files. - Fix some quota accounting problems with reflink. - Fix a serious corruption problem in the direct cow write code where we fed bad iomaps to the vfs iomap consumers. - Various other refactorings. - Remove EXPERIMENTAL tag from reflink!" * tag 'xfs-4.16-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (94 commits) xfs: remove experimental tag for reflinks xfs: don't screw up direct writes when freesp is fragmented xfs: check reflink allocation mappings iomap: warn on zero-length mappings xfs: treat CoW fork operations as delalloc for quota accounting xfs: only grab shared inode locks for source file during reflink xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes xfs: reflink should break pnfs leases before sharing blocks xfs: don't clobber inobt/finobt cursors when xref with rmap xfs: skip CoW writes past EOF when writeback races with truncate xfs: preserve i_rdev when recycling a reclaimable inode xfs: refactor accounting updates out of xfs_bmap_btalloc xfs: refactor inode verifier corruption error printing xfs: make tracepoint inode number format consistent xfs: always zero di_flags2 when we free the inode xfs: call xfs_qm_dqattach before performing reflink operations xfs: bmap code cleanup Use list_head infra-structure for buffer's log items list Split buffer's b_fspriv field Get rid of xfs_buf_log_item_t typedef ...
2018-01-29Merge tag 'iversion-v4.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull inode->i_version rework from Jeff Layton: "This pile of patches is a rework of the inode->i_version field. We have traditionally incremented that field on every inode data or metadata change. Typically this increment needs to be logged on disk even when nothing else has changed, which is rather expensive. It turns out though that none of the consumers of that field actually require this behavior. The only real requirement for all of them is that it be different iff the inode has changed since the last time the field was checked. Given that, we can optimize away most of the i_version increments and avoid dirtying inode metadata when the only change is to the i_version and no one is querying it. Queries of the i_version field are rather rare, so we can help write performance under many common workloads. This patch series converts existing accesses of the i_version field to a new API, and then converts all of the in-kernel filesystems to use it. The last patch in the series then converts the backend implementation to a scheme that optimizes away a large portion of the metadata updates when no one is looking at it. In my own testing this series significantly helps performance with small I/O sizes. I also got this email for Christmas this year from the kernel test robot (a 244% r/w bandwidth improvement with XFS over DAX, with 4k writes): https://lkml.org/lkml/2017/12/25/8 A few of the earlier patches in this pile are also flowing to you via other trees (mm, integrity, and nfsd trees in particular)". * tag 'iversion-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: (22 commits) fs: handle inode->i_version more efficiently btrfs: only dirty the inode in btrfs_update_time if something was changed xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing fs: only set S_VERSION when updating times if necessary IMA: switch IMA over to new i_version API xfs: convert to new i_version API ufs: use new i_version API ocfs2: convert to new i_version API nfsd: convert to new i_version API nfs: convert to new i_version API ext4: convert to new i_version API ext2: convert to new i_version API exofs: switch to new i_version API btrfs: convert to new i_version API afs: convert to new i_version API affs: convert to new i_version API fat: convert to new i_version API fs: don't take the i_lock in inode_inc_iversion fs: new API for handling inode->i_version ntfs: remove i_version handling ...
2018-01-29xfs: remove experimental tag for reflinksChristoph Hellwig
But reject reflink + DAX file systems for now until the code to support reflinks on DAX is actually implemented. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: port to 4.16] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29xfs: don't screw up direct writes when freesp is fragmentedDarrick J. Wong
xfs_bmap_btalloc is given a range of file offset blocks that must be allocated to some data/attr/cow fork. If the fork has an extent size hint associated with it, the request will be enlarged on both ends to try to satisfy the alignment hint. If free space is fragmentated, sometimes we can allocate some blocks but not enough to fulfill any of the requested range. Since bmapi_allocate always trims the new extent mapping to match the originally requested range, this results in bmapi_write returning zero and no mapping. The consequences of this vary -- buffered writes will simply re-call bmapi_write until it can satisfy at least one block from the original request. Direct IO overwrites notice nmaps == 0 and return -ENOSPC through the dio mechanism out to userspace with the weird result that writes fail even when we have enough space because the ENOSPC return overrides any partial write status. For direct CoW writes the situation was disastrous because nobody notices us returning an invalid zero-length wrong-offset mapping to iomap and the write goes off into space. Therefore, if free space is so fragmented that we managed to allocate some space but not enough to map into even a single block of the original allocation request range, we should break the alignment hint in order to guarantee at least some forward progress for the direct write. If we return a short allocation to iomap_apply it'll call back about the remaining blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: check reflink allocation mappingsDarrick J. Wong
There's a really bad bug in xfs_reflink_allocate_cow -- if bmapi_write can return a zero error code but no mappings. This happens if there's an extent size hint (which causes allocation requests to be rounded to extsz granularity internally), but there wasn't a big enough chunk of free space to start filling at the extsz granularity and fill even one block of the range that we actually requested. In any case, if we got no mappings we can't possibly do anything useful with the contents of imap, so we must bail out with ENOSPC here. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: treat CoW fork operations as delalloc for quota accountingDarrick J. Wong
Since the CoW fork only exists in memory, it is incorrect to update the on-disk quota block counts when we modify the CoW fork. Unlike the data fork, even real extents in the CoW fork are only delalloc-style reservations (on-disk they're owned by the refcountbt) so they must not be tracked in the on disk quota info. Ensure the i_delayed_blks accounting reflects this too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: only grab shared inode locks for source file during reflinkDarrick J. Wong
Reflink and dedupe operations remap blocks from a source file into a destination file. The destination file needs exclusive locks on all levels because we're updating its block map, but the source file isn't undergoing any block map changes so we can use a shared lock. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modesDarrick J. Wong
Refactor xfs_lock_two_inodes to take separate locking modes for each inode. Specifically, this enables us to take a SHARED lock on one inode and an EXCL lock on the other. The lock class (MMAPLOCK/ILOCK) must be the same for each inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: reflink should break pnfs leases before sharing blocksDarrick J. Wong
Before we share blocks between files, we need to break the pnfs leases on the layout before we start slicing and dicing the block map. The structure of this function sets us up for the lock contention reduction in the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: don't clobber inobt/finobt cursors when xref with rmapDarrick J. Wong
Even if we can't use the inobt/finobt cursors to count the number of inode btree blocks, we are never allowed to clobber the cursor of the btree being checked, so don't do this. Found by fuzzing level = ones in xfs/364. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: skip CoW writes past EOF when writeback races with truncateDarrick J. Wong
Every so often we blow the ASSERT(type != XFS_IO_COW) in xfs_map_blocks when running fsstress, as we do in generic/269. The cause of this is writeback racing with truncate -- writeback doesn't take the iolock, so truncate can sneak in to decrease i_size and truncate page cache while writeback is gathering buffer heads to schedule writeout. If we hit this race on a block that has a CoW mapping, we'll get a valid imap from the CoW fork but the reduced i_size trims the mapping to zero length (which makes it invalid), so we call xfs_map_blocks to try again. This doesn't do much anyway, since any mapping we get out of that will also be invalid, so we might as well skip the assert and just stop. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: preserve i_rdev when recycling a reclaimable inodeAmir Goldstein
Commit 66f364649d870 ("xfs: remove if_rdev") moved storing of rdev value for special inodes to VFS inodes, but forgot to preserve the value of i_rdev when recycling a reclaimable xfs_inode. This was detected by xfstest overlay/017 with inodex=on mount option and xfs base fs. The test does a lookup of overlay chardev and blockdev right after drop caches. Overlayfs inodes hold a reference on underlying xfs inodes when mount option index=on is configured. If drop caches reclaim xfs inodes, before it relclaims overlayfs inodes, that can sometimes leave a reclaimable xfs inode and that test hits that case quite often. When that happens, the xfs inode cache remains broken (zere i_rdev) until the next cycle mount or drop caches. Fixes: 66f364649d870 ("xfs: remove if_rdev") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29xfs: refactor accounting updates out of xfs_bmap_btallocDarrick J. Wong
Move all the inode and quota accounting updates out of xfs_bmap_btalloc in preparation for fixing some quota accounting problems with copy on write. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-29xfs: refactor inode verifier corruption error printingDarrick J. Wong
Refactor inode verifier error reporting into a non-libxfs function so that we aren't encoding the message format in libxfs. This also changes the kernel dmesg output to resemble buffer verifier errors more closely. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: make tracepoint inode number format consistentDarrick J. Wong
Fix all the inode number formats to be consistently (0x%llx) in all trace point definitions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: always zero di_flags2 when we free the inodeDarrick J. Wong
Always zero the di_flags2 field when we free the inode so that we never end up with an on-disk record for an unallocated inode that also has the reflink iflag set. This is in keeping with the general principle that only files can have the reflink iflag set, even though we'll zero out di_flags2 if we ever reallocate the inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: call xfs_qm_dqattach before performing reflink operationsDarrick J. Wong
Ensure that we've attached all the necessary dquots before performing reflink operations so that quota accounting is accurate. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-01-29xfs: bmap code cleanupShan Hai
Remove the extent size hint and realtime inode relevant code from the xfs_bmapi_reserve_delalloc since it is not called on the inode with extent size hint set or on a realtime inode. Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29Use list_head infra-structure for buffer's log items listCarlos Maiolino
Now that buffer's b_fspriv has been split, just replace the current singly linked list of xfs_log_items, by the list_head infrastructure. Also, remove the xfs_log_item argument from xfs_buf_resubmit_failed_buffers(), there is no need for this argument, once the log items can be walked through the list_head in the buffer. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style cleanups] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29Split buffer's b_fspriv fieldCarlos Maiolino
By splitting the b_fspriv field into two different fields (b_log_item and b_li_list). It's possible to get rid of an old ABI workaround, by using the new b_log_item field to store xfs_buf_log_item separated from the log items attached to the buffer, which will be linked in the new b_li_list field. This way, there is no more need to reorder the log items list to place the buf_log_item at the beginning of the list, simplifying a bit the logic to handle buffer IO. This also opens the possibility to change buffer's log items list into a proper list_head. b_log_item field is still defined as a void *, because it is still used by the log buffers to store xlog_in_core structures, and there is no need to add an extra field on xfs_buf just for xlog_in_core. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style changes] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29Get rid of xfs_buf_log_item_t typedefCarlos Maiolino
Take advantage of the rework on xfs_buf log items list, to get rid of ths typedef for xfs_buf_log_item. This patch also fix some indentation alignment issues found along the way. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementingJeff Layton
If XFS_ILOG_CORE is already set then go ahead and increment it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com>
2018-01-29xfs: convert to new i_version APIJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: fix non-debug build compiler warningsDarrick J. Wong
Fix compiler warning on non-debug build Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: check sb_agblocks and sb_agblklog when validating superblockDarrick J. Wong
Currently, we don't check sb_agblocks or sb_agblklog when we validate the superblock, which means that we can fuzz garbage values into those values and the mount succeeds. This leads to all sorts of UBSAN warnings in xfs/350 since we can then coerce other parts of xfs into shifting by ridiculously large values. Once we've validated agblocks, make sure the agcount makes sense. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-17xfs: recheck reflink / dirty page status before freeing CoW reservationsDarrick J. Wong
Eryu Guan reported seeing occasional hangs when running generic/269 with a new fsstress that supports clonerange/deduperange. The cause of this hang is an infinite loop when we convert the CoW fork extents from unwritten to real just prior to writing the pages out; the infinite loop happens because there's nothing in the CoW fork to convert, and so it spins forever. The fundamental issue here is that when we go to perform these CoW fork conversions, we're supposed to have an extent waiting for us, but the low space CoW reaper has snuck in and blown them away! There are four conditions that can dissuade the reaper from touching our file -- no reflink iflag; dirty page cache; writeback in progress; or directio in progress. We check the four conditions prior to taking the locks, but we neglect to recheck them once we have the locks, which is how we end up whacking the writeback that's in progress. Therefore, refactor the four checks into a helper function and call it once again once we have the locks to make sure we really want to reap the inode. While we're at it, add an ASSERT for this weird condition so that we'll fail noisily if we ever screw this up again. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-17xfs: check that br_blockcount doesn't overflowDarrick J. Wong
xfs_bmbt_irec.br_blockcount is declared as xfs_filblks_t, which is an unsigned 64-bit integer. Though the bmbt helpers will never set a value larger than 2^21 (since the underlying on-disk extent record has a length field that is only 21 bits wide), we should be a little defensive about checking that a bmbt record doesn't exceed what we're expecting or overflow into the next AG. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: btree format ifork loader should check for zero numrecsDarrick J. Wong
A btree format inode fork with zero records makes no sense, so reject it if we see it, or else we can miscalculate memory allocations. Found by zeroes fuzzing {a,u3}.bmbt.numrecs in xfs/{374,378,412} with KASAN. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-17xfs: attr leaf verifier needs to check for obviously bad countDarrick J. Wong
In the attribute leaf verifier, we can check for obviously bad values of firstused and count so that later attempts at lasthash don't run off the end of the memory buffer. Found by ones fuzzing hdr.count in xfs/400 with KASAN. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-17xfs: directory scrubber must walk through data block to offsetDarrick J. Wong
In xfs_scrub_dir_rec, we must walk through the directory block entries to arrive at the offset given by the hash structure. If we blindly trust the hash address, we can end up midway into a directory entry and stray outside the block. Found by lastbit fuzzing lents[3].address in xfs/390 with KASAN enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: don't iunlock unlocked inodesDarrick J. Wong
Don't iunlock an unlocked inode, which can happen if the parent pointer scrubber bails out with sc->ip unlocked while trying to grab the parent directory inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-01-17xfs: scrub in-core metadataDarrick J. Wong
Whenever we load a buffer, explicitly re-call the structure verifier to ensure that memory isn't corrupting things. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: cross-reference the block mappings when possibleDarrick J. Wong
Use an inode's block mappings to cross-reference inode block counters. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: cross-reference the realtime bitmapDarrick J. Wong
While we're scrubbing various btrees, cross-reference the records with the other metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: cross-reference refcount btree during scrubDarrick J. Wong
During metadata btree scrub, we should cross-reference with the reference counts. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: cross-reference the rmapbt data with the refcountbtDarrick J. Wong
Cross reference the refcount data with the rmap data to check that the number of rmaps for a given block match the refcount of that block, and that CoW blocks (which are owned entirely by the refcountbt) are tracked as well. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: cross-reference reverse-mapping btreeDarrick J. Wong
When scrubbing various btrees, we should cross-reference the records with the reverse mapping btree and ensure that traversing the btree finds the same number of blocks that the rmapbt thinks are owned by that btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: cross-reference inode btrees during scrubDarrick J. Wong
Cross-reference the inode btrees with the other metadata when we scrub the filesystem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: cross-reference bnobt records with cntbtDarrick J. Wong
Scrub should make sure that each bnobt record has a corresponding cntbt record. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: cross-reference with the bnobtDarrick J. Wong
When we're scrubbing various btrees, cross-reference the records with the bnobt to ensure that we don't also think the space is free. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: introduce scrubber cross-referencing stubsDarrick J. Wong
Create some stubs that will be used to cross-reference metadata records. The actual cross-referencing will be filled in by subsequent patches. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-01-17xfs: check btree block ownership with bnobt/rmapbt when scrubbing btreeDarrick J. Wong
When scanning a metadata btree block, cross-reference the block location with the free space btree and the reverse mapping btree to ensure that the rmapbt knows about the block and the bnobt does not. Add a mechanism to defer checks when we happen to be scanning the bnobt/rmapbt itself because it's less efficient to repeatedly clone and destroy the cursor. This patch provides the framework to make btree block owner checks happen; the actual meat will be added in subsequent patches. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>