summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_icache.h
AgeCommit message (Collapse)Author
2014-11-28xfs: merge xfs_ag.h into xfs_format.hChristoph Hellwig
More on-disk format consolidation. A few declarations that weren't on-disk format related move into better suitable spots. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-07-24xfs: run an eofblocks scan on ENOSPC/EDQUOTBrian Foster
From: Brian Foster <bfoster@redhat.com> Speculative preallocation and and the associated throttling metrics assume we're working with large files on large filesystems. Users have reported inefficiencies in these mechanisms when we happen to be dealing with large files on smaller filesystems. This can occur because while prealloc throttling is aggressive under low free space conditions, it is not active until we reach 5% free space or less. For example, a 40GB filesystem has enough space for several files large enough to have multi-GB preallocations at any given time. If those files are slow growing, they might reserve preallocation for long periods of time as well as avoid the background scanner due to frequent modification. If a new file is written under these conditions, said file has no access to this already reserved space and premature ENOSPC is imminent. To handle this scenario, modify the buffered write ENOSPC handling and retry sequence to invoke an eofblocks scan. In the smaller filesystem scenario, the eofblocks scan resets the usage of preallocation such that when the 5% free space threshold is met, throttling effectively takes over to provide fair and efficient preallocation until legitimate ENOSPC. The eofblocks scan is selective based on the nature of the failure. For example, an EDQUOT failure in a particular quota will use a filtered scan for that quota. Because we don't know which quota might have caused an allocation failure at any given time, we include each applicable quota determined to be under low free space conditions in the scan. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-07-24xfs: add scan owner field to xfs_eofblocksBrian Foster
From: Brian Foster <bfoster@redhat.com> The scan owner field represents an optional inode number that is responsible for the current scan. The purpose is to identify that an inode is under iolock and as such, the iolock shouldn't be attempted when trimming eofblocks. This is an internal only field. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-06-25xfs: global error sign conversionDave Chinner
Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-04-14xfs: remove unused pag ptr arg from iterator execute functionsEric Sandeen
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-09-12Merge tag 'xfs-for-linus-v3.12-rc1-2' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs update #2 from Ben Myers: "Here we have defrag support for v5 superblock, a number of bugfixes and a cleanup or two. - defrag support for CRC filesystems - fix endian worning in xlog_recover_get_buf_lsn - fixes for sparse warnings - fix for assert in xfs_dir3_leaf_hdr_from_disk - fix for log recovery of remote symlinks - fix for log recovery of btree root splits - fixes formemory allocation failures with ACLs - fix for assert in xfs_buf_item_relse - fix for assert in xfs_inode_buf_verify - fix an assignment in an assert that should be a test in xfs_bmbt_change_owner - remove dead code in xlog_recover_inode_pass2" * tag 'xfs-for-linus-v3.12-rc1-2' of git://oss.sgi.com/xfs/xfs: xfs: remove dead code from xlog_recover_inode_pass2 xfs: = vs == typo in ASSERT() xfs: don't assert fail on bad inode numbers xfs: aborted buf items can be in the AIL. xfs: factor all the kmalloc-or-vmalloc fallback allocations xfs: fix memory allocation failures with ACLs xfs: ensure we copy buffer type in da btree root splits xfs: set remote symlink buffer type for recovery xfs: recovery of swap extents operations for CRC filesystems xfs: swap extents operations for CRC filesystems xfs: check magic numbers in dir3 leaf verifier first xfs: fix some minor sparse warnings xfs: fix endian warning in xlog_recover_get_buf_lsn()
2013-09-10shrinker: convert superblock shrinkers to new APIDave Chinner
Convert superblock shrinker to use the new count/scan API, and propagate the API changes through to the filesystem callouts. The filesystem callouts already use a count/scan API, so it's just changing counters to longs to match the VM API. This requires the dentry and inode shrinker callouts to be converted to the count/scan API. This is mainly a mechanical change. [glommer@openvz.org: use mult_frac for fractional proportions, build fixes] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10xfs: recovery of swap extents operations for CRC filesystemsDave Chinner
This is the recovery side of the btree block owner change operation performed by swapext on CRC enabled filesystems. We detect that an owner change is needed by the flag that has been placed on the inode log format flag field. Because the inode recovery is being replayed after the buffers that make up the BMBT in the given checkpoint, we can walk all the buffers and directly modify them when we see the flag set on an inode. Because the inode can be relogged and hence present in multiple chekpoints with the "change owner" flag set, we could do multiple passes across the inode to do this change. While this isn't optimal, we can't directly ignore the flag as there may be multiple independent swap extent operations being replayed on the same inode in different checkpoints so we can't ignore them. Further, because the owner change operation uses ordered buffers, we might have buffers that are newer on disk than the current checkpoint and so already have the owner changed in them. Hence we cannot just peek at a buffer in the tree and check that it has the correct owner and assume that the change was completed. So, for the moment just brute force the owner change every time we see an inode with the flag set. Note that we have to be careful here because the owner of the buffers may point to either the old owner or the new owner. Currently the verifier can't verify the owner directly, so there is no failure case here right now. If we verify the owner exactly in future, then we'll have to take this into account. This was tested in terms of normal operation via xfstests - all of the fsr tests now pass without failure. however, we really need to modify xfs/227 to stress v3 inodes correctly to ensure we fully cover this case for v5 filesystems. In terms of recovery testing, I used a hacked version of xfs_fsr that held the temp inode open for a few seconds before exiting so that the filesystem could be shut down with an open owner change recovery flags set on at least the temp inode. fsr leaves the temp inode unlinked and in btree format, so this was necessary for the owner change to be reliably replayed. logprint confirmed the tmp inode in the log had the correct flag set: INO: cnt:3 total:3 a:0x69e9e0 len:56 a:0x69ea20 len:176 a:0x69eae0 len:88 INODE: #regs:3 ino:0x44 flags:0x209 dsize:88 ^^^^^ 0x200 is set, indicating a data fork owner change needed to be replayed on inode 0x44. A printk in the revoery code confirmed that the inode change was recovered: XFS (vdc): Mounting Filesystem XFS (vdc): Starting recovery (logdev: internal) recovering owner change ino 0x44 XFS (vdc): Version 5 superblock detected. This kernel L support enabled! Use of these features in this kernel is at your own risk! XFS (vdc): Ending recovery (logdev: internal) The script used to test this was: $ cat ./recovery-fsr.sh #!/bin/bash dev=/dev/vdc mntpt=/mnt/scratch testfile=$mntpt/testfile umount $mntpt mkfs.xfs -f -m crc=1 $dev mount $dev $mntpt chmod 777 $mntpt for i in `seq 10000 -1 0`; do xfs_io -f -d -c "pwrite $(($i * 4096)) 4096" $testfile > /dev/null 2>&1 done xfs_bmap -vp $testfile |head -20 xfs_fsr -d -v $testfile & sleep 10 /home/dave/src/xfstests-dev/src/godown -f $mntpt wait umount $mntpt xfs_logprint -t $dev |tail -20 time mount $dev $mntpt xfs_bmap -vp $testfile umount $mntpt $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-15xfs: create internal eofblocks structure with kuid_t typesDwight Engen
Have eofblocks ioctl convert uid_t to kuid_t into internal structure. Update internal filter matching to compare ids with kuid_t types. Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-08-12xfs: move unrelated definitions out of xfs_inode.hDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-26xfs: Remove dead function prototype xfs_sync_inode_grab()Jie Liu
Remove dead function prototype xfs_sync_inode_grab() from xfs_icache.h. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-08xfs: add background scanning to clear eofblocks inodesBrian Foster
Create a new mount workqueue and delayed_work to enable background scanning and freeing of eofblocks inodes. The scanner kicks in once speculative preallocation occurs and stops requeueing itself when no eofblocks inodes exist. The scan interval is based on the new 'speculative_prealloc_lifetime' tunable (default to 5m). The background scanner performs unfiltered, best effort scans (which skips inodes under lock contention or with a dirty cache mapping). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-08xfs: add XFS_IOC_FREE_EOFBLOCKS ioctlBrian Foster
The XFS_IOC_FREE_EOFBLOCKS ioctl allows users to invoke an EOFBLOCKS scan. The xfs_eofblocks structure is defined to support the command parameters (scan mode). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-08xfs: create function to scan and clear EOFBLOCKS inodesBrian Foster
xfs_inodes_free_eofblocks() implements scanning functionality for EOFBLOCKS inodes. It uses the AG iterator to walk the tagged inodes and free post-EOF blocks via the xfs_inode_free_eofblocks() execute function. The scan can be invoked in best-effort mode or wait (force) mode. A best-effort scan (default) handles all inodes that do not have a dirty cache and we successfully acquire the io lock via trylock. In wait mode, we continue to cycle through an AG until all inodes are handled. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-08xfs: support a tag-based inode_ag_iteratorBrian Foster
Genericize xfs_inode_ag_walk() to support an optional radix tree tag and args argument for the execute function. Create a new wrapper called xfs_inode_ag_iterator_tag() that performs a tag based walk of perag's and inodes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-08xfs: add EOFBLOCKS inode tagging/untaggingBrian Foster
Add the XFS_ICI_EOFBLOCKS_TAG inode tag to identify inodes with speculatively preallocated blocks beyond EOF. An inode is tagged when speculative preallocation occurs and untagged either via truncate down or when post-EOF blocks are freed via release or reclaim. The tag management is intentionally not aggressive to prefer simplicity over the complexity of handling all the corner cases under which post-EOF blocks could be freed (i.e., forward truncation, fallocate, write error conditions, etc.). This means that a tagged inode may or may not have post-EOF blocks after a period of time. The tag is eventually cleared when the inode is released or reclaimed. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-10-17xfs: remove xfs_iget.cDave Chinner
The inode cache functions remaining in xfs_iget.c can be moved to xfs_icache.c along with the other inode cache functions. This removes all functionality from xfs_iget.c, so the file can simply be removed. This move results in various functions now only having the scope of a single file (e.g. xfs_inode_free()), so clean up all the definitions and exported prototypes in xfs_icache.[ch] and xfs_inode.h appropriately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-10-17xfs: rename xfs_sync.[ch] to xfs_icache.[ch]Dave Chinner
xfs_sync.c now only contains inode reclaim functions and inode cache iteration functions. It is not related to sync operations anymore. Rename to xfs_icache.c to reflect it's contents and prepare for consolidation with the other inode cache file that exists (xfs_iget.c). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>