Age  Commit message  (Author)
2014-03-13  Merge branch 'xfs-O_TMPFILE-support' into for-next  (Dave Chinner)
Conflicts: fs/xfs/xfs_trans_resv.c - fix for XFS_INODE_CLUSTER_SIZE macro removal
2014-03-13  Merge branch 'xfs-bug-fixes-for-3.15-2' into for-next  (Dave Chinner)
2014-03-13  Merge branch 'xfs-verifier-cleanup' into for-next  (Dave Chinner)
2014-03-13  Merge branch 'xfs-stack-fixes' into for-next  (Dave Chinner)
2014-03-13  Merge branch 'xfs-collapse-range' into for-next  (Dave Chinner)
2014-03-13  xfs: Add support for FALLOC_FL_ZERO_RANGE  (Lukas Czerner)
Introduce the new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as the xfs ioctl XFS_IOC_ZERO_RANGE. We can also preallocate blocks past EOF in the same way as with fallocate. Flag FALLOC_FL_KEEP_SIZE will cause the inode size to remain the same even if we preallocate blocks past EOF. It uses the same code to zero the range as the XFS_IOC_ZERO_RANGE ioctl. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-03-13  fs: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate  (Lukas Czerner)
Introduce the new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as the xfs ioctl XFS_IOC_ZERO_RANGE. It can be used to convert a range of a file to zeros, preferably without issuing data IO. Blocks should be preallocated for the regions that span holes in the file, and the entire range is preferably converted to unwritten extents, even though the file system may choose to zero out the extent or do whatever else results in reading zeros from the range while the range remains allocated for the file. This can also be used to preallocate blocks past EOF in the same way as with fallocate. The FALLOC_FL_KEEP_SIZE flag should cause the inode size to remain the same. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
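From userspace, the flag is passed in the mode argument of fallocate(2). The following is a minimal sketch, assuming a Linux system whose headers define FALLOC_FL_ZERO_RANGE. The helper names zero_range() and range_is_zero(), and the fallback of writing zeros when the flag is unsupported, are illustrative assumptions and not part of either patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes fallocate() in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>       /* FALLOC_FL_ZERO_RANGE */
#include <sys/types.h>
#include <unistd.h>

/*
 * Zero [offset, offset + len) of fd. Hypothetical helper: it tries
 * FALLOC_FL_ZERO_RANGE first and, if the kernel or filesystem does not
 * support the flag, falls back to writing zeros (which does issue data IO).
 * Returns 0 on success or a negative errno value.
 */
int zero_range(int fd, off_t offset, off_t len)
{
	static const char zeros[4096];

	assert(offset >= 0 && len > 0);

	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0)
		return 0;
	if (errno != EOPNOTSUPP && errno != ENOSYS)
		return -errno;

	while (len > 0) {
		size_t chunk = (size_t)len < sizeof(zeros) ? (size_t)len : sizeof(zeros);
		ssize_t n = pwrite(fd, zeros, chunk, offset);

		if (n < 0)
			return -errno;
		if (n == 0)
			return -EIO;
		offset += n;
		len -= n;
	}
	return 0;
}

/* Returns 1 if the whole range reads back as zeros, 0 if not, negative errno on error. */
int range_is_zero(int fd, off_t offset, off_t len)
{
	char buf[4096];

	assert(offset >= 0 && len > 0);

	while (len > 0) {
		size_t chunk = (size_t)len < sizeof(buf) ? (size_t)len : sizeof(buf);
		ssize_t n = pread(fd, buf, chunk, offset);

		if (n < 0)
			return -errno;
		if (n == 0)
			return 0;       /* hit EOF before the end of the range */
		for (ssize_t i = 0; i < n; i++)
			if (buf[i] != 0)
				return 0;
		offset += n;
		len -= n;
	}
	return 1;
}
```

Adding FALLOC_FL_KEEP_SIZE to the mode would, per the commit message, leave the inode size unchanged when the range extends past EOF; the sketch omits it to stay short.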
2014-03-07  xfs: inode log reservations are still too small  (Dave Chinner)
Back in commit 23956703 ("xfs: inode log reservations are too small"), the reservation size was increased to take into account the difference in size between the in-memory BMBT block headers and the on-disk BMDR headers. This solved a transaction overrun when logging the inode size. Recently, however, we've seen a number of these same overruns on kernels with the above fix in it. All of them have been by 4 bytes, so we must still not be accounting for something correctly. Through inspection it turns out the above commit didn't take into account everything it should have. That is, it only accounts for a single log op_hdr structure, when it can actually require up to four op_hdrs - one for each region (log iovec) that is formatted. These regions are the inode log format header, the inode core, and the two forks that can be held in the literal area of the inode. This means we are not accounting for 36 bytes of log space that the transaction can use, and hence when we get inodes in certain formats with particular fragmentation patterns we can overrun the transaction. Fix this by adding the correct accounting for log op_headers in the transaction. Tested-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
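The 36-byte figure matches three unaccounted op headers of 12 bytes each. A small sketch of that arithmetic, assuming a 12-byte log operation header; the struct is a userspace mirror written for illustration, and the names op_hdr_model, ophdr_space and ophdr_shortfall are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of a 12-byte log operation header (illustrative only). */
struct op_hdr_model {
	uint32_t tid;        /* transaction id */
	uint32_t len;        /* length of the region that follows */
	uint8_t  clientid;
	uint8_t  flags;
	uint16_t reserved;
};
static_assert(sizeof(struct op_hdr_model) == 12, "model header must be 12 bytes");

/*
 * Regions formatted when an inode is logged, one op header each:
 * inode log format header, inode core, data fork, attribute fork.
 */
enum { INODE_LOG_REGIONS = 4 };

/* Log space consumed by the op headers of nr_regions regions. */
size_t ophdr_space(unsigned int nr_regions)
{
	return nr_regions * sizeof(struct op_hdr_model);
}

/* Bytes missing when only a single op header was reserved. */
size_t ophdr_shortfall(void)
{
	return ophdr_space(INODE_LOG_REGIONS) - ophdr_space(1);
}
```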
2014-03-07  xfs: xfs_check_page_type buffer checks need help  (Dave Chinner)
xfs_aops_discard_page() was introduced in the following commit: xfs: truncate delalloc extents when IO fails in writeback ... to clean up left over delalloc ranges after I/O failure in ->writepage(). generic/224 tests for this scenario and occasionally reproduces panics on sub-4k blocksize filesystems.

The cause of this is failure to clean up the delalloc range on a page where the first buffer does not match one of the expected states of xfs_check_page_type(). If a buffer is not unwritten, delayed or dirty&mapped, xfs_check_page_type() stops and immediately returns 0. The stress test of generic/224 creates a scenario where the first several buffers of a page with delayed buffers are mapped & uptodate and some subsequent buffer is delayed. If the ->writepage() happens to fail for this page, xfs_aops_discard_page() incorrectly skips the entire page. This then causes later failures either when direct IO maps the range and finds the stale delayed buffer, or we evict the inode and find that the inode still has a delayed block reservation accounted to it.

We can easily fix this xfs_aops_discard_page() failure by making xfs_check_page_type() check all buffers, but this breaks xfs_convert_page() more than it is already broken. Indeed, xfs_convert_page() wants xfs_check_page_type() to tell it if the first buffers on the pages are of a type that can be aggregated into the contiguous IO that is already being built. xfs_convert_page() should not be writing random buffers out of a page, but the current behaviour will cause it to do so if there are buffers that don't match the current specification on the page. Hence for xfs_convert_page() we need to:
a) return "not ok" if the first buffer on the page does not match the specification provided, so we don't write anything; and
b) abort its buffer-add-to-io loop the moment we come across a buffer that does not match the specification.

Hence we need to fix both xfs_check_page_type() and xfs_convert_page() to work correctly with pages that have mixed buffer types, whilst allowing xfs_aops_discard_page() to scan all buffers on the page for a type match.
Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
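The two behaviours described above can be modelled in a few lines. This is a simplified userspace sketch, not the kernel code: the buffer states, the single "wanted" type, and the names page_has_type() and leading_run() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-buffer states on a page (hypothetical model). */
enum buf_state {
	BUF_MAPPED_UPTODATE,
	BUF_DIRTY_MAPPED,
	BUF_DELALLOC,
	BUF_UNWRITTEN
};

/*
 * Does the page hold a buffer of the wanted type?
 * check_all == false: only the first buffer decides (IO aggregation case).
 * check_all == true:  every buffer is scanned (discard-after-failure case).
 */
bool page_has_type(const enum buf_state *bufs, size_t nbufs,
		   enum buf_state wanted, bool check_all)
{
	assert(bufs != NULL || nbufs == 0);

	for (size_t i = 0; i < nbufs; i++) {
		if (bufs[i] == wanted)
			return true;
		if (!check_all)
			break;          /* first buffer did not match: stop here */
	}
	return false;
}

/*
 * Number of buffers at the start of the page that may be added to the IO
 * being built: the loop aborts at the first buffer that does not match.
 */
size_t leading_run(const enum buf_state *bufs, size_t nbufs,
		   enum buf_state wanted)
{
	size_t n = 0;

	assert(bufs != NULL || nbufs == 0);

	while (n < nbufs && bufs[n] == wanted)
		n++;
	return n;
}
```

With a page whose first buffers are mapped and uptodate and whose later buffers are delayed, the first-buffer-only check reports no match while the full scan finds the delayed buffers, which is the case the discard path needs.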
2014-03-07  xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation  (Brian Foster)
The inode chunk allocation path can lead to deadlock conditions if a transaction is dirtied with an AGF (to fix up the freelist) for an AG that cannot satisfy the actual allocation request. This code path is written to try and avoid this scenario, but it can be reproduced by running xfstests generic/270 in a loop on a 512b fs.

An example situation is:
- process A attempts an inode allocation on AG 3, modifies the freelist, fails the allocation and ultimately moves on to AG 0 with the AG 3 AGF held
- process B is doing a free space operation (e.g., truncate) and acquires the AG 0 AGF, waits on the AG 3 AGF
- process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock)

The problem here is that process A acquired the AG 3 AGF while moving on to AG 0 (and releasing the AG 3 AGI with the AG 3 AGF held). xfs_dialloc() makes one pass through each of the AGs when attempting to allocate an inode chunk. The expectation is a clean transaction if a particular AG cannot satisfy the allocation request. xfs_ialloc_ag_alloc() is written to support this through use of the minalignslop allocation args field.

When using the agi->agi_newino optimization, we attempt an exact bno allocation request based on the location of the previously allocated chunk. minalignslop is set to inform the allocator that we will require alignment on this chunk, and thus to not allow the request for this AG if the extra space is not available. Suppose that the AG in question has just enough space for this request, but not at the requested bno. xfs_alloc_fix_freelist() will proceed as normal as it determines the request should succeed, and thus it is allowed to modify the agf. xfs_alloc_ag_vextent() ultimately fails because the requested bno is not available. In response, the caller moves on to a NEAR_BNO allocation request for the same AG. The alignment is set, but the minalignslop field is never reset. This increases the overall requirement of the request from the first attempt. If this delta is the difference between allocation success and failure for the AG, xfs_alloc_fix_freelist() rejects this request outright the second time around and causes the allocation request to unnecessarily fail for this AG.

To address this situation, reset the minalignslop field immediately after use and prevent it from leaking into subsequent requests.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
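The effect of the leaked field can be shown with a toy model. The sketch below assumes the space check has the form minlen + alignment + minalignslop - 1 compared against the longest free extent; that formula, the struct, and the function names are illustrative assumptions, not a copy of the allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the allocation arguments, in filesystem blocks. */
struct alloc_args_model {
	unsigned int minlen;        /* minimum extent length wanted */
	unsigned int alignment;     /* required start alignment */
	unsigned int minalignslop;  /* slack reserved for a later aligned retry */
};

/* Longest free extent the AG must have for the request to be allowed. */
unsigned int space_required(const struct alloc_args_model *args)
{
	assert(args->alignment >= 1);
	return args->minlen + args->alignment + args->minalignslop - 1;
}

bool ag_can_satisfy(const struct alloc_args_model *args, unsigned int longest_free)
{
	return space_required(args) <= longest_free;
}

/* Switch from the exact-bno attempt to the aligned near-bno retry. */
void prepare_near_retry(struct alloc_args_model *args,
			unsigned int cluster_align, bool reset_slop)
{
	args->alignment = cluster_align;
	if (reset_slop)
		args->minalignslop = 0;   /* the fix: do not carry the slop over */
}
```

In this model an exact request of 16 blocks with alignment 1 and a slop of 7 needs 23 blocks. Retrying with alignment 8 and the stale slop needs 30, so an AG with exactly 23 free blocks is rejected; with the slop reset the retry needs 23 again.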
2014-03-07  xfs: use NOIO contexts for vm_map_ram  (Dave Chinner)
When we map pages in the buffer cache, we can do so in GFP_NOFS contexts. However, the vmap interfaces do not provide any method of communicating this information to memory reclaim, and hence we get lockdep complaining about it regularly and occasionally see hangs that may be vmap related reclaim deadlocks. We can also see these same problems from anywhere where we use vmalloc for a large buffer (e.g. attribute code) inside a transaction context.

A typical lockdep report shows up as a reclaim state warning like so:

[14046.101458] =================================
[14046.102850] [ INFO: inconsistent lock state ]
[14046.102850] 3.14.0-rc4+ #2 Not tainted
[14046.102850] ---------------------------------
[14046.102850] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[14046.102850] kswapd0/14 [HC0[0]:SC0[0]:HE1:SE1] takes:
[14046.102850] (&xfs_dir_ilock_class){++++?+}, at: [<791a04bb>] xfs_ilock+0xff/0x16a
[14046.102850] {RECLAIM_FS-ON-W} state was registered at:
[14046.102850] [<7904cdb1>] mark_held_locks+0x81/0xe7
[14046.102850] [<7904d390>] lockdep_trace_alloc+0x5c/0xb4
[14046.102850] [<790c2c28>] kmem_cache_alloc_trace+0x2b/0x11e
[14046.102850] [<790ba7f4>] vm_map_ram+0x119/0x3e6
[14046.102850] [<7914e124>] _xfs_buf_map_pages+0x5b/0xcf
[14046.102850] [<7914ed74>] xfs_buf_get_map+0x67/0x13f
[14046.102850] [<7917506f>] xfs_attr_rmtval_set+0x396/0x4d5
[14046.102850] [<7916e8bb>] xfs_attr_leaf_addname+0x18f/0x37d
[14046.102850] [<7916ed9e>] xfs_attr_set_int+0x2f5/0x3e8
[14046.102850] [<7916eefc>] xfs_attr_set+0x6b/0x74
[14046.102850] [<79168355>] xfs_xattr_set+0x61/0x81
[14046.102850] [<790e5b10>] generic_setxattr+0x59/0x68
[14046.102850] [<790e4c06>] __vfs_setxattr_noperm+0x58/0xce
[14046.102850] [<790e4d0a>] vfs_setxattr+0x8e/0x92
[14046.102850] [<790e4ddd>] setxattr+0xcf/0x159
[14046.102850] [<790e5423>] SyS_lsetxattr+0x88/0xbb
[14046.102850] [<79268438>] sysenter_do_call+0x12/0x36

Now, we can't completely remove these traces - mainly because vm_map_ram() will do GFP_KERNEL allocation and that generates the above warning before we get into the reclaim code, but we can turn them all into false positive warnings.

To do that, use the method that DM and other IO context code uses to avoid this problem: there is a process flag to tell memory reclaim not to do IO that we can set appropriately. That prevents GFP_KERNEL context reclaim being done from deep inside the vmalloc code in places we can't directly pass a GFP_NOFS context to. That interface has a pair of wrapper functions: memalloc_noio_save() and memalloc_noio_restore().

Adding them around vm_map_ram and the vzalloc call in kmem_alloc_large() will prevent deadlocks and most lockdep reports for this issue. Also, convert the vzalloc() call in kmem_alloc_large() to use __vmalloc() so that we can pass the correct gfp context to the data page allocation routine inside __vmalloc() so that it is clear that GFP_NOFS context is important to this vmalloc call.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
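The save/restore pairing can be illustrated with a userspace model. This is only a sketch of the pattern: the flag word, the TASK_NOIO bit, malloc() standing in for the vmap/vmalloc call, and all function names are assumptions for illustration, not the kernel interface.

```c
#include <assert.h>
#include <stdlib.h>

#define TASK_NOIO 0x1u          /* stand-in for the per-process "no IO in reclaim" flag */

static unsigned int task_flags; /* stand-in for the current task's flags word */

/* Set the flag and return its previous state so that nesting works. */
unsigned int noio_save(void)
{
	unsigned int old = task_flags & TASK_NOIO;

	task_flags |= TASK_NOIO;
	return old;
}

/* Put the flag back to the state returned by the matching noio_save(). */
void noio_restore(unsigned int old)
{
	assert((old & ~TASK_NOIO) == 0);
	task_flags = (task_flags & ~TASK_NOIO) | old;
}

/* Would reclaim triggered right now be allowed to issue IO? */
int reclaim_may_do_io(void)
{
	return !(task_flags & TASK_NOIO);
}

/*
 * Wrap an allocation that cannot be given an allocation context directly.
 * malloc() stands in for the mapping call; *io_allowed_inside records what
 * reclaim would have been allowed to do while the allocation ran.
 */
void *alloc_in_noio_context(size_t size, int *io_allowed_inside)
{
	unsigned int old = noio_save();
	void *p = malloc(size);

	*io_allowed_inside = reclaim_may_do_io();
	noio_restore(old);
	return p;
}
```

Returning the old state from the save call, instead of clearing the flag unconditionally on restore, is what lets an inner save/restore pair run inside an outer one without ending the outer context early.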
2014-03-07  xfs: don't leak EFSBADCRC to userspace  (Dave Chinner)
While the verifier routines may return EFSBADCRC when a buffer has a bad CRC, we need to translate that to EFSCORRUPTED so that the higher layers treat the error appropriately and we return a consistent error to userspace. This fixes an xfs/005 regression. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
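A minimal sketch of such a translation, assuming positive errno values, EFSBADCRC mapped to EBADMSG, and EFSCORRUPTED mapped to EUCLEAN. The function name xlate_verifier_error() is hypothetical and the sketch does not show where the patch performs the translation.

```c
#include <assert.h>
#include <errno.h>

/* Assumed mappings onto standard errno values (Linux). */
#define EFSCORRUPTED EUCLEAN    /* filesystem is corrupted */
#define EFSBADCRC    EBADMSG    /* bad CRC detected */

/*
 * Hypothetical helper: fold the CRC-specific error into the generic
 * corruption error before it reaches higher layers; pass everything
 * else through unchanged.
 */
int xlate_verifier_error(int error)
{
	assert(error >= 0);
	return error == EFSBADCRC ? EFSCORRUPTED : error;
}
```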
2014-02-27  xfs: fix directory inode iolock lockdep false positive  (Dave Chinner)
The change to add the IO lock to protect the directory extent map during readdir operations has caused lockdep to have a heart attack as it now sees a different locking order on inodes w.r.t. the mmap_sem because readdir has a different ordering to write(). Add a new lockdep class for directory inodes to avoid this false positive. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27  xfs: allocate xfs_da_args to reduce stack footprint  (Dave Chinner)
The struct xfs_da_args used to pass directory/attribute operation information to the lower layers is 128 bytes in size and is allocated on the stack. Dynamically allocate them to reduce the stack footprint of directory operations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27  xfs: always do log forces via the workqueue  (Dave Chinner)
Log forces can occur deep in the call chain when we have relatively little stack free. Log forces can also happen at close to the call chain leaves (e.g. xfs_buf_lock()) and hence we can trigger IO from places where we really don't want to add more stack overhead. This stack overhead occurs because log forces do foreground CIL pushes (xlog_cil_push_foreground()) rather than waking the background push wq and waiting for the push to complete. This foreground push was done to avoid confusing the CFQ IO scheduler when fsync()s were issued, as it has trouble dealing with dependent IOs being issued from different process contexts. Avoiding blowing the stack is much more critical than performance optimisations for CFQ, especially as we've been recommending against the use of CFQ for XFS since 3.2 kernels were released because of its problems with multi-threaded IO workloads. Hence convert xlog_cil_push_foreground() to move the push work to the CIL workqueue. We already do the waiting for the push to complete in xlog_cil_force_lsn(), so there's nothing else we need to modify to make this work. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27  xfs: modify verifiers to differentiate CRC from other errors  (Eric Sandeen)
Modify all read & write verifiers to differentiate between CRC errors and other inconsistencies. This sets the appropriate error number on bp->b_error, and then calls xfs_verifier_error() if something went wrong. That function will issue the appropriate message to the user. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27  xfs: print useful caller information in xfs_error_report  (Eric Sandeen)
xfs_error_report used to just print the hex address of the caller; %pF will give us something more human-readable. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27  xfs: add xfs_verifier_error()  (Eric Sandeen)
We want to distinguish between corruption, CRC errors, etc. In addition, the full stack trace on verifier errors seems less than helpful; it looks more like an oops than corruption. Create a new function to specifically alert the user to verifier errors, which can differentiate between EFSCORRUPTED and CRC mismatches. It doesn't dump stack unless the xfs error level is turned up high. Define a new error message (EFSBADCRC) to clearly identify CRC errors. (Defined to EBADMSG, bad message) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27  xfs: add helper for updating checksums on xfs_bufs  (Eric Sandeen)
Many/most callers of xfs_update_cksum() pass bp->b_addr and BBTOB(bp->b_length) as the first 2 args. Add a helper which can just accept the bp and the crc offset, and work it out on its own, for brevity. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27  xfs: add helper for verifying checksums on xfs_bufs  (Eric Sandeen)
Many/most callers of xfs_verify_cksum() pass bp->b_addr and BBTOB(bp->b_length) as the first 2 args. Add a helper which can just accept the bp and the crc offset, and work it out on its own, for brevity. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
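The shape of the two helpers (this one and the update helper in the next entry) can be sketched in userspace. Everything below is a model written for illustration: struct buf_model stands in for the buffer, the CRC32c routine is a plain bitwise implementation, the checksum is stored in host byte order, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BBSHIFT 9                               /* 512-byte basic blocks */
#define BBTOB(bbs) ((size_t)(bbs) << BBSHIFT)   /* basic blocks to bytes */

struct buf_model {              /* stand-in for the buffer structure */
	uint8_t *b_addr;        /* start of the buffer data */
	size_t   b_length;      /* buffer length in basic blocks */
};

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t n)
{
	while (n--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Checksum of the buffer with the 4-byte checksum field treated as zero. */
static uint32_t compute_cksum(const uint8_t *buf, size_t len, size_t off)
{
	static const uint8_t zero[4];
	uint32_t crc = 0xFFFFFFFFu;

	assert(off + sizeof(zero) <= len);
	crc = crc32c_update(crc, buf, off);
	crc = crc32c_update(crc, zero, sizeof(zero));
	crc = crc32c_update(crc, buf + off + 4, len - off - 4);
	return ~crc;
}

/* Long form: callers pass address, byte length and checksum offset. */
void update_cksum(uint8_t *buf, size_t len, size_t off)
{
	uint32_t crc = compute_cksum(buf, len, off);

	memcpy(buf + off, &crc, sizeof(crc));   /* host byte order in this model */
}

int verify_cksum(const uint8_t *buf, size_t len, size_t off)
{
	uint32_t stored;

	memcpy(&stored, buf + off, sizeof(stored));
	return stored == compute_cksum(buf, len, off);
}

/* Short form: derive address and length from the buffer itself. */
void buf_update_cksum(struct buf_model *bp, size_t off)
{
	update_cksum(bp->b_addr, BBTOB(bp->b_length), off);
}

int buf_verify_cksum(const struct buf_model *bp, size_t off)
{
	return verify_cksum(bp->b_addr, BBTOB(bp->b_length), off);
}
```

The short form removes the repeated address and length arguments from each call site, which is the brevity the commit message refers to.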
2014-02-27  xfs: Use defines for CRC offsets in all cases  (Eric Sandeen)
Some calls to crc functions used useful #defines, others used awkward offsetof() constructs. Switch them all to #define to make things a bit cleaner. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27  xfs: skip pointless CRC updates after verifier failures  (Eric Sandeen)
Most write verifiers don't update CRCs after the verifier has failed and the buffer has been marked in error. These two still updated them, but should not. Add returns to the verifier failure block, since the buffer won't be written anyway. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-24  xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate  (Namjae Jeon)
This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for XFS. The semantics of this flag are as follows:
1) It collapses the range lying between offset and length by removing any data blocks which are present in this range and then updates all the logical offsets of extents beyond "offset + len" to nullify the hole created by removing blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag in combination.
3) Offset and length supplied to fallocate should be fs block size aligned in case of xfs and ext4.
4) Collapse range does not work beyond i_size.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
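The listed semantics can be modelled on an in-memory byte array. This is a sketch under stated assumptions: point 4 is taken to mean that the range must end before the end of the file, a rejected request is reported as -EINVAL, and the names collapse_check() and collapse_range() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Validate a collapse request against block alignment and file size. */
int collapse_check(size_t offset, size_t len, size_t blksize, size_t isize)
{
	assert(blksize > 0);

	if (len == 0 || offset % blksize != 0 || len % blksize != 0)
		return -EINVAL;         /* must be block size aligned */
	if (offset + len >= isize)
		return -EINVAL;         /* assumed: range must end before EOF */
	return 0;
}

/*
 * Remove [offset, offset + len) from the data and shift everything after
 * it down, so no hole is left. Returns the new size or a negative errno.
 */
long collapse_range(unsigned char *data, size_t isize,
		    size_t offset, size_t len, size_t blksize)
{
	int err = collapse_check(offset, len, blksize, isize);

	if (err)
		return err;
	memmove(data + offset, data + offset + len, isize - (offset + len));
	return (long)(isize - len);
}
```

With a 4-byte "block size", collapsing bytes 4..7 of AAAABBBBCCCCDDDD leaves AAAACCCCDDDD and shrinks the size from 16 to 12.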
2014-02-24  fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate  (Namjae Jeon)
This patch is in response to the following post: http://lwn.net/Articles/556136/ "ext4: introduce two new ioctls". Dave Chinner suggested that truncate_block_range (which was the name of one of the ioctls) should be a fallocate operation and not an fs specific ioctl, hence we add this functionality as a new fallocate flag. This new functionality of collapsing a range could be used by media editing tools which do non-linear editing to quickly purge and edit parts of a media file. This will immensely improve the performance of these operations. The limitation of fs block size aligned offsets can be easily handled by media codecs which are encapsulated in a container, as they just have to change the offset to the next keyframe value to match the proper alignment. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-20  Merge remote-tracking branch 'xfs-async-aio-extend' into for-next  (Dave Chinner)
2014-02-20  Merge branch 'xfs-fixes-for-3.15' into for-next  (Dave Chinner)
2014-02-19  xfs: limit superblock corruption errors to actual corruption  (Eric Sandeen)
Today, if the call chain xfs_sb_read_verify -> xfs_sb_verify -> xfs_mount_validate_sb detects superblock corruption, it'll be extremely noisy, dumping 2 stacks, 2 hexdumps, etc. This is because we call XFS_CORRUPTION_ERROR in xfs_mount_validate_sb as well as in xfs_sb_read_verify. Also, *any* errors in xfs_mount_validate_sb which are not corruption per se (things like too-big-blocksize, bad version, bad magic, v1 dirs, rw-incompat etc, which do not return EFSCORRUPTED) will still do the whole XFS_CORRUPTION_ERROR spew when xfs_sb_read_verify sees any error at all. And it suggests to the user that they should run xfs_repair, even if the root cause of the mount failure is a simple incompatibility. I'll submit that the probably-not-corrupted errors don't warrant this much noise, so this patch removes the warning for anything other than EFSCORRUPTED returns, and replaces the lower-level XFS_CORRUPTION_ERROR with an xfs_notice(). Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19  xfs: skip verification on initial "guess" superblock read  (Eric Sandeen)
When xfs_readsb() does the very first read of the superblock, it makes a guess at the length of the buffer, based on the sector size of the underlying storage. This may or may not match the filesystem sector size in sb_sectsize, so we can't, e.g., do a CRC check on it; it might be too short. In fact, mounting a filesystem with sb_sectsize larger than the device sector size will cause a mount failure if CRCs are enabled, because we are checksumming a length which exceeds the buffer passed to it. So always read twice; the first time we read with NULL buffer ops to skip verification; then set the proper read length, hook up the proper verifier, and give it another go. Once we are sure that we've got the right buffer length, we can also use bp->b_length in the xfs_sb_read_verify, rather than the less-trusted on-disk sectorsize for secondary superblocks. Before this we ran the risk of passing junk to the crc32c routines, which didn't always handle extreme values. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19  MAINTAINERS: SGI no longer maintaining XFS  (Ben Myers)
SGI is stepping out of maintainer roles for xfs, xfsprogs, xfsdump, and xfstests. This removes me from the MAINTAINERS entry. Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19  xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb  (Eric Sandeen)
My earlier commit 10e6e65 deserves a layer or two of brown paper bags. The logic in that commit means that a CRC failure on the primary superblock will *never* result in an error return. Hopefully this fixes it, so that we always return the error if it's a primary superblock, otherwise only if the filesystem has CRCs enabled. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2014-02-10  xfs: ensure correct log item buffer alignment  (Dave Chinner)
On 32 bit platforms, the log item vector headers are not 64 bit aligned or sized. Hence if we don't take care to align them correctly or pad the buffer appropriately for 8 byte alignment, we can end up with alignment issues when accessing the user buffer directly as a structure. To solve this, simply pad the buffer headers to a 64 bit offset so that the data section is always 8 byte aligned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reported-by: Michael L. Semon <mlsemon35@gmail.com> Tested-by: Michael L. Semon <mlsemon35@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
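The padding step amounts to rounding the header size up to a multiple of 8 before placing the data. A small sketch, assuming a header followed by an array of fixed-size vector entries; the sizes used in the example and the names round_up8() and data_offset() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of 8. */
size_t round_up8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/*
 * Offset at which the data section starts when it follows a header of
 * hdr_size bytes plus nvecs vector entries of vec_size bytes each.
 * Padding the header area keeps the data section 8 byte aligned even
 * when the structure sizes are only multiples of 4, as on 32 bit.
 */
size_t data_offset(size_t hdr_size, size_t vec_size, unsigned int nvecs)
{
	assert(hdr_size > 0 && vec_size > 0);
	return round_up8(hdr_size + (size_t)nvecs * vec_size);
}
```

For example, a 20-byte header with one 12-byte entry already ends at 32, while the same header with two entries ends at 44 and is padded to 48.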
2014-02-10  xfs: ensure correct timestamp updates from truncate  (Christoph Hellwig)
The VFS doesn't set the proper ATTR_CTIME and ATTR_MTIME values for truncate, so filesystems have to manually add them. The introduction of xfs_setattr_time accidentally broke this special case and caused a regression in generic/313. Fix this by removing the local mask variable in xfs_setattr_size so that we only have a single place to keep the attribute information. cc: <stable@vger.kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-10  xfs: allow appending aio writes  (Christoph Hellwig)
XFS can easily support appending aio writes by ensuring we always allocate blocks as unwritten extents when performing direct I/O writes and only converting them to written extents at I/O completion. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-10  xfs: always use unwritten extents for direct I/O writes  (Christoph Hellwig)
To allow aio writes beyond i_size we need to create unwritten extents for newly allocated blocks, similar to how we already do inside i_size. Instead of adding another special case we now use unwritten extents unconditionally. This also marks the end of directly allocating data extents in all of XFS - we now always use either delalloc or unwritten extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-10  direct-io: add flag to allow aio writes beyond i_size  (Christoph Hellwig)
Some filesystems can handle direct I/O writes beyond i_size safely, so allow them to opt into receiving them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07  xfs: remove XFS_TRANS_DEBUG dead code  (Jie Liu)
Remove the leftover XFS_TRANS_DEBUG dead code following the previous cleanup of it in commit ec47eb6b0b450. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07  xfs: return -E2BIG if hit the maximum size limits of ACLs  (Jie Liu)
We should return -E2BIG rather than -EINVAL if we hit the maximum size limit of ACLs, as the former is consistent with the VFS xattr syscalls. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07  xfs: sanitize sb_inopblock in xfs_mount_validate_sb  (Eric Sandeen)
xfs_mount_validate_sb doesn't check sb_inopblock for sanity (as does its xfs_repair counterpart, FWIW). If it's out of bounds, we can go off the rails in, e.g., xfs_inode_buf_verify(), which uses sb_inopblock as a loop limit when stepping through a metadata buffer. The problem can be demonstrated easily by corrupting sb_inopblock with xfs_db and trying to mount the result:

 # mkfs.xfs -dfile,name=fsfile,size=1g
 # xfs_db -x fsfile
 xfs_db> sb 0
 xfs_db> write inopblock 512
 inopblock = 512
 xfs_db> quit
 # mount -o loop fsfile mnt

and we blow up in xfs_inode_buf_verify(). With this patch, we get a (very noisy) corruption error, and fail the mount as we should.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
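One plausible form of such a sanity check is to require that the inodes-per-block value agrees with the block and inode sizes. The sketch below is an assumption about the shape of the check, not a copy of the patch; struct sb_model and sb_inopblock_ok() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the superblock fields involved. */
struct sb_model {
	uint32_t blocksize;     /* filesystem block size in bytes */
	uint16_t inodesize;     /* on-disk inode size in bytes */
	uint16_t inopblock;     /* inodes per block */
};

/* Assumed rule: inopblock must equal blocksize / inodesize. */
bool sb_inopblock_ok(const struct sb_model *sb)
{
	assert(sb != NULL);

	if (sb->inodesize == 0 || sb->blocksize < sb->inodesize)
		return false;
	return sb->inopblock == sb->blocksize / sb->inodesize;
}
```

Under this rule, a filesystem with 4096-byte blocks and 256-byte inodes passes with 16 inodes per block and fails with the 512 written by the xfs_db session above.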
2014-02-07  xfs: convert xfs_log_commit_cil() to void  (Jie Liu)
Convert xfs_log_commit_cil() to a void function since it returns nothing but 0 in any case; after that we can simplify the related code logic in xfs_trans_commit() accordingly. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07  xfs: use tr_qm_dqalloc log reservation for dquot alloc  (Brian Foster)
The dquot allocation path in xfs_qm_dqread() currently uses the attribute set log reservation, which appears to be incorrect. We have reports of transaction reservation overruns with the current code. E.g., a repeated run of xfstests test generic/270 on a 512b block size fs occasionally produces the following in dmesg:

XFS (sdN): xlog_write: reservation summary:
  trans type  = QM_DQALLOC (30)
  unit res    = 7080 bytes
  current res = -632 bytes
  total reg   = 0 bytes (o/flow = 0 bytes)
  ophdrs      = 0 (ophdr space = 0 bytes)
  ophdr + reg = 0 bytes
  num regions = 0

XFS (sdN): xlog_write: reservation ran out. Need to up reservation

The dquot allocation case should consist of a write reservation (i.e., we are allocating a range of the internal quota file) plus the size of the actual dquots. We already have a log reservation definition for this operation (tr_qm_dqalloc). Use it in xfs_qm_dqread() and update the log reservation calculation function to use the write res. calculation function rather than reading the assumed to be pre-calculated value directly.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07  xfs: remove unused tr_swrite  (Eric Sandeen)
tr_swrite is never used, remove it. From a very quick look, I think the usage of it (and its ancestor XFS_SWRITE_LOG_RES) went away in commit 13e6d5cd "xfs: merge fsync and O_SYNC handling" back in 2009. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07  xfs: use tr_growrtalloc for growing rt files  (Brian Foster)
This is a regression from the following commit: 3d3c8b5222b9 xfs: refactor xfs_trans_reserve() interface Use the tr_growrtalloc log reservation for growing the bitmap/summary files. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-02  Linus 3.14-rc1  (Linus Torvalds)
2014-02-02  Merge branch 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux  (Linus Torvalds)
Pull parisc updates from Helge Deller: "The three major changes in this patchset is a implementation for flexible userspace memory maps, cache-flushing fixes (again), and a long-discussed ABI change to make EWOULDBLOCK the same value as EAGAIN. parisc has been the only platform where we had EWOULDBLOCK != EAGAIN to keep HP-UX compatibility. Since we will probably never implement full HP-UX support, we prefer to drop this compatibility to make it easier for us with Linux userspace programs which mostly never checked for both values. We don't expect major fall-outs because of this change, and if we face some, we will simply rebuild the necessary applications in the debian archives"

* 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: add flexible mmap memory layout support
  parisc: Make EWOULDBLOCK be equal to EAGAIN on parisc
  parisc: convert uapi/asm/stat.h to use native types only
  parisc: wire up sched_setattr and sched_getattr
  parisc: fix cache-flushing
  parisc/sti_console: prefer Linux fonts over built-in ROM fonts
2014-02-02  hpfs: optimize quad buffer loading  (Mikulas Patocka)
HPFS needs to load 4 consecutive 512-byte sectors when accessing the directory nodes or bitmaps. We can't switch to 2048-byte block size because files are allocated in the units of 512-byte sectors. Previously, the driver would allocate a 2048-byte area using kmalloc, copy the data from four buffers to this area and eventually copy them back if they were modified. In the current implementation of the buffer cache, buffers are allocated in the pagecache. That means that 4 consecutive 512-byte buffers are stored in consecutive areas in the kernel address space. So, we don't need to allocate extra memory and copy the content of the buffers there. This patch optimizes the code to avoid copying the buffers. It checks if the four buffers are stored in contiguous memory - if they are not, it falls back to allocating a 2048-byte area and copying data there. Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-02  hpfs: remember free space  (Mikulas Patocka)
Previously, hpfs scanned all bitmaps each time the user asked for free space using statfs. This patch changes it so that hpfs scans the bitmaps only once, remembers the free space, and on the next invocation of statfs returns the value instantly. New versions of wine are hammering on the statfs syscall very heavily, making some games unplayable when they're stored on hpfs, with load times in minutes. This should be backported to the stable kernels because it fixes a user-visible problem (excessive level load times in wine). Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-02  parisc: add flexible mmap memory layout support  (Helge Deller)
Add support for the flexible mmap memory layout (as described in http://lwn.net/Articles/91829). This is especially interesting on parisc since we currently only support 32bit userspace (even with a 64bit Linux kernel). Signed-off-by: Helge Deller <deller@gmx.de>
2014-02-02  parisc: Make EWOULDBLOCK be equal to EAGAIN on parisc  (Guy Martin)
On Linux, only parisc uses a different value for EWOULDBLOCK which causes a lot of troubles for applications not checking for both values. Since the hpux compat is long dead, make EWOULDBLOCK behave the same as all other architectures. Signed-off-by: Guy Martin <gmsoft@tuxicoman.be> Signed-off-by: Helge Deller <deller@gmx.de>
2014-02-02  parisc: convert uapi/asm/stat.h to use native types only  (Helge Deller)
The stat.h header file is exported to userspace. Some userspace applications failed to compile due to missing/unknown types, so we better convert it to use native types only (like it's done on other architectures too). Signed-off-by: Helge Deller <deller@gmx.de>
2014-02-02  parisc: wire up sched_setattr and sched_getattr  (Helge Deller)
Signed-off-by: Helge Deller <deller@gmx.de>