summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-09-01ext4: improve extents status tree trace pointZheng Liu
This commit improves the trace point of extents status tree. We rename trace_ext4_es_shrink_enter in ext4_es_count() because it is also used in ext4_es_scan() and we can not identify them from the result. Further this commit fixes a variable name in trace point in order to keep consistency with others. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: fix comments about get_blocksSeunghun Lee
get_blocks is renamed to get_block. Signed-off-by: Seunghun Lee <waydi1@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-02xfs: trim eofblocks before collapse rangeBrian Foster
xfs_collapse_file_space() currently writes back the entire file undergoing collapse range to settle things down for the extent shift algorithm. While this prevents changes to the extent list during the collapse operation, the writeback itself is not enough to prevent unnecessary collapse failures. The current shift algorithm uses the extent index to iterate the in-core extent list. If a post-eof delalloc extent persists after the writeback (e.g., a prior zero range op where the end of the range aligns with eof can separate the post-eof blocks such that they are not written back and converted), xfs_bmap_shift_extents() becomes confused over the encoded br_startblock value and fails the collapse. As with the full writeback, this is a temporary fix until the algorithm is improved to cope with a volatile extent list and avoid attempts to shift post-eof extents. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02xfs: xfs_file_collapse_range is delalloc challengedDave Chinner
If we have delalloc extents on a file before we run a collapse range opertaion, we sync the range that we are going to collapse to convert delalloc extents in that region to real extents to simplify the shift operation. However, the shift operation then assumes that the extent list is not going to change as it iterates over the extent list moving things about. Unfortunately, this isn't true because we can't hold the ILOCK over all the operations. We can prevent new IO from modifying the extent list by holding the IOLOCK, but that doesn't prevent writeback from running.... And when writeback runs, it can convert delalloc extents is the range of the file prior to the region being collapsed, and this changes the indexes of all the extents in the file. That causes the collapse range operation to Go Bad. The right fix is to rewrite the extent shift operation not to be dependent on the extent list not changing across the entire operation, but this is a fairly significant piece of work to do. Hence, as a short-term workaround for the problem, sync the entire file before starting a collapse operation to remove all delalloc ranges from the file and so avoid the problem of concurrent writeback changing the extent list. Diagnosed-and-Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02xfs: don't log inode unless extent shift makes extent modificationsBrian Foster
The file collapse mechanism uses xfs_bmap_shift_extents() to collapse all subsequent extents down into the specified, previously punched out, region. This function performs some validation, such as whether a sufficient hole exists in the target region of the collapse, then shifts the remaining exents downward. The exit path of the function currently logs the inode unconditionally. While we must log the inode (and abort) if an error occurs and the transaction is dirty, the initial validation paths can generate errors before the transaction has been dirtied. This creates an unnecessary filesystem shutdown scenario, as the caller will cancel a transaction that has been marked dirty. Modify xfs_bmap_shift_extents() to OR the logflags bits as modifications are made to the inode bmap. Only log the inode in the exit path if logflags has been set. This ensures we only have to cancel a dirty transaction if modifications have been made and prevents an unnecessary filesystem shutdown otherwise. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02xfs: use ranged writeback and invalidation for direct IODave Chinner
Now we are not doing silly things with dirtying buffers beyond EOF and using invalidation correctly, we can finally reduce the ranges of writeback and invalidation used by direct IO to match that of the IO being issued. Bring the writeback and invalidation ranges back to match the generic direct IO code - this will greatly reduce the perturbation of cached data when direct IO and buffered IO are mixed, but still provide the same buffered vs direct IO coherency behaviour we currently have. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02xfs: don't zero partial page cache pages during O_DIRECT writesDave Chinner
Similar to direct IO reads, direct IO writes are using truncate_pagecache_range to invalidate the page cache. This is incorrect due to the sub-block zeroing in the page cache that truncate_pagecache_range() triggers. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages. cc: stable@vger.kernel.org Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02xfs: don't zero partial page cache pages during O_DIRECT writesChris Mason
xfs is using truncate_pagecache_range to invalidate the page cache during DIO reads. This is different from the other filesystems who only invalidate pages during DIO writes. truncate_pagecache_range is meant to be used when we are freeing the underlying data structs from disk, so it will zero any partial ranges in the page. This means a DIO read can zero out part of the page cache page, and it is possible the page will stay in cache. buffered reads will find an up to date page with zeros instead of the data actually on disk. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages. [dchinner: catch error and warn if it fails. Comment.] cc: stable@vger.kernel.org Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-02xfs: don't dirty buffers beyond EOFDave Chinner
generic/263 is failing fsx at this point with a page spanning EOF that cannot be invalidated. The operations are: 1190 mapwrite 0x52c00 thru 0x5e569 (0xb96a bytes) 1191 mapread 0x5c000 thru 0x5d636 (0x1637 bytes) 1192 write 0x5b600 thru 0x771ff (0x1bc00 bytes) where 1190 extents EOF from 0x54000 to 0x5e569. When the direct IO write attempts to invalidate the cached page over this range, it fails with -EBUSY and so any attempt to do page invalidation fails. The real question is this: Why can't that page be invalidated after it has been written to disk and cleaned? Well, there's data on the first two buffers in the page (1k block size, 4k page), but the third buffer on the page (i.e. beyond EOF) is failing drop_buffers because it's bh->b_state == 0x3, which is BH_Uptodate | BH_Dirty. IOWs, there's dirty buffers beyond EOF. Say what? OK, set_buffer_dirty() is called on all buffers from __set_page_buffers_dirty(), regardless of whether the buffer is beyond EOF or not, which means that when we get to ->writepage, we have buffers marked dirty beyond EOF that we need to clean. So, we need to implement our own .set_page_dirty method that doesn't dirty buffers beyond EOF. This is messy because the buffer code is not meant to be shared and it has interesting locking issues on the buffer dirty bits. So just copy and paste it and then modify it to suit what we need. Note: the solutions the other filesystems and generic block code use of marking the buffers clean in ->writepage does not work for XFS. It still leaves dirty buffers beyond EOF and invalidations still fail. Hence rather than play whack-a-mole, this patch simply prevents those buffers from being dirtied in the first place. cc: <stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-09-01ext4: enable block_validity by defaultDarrick J. Wong
Enable by default the block_validity feature, which checks for collisions between newly allocated blocks and critical system metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01jbd2: fold __wait_cp_io into jbd2_log_do_checkpoint()Theodore Ts'o
__wait_cp_io() is only called by jbd2_log_do_checkpoint(). Fold it in to make it a bit easier to understand. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01jbd2: fold __process_buffer() into jbd2_log_do_checkpoint()Theodore Ts'o
__process_buffer() is only called by jbd2_log_do_checkpoint(), and it had a very complex locking protocol where it would be called with the j_list_lock, and sometimes exit with the lock held (if the return code was 0), or release the lock. This was confusing both to humans and to smatch (which erronously complained that the lock was taken twice). Folding __process_buffer() to the caller allows us to simplify the control flow, making the resulting function easier to read and reason about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150 bytes (over 4% of the text size). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-09-01ext4: rename ext4_ext_find_extent() to ext4_find_extent()Theodore Ts'o
Make the function name less redundant. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: reuse path object in ext4_move_extents()Theodore Ts'o
Reuse the path object in ext4_move_extents() so we don't unnecessarily free and reallocate it. Also clean up the get_ext_path() wrapper so that it has the same semantics of freeing the path object on error as ext4_ext_find_extent(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: reuse path object in ext4_ext_shift_extents()Theodore Ts'o
Now that the semantics of ext4_ext_find_extent() are much cleaner, it's safe and more efficient to reuse the path object across the multiple calls to ext4_ext_find_extent() in ext4_ext_shift_extents(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: teach ext4_ext_find_extent() to realloc path if necessaryTheodore Ts'o
This adds additional safety in case for some reason we end reusing a path structure which isn't big enough for current depth of the inode. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: allow a NULL argument to ext4_ext_drop_refs()Theodore Ts'o
Teach ext4_ext_drop_refs() to accept a NULL argument, much like kfree(). This allows us to drop a lot of checks to make sure path is non-NULL before calling ext4_ext_drop_refs(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: call ext4_ext_drop_refs() from ext4_ext_find_extent()Theodore Ts'o
In nearly all of the calls to ext4_ext_find_extent() where the caller is trying to recycle the path object, ext4_ext_drop_refs() gets called to release the buffer heads before the path object gets overwritten. To simplify things for the callers, and to avoid the possibility of a memory leak, make ext4_ext_find_extent() responsible for dropping the buffers. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling codeTheodore Ts'o
Drop EXT4_EX_NOFREE_ON_ERR from ext4_ext_create_new_leaf(), ext4_split_extent(), ext4_convert_unwritten_extents_endio(). This requires fixing all of their callers to potentially ext4_ext_find_extent() to free the struct ext4_ext_path object in case of an error, and there are interlocking dependencies all the way up to ext4_ext_map_blocks(), ext4_swap_extents(), and ext4_ext_remove_space(). Once this is done, we can drop the EXT4_EX_NOFREE_ON_ERR flag since it is no longer necessary. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: drop EXT4_EX_NOFREE_ON_ERR in convert_initialized_extent()Theodore Ts'o
Transfer responsibility of freeing struct ext4_ext_path on error to ext4_ext_find_extent(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: collapse ext4_convert_initialized_extents()Theodore Ts'o
The function ext4_convert_initialized_extents() is only called by a single function --- ext4_ext_convert_initalized_extents(). Inline the code and get rid of the unnecessary bits in order to simplify the code. Rename ext4_ext_convert_initalized_extents() to convert_initalized_extents() since it's a static function that is actually only used in a single caller, ext4_ext_map_blocks(). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: teach ext4_ext_find_extent() to free path on errorTheodore Ts'o
Right now, there are a places where it is all to easy to leak memory on an error path, via a usage like this: struct ext4_ext_path *path = NULL while (...) { ... path = ext4_ext_find_extent(inode, block, path, 0); if (IS_ERR(path)) { /* oops, if path was non-NULL before the call to ext4_ext_find_extent, we've leaked it! :-( */ ... return PTR_ERR(path); } ... } Unfortunately, there some code paths where we are doing the following instead: path = ext4_ext_find_extent(inode, block, orig_path, 0); and where it's important that we _not_ free orig_path in the case where ext4_ext_find_extent() returns an error. So change the function signature of ext4_ext_find_extent() so that it takes a struct ext4_ext_path ** for its third argument, and by default, on an error, it will free the struct ext4_ext_path, and then zero out the struct ext4_ext_path * pointer. In order to avoid causing problems, we add a flag EXT4_EX_NOFREE_ON_ERR which causes ext4_ext_find_extent() to use the original behavior of forcing the caller to deal with freeing the original path pointer on the error case. The goal is to get rid of EXT4_EX_NOFREE_ON_ERR entirely, but this allows for a gentle transition and makes the patches easier to verify. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: fix accidental flag aliasing in ext4_map_blocks flagsTheodore Ts'o
Commit b8a8684502a0f introduced an accidental flag aliasing between EXT4_EX_NOCACHE and EXT4_GET_BLOCKS_CONVERT_UNWRITTEN. Fortunately, this didn't introduce any untorward side effects --- we got lucky. Nevertheless, fix this and leave a warning to hopefully avoid this from happening in the future. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-01ext4: fix ZERO_RANGE bug hidden by flag aliasingTheodore Ts'o
We accidently aliased EXT4_EX_NOCACHE and EXT4_GET_CONVERT_UNWRITTEN falgs, which apparently was hiding a bug that was unmasked when this flag aliasing issue was addressed (see the subsequent commit). The reproduction case was: fsx -N 10000 -l 500000 -r 4096 -t 4096 -w 4096 -Z -R -W /vdb/junk ... which would cause fsx to report corruption in the data file. The fix we have is a bit of an overkill, but I'd much rather be conservative for now, and we can optimize ZERO_RANGE_FL handling later. The fact that we need to zap the extent_status cache for the inode is unfortunate, but correctness is far more important than performance. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com>
2014-08-31ext4: fix ext4_swap_extents() error handlingTheodore Ts'o
If ext4_ext_find_extent() returns an error, we have to clear path1 or path2 or else we would end up trying to free an ERR_PTR, which would be bad. Also eliminate some redundant code and mark the error paths as unlikely() Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-30Merge tag 'locks-v3.17-3' of git://git.samba.org/jlayton/linuxLinus Torvalds
Pull file locking bugfx from Jeff Layton: "Just a bugfix for a bug that crept in to v3.15. It's in a rather rare error path, and I'm not aware of anyone having hit it, but it's worth fixing for v3.17" * tag 'locks-v3.17-3' of git://git.samba.org/jlayton/linux: locks: pass correct "before" pointer to locks_unlink_lock in generic_add_lease
2014-08-30ext4: refactor ext4_move_extents code baseDmitry Monakhov
ext4_move_extents is too complex for review. It has duplicate almost each function available in the rest of other codebase. It has useless artificial restriction orig_offset == donor_offset. But in fact logic of ext4_move_extents is very simple: Iterate extents one by one (similar to ext4_fill_fiemap_extents) ->Iterate each page covered extent (similar to generic_perform_write) ->swap extents for covered by page (can be shared with IOC_MOVE_DATA) Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-30ext4: use ext4_ext_next_allocated_block instead of mext_next_extentDmitry Monakhov
This allows us to make mext_next_extent static and potentially get rid of it. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-30ext4: use ext4_update_i_disksize instead of opencoded onesDmitry Monakhov
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-30fix EBUSY on umount() from MNT_SHRINKABLEAl Viro
We need the parents of victims alive until namespace_unlock() gets to dput() of the (ex-)mountpoints. However, that screws up the "is it busy" checks in case when we have shrinkable mounts that need to be killed. Solution: go ahead and decrement refcounts of parents right in umount_tree(), increment them again just before dropping rwsem in namespace_unlock() (and let the loop in the end of namespace_unlock() finally drop those references for good, as we do now). Parents can't get freed until we drop rwsem - at least one reference is kept until then, both in case when parent is among the victims and when it is not. So they'll still be around when we get to namespace_unlock(). Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-30get rid of propagate_umount() mistakenly treating slaves as busy.Al Viro
The check in __propagate_umount() ("has somebody explicitly mounted something on that slave?") is done *before* taking the already doomed victims out of the child lists. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-29ext4: remove a duplicate call in ext4_init_new_dir()Wang Shilong
ext4_journal_get_write_access() has just been called in ext4_append() calling it again here is duplicated. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-29ext4: convert do_split() to use the ERR_PTR conventionTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-29ext4: convert dx_probe() to use the ERR_PTR conventionTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-29ext4: convert ext4_bread() to use the ERR_PTR conventionTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-29ext4: convert ext4_getblk() to use the ERR_PTR conventionTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-29ext4: convert ext4_dx_find_entry() to use the ERR_PTR conventionTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-08-29Merge branch 'akpm' (fixes from Andrew Morton)Linus Torvalds
Merge patches from Andrew Morton: "22 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits) kexec: purgatory: add clean-up for purgatory directory Documentation/kdump/kdump.txt: add ARM description flush_icache_range: export symbol to fix build errors tools: selftests: fix build issue with make kselftests target ocfs2: quorum: add a log for node not fenced ocfs2: o2net: set tcp user timeout to max value ocfs2: o2net: don't shutdown connection when idle timeout ocfs2: do not write error flag to user structure we cannot copy from/to x86/purgatory: use approprate -m64/-32 build flag for arch/x86/purgatory drivers/rtc/rtc-s5m.c: re-add support for devices without irq specified xattr: fix check for simultaneous glibc header inclusion kexec: remove CONFIG_KEXEC dependency on crypto kexec: create a new config option CONFIG_KEXEC_FILE for new syscall x86,mm: fix pte_special versus pte_numa hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked mm/zpool: use prefixed module loading zram: fix incorrect stat with failed_reads lib: turn CONFIG_STACKTRACE into an actual option. mm: actually clear pmd_numa before invalidating memblock, memhotplug: fix wrong type in memblock_find_in_range_node(). ...
2014-08-29ocfs2: quorum: add a log for node not fencedJunxiao Bi
For debug use, we can see from the log whether the fence decision is made and why it is not fenced. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29ocfs2: o2net: set tcp user timeout to max valueJunxiao Bi
When tcp retransmit timeout(15mins), the connection will be closed. Pending messages may be lost during this time. So we set tcp user timeout to override the retransmit timeout to the max value. This is OK for ocfs2 since we have disk heartbeat, if peer crash, the disk heartbeat will timeout and it will be evicted, if disk heartbeat not timeout and connection idle for a long time, then this means the cluster enters split-brain state, since fence can't happen, we'd better keep the connection and wait network recover. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29ocfs2: o2net: don't shutdown connection when idle timeoutJunxiao Bi
This patch series is to fix a possible message lost bug in ocfs2 when network go bad. This bug will cause ocfs2 hung forever even network become good again. The messages may lost in this case. After the tcp connection is established between two nodes, an idle timer will be set to check its state periodically, if no messages are received during this time, idle timer will timeout, it will shutdown the connection and try to reconnect, so pending messages in tcp queues will be lost. This messages may be from dlm. Dlm may get hung in this case. This may cause the whole ocfs2 cluster hung. This is very possible to happen when network state goes bad. Do the reconnect is useless, it will fail if network state is still bad. Just waiting there for network recovering may be a good idea, it will not lost messages and some node will be fenced until cluster goes into split-brain state, for this case, Tcp user timeout is used to override the tcp retransmit timeout. It will timeout after 25 days, user should have notice this through the provided log and fix the network, if they don't, ocfs2 will fall back to original reconnect way. This patch (of 3): Some messages in the tcp queue maybe lost if we shutdown the connection and reconnect when idle timeout. If packets lost and reconnect success, then the ocfs2 cluster maybe hung. To fix this, we can leave the connection there and do the fence decision when idle timeout, if network recover before fence dicision is made, the connection survive without lost any messages. This bug can be saw when network state go bad. It may cause ocfs2 hung forever if some packets lost. With this fix, ocfs2 will recover from hung if network becomes good again. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29ocfs2: do not write error flag to user structure we cannot copy from/toBen Hutchings
If we failed to copy from the structure, writing back the flags leaks 31 bits of kernel memory (the rest of the ir_flags field). In any case, if we cannot copy from/to the structure, why should we expect putting just the flags to work? Also make sure ocfs2_info_handle_freeinode() returns the right error code if the copy_to_user() fails. Fixes: ddee5cdb70e6 ('Ocfs2: Add new OCFS2_IOC_INFO ioctl for ocfs2 v8.') Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29Merge tag 'nfs-for-3.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: "Highlights: - NFSv3 stable fix for another POSIX ACL regression - NFSv4 stable fix for a regression with OPEN_DOWNGRADE - NFSv4 stable fix for bad close() behaviour when holding a delegation" * tag 'nfs-for-3.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv3: Fix another acl regression NFSv4: Don't clear the open state when we just did an OPEN_DOWNGRADE NFSv4: Fix problems with close in the presence of a delegation
2014-08-29Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "Ext4 bug fixes for 3.17, to provide better handling of memory allocation failures, and to fix some journaling bugs involving journal checksums and FALLOC_FL_ZERO_RANGE" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix same-dir rename when inline data directory overflows jbd2: fix descriptor block size handling errors with journal_csum jbd2: fix infinite loop when recovering corrupt journal blocks ext4: update i_disksize coherently with block allocation on error path ext4: fix transaction issues for ext4_fallocate and ext_zero_range ext4: fix incorect journal credits reservation in ext4_zero_range ext4: move i_size,i_disksize update routines to helper function ext4: fix BUG_ON in mb_free_blocks() ext4: propagate errors up to ext4_find_entry()'s callers
2014-08-29f2fs: fix wrong casting for dentry nameJaegeuk Kim
The dentry name type is unsigned char *. If we don't match this type, some character codes can be changed by signed bit. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-08-28ext4: fix same-dir rename when inline data directory overflowsDarrick J. Wong
When performing a same-directory rename, it's possible that adding or setting the new directory entry will cause the directory to overflow the inline data area, which causes the directory to be converted to an extent-based directory. Under this circumstance it is necessary to re-read the directory when deleting the old dirent because the "old directory" context still points to i_block in the inode table, which is now an extent tree root! The delete fails with an FS error, and the subsequent fsck complains about incorrect link counts and hardlinked directories. Test case (originally found with flat_dir_test in the metadata_csum test program): # mkfs.ext4 -O inline_data /dev/sda # mount /dev/sda /mnt # mkdir /mnt/x # touch /mnt/x/changelog.gz /mnt/x/copyright /mnt/x/README.Debian # sync # for i in /mnt/x/*; do mv $i $i.longer; done # ls -la /mnt/x/ total 0 -rw-r--r-- 1 root root 0 Aug 25 12:03 changelog.gz.longer -rw-r--r-- 1 root root 0 Aug 25 12:03 copyright -rw-r--r-- 1 root root 0 Aug 25 12:03 copyright.longer -rw-r--r-- 1 root root 0 Aug 25 12:03 README.Debian.longer (Hey! Why are there four files now??) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-08-28jbd2: fix descriptor block size handling errors with journal_csumDarrick J. Wong
It turns out that there are some serious problems with the on-disk format of journal checksum v2. The foremost is that the function to calculate descriptor tag size returns sizes that are too big. This causes alignment issues on some architectures and is compounded by the fact that some parts of jbd2 use the structure size (incorrectly) to determine the presence of a 64bit journal instead of checking the feature flags. Therefore, introduce journal checksum v3, which enlarges the descriptor block tag format to allow for full 32-bit checksums of journal blocks, fix the journal tag function to return the correct sizes, and fix the jbd2 recovery code to use feature flags to determine 64bitness. Add a few function helpers so we don't have to open-code quite so many pieces. Switching to a 16-byte block size was found to increase journal size overhead by a maximum of 0.1%, to convert a 32-bit journal with no checksumming to a 32-bit journal with checksum v3 enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-08-28jbd2: fix infinite loop when recovering corrupt journal blocksDarrick J. Wong
When recovering the journal, don't fall into an infinite loop if we encounter a corrupt journal block. Instead, just skip the block and return an error, which fails the mount and thus forces the user to run a full filesystem fsck. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-08-28ext4: update i_disksize coherently with block allocation on error pathDmitry Monakhov
In case of delalloc block i_disksize may be less than i_size. So we have to update i_disksize each time we allocated and submitted some blocks beyond i_disksize. We weren't doing this on the error paths, so fix this. testcase: xfstest generic/019 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-08-28nfsd4: remove labeled NFS warning from config helpJ. Bruce Fields
The working group appears committed to keeping the protocol stable, the code has gotten some use and seems to work OK. Signed-off-by: J. Bruce Fields <bfields@redhat.com>