summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-11-30btrfs: reada, remove unused parameter from __readahead_hookDavid Sterba
'start' is not used since "btrfs: reada: Pass reada_extent into __readahead_hook directly" (6e39dbe8b9e55280c). Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: reada, cleanup remove unneeded variable in __readahead_hookDavid Sterba
We can't touch the eb directly in case the function is called with a non-zero error, so we can read the eb level when needed. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: rename helper macros for qgroup and aux data castsDavid Sterba
The helpers are not meant to be generic, the name is misleading. Convert them to static inlines for type checking. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove stale comment from btrfs_statfsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove unused headers, statfs.hDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove useless commentsXiaoguang Wang
Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely") Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: make block group flags in balance printks human-readableAdam Borowski
They're not even documented anywhere, letting users with no recourse but to RTFS. It's no big burden to output the bitfield as words. Also, display unknown flags as hex. Signed-off-by: Adam Borowski <kilobyte@angband.pl> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30Btrfs: deal with existing encompassing extent map in btrfs_get_extent()Omar Sandoval
My QEMU VM was seeing inexplicable I/O errors that I tracked down to errors coming from the qcow2 virtual drive in the host system. The qcow2 file is a nocow file on my Btrfs drive, which QEMU opens with O_DIRECT. Every once in awhile, pread() or pwrite() would return EEXIST, which makes no sense. This turned out to be a bug in btrfs_get_extent(). Commit 8dff9c853410 ("Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent") fixed a case in btrfs_get_extent() where two threads race on adding the same extent map to an inode's extent map tree. However, if the added em is merged with an adjacent em in the extent tree, then we'll end up with an existing extent that is not identical to but instead encompasses the extent we tried to add. When we call merge_extent_mapping() to find the nonoverlapping part of the new em, the arithmetic overflows because there is no such thing. We then end up trying to add a bogus em to the em_tree, which results in a EEXIST that can bubble all the way up to userspace. Fix it by extending the identical extent map special case. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: add necessary comments about tickets_idWang Xiaoguang
Tickets_id's name may result in some misunderstandings, it just indicates the next ticket will be handled and is not stored per ticket. Fixes: ce12965 ("btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress") Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30quota: Remove dqonoff_mutexJan Kara
The only places that were grabbing dqonoff_mutex are functions turning quotas on and off and these are properly serialized using s_umount semaphore. Remove dqonoff_mutex. Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-30ocfs2: Use s_umount for quota recovery protectionJan Kara
Currently we use dqonoff_mutex to serialize quota recovery protection and turning of quotas on / off. Use s_umount semaphore instead. Tested-by: Eric Ren <zren@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-30quota: Remove dqonoff_mutex from dquot_scan_active()Jan Kara
All callers of dquot_scan_active() now hold s_umount so we can rely on that lock to protect us against quota state changes. Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-30ocfs2: Protect periodic quota syncing with s_umount semaphoreJan Kara
New quota locking rules will require s_umount semaphore for all quota scanning functions. Add is for periodic quota syncing. Tested-by: Eric Ren <zren@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-30Merge branch 'iomap-4.10-directio' into for-nextDave Chinner
2016-11-30xfs: use iomap_dio_rwChristoph Hellwig
Straight switch over to using iomap for direct I/O - we already have the non-COW dio path in write_begin for DAX and files with extent size hints, so nothing to add there. The COW path is ported over from the old get_blocks version and a bit of a mess, but I have some work in progress to make it look more like the buffered I/O COW path. This gets rid of xfs_get_blocks_direct and the last caller of xfs_get_blocks with the create flag set, so all that code can be removed. Last but not least I've removed a comment in xfs_filemap_fault that refers to xfs_get_blocks entirely instead of updating it - while the reference is correct, the whole DAX fault path looks different than the non-DAX one, so it seems rather pointless. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@fb.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-30iomap: implement direct I/OChristoph Hellwig
This adds a full fledget direct I/O implementation using the iomap interface. Full fledged in this case means all features are supported: AIO, vectored I/O, any iov_iter type including kernel pointers, bvecs and pipes, support for hole filling and async apending writes. It does not mean supporting all the warts of the old generic code. We expect i_rwsem to be held over the duration of the call, and we expect to maintain i_dio_count ourselves, and we pass on any kinds of mapping to the file system for now. The algorithm used is very simple: We use iomap_apply to iterate over the range of the I/O, and then we use the new bio_iov_iter_get_pages helper to lock down the user range for the size of the extent. bio_iov_iter_get_pages can currently lock down twice as many pages as the old direct I/O code did, which means that we will have a better batch factor for everything but overwrites of badly fragmented files. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Tested-by: Jens Axboe <axboe@fb.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-30fs: make sb_init_dio_done_wq available outside of direct-io.cChristoph Hellwig
We want to use the per-sb completion workqueue from the new iomap direct I/O code. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@fb.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-30xfs: remove i_iolock and use i_rwsem in the VFS inode insteadChristoph Hellwig
This patch drops the XFS-own i_iolock and uses the VFS i_rwsem which recently replaced i_mutex instead. This means we only have to take one lock instead of two in many fast path operations, and we can also shrink the xfs_inode structure. Thanks to the xfs_ilock family there is very little churn, the only thing of note is that we need to switch to use the lock_two_directory helper for taking the i_rwsem on two inodes in a few places to make sure our lock order matches the one used in the VFS. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@fb.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-30Merge branch 'xfs-4.10-misc-fixes-2' into iomap-4.10-directioDave Chinner
2016-11-29f2fs: return AOP_WRITEPAGE_ACTIVATE for writepageChao Yu
We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-29f2fs: do not activate auto_recovery for fallocated i_sizeJaegeuk Kim
If a file needs to keep its i_size by fallocate, we need to turn off auto recovery during roll-forward recovery. This will resolve the below scenario. 1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync" 2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync" 3. md5sum /mnt/f2fs/file; 4. godown /mnt/f2fs/ 5. umount /mnt/f2fs/ 6. mount -t f2fs /dev/sdx /mnt/f2fs 7. md5sum /mnt/f2fs/file Reported-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-29kernfs: Declare two local data structures staticBart Van Assche
This was spotted by the 'sparse' static checker. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-29ext4: be more strict when verifying flags set via SETFLAGS ioctlsJan Kara
Currently we just silently ignore flags that we don't understand (or that cannot be manipulated) through EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR ioctls. This makes it problematic for the unused flags to be used in future (some app may be inadvertedly setting them and we won't notice until the flag gets used). Also this is inconsistent with other filesystems like XFS or BTRFS which return EOPNOTSUPP when they see a flag they cannot set. ext4 has the additional problem that there are flags which are returned by EXT4_IOC_GETFLAGS ioctl but which cannot be modified via EXT4_IOC_SETFLAGS. So we have to be careful to ignore value of these flags and not fail the ioctl when they are set (as e.g. chattr(1) passes flags returned from EXT4_IOC_GETFLAGS to EXT4_IOC_SETFLAGS without any masking and thus we'd break this utility). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-29ext4: add EXT4_JOURNAL_DATA_FL and EXT4_EXTENTS_FL to modifiable maskJan Kara
Add EXT4_JOURNAL_DATA_FL and EXT4_EXTENTS_FL to EXT4_FL_USER_MODIFIABLE to recognize that they are modifiable by userspace. So far we got away without having them there because ext4_ioctl_setflags() treats them in a special way. But it was really confusing like that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-29btrfs: cleanup: use already calculated value in ↵Wang Xiaoguang
btrfs_should_throttle_delayed_refs() Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-29btrfs: don't abuse REQ_OP_* flags for btrfs_map_blockChristoph Hellwig
btrfs_map_block supports different types of mappings, which to a large extent resemble block layer operations. But they don't always do, and currently btrfs dangerously overlays it's own flag over the block layer flags. This is just asking for a conflict, so introduce a different map flags enum inside of btrfs instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-29ovl: fix d_real() for stacked fsMiklos Szeredi
Handling of recursion in d_real() is completely broken. Recursion is only done in the 'inode != NULL' case. But when opening the file we have 'inode == NULL' hence d_real() will return an overlay dentry. This won't work since overlayfs doesn't define its own file operations, so all file ops will fail. Fix by doing the recursion first and the check against the inode second. Bash script to reproduce the issue written by Quentin: - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - tmpdir=$(mktemp -d) pushd ${tmpdir} mkdir -p {upper,lower,work} echo -n 'rocks' > lower/ksplice mount -t overlay level_zero upper -o lowerdir=lower,upperdir=upper,workdir=work cat upper/ksplice tmpdir2=$(mktemp -d) pushd ${tmpdir2} mkdir -p {upper,work} mount -t overlay level_one upper -o lowerdir=${tmpdir}/upper,upperdir=upper,workdir=work ls -l upper/ksplice cat upper/ksplice - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 2d902671ce1c ("vfs: merge .d_select_inode() into .d_real()") Cc: <stable@vger.kernel.org> # v4.8+
2016-11-28CIFS: iterate over posix acl xattr entry correctly in ACL_to_cifs_posix()Eryu Guan
Commit 2211d5ba5c6c ("posix_acl: xattr representation cleanups") removes the typedefs and the zero-length a_entries array in struct posix_acl_xattr_header, and uses bare struct posix_acl_xattr_header and struct posix_acl_xattr_entry directly. But it failed to iterate over posix acl slots when converting posix acls to CIFS format, which results in several test failures in xfstests (generic/053 generic/105) when testing against a samba v1 server, starting from v4.9-rc1 kernel. e.g. [root@localhost xfstests]# diff -u tests/generic/105.out /root/xfstests/results//generic/105.out.bad --- tests/generic/105.out 2016-09-19 16:33:28.577962575 +0800 +++ /root/xfstests/results//generic/105.out.bad 2016-10-22 15:41:15.201931110 +0800 @@ -1,3 +1,4 @@ QA output created by 105 -rw-r--r-- root +setfacl: subdir: Invalid argument -rw-r--r-- root Fix it by introducing a new "ace" var, like what cifs_copy_posix_acl() does, and iterating posix acl xattr entries over it in the for loop. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-11-28Call echo service immediately after socket reconnectSachin Prabhu
Commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect long after socket reconnect") changes the behaviour of the SMB2 echo service and causes it to renegotiate after a socket reconnect. However under default settings, the echo service could take up to 120 seconds to be scheduled. The patch forces the echo service to be called immediately resulting a negotiate call being made immediately on reconnect. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-11-28CIFS: Fix BUG() in calc_seckey()Sachin Prabhu
Andy Lutromirski's new virtually mapped kernel stack allocations moves kernel stacks the vmalloc area. This triggers the bug kernel BUG at ./include/linux/scatterlist.h:140! at calc_seckey()->sg_init() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-11-28f2fs: fix to determine start_cp_addr by sbi->cur_cp_packJaegeuk Kim
We don't guarantee cp_addr is fixed by cp_version. This is to sync with f2fs-tools. Cc: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-28Merge branch 'xfs-4.10-misc-fixes-2' into for-nextDave Chinner
2016-11-28xfs: pass post-eof speculative prealloc blocks to bmapiBrian Foster
xfs_file_iomap_begin_delay() implements post-eof speculative preallocation by extending the block count of the requested delayed allocation. Now that xfs_bmapi_reserve_delalloc() has been updated to handle prealloc blocks separately and tag the inode, update xfs_file_iomap_begin_delay() to use the new parameter and rely on the former to tag the inode. Note that this patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-28xfs: clean up cow fork reservation and tag inodes correctlyBrian Foster
COW fork reservation is implemented via delayed allocation. The code is modeled after the traditional delalloc allocation code, but is slightly different in terms of how preallocation occurs. Rather than post-eof speculative preallocation, COW fork preallocation is implemented via a COW extent size hint that is designed to minimize fragmentation as a reflinked file is split over time. xfs_reflink_reserve_cow() still uses logic that is oriented towards dealing with post-eof speculative preallocation, however, and is stale or not necessarily correct. First, the EOF alignment to the COW extent size hint is implemented in xfs_bmapi_reserve_delalloc() (which does so correctly by aligning the start and end offsets) and so is not necessary in xfs_reflink_reserve_cow(). The backoff and retry logic on ENOSPC is also ineffective for the same reason, as xfs_bmapi_reserve_delalloc() will simply perform the same allocation request on the retry. Finally, since the COW extent size hint aligns the start and end offset of the range to allocate, the end_fsb != orig_end_fsb logic is not sufficient. Indeed, if a write request happens to end on an aligned offset, it is possible that we do not tag the inode for COW preallocation even though xfs_bmapi_reserve_delalloc() may have preallocated at the start offset. Kill the unnecessary, duplicate code in xfs_reflink_reserve_cow(). Remove the inode tag logic as well since xfs_bmapi_reserve_delalloc() has been updated to tag the inode correctly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-28xfs: track preallocation separately in xfs_bmapi_reserve_delalloc()Brian Foster
Speculative preallocation is currently processed entirely by the callers of xfs_bmapi_reserve_delalloc(). The caller determines how much preallocation to include, adjusts the extent length and passes down the resulting request. While this works fine for post-eof speculative preallocation, it is not as reliable for COW fork preallocation. COW fork preallocation is implemented via the cowextszhint, which aligns the start offset as well as the length of the extent. Further, it is difficult for the caller to accurately identify when preallocation occurs because the returned extent could have been merged with neighboring extents in the fork. To simplify this situation and facilitate further COW fork preallocation enhancements, update xfs_bmapi_reserve_delalloc() to take a separate preallocation parameter to incorporate into the allocation request. The preallocation blocks value is tacked onto the end of the request and adjusted to accommodate neighboring extents and extent size limits. Since xfs_bmapi_reserve_delalloc() now knows precisely how much preallocation was included in the allocation, it can also tag the inodes appropriately to support preallocation reclaim. Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to use the preallocation mechanism. This patch should not change behavior outside of correctly tagging reflink inodes when start offset preallocation occurs (which the caller does not handle correctly). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-28xfs: always succeed when deduping zero bytesDarrick J. Wong
It turns out that btrfs and xfs had differing interpretations of what to do when the dedupe length is zero. Change xfs to follow btrfs' semantics so that the userland interface is consistent. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-28fs: xfs: libxfs: constify xfs_nameops structuresBhumika Goyal
Declare the structure xfs_nameops as const as it is only stored in the m_dirnameops field of a xfs_mount structure. This field is of type const struct xfs_nameops *, so xfs_nameops structures having this property can be declared as const. Done using Coccinelle: @r1 disable optional_qualifier @ identifier i; position p; @@ static struct xfs_nameops i@p = {...}; @ok1@ identifier r1.i; position p; struct xfs_mount mp; @@ mp.m_dirnameops=&i@p @bad@ position p!={r1.p,ok1.p}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ static +const struct xfs_nameops i={...}; @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct xfs_nameops i; File size before: text data bss dec hex filename 5302 85 0 5387 150b fs/xfs/libxfs/xfs_dir2.o File size after: text data bss dec hex filename 5318 69 0 5387 150b fs/xfs/libxfs/xfs_dir2.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-28fs: xfs: xfs_icreate_item: constify xfs_item_ops structureBhumika Goyal
Declare the structure xfs_item_ops as const as it is only passed as an argument to the function xfs_log_item_init. As this argument is of type const struct xfs_item_ops *, so xfs_item_ops structures having this property can be declared as const. Done using Coccinelle: @r1 disable optional_qualifier @ identifier i; position p; @@ static struct xfs_item_ops i@p = {...}; @ok1@ identifier r1.i; position p; expression e1,e2,e3; @@ xfs_log_item_init(e1,e2,e3,&i@p) @bad@ position p!={r1.p,ok1.p}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ static +const struct xfs_item_ops i={...}; @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct xfs_item_ops i; File size before: text data bss dec hex filename 737 64 8 809 329 fs/xfs/xfs_icreate_item.o File size after: text data bss dec hex filename 801 0 8 809 329 fs/xfs/xfs_icreate_item.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-28xfs: factor rmap btree size into the indlen calculationsDarrick J. Wong
When we're estimating the amount of space it's going to take to satisfy a delalloc reservation, we need to include the space that we might need to grow the rmapbt. This helps us to avoid running out of space later when _iomap_write_allocate needs more space than we reserved. Eryu Guan observed this happening on generic/224 when sunit/swidth were set. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-28xfs: add XBF_XBF_NO_IOACCT to buf trace outputEric Sandeen
When XBF_NO_IOACCT got added, it missed the translation in XFS_BUF_FLAGS, so we see "0x8" in trace output rather than the flag name. Fix it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
udplite conflict is resolved by taking what 'net-next' did which removed the backlog receive method assignment, since it is no longer necessary. Two entries were added to the non-priv ethtool operations switch statement, one in 'net' and one in 'net-next, so simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26fix default_file_splice_read()Al Viro
Botched calculation of number of pages. As the result, we were dropping pieces when doing splice to pipe from e.g. 9p. Reported-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-26ext4: fix mmp use after free during unmountEric Sandeen
In ext4_put_super, we call brelse on the buffer head containing the ext4 superblock, but then try to use it when we stop the mmp thread, because when the thread shuts down it does: write_mmp_block ext4_mmp_csum_set ext4_has_metadata_csum WARN_ON_ONCE(ext4_has_feature_metadata_csum(sb)...) which reaches into sb->s_fs_info->s_es->s_feature_ro_compat, which lives in the superblock buffer s_sbh which we just released. Fix this by moving the brelse down to a point where we are no longer using it. Reported-by: Wang Shu <shuwang@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-11-25f2fs: fix 32-bit buildArnd Bergmann
The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED on 32-bit machines because of a 64-bit division: fs/f2fs/f2fs.o: In function `__issue_discard_async': extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod' Fortunately, bdev_zone_size() is guaranteed to return a power-of-two number, so we can replace the % operator with a cheaper bit mask. Fixes: 792b84b74b54 ("f2fs: support multiple devices") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25f2fs: set ->owner for debugfs status file's file_operationsNicolai Stange
The struct file_operations instance serving the f2fs/status debugfs file lacks an initialization of its ->owner. This means that although that file might have been opened, the f2fs module can still get removed. Any further operation on that opened file, releasing included, will cause accesses to unmapped memory. Indeed, Mike Marshall reported the following: BUG: unable to handle kernel paging request at ffffffffa0307430 IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90 <...> Call Trace: [] __fput+0xdf/0x1d0 [] ____fput+0xe/0x10 [] task_work_run+0x8e/0xc0 [] do_exit+0x2ae/0xae0 [] ? __audit_syscall_entry+0xae/0x100 [] ? syscall_trace_enter+0x1ca/0x310 [] do_group_exit+0x44/0xc0 [] SyS_exit_group+0x14/0x20 [] do_syscall_64+0x61/0x150 [] entry_SYSCALL64_slow_path+0x25/0x25 <...> ---[ end trace f22ae883fa3ea6b8 ]--- Fixing recursive fault but reboot is needed! Fix this by initializing the f2fs/status file_operations' ->owner with THIS_MODULE. This will allow debugfs to grab a reference to the f2fs module upon any open on that file, thus preventing it from getting removed. Fixes: 902829aa0b72 ("f2fs: move proc files to debugfs") Reported-by: Mike Marshall <hubcap@omnibond.com> Reported-by: Martin Brandenburg <martin@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25f2fs: fix incorrect free inode count in ->statfsChao Yu
While calculating inode count that we can create at most in the left space, we should consider space which data/node blocks occupied, since we create data/node mixly in main area. So fix the wrong calculation in ->statfs. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25f2fs: drop duplicate header timer.hGeliang Tang
Drop duplicate header timer.h from segment.c. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25f2fs: fix wrong AUTO_RECOVER conditionJaegeuk Kim
If i_size is not aligned to the f2fs's block size, we should not skip inode update during fsync. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25f2fs: do not recover i_size if it's validJaegeuk Kim
If i_size is already valid during roll_forward recovery, we should not update it according to the block alignment. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25f2fs: fix fdatasyncChao Yu
For below two cases, we can't guarantee data consistence: a) 1. xfs_io "pwrite 0 4195328" "fsync" 2. xfs_io "pwrite 4195328 1024" "fdatasync" 3. godown 4. umount & mount --> isize we updated before fdatasync won't be recovered b) 1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync" 2. xfs_io "fpunch 4194304 4096" "fdatasync" 3. godown 4. umount & mount --> dnode we punched before fdatasync won't be recovered The reason is that normally fdatasync won't be aware of modification of metadata in file, e.g. isize changing, dnode updating, so in ->fsync we will skip flushing node pages for above cases, result in making fdatasynced file being lost during recovery. Currently we have introduced DIRTY_META global list in sbi for tracking dirty inode selectively, so in fdatasync we can choose to flush nodes depend on dirty state of current inode in the list. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>