summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-11-13Orangefs: Remove unused #defines from signal blocking code.Martin Brandenburg
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13Orangefs: set pos after generic_write_checksMike Marshall
if we are appending, generic_write_checks would have updated pos to the end of the file... Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There are several patches from Ilya fixing RBD allocation lifecycle issues, a series adding a nocephx_sign_messages option (and associated bug fixes/cleanups), several patches from Zheng improving the (directory) fsync behavior, a big improvement in IO for direct-io requests when striping is enabled from Caifeng, and several other small fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: clear msg->con in ceph_msg_release() only libceph: add nocephx_sign_messages option libceph: stop duplicating client fields in messenger libceph: drop authorizer check from cephx msg signing routines libceph: msg signing callouts don't need con argument libceph: evaluate osd_req_op_data() arguments only once ceph: make fsync() wait unsafe requests that created/modified inode ceph: add request to i_unsafe_dirops when getting unsafe reply libceph: introduce ceph_x_authorizer_cleanup() ceph: don't invalidate page cache when inode is no longer used rbd: remove duplicate calls to rbd_dev_mapping_clear() rbd: set device_type::release instead of device::release rbd: don't free rbd_dev outside of the release callback rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails libceph: use local variable cursor instead of &msg->cursor libceph: remove con argument in handle_reply() ceph: combine as many iovec as possile into one OSD request ceph: fix message length computation ceph: fix a comment typo rbd: drop null test before destroy functions
2015-11-13orangefs: validate the response in decode_dirents()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: don't leave uninitialized data in ->trailer_bufAl Viro
minimal fix; it would be better to reject such requests outright. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: get rid of dec_string and enc_stringAl Viro
The latter is never used, the former has one user and would be better off spelled out right there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: switch decode_dirents() to use of kcalloc()Al Viro
gets rid of multiplication overflow Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: sanitize pvfs2_convert_time_field()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: kill pointless ->link() and ->mknod()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13pvfs2_fill_sb(): use kzalloc()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: kill struct pvfs2_mount_sb_info_sAl Viro
The only reason for that thing used to be the API of mount_nodev() callback; since we are calling pvfs2_fill_sb() ourselves now, we don't have to shove everything into a single structure. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: double iput() in case of d_make_root() failureAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: use get_user_pages_fast(), not get_user_pages()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: kill kmap/kunmap wrappersAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: make pvfs2_inode_read() take iov_iterAl Viro
... and make the only caller use page-backed iov_iter, getting rid of kmap/kunmap *and* of the bug with attempted use of iovec-backed copy_page_to_iter() on a kernel pointer. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: make do_readv_writev() take iov_iterAl Viro
no need to build a copy of what the caller already has; what's more, we want the one given to caller properly advanced *and* we shouldn't depend upon it being an iovec-backed one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: don't bother with splitting iovecsAl Viro
copy_page_{to,from}_iter() advances it just fine *and* it has no problem with partially consumed segments. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: make wait_for_direct_io() take iov_iterAl Viro
incidentally, insane or compromised server returning *more* than requested on read should not oops the kernel - initialize the iov_iter for read according to the iovec we've got. That's why pvfs_bufmap_copy_to_iovec() needed a separate size argument - we shouldn't abuse iov_iter_count(iter) for passing that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: make precopy_buffers() take iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: make postcopy_buffers() take iov_iterAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13pvfs_bufmap_copy_from_iovec(): don't rely upon size being equal to ↵Al Viro
iov_iter_count(iter) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-13orangefs: explicitly pass the size to pvfs_bufmap_copy_to_iovec()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2015-11-12dax: fix __dax_pmd_fault crashDan Williams
Since 4.3 introduced devm_memremap_pages() the pfns handled by DAX may optionally have a struct page backing. When a mapped pfn reaches vmf_insert_pfn_pmd() it fails with a crash signature like the following: kernel BUG at mm/huge_memory.c:905! [..] Call Trace: [<ffffffff812a73ba>] __dax_pmd_fault+0x2ea/0x5b0 [<ffffffffa01a4182>] xfs_filemap_pmd_fault+0x92/0x150 [xfs] [<ffffffff811fbe02>] handle_mm_fault+0x312/0x1b50 Fix this by falling back to 4K mappings in the pfn_valid() case. Longer term, vmf_insert_pfn_pmd() needs to grow support for architectures that can provide a 'pmd_special' capability. Cc: <stable@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-12Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull misc block fixes from Jens Axboe: "Stuff that got collected after the merge window opened. This contains: - NVMe: - Fix for non-striped transfer size setting for NVMe from Sathyavathi. - (Some) support for the weird Apple nvme controller in the macbooks. From Stephan Günther. - The error value leak for dax from Al. - A few minor blk-mq tweaks from me. - Add the new linux-block@vger.kernel.org mailing list to the MAINTAINERS file. - Discard fix for brd, from Jan. - A kerneldoc warning for block core from Randy. - An older fix from Vivek, converting a WARN_ON() to a rate limited printk when a device is hot removed with dirty inodes" * 'for-linus' of git://git.kernel.dk/linux-block: block: don't hardcode blk_qc_t -> tag mask dax_io(): don't let non-error value escape via retval instead of EFAULT block: fix blk-core.c kernel-doc warning fs/block_dev.c: Remove WARN_ON() when inode writeback fails NVMe: add support for Apple NVMe controller NVMe: use split lo_hi_{read,write}q blk-mq: mark __blk_mq_complete_request() static MAINTAINERS: add reference to new linux-block list NVMe: Increase the max transfer size when mdts is 0 brd: Refuse improperly aligned discard requests
2015-11-11Merge tag 'xfs-for-linus-4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs updates from Dave Chinner: "There is nothing really major here - the only significant addition is the per-mount operation statistics infrastructure. Otherwises there's various ACL, xattr, DAX, AIO and logging fixes, and a smattering of small cleanups and fixes elsewhere. Summary: - per-mount operational statistics in sysfs - fixes for concurrent aio append write submission - various logging fixes - detection of zeroed logs and invalid log sequence numbers on v5 filesystems - memory allocation failure message improvements - a bunch of xattr/ACL fixes - fdatasync optimisation - miscellaneous other fixes and cleanups" * tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (39 commits) xfs: give all workqueues rescuer threads xfs: fix log recovery op header validation assert xfs: Fix error path in xfs_get_acl xfs: optimise away log forces on timestamp updates for fdatasync xfs: don't leak uuid table on rmmod xfs: invalidate cached acl if set via ioctl xfs: Plug memory leak in xfs_attrmulti_attr_set xfs: Validate the length of on-disk ACLs xfs: invalidate cached acl if set directly via xattr xfs: xfs_filemap_pmd_fault treats read faults as write faults xfs: add ->pfn_mkwrite support for DAX xfs: DAX does not use IO completion callbacks xfs: Don't use unwritten extents for DAX xfs: introduce BMAPI_ZERO for allocating zeroed extents xfs: fix inode size update overflow in xfs_map_direct() xfs: clear PF_NOFREEZE for xfsaild kthread xfs: fix an error code in xfs_fs_fill_super() xfs: stats are no longer dependent on CONFIG_PROC_FS xfs: simplify /proc teardown & error handling xfs: per-filesystem stats counter implementation ...
2015-11-11Merge tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Apologies for coming a little late in the merge window. Fortunately this is another fairly quiet one: Mainly smaller bugfixes and cleanup. We're still finding some bugs from the breakup of the big NFSv4 state lock in 3.17 -- thanks especially to Andrew Elble and Jeff Layton for tracking down some of the remaining races" * tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux: svcrpc: document lack of some memory barriers nfsd: fix race with open / open upgrade stateids nfsd: eliminate sending duplicate and repeated delegations nfsd: remove recurring workqueue job to clean DRC SUNRPC: drop stale comment in svc_setup_socket() nfsd: ensure that seqid morphing operations are atomic wrt to copies nfsd: serialize layout stateid morphing operations nfsd: improve client_has_state to check for unused openowners nfsd: fix clid_inuse on mount with security change sunrpc/cache: make cache flushing more reliable. nfsd: move include of state.h from trace.c to trace.h sunrpc: avoid warning in gss_key_timeout lockd: get rid of reference-counted NSM RPC clients SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage() lockd: create NSM handles per net namespace nfsd: switch unsigned char flags in svc_fh to bools nfsd: move svc_fh->fh_maxsize to just after fh_handle nfsd: drop null test before destroy functions nfsd: serialize state seqid morphing operations
2015-11-11Merge branch 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - misc stable fixes - trivial kernel-doc and comment fixups - remove never-used block_page_mkwrite() wrapper function, and rename the function that is _actually_ used to not have double underscores. * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: 9p: cache.h: Add #define of include guard vfs: remove stale comment in inode_operations vfs: remove unused wrapper block_page_mkwrite() binfmt_elf: Correct `arch_check_elf's description fs: fix writeback.c kernel-doc warnings fs: fix inode.c kernel-doc warning fs/pipe.c: return error code rather than 0 in pipe_write() fs/pipe.c: preserve alloc_file() error code binfmt_elf: Don't clobber passed executable's file header FS-Cache: Handle a write to the page immediately beyond the EOF marker cachefiles: perform test on s_blocksize when opening cache file. FS-Cache: Don't override netfs's primary_index if registering failed FS-Cache: Increase reference of parent after registering, netfs success debugfs: fix refcount imbalance in start_creating
2015-11-11dax_io(): don't let non-error value escape via retval instead of EFAULTAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-11fs/block_dev.c: Remove WARN_ON() when inode writeback failsVivek Goyal
If a block device is hot removed and later last reference to device is put, we try to writeback the dirty inode. But device is gone and that writeback fails. Currently we do a WARN_ON() which does not seem to be the right thing. Convert it to a ratelimited kernel warning. Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> [jmoyer@redhat.com: get rid of unnecessary name initialization, 80 cols] Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-11fs: 9p: cache.h: Add #define of include guardTzvetelin Katchov
The include file was intended to have an include guard, but the #define part is missing. Signed-off-by: Tzvetelin Katchov <katchov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11vfs: remove unused wrapper block_page_mkwrite()Ross Zwisler
The function currently called "__block_page_mkwrite()" used to be called "block_page_mkwrite()" until a wrapper for this function was added by: commit 24da4fab5a61 ("vfs: Create __block_page_mkwrite() helper passing error values back") This wrapper, the current "block_page_mkwrite()", is currently unused. __block_page_mkwrite() is used directly by ext4, nilfs2 and xfs. Remove the unused wrapper, rename __block_page_mkwrite() back to block_page_mkwrite() and update the comment above block_page_mkwrite(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.com> Cc: Jan Kara <jack@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11binfmt_elf: Correct `arch_check_elf's descriptionMaciej W. Rozycki
Correct `arch_check_elf's description, mistakenly copied and pasted from `arch_elf_pt_proc'. Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11fs: fix writeback.c kernel-doc warningsRandy Dunlap
Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro to after the function's opening brace. Also #undef this macro at the end of the function. ..//fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE' ..//fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11fs: fix inode.c kernel-doc warningRandy Dunlap
Fix kernel-doc warning in fs/inode.c: ..//fs/inode.c:1606: warning: No description found for parameter 'inode' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11fs/pipe.c: return error code rather than 0 in pipe_write()Eric Biggers
pipe_write() would return 0 if it failed to merge the beginning of the data to write with the last, partially filled pipe buffer. It should return an error code instead. Userspace programs could be confused by write() returning 0 when called with a nonzero 'count'. The EFAULT error case was a regression from f0d1bec9d5 ("new helper: copy_page_from_iter()"), while the ops->confirm() error case was a much older bug. Test program: #include <assert.h> #include <errno.h> #include <unistd.h> int main(void) { int fd[2]; char data[1] = {0}; assert(0 == pipe(fd)); assert(1 == write(fd[1], data, 1)); /* prior to this patch, write() returned 0 here */ assert(-1 == write(fd[1], NULL, 1)); assert(errno == EFAULT); } Cc: stable@vger.kernel.org # at least v3.15+ Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11fs/pipe.c: preserve alloc_file() error codeEric Biggers
If sys_pipe() was unable to allocate a 'struct file', it always failed with ENFILE, which means "The number of simultaneously open files in the system would exceed a system-imposed limit." However, alloc_file() actually returns an ERR_PTR value and might fail with other error codes. Currently, in addition to ENFILE, it can fail with ENOMEM, potentially when there are few open files in the system. Update sys_pipe() to preserve this error code. In a prior submission of a similar patch (1) some concern was raised about introducing a new error code for sys_pipe(). However, for most system calls, programs cannot assume that new error codes will never be introduced. In addition, ENOMEM was, in fact, already a possible error code for sys_pipe(), in the case where the file descriptor table could not be expanded due to insufficient memory. (1) http://comments.gmane.org/gmane.linux.kernel/1357942 Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11binfmt_elf: Don't clobber passed executable's file headerMaciej W. Rozycki
Do not clobber the buffer space passed from `search_binary_handler' and originally preloaded by `prepare_binprm' with the executable's file header by overwriting it with its interpreter's file header. Instead keep the buffer space intact and directly use the data structure locally allocated for the interpreter's file header, fixing a bug introduced in 2.1.14 with loadable module support (linux-mips.org commit beb11695 [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history). Adjust the amount of data read from the interpreter's file accordingly. This was not an issue before loadable module support, because back then `load_elf_binary' was executed only once for a given ELF executable, whether the function succeeded or failed. With loadable module support supported and enabled, upon a failure of `load_elf_binary' -- which may for example be caused by architecture code rejecting an executable due to a missing hardware feature requested in the file header -- a module load is attempted and then the function reexecuted by `search_binary_handler'. With the executable's file header replaced with its interpreter's file header the executable can then be erroneously accepted in this subsequent attempt. Cc: stable@vger.kernel.org # all the way back Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11FS-Cache: Handle a write to the page immediately beyond the EOF markerDavid Howells
Handle a write being requested to the page immediately beyond the EOF marker on a cache object. Currently this gets an assertion failure in CacheFiles because the EOF marker is used there to encode information about a partial page at the EOF - which could lead to an unknown blank spot in the file if we extend the file over it. The problem is actually in fscache where we check the index of the page being written against store_limit. store_limit is set to the number of pages that we're allowed to store by fscache_set_store_limit() - which means it's one more than the index of the last page we're allowed to store. The problem is that we permit writing to a page with an index _equal_ to the store limit - when we should reject that case. Whilst we're at it, change the triggered assertion in CacheFiles to just return -ENOBUFS instead. The assertion failure looks something like this: CacheFiles: Assertion failed 1000 < 7b1 is false ------------[ cut here ]------------ kernel BUG at fs/cachefiles/rdwr.c:962! ... RIP: 0010:[<ffffffffa02c9e83>] [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles] Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754f (at least) Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11cachefiles: perform test on s_blocksize when opening cache file.NeilBrown
cachefiles requires that s_blocksize in the cache is not greater than PAGE_SIZE, and performs the check every time a block is accessed. Move the test to the place where the file is "opened", where other file-validity tests are performed. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11FS-Cache: Don't override netfs's primary_index if registering failedKinglong Mee
Only override netfs->primary_index when registering success. Cc: stable@vger.kernel.org # v2.6.30+ Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11FS-Cache: Increase reference of parent after registering, netfs successKinglong Mee
If netfs exist, fscache should not increase the reference of parent's usage and n_children, otherwise, never be decreased. v2: thanks David's suggest, move increasing reference of parent if success use kmem_cache_free() freeing primary_index directly v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;" Cc: stable@vger.kernel.org # v2.6.30+ Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11debugfs: fix refcount imbalance in start_creatingDaniel Borkmann
In debugfs' start_creating(), we pin the file system to safely access its root. When we failed to create a file, we unpin the file system via failed_creating() to release the mount count and eventually the reference of the vfsmount. However, when we run into an error during lookup_one_len() when still in start_creating(), we only release the parent's mutex but not so the reference on the mount. Looks like it was done in the past, but after splitting portions of __create_file() into start_creating() and end_creating() via 190afd81e4a5 ("debugfs: split the beginning and the end of __create_file() off"), this seemed missed. Noticed during code review. Fixes: 190afd81e4a5 ("debugfs: split the beginning and the end of __create_file() off") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-10btrfs: Use fs_info directly in btrfs_delete_unused_bgsZhao Lei
No need to use root->fs_info in btrfs_delete_unused_bgs(), use fs_info directly instead. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10btrfs: Fix lost-data-profile caused by balance bgZhao Lei
Reproduce: (In integration-4.3 branch) TEST_DEV=(/dev/vdg /dev/vdh) TEST_DIR=/mnt/tmp umount "$TEST_DEV" >/dev/null mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}" mount -o nospace_cache "$TEST_DEV" "$TEST_DIR" btrfs balance start -dusage=0 $TEST_DIR btrfs filesystem usage $TEST_DIR dd if=/dev/zero of="$TEST_DIR"/file count=100 btrfs filesystem usage $TEST_DIR Result: We can see "no data chunk" in first "btrfs filesystem usage": # btrfs filesystem usage $TEST_DIR Overall: ... Metadata,single: Size:8.00MiB, Used:0.00B /dev/vdg 8.00MiB Metadata,RAID1: Size:122.88MiB, Used:112.00KiB /dev/vdg 122.88MiB /dev/vdh 122.88MiB System,single: Size:4.00MiB, Used:0.00B /dev/vdg 4.00MiB System,RAID1: Size:8.00MiB, Used:16.00KiB /dev/vdg 8.00MiB /dev/vdh 8.00MiB Unallocated: /dev/vdg 1.06GiB /dev/vdh 1.07GiB And "data chunks changed from raid1 to single" in second "btrfs filesystem usage": # btrfs filesystem usage $TEST_DIR Overall: ... Data,single: Size:256.00MiB, Used:0.00B /dev/vdh 256.00MiB Metadata,single: Size:8.00MiB, Used:0.00B /dev/vdg 8.00MiB Metadata,RAID1: Size:122.88MiB, Used:112.00KiB /dev/vdg 122.88MiB /dev/vdh 122.88MiB System,single: Size:4.00MiB, Used:0.00B /dev/vdg 4.00MiB System,RAID1: Size:8.00MiB, Used:16.00KiB /dev/vdg 8.00MiB /dev/vdh 8.00MiB Unallocated: /dev/vdg 1.06GiB /dev/vdh 841.92MiB Reason: btrfs balance delete last data chunk in case of no data in the filesystem, then we can see "no data chunk" by "fi usage" command. And when we do write operation to fs, the only available data profile is 0x0, result is all new chunks are allocated single type. Fix: Allocate a data chunk explicitly to ensure we don't lose the raid profile for data. Test: Test by above script, and confirmed the logic by debug output. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10btrfs: Fix lost-data-profile caused by auto removing bgZhao Lei
Reproduce: (In integration-4.3 branch) TEST_DEV=(/dev/vdg /dev/vdh) TEST_DIR=/mnt/tmp umount "$TEST_DEV" >/dev/null mkfs.btrfs -f -d raid1 "${TEST_DEV[@]}" mount -o nospace_cache "$TEST_DEV" "$TEST_DIR" umount "$TEST_DEV" mount -o nospace_cache "$TEST_DEV" "$TEST_DIR" btrfs filesystem usage $TEST_DIR We can see the data chunk changed from raid1 to single: # btrfs filesystem usage $TEST_DIR Data,single: Size:8.00MiB, Used:0.00B /dev/vdg 8.00MiB # Reason: When a empty filesystem mount with -o nospace_cache, the last data blockgroup will be auto-removed in umount. Then if we mount it again, there is no data chunk in the filesystem, so the only available data profile is 0x0, result is all new chunks are created as single type. Fix: Don't auto-delete last blockgroup for a raid type. Test: Test by above script, and confirmed the logic by debug output. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10btrfs: Remove len argument from scrub_find_csumZhao Lei
It is useless. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10btrfs: Reduce unnecessary arguments in scrub_recheck_blockZhao Lei
We don't need pass so many arguments for recheck sblock now, this patch cleans them. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10btrfs: Use scrub_checksum_data and scrub_checksum_tree_block for ↵Zhao Lei
scrub_recheck_block_checksum We can use existing scrub_checksum_data() and scrub_checksum_tree_block() for scrub_recheck_block_checksum(), instead of write duplicated code. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10btrfs: Reset sblock->xxx_error stats before calling scrub_recheck_block_checksumZhao Lei
We should reset sblock->xxx_error stats before calling scrub_recheck_block_checksum(). Current code run correctly because all sblock are allocated by k[cz]alloc(), and the error stats are not got changed. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-10btrfs: scrub: setup all fields for sblock_to_checkZhao Lei
scrub_setup_recheck_block() isn't setup all necessary fields for sblock_to_check because history reason. So current code need more arguments in severial functions, and more local variables, just to passing these lacked values to necessary place. This patch setup above fields to sblock_to_check in scrub_setup_recheck_block(), for: 1: more cleanup for function arg, local variable 2: to make sblock_to_check complete, then we can use sblock_to_check without concern about some uninitialized member. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>