summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-08-13nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errorsKinglong Mee
According to Christoph's advice, this patch introduce a new helper nfsd4_cb_sequence_done() for processing more callback errors, following the example of the client's nfs41_sequence_done(). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-12fs: Set the size of empty dirs to 0.Eric W. Biederman
Before the make_empty_dir_inode calls were introduce into proc, sysfs, and sysctl those directories when stated reported an i_size of 0. make_empty_dir_inode started reporting an i_size of 2. At least one userspace application depended on stat returning i_size of 0. So modify make_empty_dir_inode to cause an i_size of 0 to be reported for these directories. Cc: stable@vger.kernel.org Reported-by: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-08-12NFSv4.1/pnfs: Remove redundant wakeup in pnfs_send_layoutreturn()Trond Myklebust
pnfs_clear_layoutreturn_waitbit() should already be calling rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq) for us. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFSv4.1/pnfs: Remove redundant check in pnfs_layoutgets_blocked()Trond Myklebust
layoutget now should already be serialised w.r.t. layout returns Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets in layoutreturnTrond Myklebust
The NFS_LAYOUT_RETURN bit already suffices to ensure that layoutget is blocked. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFSv4.1/pnfs: Don't prevent layoutgets when doing return-on-closeTrond Myklebust
If there is an outstanding return-on-close, then we just want new layoutget requests to wait rather than fail. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFSv4.1/pnfs: Fix serialisation of layout return and layoutgetTrond Myklebust
We should always test for outstanding layout returns, whether or not pnfs_should_retry_layoutget() is true. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFSv4.1/pnfs: Remove redundant checks in pnfs_layoutgets_blocked()Trond Myklebust
If there are no valid layout segments, then we should already have checked in pnfs_update_layout() whether or not this is the first layoutget. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12pNFS: Tighten up locking around DS commit bucketsTrond Myklebust
I'm not aware of any bugreports around this issue, but the locking around the pnfs_commit_bucket is inconsistent at best. This patch tightens it up by ensuring that the 'bucket->committing' list is always changed atomically w.r.t. the 'bucket->clseg' layout segment tracking. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFS: Remove duplicate svc_xprt_put from nfs41_callback_upKinglong Mee
The xprt created by svc_create_xprt have be added to serv->sv_permsocks. So putting the xprt directly is useless. Otherwise, there is a more svc_xprt_put after the xprt be freed. v2, same as v1. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFSv4: don't set SETATTR for O_RDONLY|O_EXCLNeilBrown
It is unusual to combine the open flags O_RDONLY and O_EXCL, but it appears that libre-office does just that. [pid 3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0 [pid 3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...> NFSv4 takes O_EXCL as a sign that a setattr command should be sent, probably to reset the timestamps. When it was an O_RDONLY open, the SETATTR command does not identify any actual attributes to change. If no delegation was provided to the open, the SETATTR uses the all-zeros stateid and the request is accepted (at least by the Linux NFS server - no harm, no foul). If a read-delegation was provided, this is used in the SETATTR request, and a Netapp filer will justifiably claim NFS4ERR_BAD_STATEID, which the Linux client takes as a sign to retry - indefinitely. So only treat O_EXCL specially if O_CREAT was also given. Signed-off-by: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFS: Error out when register_shrinker fail in register_nfs_fsKinglong Mee
Commit 1d3d4437ea "vmscan: per-node deferred work" have made register_shrinker can return an intergater error. If register_shrinker() fail, the later unregister_shrinker() will cause a NULL pointer access. v2, same as v1. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFSv4.2/pnfs: Use GFP_NOIO for layoutstat reporting in the writeback pathTrond Myklebust
Prevent a potential deadlock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12pnfs/flexfiles: LAYOUTSTATS ii_count should be ops instead of bytesPeng Tao
Turned out I misinterpreted the spec... Cc: Tom Haynes <thomas.haynes@primarydata.com> Reported-by: Jean Spector <jean@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-11f2fs: remove inmem radix treeChao Yu
Previously, we use radix tree to index all registered page entries for atomic file, but now we only use radix tree to see whether current page is indexed or not, since the other user of radix tree is gone in commit 042b7816aaeb ("f2fs: remove unnecessary call to invalidate inmemory pages"). So in this patch, we try to use one more efficient way: Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private value to indicate page indexing status. By using this way, we can save memory and lookup time. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-11f2fs: report EINVAL for unalignment direct IOChao Yu
We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in detail is as fallow: dio04 <<<test_start>>> tag=dio04 stime=1432278894 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success diotest4 4 TFAIL : diotest4.c:183: Odd count of read and write diotest4 5 TPASS : Read beyond the file size ...... the result of ext4 with same environment: dio04 <<<test_start>>> tag=dio04 stime=1432259643 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TPASS : Odd count of read and write diotest4 4 TPASS : Read beyond the file size ...... The reason is that when triggering DIO in f2fs, we will return zero value in ->direct_IO if writer's buffer offset, file offset and transfer size is not alignment to block size of filesystem, resulting in falling back into buffered write instead of returning -EINVAL. This patch fixes that problem by returning correct error number for above case, and removing the judgement condition in check_direct_IO to make sure the verification will be enabled for direct reader too. Besides, Jaegeuk Kim pointed out that there is expectional cases we should always make direct-io falling back into buffered write, such as dio in encrypted file. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Chao Yu make small change and add detail description in commit message] Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-11block: don't access bio->bi_error after bio_put()Sasha Levin
Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few dereferences of 'bio' after a call to bio_put(). This causes use-after-frees such as: [521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714 [521120.720638] Read of size 4 by task mount.ocfs2/9644 [521120.721212] ============================================================================= [521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected [521120.722968] ----------------------------------------------------------------------------- [521120.722968] [521120.723915] Disabling lock debugging due to kernel taint [521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080 [521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800 [521120.726037] [521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff ...6............ [521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................ [521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00 ................ [521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff ................ [521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff ...........6.... [521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff .s".....@..<.... [521120.736667] Object ffff880f36b38790: 00 02 00 00 00 04 00 00 00 00 00 00 00 00 00 00 ................ [521120.737596] Object ffff880f36b387a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.738524] Object ffff880f36b387b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.739388] Object ffff880f36b387c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.740277] Object ffff880f36b387d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.741187] Object ffff880f36b387e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.742233] Object ffff880f36b387f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [521120.743229] CPU: 41 PID: 9644 Comm: mount.ocfs2 Tainted: G B 4.2.0-rc6-next-20150810-sasha-00039-gf909086 #2420 [521120.744274] ffff880f36b38000 ffff880d89c8f638 ffffffffb6e9ba8a ffff880101c0e5c0 [521120.745025] ffff880d89c8f668 ffffffffad76a313 ffff880101c0e5c0 ffffea003cdace00 [521120.745908] ffff880f36b38700 ffff880f36b38798 ffff880d89c8f690 ffffffffad772854 [521120.747063] Call Trace: [521120.747520] dump_stack (lib/dump_stack.c:52) [521120.748053] print_trailer (mm/slub.c:653) [521120.748582] object_err (mm/slub.c:660) [521120.749079] kasan_report_error (include/linux/kasan.h:20 mm/kasan/report.c:152 mm/kasan/report.c:194) [521120.750834] __asan_report_load4_noabort (mm/kasan/report.c:250) [521120.753580] dio_bio_complete (fs/direct-io.c:478) [521120.755752] do_blockdev_direct_IO (fs/direct-io.c:494 fs/direct-io.c:1291) [521120.759765] __blockdev_direct_IO (fs/direct-io.c:1322) [521120.761658] blkdev_direct_IO (fs/block_dev.c:162) [521120.762993] generic_file_read_iter (mm/filemap.c:1738) [521120.767405] blkdev_read_iter (fs/block_dev.c:1649) [521120.768556] __vfs_read (fs/read_write.c:423 fs/read_write.c:434) [521120.772126] vfs_read (fs/read_write.c:454) [521120.773118] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594) [521120.776062] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186) [521120.777375] Memory state around the buggy address: [521120.778118] ffff880f36b38600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [521120.779211] ffff880f36b38680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [521120.780315] >ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [521120.781465] ^ [521120.782083] ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [521120.783717] ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [521120.784818] ================================================================== This patch fixes a few of those places that I caught while auditing the patch, but the original patch should be audited further for more occurences of this issue since I'm not too familiar with the code. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-11quota: remove an unneeded conditionDan Carpenter
We know "ret" is zero here so we can remove this condition. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.com>
2015-08-10NFSv4.1/pnfs: Fix atomicity of commit list updatesTrond Myklebust
pnfs_layout_mark_request_commit() needs to ensure that it adds the request to the commit list atomically with all the other updates in order to prevent corruption to buckets[ds_commit_idx].wlseg due to races with pnfs_generic_clear_request_commit(). Fixes: 338d00cfef07d ("pnfs: Refactor the *_layout_mark_request_commit...") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-10Merge branch 'for-4.2' into for-4.3J. Bruce Fields
2015-08-10nfsd: Remove unused clientid arguments from, find_lockowner_str{_locked}Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Use lk_new_xxx instead of v.new.xxx for nfs4_lockownerKinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove macro LOFF_OVERFLOWKinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove duplicate checking of nfsd_net in nfs4_laundromat()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove unused values in nfs4_setlease()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove nfs4_set_claim_prev()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Drop duplicate checking of seqid in nfsd4_create_session()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove unneeded values in nfsd4_open()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Add missing gen_confirm in nfsd4_setclientid()Kinglong Mee
Commit 294ac32e99 "nfsd: protect clid and verifier generation with client_lock" moved gen_confirm() to gen_clid(). After that commit, setclientid will return a bad reply with all-zero verifier after copy_clid(). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: New counter for generating client confirm verifierKinglong Mee
If using clientid_counter, it seems possible that gen_confirm could generate the same verifier for the same client in some situations. Add a new counter for client confirm verifier to make sure gen_confirm generates a different verifier on each call for the same clientid. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Fix memory leak of so_owner.data in nfs4_stateownerKinglong Mee
v2, new helper nfs4_free_stateowner for freeing so_owner.data and sop Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Add layouts checking in client_has_state()Kinglong Mee
Layout is a state resource, nfsd should check it too. v2, drop unneeded updating in nfsd4_renew() v3, fix compile error without CONFIG_NFSD_PNFS Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Fix a memory leak of struct file_lockKinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: abstract out svc_set_num_threads to sv_opsJeff Layton
Add an operation that will do setup of the service. In the case of a classic thread-based service that means starting up threads. In the case of a workqueue-based service, the setup will do something different. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirliey.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operationJeff Layton
For now, all services use svc_xprt_do_enqueue, but once we add workqueue-based service support, we'll need to do something different. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: move sv_module parm into sv_opsJeff Layton
...not technically an operation, but it's more convenient and cleaner to pass the module pointer in this struct. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: move sv_function into sv_opsJeff Layton
Since we now have a container for holding svc_serv operations, move the sv_function into it as well. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: add a new svc_serv_ops struct and move sv_shutdown into itJeff Layton
In later patches we'll need to abstract out more operations on a per-service level, besides sv_shutdown and sv_function. Declare a new svc_serv_ops struct to hold these operations, and move sv_shutdown into this struct. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10f2fs: report error of fill_zeroChao Yu
fill_zero can fail due to a lot of reason, but previously we do not handle its return value, so its callers such as punch_hole/f2fs_zero_range may report success, but actually can fail because of error occurs inside fill_zero. This patch fixes to report correct return value of fill_zero. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull RCU pathwalk fix from Al Viro: "Another racy use of nd->path.dentry in RCU mode" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: may_follow_link() should use nd->inode
2015-08-09Merge 4.2-rc6 into char-misc-nextGreg Kroah-Hartman
We want the fixes in Linus's tree in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-09Merge branch 'jeffm-discard-4.3' into for-linus-4.3Chris Mason
2015-08-09Btrfs: add support for blkio controllersChris Mason
This attaches accounting information to bios as we submit them so the new blkio controllers can throttle on btrfs filesystems. Not much is required, we're just associating bios with blkcgs during clone, calling wbc_init_bio()/wbc_account_io() during writepages submission, and attaching the bios to the current context during direct IO. Finally if we are splitting bios during btrfs_map_bio, this attaches accounting information to the split. The end result is able to throttle nicely on single disk filesystems. A little more work is required for multi-device filesystems. Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: remove unused mutex from struct 'btrfs_fs_info'Byongho Lee
The code using 'ordered_extent_flush_mutex' mutex has removed by below commit. - 8d875f95da43c6a8f18f77869f2ef26e9594fecc btrfs: disable strict file flushes for renames and truncates But the mutex still lives in struct 'btrfs_fs_info'. So, this patch removes the mutex from struct 'btrfs_fs_info' and its initialization code. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: fix parity scrub of RAID 5/6 with missing deviceOmar Sandoval
When testing the previous patch, Zhao Lei reported a similar bug when attempting to scrub a degraded RAID 5/6 filesystem with a missing device, leading to NULL pointer dereferences from the RAID 5/6 parity scrubbing code. The first cause was the same as in the previous patch: attempting to call bio_add_page() on a missing block device. To fix this, scrub_extent_for_parity() can just mark the sectors on the missing device as errors instead of attempting to read from it. Additionally, the code uses scrub_remap_extent() to map the extent of the corresponding data stripe, but the extent wasn't already mapped. If scrub_remap_extent() finds a missing block device, it doesn't initialize extent_dev, so we're left with a NULL struct btrfs_device. The solution is to use btrfs_map_block() directly. Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: fix device replace of a missing RAID 5/6 deviceOmar Sandoval
The original implementation of device replace on RAID 5/6 seems to have missed support for replacing a missing device. When this is attempted, we end up calling bio_add_page() on a bio with a NULL ->bi_bdev, which crashes when we try to dereference it. This happens because btrfs_map_block() has no choice but to return us the missing device because RAID 5/6 don't have any alternate mirrors to read from, and a missing device has a NULL bdev. The idea implemented here is to handle the missing device case separately, which better only happen when we're replacing a missing RAID 5/6 device. We use the new BTRFS_RBIO_REBUILD_MISSING operation to reconstruct the data from parity, check it with scrub_recheck_block_checksum(), and write it out with scrub_write_block_to_dev_replace(). Reported-by: Philip <bugzilla@philip-seeger.de> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96141 Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operationOmar Sandoval
The current RAID 5/6 recovery code isn't quite prepared to handle missing devices. In particular, it expects a bio that we previously attempted to use in the read path, meaning that it has valid pages allocated. However, missing devices have a NULL blkdev, and we can't call bio_add_page() on a bio with a NULL blkdev. We could do manual manipulation of bio->bi_io_vec, but that's pretty gross. So instead, add a separate path that allows us to manually add pages to the rbio. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: count devices correctly in readahead during RAID 5/6 replaceOmar Sandoval
Commit 5fbc7c59fd22 ("Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting") fixed a problem where we would skip a missing device when we shouldn't have because there are no other mirrors to read from in RAID 5/6. After commit 2c8cdd6ee4e7 ("Btrfs, replace: write dirty pages into the replace target device"), the fix doesn't work when we're doing a missing device replace on RAID 5/6 because the replace device is counted as a mirror so we're tricked into thinking we can safely skip the missing device. The fix is to count only the real stripes and decide based on that. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: remove misleading handling of missing device scrubOmar Sandoval
scrub_submit() claims that it can handle a bio with a NULL block device, but this is misleading, as calling bio_add_page() on a bio with a NULL ->bi_bdev would've already crashed. Delete this, as we're about to properly handle a missing block device. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: fix clone / extent-same deadlocksMark Fasheh
Clone and extent same lock their source and target inodes in opposite order. In addition to this, the range locking in clone doesn't take ordering into account. Fix this by having clone use the same locking helpers as btrfs-extent-same. In addition, I do a small cleanup of the locking helpers, removing a case (both inodes being the same) which was poorly accounted for and never actually used by the callers. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>