summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-08-11quota: remove an unneeded conditionDan Carpenter
We know "ret" is zero here so we can remove this condition. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.com>
2015-08-10NFSv4.1/pnfs: Fix atomicity of commit list updatesTrond Myklebust
pnfs_layout_mark_request_commit() needs to ensure that it adds the request to the commit list atomically with all the other updates in order to prevent corruption to buckets[ds_commit_idx].wlseg due to races with pnfs_generic_clear_request_commit(). Fixes: 338d00cfef07d ("pnfs: Refactor the *_layout_mark_request_commit...") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-10Merge branch 'for-4.2' into for-4.3J. Bruce Fields
2015-08-10nfsd: Remove unused clientid arguments from, find_lockowner_str{_locked}Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Use lk_new_xxx instead of v.new.xxx for nfs4_lockownerKinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove macro LOFF_OVERFLOWKinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove duplicate checking of nfsd_net in nfs4_laundromat()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove unused values in nfs4_setlease()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove nfs4_set_claim_prev()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Drop duplicate checking of seqid in nfsd4_create_session()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Remove unneeded values in nfsd4_open()Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Add missing gen_confirm in nfsd4_setclientid()Kinglong Mee
Commit 294ac32e99 "nfsd: protect clid and verifier generation with client_lock" moved gen_confirm() to gen_clid(). After that commit, setclientid will return a bad reply with all-zero verifier after copy_clid(). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: New counter for generating client confirm verifierKinglong Mee
If using clientid_counter, it seems possible that gen_confirm could generate the same verifier for the same client in some situations. Add a new counter for client confirm verifier to make sure gen_confirm generates a different verifier on each call for the same clientid. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Fix memory leak of so_owner.data in nfs4_stateownerKinglong Mee
v2, new helper nfs4_free_stateowner for freeing so_owner.data and sop Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Add layouts checking in client_has_state()Kinglong Mee
Layout is a state resource, nfsd should check it too. v2, drop unneeded updating in nfsd4_renew() v3, fix compile error without CONFIG_NFSD_PNFS Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd: Fix a memory leak of struct file_lockKinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: abstract out svc_set_num_threads to sv_opsJeff Layton
Add an operation that will do setup of the service. In the case of a classic thread-based service that means starting up threads. In the case of a workqueue-based service, the setup will do something different. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirliey.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operationJeff Layton
For now, all services use svc_xprt_do_enqueue, but once we add workqueue-based service support, we'll need to do something different. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: move sv_module parm into sv_opsJeff Layton
...not technically an operation, but it's more convenient and cleaner to pass the module pointer in this struct. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: move sv_function into sv_opsJeff Layton
Since we now have a container for holding svc_serv operations, move the sv_function into it as well. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10nfsd/sunrpc: add a new svc_serv_ops struct and move sv_shutdown into itJeff Layton
In later patches we'll need to abstract out more operations on a per-service level, besides sv_shutdown and sv_function. Declare a new svc_serv_ops struct to hold these operations, and move sv_shutdown into this struct. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-08-10f2fs: report error of fill_zeroChao Yu
fill_zero can fail due to a lot of reason, but previously we do not handle its return value, so its callers such as punch_hole/f2fs_zero_range may report success, but actually can fail because of error occurs inside fill_zero. This patch fixes to report correct return value of fill_zero. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull RCU pathwalk fix from Al Viro: "Another racy use of nd->path.dentry in RCU mode" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: may_follow_link() should use nd->inode
2015-08-09Merge 4.2-rc6 into char-misc-nextGreg Kroah-Hartman
We want the fixes in Linus's tree in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-09Merge branch 'jeffm-discard-4.3' into for-linus-4.3Chris Mason
2015-08-09Btrfs: add support for blkio controllersChris Mason
This attaches accounting information to bios as we submit them so the new blkio controllers can throttle on btrfs filesystems. Not much is required, we're just associating bios with blkcgs during clone, calling wbc_init_bio()/wbc_account_io() during writepages submission, and attaching the bios to the current context during direct IO. Finally if we are splitting bios during btrfs_map_bio, this attaches accounting information to the split. The end result is able to throttle nicely on single disk filesystems. A little more work is required for multi-device filesystems. Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: remove unused mutex from struct 'btrfs_fs_info'Byongho Lee
The code using 'ordered_extent_flush_mutex' mutex has removed by below commit. - 8d875f95da43c6a8f18f77869f2ef26e9594fecc btrfs: disable strict file flushes for renames and truncates But the mutex still lives in struct 'btrfs_fs_info'. So, this patch removes the mutex from struct 'btrfs_fs_info' and its initialization code. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: fix parity scrub of RAID 5/6 with missing deviceOmar Sandoval
When testing the previous patch, Zhao Lei reported a similar bug when attempting to scrub a degraded RAID 5/6 filesystem with a missing device, leading to NULL pointer dereferences from the RAID 5/6 parity scrubbing code. The first cause was the same as in the previous patch: attempting to call bio_add_page() on a missing block device. To fix this, scrub_extent_for_parity() can just mark the sectors on the missing device as errors instead of attempting to read from it. Additionally, the code uses scrub_remap_extent() to map the extent of the corresponding data stripe, but the extent wasn't already mapped. If scrub_remap_extent() finds a missing block device, it doesn't initialize extent_dev, so we're left with a NULL struct btrfs_device. The solution is to use btrfs_map_block() directly. Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: fix device replace of a missing RAID 5/6 deviceOmar Sandoval
The original implementation of device replace on RAID 5/6 seems to have missed support for replacing a missing device. When this is attempted, we end up calling bio_add_page() on a bio with a NULL ->bi_bdev, which crashes when we try to dereference it. This happens because btrfs_map_block() has no choice but to return us the missing device because RAID 5/6 don't have any alternate mirrors to read from, and a missing device has a NULL bdev. The idea implemented here is to handle the missing device case separately, which better only happen when we're replacing a missing RAID 5/6 device. We use the new BTRFS_RBIO_REBUILD_MISSING operation to reconstruct the data from parity, check it with scrub_recheck_block_checksum(), and write it out with scrub_write_block_to_dev_replace(). Reported-by: Philip <bugzilla@philip-seeger.de> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96141 Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operationOmar Sandoval
The current RAID 5/6 recovery code isn't quite prepared to handle missing devices. In particular, it expects a bio that we previously attempted to use in the read path, meaning that it has valid pages allocated. However, missing devices have a NULL blkdev, and we can't call bio_add_page() on a bio with a NULL blkdev. We could do manual manipulation of bio->bi_io_vec, but that's pretty gross. So instead, add a separate path that allows us to manually add pages to the rbio. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: count devices correctly in readahead during RAID 5/6 replaceOmar Sandoval
Commit 5fbc7c59fd22 ("Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting") fixed a problem where we would skip a missing device when we shouldn't have because there are no other mirrors to read from in RAID 5/6. After commit 2c8cdd6ee4e7 ("Btrfs, replace: write dirty pages into the replace target device"), the fix doesn't work when we're doing a missing device replace on RAID 5/6 because the replace device is counted as a mirror so we're tricked into thinking we can safely skip the missing device. The fix is to count only the real stripes and decide based on that. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: remove misleading handling of missing device scrubOmar Sandoval
scrub_submit() claims that it can handle a bio with a NULL block device, but this is misleading, as calling bio_add_page() on a bio with a NULL ->bi_bdev would've already crashed. Delete this, as we're about to properly handle a missing block device. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: fix clone / extent-same deadlocksMark Fasheh
Clone and extent same lock their source and target inodes in opposite order. In addition to this, the range locking in clone doesn't take ordering into account. Fix this by having clone use the same locking helpers as btrfs-extent-same. In addition, I do a small cleanup of the locking helpers, removing a case (both inodes being the same) which was poorly accounted for and never actually used by the callers. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: fix defrag to merge tail file extentLiu Bo
The file layout is [extent 1]...[extent n][4k extent][HOLE][extent x] extent 1~n and 4k extent can be merged during defrag, and the whole defrag bytes is larger than our defrag thresh(256k), 4k extent as a tail is left unmerged since we check if its next extent can be merged (the next one is a hole, so the check will fail), the layout thus can be [new extent][4k extent][HOLE][extent x] (1~n) To fix it, beside looking at the next one, this also looks at the previous one by checking @defrag_end, which is set to 0 when we decide to stop merging contiguous extents, otherwise, we can merge the previous one with our extent. Also, this makes btrfs behave consistent with how xfs and ext4 do. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: fix warning in backref walkingLiu Bo
When we do backref walking, we search firstly in queued delayed refs and then the on-disk backrefs, but we parse differently for shared references, for delayed refs we also add 'ref->root' while for on-disk backrefs we don't, this can prevent us from merging refs indexed by the same bytenr and cause find_parent_nodes() to throw a warning at 'WARN_ON(ref->count < 0)', for example, when we have a shared data extent with 'ref_cnt=1' and a delayed shared data with a BTRFS_DROP_DELAYED_REF, that happens. For shared references, no matter if it's delayed or on-disk, ref->root is not at all used, instead it's ref->parent that really matters, so this has delayed refs handled as the same way as on-disk refs. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Add WARN_ON() for double lock in btrfs_tree_lock()Zhaolei
When a task trying to double lock a extent buffer, there are no lockdep warning about it because this lock may be in "blocking_lock" state, and make us hard to debug. This patch add a WARN_ON() for above condition, it can not report all deadlock cases(as lock between tasks), but at least helps us some. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Remove root argument in extent_data_ref_count()Zhaolei
Because it is never used. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Fix wrong comment of btrfs_alloc_tree_block()Zhaolei
These wrong comment was copyed from another function(expired) from init, this patch fixed them. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: abort transaction on btrfs_reloc_cow_block()Zhaolei
When btrfs_reloc_cow_block() failed in __btrfs_cow_block(), current code just return a err-value to caller, but leave new_created extent buffer exist and locked. Then subsequent code (in relocate) try to lock above eb again, and caused deadlock without any dmesg. (eb lock use wait_event(), so no lockdep message) It is hard to do recover work in __btrfs_cow_block() at this error point, but we can abort transaction to avoid deadlock and operate on unstable state.a It also helps developer to find wrong place quickly. (better than a frozen fs without any dmesg before patch) Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Remove unnecessary variants in relocation.cZhaolei
These arguments are not used in functions, remove them for cleanup and make kernel stack happy. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Cleanup: Remove chunk_objectid argument from btrfs_relocate_chunk()Zhaolei
Remove chunk_objectid argument from btrfs_relocate_chunk() because it is not necessary, it can also cleanup some code in caller for prepare its value. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Cleanup: Remove objectid's init-value in create_reloc_inode()Zhaolei
objectid's init-value is not used in any case, remove it. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Error handle for get_ref_objectid_v0() in relocate_block_group()Zhaolei
We need error checking code for get_ref_objectid_v0() in relocate_block_group(), to avoid unpredictable result, especially for accessing uninitialized value(when function failed) after this line. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Fix data checksum error cause by replace with io-load.Zhaolei
xfstests btrfs/070 sometimes failed. In my test machine, its fail rate is about 30%. In another vm(vmware), its fail rate is about 50%. Reason: btrfs/070 do replace and defrag with fsstress simultaneously, after above operation, checksum error is found by scrub. Actually, it have no relationship with defrag operation, only replace with fsstress can trigger this bug. New data writen to target device have possibility rewrited by old data from source device by replace code in debug, to avoid above problem, we can set target block group to readonly in replace period, so new data requested by other operation will not write to same place with replace code. Before patch(4.1-rc3): 30% failed in 100 xfstests. After patch: 0% failed in 300 xfstests. It also happened in btrfs/071 as it's another scrub with IO load tests. Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: use scrub_pause_on/off() to reduce code in scrub_enumerate_chunks()Zhaolei
Use new intruduced scrub_pause_on/off() can make this code block clean and more readable. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Separate scrub_blocked_if_needed() to scrub_pause_on/off()Zhaolei
It can reduce current duplicated code which is similar to scrub_blocked_if_needed() but can not call it because little different. It also used by my next patch which is in same case. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Use ref_cnt for set_block_group_ro()Zhaolei
More than one code call set_block_group_ro() and restore rw in fail. Old code use bool bit to save blockgroup's ro state, it can not support parallel case(it is confirmd exist in my debug log). This patch use ref count to store ro state, and rename set_block_group_ro/set_block_group_rw to inc_block_group_ro/dec_block_group_ro. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Bypass unrelated items before accessing its contents in scrubZhao Lei
When we access extent_root in scrub_stripe() and scrub_raid56_parity(), we need bypass unrelated tree item firstly before using its contents to do other condition. It is not a bug fix, only making code sequence in logic. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Load only necessary csums into list in scrubZhao Lei
We need not load csum of whole strip in scrub because strip is trimed before use, it is to say, what we really need to calculate csum is data between [extent_logical, extent_len). This patch changed to use above segment for btrfs_lookup_csums_range() in scrub_stripe() Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Fix calculate typo caused by ambiguous meaning of logic_endZhao Lei
For example, in scrub_raid56_parity(), following lines are used to judge is all data processed: place1: if (key.objectid > logic_end) ... place2: if (logic_start >= logic_end) ... ... (place2 is typo, is should be ">", it is copied from other place, where logic_end's meaning is different, long story...) We can fix above typo directly, but the root reason is ambiguous meaning of logic_end in scrub raid56 parity. In other place, XXX_end is pointed to data which is not included, and we need to process segment of [XXX_start, XXX_end). But for scrub raid56 parity, logic_end is pointed to lattest data need to process, and introduced many "+ 1" and "- 1" in code as below: length = sparity->logic_end - sparity->logic_start + 1 logic_end - logic_start + 1 stripe_logical + increment - 1 This patch changed logic_end's meaning to make it in normal understanding in raid56 parity functions and data struct alone with above bugfix. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>