path: root/fs/btrfs
Age Commit message Author
2016-02-18btrfs: Continue write in case of can_not_nocowZhao Lei
btrfs fails xfstests btrfs/080 when mounted with -o nodatacow. It can be reproduced with the following script:

  DEV=/dev/vdg
  MNT=/mnt/tmp
  umount $DEV &>/dev/null
  mkfs.btrfs -f $DEV
  mount -o nodatacow $DEV $MNT
  dd if=/dev/zero of=$MNT/test bs=1 count=2048 &
  btrfs subvolume snapshot -r $MNT $MNT/test_snap &
  wait
  --
  We can see dd failed on NO_SPACE.

Reason: __btrfs_buffered_write() should fall back to a COW write when nocow is impossible, and the current code is designed around that logic. But check_can_nocow() has two kinds of return value (0 and <0) for the can-not-nocow case, and the current code only continues the write in the first case. The second case occurs while the subvolume snapshot is in progress.

Fix: continue the write when check_can_nocow() returns 0 or <0.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
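The return-value handling described in this message can be sketched in user space as follows. This is only an illustration of the stated rule, not the kernel code: the enum and the function name `pick_write_mode` are hypothetical, and it assumes (as the message implies) that a positive return from check_can_nocow() means a nocow write is possible.

```c
/* Hypothetical names, for illustration only. */
enum write_mode {
	WRITE_NOCOW, /* overwrite in place */
	WRITE_COW    /* fall back to a normal copy-on-write write */
};

/*
 * can_nocow_ret stands in for the value returned by check_can_nocow().
 * Assumption: > 0 means nocow is possible; 0 and < 0 both mean it is not.
 * After the fix, both "cannot nocow" cases continue with a COW write
 * instead of aborting on the negative one.
 */
static enum write_mode pick_write_mode(int can_nocow_ret)
{
	return can_nocow_ret > 0 ? WRITE_NOCOW : WRITE_COW;
}
```

The point of the sketch is only that 0 and negative values take the same branch.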
2016-02-18btrfs: drop null testing before destroy functionsKinglong Mee
Cleanup. kmem_cache_destroy() already checks for a NULL argument, so drop the redundant NULL test before calling it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: fix build warningSudip Mukherjee
We were getting a build warning: fs/btrfs/extent-tree.c:7021:34: warning: ‘used_bg’ may be used uninitialized in this function. The warning is not valid: used_bg is never used uninitialized, because locked is initially false and so we can never reach the section where 'used_bg' is used. But gcc is not able to work that out, so initialize the variable at its declaration to silence the warning. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: use proper type for failrec in extent_stateDavid Sterba
We use the private member of extent_state to store the failrec and play pointless pointer games. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: Replace CURRENT_TIME by current_fs_time()Deepa Dinamani
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: linux-btrfs@vger.kernel.org Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: remove open-coded swap() in backref.c:__merge_refsDave Jones
The kernel provides a swap() that does the same thing as this code. Signed-off-by: Dave Jones <dsj@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: remove redundant error checkByongho Lee
While running btrfs_mksubvol(), d_really_is_positive() is called twice. First in btrfs_mksubvol() and second inside btrfs_may_create(). So I remove the first one. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: simplify expression in btrfs_calc_trans_metadata_size()Byongho Lee
Simplify expression in btrfs_calc_trans_metadata_size(). Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Reviewed-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18Btrfs: check reserved when deciding to background flushJosef Bacik
We will sometimes start background flushing the various enospc related things (delayed nodes, delalloc, etc) if we are getting close to reserving all of our available space. We don't want to do this however when we are actually using this space as it causes unneeded thrashing. We currently try to do this by checking bytes_used >= thresh, but bytes_used is only part of the equation; we need to use bytes_reserved as well, as this represents space that is very likely to become bytes_used in the future.

My tracing tool will keep count of the number of times we kick off the async flusher. The following are counts for the entire run of generic/027:

          No Patch    Patch
avg:      5385        5009
median:   5500        4916

We skewed lower than the average with my patch and higher than the average without the patch. Overall it cuts the flushing by roughly 5-10%, which in the case of actual ENOSPC is quite helpful.

Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
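A minimal sketch of the threshold check described in this message, written as user-space C. The struct and function names are hypothetical stand-ins; only the two counters (bytes_used, bytes_reserved) and the comparison against thresh come from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's space accounting. */
struct space_info_sketch {
	uint64_t bytes_used;     /* space already consumed */
	uint64_t bytes_reserved; /* space reserved, likely to become used */
};

/* Old check: only bytes_used is compared against the threshold. */
static bool mostly_in_use_old(const struct space_info_sketch *si,
			      uint64_t thresh)
{
	return si->bytes_used >= thresh;
}

/* New check: reserved space counts toward the threshold as well. */
static bool mostly_in_use_new(const struct space_info_sketch *si,
			      uint64_t thresh)
{
	return si->bytes_used + si->bytes_reserved >= thresh;
}
```

With 60 bytes used, 30 reserved and a threshold of 80, the old check reports the space as not yet in use while the new one does, so under the new check a background flush would be skipped in that situation.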
2016-02-18Btrfs: add transaction space reservation tracepointsJosef Bacik
There are a few places where we add to trans->bytes_reserved but don't have the corresponding trace point. With these added my tool no longer sees transaction leaks. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18Btrfs: fix truncate_space_checkJosef Bacik
truncate_space_check is using btrfs_csum_bytes_to_leaves() but forgetting to multiply by nodesize so we get an actual byte count. We need a tracepoint here so that we have the matching reserve for the release that will come later. Also add a comment to make clear what the intent of truncate_space_check is. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18Btrfs: change how we update the global block rsvJosef Bacik
I'm writing a tool to visualize the enospc system in order to help debug enospc bugs and I found weird data and ran it down to when we update the global block rsv. We add all of the remaining free space to the block rsv, do a trace event, then remove the extra and do another trace event. This makes my visualization look silly and is unintuitive code as well. Fix this stuff to only add the amount we are missing, or free the amount we are missing. This is less clean to read but more explicit in what it is doing, as well as only emitting events for values that make sense. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
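The "only add the amount we are missing, or free the amount we are missing" approach might look roughly like the sketch below. It is a user-space illustration under assumptions: `rsv_sketch`, `rsv_adjust` and the `free_bytes` parameter are hypothetical names, and the real code also emits trace events, which are omitted here.

```c
#include <stdint.h>

/* Hypothetical, simplified block reserve. */
struct rsv_sketch {
	uint64_t size;     /* target size of the reserve */
	uint64_t reserved; /* bytes currently held */
};

/*
 * Move 'reserved' toward 'size' in a single step.
 * Returns the signed delta that was applied: a positive value is the
 * number of bytes taken from free space, a negative value is the number
 * of bytes handed back. Exactly one adjustment (and so one event) occurs.
 */
static int64_t rsv_adjust(struct rsv_sketch *rsv, uint64_t free_bytes)
{
	if (rsv->reserved < rsv->size) {
		uint64_t missing = rsv->size - rsv->reserved;
		uint64_t take = missing < free_bytes ? missing : free_bytes;

		rsv->reserved += take;
		return (int64_t)take;
	}
	if (rsv->reserved > rsv->size) {
		uint64_t excess = rsv->reserved - rsv->size;

		rsv->reserved = rsv->size;
		return -(int64_t)excess;
	}
	return 0;
}
```

Compared with adding all free space and then removing the extra, each call changes the reserve by one meaningful amount, which is what makes the traced values easy to read.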
2016-02-18btrfs: reada: ignore creating reada_extent for a non-existent deviceZhao Lei
For a non-existent device, the old code skips adding the extent to the device's reada queue. To solve the problem of an unfinished wait in raid5/6, commit 5fbc7c59fd22 ("Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting") added an exception for the first stripe: in short, the first stripe is always processed whether the device exists or not. There is a better way to meet that requirement: just skip creating the reada_extent for a non-existent device, which makes the code simpler and more effective. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: avoid undone reada extents in btrfs_reada_waitZhao Lei
Reada background work is not designed to finish all jobs completely. It stops in the following cases:
1: A device reaches its workload limit (MAX_IN_FLIGHT)
2: Total reads reach the max limit (10000)
3: No device has more jobs queued, which often happens in the DUP case

If all background works exit with jobs remaining, btrfs_reada_wait() will wait indefinitely.

This problem rarely happened with the old code, because:
1: Every work queued 2x new works, and so many works reduced the chance of undone jobs.
2: One work would keep looping 10000 times when it had no jobs, which reduced the window with no thread running.

But after we fixed the above cases, the "undone reada extents" happened frequently.

Fix: in btrfs_reada_wait(), check to ensure we have at least one thread if there are undone jobs.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: limit max works countZhao Lei
Reada creates 2 works for each level of the tree recursively. For a tree with many levels, the number of created works is 2^level_of_tree. We don't need so many works in parallel, so this patch limits the max works to BTRFS_MAX_MIRRORS * 2. The per-fs works_counter will also be used by btrfs_reada_wait() to check whether there are background workers. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
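One way to picture a capped per-fs works counter is the user-space sketch below, using C11 atomics. All names are hypothetical, the counter is a plain global rather than a per-fs field, and the value 3 for the mirror count is an assumption made for illustration; only the cap of BTRFS_MAX_MIRRORS * 2 comes from the message.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_MIRRORS_SKETCH 3 /* assumed value, for illustration only */
#define MAX_WORKS (MAX_MIRRORS_SKETCH * 2)

/* Number of background works currently running (starts at zero). */
static atomic_int works_cnt;

/* Try to claim a worker slot; returns false once the cap is reached. */
static bool works_try_get(void)
{
	int cur = atomic_load(&works_cnt);

	while (cur < MAX_WORKS) {
		/* On failure 'cur' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&works_cnt, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Release a slot when a work finishes. */
static void works_put(void)
{
	atomic_fetch_sub(&works_cnt, 1);
}

/* A waiter can use this to see whether any work is still running. */
static bool works_pending(void)
{
	return atomic_load(&works_cnt) > 0;
}
```

The same counter serves both purposes named in the message: bounding parallelism and letting a waiter detect that no worker is left.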
2016-02-18btrfs: reada: simplify dev->reada_in_flight processingZhao Lei
There is no need to decrease dev->reada_in_flight inside __readahead_hook() and reada_extent_put(). reada_extent_put() has no chance to decrease dev->reada_in_flight in the free operation, because the reada_extent holds an additional refcnt while it is scheduled to a dev. We can put the inc and dec operations for dev->reada_in_flight in one place to make the logic simple and safe, and turn the now-useless reada_extent->scheduled_for into a bool flag. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: Fix a debug code typoZhao Lei
Remove one copy of the loop to fix the typo in iterating zones. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: Jump into cleanup in direct way for __readahead_hook()Zhao Lei
The current code sets nritems to 0 so that the for loop is skipped, and sets generation's value, which is not necessary. Jumping to cleanup directly is a better choice. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: Use fs_info instead of root in __readahead_hook's argumentZhao Lei
What __readahead_hook() actually needs is fs_info; there is no need to convert fs_info to root in the caller and convert it back in __readahead_hook(). Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: Pass reada_extent into __readahead_hook directlyZhao Lei
reada_start_machine_dev() already has the reada_extent pointer; passing it into __readahead_hook() directly, instead of searching the radix_tree, makes the code run faster. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: move reada_extent_put to place after __readahead_hook()Zhao Lei
We can't release the reada_extent before __readahead_hook(), because __readahead_hook() still needs to use it; we have to hold a refcnt to keep it from being freed. Actually this is no longer a problem after my patch named "Avoid many times of empty loop": it makes the reada_extent in the above line include at least one reada_extctl, which keeps one additional refcnt on the reada_extent. But we still need this patch to keep the code logically clean. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: Remove level argument in severial functionsZhao Lei
level is not used in several functions; remove it from their arguments, and remove the related code that gets its value. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: bypass adding extent when all zone failedZhao Lei
When adding all dev_zones for a reada_extent fails, the extent has no chance of being selected to run and stays in memory forever. We should skip this extent to avoid that. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: add all reachable mirrors into reada device listZhao Lei
If some device is not reachable, we should skip it and continue adding the next one, instead of breaking on the bad device. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18btrfs: reada: Move is_need_to_readahead contition earlierZhao Lei
Move the is_need_to_readahead condition earlier to avoid a useless loop that gathers related data for readahead. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-18Merge branch 'x86/urgent' into x86/asm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16btrfs: reada: Avoid many times of empty loopZhao Lei
We can see the following loop (10000 times) in trace_log:

  [ 75.416137] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
  [ 75.417413] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
  [ 75.418611] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
  [ 75.419793] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
  [ 75.421016] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
  [ 75.422324] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
  [ 75.423661] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
  [ 75.424882] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
  ...(10000 times)
  [ 124.101672] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
  [ 124.102850] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
  [ 124.104008] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
  [ 124.105121] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1

Reason: If more than one user triggers reada on the same extent, the first task finishes setting up the reada data structures and calls reada_start_machine() to start, while the second task has only added a ref_count and has not yet finished adding its reada_extctl struct. The reada_extent then cannot finish all its jobs, and is selected in __reada_start_machine() 10000 times (the total loop count in __reada_start_machine()).

Fix: A reada_extent without a job does not need to run; just return 0 to let the caller break.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-16btrfs: reada: Add missed segment checking in reada_find_zoneZhao Lei
When rechecking zone-in-tree, we still need to check that the zone includes our logical address. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-16btrfs: reada: reduce additional fs_info->reada_lock in reada_find_zoneZhao Lei
We can avoid an additional lock acquisition and one pair of kref_get/put by combining two conditions. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-16btrfs: reada: Fix in-segment calculation for readaZhao Lei
reada_zone->end is the end position of the segment: end = start + cache->key.offset - 1; So we need to use "<=" in the condition that judges whether a position is in the segment. The problem rarely happened, because a logical position rarely points to the last 4k of a blockgroup, but we need to fix it to make the code logically correct. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
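The inclusive-end comparison can be illustrated with a small user-space sketch. The struct and helper names are hypothetical; the formula for the end position and the use of "<=" are taken from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified zone: both bounds are inclusive. */
struct zone_sketch {
	uint64_t start;
	uint64_t end;
};

/* end = start + length - 1, so 'end' is the last valid position. */
static uint64_t zone_end(uint64_t start, uint64_t len)
{
	return start + len - 1;
}

/* Because 'end' is inclusive, the upper comparison must be "<=". */
static bool zone_contains(const struct zone_sketch *z, uint64_t logical)
{
	return z->start <= logical && logical <= z->end;
}
```

With a strict "<" on the upper bound, the very last position of the segment would be reported as outside it, which matches the rarely hit case the message describes.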
2016-02-16Btrfs: fix direct IO requests not reporting IO error to user spaceFilipe Manana
If a bio for a direct IO request fails, we were not setting the error in the parent bio (the main DIO bio), making us not return the error to user space in btrfs_direct_IO(). That is, it made __blockdev_direct_IO() return the number of bytes issued for IO and not the error that a bio created and submitted by btrfs_submit_direct() got from the block layer.

This essentially happens because when we call:

   dio_end_io(dio_bio, bio->bi_error);

it does not set dio_bio->bi_error to the value of the second argument. So just add this missing assignment in the endio callbacks, just as we do in the error path at btrfs_submit_direct() when we fail to clone the dio bio or allocate its private object. This follows the convention of what is done with other similar APIs such as bio_endio(), where the caller is responsible for setting the bi_error field in the bio it passes as an argument to bio_endio().

This was detected by the new generic test cases in xfstests: 271, 272, 276 and 278, which essentially set up a dm error target, then load the error table, do a direct IO write and unload the error table. They expect the write to fail with -EIO, which was not getting reported when testing against btrfs.

Cc: stable@vger.kernel.org # 4.3+
Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
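The caller-sets-the-error convention can be mimicked in user space with mock types. Everything below is a hypothetical stand-in: `bio_sketch`, `dio_end_io_sketch` and `endio_sketch` are not kernel APIs, and the mock completion function merely models the stated behavior that the error argument is not stored in the parent bio.

```c
#include <errno.h>

/* Mock bio carrying only the error field discussed above. */
struct bio_sketch {
	int bi_error;
};

/*
 * Mock completion call: like the API described in the message, it does
 * not copy 'error' into dio_bio->bi_error.
 */
static void dio_end_io_sketch(struct bio_sketch *dio_bio, int error)
{
	(void)dio_bio;
	(void)error;
}

/* endio callback with the previously missing assignment added. */
static void endio_sketch(struct bio_sketch *dio_bio,
			 const struct bio_sketch *bio)
{
	dio_bio->bi_error = bio->bi_error; /* caller records the error */
	dio_end_io_sketch(dio_bio, bio->bi_error);
}
```

Without the assignment in `endio_sketch`, the parent would keep bi_error == 0 even when the child failed with -EIO, which is the symptom the tests caught.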
2016-02-12Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds
Pull btrfs fixes from Chris Mason:

"This has a few fixes from Filipe, along with a readdir fix from Dave that we've been testing for some time"

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: properly set the termination value of ctx->pos in readdir
  Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
  Btrfs: remove no longer used function extent_read_full_page_nolock()
  Btrfs: fix page reading in extent_same ioctl leading to csum errors
  Btrfs: fix invalid page accesses in extent_same (dedup) ioctl
2016-02-12btrfs: Introduce new mount option alias for nologreplayQu Wenruo
Introduce a new mount option alias "norecovery" for nologreplay, to keep the "norecovery" behavior the same as in other filesystems. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-12btrfs: Introduce new mount option to disable tree log replayQu Wenruo
Introduce a new mount option "nologreplay" that works together with the "ro" mount option to get a truly read-only mount, like "norecovery" in ext* and xfs. Since parse_options() now needs to check new flags at remount time, add a new parameter to parse_options(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-12btrfs: Introduce new mount option usebackuproot to replace recoveryQu Wenruo
The current "recovery" mount option only tries to use the backup root. However, the word "recovery" is too generic and may be confusing for some users. Introduce a new and more specific mount option, "usebackuproot", to replace the "recovery" mount option. "recovery" will be kept for compatibility reasons, but will be deprecated. Also, since "usebackuproot" only affects mount behavior and has nothing to do with the filesystem after open_ctree(), clear the flag after the mount succeeds. This provides the basis for the later unified "norecovery" mount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [ dropped usebackuproot from show_mount, added note about 'recovery' to docs ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: teach print_leaf about temporary item subtypesDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: teach print_leaf about permanent item subtypesDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: switch dev stats item to the permanent item keyDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: introduce key type for persistent permanent itemsDavid Sterba
The number of distinct key types is limited, so we cannot afford to spend one on every new thing we want to store in the tree. Similar to the temporary items, we'll introduce a new name for an existing key value and use the objectid for further extension. The victim is the BTRFS_DEV_STATS_KEY (249). The device stats are an example of a permanent item. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: switch balance item to the temporary item keyDavid Sterba
No visible change. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: introduce key type for persistent temporary itemsDavid Sterba
The number of distinct key types is limited, so we cannot afford to spend one on every new thing we want to store in the tree. We'll introduce a new name for an existing key value and use the objectid for further extension. The victim is the BTRFS_BALANCE_ITEM_KEY (248). The balance status item is a good example of a temporary item: it exists from the beginning of the balance and keeps the status until it finishes. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: properly set the termination value of ctx->pos in readdirDavid Sterba
The value of ctx->pos in the last readdir call is supposed to be set to INT_MAX due to 32bit compatibility, unless 'pos' is intentionally set to a larger value, then it's LLONG_MAX.

There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++" overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a 64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before the increment.

We can get to that situation like this:

* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX

Normally this is not a problem, but if we call readdir again, we'll find 'pos' set to LLONG_MAX and the unconditional increment will overflow.

The report from Victor at (http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging print shows that pattern:

 Overflow: e
 Overflow: 7fffffff
 Overflow: 7fffffffffffffff
 PAX: size overflow detected in function btrfs_real_readdir fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0; context: dir_context;
 CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
 Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
  ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
  ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
  ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
 Call Trace:
  [<ffffffff81742f0f>] dump_stack+0x4c/0x7f
  [<ffffffff811cb706>] report_size_overflow+0x36/0x40
  [<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0
  [<ffffffff811dafc8>] iterate_dir+0xa8/0x150
  [<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70
  [<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0
 Overflow: 1a
  [<ffffffff811db070>] ? iterate_dir+0x150/0x150
  [<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83

The jump from 7fffffff to 7fffffffffffffff happens when new dir entries are not yet synced and are processed from the delayed list. Then the code could go to the bump section again even though it might not emit any new dir entries from the delayed list.

The fix avoids entering the "bump" section again once we've finished emitting the entries, both for synced and delayed entries.

References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
Reported-by: Victor <services@swwu.com>
CC: stable@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Chris Mason <clm@fb.com>
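The end-of-directory handling described in this message can be sketched in user space as below. This is an illustration under assumptions, not the actual patch: the function name `finish_readdir` and the `emitted` flag are hypothetical, and `pos` is a plain long long standing in for ctx->pos.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Tail of one readdir call (simplified).
 * 'emitted' says whether this call emitted at least one entry.
 */
static void finish_readdir(long long *pos, bool emitted)
{
	/*
	 * Nothing emitted: pos already holds a termination value from an
	 * earlier call, so leave it alone. This avoids both the second
	 * bump to LLONG_MAX and the overflowing increment.
	 */
	if (!emitted)
		return;

	(*pos)++; /* step past the last emitted entry */

	/* Termination value: INT_MAX unless pos is already beyond it. */
	if (*pos >= INT_MAX)
		*pos = LLONG_MAX;
	else
		*pos = INT_MAX;
}
```

Starting from pos = 0xe with entries emitted, the first call ends at INT_MAX; any number of later calls that emit nothing leave that value unchanged, so the sequence 7fffffff then 7fffffffffffffff from the report no longer occurs in this model.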
2016-02-11btrfs: switch to kcalloc in btrfs_cmp_data_prepareDavid Sterba
kcalloc() is functionally equivalent and does overflow checks. Signed-off-by: David Sterba <dsterba@suse.com>
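As a user-space analogy for why an array allocator with overflow checking is preferable to an open-coded multiplication, consider the sketch below. The helper `alloc_array_checked` is hypothetical and uses malloc() in place of the kernel allocator; it only illustrates the kind of check meant by "does overflow checks".

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate a zeroed array of n elements of 'size' bytes each.
 * Returns NULL if n * size would overflow size_t, instead of silently
 * allocating a buffer that is too small.
 */
static void *alloc_array_checked(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL; /* multiplication would wrap around */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

An open-coded malloc(n * size) with a huge n would wrap and return a short buffer; the checked variant fails cleanly instead.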
2016-02-11btrfs: extent same: use GFP_KERNEL for page array allocationsDavid Sterba
We can safely use GFP_KERNEL in the functions called from the ioctl handlers. Here we can allocate up to 32k, so less pressure on the allocator could help. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: device add and remove: use GFP_KERNELDavid Sterba
We can safely use GFP_KERNEL in the functions called from the ioctl handlers. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: readdir: use GFP_KERNELDavid Sterba
Readdir is initiated from userspace and is not on the critical writeback path, so we don't need to use GFP_NOFS for allocations. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: fallocate: use GFP_KERNELDavid Sterba
Fallocate is initiated from userspace and is not on the critical writeback path, so we don't need to use GFP_NOFS for allocations. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: let callers of btrfs_alloc_root pass gfp flagsDavid Sterba
We don't need to use GFP_NOFS in all contexts, eg. during mount or for the dummy root tree, but we might for the log tree creation. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: scrub: use GFP_KERNEL on the submission pathDavid Sterba
Scrub is not on the critical writeback path, so we don't need to use GFP_NOFS for all allocations. The failures are handled and stats passed back to userspace. Let's use GFP_KERNEL on the paths where everything is ok, ie. setup of the global structures and the IO submission paths. Functions that do the repair and fixups still use GFP_NOFS as we might want to skip any other filesystem activity if we encounter an error. This could turn out to be unnecessary, but requires more review compared to the easy cases in this patch. Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-11btrfs: reada: use GFP_KERNEL everywhereDavid Sterba
The readahead framework is not on the critical writeback path, so we don't need to use GFP_NOFS for allocations. All error paths are handled and the readahead failures are not fatal. The actual users (scrub, dev-replace) will trigger reads if the blocks are not found in cache. Signed-off-by: David Sterba <dsterba@suse.com>