path: root/fs
2018-05-30  btrfs: Factor out write portion of btrfs_get_blocks_direct  (Nikolay Borisov)
Now that the read side is extracted into its own function, do the same to the write side. This leaves btrfs_get_blocks_direct_write with the sole purpose of handling the common locking required. Also flip the condition in btrfs_get_blocks_direct_write so that the write case comes first and we check for if (create) rather than if (!create). This is purely subjective, but I believe it makes reading a bit more "linear". No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
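A minimal sketch of the resulting dispatch; the helper arguments are assumed for illustration and are not taken from the actual patch:

    /* Sketch only: keep the common locking in the caller and branch on
     * create, checking the write case first as described above. */
    if (create)
            ret = btrfs_get_blocks_direct_write(&em, bh_result, inode,
                                                &dio_data, start, len);
    else
            ret = __btrfs_get_block_direct_read(em, bh_result, inode,
                                                start, len);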
2018-05-30  btrfs: Factor out read portion of btrfs_get_blocks_direct  (Nikolay Borisov)
Currently this function handles both the READ and WRITE dio cases. This is facilitated by a bunch of 'if' statements, a goto short-circuit statement and a very perverse aliasing of the "!create" (READ) case by setting lockstart = lockend and checking for lockstart < lockend to detect the write. Let's simplify this mess by extracting the READ-only code into a separate __btrfs_get_block_direct_read function. This is only the first step, the next one will be to factor out the write side as well. The end goal will be to have the common locking/unlocking code in btrfs_get_blocks_direct, which will then call either the read or write subvariant. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  f2fs: turn down IO priority of discard from background  (Chao Yu)
In order to avoid interfering with normal r/w IO, let's turn down the IO priority of discards issued from background. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30  f2fs: don't split checkpoint in fstrim  (Chao Yu)
Now that we issue discards asynchronously in a separate thread instead of in checkpoint, we no longer encounter long checkpoint latency caused by handling a huge number of synchronous discard commands. So we don't need to split the checkpoint to do trim in batches; merge it and obsolete the related sysfs entry. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30  f2fs: issue discard commands proactively in high fs utilization  (Jaegeuk Kim)
At high utilization, like over 80%, we don't expect a huge number of large discard commands, but we do have many small pending discards which affect FTL GC a lot. Let's issue them in that case. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30  btrfs: return ENOMEM if path allocation fails in btrfs_cross_ref_exist  (Su Yue)
The error code does not match the reason for the failure and may confuse callers. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: raid56: Remove VLA usage  (Kees Cook)
In the quest to remove all stack VLA usage from the kernel[1], this allocates the working buffers during regular init, instead of using stack space. This refactors the allocation code a bit to make it easier to review. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  xfs: repair superblocks  (Darrick J. Wong)
If one of the backup superblocks is found to differ seriously from superblock 0, write out a fresh copy from the in-core sb. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30  xfs: add helpers to attach quotas to inodes  (Darrick J. Wong)
Add a helper routine to attach quota information to inodes that are about to undergo repair. If that fails, we need to schedule a quotacheck for the next mount but allow the corrupted metadata repair to continue. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30  xfs: recover AG btree roots from rmap data  (Darrick J. Wong)
Add a helper function to help us recover btree roots from the rmap data. Callers pass in a list of rmap owner codes, buffer ops, and magic numbers. We iterate the rmap records looking for owner matches, and then read the matching blocks to see if the magic number & uuid match. If so, we then read-verify the block, and if that passes then we retain a pointer to the block with the highest level, assuming that by the end of the call we will have found the root. This will be used to reset the AGF/AGI btree root fields during their rebuild procedures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
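A high-level sketch of the search loop this describes; the structure and every helper name below are illustrative assumptions, not the actual XFS code:

    /* Sketch with hypothetical helpers: walk the AG's rmap records,
     * keep only those whose owner matches one of the requested btrees,
     * verify magic/uuid, and remember the highest-level block seen as
     * the candidate root. */
    for_each_rmap_record(ag, rec) {
            struct root_candidate *fab = match_requested_owner(targets, rec->owner);

            if (!fab)
                    continue;                    /* owner we don't care about */
            if (read_and_verify_block(rec->startblock, fab->buf_ops, &bp) != 0)
                    continue;                    /* wrong magic/uuid or corrupt */
            if (block_level(bp) >= fab->height) {
                    fab->height = block_level(bp);
                    fab->root   = rec->startblock;
            }
    }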
2018-05-30  xfs: add helpers to dispose of old btree blocks after a repair  (Darrick J. Wong)
Now that we've plumbed in the ability to construct a list of dead btree blocks following a repair, add more helpers to dispose of them. This is done by examining the rmapbt -- if the btree was the only owner we can free the block, otherwise it's crosslinked and we can only remove the rmapbt record. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30  xfs: add helpers to collect and sift btree block pointers during repair  (Darrick J. Wong)
Add some helpers to assemble a list of fs block extents. Generally, repair functions will iterate the rmapbt to make a list (1) of all extents owned by the nominal owner of the metadata structure; then they will iterate all other structures with the same rmap owner to make a list (2) of active blocks; and finally we have a subtraction function to subtract all the blocks in (2) from (1), with the result that (1) is now a list of blocks that were owned by the old btree and must be disposed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30  xfs: add helpers to allocate and initialize fresh btree roots  (Darrick J. Wong)
Add a pair of helper functions to allocate and initialize fresh btree roots. The repair functions will use these as part of recreating corrupted metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-05-30  xfs: add helpers to deal with transaction allocation and rolling  (Darrick J. Wong)
For repairs, we need to reserve at least as many blocks as we think we're going to need to rebuild the data structure, and we're going to need some helpers to roll transactions while maintaining locks on the AG headers so that other threads cannot wander into the middle of a repair. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-05-30  xfs: grab the per-ag structure whenever relevant  (Darrick J. Wong)
Grab and hold the per-AG data across a scrub run whenever relevant. This helps us avoid repeated trips through rcu and the radix tree in the repair code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30  btrfs: return error value if create_io_em failed in cow_file_range  (Su Yue)
In cow_file_range(), create_io_em() may fail, but its return value is not recorded. The return value may then be 0 even though it failed, which is wrong behavior. Let cow_file_range() return PTR_ERR(em) if create_io_em() failed. Fixes: 6f9994dbabe5 ("Btrfs: create a helper to create em for IO") CC: stable@vger.kernel.org # 4.11+ Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: drop useless member qgroup_reserved of btrfs_pending_snapshot  (Gu JinXiang)
Since there is no more use of the qgroup_reserved member in struct btrfs_pending_snapshot, remove it. Signed-off-by: Gu JinXiang <gujx@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: drop unused parameter qgroup_reserved  (Gu JinXiang)
Since commit 7775c8184ec0 ("btrfs: remove unused parameter from btrfs_subvolume_release_metadata"), the qgroup_reserved parameter is not used by the caller of btrfs_subvolume_reserve_metadata, so remove it. Signed-off-by: Gu JinXiang <gujx@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: balance dirty metadata pages in btrfs_finish_ordered_io  (Ethan Lien)
[Problem description and how we fix it]

We should balance dirty metadata pages at the end of btrfs_finish_ordered_io, since a small, unmergeable random write can potentially produce dirty metadata which is multiple times larger than the data itself. For example, a small, unmergeable 4KiB write may produce:

    16KiB dirty leaf (and possibly 16KiB dirty node) in subvolume tree
    16KiB dirty leaf (and possibly 16KiB dirty node) in checksum tree
    16KiB dirty leaf (and possibly 16KiB dirty node) in extent tree

Although we do balance dirty pages on the write side, in the buffered write path most metadata is dirtied only after we reach the dirty background limit (which by far only counts dirty data pages) and wake up the flusher thread. If there are many small, unmergeable random writes spread across a large btree, we'll find a burst of dirty pages exceeding the dirty_bytes limit after we wake up the flusher thread, which is not what we expect. On our machine it caused an out-of-memory problem, since a page cannot be dropped while it is marked dirty.

One may worry that we could sleep in btrfs_btree_balance_dirty_nodelay, but since we do btrfs_finish_ordered_io in a separate worker, it will not stop the flusher from consuming dirty pages. Also, we use a different worker for metadata writeback endio, so sleeping in btrfs_finish_ordered_io helps us throttle the size of dirty metadata pages.

[Reproduce steps]

To reproduce the problem, we need to do 4KiB writes randomly spread across a large btree. On our 2GiB RAM machine:

1) Create 4 subvolumes.
2) Run fio on each subvolume:

    [global]
    direct=0
    rw=randwrite
    ioengine=libaio
    bs=4k
    iodepth=16
    numjobs=1
    group_reporting
    size=128G
    runtime=1800
    norandommap
    time_based
    randrepeat=0

3) Take a snapshot on each subvolume and repeat fio on the existing files.
4) Repeat step (3) until we get large btrees. In our case, by observing btrfs_root_item->bytes_used, we have 2GiB of metadata in each subvolume tree and 12GiB of metadata in the extent tree.
5) Stop all fio, take a snapshot again, and wait until all delayed work is completed.
6) Start all fio. A few seconds later we hit OOM when the flusher starts to work.

It can be reproduced even when using nocow writes.

Signed-off-by: Ethan Lien <ethanlien@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
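A minimal sketch of where the throttling call lands; the exact placement in the upstream patch may differ, and the surrounding cleanup code is omitted:

    /* Sketch: at the end of btrfs_finish_ordered_io(), after the ordered
     * extent has been processed, throttle the dirty metadata produced by
     * this IO. Running in a separate worker, this sleep does not block
     * the flusher thread. */
    btrfs_btree_balance_dirty_nodelay(fs_info);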
2018-05-30  btrfs: lift some btrfs_cross_ref_exist checks in nocow path  (Ethan Lien)
In the nocow path, we check if the extent is snapshotted in btrfs_cross_ref_exist(). We can do a similar check earlier and avoid an unnecessary search into the extent tree.

A fio test on an Intel D-1531, 16GB RAM, SSD RAID-5 machine, as follows:

    [global]
    group_reporting
    time_based
    thread=1
    ioengine=libaio
    bs=4k
    iodepth=32
    size=64G
    runtime=180
    numjobs=8
    rw=randwrite

    [file1]
    filename=/mnt/nocow/testfile

IOPS result:        unpatched    patched
    1 fio round:        46670      46958
    snapshot
    2 fio round:        51826      54498
    3 fio round:        59767      61289

After the snapshot, the first fio round gets about a 5% performance gain. As we continually write to the same file, all writes resume nocow mode and eventually we have no performance gain.

Signed-off-by: Ethan Lien <ethanlien@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: Remove fs_info argument from btrfs_uuid_tree_rem  (Lu Fengqi)
This function always takes a transaction handle which contains a reference to the fs_info. Use that and remove the extra argument. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ rename the function ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: Remove fs_info argument from btrfs_uuid_tree_add  (Lu Fengqi)
This function always takes a transaction handle which contains a reference to the fs_info. Use that and remove the extra argument. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  Btrfs: remove unused check of skip_locking  (Liu Bo)
The check is superfluous since all callers who set search_for_commit also have skip_locking set. ASSERT() is put in place to ensure skip_locking is set by new callers. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  Btrfs: remove always true check in unlock_up  (Liu Bo)
As unlock_up() is written as

    for () {
            if (!path->locks[i])
                    break;
            ...
            if (... && path->locks[i]) {
            }
    }

apparently, @path->locks[i] is always true at this 'if'.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  Btrfs: grab write lock directly if write_lock_level is the max level  (Liu Bo)
Typically, when acquiring the root node's lock, btrfs tries its best to get a read lock and trade it for a write lock if @write_lock_level implies doing so. In case of (cow && (p->keep_locks || p->lowest_level)), write_lock_level is set to BTRFS_MAX_LEVEL, which means we need to acquire the root node's write lock directly. In this particular case, the dance of acquiring a read lock and then trading it for a write lock can be saved. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
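A rough sketch of the shortcut; the exact condition and surrounding code in the patch differ, this only illustrates the idea:

    /* Sketch: if the caller already knows it needs the write lock at the
     * root, lock for write immediately instead of taking the read lock
     * and trading up afterwards. */
    if (write_lock_level < BTRFS_MAX_LEVEL)
            b = btrfs_read_lock_root_node(root);   /* usual read-then-trade path */
    else
            b = btrfs_lock_root_node(root);        /* grab write lock directly */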
2018-05-30  Btrfs: move get root out of btrfs_search_slot to a helper  (Liu Bo)
It's good to have a helper instead of having all get-root details open-coded. The new helper locks (if necessary) and sets root node of the path. Also invert the checks to make the code flow easier to read. There is no functional change in this commit. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  Btrfs: use more straightforward extent_buffer_uptodate check  (Liu Bo)
If parent_transid "0" is passed to btrfs_buffer_uptodate(), btrfs_buffer_uptodate() is equivalent to extent_buffer_uptodate(), but extent_buffer_uptodate() is preferred since we don't have to look into verify_parent_transid(). Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  Btrfs: remove superfluous free_extent_buffer in read_block_for_search  (Liu Bo)
read_block_for_search() can be simplified as:

    tmp = find_extent_buffer();
    if (tmp)
            return;
    ...
    free_extent_buffer();
    read_tree_block();

Apparently, @tmp must be NULL at this point, so free_extent_buffer() is not needed.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: drop unused space_info parameter from create_space_info  (Lu Fengqi)
Since commit dc2d3005d27d ("btrfs: remove dead create_space_info calls"), there is only one caller btrfs_init_space_info. However, it doesn't need create_space_info to return space_info at all. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  Btrfs: add parent_transid parameter to verify_level_key  (Liu Bo)
As verify_level_key() is checked after verify_parent_transid(), i.e.

    if (verify_parent_transid())
            ret = -EIO;
    else if (verify_level_key())
            ret = -EUCLEAN;

if parent_transid is 0, verify_parent_transid() skips verifying parent_transid and considers the eb as valid, and if verify_level_key() reports something wrong, we're not going to know whether it's caused by corrupted metadata or a non-checked eb (e.g. a stale eb). The stale eb can be from an outdated raid1 mirror after a degraded mount, see e.g. "btrfs: fix reading stale metadata blocks after degraded raid1 mounts" (02a3307aa9c20b4f66262) for more details.

@parent_transid is able to tell whether the eb's generation has been verified by the caller.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: qgroup: show more meaningful qgroup_rescan_init error message  (Qu Wenruo)
The error message from qgroup_rescan_init() mostly looks like:

    BTRFS info (device nvme0n1p1): qgroup_rescan_init failed with -115

which is far from meaningful and sometimes confusing: the above -EINPROGRESS is mostly harmless (despite the init race), but sometimes the return value can also indicate a real problem, e.g. if it is -EINVAL.

Change it to more meaningful messages like:

    BTRFS info (device nvme0n1p1): qgroup rescan is already in progress

and

    BTRFS err (device nvme0n1p1): qgroup rescan init failed, qgroup is not enabled

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ update the messages and level ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2()  (Omar Sandoval)
If we have invalid flags set, when we error out we must drop our writer counter and free the buffer we allocated for the arguments. This bug is trivially reproduced with the following program on 4.7+:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <linux/btrfs.h>
    #include <linux/btrfs_tree.h>

    int main(int argc, char **argv)
    {
            struct btrfs_ioctl_vol_args_v2 vol_args = {
                    .flags = UINT64_MAX,
            };
            int ret;
            int fd;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s PATH\n", argv[0]);
                    return EXIT_FAILURE;
            }
            fd = open(argv[1], O_WRONLY);
            if (fd == -1) {
                    perror("open");
                    return EXIT_FAILURE;
            }
            ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args);
            if (ret == -1)
                    perror("ioctl");
            close(fd);
            return EXIT_SUCCESS;
    }

When unmounting the filesystem, we'll hit the WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt(), and the filesystem may also be prevented from being remounted read-only as the writer count will stay lifted.

Fixes: 6b526ed70cf1 ("btrfs: introduce device delete by devid")
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: lzo: Harden inline lzo compressed extent decompression  (Qu Wenruo)
For an inlined extent, we only have one segment, thus fewer things to check. Furthermore, an inlined extent always has its csum in the leaf header, so it's less probable to have corrupted data. Anyway, still check the header and segment header. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30  btrfs: lzo: Add header length check to avoid potential out-of-bounds access  (Qu Wenruo)
James Harvey reported that some corrupted compressed extent data can lead to various kernel memory corruption. Such corrupted extent data belongs to an inode with the NODATASUM flag, thus data csum won't help us detect such a bug. If lucky enough, KASAN could catch it like:

    BUG: KASAN: slab-out-of-bounds in lzo_decompress_bio+0x384/0x7a0 [btrfs]
    Write of size 4096 at addr ffff8800606cb0f8 by task kworker/u16:0/2338
    CPU: 3 PID: 2338 Comm: kworker/u16:0 Tainted: G  O  4.17.0-rc5-custom+ #50
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
    Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
    Call Trace:
     dump_stack+0xc2/0x16b
     print_address_description+0x6a/0x270
     kasan_report+0x260/0x380
     memcpy+0x34/0x50
     lzo_decompress_bio+0x384/0x7a0 [btrfs]
     end_compressed_bio_read+0x99f/0x10b0 [btrfs]
     bio_endio+0x32e/0x640
     normal_work_helper+0x15a/0xea0 [btrfs]
     process_one_work+0x7e3/0x1470
     worker_thread+0x1b0/0x1170
     kthread+0x2db/0x390
     ret_from_fork+0x22/0x40
    ...

The offending compressed data has the following info:

    Header:           length 32768       (looks completely valid)
    Segment 0 Header: length 3472882419  (obviously out of bounds)

Then when handling segment 0, since it's over the current page, we need to copy the compressed data to a temporary buffer in the workspace, and such a large size triggers the out-of-bounds memory access, screwing up the whole kernel.

Fix it by adding extra checks on the header and segment headers to ensure we won't access out-of-bounds, and even check that the decompressed data won't be out-of-bounds.

Reported-by: James Harvey <jamespharvey20@gmail.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ updated comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
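A rough sketch of the kind of bounds checks the description calls for; the variable names and error codes are assumptions, not the upstream code:

    /* Sketch: reject a header length that does not fit the compressed
     * extent we actually read. */
    if (tot_len < LZO_LEN || tot_len > srclen)
            return -EUCLEAN;

    /* Sketch: reject a segment length larger than the worst-case LZO
     * output for a single page, before copying it into the workspace. */
    if (seg_len > lzo1x_worst_compress(PAGE_SIZE))
            return -EUCLEAN;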
2018-05-29  aio: sanitize the limit checking in io_submit(2)  (Al Viro)
As it is, the logic in native io_submit(2) is "if asked for more than LONG_MAX/sizeof(pointer) iocbs to submit, don't bother with more than LONG_MAX/sizeof(pointer)" (i.e. 512M requests on 32bit and 1E requests on 64bit), while compat io_submit(2) goes with "stop after the first PAGE_SIZE/sizeof(pointer) iocbs", i.e. 1K or so. Which is

    * inconsistent
    * *way* too much in the native case
    * possibly too little in the compat one and
    * wrong anyway, since the natural point where we ought to stop bothering is ctx->nr_events

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
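A minimal sketch of the clamping this implies; the exact cap in the patch may be expressed differently, only the field name comes from the description:

    /* Sketch: cap the number of iocbs handled per io_submit() call at
     * the ring size instead of an arbitrary pointer-size based limit. */
    if (nr > ctx->nr_events)
            nr = ctx->nr_events;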
2018-05-29  aio: fold do_io_submit() into callers  (Al Viro)
get rid of insane "copy array of 32bit pointers into an array of native ones" glue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-29  aio: shift copyin of iocb into io_submit_one()  (Al Viro)
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-29  aio_read_events_ring(): make a bit more readable  (Al Viro)
The logic for 'avail' is

    * not past the tail of the cyclic buffer
    * no more than asked
    * not past the end of the buffer
    * not past the end of a page

Unobfuscate the last part.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
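Roughly, the constraints amount to a chain of clamps like the following sketch; the variable names are assumed from the surrounding aio ring code rather than quoted from the patch:

    /* Sketch: clamp 'avail' by each constraint in turn. */
    avail = (head <= tail ? tail : ctx->nr_events) - head;  /* not past tail / end of buffer */
    pos = head + AIO_EVENTS_OFFSET;                          /* position within the ring pages */
    avail = min(avail, nr - ret);                            /* no more than asked */
    avail = min_t(long, avail,
                  AIO_EVENTS_PER_PAGE - pos % AIO_EVENTS_PER_PAGE); /* not past end of a page */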
2018-05-29  aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way  (Al Viro)
... so just make them return 0 when the caller does not need to destroy the iocb. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-29  aio: take list removal to (some) callers of aio_complete()  (Al Viro)
We really want iocb out of io_cancel(2) reach before we start tearing it down. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-29  Merge tag 'afs-fixes-20180529' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs  (Linus Torvalds)
Pull AFS fixes from David Howells:

 - fix a BUG triggerable from faccessat()

 - fix the mounting of backup volumes

* tag 'afs-fixes-20180529' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix mounting of backup volumes
  afs: Fix directory permissions check
2018-05-29  f2fs: add fsync_mode=nobarrier for non-atomic files  (Jaegeuk Kim)
For non-atomic files, this patch adds an option to give fsync_mode=nobarrier, which doesn't issue flush commands to the device. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29  f2fs: let fstrim issue discard commands in lower priority  (Jaegeuk Kim)
fstrim gathers a huge number of large discard commands and tries to issue them without IO awareness, which results in long user-perceived IO latencies on READ, WRITE, and FLUSH in UFS. We've observed some commands take several seconds due to long discard latency. This patch limits the maximum size to 2MB per candidate, and checks IO congestion when issuing them to disk. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-29  fs: xfs: Change return type to vm_fault_t  (Souptick Joarder)
Use new return type vm_fault_t for fault handlers. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-29  xfs: fix inobt magic number check  (Darrick J. Wong)
In commit a6a781a58befcbd467c ("xfs: have buffer verifier functions report failing address") the bad magic number return was ported incorrectly. Fixes: a6a781a58befcbd467ce843af4eaca3906aa1f08 Reported-by: syzbot+08ab33be0178b76851c8@syzkaller.appspotmail.com Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-05-29  fs: clear writeback errors in inode_init_always  (Darrick J. Wong)
In inode_init_always(), we clear the inode mapping flags, which clears any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not also clear wb_err, which means that old mapping errors can leak through to new inodes. This is crucial for the XFS inode allocation path because we recycle old in-core inodes and we do not want error state from an old file to leak into the new file. This bug was discovered by running generic/036 and generic/047 in a loop and noticing that the EIOs generated by the collision of direct and buffered writes in generic/036 would survive the remount between 036 and 047, and get reported to the fsyncs (on different files!) in generic/047. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
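A minimal sketch of the fix this describes; whether the upstream patch clears the field in exactly this way is an assumption based on the description above:

    /* Sketch: in inode_init_always(), alongside clearing the mapping
     * flags, also reset the sticky writeback error cursor so a recycled
     * in-core inode does not inherit errors from its previous life. */
    mapping->wb_err = 0;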
2018-05-29  vfs: delete unnecessary assignment in vfs_listxattr  (Chengguang Xu)
It seems the first error assignment in the if branch is redundant. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-29  btrfs: lzo: document the compressed data format  (Qu Wenruo)
Although it's not that complex, such a comment can still save several minutes for a new reader/reviewer instead of having to infer the format from the code. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor wording updates ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-29  btrfs: compression: Add linux/sizes.h for compression.h  (Qu Wenruo)
Since compression.h uses the SZ_* macros, if some file includes only compression.h without linux/sizes.h, it will cause a compile error. One example is lzo.c, if it uses BTRFS_MAX_COMPRESSED. Fix it by adding linux/sizes.h in compression.h. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-29  Btrfs: fix clone vs chattr NODATASUM race  (Omar Sandoval)
In btrfs_clone_files(), we must check the NODATASUM flag while the inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags() will change the flags after we check, and we can end up with a partly checksummed file. The race window is only a few instructions in size, between the if and the locks, which is:

    3834         if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
    3835                 return -EISDIR;

where the setflags must run and toggle the NODATASUM flag (provided the file size is 0). The clone will block on the inode lock, setflags takes the inode lock, changes the flags, releases the lock and the clone continues. Not impossible, but it still needs a lot of bad luck to hit unintentionally.

Fixes: 0e7b824c4ef9 ("Btrfs: don't make a file partly checksummed through file clone")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>