summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-03-19lockd: make nlm_ntf_refcnt and nlm_ntf_wq staticColin Ian King
The variables nlm_ntf_refcnt and nlm_ntf_wq are local to the source and do not need to be in global scope, so make them static. Cleans up sparse warnings: fs/lockd/svc.c:60:10: warning: symbol 'nlm_ntf_refcnt' was not declared. Should it be static? fs/lockd/svc.c:61:1: warning: symbol 'nlm_ntf_wq' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-03-19nfsd4: send the special close_stateid in v4.0 replies as wellJeff Layton
We already send it for v4.1, but RFC7530 also notes that the stateid in the close reply is bogus. Always send the special close stateid, even in v4.0 responses. No client should put any meaning on it whatsoever. For now, we continue to increment the stateid value, though that might not be necessary either. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-03-19nfsd: remove blocked locks on client teardownJeff Layton
We had some reports of panics in nfsd4_lm_notify, and that showed a nfs4_lockowner that had outlived its so_client. Ensure that we walk any leftover lockowners after tearing down all of the stateids, and remove any blocked locks that they hold. With this change, we also don't need to walk the nbl_lru on nfsd_net shutdown, as that will happen naturally when we tear down the clients. Fixes: 76d348fadff5 (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks) Reported-by: Frank Sorenson <fsorenso@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 4.9 Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-03-19fs/aio: Use rcu_work instead of explicit rcu and work itemTejun Heo
Workqueue now has rcu_work. Use it instead of open-coding rcu -> work item bouncing. Signed-off-by: Tejun Heo <tj@kernel.org>
2018-03-18f2fs: check blkaddr more accuratly before issue a bioYunlei He
This patch check blkaddr more accuratly before issue a write or read bio. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-18f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_readRitesh Harjani
Quota code itself is serializing the operations by taking mutex_lock. It seems a below deadlock can happen if GF_NOFS is not used in f2fs_quota_read __switch_to+0x88 __schedule+0x5b0 schedule+0x78 schedule_preempt_disabled+0x20 __mutex_lock_slowpath+0xdc //mutex owner is itself mutex_lock+0x2c dquot_commit+0x30 //mutex_lock(&dqopt->dqio_mutex); dqput+0xe0 __dquot_drop+0x80 dquot_drop+0x48 f2fs_evict_inode+0x218 evict+0xa8 dispose_list+0x3c prune_icache_sb+0x58 super_cache_scan+0xf4 do_shrink_slab+0x208 shrink_slab.part.40+0xac shrink_zone+0x1b0 do_try_to_free_pages+0x25c try_to_free_pages+0x164 __alloc_pages_nodemask+0x534 do_read_cache_page+0x6c read_cache_page+0x14 f2fs_quota_read+0xa4 read_blk+0x54 find_tree_dqentry+0xe4 find_tree_dqentry+0xb8 find_tree_dqentry+0xb8 find_tree_dqentry+0xb8 qtree_read_dquot+0x68 v2_read_dquot+0x24 dquot_acquire+0x5c // mutex_lock(&dqopt->dqio_mutex); dqget+0x238 __dquot_initialize+0xd4 dquot_initialize+0x10 dquot_file_open+0x34 f2fs_file_open+0x6c do_dentry_open+0x1e4 vfs_open+0x6c path_openat+0xa20 do_filp_open+0x4c do_sys_open+0x178 Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-19buffer.c: call thaw_super during emergency thawMateusz Guzik
There are 2 distinct freezing mechanisms - one operates on block devices and another one directly on super blocks. Both end up with the same result, but thaw of only one of these does not thaw the other. In particular fsfreeze --freeze uses the ioctl variant going to the super block. Since prior to this patch emergency thaw was not doing a relevant thaw, filesystems frozen with this method remained unaffected. The patch is a hack which adds blind unfreezing. In order to keep the super block write-locked the whole time the code is shuffled around and the newly introduced __iterate_supers is employed. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-19vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalentsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-19vfs: make sure struct filename->iname is word-alignedRasmus Villemoes
I noticed that offsetof(struct filename, iname) is actually 28 on 64 bit platforms, so we always pass an unaligned pointer to strncpy_from_user. This is mostly a problem for those 64 bit platforms without HAVE_EFFICIENT_UNALIGNED_ACCESS, but even on x86_64, unaligned accesses carry a penalty. A user-space microbenchmark doing nothing but strncpy_from_user from the same (aligned) source string runs about 5% faster when the destination is aligned. That number increases to 20% when the string is long enough (~32 bytes) that we cross a cache line boundary - that's for example the case for about half the files a "git status" in a kernel tree ends up stat'ing. This won't make any real-life workloads 5%, or even 1%, faster, but path lookup is common enough that cutting even a few cycles should be worthwhile. So ensure we always pass an aligned destination pointer to strncpy_from_user. Instead of explicit padding, simply swap the refcnt and aname members, as suggested by Al Viro. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-19exec: Set file unwritable before LSM checkKees Cook
The LSM check should happen after the file has been confirmed to be unchanging. Without this, we could have a race between the Time of Check (the call to security_kernel_read_file() which could read the file and make access policy decisions) and the Time of Use (starting with kernel_read_file()'s reading of the file contents). In theory, file contents could change between the two. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: James Morris <james.morris@microsoft.com>
2018-03-17f2fs: introduce a new mount option test_dummy_encryptionSheng Yong
This patch introduces a new mount option `test_dummy_encryption' to allow fscrypt to create a fake fscrypt context. This is used by xfstests. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: introduce F2FS_FEATURE_LOST_FOUND featureSheng Yong
This patch introduces a new feature, F2FS_FEATURE_LOST_FOUND, which is set by mkfs. mkfs creates a directory named lost+found, which saves unreachable files. If fsck finds a file which has no parent, or its parent is removed by fsck, the file will be placed under lost+found directory by fsck. lost+found directory could not be encrypted. As a result, the root directory cannot be encrypted too. So if LOST_FOUND feature is enabled, let's avoid to encrypt root directory. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: release locks before return in f2fs_ioc_gc_range()Qiuyang Sun
Currently, we will leave the kernel with locks still held when the gc_range is invalid. This patch fixes the bug. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: align memory boundary for bitopsJaegeuk Kim
For example, in arm64, free_nid_bitmap should be aligned to word size in order to use bit operations. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: remove unneeded set_cold_node()Chao Yu
When setting COLD_BIT_SHIFT flag in node block, we only need to call set_cold_node() in new_node_page() and recover_inode_page() during node page initialization. So remove unneeded set_cold_node() in other places. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: add nowait aio supportHyunchul Lee
This patch adds nowait aio support[1]. Return EAGAIN if any of the following checks fail for direct I/O: - i_rwsem is not lockable - Blocks are not allocated at the write location And xfstests generic/471 is passed. [1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT" Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: wrap all options with f2fs_sb_info.mount_optChao Yu
This patch merges miscellaneous mount options into struct f2fs_mount_info, After this patch, once we add new mount option, we don't need to worry about recovery of it in remount_fs(), since we will recover the f2fs_sb_info.mount_opt including all options. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: Don't overwrite all types of node to keep node chainYunlei He
Currently, we enable node SSR by default, and mixed different types of node segment to do SSR more intensively. Although reuse warm node is not allowed, warm node chain will be destroyed by errors introduced by other types node chain. So we'd better forbid reusing all types of node to keep warm node chain. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: introduce mount option for fsync modeJunling Zheng
Commit "0a007b97aad6"(f2fs: recover directory operations by fsync) fixed xfstest generic/342 case, but it also increased the written data and caused the performance degradation. In most cases, there's no need to do so heavy fsync actually. So we introduce new mount option "fsync_mode={posix,strict}" to control the policy of fsync. "fsync_mode=posix" is set by default, and means that f2fs uses a light fsync, which follows POSIX semantics. And "fsync_mode=strict" means that it's a heavy fsync, which behaves in line with xfs, ext4 and btrfs, where generic/342 will pass, but the performance will regress. Signed-off-by: Junling Zheng <zhengjunling@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: fix to restore old mount option in ->remount_fsChao Yu
This patch fixes to restore old mount option once we encounter failure in ->remount_fs. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: wrap sb_rdonly with f2fs_readonlyChao Yu
Use f2fs_readonly to wrap sb_rdonly for cleanup, and spread it in all places. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17f2fs: avoid selinux denial on CAP_SYS_RESOURCEJaegeuk Kim
This fixes CAP_SYS_RESOURCE denial of selinux when using resgid, since it seems selinux reports it at the first place, but mostly we don't need to check this condition first. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-16Merge tag 'for-4.16-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "There's an important revert in this pull request that needs to go to stable as it causes a corruption on big endian machines. The other fix is for FIEMAP incorrectly reporting shared extents before a sync and one fix for a crash in raid56. So far we got only one report about the BE corruption, the stable kernels were out for like a week, so hopefully the scope of the damage is low" * tag 'for-4.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Revert "btrfs: use proper endianness accessors for super_copy" btrfs: add missing initialization in btrfs_check_shared btrfs: Fix NULL pointer exception in find_bio_stripe
2018-03-16Revert "btrfs: use proper endianness accessors for super_copy"David Sterba
This reverts commit 3c181c12c431fe33b669410d663beb9cceefcd1b. The offending patch was merged in 4.16-rc4 and was promptly applied to stable kernels 4.14.25 and 4.15.8. The patch causes a corruption in several superblock items on big-endian machines because of messed up endianity conversions. The damage is manually repairable. A filesystem cannot be mounted again after it has been unmounted once. We do a full revert and not a fixup so stable can pick that patch ASAP. Fixes: 3c181c12c431 ("btrfs: use proper endianness accessors for super_copy") Link: https://lkml.kernel.org/r/1521139304@msgid.manchmal.in-ulm.de CC: stable@vger.kernel.org # 4.14+ Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-16procfs: remove CONFIG_HARDWALL dependencyArnd Bergmann
Hardwall is a tile specific feature, and with the removal of the tile architecture, this has become dead code, so let's remove it. Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-03-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: - backport-friendly part of lock_parent() race fix - a fix for an assumption in the heurisic used by path_connected() that is not true on NFS - livelock fixes for d_alloc_parallel() * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Teach path_connected to handle nfs filesystems with multiple roots. fs: dcache: Use READ_ONCE when accessing i_dir_seq fs: dcache: Avoid livelock between d_alloc_parallel and __d_add lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
2018-03-15fs: Teach path_connected to handle nfs filesystems with multiple roots.Eric W. Biederman
On nfsv2 and nfsv3 the nfs server can export subsets of the same filesystem and report the same filesystem identifier, so that the nfs client can know they are the same filesystem. The subsets can be from disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no way to find the common root of all directory trees exported form the server with the same filesystem identifier. The practical result is that in struct super s_root for nfs s_root is not necessarily the root of the filesystem. The nfs mount code sets s_root to the root of the first subset of the nfs filesystem that the kernel mounts. This effects the dcache invalidation code in generic_shutdown_super currently called shrunk_dcache_for_umount and that code for years has gone through an additional list of dentries that might be dentry trees that need to be freed to accomodate nfs. When I wrote path_connected I did not realize nfs was so special, and it's hueristic for avoiding calling is_subdir can fail. The practical case where this fails is when there is a move of a directory from the subtree exposed by one nfs mount to the subtree exposed by another nfs mount. This move can happen either locally or remotely. With the remote case requiring that the move directory be cached before the move and that after the move someone walks the path to where the move directory now exists and in so doing causes the already cached directory to be moved in the dcache through the magic of d_splice_alias. If someone whose working directory is in the move directory or a subdirectory and now starts calling .. from the initial mount of nfs (where s_root == mnt_root), then path_connected as a heuristic will not bother with the is_subdir check. As s_root really is not the root of the nfs filesystem this heuristic is wrong, and the path may actually not be connected and path_connected can fail. The is_subdir function might be cheap enough that we can call it unconditionally. Verifying that will take some benchmarking and the result may not be the same on all kernels this fix needs to be backported to. So I am avoiding that for now. Filesystems with snapshots such as nilfs and btrfs do something similar. But as the directory tree of the snapshots are disjoint from one another and from the main directory tree rename won't move things between them and this problem will not occur. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-15xfs: minor cleanup for xfs_reflink_end_cowChristoph Hellwig
Use xfs_iext_prev_extent to skip to the previous extent instead of opencoding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-15xfs: minor cleanup for xfs_get_blocksChristoph Hellwig
Simplify the control flow a bit in preparation for O_ATOMIC-related changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-15xfs: remove xfs_zero_rangeChristoph Hellwig
This helper doesn't add any real value over just calling iomap_zero_range directly, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-15xfs: assert that xfs_reflink_allocate_cow is called with XFS_ILOCK_EXCLChristoph Hellwig
Now that we convert COW preallocations from unwritten to real on every call this function needs to be called with the ilock held exclusively. Fortunately we already do that, but update the assert to match. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-15xfs: don't use XFS_BMAPI_ENTRIRE in xfs_get_blocksChristoph Hellwig
There is no reason to get a mapping bigger than what we were asked for. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-15xfs: fix the check for COW extents in xfs_swap_extentsChristoph Hellwig
i_cnextents does not include delayed allocated extents, so switch to the inode fork size check that we already use in other places instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-15mtd: Unconditionally update ->fail_addr and ->addr in part_erase()Boris Brezillon
->fail_addr and ->addr can be updated no matter the result of parent->_erase(), we just need to remove the code doing the same thing in mtd_erase_callback() to avoid adjusting those fields twice. Note that this can be done because all MTD users have been converted to not pass an erase_info->callback() and are thus only taking the ->addr_fail and ->addr fields into account after part_erase() has returned. While we're at it, get rid of the erase_info->mtd field which was only needed to let mtd_erase_callback() get the partition device back. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Richard Weinberger <richard@nod.at>
2018-03-15mtd: Stop assuming mtd_erase() is asynchronousBoris Brezillon
None of the mtd->_erase() implementations work in an asynchronous manner, so let's simplify MTD users that call mtd_erase(). All they need to do is check the value returned by mtd_erase() and assume that != 0 means failure. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Richard Weinberger <richard@nod.at>
2018-03-15pstore: fix crypto dependenciesArnd Bergmann
The new crypto API use causes some problems with Kconfig dependencies, including this link error: fs/pstore/platform.o: In function `pstore_register': platform.c:(.text+0x248): undefined reference to `crypto_has_alg' platform.c:(.text+0x2a0): undefined reference to `crypto_alloc_base' fs/pstore/platform.o: In function `pstore_unregister': platform.c:(.text+0x498): undefined reference to `crypto_destroy_tfm' crypto/lz4hc.o: In function `lz4hc_sdecompress': lz4hc.c:(.text+0x1a): undefined reference to `LZ4_decompress_safe' crypto/lz4hc.o: In function `lz4hc_decompress_crypto': lz4hc.c:(.text+0x5a): undefined reference to `LZ4_decompress_safe' crypto/lz4hc.o: In function `lz4hc_scompress': lz4hc.c:(.text+0xaa): undefined reference to `LZ4_compress_HC' crypto/lz4hc.o: In function `lz4hc_mod_init': lz4hc.c:(.init.text+0xf): undefined reference to `crypto_register_alg' lz4hc.c:(.init.text+0x1f): undefined reference to `crypto_register_scomp' lz4hc.c:(.init.text+0x2f): undefined reference to `crypto_unregister_alg' The problem is that with CONFIG_CRYPTO=m, we must not 'select CRYPTO_LZ4' from a bool symbol, or call crypto API functions from a built-in module. This turns the sub-options into 'tristate' ones so the dependencies are honored, and makes the pstore itself select the crypto core if necessary. Fixes: cb3bee0369bc ("pstore: Use crypto compress API") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-03-15block, char_dev: Use correct format specifier for unsigned intsSrivatsa S. Bhat
register_blkdev() and __register_chrdev_region() treat the major number as an unsigned int. So print it the same way to avoid absurd error statements such as: "... major requested (-1) is greater than the maximum (511) ..." (and also fix off-by-one bugs in the error prints). While at it, also update the comment describing register_blkdev(). Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-15char_dev: Fix off-by-one bugs in find_dynamic_major()Srivatsa S. Bhat
CHRDEV_MAJOR_DYN_END and CHRDEV_MAJOR_DYN_EXT_END are valid major numbers. So fix the loop iteration to include them in the search for free major numbers. While at it, also remove a redundant if condition ("cd->major != i"), as it will never be true. Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-15gfs2: gfs2_iomap_end tracepoint: log block addressAndreas Gruenbacher
In the gfs2_iomap_end tracepoint, log the physical block address, just as in the gfs2_bmap tracepoint. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-14btrfs: add missing initialization in btrfs_check_sharedEdmund Nadolski
This patch addresses an issue that causes fiemap to falsely report a shared extent. The test case is as follows: xfs_io -f -d -c "pwrite -b 16k 0 64k" -c "fiemap -v" /media/scratch/file5 sync xfs_io -c "fiemap -v" /media/scratch/file5 which gives the resulting output: wrote 65536/65536 bytes at offset 0 64 KiB, 4 ops; 0.0000 sec (121.359 MiB/sec and 7766.9903 ops/sec) /media/scratch/file5: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 24576..24703 128 0x2001 /media/scratch/file5: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 24576..24703 128 0x1 This is because btrfs_check_shared calls find_parent_nodes repeatedly in a loop, passing a share_check struct to report the count of shared extent. But btrfs_check_shared does not re-initialize the count value to zero for subsequent calls from the loop, resulting in a false share count value. This is a regressive behavior from 4.13. With proper re-initialization the test result is as follows: wrote 65536/65536 bytes at offset 0 64 KiB, 4 ops; 0.0000 sec (110.035 MiB/sec and 7042.2535 ops/sec) /media/scratch/file5: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 24576..24703 128 0x1 /media/scratch/file5: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..127]: 24576..24703 128 0x1 which corrects the regression. Fixes: 3ec4d3238ab ("btrfs: allow backref search checks for shared extents") Signed-off-by: Edmund Nadolski <enadolski@suse.com> [ add text from cover letter to changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-14btrfs: Fix NULL pointer exception in find_bio_stripeDmitriy Gorokh
On detaching of a disk which is a part of a RAID6 filesystem, the following kernel OOPS may happen: [63122.680461] BTRFS error (device sdo): bdev /dev/sdo errs: wr 0, rd 0, flush 1, corrupt 0, gen 0 [63122.719584] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo [63122.719587] BTRFS error (device sdo): bdev /dev/sdo errs: wr 1, rd 0, flush 1, corrupt 0, gen 0 [63122.803516] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo [63122.803519] BTRFS error (device sdo): bdev /dev/sdo errs: wr 2, rd 0, flush 1, corrupt 0, gen 0 [63122.863902] BTRFS critical (device sdo): fatal error on device /dev/sdo [63122.935338] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 [63122.946554] IP: fail_bio_stripe+0x58/0xa0 [btrfs] [63122.958185] PGD 9ecda067 P4D 9ecda067 PUD b2b37067 PMD 0 [63122.971202] Oops: 0000 [#1] SMP [63123.006760] CPU: 0 PID: 3979 Comm: kworker/u8:9 Tainted: G W 4.14.2-16-scst34x+ #8 [63123.007091] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [63123.007402] Workqueue: btrfs-worker btrfs_worker_helper [btrfs] [63123.007595] task: ffff880036ea4040 task.stack: ffffc90006384000 [63123.007796] RIP: 0010:fail_bio_stripe+0x58/0xa0 [btrfs] [63123.007968] RSP: 0018:ffffc90006387ad8 EFLAGS: 00010287 [63123.008140] RAX: 0000000000000002 RBX: ffff88004beaa0b8 RCX: ffff8800b2bd5690 [63123.008359] RDX: 0000000000000000 RSI: ffff88007bb43500 RDI: ffff88004beaa000 [63123.008621] RBP: ffffc90006387ae8 R08: 0000000099100000 R09: ffff8800b2bd5600 [63123.008840] R10: 0000000000000004 R11: 0000000000010000 R12: ffff88007bb43500 [63123.009059] R13: 00000000fffffffb R14: ffff880036fc5180 R15: 0000000000000004 [63123.009278] FS: 0000000000000000(0000) GS:ffff8800b7000000(0000) knlGS:0000000000000000 [63123.009564] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [63123.009748] CR2: 0000000000000080 CR3: 00000000b0866000 CR4: 00000000000406f0 [63123.009969] Call Trace: [63123.010085] raid_write_end_io+0x7e/0x80 [btrfs] [63123.010251] bio_endio+0xa1/0x120 [63123.010378] generic_make_request+0x218/0x270 [63123.010921] submit_bio+0x66/0x130 [63123.011073] finish_rmw+0x3fc/0x5b0 [btrfs] [63123.011245] full_stripe_write+0x96/0xc0 [btrfs] [63123.011428] raid56_parity_write+0x117/0x170 [btrfs] [63123.011604] btrfs_map_bio+0x2ec/0x320 [btrfs] [63123.011759] ? ___cache_free+0x1c5/0x300 [63123.011909] __btrfs_submit_bio_done+0x26/0x50 [btrfs] [63123.012087] run_one_async_done+0x9c/0xc0 [btrfs] [63123.012257] normal_work_helper+0x19e/0x300 [btrfs] [63123.012429] btrfs_worker_helper+0x12/0x20 [btrfs] [63123.012656] process_one_work+0x14d/0x350 [63123.012888] worker_thread+0x4d/0x3a0 [63123.013026] ? _raw_spin_unlock_irqrestore+0x15/0x20 [63123.013192] kthread+0x109/0x140 [63123.013315] ? process_scheduled_works+0x40/0x40 [63123.013472] ? kthread_stop+0x110/0x110 [63123.013610] ret_from_fork+0x25/0x30 [63123.014469] RIP: fail_bio_stripe+0x58/0xa0 [btrfs] RSP: ffffc90006387ad8 [63123.014678] CR2: 0000000000000080 [63123.016590] ---[ end trace a295ea7259c17880 ]— This is reproducible in a cycle, where a series of writes is followed by SCSI device delete command. The test may take up to few minutes. Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index") [ no signed-off-by provided ] Author: Dmitriy Gorokh <Dmitriy.Gorokh@wdc.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-14fs/aio: Use RCU accessors for kioctx_table->table[]Tejun Heo
While converting ioctx index from a list to a table, db446a08c23d ("aio: convert the ioctx list to table lookup v3") missed tagging kioctx_table->table[] as an array of RCU pointers and using the appropriate RCU accessors. This introduces a small window in the lookup path where init and access may race. Mark kioctx_table->table[] with __rcu and use the approriate RCU accessors when using the field. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Fixes: db446a08c23d ("aio: convert the ioctx list to table lookup v3") Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org # v3.12+
2018-03-14fs/aio: Add explicit RCU grace period when freeing kioctxTejun Heo
While fixing refcounting, e34ecee2ae79 ("aio: Fix a trinity splat") incorrectly removed explicit RCU grace period before freeing kioctx. The intention seems to be depending on the internal RCU grace periods of percpu_ref; however, percpu_ref uses a different flavor of RCU, sched-RCU. This can lead to kioctx being freed while RCU read protected dereferences are still in progress. Fix it by updating free_ioctx() to go through call_rcu() explicitly. v2: Comment added to explain double bouncing. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Fixes: e34ecee2ae79 ("aio: Fix a trinity splat") Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org # v3.13+
2018-03-14xfs: refactor xfs_log_forceChristoph Hellwig
Streamline the conditionals so that it is more obvious which specific case form the top of the function comments is being handled. Use gotos only for early returns. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-14xfs: merge _xfs_log_force_lsn and xfs_log_force_lsnChristoph Hellwig
Switch to a single interface for flushing the log to a specific LSN, which gives consistent trace point coverage and a less confusing interface. The was only a single user of the previous xfs_log_force_lsn function, which now also passes a NULL log_flushed argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-14xfs: merge _xfs_log_force and xfs_log_forceChristoph Hellwig
Switch to a single interface for flushing the whole log, which gives consistent trace point coverage, and removes the unused log_flushed argument for the previous _xfs_log_force callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-14xfs: remove the unused log_flushed variable in xfs_extent_busy_flushChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-14xfs: remove an outdated comment for xfs_inode_item_committingChristoph Hellwig
The function now does something, and that something is central to our inode logging scheme. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-14xfs: remove misleading comment text on xfs_inode_item_unlockChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-14devpts: comment devpts_mntget()Christian Brauner
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>