summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-01-23gfs: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. There is no need to save the dentries for the debugfs files, so drop those variables to save a bit of space and make the code simpler. Cc: Bob Peterson <rpeterso@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: cluster-devel@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-01-22f2fs: fix to set sbi dirty correctlyChao Yu
In order to record direct IO count, we add two additional type in enum count_type: F2FS_DIO_{WRITE,READ}, but those IO won't dirty filesystem metadata, so we don't need to set filesystem dirty in inc_page_count(), fix it. Fixes: 02b16d0a34a1 ("f2fs: add to account direct IO") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22f2fs: fix to initialize variable to avoid UBSAN/smatch warningChao Yu
As Dan Carpenter as below: The patch df634f444ee9: "f2fs: use rb_*_cached friends" from Oct 4, 2018, leads to the following static checker warning: fs/f2fs/extent_cache.c:606 f2fs_update_extent_tree_range() error: uninitialized symbol 'leftmost'. And also Eric Biggers, and Kyungtae Kim reported, there is an UBSAN warning described as below: We report a bug in linux-4.20.2: "UBSAN: Undefined behaviour in fs/f2fs/extent_cache.c" kernel config: https://kt0755.github.io/etc/config_v4.20_stable repro: https://kt0755.github.io/etc/repro.4a3e7.c (f2fs is mounted on /mnt/f2fs/) This arose in f2fs_update_extent_tree_range (fs/f2fs/extent_cache.c:605). It seems that, for some reason, its last argument became "24" although that was supposed to be bool type. ========================================= UBSAN: Undefined behaviour in fs/f2fs/extent_cache.c:605:4 load of value 24 is not a valid value for type '_Bool' CPU: 0 PID: 6774 Comm: syz-executor5 Not tainted 4.20.2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xb1/0x118 lib/dump_stack.c:113 ubsan_epilogue+0x12/0x94 lib/ubsan.c:159 __ubsan_handle_load_invalid_value+0x17a/0x1be lib/ubsan.c:457 f2fs_update_extent_tree_range+0x1d4a/0x1d50 fs/f2fs/extent_cache.c:605 f2fs_update_extent_cache+0x2b6/0x350 fs/f2fs/extent_cache.c:804 f2fs_update_data_blkaddr+0x61/0x70 fs/f2fs/data.c:656 f2fs_outplace_write_data+0x1d6/0x4b0 fs/f2fs/segment.c:3140 f2fs_convert_inline_page+0x86d/0x2060 fs/f2fs/inline.c:163 f2fs_convert_inline_inode+0x6b5/0xad0 fs/f2fs/inline.c:208 f2fs_preallocate_blocks+0x78b/0xb00 fs/f2fs/data.c:982 f2fs_file_write_iter+0x31b/0xf40 fs/f2fs/file.c:3062 call_write_iter include/linux/fs.h:1857 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x538/0x6e0 fs/read_write.c:487 vfs_write+0x1b3/0x520 fs/read_write.c:549 ksys_write+0xde/0x1c0 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x7e/0xc0 fs/read_write.c:607 do_syscall_64+0xbe/0x4f0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4497b9 Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f1ea15edc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f1ea15ee6cc RCX: 00000000004497b9 RDX: 0000000000001000 RSI: 0000000020000140 RDI: 0000000000000013 RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000bb50 R14: 00000000006f4bf0 R15: 00007f1ea15ee700 ========================================= As I checked, this uninitialized variable won't cause extent cache corruption, but in order to avoid such kind of warning of both UBSAN and smatch, fix to initialize related variable. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Eric Biggers <ebiggers@google.com> Reported-by: Kyungtae Kim <kt0755@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22f2fs: UBSAN: set boolean value iostat_enable correctlySheng Yong
When setting /sys/fs/f2fs/<DEV>/iostat_enable with non-bool value, UBSAN reports the following warning. [ 7562.295484] ================================================================================ [ 7562.296531] UBSAN: Undefined behaviour in fs/f2fs/f2fs.h:2776:10 [ 7562.297651] load of value 64 is not a valid value for type '_Bool' [ 7562.298642] CPU: 1 PID: 7487 Comm: dd Not tainted 4.20.0-rc4+ #79 [ 7562.298653] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 7562.298662] Call Trace: [ 7562.298760] dump_stack+0x46/0x5b [ 7562.298811] ubsan_epilogue+0x9/0x40 [ 7562.298830] __ubsan_handle_load_invalid_value+0x72/0x90 [ 7562.298863] f2fs_file_write_iter+0x29f/0x3f0 [ 7562.298905] __vfs_write+0x115/0x160 [ 7562.298922] vfs_write+0xa7/0x190 [ 7562.298934] ksys_write+0x50/0xc0 [ 7562.298973] do_syscall_64+0x4a/0xe0 [ 7562.298992] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 7562.299001] RIP: 0033:0x7fa45ec19c00 [ 7562.299004] Code: 73 01 c3 48 8b 0d 88 92 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d dd eb 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce 8f 01 00 48 89 04 24 [ 7562.299044] RSP: 002b:00007ffca52b49e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 7562.299052] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa45ec19c00 [ 7562.299059] RDX: 0000000000000400 RSI: 000000000093f000 RDI: 0000000000000001 [ 7562.299065] RBP: 000000000093f000 R08: 0000000000000004 R09: 0000000000000000 [ 7562.299071] R10: 00007ffca52b47b0 R11: 0000000000000246 R12: 0000000000000400 [ 7562.299077] R13: 000000000093f000 R14: 000000000093f400 R15: 0000000000000000 [ 7562.299091] ================================================================================ So, if iostat_enable is enabled, set its value as true. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22f2fs: add brackets for macrosSheng Yong
Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22f2fs: check if file namelen exceeds max valueSheng Yong
Dentry bitmap is not enough to detect incorrect dentries. So this patch also checks the namelen value of a dentry. Signed-off-by: Gong Chen <gongchen4@huawei.com> Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22f2fs: fix to trigger fsck if dirent.name_len is zeroChao Yu
While traversing dirents in f2fs_fill_dentries(), if bitmap is valid, filename length should not be zero, otherwise, directory structure consistency could be corrupted, in this case, let's print related info and set SBI_NEED_FSCK to trigger fsck for repairing. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-22Merge tag 'v5.0-rc3' into next-generalJames Morris
Sync to Linux 5.0-rc3 to pull in the VFS changes which impacted a lot of the LSM code.
2019-01-22writeback: synchronize sync(2) against cgroup writeback membership switchesTejun Heo
sync_inodes_sb() can race against cgwb (cgroup writeback) membership switches and fail to writeback some inodes. For example, if an inode switches to another wb while sync_inodes_sb() is in progress, the new wb might not be visible to bdi_split_work_to_wbs() at all or the inode might jump from a wb which hasn't issued writebacks yet to one which already has. This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb switch path against sync_inodes_sb() so that sync_inodes_sb() is guaranteed to see all the target wbs and inodes can't jump wbs to escape syncing. v2: Fixed misplaced rwsem init. Spotted by Jiufei. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jiufei Xue <xuejiufei@gmail.com> Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-22direct-io: allow direct writes to empty inodesErnesto A. Fernández
On a DIO_SKIP_HOLES filesystem, the ->get_block() method is currently not allowed to create blocks for an empty inode. This confusion comes from trying to bit shift a negative number, so check the size of the inode first. The problem is most visible for hfsplus, because the fallback to buffered I/O doesn't happen and the write fails with EIO. This is in part the fault of the module, because it gives a wrong return value on ->get_block(); that will be fixed in a separate patch. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-22f2fs: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Cc: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-22ext2: Set superblock revision when enabling xattr featureJan Kara
When setting the first xattr, we automatically enable EXT2_FEATURE_COMPAT_EXT_ATTR. However we forget to call ext2_update_dynamic_rev() so in theory if the filesystem was created as ancient one without features support, this could be missed. The consequences are minor anyway - since the feature is compat one, only old e2fsck which does not understand xattrs could do something bad. Reported-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Jan Kara <jack@suse.cz>
2019-01-22ext2: Remove redundant check on s_inode_sizeLiu Xiang
The case of (EXT2_INODE_SIZE(sb) == 0) is included in (sbi->s_inode_size < EXT2_GOOD_OLD_INODE_SIZE). So there is no need to check again. Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn> Signed-off-by: Jan Kara <jack@suse.cz>
2019-01-22ext2: set proper return codeChengguang Xu
Set proper return code when failing from allocating memory in ext2_fill_super(). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jan Kara <jack@suse.cz>
2019-01-22debugfs: debugfs_use_start/finish do not exist anymoreSergey Senozhatsky
debugfs_use_file_start() and debugfs_use_file_finish() do not exist since commit c9afbec27089 ("debugfs: purge obsolete SRCU based removal protection"); tweak debugfs_create_file_unsafe() comment. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-21pstore/ram: Replace dummy_data heap memory with stack memoryYue Hu
In ramoops_register_dummy() dummy_data is allocated via kzalloc() then it will always occupy the heap space after register platform device via platform_device_register_data(), but it will not be used any more. So let's free it for system usage, replace it with stack memory is better due to small size. Signed-off-by: Yue Hu <huyue2@yulong.com> [kees: add required memset and adjust sizeof() argument] Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-21ext2: use common file type conversionPhillip Potter
Deduplicate the ext2 file type conversion implementation and remove EXT2_FT_* definitions - file systems that use the same file types as defined by POSIX do not need to define their own versions and can use the common helper functions decared in fs_types.h and implemented in fs_types.c Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: Jan Kara <jack@suse.cz>
2019-01-21fs: common implementation of file typePhillip Potter
Many file systems use a copy&paste implementation of dirent to on-disk file type conversions. Create a common implementation to be used by file systems with some useful conversion helpers to reduce open coded file type conversions in file system code. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: Jan Kara <jack@suse.cz>
2019-01-21ceph: quota: cleanup license messThomas Gleixner
Precise and non-ambiguous license information is important. The recently added quota.c file has a SPDX license identifier, which is nice, but at the same time it has a contradictionary license boiler plate text. SPDX-License-Identifier: GPL-2.0 versus * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License * as published by the Free Software Foundation; either version 2 * of the License, or (at your option) any later version. Oh well. As the other ceph related files are licensed under the GPL v2 only, it's assumed that the SPDX id is correct and the boiler plate was randomly copied into that patch. Remove the boiler plate as it is wrong and even if correct it is redundant. Fixes: fb18a57568c2 ("ceph: quota: add initial infrastructure to support cephfs quotas") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Luis Henriques <lhenriques@suse.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: "Yan, Zheng" <zyan@redhat.com> Cc: Sage Weil <sage@redhat.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: ceph-devel@vger.kernel.org Acked-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-01-21ceph: clear inode pointer when snap realm gets dropped by its inodeYan, Zheng
snap realm and corresponding inode have pointers to each other. The two pointer should get clear at the same time. Otherwise, snap realm's pointer may reference freed inode. Cc: stable@vger.kernel.org # 4.17+ Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-01-21Merge tag 'pstore-v5.0-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fixes from Kees Cook: - Fix console ramoops to show the previous boot logs (Sai Prakash Ranjan) - Avoid allocation and leak of platform data * tag 'pstore-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Avoid allocation and leak of platform data pstore/ram: Fix console ramoops to show the previous boot logs
2019-01-20pstore/ram: Avoid allocation and leak of platform dataKees Cook
Yue Hu noticed that when parsing device tree the allocated platform data was never freed. Since it's not used beyond the function scope, this switches to using a stack variable instead. Reported-by: Yue Hu <huyue2@yulong.com> Fixes: 35da60941e44 ("pstore/ram: add Device Tree bindings") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-21Merge tag 'for-5.0-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A handful of fixes (some of them in testing for a long time): - fix some test failures regarding cleanup after transaction abort - revert of a patch that could cause a deadlock - delayed iput fixes, that can help in ENOSPC situation when there's low space and a lot data to write" * tag 'for-5.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: wakeup cleaner thread when adding delayed iput btrfs: run delayed iputs before committing btrfs: wait on ordered extents on abort cleanup btrfs: handle delayed ref head accounting cleanup in abort Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
2019-01-20Merge tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "These are mostly fixes for SUNRPC bugs, with a single v4.2 copy_file_range() fix mixed in. Stable bugfixes: - Fix TCP receive code on archs with flush_dcache_page() Other bugfixes: - Fix error code in rpcrdma_buffer_create() - Fix a double free in rpcrdma_send_ctxs_create() - Fix kernel BUG at kernel/cred.c:825 - Fix unnecessary retry in nfs42_proc_copy_file_range() - Ensure rq_bytes_sent is reset before request transmission - Ensure we respect the RPCSEC_GSS sequence number limit - Address Kerberos performance/behavior regression" * tag 'nfs-for-5.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Address Kerberos performance/behavior regression SUNRPC: Ensure we respect the RPCSEC_GSS sequence number limit SUNRPC: Ensure rq_bytes_sent is reset before request transmission NFSv4.2 fix unnecessary retry in nfs4_copy_file_range sunrpc: kernel BUG at kernel/cred.c:825! SUNRPC: Fix TCP receive code on archs with flush_dcache_page() xprtrdma: Double free in rpcrdma_sendctxs_create() xprtrdma: Fix error code in rpcrdma_buffer_create()
2019-01-20Merge tag 'for-linus-20190118' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - block size setting fixes for loop/nbd (Jan Kara) - md bio_alloc_mddev() cleanup (Marcos) - Ensure we don't lose the REQ_INTEGRITY flag (Ming) - Two NVMe fixes by way of Christoph: - Fix NVMe IRQ calculation (Ming) - Uninitialized variable in nvmet-tcp (Sagi) - BFQ comment fix (Paolo) - License cleanup for recently added blk-mq-debugfs-zoned (Thomas) * tag 'for-linus-20190118' of git://git.kernel.dk/linux-block: block: Cleanup license notice nvme-pci: fix nvme_setup_irqs() nvmet-tcp: fix uninitialized variable access block: don't lose track of REQ_INTEGRITY flag blockdev: Fix livelocks on loop device nbd: Use set_blocksize() to set device blocksize md: Make bio_alloc_mddev use bio_alloc_bioset block, bfq: fix comments on __bfq_deactivate_entity
2019-01-18btrfs: wakeup cleaner thread when adding delayed iputJosef Bacik
The cleaner thread usually takes care of delayed iputs, with the exception of the btrfs_end_transaction_throttle path. Delaying iputs means we are potentially delaying the eviction of an inode and it's respective space. The cleaner thread only gets woken up every 30 seconds, or when we require space. If there are a lot of inodes that need to be deleted we could induce a serious amount of latency while we wait for these inodes to be evicted. So instead wakeup the cleaner if it's not already awake to process any new delayed iputs we add to the list. If we suddenly need space we will less likely be backed up behind a bunch of inodes that are waiting to be deleted, and we could possibly free space before we need to get into the flushing logic which will save us some latency. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18btrfs: run delayed iputs before committingJosef Bacik
Delayed iputs means we can have final iputs of deleted inodes in the queue, which could potentially generate a lot of pinned space that could be free'd. So before we decide to commit the transaction for ENOPSC reasons, run the delayed iputs so that any potential space is free'd up. If there is and we freed enough we can then commit the transaction and potentially be able to make our reservation. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18btrfs: wait on ordered extents on abort cleanupJosef Bacik
If we flip read-only before we initiate writeback on all dirty pages for ordered extents we've created then we'll have ordered extents left over on umount, which results in all sorts of bad things happening. Fix this by making sure we wait on ordered extents if we have to do the aborted transaction cleanup stuff. generic/475 can produce this warning: [ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs] [ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G W 5.0.0-rc1-default+ #394 [ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014 [ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs] [ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286 [ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000 [ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000 [ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000 [ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0 [ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000 [ 8531.202707] FS: 00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000 [ 8531.204851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0 [ 8531.207516] Call Trace: [ 8531.208175] btrfs_free_fs_roots+0xdb/0x170 [btrfs] [ 8531.210209] ? wait_for_completion+0x5b/0x190 [ 8531.211303] close_ctree+0x157/0x350 [btrfs] [ 8531.212412] generic_shutdown_super+0x64/0x100 [ 8531.213485] kill_anon_super+0x14/0x30 [ 8531.214430] btrfs_kill_super+0x12/0xa0 [btrfs] [ 8531.215539] deactivate_locked_super+0x29/0x60 [ 8531.216633] cleanup_mnt+0x3b/0x70 [ 8531.217497] task_work_run+0x98/0xc0 [ 8531.218397] exit_to_usermode_loop+0x83/0x90 [ 8531.219324] do_syscall_64+0x15b/0x180 [ 8531.220192] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 8531.221286] RIP: 0033:0x7fc68e5e4d07 [ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6 [ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07 [ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80 [ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80 [ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80 [ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000 Leaving a tree in the rb-tree: 3853 void btrfs_free_fs_root(struct btrfs_root *root) 3854 { 3855 iput(root->ino_cache_inode); 3856 WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree)); CC: stable@vger.kernel.org Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ add stacktrace ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18btrfs: handle delayed ref head accounting cleanup in abortJosef Bacik
We weren't doing any of the accounting cleanup when we aborted transactions. Fix this by making cleanup_ref_head_accounting global and calling it from the abort code, this fixes the issue where our accounting was all wrong after the fs aborts. The test generic/475 on a 2G VM can trigger the problems eg.: [ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs] [ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394 [ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014 [ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs] [ 8502.160623] RSP: 0018:ffffb1ab84b93de8 EFLAGS: 00010206 [ 8502.161906] RAX: 0000000001000000 RBX: ffff9f34b1756400 RCX: 0000000000000000 [ 8502.163448] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff9f34b1755400 [ 8502.164906] RBP: ffff9f34b7e8c000 R08: 0000000000000001 R09: 0000000000000000 [ 8502.166716] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f34b7e8c108 [ 8502.168498] R13: ffff9f34b7e8c158 R14: 0000000000000000 R15: dead000000000100 [ 8502.170296] FS: 00007fb1cf15ffc0(0000) GS:ffff9f34bd400000(0000) knlGS:0000000000000000 [ 8502.172439] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8502.173669] CR2: 00007fb1ced507b0 CR3: 000000002f7a6000 CR4: 00000000000006f0 [ 8502.175094] Call Trace: [ 8502.175759] close_ctree+0x17f/0x350 [btrfs] [ 8502.176721] generic_shutdown_super+0x64/0x100 [ 8502.177702] kill_anon_super+0x14/0x30 [ 8502.178607] btrfs_kill_super+0x12/0xa0 [btrfs] [ 8502.179602] deactivate_locked_super+0x29/0x60 [ 8502.180595] cleanup_mnt+0x3b/0x70 [ 8502.181406] task_work_run+0x98/0xc0 [ 8502.182255] exit_to_usermode_loop+0x83/0x90 [ 8502.183113] do_syscall_64+0x15b/0x180 [ 8502.183919] entry_SYSCALL_64_after_hwframe+0x49/0xbe Corresponding to release_global_block_rsv() { ... WARN_ON(fs_info->delayed_refs_rsv.reserved > 0); CC: stable@vger.kernel.org Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ add log dump ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"David Sterba
This reverts commit e73e81b6d0114d4a303205a952ab2e87c44bd279. This patch causes a few problems: - adds latency to btrfs_finish_ordered_io - as btrfs_finish_ordered_io is used for free space cache, generating more work from btrfs_btree_balance_dirty_nodelay could end up in the same workque, effectively deadlocking 12260 kworker/u96:16+btrfs-freespace-write D [<0>] balance_dirty_pages+0x6e6/0x7ad [<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90 [<0>] btrfs_finish_ordered_io+0x3da/0x770 [<0>] normal_work_helper+0x1c5/0x5a0 [<0>] process_one_work+0x1ee/0x5a0 [<0>] worker_thread+0x46/0x3d0 [<0>] kthread+0xf5/0x130 [<0>] ret_from_fork+0x24/0x30 [<0>] 0xffffffffffffffff Transaction commit will wait on the freespace cache: 838 btrfs-transacti D [<0>] btrfs_start_ordered_extent+0x154/0x1e0 [<0>] btrfs_wait_ordered_range+0xbd/0x110 [<0>] __btrfs_wait_cache_io+0x49/0x1a0 [<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0 [<0>] commit_cowonly_roots+0x215/0x2b0 [<0>] btrfs_commit_transaction+0x37e/0x910 [<0>] transaction_kthread+0x14d/0x180 [<0>] kthread+0xf5/0x130 [<0>] ret_from_fork+0x24/0x30 [<0>] 0xffffffffffffffff And then writepages ends up waiting on transaction commit: 9520 kworker/u96:13+flush-btrfs-1 D [<0>] wait_current_trans+0xac/0xe0 [<0>] start_transaction+0x21b/0x4b0 [<0>] cow_file_range_inline+0x10b/0x6b0 [<0>] cow_file_range.isra.69+0x329/0x4a0 [<0>] run_delalloc_range+0x105/0x3c0 [<0>] writepage_delalloc+0x119/0x180 [<0>] __extent_writepage+0x10c/0x390 [<0>] extent_write_cache_pages+0x26f/0x3d0 [<0>] extent_writepages+0x4f/0x80 [<0>] do_writepages+0x17/0x60 [<0>] __writeback_single_inode+0x59/0x690 [<0>] writeback_sb_inodes+0x291/0x4e0 [<0>] __writeback_inodes_wb+0x87/0xb0 [<0>] wb_writeback+0x3bb/0x500 [<0>] wb_workfn+0x40d/0x610 [<0>] process_one_work+0x1ee/0x5a0 [<0>] worker_thread+0x1e0/0x3d0 [<0>] kthread+0xf5/0x130 [<0>] ret_from_fork+0x24/0x30 [<0>] 0xffffffffffffffff Eventually, we have every process in the system waiting on balance_dirty_pages(), and nobody is able to make progress on page writeback. The original patch tried to fix an OOM condition, that happened on 4.4 but no success reproducing that on later kernels (4.19 and 4.20). This is more likely a problem in OOM itself. Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/ Reported-by: Chris Mason <clm@fb.com> CC: stable@vger.kernel.org # 4.18+ CC: ethanlien <ethanlien@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-01-18sysfs: fix blank line coding style warningStephen Martin
Fixed a coding style issue. Signed-off-by: Stephen Martin <lockwood@opperline.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-18Merge tag 'afs-fixes-20190117' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "Here's a set of fixes for AFS: - Use struct_size() for kzalloc() size calculation. - When calling YFS.CreateFile rather than AFS.CreateFile, it is possible to create a file with a file lock already held. The default value indicating no lock required is actually -1, not 0. - Fix an oops in inode/vnode validation if the target inode doesn't have a server interest assigned (ie. a server that will notify us of changes by third parties). - Fix refcounting of keys in file locking. - Fix a race in refcounting asynchronous operations in the event of an error during request transmission. The provision of a dedicated function to get an extra ref on a call is split into a separate commit" * tag 'afs-fixes-20190117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix race in async call refcounting afs: Provide a function to get a ref on a call afs: Fix key refcounting in file locking code afs: Don't set vnode->cb_s_break in afs_validate() afs: Set correct lock type for the yfs CreateFile afs: Use struct_size() in kzalloc()
2019-01-17pstore/ram: Fix console ramoops to show the previous boot logsSai Prakash Ranjan
commit b05c950698fe ("pstore/ram: Simplify ramoops_get_next_prz() arguments") changed update assignment in getting next persistent ram zone by adding a check for record type. But the check always returns true since the record type is assigned 0. And this breaks console ramoops by showing current console log instead of previous log on warm reset and hard reset (actually hard reset should not be showing any logs). Fix this by having persistent ram zone type check instead of record type check. Tested this on SDM845 MTP and dragonboard 410c. Reproducing this issue is simple as below: 1. Trigger hard reset and mount pstore. Will see console-ramoops record in the mounted location which is the current log. 2. Trigger warm reset and mount pstore. Will see the current console-ramoops record instead of previous record. Fixes: b05c950698fe ("pstore/ram: Simplify ramoops_get_next_prz() arguments") Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> [kees: dropped local variable usage] Signed-off-by: Kees Cook <keescook@chromium.org>
2019-01-17kill kernfs_pin_sb()Al Viro
unused now and impossible to use safely anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-17fix cgroup_do_mount() handling of failure exitsAl Viro
same story as with last May fixes in sysfs (7b745a4e4051 "unfuck sysfs_mount()"); new_sb is left uninitialized in case of early errors in kernfs_mount_ns() and papering over it by treating any error from kernfs_mount_ns() as equivalent to !new_ns ends up conflating the cases when objects had never been transferred to a superblock with ones when that has happened and resulting new superblock had been dropped. Easily fixed (same way as in sysfs case). Additionally, there's a superblock leak on kernfs_node_dentry() failure *and* a dentry leak inside kernfs_node_dentry() itself - the latter on probably impossible errors, but the former not impossible to trigger (as the matter of fact, injecting allocation failures at that point *does* trigger it). Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-01-17afs: Fix race in async call refcountingDavid Howells
There's a race between afs_make_call() and afs_wake_up_async_call() in the case that an error is returned from rxrpc_kernel_send_data() after it has queued the final packet. afs_make_call() will try and clean up the mess, but the call state may have been moved on thereby causing afs_process_async_call() to also try and to delete the call. Fix this by: (1) Getting an extra ref for an asynchronous call for the call itself to hold. This makes sure the call doesn't evaporate on us accidentally and will allow the call to be retained by the caller in a future patch. The ref is released on leaving afs_make_call() or afs_wait_for_call_to_complete(). (2) In the event of an error from rxrpc_kernel_send_data(): (a) Don't set the call state to AFS_CALL_COMPLETE until *after* the call has been aborted and ended. This prevents afs_deliver_to_call() from doing anything with any notifications it gets. (b) Explicitly end the call immediately to prevent further callbacks. (c) Cancel any queued async_work and wait for the work if it's executing. This allows us to be sure the race won't recur when we change the state. We put the work queue's ref on the call if we managed to cancel it. (d) Put the call's ref that we got in (1). This belongs to us as long as the call is in state AFS_CALL_CL_REQUESTING. Fixes: 341f741f04be ("afs: Refcount the afs_call struct") Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17afs: Provide a function to get a ref on a callDavid Howells
Provide a function to get a reference on an afs_call struct. Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17afs: Fix key refcounting in file locking codeDavid Howells
Fix the refcounting of the authentication keys in the file locking code. The vnode->lock_key member points to a key on which it expects to be holding a ref, but it isn't always given an extra ref, however. Fixes: 0fafdc9f888b ("afs: Fix file locking") Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17afs: Don't set vnode->cb_s_break in afs_validate()Marc Dionne
A cb_interest record is not necessarily attached to the vnode on entry to afs_validate(), which can cause an oops when we try to bring the vnode's cb_s_break up to date in the default case (ie. no current callback promise and the vnode has not been deleted). Fix this by simply removing the line, as vnode->cb_s_break will be set when needed by afs_register_server_cb_interest() when we next get a callback promise from RPC call. The oops looks something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 ... RIP: 0010:afs_validate+0x66/0x250 [kafs] ... Call Trace: afs_d_revalidate+0x8d/0x340 [kafs] ? __d_lookup+0x61/0x150 lookup_dcache+0x44/0x70 ? lookup_dcache+0x44/0x70 __lookup_hash+0x24/0xa0 do_unlinkat+0x11d/0x2c0 __x64_sys_unlink+0x23/0x30 do_syscall_64+0x4d/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: ae3b7361dc0e ("afs: Fix validation/callback interaction") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-16fuse: decrement NR_WRITEBACK_TEMP on the right pageMiklos Szeredi
NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not the page cache page. Fixes: 8b284dc47291 ("fuse: writepages: handle same page rewrites") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-01-16fuse: call pipe_buf_release() under pipe lockJann Horn
Some of the pipe_buf_release() handlers seem to assume that the pipe is locked - in particular, anon_pipe_buf_release() accesses pipe->tmp_page without taking any extra locks. From a glance through the callers of pipe_buf_release(), it looks like FUSE is the only one that calls pipe_buf_release() without having the pipe locked. This bug should only lead to a memory leak, nothing terrible. Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-01-16cuse: fix ioctlMiklos Szeredi
cuse_process_init_reply() doesn't initialize fc->max_pages and thus all cuse bases ioctls fail with ENOMEM. Reported-by: Andreas Steinmetz <ast@domdv.de> Fixes: 5da784cce430 ("fuse: add max_pages to init_out") Cc: <stable@vger.kernel.org> # v4.20 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-01-16fuse: handle zero sized retrieve correctlyMiklos Szeredi
Dereferencing req->page_descs[0] will Oops if req->max_pages is zero. Reported-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com Tested-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com Fixes: b2430d7567a3 ("fuse: add per-page descriptor <offset, length> to fuse_req") Cc: <stable@vger.kernel.org> # v3.9 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-01-15NFSv4.2 fix unnecessary retry in nfs4_copy_file_rangeOlga Kornievskaia
Currently nfs42_proc_copy_file_range() can not return EAGAIN. Fixes: e4648aa4f98a ("NFS recover from destination server reboot for copies") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-01-15blockdev: Fix livelocks on loop deviceJan Kara
bd_set_size() updates also block device's block size. This is somewhat unexpected from its name and at this point, only blkdev_open() uses this functionality. Furthermore, this can result in changing block size under a filesystem mounted on a loop device which leads to livelocks inside __getblk_gfp() like: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106 ... Call Trace: init_page_buffers+0x3e2/0x530 fs/buffer.c:904 grow_dev_page fs/buffer.c:947 [inline] grow_buffers fs/buffer.c:1009 [inline] __getblk_slow fs/buffer.c:1036 [inline] __getblk_gfp+0x906/0xb10 fs/buffer.c:1313 __bread_gfp+0x2d/0x310 fs/buffer.c:1347 sb_bread include/linux/buffer_head.h:307 [inline] fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75 fat_ent_read_block fs/fat/fatent.c:441 [inline] fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489 fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101 __fat_get_block fs/fat/inode.c:148 [inline] ... Trivial reproducer for the problem looks like: truncate -s 1G /tmp/image losetup /dev/loop0 /tmp/image mkfs.ext4 -b 1024 /dev/loop0 mount -t ext4 /dev/loop0 /mnt losetup -c /dev/loop0 l /mnt Fix the problem by moving initialization of a block device block size into a separate function and call it when needed. Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with debugging the problem. Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-14Merge tag 'for-5.0-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - two regression fixes in clone/dedupe ioctls, the generic check callback needs to lock extents properly and wait for io to avoid problems with writeback and relocation - fix deadlock when using free space tree due to block group creation - a recently added check refuses a valid fileystem with seeding device, make that work again with a quickfix, proper solution needs more intrusive changes * tag 'for-5.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: Use real device structure to verify dev extent Btrfs: fix deadlock when using free space tree due to block group creation Btrfs: fix race between reflink/dedupe and relocation Btrfs: fix race between cloning range ending at eof and writeback
2019-01-14Merge tag 'driver-core-5.0-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here is one small sysfs change, and a documentation update for 5.0-rc2 The sysfs change moves from using BUG_ON to WARN_ON, as discussed in an email thread on lkml while trying to track down another driver bug. sysfs should not be crashing and preventing people from seeing where they went wrong. Now it properly recovers and warns the developer. The documentation update removes the use of BUS_ATTR() as the kernel is moving away from this to use the specific BUS_ATTR_RW() and friends instead. There are pending patches in all of the different subsystems to remove the last users of this macro, but for now, don't advertise it should be used anymore to keep new ones from being introduced. Both have been in linux-next with no reported issues" * tag 'driver-core-5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Documentation: driver core: remove use of BUS_ATTR sysfs: convert BUG_ON to WARN_ON
2019-01-14Merge tag '5.0-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "A set of cifs/smb3 fixes, 4 for stable, most from Pavel. His patches fix an important set of crediting (flow control) problems, and also two problems in cifs_writepages, ddressing some large i/o and also compounding issues" * tag '5.0-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number CIFS: Fix error paths in writeback code CIFS: Move credit processing to mid callbacks for SMB3 CIFS: Fix credits calculation for cancelled requests cifs: Fix potential OOB access of lock element array cifs: Limit memory used by lock request calls to a page cifs: move large array from stack to heap CIFS: Do not hide EINTR after sending network packets CIFS: Fix credit computation for compounded requests CIFS: Do not set credits to 1 if the server didn't grant anything CIFS: Fix adjustment of credits for MTU requests cifs: Fix a tiny potential memory leak cifs: Fix a debug message
2019-01-11Merge tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "A patch to allow setting abort_on_full and a fix for an old "rbd unmap" edge case, marked for stable" * tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-client: rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set ceph: use vmf_error() in ceph_filemap_fault() libceph: allow setting abort_on_full for rbd
2019-01-11cifs: update internal module version numberSteve French
To 2.16 Signed-off-by: Steve French <stfrench@microsoft.com>