summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-09-22btrfs: fix put of uninitialized kobject after seed device deleteAnand Jain
The following test case leads to NULL kobject free error: mount seed /mnt add sprout to /mnt umount /mnt mount sprout to /mnt delete seed kobject: '(null)' (00000000dd2b87e4): is not initialized, yet kobject_put() is being called. WARNING: CPU: 1 PID: 15784 at lib/kobject.c:736 kobject_put+0x80/0x350 RIP: 0010:kobject_put+0x80/0x350 :: Call Trace: btrfs_sysfs_remove_devices_dir+0x6e/0x160 [btrfs] btrfs_rm_device.cold+0xa8/0x298 [btrfs] btrfs_ioctl+0x206c/0x22a0 [btrfs] ksys_ioctl+0xe2/0x140 __x64_sys_ioctl+0x1e/0x29 do_syscall_64+0x96/0x150 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f4047c6288b :: This is because, at the end of the seed device-delete, we try to remove the seed's devid sysfs entry. But for the seed devices under the sprout fs, we don't initialize the devid kobject yet. So add a kobject state check, which takes care of the bug. Fixes: 668e48af7a94 ("btrfs: sysfs, add devid/dev_state kobject and device attributes") CC: stable@vger.kernel.org # 5.6+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-22fscrypt: use sha256() instead of open codingEric Biggers
Now that there's a library function that calculates the SHA-256 digest of a buffer in one step, use it instead of sha256_init() + sha256_update() + sha256_final(). Link: https://lore.kernel.org/r/20200917045341.324996-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22fscrypt: make fscrypt_set_test_dummy_encryption() take a 'const char *'Eric Biggers
fscrypt_set_test_dummy_encryption() requires that the optional argument to the test_dummy_encryption mount option be specified as a substring_t. That doesn't work well with filesystems that use the new mount API, since the new way of parsing mount options doesn't use substring_t. Make it take the argument as a 'const char *' instead. Instead of moving the match_strdup() into the callers in ext4 and f2fs, make them just use arg->from directly. Since the pattern is "test_dummy_encryption=%s", the argument will be null-terminated. Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22fscrypt: handle test_dummy_encryption in more logical wayEric Biggers
The behavior of the test_dummy_encryption mount option is that when a new file (or directory or symlink) is created in an unencrypted directory, it's automatically encrypted using a dummy encryption policy. That's it; in particular, the encryption (or lack thereof) of existing files (or directories or symlinks) doesn't change. Unfortunately the implementation of test_dummy_encryption is a bit weird and confusing. When test_dummy_encryption is enabled and a file is being created in an unencrypted directory, we set up an encryption key (->i_crypt_info) for the directory. This isn't actually used to do any encryption, however, since the directory is still unencrypted! Instead, ->i_crypt_info is only used for inheriting the encryption policy. One consequence of this is that the filesystem ends up providing a "dummy context" (policy + nonce) instead of a "dummy policy". In commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I mistakenly thought this was required. However, actually the nonce only ends up being used to derive a key that is never used. Another consequence of this implementation is that it allows for 'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge case that can be forgotten about. For example, currently FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the dummy encryption policy when the filesystem is mounted with test_dummy_encryption. That seems like the wrong thing to do, since again, the directory itself is not actually encrypted. Therefore, switch to a more logical and maintainable implementation where the dummy encryption policy inheritance is done without setting up keys for unencrypted directories. This involves: - Adding a function fscrypt_policy_to_inherit() which returns the encryption policy to inherit from a directory. This can be a real policy, a dummy policy, or no policy. - Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc. with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc. - Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead of an inode. Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22fscrypt: move fscrypt_prepare_symlink() out-of-lineEric Biggers
In preparation for moving the logic for "get the encryption policy inherited by new files in this directory" to a single place, make fscrypt_prepare_symlink() a regular function rather than an inline function that wraps __fscrypt_prepare_symlink(). This way, the new function fscrypt_policy_to_inherit() won't need to be exported to filesystems. Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22fscrypt: make "#define fscrypt_policy" user-onlyEric Biggers
The fscrypt UAPI header defines fscrypt_policy to fscrypt_policy_v1, for source compatibility with old userspace programs. Internally, the kernel doesn't want that compatibility definition. Instead, fscrypt_private.h #undefs it and re-defines it to a union. That works for now. However, in order to add fscrypt_operations::get_dummy_policy(), we'll need to forward declare 'union fscrypt_policy' in include/linux/fscrypt.h. That would cause build errors because "fscrypt_policy" is used in ioctl numbers. To avoid this, modify the UAPI header to make the fscrypt_policy compatibility definition conditional on !__KERNEL__, and make the ioctls use fscrypt_policy_v1 instead of fscrypt_policy. Note that this doesn't change the actual ioctl numbers. Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-11-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22fscrypt: stop pretending that key setup is nofs-safeEric Biggers
fscrypt_get_encryption_info() has never actually been safe to call in a context that needs GFP_NOFS, since it calls crypto_alloc_skcipher(). crypto_alloc_skcipher() isn't GFP_NOFS-safe, even if called under memalloc_nofs_save(). This is because it may load kernel modules, and also because it internally takes crypto_alg_sem. Other tasks can do GFP_KERNEL allocations while holding crypto_alg_sem for write. The use of fscrypt_init_mutex isn't GFP_NOFS-safe either. So, stop pretending that fscrypt_get_encryption_info() is nofs-safe. I.e., when it allocates memory, just use GFP_KERNEL instead of GFP_NOFS. Note, another reason to do this is that GFP_NOFS is deprecated in favor of using memalloc_nofs_save() in the proper places. Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22fscrypt: require that fscrypt_encrypt_symlink() already has keyEric Biggers
Now that all filesystems have been converted to use fscrypt_prepare_new_inode(), the encryption key for new symlink inodes is now already set up whenever we try to encrypt the symlink target. Enforce this rather than try to set up the key again when it may be too late to do so safely. Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22fscrypt: remove fscrypt_inherit_context()Eric Biggers
Now that all filesystems have been converted to use fscrypt_prepare_new_inode() and fscrypt_set_context(), fscrypt_inherit_context() is no longer used. Remove it. Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22fscrypt: adjust logging for in-creation inodesEric Biggers
Now that a fscrypt_info may be set up for inodes that are currently being created and haven't yet had an inode number assigned, avoid logging confusing messages about "inode 0". Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22ubifs: use fscrypt_prepare_new_inode() and fscrypt_set_context()Eric Biggers
Convert ubifs to use the new functions fscrypt_prepare_new_inode() and fscrypt_set_context(). Unlike ext4 and f2fs, this doesn't appear to fix any deadlock bug. But it does shorten the code slightly and get all filesystems using the same helper functions, so that fscrypt_inherit_context() can be removed. It also fixes an incorrect error code where ubifs returned EPERM instead of the expected ENOKEY. Link: https://lore.kernel.org/r/20200917041136.178600-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22f2fs: use fscrypt_prepare_new_inode() and fscrypt_set_context()Eric Biggers
Convert f2fs to use the new functions fscrypt_prepare_new_inode() and fscrypt_set_context(). This avoids calling fscrypt_get_encryption_info() from under f2fs_lock_op(), which can deadlock because fscrypt_get_encryption_info() isn't GFP_NOFS-safe. For more details about this problem, see the earlier patch "fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()". This also fixes a f2fs-specific deadlock when the filesystem is mounted with '-o test_dummy_encryption' and a file is created in an unencrypted directory other than the root directory: INFO: task touch:207 blocked for more than 30 seconds. Not tainted 5.9.0-rc4-00099-g729e3d0919844 #2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:touch state:D stack: 0 pid: 207 ppid: 167 flags:0x00000000 Call Trace: [...] lock_page include/linux/pagemap.h:548 [inline] pagecache_get_page+0x25e/0x310 mm/filemap.c:1682 find_or_create_page include/linux/pagemap.h:348 [inline] grab_cache_page include/linux/pagemap.h:424 [inline] f2fs_grab_cache_page fs/f2fs/f2fs.h:2395 [inline] f2fs_grab_cache_page fs/f2fs/f2fs.h:2373 [inline] __get_node_page.part.0+0x39/0x2d0 fs/f2fs/node.c:1350 __get_node_page fs/f2fs/node.c:35 [inline] f2fs_get_node_page+0x2e/0x60 fs/f2fs/node.c:1399 read_inline_xattr+0x88/0x140 fs/f2fs/xattr.c:288 lookup_all_xattrs+0x1f9/0x2c0 fs/f2fs/xattr.c:344 f2fs_getxattr+0x9b/0x160 fs/f2fs/xattr.c:532 f2fs_get_context+0x1e/0x20 fs/f2fs/super.c:2460 fscrypt_get_encryption_info+0x9b/0x450 fs/crypto/keysetup.c:472 fscrypt_inherit_context+0x2f/0xb0 fs/crypto/policy.c:640 f2fs_init_inode_metadata+0xab/0x340 fs/f2fs/dir.c:540 f2fs_add_inline_entry+0x145/0x390 fs/f2fs/inline.c:621 f2fs_add_dentry+0x31/0x80 fs/f2fs/dir.c:757 f2fs_do_add_link+0xcd/0x130 fs/f2fs/dir.c:798 f2fs_add_link fs/f2fs/f2fs.h:3234 [inline] f2fs_create+0x104/0x290 fs/f2fs/namei.c:344 lookup_open.isra.0+0x2de/0x500 fs/namei.c:3103 open_last_lookups+0xa9/0x340 fs/namei.c:3177 path_openat+0x8f/0x1b0 fs/namei.c:3365 do_filp_open+0x87/0x130 fs/namei.c:3395 do_sys_openat2+0x96/0x150 fs/open.c:1168 [...] That happened because f2fs_add_inline_entry() locks the directory inode's page in order to add the dentry, then f2fs_get_context() tries to lock it recursively in order to read the encryption xattr. This problem is specific to "test_dummy_encryption" because normally the directory's fscrypt_info would be set up prior to f2fs_add_inline_entry() in order to encrypt the new filename. Regardless, the new design fixes this test_dummy_encryption deadlock as well as potential deadlocks with fs reclaim, by setting up any needed fscrypt_info structs prior to taking so many locks. The test_dummy_encryption deadlock was reported by Daniel Rosenberg. Reported-by: Daniel Rosenberg <drosen@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22ext4: use fscrypt_prepare_new_inode() and fscrypt_set_context()Eric Biggers
Convert ext4 to use the new functions fscrypt_prepare_new_inode() and fscrypt_set_context(). This avoids calling fscrypt_get_encryption_info() from within a transaction, which can deadlock because fscrypt_get_encryption_info() isn't GFP_NOFS-safe. For more details about this problem, see the earlier patch "fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()". Link: https://lore.kernel.org/r/20200917041136.178600-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22ext4: factor out ext4_xattr_credits_for_new_inode()Eric Biggers
To compute a new inode's xattr credits, we need to know whether the inode will be encrypted or not. When we switch to use the new helper function fscrypt_prepare_new_inode(), we won't find out whether the inode will be encrypted until slightly later than is currently the case. That will require moving the code block that computes the xattr credits. To make this easier and reduce the length of __ext4_new_inode(), move this code block into a new function ext4_xattr_credits_for_new_inode(). Link: https://lore.kernel.org/r/20200917041136.178600-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-22fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()Eric Biggers
fscrypt_get_encryption_info() is intended to be GFP_NOFS-safe. But actually it isn't, since it uses functions like crypto_alloc_skcipher() which aren't GFP_NOFS-safe, even when called under memalloc_nofs_save(). Therefore it can deadlock when called from a context that needs GFP_NOFS, e.g. during an ext4 transaction or between f2fs_lock_op() and f2fs_unlock_op(). This happens when creating a new encrypted file. We can't fix this by just not setting up the key for new inodes right away, since new symlinks need their key to encrypt the symlink target. So we need to set up the new inode's key before starting the transaction. But just calling fscrypt_get_encryption_info() earlier doesn't work, since it assumes the encryption context is already set, and the encryption context can't be set until the transaction. The recently proposed fscrypt support for the ceph filesystem (https://lkml.kernel.org/linux-fscrypt/20200821182813.52570-1-jlayton@kernel.org/T/#u) will have this same ordering problem too, since ceph will need to encrypt new symlinks before setting their encryption context. Finally, f2fs can deadlock when the filesystem is mounted with '-o test_dummy_encryption' and a new file is created in an existing unencrypted directory. Similarly, this is caused by holding too many locks when calling fscrypt_get_encryption_info(). To solve all these problems, add new helper functions: - fscrypt_prepare_new_inode() sets up a new inode's encryption key (fscrypt_info), using the parent directory's encryption policy and a new random nonce. It neither reads nor writes the encryption context. - fscrypt_set_context() persists the encryption context of a new inode, using the information from the fscrypt_info already in memory. This replaces fscrypt_inherit_context(). Temporarily keep fscrypt_inherit_context() around until all filesystems have been converted to use fscrypt_set_context(). Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-09-21io_uring: fix openat/openat2 unified prep handlingJens Axboe
A previous commit unified how we handle prep for these two functions, but this means that we check the allowed context (SQPOLL, specifically) later than we should. Move the ring type checking into the two parent functions, instead of doing it after we've done some setup work. Fixes: ec65fea5a8d7 ("io_uring: deduplicate io_openat{,2}_prep()") Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21io_uring: mark statx/files_update/epoll_ctl as non-SQPOLLJens Axboe
These will naturally fail when attempted through SQPOLL, but either with -EFAULT or -EBADF. Make it explicit that these are not workable through SQPOLL and return -EINVAL, just like other ops that need to use ->files. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21io_uring: don't use retry based buffered reads for non-async bdevJens Axboe
Some block devices, like dm, bubble back -EAGAIN through the completion handler. We check for this in io_read(), but don't honor it for when we have copied the iov. Return -EAGAIN for this case before retrying, to force punt to io-wq. Fixes: bcf5a06304d6 ("io_uring: support true async buffered reads, if file provides it") Reported-by: Zorro Lang <zlang@redhat.com> Tested-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21io_uring: don't re-setup vecs/iter in io_resumit_prep() is already thereJens Axboe
If we already have mapped the necessary data for retry, then don't set it up again. It's a pointless operation, and we leak the iovec if it's a large (non-stack) vec. Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21btrfs: fix overflow when copying corrupt csums for a messageJohannes Thumshirn
Syzkaller reported a buffer overflow in btree_readpage_end_io_hook() when loop mounting a crafted image: detected buffer overflow in memcpy ------------[ cut here ]------------ kernel BUG at lib/string.c:1129! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 26 Comm: kworker/u4:2 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: btrfs-endio-meta btrfs_work_helper RIP: 0010:fortify_panic+0xf/0x20 lib/string.c:1129 RSP: 0018:ffffc90000e27980 EFLAGS: 00010286 RAX: 0000000000000022 RBX: ffff8880a80dca64 RCX: 0000000000000000 RDX: ffff8880a90860c0 RSI: ffffffff815dba07 RDI: fffff520001c4f22 RBP: ffff8880a80dca00 R08: 0000000000000022 R09: ffff8880ae7318e7 R10: 0000000000000000 R11: 0000000000077578 R12: 00000000ffffff6e R13: 0000000000000008 R14: ffffc90000e27a40 R15: 1ffff920001c4f3c FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000557335f440d0 CR3: 000000009647d000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: memcpy include/linux/string.h:405 [inline] btree_readpage_end_io_hook.cold+0x206/0x221 fs/btrfs/disk-io.c:642 end_bio_extent_readpage+0x4de/0x10c0 fs/btrfs/extent_io.c:2854 bio_endio+0x3cf/0x7f0 block/bio.c:1449 end_workqueue_fn+0x114/0x170 fs/btrfs/disk-io.c:1695 btrfs_work_helper+0x221/0xe20 fs/btrfs/async-thread.c:318 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Modules linked in: ---[ end trace b68924293169feef ]--- RIP: 0010:fortify_panic+0xf/0x20 lib/string.c:1129 RSP: 0018:ffffc90000e27980 EFLAGS: 00010286 RAX: 0000000000000022 RBX: ffff8880a80dca64 RCX: 0000000000000000 RDX: ffff8880a90860c0 RSI: ffffffff815dba07 RDI: fffff520001c4f22 RBP: ffff8880a80dca00 R08: 0000000000000022 R09: ffff8880ae7318e7 R10: 0000000000000000 R11: 0000000000077578 R12: 00000000ffffff6e R13: 0000000000000008 R14: ffffc90000e27a40 R15: 1ffff920001c4f3c FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f95b7c4d008 CR3: 000000009647d000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 The overflow happens, because in btree_readpage_end_io_hook() we assume that we have found a 4 byte checksum instead of the real possible 32 bytes we have for the checksums. With the fix applied: [ 35.726623] BTRFS: device fsid 815caf9a-dc43-4d2a-ac54-764b8333d765 devid 1 transid 5 /dev/loop0 scanned by syz-repro (215) [ 35.738994] BTRFS info (device loop0): disk space caching is enabled [ 35.738998] BTRFS info (device loop0): has skinny extents [ 35.743337] BTRFS warning (device loop0): loop0 checksum verify failed on 1052672 wanted 0xf9c035fc8d239a54 found 0x67a25c14b7eabcf9 level 0 [ 35.743420] BTRFS error (device loop0): failed to read chunk root [ 35.745899] BTRFS error (device loop0): open_ctree failed Reported-by: syzbot+e864a35d361e1d4e29a5@syzkaller.appspotmail.com Fixes: d5178578bcd4 ("btrfs: directly call into crypto framework for checksumming") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-19fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match ↵Tobias Klauser
prototype Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") changed ctl_table.proc_handler to take a kernel pointer. Adjust the definition of dirtytime_interval_handler to match its prototype in linux/writeback.h which fixes the following sparse error/warning: fs/fs-writeback.c:2189:50: warning: incorrect type in argument 3 (different address spaces) fs/fs-writeback.c:2189:50: expected void * fs/fs-writeback.c:2189:50: got void [noderef] __user *buffer fs/fs-writeback.c:2184:5: error: symbol 'dirtytime_interval_handler' redeclared with different type (incompatible argument 3 (different address spaces)): fs/fs-writeback.c:2184:5: int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... ) fs/fs-writeback.c: note: in included file: ./include/linux/writeback.h:374:5: note: previously declared as: ./include/linux/writeback.h:374:5: int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... ) Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200907093140.13434-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19erofs: add REQ_RAHEAD flag to readahead requestsGao Xiang
Let's add REQ_RAHEAD flag so it'd be easier to identify readahead I/O requests in blktrace. Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20200919072730.24989-3-hsiangkao@redhat.com Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2020-09-19erofs: fold in should_decompress_synchronously()Gao Xiang
should_decompress_synchronously() has one single condition for now, so fold it instead. Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20200919072730.24989-2-hsiangkao@redhat.com Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2020-09-19erofs: avoid unnecessary variable `err'Gao Xiang
variable `err' in z_erofs_submit_queue() isn't useful here, remove it instead. Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20200919072730.24989-1-hsiangkao@redhat.com Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2020-09-18erofs: remove unneeded parameterChao Yu
After commit 0615090c5044 ("erofs: convert compressed files from readpages to readahead"), add_to_page_cache_lru() was moved to mm code, so that in below call path, no page will be cached into @pagepool list or grabbed from @pagepool list: - z_erofs_readpage - z_erofs_do_read_page - preload_compressed_pages - erofs_allocpage Let's get rid of this unneeded @pagepool parameter. Signed-off-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20200917011821.22767-1-yuchao0@huawei.com Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2020-09-18erofs: avoid duplicated permission check for "trusted." xattrsGao Xiang
Don't recheck it since xattr_permission() already checks CAP_SYS_ADMIN capability. Just follow 5d3ce4f70172 ("f2fs: avoid duplicated permission check for "trusted." xattrs") Reported-by: Hongyu Jin <hongyu.jin@unisoc.com> [ Gao Xiang: since it could cause some complex Android overlay permission issue as well on android-5.4+, it'd be better to backport to 5.4+ rather than pure cleanup on mainline. ] Cc: <stable@vger.kernel.org> # 5.4+ Link: https://lore.kernel.org/r/20200811070020.6339-1-hsiangkao@redhat.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2020-09-18pNFS/flexfiles: Be consistent about mirror index typesTrond Myklebust
A mirror index is always of type u32. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-18pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on readTrond Myklebust
While it is true that reading from an unmirrored source always uses index 0, that is no longer true for mirrored sources when we fail over. Fixes: 563c53e73b8b ("NFS: Fix flexfiles read failover") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-17fuse: fix the ->direct_IO() treatment of iov_iterAl Viro
the callers rely upon having any iov_iter_truncate() done inside ->direct_IO() countered by iov_iter_reexpand(). Reported-by: Qian Cai <cai@redhat.com> Tested-by: Qian Cai <cai@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-17quota: simplify the quotactl compat handlingChristoph Hellwig
Fold the misaligned u64 workarounds into the main quotactl flow instead of implementing a separate compat syscall handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-16NFSv4.2: fix client's attribute cache management for copy_file_rangeOlga Kornievskaia
After client is done with the COPY operation, it needs to invalidate its pagecache (as it did no reading or writing of the data locally) and it needs to invalidate it's attributes just like it would have for a read on the source file and write on the destination file. Once the linux server started giving out read delegations to read+write opens, the destination file of the copy_file range started having delegations and not doing syncup on close of the file leading to xfstest failures for generic/430,431,432,433,565. v2: changing cache_validity needs to be protected by the i_lock. Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Fixes: 2e72448b07dc ("NFS: Add COPY nfs operation") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-16nfs: Fix security label length not being resetJeffrey Mitchell
nfs_readdir_page_filler() iterates over entries in a directory, reusing the same security label buffer, but does not reset the buffer's length. This causes decode_attr_security_label() to return -ERANGE if an entry's security label is longer than the previous one's. This error, in nfs4_decode_dirent(), only gets passed up as -EAGAIN, which causes another failed attempt to copy into the buffer. The second error is ignored and the remaining entries do not show up in ls, specifically the getdents64() syscall. Reproduce by creating multiple files in NFS and giving one of the later files a longer security label. ls will not see that file nor any that are added afterwards, though they will exist on the backend. In nfs_readdir_page_filler(), reset security label buffer length before every reuse Signed-off-by: Jeffrey Mitchell <jeffrey.mitchell@starlab.io> Fixes: b4487b935452 ("nfs: Fix getxattr kernel panic and memory overflow") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-14Merge tag 'for-5.9-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One of the recent lockdep fixes introduced a bug that breaks the search ioctl, which is used by some applications (bees, compsize). The patch made it to stable trees so we need this fixup to make it work again" * tag 'for-5.9-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix wrong address when faulting in pages in the search ioctl
2020-09-14io_uring: don't run task work on an exiting taskJens Axboe
This isn't safe, and isn't needed either. We are guaranteed that any work we queue is on a live task (and will be run), or it goes to our backup io-wq threads if the task is exiting. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-14io_uring: drop 'ctx' ref on task work cancelationJens Axboe
If task_work ends up being marked for cancelation, we go through a cancelation helper instead of the queue path. In converting task_work to always hold a ctx reference, this path was missed. Make sure that io_req_task_cancel() puts the reference that is being held against the ctx. Fixes: 6d816e088c35 ("io_uring: hold 'ctx' reference around task_work queue + execute") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-14btrfs: fix wrong address when faulting in pages in the search ioctlFilipe Manana
When faulting in the pages for the user supplied buffer for the search ioctl, we are passing only the base address of the buffer to the function fault_in_pages_writeable(). This means that after the first iteration of the while loop that searches for leaves, when we have a non-zero offset, stored in 'sk_offset', we try to fault in a wrong page range. So fix this by adding the offset in 'sk_offset' to the base address of the user supplied buffer when calling fault_in_pages_writeable(). Several users have reported that the applications compsize and bees have started to operate incorrectly since commit a48b73eca4ceb9 ("btrfs: fix potential deadlock in the search ioctl") was added to stable trees, and these applications make heavy use of the search ioctls. This fixes their issues. Link: https://lore.kernel.org/linux-btrfs/632b888d-a3c3-b085-cdf5-f9bb61017d92@lechevalier.se/ Link: https://github.com/kilobyte/compsize/issues/34 Fixes: a48b73eca4ceb9 ("btrfs: fix potential deadlock in the search ioctl") CC: stable@vger.kernel.org # 4.4+ Tested-by: A L <mail@lechevalier.se> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-13io_uring: grab any needed state during defer prepJens Axboe
Always grab work environment for deferred links. The assumption that we will be running it always from the task in question is false, as exiting tasks may mean that we're deferring this one to a thread helper. And at that point it's too late to grab the work environment. Fixes: debb85f496c9 ("io_uring: factor out grab_env() from defer_prep()") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-13Merge tag 'driver-core-5.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some small driver core and debugfs fixes for 5.9-rc5 Included in here are: - firmware loader memory leak fix - firmware loader testing fixes for non-EFI systems - device link locking fixes found by lockdep - kobject_del() bugfix that has been affecting some callers - debugfs minor fix All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: test_firmware: Test platform fw loading on non-EFI systems PM: <linux/device.h>: fix @em_pd kernel-doc warning kobject: Drop unneeded conditional in __kobject_del() driver core: Fix device_pm_lock() locking for device links MAINTAINERS: Add the security document to SECURITY CONTACT driver code: print symbolic error code debugfs: Fix module state check condition kobject: Restore old behaviour of kobject_del(NULL) firmware_loader: fix memory leak for paged buffer
2020-09-12Merge tag 'for-5.9-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes: - regression fix for a crash after failed snapshot creation - one more lockep fix: use nofs allocation when allocating missing device - fix reloc tree leak on degraded mount - make some extent buffer alignment checks less strict to mount filesystems created by btrfs-convert" * tag 'for-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix NULL pointer dereference after failure to create snapshot btrfs: free data reloc tree on failed mount btrfs: require only sector size alignment for parent eb bytenr btrfs: fix lockdep splat in add_missing_dev
2020-09-12Merge tag '5.9-rc4-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fix from Steve French: "A fix for lookup on DFS link when cifsacl or modefromsid is used" * tag '5.9-rc4-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix DFS mount with cifsacl/modefromsid
2020-09-10epoll: EPOLL_CTL_ADD: close the race in decision to take fast pathAl Viro
Checking for the lack of epitems refering to the epoll we want to insert into is not enough; we might have an insertion of that epoll into another one that has already collected the set of files to recheck for excessive reverse paths, but hasn't gotten to creating/inserting the epitem for it. However, any such insertion in progress can be detected - it will update the generation count in our epoll when it's done looking through it for files to check. That gets done under ->mtx of our epoll and that allows us to detect that safely. We are *not* holding epmutex here, so the generation count is not stable. However, since both the update of ep->gen by loop check and (later) insertion into ->f_ep_link are done with ep->mtx held, we are fine - the sequence is grab epmutex bump loop_check_gen ... grab tep->mtx // 1 tep->gen = loop_check_gen ... drop tep->mtx // 2 ... grab tep->mtx // 3 ... insert into ->f_ep_link ... drop tep->mtx // 4 bump loop_check_gen drop epmutex and if the fastpath check in another thread happens for that eventpoll, it can come * before (1) - in that case fastpath is just fine * after (4) - we'll see non-empty ->f_ep_link, slow path taken * between (2) and (3) - loop_check_gen is stable, with ->mtx providing barriers and we end up taking slow path. Note that ->f_ep_link emptiness check is slightly racy - we are protected against insertions into that list, but removals can happen right under us. Not a problem - in the worst case we'll end up taking a slow path for no good reason. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-10Merge tag 'f2fs-for-5.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "Small bug fixes for: - SMR drive fix - infinite loop when building free node ids - EOF at DIO read" * tag 'f2fs-for-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: Return EOF on unaligned end of file DIO read f2fs: fix indefinite loop scanning for free nid f2fs: Fix type of section block count variables
2020-09-10block: remove check_disk_changeChristoph Hellwig
Remove the now unused check_disk_change helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-10block: add a bdev_check_media_change helperChristoph Hellwig
Like check_disk_changed, except that it does not call ->revalidate_disk but leaves that to the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-10epoll: replace ->visited/visited_list with generation countAl Viro
removes the need to clear it, along with the races. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-09epoll: do not insert into poll queues until all sanity checks are doneAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-09Merge tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Fix an NFS/RDMA resource leak - Fix the error handling during delegation recall - NFSv4.0 needs to return the delegation on a zero-stateid SETATTR - Stop printk reading past end of string * tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: stop printk reading past end of string NFS: Zero-stateid SETATTR should first return delegation NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall xprtrdma: Release in-flight MRs on disconnect
2020-09-08f2fs: Return EOF on unaligned end of file DIO readGabriel Krisman Bertazi
Reading past end of file returns EOF for aligned reads but -EINVAL for unaligned reads on f2fs. While documentation is not strict about this corner case, most filesystem returns EOF on this case, like iomap filesystems. This patch consolidates the behavior for f2fs, by making it return EOF(0). it can be verified by a read loop on a file that does a partial read before EOF (A file that doesn't end at an aligned address). The following code fails on an unaligned file on f2fs, but not on btrfs, ext4, and xfs. while (done < total) { ssize_t delta = pread(fd, buf + done, total - done, off + done); if (!delta) break; ... } It is arguable whether filesystems should actually return EOF or -EINVAL, but since iomap filesystems support it, and so does the original DIO code, it seems reasonable to consolidate on that. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-08f2fs: fix indefinite loop scanning for free nidSahitya Tummala
If the sbi->ckpt->next_free_nid is not NAT block aligned and if there are free nids in that NAT block between the start of the block and next_free_nid, then those free nids will not be scanned in scan_nat_page(). This results into mismatch between nm_i->available_nids and the sum of nm_i->free_nid_count of all NAT blocks scanned. And nm_i->available_nids will always be greater than the sum of free nids in all the blocks. Under this condition, if we use all the currently scanned free nids, then it will loop forever in f2fs_alloc_nid() as nm_i->available_nids is still not zero but nm_i->free_nid_count of that partially scanned NAT block is zero. Fix this to align the nm_i->next_scan_nid to the first nid of the corresponding NAT block. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-08f2fs: Fix type of section block count variablesShin'ichiro Kawasaki
Commit da52f8ade40b ("f2fs: get the right gc victim section when section has several segments") added code to count blocks of each section using variables with type 'unsigned short', which has 2 bytes size in many systems. However, the counts can be larger than the 2 bytes range and type conversion results in wrong values. Especially when the f2fs sections have blocks as many as USHRT_MAX + 1, the count is handled as 0. This triggers eternal loop in init_dirty_segmap() at mount system call. Fix this by changing the type of the variables to block_t. Fixes: da52f8ade40b ("f2fs: get the right gc victim section when section has several segments") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>