summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-07-22nfsd4: fix NULL dereference in nfsd/clients display codeJ. Bruce Fields
We hold the cl_lock here, and that's enough to keep stateid's from going away, but it's not enough to prevent the files they point to from going away. Take fi_lock and a reference and check for NULL, as we do in other code. Reported-by: NeilBrown <neilb@suse.de> Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens") Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-07-21fs-verity: use smp_load_acquire() for ->i_verity_infoEric Biggers
Normally smp_store_release() or cmpxchg_release() is paired with smp_load_acquire(). Sometimes smp_load_acquire() can be replaced with the more lightweight READ_ONCE(). However, for this to be safe, all the published memory must only be accessed in a way that involves the pointer itself. This may not be the case if allocating the object also involves initializing a static or global variable, for example. fsverity_info::tree_params.hash_alg->tfm is a crypto_ahash object that's internal to and is allocated by the crypto subsystem. So by using READ_ONCE() for ->i_verity_info, we're relying on internal implementation details of the crypto subsystem. Remove this fragile assumption by using smp_load_acquire() instead. Also fix the cmpxchg logic to correctly execute an ACQUIRE barrier when losing the cmpxchg race, since cmpxchg doesn't guarantee a memory barrier on failure. (Note: I haven't seen any real-world problems here. This change is just fixing the code to be guaranteed correct and less fragile.) Fixes: fd2d1acfcadf ("fs-verity: add the hook for file ->open()") Link: https://lore.kernel.org/r/20200721225920.114347-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21fscrypt: use smp_load_acquire() for ->i_crypt_infoEric Biggers
Normally smp_store_release() or cmpxchg_release() is paired with smp_load_acquire(). Sometimes smp_load_acquire() can be replaced with the more lightweight READ_ONCE(). However, for this to be safe, all the published memory must only be accessed in a way that involves the pointer itself. This may not be the case if allocating the object also involves initializing a static or global variable, for example. fscrypt_info includes various sub-objects which are internal to and are allocated by other kernel subsystems such as keyrings and crypto. So by using READ_ONCE() for ->i_crypt_info, we're relying on internal implementation details of these other kernel subsystems. Remove this fragile assumption by using smp_load_acquire() instead. (Note: I haven't seen any real-world problems here. This change is just fixing the code to be guaranteed correct and less fragile.) Fixes: e37a784d8b6a ("fscrypt: use READ_ONCE() to access ->i_crypt_info") Link: https://lore.kernel.org/r/20200721225920.114347-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21fscrypt: use smp_load_acquire() for ->s_master_keysEric Biggers
Normally smp_store_release() or cmpxchg_release() is paired with smp_load_acquire(). Sometimes smp_load_acquire() can be replaced with the more lightweight READ_ONCE(). However, for this to be safe, all the published memory must only be accessed in a way that involves the pointer itself. This may not be the case if allocating the object also involves initializing a static or global variable, for example. super_block::s_master_keys is a keyring, which is internal to and is allocated by the keyrings subsystem. By using READ_ONCE() for it, we're relying on internal implementation details of the keyrings subsystem. Remove this fragile assumption by using smp_load_acquire() instead. (Note: I haven't seen any real-world problems here. This change is just fixing the code to be guaranteed correct and less fragile.) Fixes: 22d94f493bfb ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl") Link: https://lore.kernel.org/r/20200721225920.114347-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21fscrypt: use smp_load_acquire() for fscrypt_prepared_keyEric Biggers
Normally smp_store_release() or cmpxchg_release() is paired with smp_load_acquire(). Sometimes smp_load_acquire() can be replaced with the more lightweight READ_ONCE(). However, for this to be safe, all the published memory must only be accessed in a way that involves the pointer itself. This may not be the case if allocating the object also involves initializing a static or global variable, for example. fscrypt_prepared_key includes a pointer to a crypto_skcipher object, which is internal to and is allocated by the crypto subsystem. By using READ_ONCE() for it, we're relying on internal implementation details of the crypto subsystem. Remove this fragile assumption by using smp_load_acquire() instead. (Note: I haven't seen any real-world problems here. This change is just fixing the code to be guaranteed correct and less fragile.) Fixes: 5fee36095cda ("fscrypt: add inline encryption support") Cc: Satya Tangirala <satyat@google.com> Link: https://lore.kernel.org/r/20200721225920.114347-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21fscrypt: switch fscrypt_do_sha256() to use the SHA-256 libraryEric Biggers
fscrypt_do_sha256() is only used for hashing encrypted filenames to create no-key tokens, which isn't performance-critical. Therefore a C implementation of SHA-256 is sufficient. Also, the logic to create no-key tokens is always potentially needed. This differs from fscrypt's other dependencies on crypto API algorithms, which are conditionally needed depending on what encryption policies userspace is using. Therefore, for fscrypt there isn't much benefit to allowing SHA-256 to be a loadable module. So, make fscrypt_do_sha256() use the SHA-256 library instead of the crypto_shash API. This is much simpler, since it avoids having to implement one-time-init (which is hard to do correctly, and in fact was implemented incorrectly) and handle failures to allocate the crypto_shash object. Fixes: edc440e3d27f ("fscrypt: improve format of no-key names") Cc: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20200721225920.114347-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21btrfs: fix mount failure caused by race with umountBoris Burkov
It is possible to cause a btrfs mount to fail by racing it with a slow umount. The crux of the sequence is generic_shutdown_super not yet calling sop->put_super before btrfs_mount_root calls btrfs_open_devices. If that occurs, btrfs_open_devices will decide the opened counter is non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to 0. From here, mount will call sget which will result in grab_super trying to take the super block umount semaphore. That semaphore will be held by the slow umount, so mount will block. Before up-ing the semaphore, umount will delete the super block, resulting in mount's sget reliably allocating a new one, which causes the mount path to dutifully fill it out, and increment total_rw_bytes a second time, which causes the mount to fail, as we see double the expected bytes. Here is the sequence laid out in greater detail: CPU0 CPU1 down_write sb->s_umount btrfs_kill_super kill_anon_super(sb) generic_shutdown_super(sb); shrink_dcache_for_umount(sb); sync_filesystem(sb); evict_inodes(sb); // SLOW btrfs_mount_root btrfs_scan_one_device fs_devices = device->fs_devices fs_info->fs_devices = fs_devices // fs_devices-opened makes this a no-op btrfs_open_devices(fs_devices, mode, fs_type) s = sget(fs_type, test, set, flags, fs_info); find sb in s_instances grab_super(sb); down_write(&s->s_umount); // blocks sop->put_super(sb) // sb->fs_devices->opened == 2; no-op spin_lock(&sb_lock); hlist_del_init(&sb->s_instances); spin_unlock(&sb_lock); up_write(&sb->s_umount); return 0; retry lookup don't find sb in s_instances (deleted by CPU0) s = alloc_super return s; btrfs_fill_super(s, fs_devices, data) open_ctree // fs_devices total_rw_bytes improperly set! btrfs_read_chunk_tree read_one_dev // increment total_rw_bytes again!! super_total_bytes < fs_devices->total_rw_bytes // ERROR!!! To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree before the calls to read_one_dev, while holding the sb umount semaphore and the uuid mutex. To reproduce, it is sufficient to dirty a decent number of inodes, then quickly umount and mount. for i in $(seq 0 500) do dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1 done umount /mnt/foo& mount /mnt/foo does the trick for me. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-21btrfs: fix page leaks after failure to lock page for delallocRobbie Ko
When locking pages for delalloc, we check if it's dirty and mapping still matches. If it does not match, we need to return -EAGAIN and release all pages. Only the current page was put though, iterate over all the remaining pages too. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-21btrfs: qgroup: fix data leak caused by race between writeback and truncateQu Wenruo
[BUG] When running tests like generic/013 on test device with btrfs quota enabled, it can normally lead to data leak, detected at unmount time: BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096 ------------[ cut here ]------------ WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs] RIP: 0010:close_ctree+0x1dc/0x323 [btrfs] Call Trace: btrfs_put_super+0x15/0x17 [btrfs] generic_shutdown_super+0x72/0x110 kill_anon_super+0x18/0x30 btrfs_kill_super+0x17/0x30 [btrfs] deactivate_locked_super+0x3b/0xa0 deactivate_super+0x40/0x50 cleanup_mnt+0x135/0x190 __cleanup_mnt+0x12/0x20 task_work_run+0x64/0xb0 __prepare_exit_to_usermode+0x1bc/0x1c0 __syscall_return_slowpath+0x47/0x230 do_syscall_64+0x64/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace caf08beafeca2392 ]--- BTRFS error (device dm-3): qgroup reserved space leaked [CAUSE] In the offending case, the offending operations are: 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0 The following sequence of events could happen after the writev(): CPU1 (writeback) | CPU2 (truncate) ----------------------------------------------------------------- btrfs_writepages() | |- extent_write_cache_pages() | |- Got page for 1003520 | | 1003520 is Dirty, no writeback | | So (!clear_page_dirty_for_io()) | | gets called for it | |- Now page 1003520 is Clean. | | | btrfs_setattr() | | |- btrfs_setsize() | | |- truncate_setsize() | | New i_size is 18388 |- __extent_writepage() | | |- page_offset() > i_size | |- btrfs_invalidatepage() | |- Page is clean, so no qgroup | callback executed This means, the qgroup reserved data space is not properly released in btrfs_invalidatepage() as the page is Clean. [FIX] Instead of checking the dirty bit of a page, call btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage(). As qgroup rsv are completely bound to the QGROUP_RESERVED bit of io_tree, not bound to page status, thus we won't cause double freeing anyway. Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-21btrfs: fix double free on ulist after backref resolution failureFilipe Manana
At btrfs_find_all_roots_safe() we allocate a ulist and set the **roots argument to point to it. However if later we fail due to an error returned by find_parent_nodes(), we free that ulist but leave a dangling pointer in the **roots argument. Upon receiving the error, a caller of this function can attempt to free the same ulist again, resulting in an invalid memory access. One such scenario is during qgroup accounting: btrfs_qgroup_account_extents() --> calls btrfs_find_all_roots() passes &new_roots (a stack allocated pointer) to btrfs_find_all_roots() --> btrfs_find_all_roots() just calls btrfs_find_all_roots_safe() passing &new_roots to it --> allocates ulist and assigns its address to **roots (which points to new_roots from btrfs_qgroup_account_extents()) --> find_parent_nodes() returns an error, so we free the ulist and leave **roots pointing to it after returning --> btrfs_qgroup_account_extents() sees btrfs_find_all_roots() returned an error and jumps to the label 'cleanup', which just tries to free again the same ulist Stack trace example: ------------[ cut here ]------------ BTRFS: tree first key check failed WARNING: CPU: 1 PID: 1763215 at fs/btrfs/disk-io.c:422 btrfs_verify_level_key+0xe0/0x180 [btrfs] Modules linked in: dm_snapshot dm_thin_pool (...) CPU: 1 PID: 1763215 Comm: fsstress Tainted: G W 5.8.0-rc3-btrfs-next-64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_verify_level_key+0xe0/0x180 [btrfs] Code: 28 5b 5d (...) RSP: 0018:ffffb89b473779a0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff90397759bf08 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff RBP: ffff9039a419c000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffffb89b43301000 R12: 000000000000005e R13: ffffb89b47377a2e R14: ffffb89b473779af R15: 0000000000000000 FS: 00007fc47e1e1000(0000) GS:ffff9039ac200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc47e1df000 CR3: 00000003d9e4e001 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: read_block_for_search+0xf6/0x350 [btrfs] btrfs_next_old_leaf+0x242/0x650 [btrfs] resolve_indirect_refs+0x7cf/0x9e0 [btrfs] find_parent_nodes+0x4ea/0x12c0 [btrfs] btrfs_find_all_roots_safe+0xbf/0x130 [btrfs] btrfs_qgroup_account_extents+0x9d/0x390 [btrfs] btrfs_commit_transaction+0x4f7/0xb20 [btrfs] btrfs_sync_file+0x3d4/0x4d0 [btrfs] do_fsync+0x38/0x70 __x64_sys_fdatasync+0x13/0x20 do_syscall_64+0x5c/0xe0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc47e2d72e3 Code: Bad RIP value. RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3 RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003 RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8 R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0 softirqs last enabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace 8639237550317b48 ]--- BTRFS error (device sdc): tree first key mismatch detected, bytenr=62324736 parent_transid=94 key expected=(262,108,1351680) has=(259,108,1921024) general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 2 PID: 1763215 Comm: fsstress Tainted: G W 5.8.0-rc3-btrfs-next-64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:ulist_release+0x14/0x60 [btrfs] Code: c7 07 00 (...) RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282 RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840 RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840 R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840 FS: 00007fc47e1e1000(0000) GS:ffff9039ac600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8c1c0a51c8 CR3: 00000003d9e4e004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ulist_free+0x13/0x20 [btrfs] btrfs_qgroup_account_extents+0xf3/0x390 [btrfs] btrfs_commit_transaction+0x4f7/0xb20 [btrfs] btrfs_sync_file+0x3d4/0x4d0 [btrfs] do_fsync+0x38/0x70 __x64_sys_fdatasync+0x13/0x20 do_syscall_64+0x5c/0xe0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc47e2d72e3 Code: Bad RIP value. RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3 RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003 RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8 R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50 Modules linked in: dm_snapshot dm_thin_pool (...) ---[ end trace 8639237550317b49 ]--- RIP: 0010:ulist_release+0x14/0x60 [btrfs] Code: c7 07 00 (...) RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282 RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840 RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840 R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840 FS: 00007fc47e1e1000(0000) GS:ffff9039ad200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6a776f7d40 CR3: 00000003d9e4e002 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fix this by making btrfs_find_all_roots_safe() set *roots to NULL after it frees the ulist. Fixes: 8da6d5815c592b ("Btrfs: added btrfs_find_all_roots()") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-21f2fs: add F2FS_IOC_SEC_TRIM_FILE ioctlDaeho Jeong
Added a new ioctl to send discard commands or/and zero out to selected data area of a regular file for security reason. The way of handling range.len of F2FS_IOC_SEC_TRIM_FILE: 1. Added -1 value support for range.len to secure trim the whole blocks starting from range.start regardless of i_size. 2. If the end of the range passes over the end of file, it means until the end of file (i_size). 3. ignored the case of that range.len is zero to prevent the function from making end_addr zero and triggering different behaviour of the function. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-21f2fs: should avoid inode eviction in synchronous pathJaegeuk Kim
https://bugzilla.kernel.org/show_bug.cgi?id=208565 PID: 257 TASK: ecdd0000 CPU: 0 COMMAND: "init" #0 [<c0b420ec>] (__schedule) from [<c0b423c8>] #1 [<c0b423c8>] (schedule) from [<c0b459d4>] #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>] #3 [<c0b44fa0>] (down_read) from [<c044233c>] #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>] #5 [<c0442890>] (f2fs_truncate) from [<c044d408>] #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>] #7 [<c030be18>] (evict) from [<c030a558>] #8 [<c030a558>] (iput) from [<c047c600>] #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>] #10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>] #11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>] #12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>] #13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>] #14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>] #15 [<c0324294>] (do_fsync) from [<c0324014>] #16 [<c0324014>] (sys_fsync) from [<c0108bc0>] This can be caused by flush_dirty_inode() in f2fs_sync_node_pages() where iput() requires f2fs_lock_op() again resulting in livelock. Reported-by: Zhiguo Niu <Zhiguo.Niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-21fscrypt: restrict IV_INO_LBLK_* to AES-256-XTSEric Biggers
IV_INO_LBLK_* exist only because of hardware limitations, and currently the only known use case for them involves AES-256-XTS. Therefore, for now only allow them in combination with AES-256-XTS. This way we don't have to worry about them being combined with other encryption modes. (To be clear, combining IV_INO_LBLK_* with other encryption modes *should* work just fine. It's just not being tested, so we can't be 100% sure it works. So with no known use case, it's best to disallow it for now, just like we don't allow other weird combinations like AES-256-XTS contents encryption with Adiantum filenames encryption.) This can be relaxed later if a use case for other combinations arises. Fixes: b103fb7653ff ("fscrypt: add support for IV_INO_LBLK_64 policies") Fixes: e3b1078bedd3 ("fscrypt: add support for IV_INO_LBLK_32 policies") Link: https://lore.kernel.org/r/20200721181012.39308-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-21exec: Implement kernel_execveEric W. Biederman
To allow the kernel not to play games with set_fs to call exec implement kernel_execve. The function kernel_execve takes pointers into kernel memory and copies the values pointed to onto the new userspace stack. The calls with arguments from kernel space of do_execve are replaced with calls to kernel_execve. The calls do_execve and do_execveat are made static as there are now no callers outside of exec. The comments that mention do_execve are updated to refer to kernel_execve or execve depending on the circumstances. In addition to correcting the comments, this makes it easy to grep for do_execve and verify it is not used. Inspired-by: https://lkml.kernel.org/r/20200627072704.2447163-1-hch@lst.de Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87wo365ikj.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21exec: Factor bprm_stack_limits out of prepare_arg_pagesEric W. Biederman
In preparation for implementiong kernel_execve (which will take kernel pointers not userspace pointers) factor out bprm_stack_limits out of prepare_arg_pages. This separates the counting which depends upon the getting data from userspace from the calculations of the stack limits which is usable in kernel_execve. The remove prepare_args_pages and compute bprm->argc and bprm->envc directly in do_execveat_common, before bprm_stack_limits is called. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/87365u6x60.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21exec: Factor bprm_execve out of do_execve_commonEric W. Biederman
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel look like they are coming from userspace. To allow that usage of set_fs to be removed cleanly the argument copying from userspace needs to happen earlier. Factor bprm_execve out of do_execve_common to separate out the copying of arguments to the newe stack, and the rest of exec. In separating bprm_execve from do_execve_common the copying of the arguments onto the new stack happens earlier. As the copying of the arguments does not depend any security hooks, files, the file table, current->in_execve, current->fs->in_exec, bprm->unsafe, or creds this is safe. Likewise the security hook security_creds_for_exec does not depend upon preventing the argument copying from happening. In addition to making it possible to implement kernel_execve that performs the copying differently, this separation of bprm_execve from do_execve_common makes for a nice separation of responsibilities making the exec code easier to navigate. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/878sfm6x6x.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21exec: Move bprm_mm_init into alloc_bprmEric W. Biederman
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel look like they are coming from userspace. To allow that usage of set_fs to be removed cleanly the argument copying from userspace needs to happen earlier. Move the allocation and initialization of bprm->mm into alloc_bprm so that the bprm->mm is available early to store the new user stack into. This is a prerequisite for copying argv and envp into the new user stack early before ther rest of exec. To keep the things consistent the cleanup of bprm->mm is moved into free_bprm. So that bprm->mm will be cleaned up whenever bprm->mm is allocated and free_bprm are called. Moving bprm_mm_init earlier is safe as it does not depend on any files, current->in_execve, current->fs->in_exec, bprm->unsafe, or the if the file table is shared. (AKA bprm_mm_init does not depend on any of the code that happens between alloc_bprm and where it was previously called.) This moves bprm->mm cleanup after current->fs->in_exec is set to 0. This is safe because current->fs->in_exec is only used to preventy taking an additional reference on the fs_struct. This moves bprm->mm cleanup after current->in_execve is set to 0. This is safe because current->in_execve is only used by the lsms (apparmor and tomoyou) and always for LSM specific functions, never for anything to do with the mm. This adds bprm->mm cleanup into the successful return path. This is safe because being on the successful return path implies that begin_new_exec succeeded and set brpm->mm to NULL. As bprm->mm is NULL bprm cleanup I am moving into free_bprm will do nothing. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/87eepe6x7p.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21exec: Move initialization of bprm->filename into alloc_bprmEric W. Biederman
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel look like they are coming from userspace. To allow that usage of set_fs to be removed cleanly the argument copying from userspace needs to happen earlier. Move the computation of bprm->filename and possible allocation of a name in the case of execveat into alloc_bprm to make that possible. The exectuable name, the arguments, and the environment are copied into the new usermode stack which is stored in bprm until exec passes the point of no return. As the executable name is copied first onto the usermode stack it needs to be known. As there are no dependencies to computing the executable name, compute it early in alloc_bprm. As an implementation detail if the filename needs to be generated because it embeds a file descriptor store that filename in a new field bprm->fdpath, and free it in free_bprm. Previously this was done in an independent variable pathbuf. I have renamed pathbuf fdpath because fdpath is more suggestive of what kind of path is in the variable. I moved fdpath into struct linux_binprm because it is tightly tied to the other variables in struct linux_binprm, and as such is needed to allow the call alloc_binprm to move. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/87k0z66x8f.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21exec: Factor out alloc_bprmEric W. Biederman
Currently it is necessary for the usermode helper code and the code that launches init to use set_fs so that pages coming from the kernel look like they are coming from userspace. To allow that usage of set_fs to be removed cleanly the argument copying from userspace needs to happen earlier. Move the allocation of the bprm into it's own function (alloc_bprm) and move the call of alloc_bprm before unshare_files so that bprm can ultimately be allocated, the arguments can be placed on the new stack, and then the bprm can be passed into the core of exec. Neither the allocation of struct binprm nor the unsharing depend upon each other so swapping the order in which they are called is trivially safe. To keep things consistent the order of cleanup at the end of do_execve_common swapped to match the order of initialization. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87pn8y6x9a.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-07-21exfat: fix name_hash computation on big endian systemsIlya Ponetayev
On-disk format for name_hash field is LE, so it must be explicitly transformed on BE system for proper result. Fixes: 370e812b3ec1 ("exfat: add nls operations") Cc: stable@vger.kernel.org # v5.7 Signed-off-by: Chen Minqiang <ptpt52@gmail.com> Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-07-21exfat: fix wrong size update of stream entry by typoHyeongseok Kim
The stream.size field is updated to the value of create timestamp of the file entry. Fix this to use correct stream entry pointer. Fixes: 29bbb14bfc80 ("exfat: fix incorrect update of stream entry in __exfat_truncate()") Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-07-21exfat: fix wrong hint_stat initialization in exfat_find_dir_entry()Namjae Jeon
We found the wrong hint_stat initialization in exfat_find_dir_entry(). It should be initialized when cluster is EXFAT_EOF_CLUSTER. Fixes: ca06197382bd ("exfat: add directory operations") Cc: stable@vger.kernel.org # v5.7 Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-07-21exfat: fix overflow issue in exfat_cluster_to_sector()Namjae Jeon
An overflow issue can occur while calculating sector in exfat_cluster_to_sector(). It needs to cast clus's type to sector_t before left shifting. Fixes: 1acf1a564b60 ("exfat: add in-memory and on-disk structures and headers") Cc: stable@vger.kernel.org # v5.7 Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-07-20fscrypt: rename FS_KEY_DERIVATION_NONCE_SIZEEric Biggers
The name "FS_KEY_DERIVATION_NONCE_SIZE" is a bit outdated since due to the addition of FSCRYPT_POLICY_FLAG_DIRECT_KEY, the file nonce may now be used as a tweak instead of for key derivation. Also, we're now prefixing the fscrypt constants with "FSCRYPT_" instead of "FS_". Therefore, rename this constant to FSCRYPT_FILE_NONCE_SIZE. Link: https://lore.kernel.org/r/20200708215722.147154-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-20fscrypt: add comments that describe the HKDF info stringsEric Biggers
Each HKDF context byte is associated with a specific format of the remaining part of the application-specific info string. Add comments so that it's easier to keep track of what these all are. Link: https://lore.kernel.org/r/20200708215529.146890-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-20f2fs: segment.h: delete a duplicated wordRandy Dunlap
Drop the repeated word "the" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: linux-f2fs-devel@lists.sourceforge.net Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-20f2fs: compress: fix to avoid memory leak on cc->cpagesChao Yu
Memory allocated for storing compressed pages' poitner should be released after f2fs_write_compressed_pages(), otherwise it will cause memory leak issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-20f2fs: use generic names for generic ioctlsEric Biggers
Don't define F2FS_IOC_* aliases to ioctls that already have a generic FS_IOC_* name. These aliases are unnecessary, and they make it unclear which ioctls are f2fs-specific and which are generic. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-07-20zonefs: count pages after truncating the iteratorJohannes Thumshirn
Count pages after possibly truncating the iterator to the maximum zone append size, not before. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-07-20zonefs: Fix compilation warningDamien Le Moal
Avoid the compilation warning "Variable 'ret' is reassigned a value before the old one has been used." in zonefs_create_zgroup() by setting ret for the error path only if an error happens. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-07-20Merge 5.8-rc6 into driver-core-nextGreg Kroah-Hartman
We need the driver core fixes in here too. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-19proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTOREAdrian Reber
Opening files in /proc/pid/map_files when the current user is CAP_CHECKPOINT_RESTORE capable in the root namespace is useful for checkpointing and restoring to recover files that are unreachable via the file system such as deleted files, or memfd files. Signed-off-by: Adrian Reber <areber@redhat.com> Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Link: https://lore.kernel.org/r/20200719100418.2112740-5-areber@redhat.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-07-199p: remove unused code in 9pJianyong Wu
These codes have been commented out since 2007 and lay in kernel since then. So, it's better to remove them. Link: http://lkml.kernel.org/r/20200628074337.45895-1-jianyong.wu@arm.com Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2020-07-199p: Fix memory leak in v9fs_mountZheng Bin
v9fs_mount v9fs_session_init v9fs_cache_session_get_cookie v9fs_random_cachetag -->alloc cachetag v9ses->fscache = fscache_acquire_cookie -->maybe NULL sb = sget -->fail, goto clunk clunk_fid: v9fs_session_close if (v9ses->fscache) -->NULL kfree(v9ses->cachetag) Thus memleak happens. Link: http://lkml.kernel.org/r/20200615012153.89538-1-zhengbin13@huawei.com Fixes: 60e78d2c993e ("9p: Add fscache support to 9p") Cc: <stable@vger.kernel.org> # v2.6.32+ Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2020-07-199p: retrieve fid from file when file instance exist.Jianyong Wu
In the current setattr implementation in 9p, fid is always retrieved from dentry no matter file instance exists or not. If so, there may be some info related to opened file instance dropped. So it's better to retrieve fid from file instance when it is passed to setattr. for example: fd=open("tmp", O_RDWR); ftruncate(fd, 10); The file context related with the fd will be lost as fid is always retrieved from dentry, then the backend can't get the info of file context. It is against the original intention of user and may lead to bug. Link: http://lkml.kernel.org/r/20200710101548.10108-1-jianyong.wu@arm.com Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2020-07-18io_uring: always allow drain/link/hardlink/async sqe flagsDaniele Albano
We currently filter these for timeout_remove/async_cancel/files_update, but we only should be filtering for fixed file and buffer select. This also causes a second read of sqe->flags, which isn't needed. Just check req->flags for the relevant bits. This then allows these commands to be used in links, for example, like everything else. Signed-off-by: Daniele Albano <d.albano@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-17io_uring: ensure double poll additions work with both request typesJens Axboe
The double poll additions were centered around doing POLL_ADD on file descriptors that use more than one waitqueue (typically one for read, one for write) when being polled. However, it can also end up being triggered for when we use poll triggered retry. For that case, we cannot safely use req->io, as that could be used by the request type itself. Add a second io_poll_iocb pointer in the structure we allocate for poll based retry, and ensure we use the right one from the two paths. Fixes: 18bceab101ad ("io_uring: allow POLL_ADD with double poll_wait() users") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-17Merge tag 'nfs-for-5.8-3' of git://git.linux-nfs.org/projects/anna/linux-nfs ↵Linus Torvalds
into master Pull NFS client fixes from Anna Schumaker: "A few more NFS client bugfixes for Linux 5.8: NFS: - Fix interrupted slots by using the SEQUENCE operation SUNRPC: - revert d03727b248d0 to fix unkillable IOs xprtrdma: - Fix double-free in rpcrdma_ep_create() - Fix recursion into rpcrdma_xprt_disconnect() - Fix return code from rpcrdma_xprt_connect() - Fix handling of connect errors - Fix incorrect header size calculations" * tag 'nfs-for-5.8-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO compeletion") xprtrdma: fix incorrect header size calculations NFS: Fix interrupted slots by sending a solo SEQUENCE operation xprtrdma: Fix handling of connect errors xprtrdma: Fix return code from rpcrdma_xprt_connect() xprtrdma: Fix recursion into rpcrdma_xprt_disconnect() xprtrdma: Fix double-free in rpcrdma_ep_create()
2020-07-17xfs: preserve inode versioning across remountsEric Sandeen
The MS_I_VERSION mount flag is exposed via the VFS, as documented in the mount manpages etc; see the iversion and noiversion mount options in mount(8). As a result, mount -o remount looks for this option in /proc/mounts and will only send the I_VERSION flag back in during remount it it is present. Since it's not there, a remount will /remove/ the I_VERSION flag at the vfs level, and iversion functionality is lost. xfs v5 superblocks intend to always have i_version enabled; it is set as a default at mount time, but is lost during remount for the reasons above. The generic fix would be to expose this documented option in /proc/mounts, but since that was rejected, fix it up again in the xfs remount path instead, so that at least xfs won't suffer from this misbehavior. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-17SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO ↵Olga Kornievskaia
compeletion") Reverting commit d03727b248d0 "NFSv4 fix CLOSE not waiting for direct IO compeletion". This patch made it so that fput() by calling inode_dio_done() in nfs_file_release() would wait uninterruptably for any outstanding directIO to the file (but that wait on IO should be killable). The problem the patch was also trying to address was REMOVE returning ERR_ACCESS because the file is still opened, is supposed to be resolved by server returning ERR_FILE_OPEN and not ERR_ACCESS. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-07-17Merge tag 'io_uring-5.8-2020-07-17' of git://git.kernel.dk/linux-block into ↵Linus Torvalds
master Pull io_uring fix from Jens Axboe: "Fix for a case where, with automatic buffer selection, we can leak the buffer descriptor for recvmsg" * tag 'io_uring-5.8-2020-07-17' of git://git.kernel.dk/linux-block: io_uring: fix recvmsg memory leak with buffer selection
2020-07-17Merge tag 'fuse-fixes-5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into master Pull fuse fixes from Miklos Szeredi: - two regressions in this cycle caused by the conversion of writepage list to an rb_tree - two regressions in v5.4 cause by the conversion to the new mount API - saner behavior of fsconfig(2) for the reconfigure case - an ancient issue with FS_IOC_{GET,SET}FLAGS ioctls * tag 'fuse-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS fuse: don't ignore errors from fuse_writepages_fill() fuse: clean up condition for writepage sending fuse: reject options on reconfigure via fsconfig(2) fuse: ignore 'data' argument of mount(..., MS_REMOUNT) fuse: use ->reconfigure() instead of ->remount_fs() fuse: fix warning in tree_insert() and clean up writepage insertion fuse: move rb_erase() before tree_insert()
2020-07-17Merge tag 'ovl-fixes-5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into master Pull overlayfs fixes from Miklos Szeredi: - fix a regression introduced in v4.20 in handling a regenerated squashfs lower layer - two regression fixes for this cycle, one of which is Oops inducing - miscellaneous issues * tag 'ovl-fixes-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix lookup of indexed hardlinks with metacopy ovl: fix unneeded call to ovl_change_flags() ovl: fix mount option checks for nfs_export with no upperdir ovl: force read-only sb on failure to create index dir ovl: fix regression with re-formatted lower squashfs ovl: fix oops in ovl_indexdir_cleanup() with nfs_export=on ovl: relax WARN_ON() when decoding lower directory file handle ovl: remove not used argument in ovl_check_origin ovl: change ovl_copy_up_flags static ovl: inode reference leak in ovl_is_inuse true case.
2020-07-17NFSv4.0 allow nconnect for v4.0Olga Kornievskaia
It looks like this "else" is just a typo. It turns off nconnect for NFSv4.0 even though it works for every other version. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-17freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFSHe Zhe
commit 0688e64bc600 ("NFS: Allow signal interruption of NFS4ERR_DELAYed operations") introduces nfs4_delay_interruptible which also needs an _unsafe version to avoid the following call trace for the same reason explained in commit 416ad3c9c006 ("freezer: add unsafe versions of freezable helpers for NFS") CPU: 4 PID: 3968 Comm: rm Tainted: G W 5.8.0-rc4 #1 Hardware name: Marvell OcteonTX CN96XX board (DT) Call trace: dump_backtrace+0x0/0x1dc show_stack+0x20/0x30 dump_stack+0xdc/0x150 debug_check_no_locks_held+0x98/0xa0 nfs4_delay_interruptible+0xd8/0x120 nfs4_handle_exception+0x130/0x170 nfs4_proc_rmdir+0x8c/0x220 nfs_rmdir+0xa4/0x360 vfs_rmdir.part.0+0x6c/0x1b0 do_rmdir+0x18c/0x210 __arm64_sys_unlinkat+0x64/0x7c el0_svc_common.constprop.0+0x7c/0x110 do_el0_svc+0x24/0xa0 el0_sync_handler+0x13c/0x1b8 el0_sync+0x158/0x180 Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-17Merge branch 'xattr-devel'Trond Myklebust
2020-07-16treewide: Remove uninitialized_var() usageKees Cook
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16f2fs: Eliminate usage of uninitialized_var() macroJason Yan
This is an effort to eliminate the uninitialized_var() macro[1]. The use of this macro is the wrong solution because it forces off ANY analysis by the compiler for a given variable. It even masks "unused variable" warnings. Quoted from Linus[2]: "It's a horrible thing to use, in that it adds extra cruft to the source code, and then shuts up a compiler warning (even the _reliable_ warnings from gcc)." Fix it by remove this variable since it is not needed at all. [1] https://github.com/KSPP/linux/issues/81 [2] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Suggested-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20200615085132.166470-1-yanaijie@huawei.com Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16block: integrate bd_start_claiming into __blkdev_getChristoph Hellwig
bd_start_claiming duplicates a lot of the work done in __blkdev_get. Integrate the two functions to avoid the duplicate work, and to do the right thing for the md -ERESTARTSYS corner case. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-16block: use bd_prepare_to_claim directly in the loop driverChristoph Hellwig
The arcane magic in bd_start_claiming is only needed to be able to claim a block_device that hasn't been fully set up. Switch the loop driver that claims from the ioctl path with a fully set up struct block_device to just use the much simpler bd_prepare_to_claim directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>