summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-11-25cifs: Fix retrieval of DFS referrals in cifs_mount()Paulo Alcantara (SUSE)
Make sure that DFS referrals are sent to newly resolved root targets as in a multi tier DFS setup. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Link: https://lkml.kernel.org/r/05aa2995-e85e-0ff4-d003-5bb08bd17a22@canonical.com Cc: stable@vger.kernel.org Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: Fix potential softlockups while refreshing DFS cachePaulo Alcantara (SUSE)
We used to skip reconnects on all SMB2_IOCTL commands due to SMB3+ FSCTL_VALIDATE_NEGOTIATE_INFO - which made sense since we're still establishing a SMB session. However, when refresh_cache_worker() calls smb2_get_dfs_refer() and we're under reconnect, SMB2_ioctl() will not be able to get a proper status error (e.g. -EHOSTDOWN in case we failed to reconnect) but an -EAGAIN from cifs_send_recv() thus looping forever in refresh_cache_worker(). Fixes: e99c63e4d86d ("SMB3: Fix deadlock in validate negotiate hits reconnect") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Suggested-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: Fix lookup of root ses in DFS referral cachePaulo Alcantara (SUSE)
We don't care about module aliasing validation in cifs_compose_mount_options(..., is_smb3) when finding the root SMB session of an DFS namespace in order to refresh DFS referral cache. The following issue has been observed when mounting with '-t smb3' and then specifying 'vers=2.0': ... Nov 08 15:27:08 tw kernel: address conversion returned 0 for FS0.WIN.LOCAL Nov 08 15:27:08 tw kernel: [kworke] ==> dns_query((null),FS0.WIN.LOCAL,13,(null)) Nov 08 15:27:08 tw kernel: [kworke] call request_key(,FS0.WIN.LOCAL,) Nov 08 15:27:08 tw kernel: [kworke] ==> dns_resolver_cmp(FS0.WIN.LOCAL,FS0.WIN.LOCAL) Nov 08 15:27:08 tw kernel: [kworke] <== dns_resolver_cmp() = 1 Nov 08 15:27:08 tw kernel: [kworke] <== dns_query() = 13 Nov 08 15:27:08 tw kernel: fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: resolved: FS0.WIN.LOCAL to 192.168.30.26 ===> Nov 08 15:27:08 tw kernel: CIFS VFS: vers=2.0 not permitted when mounting with smb3 Nov 08 15:27:08 tw kernel: fs/cifs/dfs_cache.c: CIFS VFS: leaving refresh_tcon (xid = 26) rc = -22 ... Fixes: 5072010ccf05 ("cifs: Fix DFS cache refresher for DFS links") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: Fix use-after-free bug in cifs_reconnect()Paulo Alcantara (SUSE)
Ensure we grab an active reference in cifs superblock while doing failover to prevent automounts (DFS links) of expiring and then destroying the superblock pointer. This patch fixes the following KASAN report: [ 464.301462] BUG: KASAN: use-after-free in cifs_reconnect+0x6ab/0x1350 [ 464.303052] Read of size 8 at addr ffff888155e580d0 by task cifsd/1107 [ 464.304682] CPU: 3 PID: 1107 Comm: cifsd Not tainted 5.4.0-rc4+ #13 [ 464.305552] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58-rebuilt.opensuse.org 04/01/2014 [ 464.307146] Call Trace: [ 464.307875] dump_stack+0x5b/0x90 [ 464.308631] print_address_description.constprop.0+0x16/0x200 [ 464.309478] ? cifs_reconnect+0x6ab/0x1350 [ 464.310253] ? cifs_reconnect+0x6ab/0x1350 [ 464.311040] __kasan_report.cold+0x1a/0x41 [ 464.311811] ? cifs_reconnect+0x6ab/0x1350 [ 464.312563] kasan_report+0xe/0x20 [ 464.313300] cifs_reconnect+0x6ab/0x1350 [ 464.314062] ? extract_hostname.part.0+0x90/0x90 [ 464.314829] ? printk+0xad/0xde [ 464.315525] ? _raw_spin_lock+0x7c/0xd0 [ 464.316252] ? _raw_read_lock_irq+0x40/0x40 [ 464.316961] ? ___ratelimit+0xed/0x182 [ 464.317655] cifs_readv_from_socket+0x289/0x3b0 [ 464.318386] cifs_read_from_socket+0x98/0xd0 [ 464.319078] ? cifs_readv_from_socket+0x3b0/0x3b0 [ 464.319782] ? try_to_wake_up+0x43c/0xa90 [ 464.320463] ? cifs_small_buf_get+0x4b/0x60 [ 464.321173] ? allocate_buffers+0x98/0x1a0 [ 464.321856] cifs_demultiplex_thread+0x218/0x14a0 [ 464.322558] ? cifs_handle_standard+0x270/0x270 [ 464.323237] ? __switch_to_asm+0x40/0x70 [ 464.323893] ? __switch_to_asm+0x34/0x70 [ 464.324554] ? __switch_to_asm+0x40/0x70 [ 464.325226] ? __switch_to_asm+0x40/0x70 [ 464.325863] ? __switch_to_asm+0x34/0x70 [ 464.326505] ? __switch_to_asm+0x40/0x70 [ 464.327161] ? __switch_to_asm+0x34/0x70 [ 464.327784] ? finish_task_switch+0xa1/0x330 [ 464.328414] ? __switch_to+0x363/0x640 [ 464.329044] ? __schedule+0x575/0xaf0 [ 464.329655] ? _raw_spin_lock_irqsave+0x82/0xe0 [ 464.330301] kthread+0x1a3/0x1f0 [ 464.330884] ? cifs_handle_standard+0x270/0x270 [ 464.331624] ? kthread_create_on_node+0xd0/0xd0 [ 464.332347] ret_from_fork+0x35/0x40 [ 464.333577] Allocated by task 1110: [ 464.334381] save_stack+0x1b/0x80 [ 464.335123] __kasan_kmalloc.constprop.0+0xc2/0xd0 [ 464.335848] cifs_smb3_do_mount+0xd4/0xb00 [ 464.336619] legacy_get_tree+0x6b/0xa0 [ 464.337235] vfs_get_tree+0x41/0x110 [ 464.337975] fc_mount+0xa/0x40 [ 464.338557] vfs_kern_mount.part.0+0x6c/0x80 [ 464.339227] cifs_dfs_d_automount+0x336/0xd29 [ 464.339846] follow_managed+0x1b1/0x450 [ 464.340449] lookup_fast+0x231/0x4a0 [ 464.341039] path_openat+0x240/0x1fd0 [ 464.341634] do_filp_open+0x126/0x1c0 [ 464.342277] do_sys_open+0x1eb/0x2c0 [ 464.342957] do_syscall_64+0x5e/0x190 [ 464.343555] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 464.344772] Freed by task 0: [ 464.345347] save_stack+0x1b/0x80 [ 464.345966] __kasan_slab_free+0x12c/0x170 [ 464.346576] kfree+0xa6/0x270 [ 464.347211] rcu_core+0x39c/0xc80 [ 464.347800] __do_softirq+0x10d/0x3da [ 464.348919] The buggy address belongs to the object at ffff888155e58000 which belongs to the cache kmalloc-256 of size 256 [ 464.350222] The buggy address is located 208 bytes inside of 256-byte region [ffff888155e58000, ffff888155e58100) [ 464.351575] The buggy address belongs to the page: [ 464.352333] page:ffffea0005579600 refcount:1 mapcount:0 mapping:ffff88815a803400 index:0x0 compound_mapcount: 0 [ 464.353583] flags: 0x200000000010200(slab|head) [ 464.354209] raw: 0200000000010200 ffffea0005576200 0000000400000004 ffff88815a803400 [ 464.355353] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 [ 464.356458] page dumped because: kasan: bad access detected [ 464.367005] Memory state around the buggy address: [ 464.367787] ffff888155e57f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 464.368877] ffff888155e58000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 464.369967] >ffff888155e58080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 464.371111] ^ [ 464.371775] ffff888155e58100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 464.372893] ffff888155e58180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 464.373983] ================================================================== Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: dump channel info in DebugDataAurelien Aptel
* show server&TCP states for extra channels * mention if an interface has a channel connected to it In this version three of the patch, fixed minor printk format issue pointed out by the kbuild robot. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25smb3: dump in_send and num_waiters stats counters by defaultSteve French
Number of requests in_send and the number of waiters on sendRecv are useful counters in various cases, move them from CONFIG_CIFS_STATS2 to be on by default especially with multichannel Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-11-25cifs: try harder to open new channelsAurelien Aptel
Previously we would only loop over the iface list once. This patch tries to loop over multiple times until all channels are opened. It will also try to reuse RSS ifaces. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Properly process SMB3 lease breaksPavel Shilovsky
Currenly we doesn't assume that a server may break a lease from RWH to RW which causes us setting a wrong lease state on a file and thus mistakenly flushing data and byte-range locks and purging cached data on the client. This leads to performance degradation because subsequent IOs go directly to the server. Fix this by propagating new lease state and epoch values to the oplock break handler through cifsFileInfo structure and removing the use of cifsInodeInfo flags for that. It allows to avoid some races of several lease/oplock breaks using those flags in parallel. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: move cifsFileInfo_put logic into a work-queueRonnie Sahlberg
This patch moves the final part of the cifsFileInfo_put() logic where we need a write lock on lock_sem to be processed in a separate thread that holds no other locks. This is to prevent deadlocks like the one below: > there are 6 processes looping to while trying to down_write > cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from > cifs_new_fileinfo > > and there are 5 other processes which are blocked, several of them > waiting on either PG_writeback or PG_locked (which are both set), all > for the same page of the file > > 2 inode_lock() (inode->i_rwsem) for the file > 1 wait_on_page_writeback() for the page > 1 down_read(inode->i_rwsem) for the inode of the directory > 1 inode_lock()(inode->i_rwsem) for the inode of the directory > 1 __lock_page > > > so processes are blocked waiting on: > page flags PG_locked and PG_writeback for one specific page > inode->i_rwsem for the directory > inode->i_rwsem for the file > cifsInodeInflock_sem > > > > here are the more gory details (let me know if I need to provide > anything more/better): > > [0 00:48:22.765] [UN] PID: 8863 TASK: ffff8c691547c5c0 CPU: 3 > COMMAND: "reopen_file" > #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095 > #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df > #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7 > #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d > #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d > #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33 > #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6 > #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315 > * (I think legitimize_path is bogus) > > in path_openat > } else { > const char *s = path_init(nd, flags); > while (!(error = link_path_walk(s, nd)) && > (error = do_last(nd, file, op)) > 0) { <<<< > > do_last: > if (open_flag & O_CREAT) > inode_lock(dir->d_inode); <<<< > else > so it's trying to take inode->i_rwsem for the directory > > DENTRY INODE SUPERBLK TYPE PATH > ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR /mnt/vm1_smb/ > inode.i_rwsem is ffff8c691158efc0 > > <struct rw_semaphore 0xffff8c691158efc0>: > owner: <struct task_struct 0xffff8c6914275d00> (UN - 8856 - > reopen_file), counter: 0x0000000000000003 > waitlist: 2 > 0xffff9965007e3c90 8863 reopen_file UN 0 1:29:22.926 > RWSEM_WAITING_FOR_WRITE > 0xffff996500393e00 9802 ls UN 0 1:17:26.700 > RWSEM_WAITING_FOR_READ > > > the owner of the inode.i_rwsem of the directory is: > > [0 00:00:00.109] [UN] PID: 8856 TASK: ffff8c6914275d00 CPU: 3 > COMMAND: "reopen_file" > #0 [ffff99650065b828] __schedule at ffffffff9b6e6095 > #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df > #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89 > #3 [ffff99650065b940] msleep at ffffffff9af573a9 > #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs] > #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs] > #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs] > #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd > #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs] > #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs] > #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs] > #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7 > #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33 > #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6 > #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315 > > cifs_launder_page is for page 0xffffd1e2c07d2480 > > crash> page.index,mapping,flags 0xffffd1e2c07d2480 > index = 0x8 > mapping = 0xffff8c68f3cd0db0 > flags = 0xfffffc0008095 > > PAGE-FLAG BIT VALUE > PG_locked 0 0000001 > PG_uptodate 2 0000004 > PG_lru 4 0000010 > PG_waiters 7 0000080 > PG_writeback 15 0008000 > > > inode is ffff8c68f3cd0c40 > inode.i_rwsem is ffff8c68f3cd0ce0 > DENTRY INODE SUPERBLK TYPE PATH > ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG > /mnt/vm1_smb/testfile.8853 > > > this process holds the inode->i_rwsem for the parent directory, is > laundering a page attached to the inode of the file it's opening, and in > _cifsFileInfo_put is trying to down_write the cifsInodeInflock_sem > for the file itself. > > > <struct rw_semaphore 0xffff8c68f3cd0ce0>: > owner: <struct task_struct 0xffff8c6914272e80> (UN - 8854 - > reopen_file), counter: 0x0000000000000003 > waitlist: 1 > 0xffff9965005dfd80 8855 reopen_file UN 0 1:29:22.912 > RWSEM_WAITING_FOR_WRITE > > this is the inode.i_rwsem for the file > > the owner: > > [0 00:48:22.739] [UN] PID: 8854 TASK: ffff8c6914272e80 CPU: 2 > COMMAND: "reopen_file" > #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095 > #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df > #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2 > #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f > #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf > #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c > #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs] > #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4 > #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a > #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs] > #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd > #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35 > #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9 > #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315 > > the process holds the inode->i_rwsem for the file to which it's writing, > and is trying to __lock_page for the same page as in the other processes > > > the other tasks: > [0 00:00:00.028] [UN] PID: 8859 TASK: ffff8c6915479740 CPU: 2 > COMMAND: "reopen_file" > #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095 > #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df > #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89 > #3 [ffff9965007b3af0] msleep at ffffffff9af573a9 > #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs] > #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs] > #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a > #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f > #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33 > #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6 > #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315 > > this is opening the file, and is trying to down_write cinode->lock_sem > > > [0 00:00:00.041] [UN] PID: 8860 TASK: ffff8c691547ae80 CPU: 2 > COMMAND: "reopen_file" > [0 00:00:00.057] [UN] PID: 8861 TASK: ffff8c6915478000 CPU: 3 > COMMAND: "reopen_file" > [0 00:00:00.059] [UN] PID: 8858 TASK: ffff8c6914271740 CPU: 2 > COMMAND: "reopen_file" > [0 00:00:00.109] [UN] PID: 8862 TASK: ffff8c691547dd00 CPU: 6 > COMMAND: "reopen_file" > #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095 > #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df > #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89 > #3 [ffff9965007c3d90] msleep at ffffffff9af573a9 > #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs] > #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs] > #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e > #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614 > #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f > #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c > > closing the file, and trying to down_write cifsi->lock_sem > > > [0 00:48:22.839] [UN] PID: 8857 TASK: ffff8c6914270000 CPU: 7 > COMMAND: "reopen_file" > #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095 > #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df > #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2 > #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6 > #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028 > #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165 > #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs] > #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1 > #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e > #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315 > > in __filemap_fdatawait_range > wait_on_page_writeback(page); > for the same page of the file > > > > [0 00:48:22.718] [UN] PID: 8855 TASK: ffff8c69142745c0 CPU: 7 > COMMAND: "reopen_file" > #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095 > #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df > #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7 > #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs] > #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd > #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35 > #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9 > #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315 > > inode_lock(inode); > > > and one 'ls' later on, to see whether the rest of the mount is available > (the test file is in the root, so we get blocked up on the directory > ->i_rwsem), so the entire mount is unavailable > > [0 00:36:26.473] [UN] PID: 9802 TASK: ffff8c691436ae80 CPU: 4 > COMMAND: "ls" > #0 [ffff996500393d28] __schedule at ffffffff9b6e6095 > #1 [ffff996500393db8] schedule at ffffffff9b6e64df > #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421 > #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2 > #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56 > #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c > #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6 > #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315 > > in iterate_dir: > if (shared) > res = down_read_killable(&inode->i_rwsem); <<<< > else > res = down_write_killable(&inode->i_rwsem); > Reported-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: try opening channels after mountingAurelien Aptel
After doing mount() successfully we call cifs_try_adding_channels() which will open as many channels as it can. Channels are closed when the master session is closed. The master connection becomes the first channel. ,-------------> global cifs_tcp_ses_list <-------------------------. | | '- TCP_Server_Info <--> TCP_Server_Info <--> TCP_Server_Info <-' (master con) (chan#1 con) (chan#2 con) | ^ ^ ^ v '--------------------|--------------------' cifs_ses | - chan_count = 3 | - chans[] ---------------------' - smb3signingkey[] (master signing key) Note how channel connections don't have sessions. That's because cifs_ses can only be part of one linked list (list_head are internal to the elements). For signing keys, each channel has its own signing key which must be used only after the channel has been bound. While it's binding it must use the master session signing key. For encryption keys, since channel connections do not have sessions attached we must now find matching session by looping over all sessions in smb2_get_enc_key(). Each channel is opened like a regular server connection but at the session setup request step it must set the SMB2_SESSION_REQ_FLAG_BINDING flag and use the session id to bind to. Finally, while sending in compound_send_recv() for requests that aren't negprot, ses-setup or binding related, use a channel by cycling through the available ones (round-robin). Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: refactor cifs_get_inode_info()Aurelien Aptel
Make logic of cifs_get_inode() much clearer by moving code to sub functions and adding comments. Document the steps this function does. cifs_get_inode_info() gets and updates a file inode metadata from its file path. * If caller already has raw info data from server they can pass it. * If inode already exists (just need to update) caller can pass it. Step 1: get raw data from server if none was passed Step 2: parse raw data into intermediate internal cifs_fattr struct Step 3: set fattr uniqueid which is later used for inode number. This can sometime be done from raw data Step 4: tweak fattr according to mount options (file_mode, acl to mode bits, uid, gid, etc) Step 5: update or create inode from final fattr struct * add is_smb1_server() helper * add is_inode_cache_good() helper * move SMB1-backupcreds-getinfo-retry to separate func cifs_backup_query_path_info(). * move set-uniqueid code to separate func cifs_set_fattr_ino() * don't clobber uniqueid from backup cred retry * fix some probable corner cases memleaks Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: switch servers depending on binding stateAurelien Aptel
Currently a lot of the code to initialize a connection & session uses the cifs_ses as input. But depending on if we are opening a new session or a new channel we need to use different server pointers. Add a "binding" flag in cifs_ses and a helper function that returns the server ptr a session should use (only in the sess establishment code path). Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: add server paramAurelien Aptel
As we get down to the transport layer, plenty of functions are passed the session pointer and assume the transport to use is ses->server. Instead we modify those functions to pass (ses, server) so that we can decouple the session from the server. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: add multichannel mount options and data structsAurelien Aptel
adds: - [no]multichannel to enable/disable multichannel - max_channels=N to control how many channels to create these options are then stored in the volume struct. - store channels and max_channels in cifs_ses Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: sort interface list by speedAurelien Aptel
New channels are going to be opened by walking the list sequentially, so by sorting it we will connect to the fastest interfaces first. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Fix SMB2 oplock break processingPavel Shilovsky
Even when mounting modern protocol version the server may be configured without supporting SMB2.1 leases and the client uses SMB2 oplock to optimize IO performance through local caching. However there is a problem in oplock break handling that leads to missing a break notification on the client who has a file opened. It latter causes big latencies to other clients that are trying to open the same file. The problem reproduces when there are multiple shares from the same server mounted on the client. The processing code tries to match persistent and volatile file ids from the break notification with an open file but it skips all share besides the first one. Fix this by looking up in all shares belonging to the server that issued the oplock break. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: don't use 'pre:' for MODULE_SOFTDEPRonnie Sahlberg
It can cause to fail with modprobe: FATAL: Module <module> is builtin. RHBZ: 1767094 Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: smbd: Return -EAGAIN when transport is reconnectingLong Li
During reconnecting, the transport may have already been destroyed and is in the process being reconnected. In this case, return -EAGAIN to not fail and to retry this I/O. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: smbd: Only queue work for error recovery on memory registrationLong Li
It's not necessary to queue invalidated memory registration to work queue, as all we need to do is to unmap the SG and make it usable again. This can save CPU cycles in normal data paths as memory registration errors are rare and normally only happens during reconnection. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25smb3: add debug messages for closing unmatched openRonnie Sahlberg
Helps distinguish between an interrupted close and a truly unmatched open. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Do not miss cancelled OPEN responsesPavel Shilovsky
When an OPEN command is cancelled we mark a mid as cancelled and let the demultiplex thread process it by closing an open handle. The problem is there is a race between a system call thread and the demultiplex thread and there may be a situation when the mid has been already processed before it is set as cancelled. Fix this by processing cancelled requests when mids are being destroyed which means that there is only one thread referencing a particular mid. Also set mids as cancelled unconditionally on their state. Cc: Stable <stable@vger.kernel.org> Tested-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Fix NULL pointer dereference in mid callbackPavel Shilovsky
There is a race between a system call processing thread and the demultiplex thread when mid->resp_buf becomes NULL and later is being accessed to get credits. It happens when the 1st thread wakes up before a mid callback is called in the 2nd one but the mid state has already been set to MID_RESPONSE_RECEIVED. This causes NULL pointer dereference in mid callback. Fix this by saving credits from the response before we update the mid state and then use this value in the mid callback rather then accessing a response buffer. Cc: Stable <stable@vger.kernel.org> Fixes: ee258d79159afed5 ("CIFS: Move credit processing to mid callbacks for SMB3") Tested-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Close open handle after interrupted closePavel Shilovsky
If Close command is interrupted before sending a request to the server the client ends up leaking an open file handle. This wastes server resources and can potentially block applications that try to remove the file or any directory containing this file. Fix this by putting the close command into a worker queue, so another thread retries it later. Cc: Stable <stable@vger.kernel.org> Tested-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Respect O_SYNC and O_DIRECT flags during reconnectPavel Shilovsky
Currently the client translates O_SYNC and O_DIRECT flags into corresponding SMB create options when openning a file. The problem is that on reconnect when the file is being re-opened the client doesn't set those flags and it causes a server to reject re-open requests because create options don't match. The latter means that any subsequent system call against that open file fail until a share is re-mounted. Fix this by properly setting SMB create options when re-openning files after reconnects. Fixes: 1013e760d10e6: ("SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags") Cc: Stable <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25smb3: remove confusing dmesg when mounting with encryption ("seal")Steve French
The smb2/smb3 message checking code was logging to dmesg when mounting with encryption ("seal") for compounded SMB3 requests. When encrypted the whole frame (including potentially multiple compounds) is read so the length field is longer than in the case of non-encrypted case (where length field will match the the calculated length for the particular SMB3 request in the compound being validated). Avoids the warning on mount (with "seal"): "srv rsp padded more than expected. Length 384 not ..." Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: close the shared root handle on tree disconnectRonnie Sahlberg
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Return directly after a failed build_path_from_dentry() in ↵Markus Elfring
cifs_do_create() Return directly after a call of the function "build_path_from_dentry" failed at the beginning. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Use common error handling code in smb2_ioctl_query_info()Markus Elfring
Move the same error code assignments so that such exception handling can be better reused at the end of this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: Use memdup_user() rather than duplicating its implementationMarkus Elfring
Reuse existing functionality from memdup_user() instead of keeping duplicate source code. Generated by: scripts/coccinelle/api/memdup_user.cocci Fixes: f5b05d622a3e99e6a97a189fe500414be802a05c ("cifs: add IOCTL for QUERY_INFO passthrough to userspace") Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: smbd: Return -ECONNABORTED when trasnport is not in connected stateLong Li
The transport should return this error so the upper layer will reconnect. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: smbd: Add messages on RDMA session destroy and reconnectionLong Li
Log these activities to help production support. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: smbd: Return -EINVAL when the number of iovs exceeds SMBDIRECT_MAX_SGELong Li
While it's not friendly to fail user processes that issue more iovs than we support, at least we should return the correct error code so the user process gets a chance to retry with smaller number of iovs. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: smbd: Invalidate and deregister memory registration on re-send for ↵Long Li
direct I/O On re-send, there might be a reconnect and all prevoius memory registrations need to be invalidated and deregistered. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: Don't display RDMA transport on reconnectLong Li
On reconnect, the transport data structure is NULL and its information is not available. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25CIFS: remove set but not used variables 'cinode' and 'netfid'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: fs/cifs/file.c: In function 'cifs_flock': fs/cifs/file.c:1704:8: warning: variable 'netfid' set but not used [-Wunused-but-set-variable] fs/cifs/file.c:1702:24: warning: variable 'cinode' set but not used [-Wunused-but-set-variable] Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: add support for flockSteve French
The flock system call locks the whole file rather than a byte range and so is currently emulated by various other file systems by simply sending a byte range lock for the whole file. Add flock handling for cifs.ko in similar way. xfstest generic/504 passes with this as well Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-11-25cifs: remove unused variable 'sid_user'YueHaibing
fs/cifs/cifsacl.c:43:30: warning: sid_user defined but not used [-Wunused-const-variable=] It is never used, so remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-25cifs: rename a variable in SendReceive()Dan Carpenter
Smatch gets confused because we sometimes refer to "server->srv_mutex" and sometimes to "sess->server->srv_mutex". They refer to the same lock so let's just make this consistent. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-24Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull cramfs fix from Al Viro: "Regression fix, fallen through the cracks" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: cramfs: fix usage on non-MTD device
2019-11-23cramfs: fix usage on non-MTD deviceMaxime Bizon
When both CONFIG_CRAMFS_MTD and CONFIG_CRAMFS_BLOCKDEV are enabled, if we fail to mount on MTD, we don't try on block device. Note: this relies upon cramfs_mtd_fill_super() leaving no side effects on fc state in case of failure; in general, failing get_tree_...() does *not* mean "fine to try again"; e.g. parsed options might've been consumed by fill_super callback and freed on failure. Fixes: 74f78fc5ef43 ("vfs: Convert cramfs to use the new mount API") Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-11-22afs: Fix large file supportMarc Dionne
By default s_maxbytes is set to MAX_NON_LFS, which limits the usable file size to 2GB, enforced by the vfs. Commit b9b1f8d5930a ("AFS: write support fixes") added support for the 64-bit fetch and store server operations, but did not change this value. As a result, attempts to write past the 2G mark result in EFBIG errors: $ dd if=/dev/zero of=foo bs=1M count=1 seek=2048 dd: error writing 'foo': File too large Set s_maxbytes to MAX_LFS_FILESIZE. Fixes: b9b1f8d5930a ("AFS: write support fixes") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-22afs: Fix possible assert with callbacks from yfs serversMarc Dionne
Servers sending callback breaks to the YFS_CM_SERVICE service may send up to YFSCBMAX (1024) fids in a single RPC. Anything over AFSCBMAX (50) will cause the assert in afs_break_callbacks to trigger. Remove the assert, as the count has already been checked against the appropriate max values in afs_deliver_cb_callback and afs_deliver_yfs_cb_callback. Fixes: 35dbfba3111a ("afs: Implement the YFS cache manager service") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-22Revert "fs: ocfs2: fix possible null-pointer dereferences in ↵Joseph Qi
ocfs2_xa_prepare_entry()" This reverts commit 56e94ea132bb5c2c1d0b60a6aeb34dcb7d71a53d. Commit 56e94ea132bb ("fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()") introduces a regression that fail to create directory with mount option user_xattr and acl. Actually the reported NULL pointer dereference case can be correctly handled by loc->xl_ops->xlo_add_entry(), so revert it. Link: http://lkml.kernel.org/r/1573624916-83825-1-git-send-email-joseph.qi@linux.alibaba.com Fixes: 56e94ea132bb ("fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Thomas Voegtle <tv@lio96.de> Acked-by: Changwei Ge <gechangwei@live.cn> Cc: Jia-Ju Bai <baijiaju1990@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-19afs: Fix missing timeout resetDavid Howells
In afs_wait_for_call_to_complete(), rather than immediately aborting an operation if a signal occurs, the code attempts to wait for it to complete, using a schedule timeout of 2*RTT (or min 2 jiffies) and a check that we're still receiving relevant packets from the server before we consider aborting the call. We may even ping the server to check on the status of the call. However, there's a missing timeout reset in the event that we do actually get a packet to process, such that if we then get a couple of short stalls, we then time out when progress is actually being made. Fix this by resetting the timeout any time we get something to process. If it's the failure of the call then the call state will get changed and we'll exit the loop shortly thereafter. A symptom of this is data fetches and stores failing with EINTR when they really shouldn't. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-15Merge tag 'for-linus-20191115' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes that should make it into this release. This contains: - io_uring: - The timeout command assumes sequence == 0 means that we want one completion, but this kind of overloading is unfortunate as it prevents users from doing a pure time based wait. Since this operation was introduced in this cycle, let's correct it now, while we can. (me) - One-liner to fix an issue with dependent links and fixed buffer reads. The actual IO completed fine, but the link got severed since we stored the wrong expected value. (me) - Add TIMEOUT to list of opcodes that don't need a file. (Pavel) - rsxx missing workqueue destry calls. Old bug. (Chuhong) - Fix blk-iocost active list check (Jiufei) - Fix impossible-to-hit overflow merge condition, that still hit some folks very rarely (Junichi) - Fix bfq hang issue from 5.3. This didn't get marked for stable, but will go into stable post this merge (Paolo)" * tag 'for-linus-20191115' of git://git.kernel.dk/linux-block: rsxx: add missed destroy_workqueue calls in remove iocost: check active_list of all the ancestors in iocg_activate() block, bfq: deschedule empty bfq_queues not referred by any process io_uring: ensure registered buffer import returns the IO length io_uring: Fix getting file for timeout block: check bi_size overflow before merge io_uring: make timeout sequence == 0 mean no sequence
2019-11-15Merge tag 'ceph-for-5.4-rc8' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Two fixes for the buffered reads and O_DIRECT writes serialization patch that went into -rc1 and a fixup for a bogus warning on older gcc versions" * tag 'ceph-for-5.4-rc8' of git://github.com/ceph/ceph-client: rbd: silence bogus uninitialized warning in rbd_object_map_update_finish() ceph: increment/decrement dio counter on async requests ceph: take the inode lock before acquiring cap refs
2019-11-15afs: Fix race in commit bulk status fetchDavid Howells
When a lookup is done, the afs filesystem will perform a bulk status-fetch operation on the requested vnode (file) plus the next 49 other vnodes from the directory list (in AFS, directory contents are downloaded as blobs and parsed locally). When the results are received, it will speculatively populate the inode cache from the extra data. However, if the lookup races with another lookup on the same directory, but for a different file - one that's in the 49 extra fetches, then if the bulk status-fetch operation finishes first, it will try and update the inode from the other lookup. If this other inode is still in the throes of being created, however, this will cause an assertion failure in afs_apply_status(): BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags)); on or about fs/afs/inode.c:175 because it expects data to be there already that it can compare to. Fix this by skipping the update if the inode is being created as the creator will presumably set up the inode with the same information. Fixes: 39db9815da48 ("afs: Fix application of the results of a inline bulk status fetch") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-15Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc vfs fixes from Al Viro: "Assorted fixes all over the place; some of that is -stable fodder, some regressions from the last window" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable ecryptfs: fix unlink and rmdir in face of underlying fs modifications audit_get_nd(): don't unlock parent too early exportfs_decode_fh(): negative pinned may become positive without the parent locked cgroup: don't put ERR_PTR() into fc->root autofs: fix a leak in autofs_expire_indirect() aio: Fix io_pgetevents() struct __compat_aio_sigset layout fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()
2019-11-14ceph: increment/decrement dio counter on async requestsJeff Layton
Ceph can in some cases issue an async DIO request, in which case we can end up calling ceph_end_io_direct before the I/O is actually complete. That may allow buffered operations to proceed while DIO requests are still in flight. Fix this by incrementing the i_dio_count when issuing an async DIO request, and decrement it when tearing down the aio_req. Fixes: 321fe13c9398 ("ceph: add buffered/direct exclusionary locking for reads and writes") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-11-14ceph: take the inode lock before acquiring cap refsJeff Layton
Most of the time, we (or the vfs layer) takes the inode_lock and then acquires caps, but ceph_read_iter does the opposite, and that can lead to a deadlock. When there are multiple clients treading over the same data, we can end up in a situation where a reader takes caps and then tries to acquire the inode_lock. Another task holds the inode_lock and issues a request to the MDS which needs to revoke the caps, but that can't happen until the inode_lock is unwedged. Fix this by having ceph_read_iter take the inode_lock earlier, before attempting to acquire caps. Fixes: 321fe13c9398 ("ceph: add buffered/direct exclusionary locking for reads and writes") Link: https://tracker.ceph.com/issues/36348 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>