summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-10-22xfs: create slab caches for frequently-used deferred itemsDarrick J. Wong
Create slab caches for the high-level structures that coordinate deferred intent items, since they're used fairly heavily. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-10-22xfs: compact deferred intent item structuresDarrick J. Wong
Rearrange these structs to reduce the amount of unused padding bytes. This saves eight bytes for each of the three structs changed here, which means they're now all (rmap/bmap are 64 bytes, refc is 32 bytes) even powers of two. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-10-22xfs: rename _zone variables to _cacheDarrick J. Wong
Now that we've gotten rid of the kmem_zone_t typedef, rename the variables to _cache since that's what they are. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-10-22xfs: remove kmem_zone typedefDarrick J. Wong
Remove these typedefs by referencing kmem_cache directly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-10-22Merge tag 'fuse-fixes-5.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Syzbot discovered a race in case of reusing the fuse sb (introduced in this cycle). Fix it by doing the s_fs_info initialization at the proper place" * tag 'fuse-fixes-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: clean up error exits in fuse_fill_super() fuse: always initialize sb->s_fs_info fuse: clean up fuse_mount destruction fuse: get rid of fuse_put_super() fuse: check s_root when destroying sb
2021-10-22fuse: decrement nlink on overwriting renameMiklos Szeredi
Rename didn't decrement/clear nlink on overwritten target inode. Create a common helper fuse_entry_unlinked() that handles this for unlink, rmdir and rename. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-22fuse: simplify __fuse_write_file_get()Miklos Szeredi
Use list_first_entry_or_null() instead of list_empty() + list_entry(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-22fuse: move fuse_invalidate_attr() into fuse_update_ctime()Miklos Szeredi
Logically it belongs there since attributes are invalidated due to the updated ctime. This is a cleanup and should not change behavior. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-22fuse: delete redundant codePeng Hao
'ia->io=io' has been set in fuse_io_alloc. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-22fuse: use kmap_local_page()Peng Hao
Due to the introduction of kmap_local_*, the storage of slots used for short-term mapping has changed from per-CPU to per-thread. kmap_atomic() disable preemption, while kmap_local_*() only disable migration. There is no need to disable preemption in several kamp_atomic places used in fuse. Link: https://lwn.net/Articles/836144/ Signed-off-by: Peng Hao <flyingpeng@tencent.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-22fuse: annotate lock in fuse_reverse_inval_entry()Miklos Szeredi
Add missing inode lock annotatation; found by syzbot. Reported-and-tested-by: syzbot+9f747458f5990eaa8d43@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-22fuse: write inode in fuse_vma_close() instead of fuse_release()Miklos Szeredi
Fuse ->release() is otherwise asynchronous for the reason that it can happen in contexts unrelated to close/munmap. Inode is already written back from fuse_flush(). Add it to fuse_vma_close() as well to make sure inode dirtying from mmaps also get written out before the file is released. Also add error handling. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-22fuse: make sure reclaim doesn't write the inodeMiklos Szeredi
In writeback cache mode mtime/ctime updates are cached, and flushed to the server using the ->write_inode() callback. Closing the file will result in a dirty inode being immediately written, but in other cases the inode can remain dirty after all references are dropped. This result in the inode being written back from reclaim, which can deadlock on a regular allocation while the request is being served. The usual mechanisms (GFP_NOFS/PF_MEMALLOC*) don't work for FUSE, because serving a request involves unrelated userspace process(es). Instead do the same as for dirty pages: make sure the inode is written before the last reference is gone. - fallocate(2)/copy_file_range(2): these call file_update_time() or file_modified(), so flush the inode before returning from the call - unlink(2), link(2) and rename(2): these call fuse_update_ctime(), so flush the ctime directly from this helper Reported-by: chenguanyou <chenguanyou@xiaomi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-22block: simplify the block device syncing codeChristoph Hellwig
Get rid of the indirections and just provide a sync_bdevs helper for the generic sync code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22ntfs3: use sync_blockdev_nowaitChristoph Hellwig
Use sync_blockdev_nowait instead of opencoding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211019062530.2174626-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22fat: use sync_blockdev_nowaitChristoph Hellwig
Use sync_blockdev_nowait instead of opencoding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211019062530.2174626-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22btrfs: use sync_blockdevChristoph Hellwig
Use sync_blockdev instead of opencoding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: David Sterba <dsterba@suse.com> Link: https://lore.kernel.org/r/20211019062530.2174626-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22block: remove __sync_blockdevChristoph Hellwig
Instead offer a new sync_blockdev_nowait helper for the !wait case. This new helper is exported as it will grow modular callers in a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22fs: remove __sync_filesystemChristoph Hellwig
There is no clear benefit in having this helper vs just open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211019062530.2174626-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22nfsd/blocklayout: use ->get_unique_id instead of sending SCSI commandsChristoph Hellwig
Call the ->get_unique_id method to query the SCSI identifiers. This can use the cached VPD page in the sd driver instead of sending a command on every LAYOUTGET. It will also allow to support NVMe based volumes if the draft for that ever takes off. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21NFS: Remove redundant call to __set_page_dirty_nobuffersTrond Myklebust
Remove a redundant call in nfs_updatepage(). nfs_writepage_setup() will have already called nfs_mark_request_dirty() on success. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-21io_uring: apply worker limits to previous usersPavel Begunkov
Another change to the API io-wq worker limitation API added in 5.15, apply the limit to all prior users that already registered a tctx. It may be confusing as it's now, in particular the change covers the following 2 cases: TASK1 | TASK2 _________________________________________________ ring = create() | | limit_iowq_workers() *not limited* | TASK1 | TASK2 _________________________________________________ ring = create() | | issue_requests() limit_iowq_workers() | | *not limited* A note on locking, it's safe to traverse ->tctx_list as we hold ->uring_lock, but do that after dropping sqd->lock to avoid possible problems. It's also safe to access tctx->io_wq there because tasks kill it only after removing themselves from tctx_list, see io_uring_cancel_generic() -> io_uring_clean_tctx() Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d6e09ecc3545e4dc56e43c906ee3d71b7ae21bed.1634818641.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21fuse: clean up error exits in fuse_fill_super()Miklos Szeredi
Instead of "goto err", return error directly, since there's no error cleanup to do now. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21fuse: always initialize sb->s_fs_infoMiklos Szeredi
Syzkaller reports a null pointer dereference in fuse_test_super() that is caused by sb->s_fs_info being NULL. This is due to the fact that fuse_fill_super() is initializing s_fs_info, which is too late, it's already on the fs_supers list. The initialization needs to be done in sget_fc() with the sb_lock held. Move allocation of fuse_mount and fuse_conn from fuse_fill_super() into fuse_get_tree(). After this ->kill_sb() will always be called with non-NULL ->s_fs_info, hence fuse_mount_destroy() can drop the test for non-NULL "fm". Reported-by: syzbot+74a15f02ccb51f398601@syzkaller.appspotmail.com Fixes: 5d5b74aa9c76 ("fuse: allow sharing existing sb") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21fuse: clean up fuse_mount destructionMiklos Szeredi
1. call fuse_mount_destroy() for open coded variants 2. before deactivate_locked_super() don't need fuse_mount destruction since that will now be done (if ->s_fs_info is not cleared) 3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the regular pattern can be used Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21fuse: get rid of fuse_put_super()Miklos Szeredi
The ->put_super callback is called from generic_shutdown_super() in case of a fully initialized sb. This is called from kill_***_super(), which is called from ->kill_sb instances. Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the reference to the fuse_conn, while it does the same on each error case during sb setup. This patch moves the destruction from fuse_put_super() to fuse_mount_destroy(), called at the end of all ->kill_sb instances. A follup patch will clean up the error paths. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21fuse: check s_root when destroying sbMiklos Szeredi
Checking "fm" works because currently sb->s_fs_info is cleared on error paths; however, sb->s_root is what generic_shutdown_super() checks to determine whether the sb was fully initialized or not. This change will allow cleanup of sb setup error paths. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-20autofs: fix wait name hash calculation in autofs_wait()Ian Kent
There's a mistake in commit 2be7828c9fefc ("get rid of autofs_getpath()") that affects kernels from v5.13.0, basically missed because of me not fully testing the change for Al. The problem is that the hash calculation for the wait name qstr hasn't been updated to account for the change to use dentry_path_raw(). This prevents the correct matching an existing wait resulting in multiple notifications being sent to the daemon for the same mount which must not occur. The problem wasn't discovered earlier because it only occurs when multiple processes trigger a request for the same mount concurrently so it only shows up in more aggressive testing. Fixes: 2be7828c9fefc ("get rid of autofs_getpath()") Cc: stable@vger.kernel.org Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-10-20aio: Prefer struct_size over open coded arithmeticLen Baker
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, use the struct_size() helper to do the arithmetic instead of the argument "size + size * count" in the kzalloc() function. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by: Len Baker <len.baker@gmx.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-10-20writeback: prefer struct_size over open coded arithmeticLen Baker
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. In this case these are not actually dynamic sizes: all the operands involved in the calculation are constant values. However it is better to refactor them anyway, just to keep the open-coded math idiom out of code. So, use the struct_size() helper to do the arithmetic instead of the argument "size + count * size" in the kzalloc() functions. This code was detected with the help of Coccinelle and audited and fixed manually. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by: Len Baker <len.baker@gmx.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-10-20xfs: Use kvcalloc() instead of kvzalloc()Gustavo A. R. Silva
Use 2-factor argument multiplication form kvcalloc() instead of kvzalloc(). Link: https://github.com/KSPP/linux/issues/162 Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-10-20NFS: Unexport nfs_probe_fsinfo()Anna Schumaker
All the callers are now in client.c so we can remove the EXPORT_SYMBOL_GPL() and make it static. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFS: Call nfs_probe_server() during a fscontext-reconfigure eventAnna Schumaker
This lets us update the server's attributes when the user does a "mount -o remount" on the filesystem. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFS: Replace calls to nfs_probe_fsinfo() with nfs_probe_server()Anna Schumaker
Clean up. There are a few places where we want to probe the server, but don't actually care about the fsinfo result. Change these to use nfs_probe_server(), which handles the fattr allocation for us. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFS: Move nfs_probe_destination() into the generic clientAnna Schumaker
And rename it to nfs_probe_server(). I also change it to take the nfs_fh as an argument so callers can choose what filehandle to probe. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFS: Create an nfs4_server_set_init_caps() functionAnna Schumaker
And call it before doing an FSINFO probe to reset to the baseline capabilities before probing. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFS: Remove --> and <-- dprintk call sitesChuck Lever
dprintk call sites that display no other information than the function name can be replaced with use of the trace "function" or "function_graph" plug-ins. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20SUNRPC: Trace calls to .rpc_call_doneChuck Lever
Introduce a single tracepoint that can replace simple dprintk call sites in upper layer "rpc_call_done" callbacks. Example: kworker/u24:2-1254 [001] 771.026677: rpc_stats_latency: task:00000001@00000002 xid=0x16a6f3c0 rpcbindv2 GETPORT backlog=446 rtt=101 execute=555 kworker/u24:2-1254 [001] 771.026677: rpc_task_call_done: task:00000001@00000002 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpcb_getport_done kworker/u24:2-1254 [001] 771.026678: rpcb_setport: task:00000001@00000002 status=0 port=20048 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFS: Replace dprintk callsites in nfs_readpage(s)Chuck Lever
These new events report slightly different information for readpage and readpages/readahead. For readpage: fsx-1387 [006] 380.761896: nfs_aop_readpage: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 fsx-1387 [006] 380.761900: nfs_aop_readpage_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 ret=0 The index of a synchronous single-page read is reported. For readpages: fsx-1387 [006] 380.760847: nfs_aop_readahead: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 fsx-1387 [006] 380.760853: nfs_aop_readahead_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 ret=0 The count of pages requested is reported. nfs_readpages does not wait for the READ requests to complete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size fieldChuck Lever
For certain special cases, RPC-related tracepoints record a -1 as the task ID or the client ID. It's ugly for a trace event to display 4 billion in these cases. To help keep SUNRPC tracepoints consistent, create a macro that defines the print format specifiers for tk_pid and cl_clid. At some point in the future we might try tk_pid with a wider range of values than 0..64K so this makes it easier to make that change. RPC tracepoints now look like this: <...>-1276 [009] 149.720358: rpc_clnt_new: client=00000005 peer=[192.168.2.55]:20049 program=nfs server=klimt.ib <...>-1342 [004] 149.921234: rpc_xdr_recvfrom: task:0000001a@00000005 head=[0xff1242d9ab6dc01c,144] page=0 tail=[(nil),0] len=144 <...>-1342 [004] 149.921235: xprt_release_cong: task:0000001a@00000005 snd_task:ffffffff cong=256 cwnd=16384 <...>-1342 [004] 149.921235: xprt_put_cong: task:0000001a@00000005 snd_task:ffffffff cong=0 cwnd=16384 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20Fix user namespace leakAlexey Gladkov
Fixes: 61ca2c4afd9d ("NFS: Only reference user namespace from nfs4idmap struct instead of cred") Signed-off-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFS: Save some space in the inodeTrond Myklebust
Save some space in the nfs_inode by setting up an anonymous union with the fields that are peculiar to a specific type of filesystem object. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFSv4: Fixes for nfs4_inode_return_delegation()Trond Myklebust
We mustn't call nfs_wb_all() on anything other than a regular file. Furthermore, we can exit early when we don't hold a delegation. Reported-by: David Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFS: Fix an Oops in pnfs_mark_request_commit()Trond Myklebust
Olga reports seeing the following Oops when doing O_DIRECT writes to a pNFS flexfiles server: Oops: 0000 [#1] SMP PTI CPU: 1 PID: 234186 Comm: kworker/u8:1 Not tainted 5.15.0-rc4+ #4 Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 Workqueue: nfsiod rpc_async_release [sunrpc] RIP: 0010:nfs_mark_request_commit+0x12/0x30 [nfs] Code: ff ff be 03 00 00 00 e8 ac 34 83 eb e9 29 ff ff ff e8 22 bc d7 eb 66 90 0f 1f 44 00 00 48 85 f6 74 16 48 8b 42 10 48 8b 40 18 <48> 8b 40 18 48 85 c0 74 05 e9 70 fc 15 ec 48 89 d6 e9 68 ed ff ff RSP: 0018:ffffa82f0159fe00 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff8f3393141880 RCX: 0000000000000000 RDX: ffffa82f0159fe08 RSI: ffff8f3381252500 RDI: ffff8f3393141880 RBP: ffff8f33ac317c00 R08: 0000000000000000 R09: ffff8f3487724cb0 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000001 R13: ffff8f3485bccee0 R14: ffff8f33ac317c10 R15: ffff8f33ac317cd8 FS: 0000000000000000(0000) GS:ffff8f34fbc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 0000000122120006 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: nfs_direct_write_completion+0x13b/0x250 [nfs] rpc_free_task+0x39/0x60 [sunrpc] rpc_async_release+0x29/0x40 [sunrpc] process_one_work+0x1ce/0x370 worker_thread+0x30/0x380 ? process_one_work+0x370/0x370 kthread+0x11a/0x140 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 9c455a8c1e14 ("NFS/pNFS: Clean up pNFS commit operations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20NFS: Fix up commit deadlocksTrond Myklebust
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead to fsync() hangs. The fix is to ensure that O_DIRECT calls nfs_commit_end(). Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20Merge tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Two important filesystem fixes, marked for stable. The blocklisted superblocks issue was particularly annoying because for unexperienced users it essentially exacted a reboot to establish a new functional mount in that scenario" * tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client: ceph: fix handling of "meta" errors ceph: skip existing superblocks that are blocklisted or shut down when mounting
2021-10-20gfs2: Eliminate ip->i_ghAndreas Gruenbacher
Now that gfs2_file_buffered_write is the only remaining user of ip->i_gh, we can move the glock holder to the stack (or rather, use the one we already have on the stack); there is no need for keeping the holder in the inode anymore. This is slightly complicated by the fact that we're using ip->i_gh for the statfs inode in gfs2_file_buffered_write as well. Writing to the statfs inode isn't very common, so allocate the statfs holder dynamically when needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20gfs2: Move the inode glock locking to gfs2_file_buffered_writeAndreas Gruenbacher
So far, for buffered writes, we were taking the inode glock in gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of not holding the inode glock while iomap_write_actor faults in user pages. It turns out that iomap_write_actor is called inside iomap_begin ... iomap_end, so the user pages were still faulted in while holding the inode glock and the locking code in iomap_begin / iomap_end was completely pointless. Move the locking into gfs2_file_buffered_write instead. We'll take care of the potential deadlocks due to faulting in user pages while holding a glock in a subsequent patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20gfs2: Introduce flag for glock holder auto-demotionBob Peterson
This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that will allow glocks to be demoted automatically on locking conflicts. When a locking request comes in that isn't compatible with the locking state of an active holder and that holder has the HIF_MAY_DEMOTE flag set, the holder will be demoted before the incoming locking request is granted. Note that this mechanism demotes active holders (with the HIF_HOLDER flag set), while before we were only demoting glocks without any active holders. This allows processes to keep hold of locks that may form a cyclic locking dependency; the core glock logic will then break those dependencies in case a conflicting locking request occurs. We'll use this to avoid giving up the inode glock proactively before faulting in pages. Processes that allow a glock holder to be taken away indicate this by calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag. Later, they call gfs2_holder_disallow_demote() to clear the flag again, and then they check if their holder is still queued: if it is, they are still holding the glock; if it isn't, they can re-acquire the glock (or abort). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20gfs2: Clean up function may_grantAndreas Gruenbacher
Pass the first current glock holder into function may_grant and deobfuscate the logic there. While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant. To make that build cleanly, de-constify the may_grant arguments. We're now using function find_first_holder in do_promote, so move the function's definition above do_promote. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>