summaryrefslogtreecommitdiff
path: root/net/sunrpc
AgeCommit message (Collapse)Author
2022-03-23SUNRPC: avoid race between mod_timer() and del_timer_sync()NeilBrown
xprt_destory() claims XPRT_LOCKED and then calls del_timer_sync(). Both xprt_unlock_connect() and xprt_release() call ->release_xprt() which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect() which calls mod_timer(). This may result in mod_timer() being called *after* del_timer_sync(). When this happens, the timer may fire long after the xprt has been freed, and run_timer_softirq() will probably crash. The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is always called under ->transport_lock. So if we take ->transport_lock to call del_timer_sync(), we can be sure that mod_timer() will run first (if it runs at all). Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22fs: allocate inode by using alloc_inode_sb()Muchun Song
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22SUNRPC: Make the rpciod and xprtiod slab allocation modes consistentTrond Myklebust
Make sure that rpciod and xprtiod are always using the same slab allocation modes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22SUNRPC: Fix unx_lookup_cred() allocationTrond Myklebust
Default to the same mempool allocation strategy as for rpc_malloc(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22NFS: Fix memory allocation in rpc_alloc_task()Trond Myklebust
As for rpc_malloc(), we first try allocating from the slab, then fall back to a non-waiting allocation from the mempool. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22NFS: Fix memory allocation in rpc_malloc()Trond Myklebust
When in a low memory situation, we do want rpciod to kick off direct reclaim in the case where that helps, however we don't want it looping forever in mempool_alloc(). So first try allocating from the slab using GFP_KERNEL | __GFP_NORETRY, and then fall back to a GFP_NOWAIT allocation from the mempool. Ditto for rpc_alloc_task() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22SUNRPC: Improve accuracy of socket ENOBUFS determinationTrond Myklebust
The current code checks for whether or not the socket is in a writeable state after we get an EAGAIN. That is racy, since we've dropped the socket lock, so the amount of free buffer may have changed. Instead, let's check whether the socket is writeable before we try to write to it. If that was the case, we do expect the message to be at least partially sent unless we're in a low memory situation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACETrond Myklebust
The socket's SOCKWQ_ASYNC_NOSPACE can be cleared by various actors in the socket layer, so replace it with our own flag in the transport sock_state field. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22SUNRPC: Fix socket waits for write buffer spaceTrond Myklebust
The socket layer requires that we use the socket lock to protect changes to the sock->sk_write_pending field and others. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22SUNRPC: Only save the TCP source port after the connection is completeTrond Myklebust
Since the RPC client uses a non-blocking connect(), we do not expect to see it return '0' under normal circumstances. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22SUNRPC: Don't call connect() more than once on a TCP socketTrond Myklebust
Avoid socket state races due to repeated calls to ->connect() using the same socket. If connect() returns 0 due to the connection having completed, but we are in fact in a closing state, then we may leave the XPRT_CONNECTING flag set on the transport. Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de> Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13SUNRPC: change locking for xs_swap_enable/disableNeilBrown
It is not in general safe to wait for XPRT_LOCKED to clear. A wakeup is only sent when - connection completes - sock close completes so during normal operations, this can wait indefinitely. The event we need to protect against is ->inet being set to NULL, and that happens under the recv_mutex lock. So drop the handlign of XPRT_LOCKED and use recv_mutex instead. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13NFSv4: keep state manager thread active if swap is enabledNeilBrown
If we are swapping over NFSv4, we may not be able to allocate memory to start the state-manager thread at the time when we need it. So keep it always running when swap is enabled, and just signal it to start. This requires updating and testing the cl_swapper count on the root rpc_clnt after following all ->cl_parent links. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOCNeilBrown
rpc tasks can be marked as RPC_TASK_SWAPPER. This causes GFP_MEMALLOC to be used for some allocations. This is needed in some cases, but not in all where it is currently provided, and in some where it isn't provided. Currently *all* tasks associated with a rpc_client on which swap is enabled get the flag and hence some GFP_MEMALLOC support. GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it. However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does need it. xdr_alloc_bvec is called while the XPRT_LOCK is held. If this blocks, then it blocks all other queued tasks. So this allocation needs GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used for any swap writes. Similarly, if the transport is not connected, that will block all requests including swap writes, so memory allocations should get GFP_MEMALLOC if swap writes are possible. So with this patch: 1/ we ONLY set RPC_TASK_SWAPPER for swap writes. 2/ __rpc_execute() sets PF_MEMALLOC while handling any task with RPC_TASK_SWAPPER set, or when handling any task that holds the XPRT_LOCKED lock on an xprt used for swap. This removes the need for the RPC_IS_SWAPPER() test in ->buf_alloc handlers. 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking any task to a swapper xprt. __rpc_execute() will clear it. 3/ PF_MEMALLOC is set for all the connect workers. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> (for xprtrdma parts) Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDSNeilBrown
NFS_RPC_SWAPFLAGS is only used for READ requests. It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to requests. This is not needed for swap READ - though it is for writes where it is set via a different mechanism. RPC_TASK_ROOTCREDS causes the 'machine' credential to be used. This is not needed as the root credential is saved when the swap file is opened, and this is used for all IO. So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of RPC_TASK_ROOTCREDS, that isn't needed either. Remove both. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13SUNRPC: remove scheduling boost for "SWAPPER" tasks.NeilBrown
Currently, tasks marked as "swapper" tasks get put to the front of non-priority rpc_queues, and are sorted earlier than non-swapper tasks on the transport's ->xmit_queue. This is pointless as currently *all* tasks for a mount that has swap enabled on *any* file are marked as "swapper" tasks. So the net result is that the non-priority rpc_queues are reverse-ordered (LIFO). This scheduling boost is not necessary to avoid deadlocks, and hurts fairness, so remove it. If there were a need to expedite some requests, the tk_priority mechanism is a more appropriate tool. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13SUNRPC/xprt: async tasks mustn't block waiting for memoryNeilBrown
When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY. The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13SUNRPC/auth: async tasks mustn't block waiting for memoryNeilBrown
When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. lookup_cred() can block on a mempool or kmalloc - and this can cause deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't block on memory. If the -ENOMEM gets back to call_refreshresult(), wait a short while and try again. HZ>>4 is chosen as it is used elsewhere for -ENOMEM retries. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13SUNRPC/call_alloc: async tasks mustn't block waiting for memoryNeilBrown
When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-28SUNRPC: Teach server to recognize RPC_AUTH_TLSChuck Lever
Initial support for the RPC_AUTH_TLS authentication flavor enables NFSD to eventually accept an RPC_AUTH_TLS probe from clients. This patch simply prevents NFSD from rejecting these probes completely. In the meantime, graft this support in now so that RPC_AUTH_TLS support keeps up with generic code and API changes in the RPC server. Down the road, server-side transport implementations will populate xpo_start_tls when they can support RPC-with-TLS. For example, TCP will eventually populate it, but RDMA won't. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28NFSD: Move svc_serv_ops::svo_function into struct svc_servChuck Lever
Hoist svo_function back into svc_serv and remove struct svc_serv_ops, since the struct is now devoid of fields. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28NFSD: Remove svc_serv_ops::svo_moduleChuck Lever
struct svc_serv_ops is about to be removed. Neil Brown says: > I suspect svo_module can go as well - I don't think the thread is > ever the thing that primarily keeps a module active. A random sample of kthread_create() callers shows sunrpc is the only one that manages module reference count in this way. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Remove svc_shutdown_net()Chuck Lever
Clean up: svc_shutdown_net() now does nothing but call svc_close_net(). Replace all external call sites. svc_close_net() is renamed to be the inverse of svc_xprt_create(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Rename svc_close_xprt()Chuck Lever
Clean up: Use the "svc_xprt_<task>" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Rename svc_create_xprt()Chuck Lever
Clean up: Use the "svc_xprt_<task>" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Remove svo_shutdown methodChuck Lever
Clean up. Neil observed that "any code that calls svc_shutdown_net() knows what the shutdown function should be, and so can call it directly." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de>
2022-02-28SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()Chuck Lever
Neil says: "These functions were separated in commit 0971374e2818 ("SUNRPC: Reduce contention in svc_xprt_enqueue()") so that the XPT_BUSY check happened before taking any spinlocks. We have since moved or removed the spinlocks so the extra test is fairly pointless." I've made this a separate patch in case the XPT_BUSY change has unexpected consequences and needs to be reverted. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Remove the .svo_enqueue_xprt methodChuck Lever
We have never been able to track down and address the underlying cause of the performance issues with workqueue-based service support. svo_enqueue_xprt is called multiple times per RPC, so it adds instruction path length, but always ends up at the same function: svc_xprt_do_enqueue(). We do not anticipate needing this flexibility for dynamic nfsd thread management support. As a micro-optimization, remove .svo_enqueue_xprt because Spectre/Meltdown makes virtual function calls more costly. This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operation"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-25SUNRPC/xprtrdma: Convert GFP_NOFS to GFP_KERNELTrond Myklebust
Assume that the upper layers have set memalloc_nofs_save/restore as appropriate. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25SUNRPC/auth_gss: Convert GFP_NOFS to GFP_KERNELTrond Myklebust
Assume that the upper layers have set memalloc_nofs_save/restore as appropriate. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25SUNRPC: Convert GFP_NOFS to GFP_KERNELTrond Myklebust
The sections which should not re-enter the filesystem are already protected with memalloc_nofs_save/restore calls, so it is better to use GFP_KERNEL in these calls to allow better performance for synchronous RPC calls. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25SUNRPC: remove redundant pointer plainhdrColin Ian King
[You don't often get email from colin.i.king@gmail.com. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] Pointer plainhdr is being assigned a value that is never read, the pointer is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08SUNRPC: lock against ->sock changing during sysfs readNeilBrown
->sock can be set to NULL asynchronously unless ->recv_mutex is held. So it is important to hold that mutex. Otherwise a sysfs read can trigger an oops. Commit 17f09d3f619a ("SUNRPC: Check if the xprt is connected before handling sysfs reads") appears to attempt to fix this problem, but it only narrows the race window. Fixes: 17f09d3f619a ("SUNRPC: Check if the xprt is connected before handling sysfs reads") Fixes: a8482488a7d6 ("SUNRPC query transport's source port") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_createDan Aloni
If there are failures then we must not leave the non-NULL pointers with the error value, otherwise `rpcrdma_ep_destroy` gets confused and tries free them, resulting in an Oops. Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-28Merge tag 'fsnotify_for_v5.17-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "Fixes for userspace breakage caused by fsnotify changes ~3 years ago and one fanotify cleanup" * tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: fix fsnotify hooks in pseudo filesystems fsnotify: invalidate dcache before IN_DELETE event fanotify: remove variable set but not used
2022-01-28SUNRPC: add netns refcount tracker to struct rpc_xprtEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-28SUNRPC: add netns refcount tracker to struct gss_authEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-28SUNRPC: add netns refcount tracker to struct svc_xprtEric Dumazet
struct svc_xprt holds a long lived reference to a netns, it is worth tracking it. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-25Merge tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Basic handling for case insensitive filesystems - Initial support for fs_locations and server trunking Bugfixes and Cleanups: - Cleanups to how the "struct cred *" is handled for the nfs_access_entry - Ensure the server has an up to date ctimes before hardlinking or renaming - Update 'blocks used' after writeback, fallocate, and clone - nfs_atomic_open() fixes - Improvements to sunrpc tracing - Various null check & indenting related cleanups - Some improvements to the sunrpc sysfs code: - Use default_groups in kobj_type - Fix some potential races and reference leaks - A few tracepoint cleanups in xprtrdma" [ This should have gone in during the merge window, but didn't. The original pull request - sent during the merge window - had gotten marked as spam and discarded due missing DKIM headers in the email from Anna. - Linus ] * tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (35 commits) SUNRPC: Don't dereference xprt->snd_task if it's a cookie xprtrdma: Remove definitions of RPCDBG_FACILITY xprtrdma: Remove final dprintk call sites from xprtrdma sunrpc: Fix potential race conditions in rpc_sysfs_xprt_state_change() net/sunrpc: fix reference count leaks in rpc_sysfs_xprt_state_change NFSv4.1 test and add 4.1 trunking transport SUNRPC allow for unspecified transport time in rpc_clnt_add_xprt NFSv4 handle port presence in fs_location server string NFSv4 expose nfs_parse_server_name function NFSv4.1 query for fs_location attr on a new file system NFSv4 store server support for fs_location attribute NFSv4 remove zero number of fs_locations entries error check NFSv4: nfs_atomic_open() can race when looking up a non-regular file NFSv4: Handle case where the lookup of a directory fails NFSv42: Fallocate and clone should also request 'blocks used' NFSv4: Allow writebacks to request 'blocks used' SUNRPC: use default_groups in kobj_type NFS: use default_groups in kobj_type NFS: Fix the verifier for case sensitive filesystem in nfs_atomic_open() NFS: Add a helper to remove case-insensitive aliases ...
2022-01-24fsnotify: fix fsnotify hooks in pseudo filesystemsAmir Goldstein
Commit 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify will have access to a positive dentry. This allowed a race where opening the deleted file via cached dentry is now possible after receiving the IN_DELETE event. To fix the regression in pseudo filesystems, convert d_delete() calls to d_drop() (see commit 46c46f8df9aa ("devpts_pty_kill(): don't bother with d_delete()") and move the fsnotify hook after d_drop(). Add a missing fsnotify_unlink() hook in nfsdfs that was found during the audit of fsnotify hooks in pseudo filesystems. Note that the fsnotify hooks in simple_recursive_removal() follow d_invalidate(), so they require no change. Link: https://lore.kernel.org/r/20220120215305.282577-2-amir73il@gmail.com Reported-by: Ivan Delalande <colona@arista.com> Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/ Fixes: 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-01-22proc: remove PDE_DATA() completelyMuchun Song
Remove PDE_DATA() completely and replace it with pde_data(). [akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c] [akpm@linux-foundation.org: now fix it properly] Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-16Merge tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Bruce has announced he is leaving Red Hat at the end of the month and is stepping back from his role as NFSD co-maintainer. As a result, this includes a patch removing him from the MAINTAINERS file. There is one patch in here that Jeff Layton was carrying in the locks tree. Since he had only one for this cycle, he asked us to send it to you via the nfsd tree. There continues to be 0-day reports from Robert Morris @MIT. This time we include a fix for a crash in the COPY_NOTIFY operation. Highlights: - Bruce steps down as NFSD maintainer - Prepare for dynamic nfsd thread management - More work on supporting re-exporting NFS mounts - One fs/locks patch on behalf of Jeff Layton Notable bug fixes: - Fix zero-length NFSv3 WRITEs - Fix directory cinfo on FS's that do not support iversion - Fix WRITE verifiers for stable writes - Fix crash on COPY_NOTIFY with a special state ID" * tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (51 commits) SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point fs/locks: fix fcntl_getlk64/fcntl_setlk64 stub prototypes nfsd: fix crash on COPY_NOTIFY with special stateid MAINTAINERS: remove bfields NFSD: Move fill_pre_wcc() and fill_post_wcc() Revert "nfsd: skip some unnecessary stats in the v4 case" NFSD: Trace boot verifier resets NFSD: Rename boot verifier functions NFSD: Clean up the nfsd_net::nfssvc_boot field NFSD: Write verifier might go backwards nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id) NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id) NFSD: Clean up nfsd_vfs_write() nfsd: Replace use of rwsem with errseq_t NFSD: Fix verifier returned in stable WRITEs nfsd: Retry once in nfsd_open on an -EOPENSTALE return nfsd: Add errno mapping for EREMOTEIO nfsd: map EBADF ...
2022-01-15mm: introduce memalloc_retry_wait()NeilBrown
Various places in the kernel - largely in filesystems - respond to a memory allocation failure by looping around and re-trying. Some of these cannot conveniently use __GFP_NOFAIL, for reasons such as: - a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on - a need to check for the process being signalled between failures - the possibility that other recovery actions could be performed - the allocation is quite deep in support code, and passing down an extra flag to say if __GFP_NOFAIL is wanted would be clumsy. Many of these currently use congestion_wait() which (in almost all cases) simply waits the given timeout - congestion isn't tracked for most devices. It isn't clear what the best delay is for loops, but it is clear that the various filesystems shouldn't be responsible for choosing a timeout. This patch introduces memalloc_retry_wait() with takes on that responsibility. Code that wants to retry a memory allocation can call this function passing the GFP flags that were used. It will wait however is appropriate. For now, it only considers __GFP_NORETRY and whatever gfpflags_allow_blocking() tests. If blocking is allowed without __GFP_NORETRY, then alloc_page either made some reclaim progress, or waited for a while, before failing. So there is no need for much further waiting. memalloc_retry_wait() will wait until the current jiffie ends. If this condition is not met, then alloc_page() won't have waited much if at all. In that case memalloc_retry_wait() waits about 200ms. This is the delay that most current loops uses. linux/sched/mm.h needs to be included in some files now, but linux/backing-dev.h does not. Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-14xprtrdma: Remove definitions of RPCDBG_FACILITYChuck Lever
Deprecated. dprintk is no longer used in xprtrdma. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-14xprtrdma: Remove final dprintk call sites from xprtrdmaChuck Lever
Deprecated. This information is available via tracepoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-13sunrpc: Fix potential race conditions in rpc_sysfs_xprt_state_change()Anna Schumaker
We need to use test_and_set_bit() when changing xprt state flags to avoid potentially getting xps->xps_nactive out of sync. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-13net/sunrpc: fix reference count leaks in rpc_sysfs_xprt_state_changeXiyu Yang
The refcount leak issues take place in an error handling path. When the 3rd argument buf doesn't match with "offline", "online" or "remove", the function simply returns -EINVAL and forgets to decrease the reference count of a rpc_xprt object and a rpc_xprt_switch object increased by rpc_sysfs_xprt_kobj_get_xprt() and rpc_sysfs_xprt_kobj_get_xprt_switch(), causing reference count leaks of both unused objects. Fix this issue by jumping to the error handling path labelled with out_put when buf matches none of "offline", "online" or "remove". Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>