summaryrefslogtreecommitdiff
path: root/fs/nfsd
AgeCommit message (Collapse)Author
2025-05-28Merge tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "The marquee feature for this release is that the limit on the maximum rsize and wsize has been raised to 4MB. The default remains at 1MB, but risk-seeking administrators now have the ability to try larger I/O sizes with NFS clients that support them. Eventually the default setting will be increased when we have confidence that this change will not have negative impact. With v6.16, NFSD now has its own debugfs file system where we can add experimental features and make them available outside of our development community without impacting production deployments. The first experimental setting added is one that makes all NFS READ operations use vfs_iter_read() instead of the NFSD splice actor. The plan is to eventually retire the splice actor, as that will enable a number of new capabilities such as the use of struct bio_vec from the top to the bottom of the NFSD stack. Jeff Layton contributed a number of observability improvements. The use of dprintk() in a number of high-traffic code paths has been replaced with static trace points. This release sees the continuation of efforts to harden the NFSv4.2 COPY operation. Soon, the restriction on async COPY operations can be lifted. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.16 development cycle" * tag 'nfsd-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (60 commits) xdrgen: Fix code generated for counted arrays SUNRPC: Bump the maximum payload size for the server NFSD: Add a "default" block size NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macro NFSD: Remove NFSD_BUFSIZE sunrpc: Remove the RPCSVC_MAXPAGES macro svcrdma: Adjust the number of entries in svc_rdma_send_ctxt::sc_pages svcrdma: Adjust the number of entries in svc_rdma_recv_ctxt::rc_pages sunrpc: Adjust size of socket's receive page array dynamically SUNRPC: Remove svc_rqst :: rq_vec SUNRPC: Remove svc_fill_write_vector() NFSD: Use rqstp->rq_bvec in nfsd_iter_write() SUNRPC: Export xdr_buf_to_bvec() NFSD: De-duplicate the svc_fill_write_vector() call sites NFSD: Use rqstp->rq_bvec in nfsd_iter_read() sunrpc: Replace the rq_bvec array with dynamically-allocated memory sunrpc: Replace the rq_pages array with dynamically-allocated memory sunrpc: Remove backchannel check in svc_init_buffer() sunrpc: Add a helper to derive maxpages from sv_max_mesg svcrdma: Reduce the number of rdma_rw contexts per-QP ...
2025-05-26Merge tag 'vfs-6.16-rc1.async.dir' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs directory lookup updates from Christian Brauner: "This contains cleanups for the lookup_one*() family of helpers. We expose a set of functions with names containing "lookup_one_len" and others without the "_len". This difference has nothing to do with "len". It's rater a historical accident that can be confusing. The functions without "_len" take a "mnt_idmap" pointer. This is found in the "vfsmount" and that is an important question when choosing which to use: do you have a vfsmount, or are you "inside" the filesystem. A related question is "is permission checking relevant here?". nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len functions. They pass nop_mnt_idmap and refuse to work on filesystems which have any other idmap. This work changes nfsd and cachefile to use the lookup_one family of functions and to explictily pass &nop_mnt_idmap which is consistent with all other vfs interfaces used where &nop_mnt_idmap is explicitly passed. The remaining uses of the "_one" functions do not require permission checks so these are renamed to be "_noperm" and the permission checking is removed. This series also changes these lookup function to take a qstr instead of separate name and len. In many cases this simplifies the call" * tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: change lookup_one_common and lookup_noperm_common to take a qstr Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS VFS: rename lookup_one_len family to lookup_noperm and remove permission check cachefiles: Use lookup_one() rather than lookup_one_len() nfsd: Use lookup_one() rather than lookup_one_len() VFS: improve interface for lookup_one functions
2025-05-15NFSD: Add a "default" block sizeChuck Lever
We'd like to increase the maximum r/wsize that NFSD can support, but without introducing possible regressions. So let's add a default setting of 1MB. A subsequent patch will raise the maximum value but leave the default alone. No behavior change is expected. Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macroChuck Lever
The 8192-byte maximum is a protocol-defined limit, and we already have a symbolic constant defined whose name matches the name of the limit defined in the protocol. Replace the duplicate. No change in behavior is expected. Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: Remove NFSD_BUFSIZEChuck Lever
Clean up: The documenting comment for NFSD_BUFSIZE is quite stale. NFSD_BUFSIZE is used only for NFSv4 Reply these days; never for NFSv2 or v3, and never for RPC Calls. Even so, the byte count estimate does not include the size of the NFSv4 COMPOUND Reply HEADER or the RPC auth flavor. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: Use rqstp->rq_bvec in nfsd_iter_write()Chuck Lever
If we can get rid of all uses of rq_vec, then it can be removed. Replace one use of rqstp::rq_vec with rqstp::rq_bvec. The feeling of layering violation grows stronger now that <linux/sunrpc/xdr.h> is included in fs/nfsd/vfs.c. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: De-duplicate the svc_fill_write_vector() call sitesChuck Lever
All three call sites do the same thing. I'm struggling with this a bit, however. struct xdr_buf is an XDR layer object and unmarshaling a WRITE payload is clearly a task intended to be done by the proc and xdr functions, not by VFS. This feels vaguely like a layering violation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: Use rqstp->rq_bvec in nfsd_iter_read()Chuck Lever
If we can get rid of all uses of rq_vec, then it can be removed. Replace one use of rqstp::rq_vec with rqstp::rq_bvec. Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: remove legacy dprintks from GETATTR and STATFS codepathsJeff Layton
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: remove legacy READDIR dprintksJeff Layton
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: remove dprintks for v2/3 RENAME eventsJeff Layton
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: remove REMOVE/RMDIR dprintksJeff Layton
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: remove old LINK dprintksJeff Layton
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: remove old v2/3 SYMLINK dprintksJeff Layton
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: remove old v2/3 create path dprintksJeff Layton
Observability here is now covered by static tracepoints. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add tracepoint for getattr and statfs eventsJeff Layton
There isn't a common helper for getattrs, so add these into the protocol-specific helpers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add tracepoint to nfsd_readdirJeff Layton
Observe the start of NFS READDIR operations. The NFS READDIR's count argument can be interesting when tuning a client's readdir behavior. However, the count argument is not passed to nfsd_readdir(). To properly capture the count argument, this tracepoint must appear in each proc function before the nfsd_readdir() call. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add tracepoint to nfsd_renameJeff Layton
Observe the start of RENAME operations for all NFS versions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add tracepoints for unlink eventsJeff Layton
Observe the start of UNLINK, REMOVE, and RMDIR operations for all NFS versions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add tracepoint to nfsd_link()Jeff Layton
Observe the start of NFS LINK operations. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add tracepoint to nfsd_symlinkJeff Layton
Observe the start of SYMLINK operations for all NFS versions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add nfsd_vfs_create tracepointsJeff Layton
Observe the start of file and directory creation for all NFS versions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add a tracepoint to nfsd_lookup_dentryJeff Layton
Replace the dprintk in nfsd_lookup_dentry() with a trace point. nfsd_lookup_dentry() is called frequently enough that enabling this dprintk call site would result in log floods and performance issues. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add a tracepoint for nfsd_setattrJeff Layton
Turn Sargun's internal kprobe based implementation of this into a normal static tracepoint. Also, remove the dprintk's that got added recently with the fix for zero-length ACLs. Cc: Sargun Dillon <sargun@sargun.me> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Add a Call equivalent to the NFSD_TRACE_PROC_RES macrosChuck Lever
Introduce tracing helpers that can be used before the procedure status code is known. These macros are similar to the SVC_RQST_ENDPOINT helpers, but they can be modified to include NFS-specific fields if that is needed later. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Use sockaddr instead of a generic arrayChuck Lever
Record and emit presentation addresses using tracing helpers designed for the task. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Implement FATTR4_CLONE_BLKSIZE attributeChuck Lever
RFC 7862 states that if an NFS server implements a CLONE operation, it MUST also implement FATTR4_CLONE_BLKSIZE. NFSD implements CLONE, but does not implement FATTR4_CLONE_BLKSIZE. Note that in Section 12.2, RFC 7862 claims that FATTR4_CLONE_BLKSIZE is RECOMMENDED, not REQUIRED. Likely this is because a minor version is not permitted to add a REQUIRED attribute. Confusing. We assume this attribute reports a block size as a count of bytes, as RFC 7862 does not specify a unit. Reported-by: Roland Mainz <roland.mainz@nrubsig.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org # v6.7+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: use SHA-256 library API instead of crypto_shash APIEric Biggers
This user of SHA-256 does not support any other algorithm, so the crypto_shash abstraction provides no value. Just use the SHA-256 library API instead, which is much simpler and easier to use. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: Initialize ssc before laundromat_work to prevent NULL dereferenceLi Lingfeng
In nfs4_state_start_net(), laundromat_work may access nfsd_ssc through nfs4_laundromat -> nfsd4_ssc_expire_umount. If nfsd_ssc isn't initialized, this can cause NULL pointer dereference. Normally the delayed start of laundromat_work allows sufficient time for nfsd_ssc initialization to complete. However, when the kernel waits too long for userspace responses (e.g. in nfs4_state_start_net -> nfsd4_end_grace -> nfsd4_record_grace_done -> nfsd4_cld_grace_done -> cld_pipe_upcall -> __cld_pipe_upcall -> wait_for_completion path), the delayed work may start before nfsd_ssc initialization finishes. Fix this by moving nfsd_ssc initialization before starting laundromat_work. Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: add commit start/done tracepoints around nfsd_commit()Jeff Layton
Very useful for gauging how long the vfs_fsync_range() takes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: nfsd4_spo_must_allow() must check this is a v4 compound requestNeilBrown
If the request being processed is not a v4 compound request, then examining the cstate can have undefined results. This patch adds a check that the rpc procedure being executed (rq_procinfo) is the NFSPROC4_COMPOUND procedure. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: fix access checking for NLM under XPRTSEC policiesOlga Kornievskaia
When an export policy with xprtsec policy is set with "tls" and/or "mtls", but an NFS client is doing a v3 xprtsec=tls mount, then NLM locking calls fail with an error because there is currently no support for NLM with TLS. Until such support is added, allow NLM calls under TLS-secured policy. Fixes: 4cc9b9f2bf4d ("nfsd: refine and rename NFSD_MAY_LOCK") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11nfsd: remove redundant WARN_ON_ONCE in nfsd4_writeGuoqing Jiang
It can be removed since svc_fill_write_vector already has the same WARN_ON_ONCE. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Add experimental setting to disable the use of splice readChuck Lever
NFSD currently has two separate code paths for handling read requests. One uses page splicing; the other is a traditional read based on an iov iterator. Because most Linux file systems support splice read, the latter does not get nearly the same test experience as splice reads. To force the use of vectored reads for testing and benchmarking, introduce the ability to disable splice reads for all NFS READ operations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Add /sys/kernel/debug/nfsdChuck Lever
Create a small sandbox under /sys/kernel/debug for experimental NFS server feature settings. There is no API/ABI compatibility guarantee for these settings. The only documentation for such settings, if any documentation exists, is in the kernel source code. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: fix race between nfsd registration and exports_procManinder Singh
As of now nfsd calls create_proc_exports_entry() at start of init_nfsd and cleanup by remove_proc_entry() at last of exit_nfsd. Which causes kernel OOPs if there is race between below 2 operations: (i) exportfs -r (ii) mount -t nfsd none /proc/fs/nfsd for 5.4 kernel ARM64: CPU 1: el1_irq+0xbc/0x180 arch_counter_get_cntvct+0x14/0x18 running_clock+0xc/0x18 preempt_count_add+0x88/0x110 prep_new_page+0xb0/0x220 get_page_from_freelist+0x2d8/0x1778 __alloc_pages_nodemask+0x15c/0xef0 __vmalloc_node_range+0x28c/0x478 __vmalloc_node_flags_caller+0x8c/0xb0 kvmalloc_node+0x88/0xe0 nfsd_init_net+0x6c/0x108 [nfsd] ops_init+0x44/0x170 register_pernet_operations+0x114/0x270 register_pernet_subsys+0x34/0x50 init_nfsd+0xa8/0x718 [nfsd] do_one_initcall+0x54/0x2e0 CPU 2 : Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 PC is at : exports_net_open+0x50/0x68 [nfsd] Call trace: exports_net_open+0x50/0x68 [nfsd] exports_proc_open+0x2c/0x38 [nfsd] proc_reg_open+0xb8/0x198 do_dentry_open+0x1c4/0x418 vfs_open+0x38/0x48 path_openat+0x28c/0xf18 do_filp_open+0x70/0xe8 do_sys_open+0x154/0x248 Sometimes it crashes at exports_net_open() and sometimes cache_seq_next_rcu(). and same is happening on latest 6.14 kernel as well: [ 0.000000] Linux version 6.14.0-rc5-next-20250304-dirty ... [ 285.455918] Unable to handle kernel paging request at virtual address 00001f4800001f48 ... [ 285.464902] pc : cache_seq_next_rcu+0x78/0xa4 ... [ 285.469695] Call trace: [ 285.470083] cache_seq_next_rcu+0x78/0xa4 (P) [ 285.470488] seq_read+0xe0/0x11c [ 285.470675] proc_reg_read+0x9c/0xf0 [ 285.470874] vfs_read+0xc4/0x2fc [ 285.471057] ksys_read+0x6c/0xf4 [ 285.471231] __arm64_sys_read+0x1c/0x28 [ 285.471428] invoke_syscall+0x44/0x100 [ 285.471633] el0_svc_common.constprop.0+0x40/0xe0 [ 285.471870] do_el0_svc_compat+0x1c/0x34 [ 285.472073] el0_svc_compat+0x2c/0x80 [ 285.472265] el0t_32_sync_handler+0x90/0x140 [ 285.472473] el0t_32_sync+0x19c/0x1a0 [ 285.472887] Code: f9400885 93407c23 937d7c27 11000421 (f86378a3) [ 285.473422] ---[ end trace 0000000000000000 ]--- It reproduced simply with below script: while [ 1 ] do /exportfs -r done & while [ 1 ] do insmod /nfsd.ko mount -t nfsd none /proc/fs/nfsd umount /proc/fs/nfsd rmmod nfsd done & So exporting interfaces to user space shall be done at last and cleanup at first place. With change there is no Kernel OOPs. Co-developed-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Shubham Rana <s9.rana@samsung.com> Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: unregister filesystem in case genl_register_family() failsManinder Singh
With rpc_status netlink support, unregister of register_filesystem() was missed in case of genl_register_family() fails. Correcting it by making new label. Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support") Cc: stable@vger.kernel.org Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Record each NFSv4 call's session slot indexChuck Lever
Help the client resolve the race between the reply to an asynchronous COPY reply and the associated CB_OFFLOAD callback by planting the session, slot, and sequence number of the COPY in the CB_SEQUENCE contained in the CB_OFFLOAD COMPOUND. Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Implement CB_SEQUENCE referring call listsChuck Lever
The slot index number of the current COMPOUND has, until now, not been needed outside of nfsd4_sequence(). But to record the tuple that represents a referring call, the slot number will be needed when processing subsequent operations in the COMPOUND. Refactor the code that allocates a new struct nfsd4_slot to ensure that the new sl_index field is always correctly initialized. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Implement CB_SEQUENCE referring call listsChuck Lever
We have yet to implement a mechanism in NFSD for resolving races between a server's reply and a related callback operation. For example, a CB_OFFLOAD callback can race with the matching COPY response. The client will not recognize the copy state ID in the CB_OFFLOAD callback until the COPY response arrives. Trond adds: > It is also needed for the same kind of race with delegation > recalls, layout recalls, CB_NOTIFY_DEVICEID and would also be > helpful (although not as strongly required) for CB_NOTIFY_LOCK. RFC 8881 Section 20.9.3 describes referring call lists this way: > The csa_referring_call_lists array is the list of COMPOUND > requests, identified by session ID, slot ID, and sequence ID. > These are requests that the client previously sent to the server. > These previous requests created state that some operation(s) in > the same CB_COMPOUND as the csa_referring_call_lists are > identifying. A session ID is included because leased state is tied > to a client ID, and a client ID can have multiple sessions. See > Section 2.10.6.3. Introduce the XDR infrastructure for populating the csa_referring_call_lists argument of CB_SEQUENCE. Subsequent patches will put the referring call list to use. Note that cb_sequence_enc_sz estimates that only zero or one rcl is included in each CB_SEQUENCE, but the new infrastructure can manage any number of referring calls. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: Shorten CB_OFFLOAD response to NFS4ERR_DELAYChuck Lever
Try not to prolong the wait for completion of a COPY or COPY_NOTIFY operation. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11NFSD: OFFLOAD_CANCEL should mark an async COPY as completedChuck Lever
Update the status of an async COPY operation when it has been stopped. OFFLOAD_STATUS needs to indicate that the COPY is no longer running. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-19Merge tag 'nfsd-6.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - v6.15 libcrc clean-up makes invalid configurations possible - Fix a potential deadlock introduced during the v6.15 merge window * tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: decrease sc_count directly if fail to queue dl_recall nfs: add missing selections of CONFIG_CRC32
2025-04-13nfsd: decrease sc_count directly if fail to queue dl_recallLi Lingfeng
A deadlock warning occurred when invoking nfs4_put_stid following a failed dl_recall queue operation: T1 T2 nfs4_laundromat nfs4_get_client_reaplist nfs4_anylock_blockers __break_lease spin_lock // ctx->flc_lock spin_lock // clp->cl_lock nfs4_lockowner_has_blockers locks_owner_has_blockers spin_lock // flctx->flc_lock nfsd_break_deleg_cb nfsd_break_one_deleg nfs4_put_stid refcount_dec_and_lock spin_lock // clp->cl_lock When a file is opened, an nfs4_delegation is allocated with sc_count initialized to 1, and the file_lease holds a reference to the delegation. The file_lease is then associated with the file through kernel_setlease. The disassociation is performed in nfsd4_delegreturn via the following call chain: nfsd4_delegreturn --> destroy_delegation --> destroy_unhashed_deleg --> nfs4_unlock_deleg_lease --> kernel_setlease --> generic_delete_lease The corresponding sc_count reference will be released after this disassociation. Since nfsd_break_one_deleg executes while holding the flc_lock, the disassociation process becomes blocked when attempting to acquire flc_lock in generic_delete_lease. This means: 1) sc_count in nfsd_break_one_deleg will not be decremented to 0; 2) The nfs4_put_stid called by nfsd_break_one_deleg will not attempt to acquire cl_lock; 3) Consequently, no deadlock condition is created. Given that sc_count in nfsd_break_one_deleg remains non-zero, we can safely perform refcount_dec on sc_count directly. This approach effectively avoids triggering deadlock warnings. Fixes: 230ca758453c ("nfsd: put dl_stid if fail to queue dl_recall") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-13nfs: add missing selections of CONFIG_CRC32Eric Biggers
nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available only when CONFIG_CRC32 is enabled. But the only NFS kconfig option that selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and did not actually guard the use of crc32_le() even on the client. The code worked around this bug by only actually calling crc32_le() when CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases. This avoided randconfig build errors, and in real kernels the fallback code was unlikely to be reached since CONFIG_CRC32 is 'default y'. But, this really needs to just be done properly, especially now that I'm planning to update CONFIG_CRC32 to not be 'default y'. Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select CONFIG_CRC32. Then remove the fallback code that becomes unnecessary, as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG. Fixes: 1264a2f053a3 ("NFS: refactor code for calculating the crc32 hash of a filehandle") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-07nfsd: Use lookup_one() rather than lookup_one_len()NeilBrown
nfsd uses some VFS interfaces (such as vfs_mkdir) which take an explicit mnt_idmap, and it passes &nop_mnt_idmap as nfsd doesn't yet support idmapped mounts. It also uses the lookup_one_len() family of functions which implicitly use &nop_mnt_idmap. This mixture of implicit and explicit could be confusing. When we eventually update nfsd to support idmap mounts it would be best if all places which need an idmap determined from the mount point were similar and easily found. So this patch changes nfsd to use lookup_one(), lookup_one_unlocked(), and lookup_one_positive_unlocked(), passing &nop_mnt_idmap. This has the benefit of removing some uses of the lookup_one_len functions where permission checking is actually needed. Many callers don't care about permission checking and using these function only where permission checking is needed is a valuable simplification. This change requires passing the name in a qstr. Currently this is a little clumsy, but if nfsd is changed to use qstr more broadly it will result in a net improvement. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-3-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-31Merge tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Neil Brown contributed more scalability improvements to NFSD's open file cache, and Jeff Layton contributed a menagerie of repairs to NFSD's NFSv4 callback / backchannel implementation. Mike Snitzer contributed a change to NFS re-export support that disables support for file locking on a re-exported NFSv4 mount. This is because NFSv4 state recovery is currently difficult if not impossible for re-exported NFS mounts. The change aims to prevent data integrity exposures after the re-export server crashes. Work continues on the evolving NFSD netlink administrative API. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.15 development cycle" * tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits) NFSD: Add a Kconfig setting to enable delegated timestamps sysctl: Fixes nsm_local_state bounds nfsd: use a long for the count in nfsd4_state_shrinker_count() nfsd: remove obsolete comment from nfs4_alloc_stid nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault() nfsd: reorganize struct nfs4_delegation for better packing nfsd: handle errors from rpc_call_async() nfsd: move cb_need_restart flag into cb_flags nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY nfsd: prevent callback tasks running concurrently nfsd: disallow file locking and delegations for NFSv4 reexport nfsd: filecache: drop the list_lru lock during lock gc scans nfsd: filecache: don't repeatedly add/remove files on the lru list nfsd: filecache: introduce NFSD_FILE_RECENT nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc() nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync() NFSD: Re-organize nfsd_file_gc_worker() nfsd: filecache: remove race handling. fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning ...
2025-03-14NFSD: Add a Kconfig setting to enable delegated timestampsChuck Lever
After three tries, we still see test failures with delegated timestamps. Disable them by default, but leave the implementation intact so that development can continue. Cc: stable@vger.kernel.org # v6.14 Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10nfsd: use a long for the count in nfsd4_state_shrinker_count()Jeff Layton
If there are no courtesy clients then the return value from the atomic_long_read() could overflow an int. Use a long to store the value instead. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10nfsd: remove obsolete comment from nfs4_alloc_stidJeff Layton
idr_alloc_cyclic() is what guarantees that now, not this long-gone trick. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>