summaryrefslogtreecommitdiff
path: root/net/sunrpc/clnt.c
AgeCommit message (Collapse)Author
2025-03-26SUNRPC: rpc_clnt_set_transport() must not change the autobind settingTrond Myklebust
The autobind setting was supposed to be determined in rpc_create(), since commit c2866763b402 ("SUNRPC: use sockaddr + size when creating remote transport endpoints"). Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-21NFS: Treat ENETUNREACH errors as fatal in containersTrond Myklebust
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to work with the generic NFS client. If the flag is set, the client will receive ENETDOWN and ENETUNREACH errors from the RPC layer, and is expected to treat them as being fatal. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-22SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expiredDai Ngo
When a user TGT ticket expired, gssd returns EKEYEXPIRED to the RPC layer for the upcall to create the security context. The RPC layer then retries the upcall twice before returning the EKEYEXPIRED to the NFS layer. This results in three separate TCP connections to the NFS server being created by gssd for each RPC request. These connections are not used and left in TIME_WAIT state. Note that for RPC call that uses machine credential, gssd automatically renews the ticket. But for a regular user the ticket needs to be renewed by the user before access to the krb5 share is allowed. This patch removes the retries by RPC on EKEYEXPIRED so that these unused TCP connections are not created. Reproducer: $ kinit -l 1m $ sleep 65 $ cd /mnt/krb5share $ netstat -na |grep TIME_WAIT Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-13SUNRPC: display total RPC tasks for RPC clientDai Ngo
Display the total number of RPC tasks, including tasks waiting on workqueue and wait queues, for rpc_clnt. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-13SUNRPC: only put task on cl_tasks list after the RPC call slot is reserved.Dai Ngo
Under heavy write load, we've seen the cl_tasks list grows to millions of entries. Even though the list is extremely long, the system still runs fine until the user wants to get the information of all active RPC tasks by doing: When this happens, tasks_start acquires the cl_lock to walk the cl_tasks list, returning one entry at a time to the caller. The cl_lock is held until all tasks on this list have been processed. While the cl_lock is held, completed RPC tasks have to spin wait in rpc_task_release_client for the cl_lock. If there are millions of entries in the cl_tasks list it will take a long time before tasks_stop is called and the cl_lock is released. The spin wait tasks can use up all the available CPUs in the system, preventing other jobs to run, this causes the system to temporarily lock up. This patch fixes this problem by delaying inserting the RPC task on the cl_tasks list until the RPC call slot is reserved. This limits the length of the cl_tasks to the number of call slots available in the system. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: remove call_allocate() BUG_ONsMike Snitzer
Remove BUG_ON if p_arglen=0 to allow RPC with void arg. Remove BUG_ON if p_replen=0 to allow RPC with void return. The former was needed for the first revision of the LOCALIO protocol which had an RPC that took a void arg: /* raw RFC 9562 UUID */ typedef u8 uuid_t<UUID_SIZE>; program NFS_LOCALIO_PROGRAM { version LOCALIO_V1 { void NULL(void) = 0; uuid_t GETUUID(void) = 1; } = 1; } = 400122; The latter is needed for the final revision of the LOCALIO protocol which has a UUID_IS_LOCAL RPC which returns a void: /* raw RFC 9562 UUID */ typedef u8 uuid_t<UUID_SIZE>; program NFS_LOCALIO_PROGRAM { version LOCALIO_V1 { void NULL(void) = 0; void UUID_IS_LOCAL(uuid_t) = 1; } = 1; } = 400122; There is really no value in triggering a BUG_ON in response to either of these previously unsupported conditions. NeilBrown would like the entire 'if (proc->p_proc != 0)' branch removed (not just the one BUG_ON that must be removed for LOCALIO's immediate needs of returning void). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: clnt.c: Remove misleading commentSiddh Raman Pant
destroy_wait doesn't store all RPC clients. There was a list named "all_clients" above it, which got moved to struct sunrpc_net in 2012, but the comment was never removed. Fixes: 70abc49b4f4a ("SUNRPC: make SUNPRC clients list per network namespace context") Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23SUNRPC: Fix -Wformat-truncation warningKunwu Chan
Increase size of the servername array to avoid truncated output warning. net/sunrpc/clnt.c:582:75: error:‘%s’ directive output may be truncated writing up to 107 bytes into a region of size 48 [-Werror=format-truncation=] 582 | snprintf(servername, sizeof(servername), "%s", | ^~ net/sunrpc/clnt.c:582:33: note:‘snprintf’ output between 1 and 108 bytes into a destination of size 48 582 | snprintf(servername, sizeof(servername), "%s", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 583 | sun->sun_path); Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-07-08SUNRPC: avoid soft lockup when transmitting UDP to reachable server.NeilBrown
Prior to the commit identified below, call_transmit_status() would handle -EPERM and other errors related to an unreachable server by falling through to call_status() which added a 3-second delay and handled the failure as a timeout. Since that commit, call_transmit_status() falls through to handle_bind(). For UDP this moves straight on to handle_connect() and handle_transmit() so we immediately retransmit - and likely get the same error. This results in an indefinite loop in __rpc_execute() which triggers a soft-lockup warning. For the errors that indicate an unreachable server, call_transmit_status() should fall back to call_status() as it did before. This cannot cause the thundering herd that the previous patch was avoiding, as the call_status() will insert a delay. Fixes: ed7dc973bd91 ("SUNRPC: Prevent thundering herd when the socket is not connected") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-05-20sunrpc: fix NFSACL RPC retry on soft mountDan Aloni
It used to be quite awhile ago since 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()'), in 2012, that `cl_timeout` was copied in so that all mount parameters propagate to NFSACL clients. However since that change, if mount options as follows are given: soft,timeo=50,retrans=16,vers=3 The resultant NFSACL client receives: cl_softrtry: 1 cl_timeout: to_initval=60000, to_maxval=60000, to_increment=0, to_retries=2, to_exponential=0 These values lead to NFSACL operations not being retried under the condition of transient network outages with soft mount. Instead, getacl call fails after 60 seconds with EIO. The simple fix is to pass the existing client's `cl_timeout` as the new client timeout. Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/all/20231105154857.ryakhmgaptq3hb6b@gmail.com/T/ Fixes: 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()') Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-05-20SUNRPC: fix handling expired GSS contextOlga Kornievskaia
In the case where we have received a successful reply to an RPC request, but while processing the reply the client in rpc_decode_header() finds an expired context, the code ends up propagating the error to the caller instead of getting a new context and retrying the request. To give more details, in rpc_decode_header() we call rpcauth_checkverf() will call into the gss and internally will at some point call gss_validate() which has a check if the current’s context lifetime expired, and it would fail. The reason for the failure gets ‘scrubbed’ and translated to EACCES so when we get back to rpc_decode_header() we just go to “out_verifier” which for that error would get converted to “out_garbage” (ie it’s treated as garballed reply) and the next action is call_encode. Which (1) doesn’t reencode or re-send (not to mention no upcall happens because context expires as that reason just not known) and it again fails in the same decoding process. After re-trying it 3 times the error is propagated back to the caller (ie nfs4_write_done_cb() in the case a failing write). To fix this, instead we need to look to the case where the server decides that context has expired and replies with an RPC auth error. In that case, the rpc_decode_header() goes to "out_msg_denied" in that we return EKEYREJECTED which in call_decode() is sent to “call_reserve” which triggers an upcalls and a re-try of the operation. The proposed fix is in case of a failed rpc_decode_header() to check if credentials were set to be invalid and use that as a proxy for deciding that context has expired and then treat is same way as receiving an auth error. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-03-09sunrpc: add a struct rpc_stats arg to rpc_create_argsJosef Bacik
We want to be able to have our rpc stats handled in a per network namespace manner, so add an option to rpc_create_args to specify a different rpc_stats struct instead of using the one on the rpc_program. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-01-10Merge tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull nfs client updates from Anna Schumaker: "New Features: - Always ask for type with READDIR - Remove nfs_writepage() Bugfixes: - Fix a suspicious RCU usage warning - Fix a blocklayoutdriver reference leak - Fix the block driver's calculation of layoutget size - Fix handling NFS4ERR_RETURNCONFLICT - Fix _xprt_switch_find_current_entry() - Fix v4.1 backchannel request timeouts - Don't add zero-length pnfs block devices - Use the parent cred in nfs_access_login_time() Cleanups: - A few improvements when dealing with referring calls from the server - Clean up various unused variables, struct fields, and function calls - Various tracepoint improvements" * tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (21 commits) NFSv4.1: Use the nfs_client's rpc timeouts for backchannel SUNRPC: Fixup v4.1 backchannel request timeouts rpc_pipefs: Replace one label in bl_resolve_deviceid() nfs: Remove writepage NFS: drop unused nfs_direct_req bytes_left pNFS: Fix the pnfs block driver's calculation of layoutget size nfs: print fileid in lookup tracepoints nfs: rename the nfs_async_rename_done tracepoint nfs: add new tracepoint at nfs4 revalidate entry point SUNRPC: fix _xprt_switch_find_current_entry logic NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICT NFSv4.1: if referring calls are complete, trust the stateid argument NFSv4: Track the number of referring calls in struct cb_process_state NFS: Use parent's objective cred in nfs_access_login_time() NFSv4: Always ask for type with READDIR pnfs/blocklayout: Don't add zero-length pnfs_block_dev blocklayoutdriver: Fix reference leak of pnfs_device_node SUNRPC: Fix a suspicious RCU usage warning SUNRPC: Create a helper function for accessing the rpc_clnt's xprt_switch SUNRPC: Remove unused function rpc_clnt_xprt_switch_put() ...
2024-01-04NFSv4.1: Use the nfs_client's rpc timeouts for backchannelBenjamin Coddington
For backchannel requests that lookup the appropriate nfs_client, use the state-management rpc_clnt's rpc_timeout parameters for the backchannel's response. When the nfs_client cannot be found, fall back to using the xprt's default timeout parameters. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04SUNRPC: Create a helper function for accessing the rpc_clnt's xprt_switchAnna Schumaker
This function takes the necessary rcu read lock to dereference the client's rpc_xprt_switch and bump the reference count so it doesn't disappear underneath us before returning. This does mean that callers are responsible for calling xprt_switch_put() on the returned object when they are done with it. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04SUNRPC: Remove unused function rpc_clnt_xprt_switch_put()Anna Schumaker
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04SUNRPC: Clean up unused variable in rpc_xprt_probe_trunked()Anna Schumaker
We don't use the rpc_xprt_switch anywhere in this function, so let's not take an extra reference to in unnecessarily. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-11-30SUNRPC: Replace strlcpy() with strscpy()Kees Cook
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated[1]. Additionally, it returns the size of the source string, not the resulting size of the destination string. In an effort to remove strlcpy() completely[2], replace strlcpy() here with strscpy(). Explicitly handle the truncation case by returning the size of the resulting string. If "nodename" was ever longer than sizeof(clnt->cl_nodename) - 1, this change will fix a bug where clnt->cl_nodelen would end up thinking there were more characters in clnt->cl_nodename than there actually were, which might have lead to kernel memory content exposures. Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Neil Brown <neilb@suse.de> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Tom Talpey <tom@talpey.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-nfs@vger.kernel.org Cc: netdev@vger.kernel.org Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [1] Link: https://github.com/KSPP/linux/issues/89 [2] Co-developed-by: Azeem Shaikh <azeemshaikh38@gmail.com> Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20231114175407.work.410-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2023-11-01SUNRPC: Fix RPC client cleaned up the freed pipefs dentriesfelix
RPC client pipefs dentries cleanup is in separated rpc_remove_pipedir() workqueue,which takes care about pipefs superblock locking. In some special scenarios, when kernel frees the pipefs sb of the current client and immediately alloctes a new pipefs sb, rpc_remove_pipedir function would misjudge the existence of pipefs sb which is not the one it used to hold. As a result, the rpc_remove_pipedir would clean the released freed pipefs dentries. To fix this issue, rpc_remove_pipedir should check whether the current pipefs sb is consistent with the original pipefs sb. This error can be catched by KASAN: ========================================================= [ 250.497700] BUG: KASAN: slab-use-after-free in dget_parent+0x195/0x200 [ 250.498315] Read of size 4 at addr ffff88800a2ab804 by task kworker/0:18/106503 [ 250.500549] Workqueue: events rpc_free_client_work [ 250.501001] Call Trace: [ 250.502880] kasan_report+0xb6/0xf0 [ 250.503209] ? dget_parent+0x195/0x200 [ 250.503561] dget_parent+0x195/0x200 [ 250.503897] ? __pfx_rpc_clntdir_depopulate+0x10/0x10 [ 250.504384] rpc_rmdir_depopulate+0x1b/0x90 [ 250.504781] rpc_remove_client_dir+0xf5/0x150 [ 250.505195] rpc_free_client_work+0xe4/0x230 [ 250.505598] process_one_work+0x8ee/0x13b0 ... [ 22.039056] Allocated by task 244: [ 22.039390] kasan_save_stack+0x22/0x50 [ 22.039758] kasan_set_track+0x25/0x30 [ 22.040109] __kasan_slab_alloc+0x59/0x70 [ 22.040487] kmem_cache_alloc_lru+0xf0/0x240 [ 22.040889] __d_alloc+0x31/0x8e0 [ 22.041207] d_alloc+0x44/0x1f0 [ 22.041514] __rpc_lookup_create_exclusive+0x11c/0x140 [ 22.041987] rpc_mkdir_populate.constprop.0+0x5f/0x110 [ 22.042459] rpc_create_client_dir+0x34/0x150 [ 22.042874] rpc_setup_pipedir_sb+0x102/0x1c0 [ 22.043284] rpc_client_register+0x136/0x4e0 [ 22.043689] rpc_new_client+0x911/0x1020 [ 22.044057] rpc_create_xprt+0xcb/0x370 [ 22.044417] rpc_create+0x36b/0x6c0 ... [ 22.049524] Freed by task 0: [ 22.049803] kasan_save_stack+0x22/0x50 [ 22.050165] kasan_set_track+0x25/0x30 [ 22.050520] kasan_save_free_info+0x2b/0x50 [ 22.050921] __kasan_slab_free+0x10e/0x1a0 [ 22.051306] kmem_cache_free+0xa5/0x390 [ 22.051667] rcu_core+0x62c/0x1930 [ 22.051995] __do_softirq+0x165/0x52a [ 22.052347] [ 22.052503] Last potentially related work creation: [ 22.052952] kasan_save_stack+0x22/0x50 [ 22.053313] __kasan_record_aux_stack+0x8e/0xa0 [ 22.053739] __call_rcu_common.constprop.0+0x6b/0x8b0 [ 22.054209] dentry_free+0xb2/0x140 [ 22.054540] __dentry_kill+0x3be/0x540 [ 22.054900] shrink_dentry_list+0x199/0x510 [ 22.055293] shrink_dcache_parent+0x190/0x240 [ 22.055703] do_one_tree+0x11/0x40 [ 22.056028] shrink_dcache_for_umount+0x61/0x140 [ 22.056461] generic_shutdown_super+0x70/0x590 [ 22.056879] kill_anon_super+0x3a/0x60 [ 22.057234] rpc_kill_sb+0x121/0x200 Fixes: 0157d021d23a ("SUNRPC: handle RPC client pipefs dentries by network namespace aware routines") Signed-off-by: felix <fuzhen5@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-10-22SUNRPC: Don't skip timeout checks in call_connect_status()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-10-22SUNRPC: ECONNRESET might require a rebindTrond Myklebust
If connect() is returning ECONNRESET, it usually means that nothing is listening on that port. If so, a rebind might be required in order to obtain the new port on which the RPC service is listening. Fixes: fd01b2597941 ("SUNRPC: ECONNREFUSED should cause a rebind.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-09-27Revert "SUNRPC dont update timeout value on connection reset"Trond Myklebust
This reverts commit 88428cc4ae7abcc879295fbb19373dd76aad2bdd. The problem this commit is intended to fix was comprehensively fixed in commit 7de62bc09fe6 ("SUNRPC dont update timeout value on connection reset"). Since then, this commit has been preventing the correct timeout of soft mounted requests. Cc: stable@vger.kernel.org # 5.9.x: 09252177d5f9: SUNRPC: Handle major timeout in xprt_adjust_timeout() Cc: stable@vger.kernel.org # 5.9.x: 7de62bc09fe6: SUNRPC dont update timeout value on connection reset Cc: stable@vger.kernel.org # 5.9.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-27SUNRPC: Fail quickly when server does not recognize TLSChuck Lever
rpcauth_checkverf() should return a distinct error code when a server recognizes the AUTH_TLS probe but does not support TLS so that the client's header decoder can respond appropriately and quickly. No retries are necessary is in this case, since the server has already affirmatively answered "TLS is unsupported". Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13NFSv4.1: fix pnfs MDS=DS session trunkingOlga Kornievskaia
Currently, when GETDEVICEINFO returns multiple locations where each is a different IP but the server's identity is same as MDS, then nfs4_set_ds_client() finds the existing nfs_client structure which has the MDS's max_connect value (and if it's 1), then the 1st IP on the DS's list will get dropped due to MDS trunking rules. Other IPs would be added as they fall under the pnfs trunking rules. For the list of IPs the 1st goes thru calling nfs4_set_ds_client() which will eventually call nfs4_add_trunk() and call into rpc_clnt_test_and_add_xprt() which has the check for MDS trunking. The other IPs (after the 1st one), would call rpc_clnt_add_xprt() which doesn't go thru that check. nfs4_add_trunk() is called when MDS trunking is happening and it needs to enforce the usage of max_connect mount option of the 1st mount. However, this shouldn't be applied to pnfs flow. Instead, this patch proposed to treat MDS=DS as DS trunking and make sure that MDS's max_connect limit does not apply to the 1st IP returned in the GETDEVICEINFO list. It does so by marking the newly created client with a new flag NFS_CS_PNFS which then used to pass max_connect value to use into the rpc_clnt_test_and_add_xprt() instead of the existing rpc client's max_connect value set by the MDS connection. For example, mount was done without max_connect value set so MDS's rpc client has cl_max_connect=1. Upon calling into rpc_clnt_test_and_add_xprt() and using rpc client's value, the caller passes in max_connect value which is previously been set in the pnfs path (as a part of handling GETDEVICEINFO list of IPs) in nfs4_set_ds_client(). However, when NFS_CS_PNFS flag is not set and we know we are doing MDS trunking, comparing a new IP of the same server, we then set the max_connect value to the existing MDS's value and pass that into rpc_clnt_test_and_add_xprt(). Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13Revert "SUNRPC: Fail faster on bad verifier"Trond Myklebust
This reverts commit 0701214cd6e66585a999b132eb72ae0489beb724. The premise of this commit was incorrect. There are exactly 2 cases where rpcauth_checkverf() will return an error: 1) If there was an XDR decode problem (i.e. garbage data). 2) If gss_validate() had a problem verifying the RPCSEC_GSS MIC. In the second case, there are again 2 subcases: a) The GSS context expires, in which case gss_validate() will force a new context negotiation on retry by invalidating the cred. b) The sequence number check failed because an RPC call timed out, and the client retransmitted the request using a new sequence number, as required by RFC2203. In neither subcase is this a fatal error. Reported-by: Russell Cattelan <cattelan@thebarn.com> Fixes: 0701214cd6e6 ("SUNRPC: Fail faster on bad verifier") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-09-13SUNRPC: Mark the cred for revalidation if the server rejects itTrond Myklebust
If the server rejects the credential as being stale, or bad, then we should mark it for revalidation before retransmitting. Fixes: 7f5667a5f8c4 ("SUNRPC: Clean up rpc_verify_header()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-24SUNRPC: Don't override connect timeouts in rpc_clnt_add_xprt()Trond Myklebust
If the caller specifies the connect timeouts in the arguments to rpc_clnt_add_xprt(), then we shouldn't override them. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-24SUNRPC: Allow specification of TCP client connect timeout at setupTrond Myklebust
When we create a TCP transport, the connect timeout parameters are currently fixed to be 90s. This is problematic in the pNFS flexfiles case, where we may have multiple mirrors, and we would like to fail over quickly to the next mirror if a data server is down. This patch adds the ability to specify the connection parameters at RPC client creation time. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-23SUNRPC: kmap() the xdr pages during decodeAnna Schumaker
If the pages are in HIGHMEM then we need to make sure they're mapped before trying to read data off of them, otherwise we could end up with a NULL pointer dereference. The downside to this is that we need an extra cleanup step at the end of decode to kunmap() the last page. I introduced an xdr_finish_decode() function to do this. Right now this function only calls the unmap_current_page() function, but other generic cleanup steps could be added in the future if we come across anything else. Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-06-19NFS: add sysfs shutdown knobBenjamin Coddington
Within each nfs_server sysfs tree, add an entry named "shutdown". Writing 1 to this file will set the cl_shutdown bit on the rpc_clnt structs associated with that mount. If cl_shutdown is set, the task scheduler immediately returns -EIO for new tasks. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: Add RPC client support for the RPC_AUTH_TLS auth flavorChuck Lever
The new authentication flavor is used only to discover peer support for RPC-over-TLS. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: Trace the rpc_create_argsChuck Lever
Pass the upper layer's rpc_create_args to the rpc_clnt_new() tracepoint so additional parts of the upper layer's request can be recorded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: Plumb an API for setting transport layer securityChuck Lever
Add an initial set of policies along with fields for upper layers to pass the requested policy down to the transport layer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: support abstract unix socket addressesNeilBrown
An "abtract" address for an AF_UNIX socket start with a nul and can contain any bytes for the given length, but traditionally doesn't contain other nuls. When reported, the leading nul is replaced by '@'. sunrpc currently rejects connections to these addresses and reports them as an empty string. To provide support for future use of these addresses, allow them for outgoing connections and report them more usefully. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-04-19SUNRPC: remove the maximum number of retries in call_bind_statusDai Ngo
Currently call_bind_status places a hard limit of 3 to the number of retries on EACCES error. This limit was done to prevent NLM unlock requests from being hang forever when the server keeps returning garbage. However this change causes problem for cases when NLM service takes longer than 9 seconds to register with the port mapper after a restart. This patch removes this hard coded limit and let the RPC handles the retry based on the standard hard/soft task semantics. Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests") Reported-by: Helen Chao <helen.chao@oracle.com> Tested-by: Helen Chao <helen.chao@oracle.com> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-02-15NFS: fix disabling of swapNeilBrown
When swap is activated to a file on an NFSv4 mount we arrange that the state manager thread is always present as starting a new thread requires memory allocations that might block waiting for swap. Unfortunately the code for allowing the state manager thread to exit when swap is disabled was not tested properly and does not work. This can be seen by examining /proc/fs/nfsfs/servers after disabling swap and unmounting the filesystem. The servers file will still list one entry. Also a "ps" listing will show the state manager thread is still present. There are two problems. 1/ rpc_clnt_swap_deactivate() doesn't walk up the ->cl_parent list to find the primary client on which the state manager runs. 2/ The thread is not woken up properly and it immediately goes back to sleep without checking whether it is really needed. Using nfs4_schedule_state_manager() ensures a proper wake-up. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 4dc73c679114 ("NFSv4: keep state manager thread active if swap is enabled") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-12-06SUNRPC: Fix missing release socket in rpc_sockname()Wang ShaoBo
socket dynamically created is not released when getting an unintended address family type in rpc_sockname(), direct to out_release for calling sock_release(). Fixes: 2e738fdce22f ("SUNRPC: Add API to acquire source address") Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-10-06SUNRPC: Add API to force the client to disconnectTrond Myklebust
Allow the caller to force a disconnection of the RPC client so that we can clear any pending requests that are buffered in the socket. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-06SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC callsTrond Myklebust
Add the helper rpc_cancel_tasks(), which uses a caller-defined selection function to define a set of in-flight RPC calls to cancel. This is mainly intended for pNFS drivers which are subject to a layout recall, and which may therefore want to cancel all pending I/O using that layout in order to redrive it after the layout recall has been satisfied. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-06SUNRPC: Fix races with rpc_killall_tasks()Trond Myklebust
Ensure that we immediately call rpc_exit_task() after waking up, and that the tk_rpc_status cannot get clobbered by some other function. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-03SUNRPC: Directly use ida_alloc()/free()Bo Liu
Use ida_alloc()/ida_free() instead of ida_simple_get()/ida_simple_remove(). The latter is deprecated and more verbose. Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-09-08Revert "SUNRPC: Remove unreachable error condition"Dan Aloni
This reverts commit efe57fd58e1cb77f9186152ee12a8aa4ae3348e0. The assumption that it is impossible to return an ERR pointer from rpc_run_task() no longer holds due to commit 25cf32ad5dba ("SUNRPC: Handle allocation failure in rpc_new_task()"). Fixes: 25cf32ad5dba ('SUNRPC: Handle allocation failure in rpc_new_task()') Fixes: efe57fd58e1c ('SUNRPC: Remove unreachable error condition') Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-08-19SUNRPC: RPC level errors should set task->tk_rpc_statusTrond Myklebust
Fix up a case in call_encode() where we're failing to set task->tk_rpc_status when an RPC level error occurred. Fixes: 9c5948c24869 ("SUNRPC: task should be exit if encode return EKEYEXPIRED more times") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-27SUNRPC: Don't reuse bvec on retransmission of the requestTrond Myklebust
If a request is re-encoded and then retransmitted, we need to make sure that we also re-encode the bvec, in case the page lists have changed. Fixes: ff053dbbaffe ("SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC create a function that probes only offline transportsOlga Kornievskaia
For only offline transports, attempt to check connectivity via a NULL call and, if that succeeds, call a provided session trunking detection function. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC restructure rpc_clnt_setup_test_and_add_xprtOlga Kornievskaia
In preparation for code re-use, pull out the part of the rpc_clnt_setup_test_and_add_xprt() portion that sends a NULL rpc and then calls a session trunking function into a helper function. Re-organize the end of the function for code re-use. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC create an rpc function that allows xprt removal from rpc_clntOlga Kornievskaia
Expose a function that allows a removal of xprt from the rpc_clnt. When called from NFS that's running a trunked transport then don't decrement the active transport counter. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC enable back offline transports in trunking discoveryOlga Kornievskaia
When we are adding a transport to a xprt_switch that's already on the list but has been marked OFFLINE, then make the state ONLINE since it's been tested now. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC create an iterator to list only OFFLINE xprtsOlga Kornievskaia
Create a new iterator helper that will go thru the all the transports in the switch and return transports that are marked OFFLINE. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC add function to offline remove trunkable transportsOlga Kornievskaia
Iterate thru available transports in the xprt_switch for all trunkable transports offline and possibly remote them as well. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>