summaryrefslogtreecommitdiff
path: root/fs/nfsd/state.h
AgeCommit message (Collapse)Author
2025-01-21nfsd: add support for delegated timestampsJeff Layton
Add support for the delegated timestamps on write delegations. This allows the server to proxy timestamps from the delegation holder to other clients that are doing GETATTRs vs. the same inode. When OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS bit is set in the OPEN call, set the dl_type to the *_ATTRS_DELEG flavor of delegation. Add timespec64 fields to nfs4_cb_fattr and decode the timestamps into those. Vet those timestamps according to the delstid spec and update the inode attrs if necessary. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegationsJeff Layton
Add some preparatory code to various functions that handle delegation types to allow them to handle the OPEN_DELEGATE_*_ATTRS_DELEG constants. Add helpers for detecting whether it's a read or write deleg, and whether the attributes are delegated. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: add shrinker to reduce number of slots allocated per sessionNeilBrown
Add a shrinker which frees unused slots and may ask the clients to use fewer slots on each session. We keep a global count of the number of freeable slots, which is the sum of one less than the current "target" slots in all sessions in all clients in all net-namespaces. This number is reported by the shrinker. When the shrinker is asked to free some, we call xxx on each session in a round-robin asking each to reduce the slot count by 1. This will reduce the "target" so the number reported by the shrinker will reduce immediately. The memory will only be freed later when the client confirmed that it is no longer needed. We use a global list of sessions and move the "head" to after the last session that we asked to reduce, so the next callback from the shrinker will move on to the next session. This pressure should be applied "evenly" across all sessions over time. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: add support for freeing unused session-DRC slotsNeilBrown
Reducing the number of slots in the session slot table requires confirmation from the client. This patch adds reduce_session_slots() which starts the process of getting confirmation, but never calls it. That will come in a later patch. Before we can free a slot we need to confirm that the client won't try to use it again. This involves returning a lower cr_maxrequests in a SEQUENCE reply and then seeing a ca_maxrequests on the same slot which is not larger than we limit we are trying to impose. So for each slot we need to remember that we have sent a reduced cr_maxrequests. To achieve this we introduce a concept of request "generations". Each time we decide to reduce cr_maxrequests we increment the generation number, and record this when we return the lower cr_maxrequests to the client. When a slot with the current generation reports a low ca_maxrequests, we commit to that level and free extra slots. We use an 16 bit generation number (64 seems wasteful) and if it cycles we iterate all slots and reset the generation number to avoid false matches. When we free a slot we store the seqid in the slot pointer so that it can be restored when we reactivate the slot. The RFC can be read as suggesting that the slot number could restart from one after a slot is retired and reactivated, but also suggests that retiring slots is not required. So when we reactive a slot we accept with the next seqid in sequence, or 1. When decoding sa_highest_slotid into maxslots we need to add 1 - this matches how it is encoded for the reply. se_dead is moved in struct nfsd4_session to remove a hole. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: use an xarray to store v4.1 session slotsNeilBrown
Using an xarray to store session slots will make it easier to change the number of active slots based on demand, and removes an unnecessary limit. To achieve good throughput with a high-latency server it can be helpful to have hundreds of concurrent writes, which means hundreds of slots. So increase the limit to 2048 (twice what the Linux client will currently use). This limit is only a sanity check, not a hard limit. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: use new wake_up_var interfaces.NeilBrown
The wake_up_var interface is fragile as barriers are sometimes needed. There are now new interfaces so that most wake-ups can use an interface that is guaranteed to have all barriers needed. This patch changes the wake up on cl_cb_inflight to use atomic_dec_and_wake_up(). It also changes the wake up on rp_locked to use store_release_wake_up(). This involves changing rp_locked from atomic_t to int. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18nfsd: allow for up to 32 callback session slotsJeff Layton
nfsd currently only uses a single slot in the callback channel, which is proving to be a bottleneck in some cases. Widen the callback channel to a max of 32 slots (subject to the client's target_maxreqs value). Change the cb_holds_slot boolean to an integer that tracks the current slot number (with -1 meaning "unassigned"). Move the callback slot tracking info into the session. Add a new u32 that acts as a bitmap to track which slots are in use, and a u32 to track the latest callback target_slotid that the client reports. To protect the new fields, add a new per-session spinlock (the se_lock). Fix nfsd41_cb_get_slot to always search for the lowest slotid (using ffs()). Finally, convert the session->se_cb_seq_nr field into an array of ints and add the necessary handling to ensure that the seqids get reset when the slot table grows after shrinking. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Add nfsd4_copy time-to-liveChuck Lever
Keep async copy state alive for a few lease cycles after the copy completes so that OFFLOAD_STATUS returns something meaningful. This means that NFSD's client shutdown processing needs to purge any of this state that happens to be waiting to die. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Add a laundromat reaper for async copy stateChuck Lever
RFC 7862 Section 4.8 states: > A copy offload stateid will be valid until either (A) the client > or server restarts or (B) the client returns the resource by > issuing an OFFLOAD_CANCEL operation or the client replies to a > CB_OFFLOAD operation. Instead of releasing async copy state when the CB_OFFLOAD callback completes, now let it live until the next laundromat run after the callback completes. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operationsChuck Lever
Currently __destroy_client() consults the nfs4_client's async_copies list to determine whether there are ongoing async COPY operations. However, NFSD now keeps copy state in that list even when the async copy has completed, to enable OFFLOAD_STATUS to find the COPY results for a while after the COPY has completed. DESTROY_CLIENTID should not be blocked if the client's async_copies list contains state for only completed copy operations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18nfsd: make nfsd4_session->se_flags a boolJeff Layton
While this holds the flags from the CREATE_SESSION request, nothing ever consults them. The only flag used is NFS4_SESSION_DEAD. Make it a simple bool instead. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18nfsd: remove nfsd4_session->se_bchannelJeff Layton
This field is written and is never consulted again. Remove it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-11nfsd: have nfsd4_deleg_getattr_conflict pass back write deleg pointerJeff Layton
Currently we pass back the size and whether it has been modified, but those just mirror values tracked inside the delegation. In a later patch, we'll need to get at the timestamps in the delegation too, so just pass back a reference to the write delegation, and use that to properly override values in the iattr. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-11nfsd: drop the ncf_cb_bmap fieldJeff Layton
This is always the same value, and in a later patch we're going to need to set bits in WORD2. We can simplify this code and save a little space in the delegation too. Just hardcode the bitmap in the callback encode function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-10-18nfsd: fix race between laundromat and free_stateidOlga Kornievskaia
There is a race between laundromat handling of revoked delegations and a client sending free_stateid operation. Laundromat thread finds that delegation has expired and needs to be revoked so it marks the delegation stid revoked and it puts it on a reaper list but then it unlock the state lock and the actual delegation revocation happens without the lock. Once the stid is marked revoked a racing free_stateid processing thread does the following (1) it calls list_del_init() which removes it from the reaper list and (2) frees the delegation stid structure. The laundromat thread ends up not calling the revoke_delegation() function for this particular delegation but that means it will no release the lock lease that exists on the file. Now, a new open for this file comes in and ends up finding that lease list isn't empty and calls nfsd_breaker_owns_lease() which ends up trying to derefence a freed delegation stateid. Leading to the followint use-after-free KASAN warning: kernel: ================================================================== kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205 kernel: kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9 kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024 kernel: Call trace: kernel: dump_backtrace+0x98/0x120 kernel: show_stack+0x1c/0x30 kernel: dump_stack_lvl+0x80/0xe8 kernel: print_address_description.constprop.0+0x84/0x390 kernel: print_report+0xa4/0x268 kernel: kasan_report+0xb4/0xf8 kernel: __asan_report_load8_noabort+0x1c/0x28 kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd] kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd] kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd] kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd] kernel: nfsd4_open+0xa08/0xe80 [nfsd] kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd] kernel: nfsd_dispatch+0x22c/0x718 [nfsd] kernel: svc_process_common+0x8e8/0x1960 [sunrpc] kernel: svc_process+0x3d4/0x7e0 [sunrpc] kernel: svc_handle_xprt+0x828/0xe10 [sunrpc] kernel: svc_recv+0x2cc/0x6a8 [sunrpc] kernel: nfsd+0x270/0x400 [nfsd] kernel: kthread+0x288/0x310 kernel: ret_from_fork+0x10/0x20 This patch proposes a fixed that's based on adding 2 new additional stid's sc_status values that help coordinate between the laundromat and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()). First to make sure, that once the stid is marked revoked, it is not removed by the nfsd4_free_stateid(), the laundromat take a reference on the stateid. Then, coordinating whether the stid has been put on the cl_revoked list or we are processing FREE_STATEID and need to make sure to remove it from the list, each check that state and act accordingly. If laundromat has added to the cl_revoke list before the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove it from the list. If nfsd4_free_stateid() finds that operations arrived before laundromat has placed it on cl_revoke list, it marks the state freed and then laundromat will no longer add it to the list. Also, for nfsd4_delegreturn() when looking for the specified stid, we need to access stid that are marked removed or freeable, it means the laundromat has started processing it but hasn't finished and this delegreturn needs to return nfserr_deleg_revoked and not nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the lack of it will leave this stid on the cl_revoked list indefinitely. Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") CC: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: track the main opcode for callbacksJeff Layton
Keep track of the "main" opcode for the callback, and display it in the tracepoint. This makes it simpler to discern what's happening when there is more than one callback in flight. The one special case is the CB_NULL RPC. That's not a CB_COMPOUND opcode, so designate the value 0 for that. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-08-26fs/nfsd: fix update of inode attrs in CB_GETATTRJeff Layton
Currently, we copy the mtime and ctime to the in-core inode and then mark the inode dirty. This is fine for certain types of filesystems, but not all. Some require a real setattr to properly change these values (e.g. ceph or reexported NFS). Fix this code to call notify_change() instead, which is the proper way to effect a setattr. There is one problem though: In this case, the client is holding a write delegation and has sent us attributes to update our cache. We don't want to break the delegation for this since that would defeat the purpose. Add a new ATTR_DELEG flag that makes notify_change bypass the try_break_deleg call. Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: Move callback_wq into struct nfs4_clientChuck Lever
Commit 883820366747 ("nfsd: update workqueue creation") made the callback_wq single-threaded, presumably to protect modifications of cl_cb_client. See documenting comment for nfsd4_process_cb_update(). However, cl_cb_client is per-lease. There's no other reason that all callback operations need to be dispatched via a single thread. The single threading here means all client callbacks can be blocked by a problem with one client. Change the NFSv4 callback client so it serializes per-lease instead of serializing all NFSv4 callback operations on the server. Reported-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06nfsd: replace rp_mutex to avoid deadlock in move_to_close_lru()NeilBrown
move_to_close_lru() waits for sc_count to become zero while holding rp_mutex. This can deadlock if another thread holds a reference and is waiting for rp_mutex. By the time we get to move_to_close_lru() the openowner is unhashed and cannot be found any more. So code waiting for the mutex can safely retry the lookup if move_to_close_lru() has started. So change rp_mutex to an atomic_t with three states: RP_UNLOCK - state is still hashed, not locked for reply RP_LOCKED - state is still hashed, is locked for reply RP_UNHASHED - state is not hashed, no code can get a lock. Use wait_var_event() to wait for either a lock, or for the owner to be unhashed. In the latter case, retry the lookup. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-23Revert "NFSD: Convert the callback workqueue to use delayed_work"Chuck Lever
This commit was a pre-requisite for commit c1ccfcf1a9bf ("NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"), which has already been reverted. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: handle GETATTR conflict with write delegationDai Ngo
If the GETATTR request on a file that has write delegation in effect and the request attributes include the change info and size attribute then the request is handled as below: Server sends CB_GETATTR to client to get the latest change info and file size. If these values are the same as the server's cached values then the GETATTR proceeds as normal. If either the change info or file size is different from the server's cached values, or the file was already marked as modified, then: . update time_modify and time_metadata into file's metadata with current time . encode GETATTR as normal except the file size is encoded with the value returned from CB_GETATTR . mark the file as modified If the CB_GETATTR fails for any reasons, the delegation is recalled and NFS4ERR_DELAY is returned for the GETATTR. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: add support for CB_GETATTR callbackDai Ngo
Includes: . CB_GETATTR proc for nfs4_cb_procedures[] . XDR encoding and decoding function for CB_GETATTR request/reply . add nfs4_cb_fattr to nfs4_delegation for sending CB_GETATTR and store file attributes from client's reply. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: clean up comments over nfs4_client definitionChen Hanxiao
nfsd fault injection has been deprecated since commit 9d60d93198c6 ("Deprecate nfsd fault injection") and removed by commit e56dc9e2949e ("nfsd: remove fault injection code") So remove the outdated parts about fault injection. Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: prepare for supporting admin-revocation of stateNeilBrown
The NFSv4 protocol allows state to be revoked by the admin and has error codes which allow this to be communicated to the client. This patch - introduces a new state-id status SC_STATUS_ADMIN_REVOKED which can be set on open, lock, or delegation state. - reports NFS4ERR_ADMIN_REVOKED when these are accessed - introduces a per-client counter of these states and returns SEQ4_STATUS_ADMIN_STATE_REVOKED when the counter is not zero. Decrements this when freeing any admin-revoked state. - introduces stub code to find all interesting states for a given superblock so they can be revoked via the 'unlock_filesystem' file in /proc/fs/nfsd/ No actual states are handled yet. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: split sc_status out of sc_typeNeilBrown
sc_type identifies the type of a state - open, lock, deleg, layout - and also the status of a state - closed or revoked. This is a bit untidy and could get worse when "admin-revoked" states are added. So clean it up. With this patch, the type is now all that is stored in sc_type. This is zero when the state is first added to ->cl_stateids (causing it to be ignored), and is then set appropriately once it is fully initialised. It is set under ->cl_lock to ensure atomicity w.r.t lookup. It is now never cleared. sc_type is still a bit-set even though at most one bit is set. This allows lookup functions to be given a bitmap of acceptable types. sc_type is now an unsigned short rather than char. There is no value in restricting to just 8 bits. All the constants now start SC_TYPE_ matching the field in which they are stored. Keeping the existing names and ensuring clear separation from non-type flags would have required something like NFS4_STID_TYPE_CLOSED which is cumbersome. The "NFS4" prefix is redundant was they only appear in NFS4 code, so remove that and change STID to SC to match the field. The status is stored in a separate unsigned short named "sc_status". It has two flags: SC_STATUS_CLOSED and SC_STATUS_REVOKED. CLOSED combines NFS4_CLOSED_STID, NFS4_CLOSED_DELEG_STID, and is used for SC_TYPE_LOCK and SC_TYPE_LAYOUT instead of setting the sc_type to zero. These flags are only ever set, never cleared. For deleg stateids they are set under the global state_lock. For open and lock stateids they are set under ->cl_lock. For layout stateids they are set under ->ls_lock nfs4_unhash_stid() has been removed, and we never set sc_type = 0. This was only used for LOCK and LAYOUT stids and they now use SC_STATUS_CLOSED. Also TRACE_DEFINE_NUM() calls for the various STID #define have been removed because these things are not enums, and so that call is incorrect. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Convert the callback workqueue to use delayed_workChuck Lever
Normally, NFSv4 callback operations are supposed to be sent to the client as soon as they are queued up. In a moment, I will introduce a recovery path where the server has to wait for the client to reconnect. We don't want a hard busy wait here -- the callback should be requeued to try again in several milliseconds. For now, convert nfsd4_callback from struct work_struct to struct delayed_work, and queue with a zero delay argument. This should avoid behavior changes for current operation. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-18NFSD: Revert 738401a9bd1ac34ccd5723d69640a4adbb1a4bc0Chuck Lever
There's nothing wrong with this commit, but this is dead code now that nothing triggers a CB_GETATTR callback. It can be re-introduced once the issues with handling conflicting GETATTRs are resolved. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-18NFSD: Revert 6c41d9a9bd0298002805758216a9c44e38a8500dChuck Lever
For some reason, the wait_on_bit() in nfsd4_deleg_getattr_conflict() is waiting forever, preventing a clean server shutdown. The requesting client might also hang waiting for a reply to the conflicting GETATTR. Invoking wait_on_bit() in an nfsd thread context is a hazard. The correct fix is to replace this wait_on_bit() call site with a mechanism that defers the conflicting GETATTR until the CB_GETATTR completes or is known to have failed. That will require some surgery and extended testing and it's late in the v6.7-rc cycle, so I'm reverting now in favor of trying again in a subsequent kernel release. This is my fault: I should have recognized the ramifications of calling wait_on_bit() in here before accepting this patch. Thanks to Dai Ngo <dai.ngo@oracle.com> for diagnosing the issue. Reported-by: Wolfgang Walter <linux-nfs@stwm.de> Closes: https://lore.kernel.org/linux-nfs/e3d43ecdad554fbdcaa7181833834f78@stwm.de/ Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16NFSD: add rpc_status netlink supportLorenzo Bianconi
Introduce rpc_status netlink support for NFSD in order to dump pending RPC requests debugging information from userspace. Closes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=366 Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16NFSD: handle GETATTR conflict with write delegationDai Ngo
If the GETATTR request on a file that has write delegation in effect and the request attributes include the change info and size attribute then the request is handled as below: Server sends CB_GETATTR to client to get the latest change info and file size. If these values are the same as the server's cached values then the GETATTR proceeds as normal. If either the change info or file size is different from the server's cached values, or the file was already marked as modified, then: . update time_modify and time_metadata into file's metadata with current time . encode GETATTR as normal except the file size is encoded with the value returned from CB_GETATTR . mark the file as modified If the CB_GETATTR fails for any reasons, the delegation is recalled and NFS4ERR_DELAY is returned for the GETATTR. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16NFSD: add support for CB_GETATTR callbackDai Ngo
Includes: . CB_GETATTR proc for nfs4_cb_procedures[] . XDR encoding and decoding function for CB_GETATTR request/reply . add nfs4_cb_fattr to nfs4_delegation for sending CB_GETATTR and store file attributes from client's reply. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29NFSD: handle GETATTR conflict with write delegationDai Ngo
If the GETATTR request on a file that has write delegation in effect and the request attributes include the change info and size attribute then the write delegation is recalled. If the delegation is returned within 30ms then the GETATTR is serviced as normal otherwise the NFS4ERR_DELAY error is returned for the GETATTR. Add counter for write delegation recall due to conflict GETATTR. This is used to evaluate the need to implement CB_GETATTR to adoid recalling the delegation with conflit GETATTR. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20nfsd: don't take nfsd4_copy ref for OP_OFFLOAD_STATUSJeff Layton
We're not doing any blocking operations for OP_OFFLOAD_STATUS, so taking and putting a reference is a waste of effort. Take the client lock, search for the copy and fetch the wr_bytes_written field and return. Also, make find_async_copy a static function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10NFSD: add delegation reaper to react to low memory conditionDai Ngo
The delegation reaper is called by nfsd memory shrinker's on the 'count' callback. It scans the client list and sends the courtesy CB_RECALL_ANY to the clients that hold delegations. To avoid flooding the clients with CB_RECALL_ANY requests, the delegation reaper sends only one CB_RECALL_ANY request to each client per 5 seconds. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> [ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-12-10NFSD: add support for sending CB_RECALL_ANYDai Ngo
Add XDR encode and decode function for CB_RECALL_ANY. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28NFSD: Use rhashtable for managing nfs4_file objectsChuck Lever
fh_match() is costly, especially when filehandles are large (as is the case for NFSv4). It needs to be used sparingly when searching data structures. Unfortunately, with common workloads, I see multiple thousands of objects stored in file_hashtbl[], which has just 256 buckets, making its bucket hash chains quite lengthy. Walking long hash chains with the state_lock held blocks other activity that needs that lock. Sizable hash chains are a common occurrance once the server has handed out some delegations, for example -- IIUC, each delegated file is held open on the server by an nfs4_file object. To help mitigate the cost of searching with fh_match(), replace the nfs4_file hash table with an rhashtable, which can dynamically resize its bucket array to minimize hash chain length. The result of this modification is an improvement in the latency of NFSv4 operations, and the reduction of nfsd CPU utilization due to eliminating the cost of multiple calls to fh_match() and reducing the CPU cache misses incurred while walking long hash chains in the nfs4_file hash table. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26nfsd: make nfsd4_run_cb a bool return functionJeff Layton
queue_work can return false and not queue anything, if the work is already queued. If that happens in the case of a CB_RECALL, we'll have taken an extra reference to the stid that will never be put. Ensure we throw a warning in that case. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26NFSD: Rename the fields in copy_stateid_tChuck Lever
Code maintenance: The name of the copy_stateid_t::sc_count field collides with the sc_count field in struct nfs4_stid, making the latter difficult to grep for when auditing stateid reference counting. No behavior change expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26nfsd: remove nfsd4_prepare_cb_recall() declarationGaosheng Cui
nfsd4_prepare_cb_recall() has been removed since commit 0162ac2b978e ("nfsd: introduce nfsd4_callback_ops"), so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26NFSD: Increase NFSD_MAX_OPS_PER_COMPOUNDChuck Lever
When attempting an NFSv4 mount, a Solaris NFSv4 client builds a single large COMPOUND that chains a series of LOOKUPs to get to the pseudo filesystem root directory that is to be mounted. The Linux NFS server's current maximum of 16 operations per NFSv4 COMPOUND is not large enough to ensure that this works for paths that are more than a few components deep. Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not trigger any re-allocation of the operation array on the server), increasing this maximum should result in little to no impact. The ops array can get large now, so allocate it via vmalloc() to help ensure memory fragmentation won't cause an allocation failure. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29NFSD: Make nfs4_put_copy() staticChuck Lever
Clean up: All call sites are in fs/nfsd/nfs4proc.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-19NFSD: add courteous server support for thread with only delegationDai Ngo
This patch provides courteous server support for delegation only. Only expired client with delegation but no conflict and no open or lock state is allowed to be in COURTESY state. Delegation conflict with COURTESY/EXPIRABLE client is resolved by setting it to EXPIRABLE, queue work for the laundromat and return delay to the caller. Conflict is resolved when the laudromat runs and expires the EXIRABLE client while the NFS client retries the OPEN request. Local thread request that gets conflict is doing the retry in _break_lease. Client in COURTESY or EXPIRABLE state is allowed to reconnect and continues to have access to its state. Access to the nfs4_client by the reconnecting thread and the laundromat is serialized via the client_lock. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08nfsd4: add refcount for nfsd4_blocked_lockVasily Averin
nbl allocated in nfsd4_lock can be released by a several ways: directly in nfsd4_lock(), via nfs4_laundromat(), via another nfs command RELEASE_LOCKOWNER or via nfsd4_callback. This structure should be refcounted to be used and released correctly in all these cases. Refcount is initialized to 1 during allocation and is incremented when nbl is added into nbl_list/nbl_lru lists. Usually nbl is linked into both lists together, so only one refcount is used for both lists. However nfsd4_lock() should keep in mind that nbl can be present in one of lists only. This can happen if nbl was handled already by nfs4_laundromat/nfsd4_callback/etc. Refcount is decremented if vfs_lock_file() returns FILE_LOCK_DEFERRED, because nbl can be handled already by nfs4_laundromat/nfsd4_callback/etc. Refcount is not changed in find_blocked_lock() because of it reuses counter released after removing nbl from lists. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08nfsd: improve stateid access bitmask documentationJ. Bruce Fields
The use of the bitmaps is confusing. Add a cross-reference to make it easier to find the existing comment. Add an updated reference with URL to make it quicker to look up. And a bit more editorializing about the value of this. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-04-19nfsd: track filehandle aliasing in nfs4_filesJ. Bruce Fields
It's unusual but possible for multiple filehandles to point to the same file. In that case, we may end up with multiple nfs4_files referencing the same inode. For delegation purposes it will turn out to be useful to flag those cases. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-04-19nfsd: hash nfs4_files by inode numberJ. Bruce Fields
The nfs4_file structure is per-filehandle, not per-inode, because the spec requires open and other state to be per filehandle. But it will turn out to be convenient for nfs4_files associated with the same inode to be hashed to the same bucket, so let's hash on the inode instead of the filehandle. Filehandle aliasing is rare, so that shouldn't have much performance impact. (If you have a ton of exported filesystems, though, and all of them have a root with inode number 2, could that get you an overlong hash chain? Perhaps this (and the v4 open file cache) should be hashed on the inode pointer instead.) Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22nfsd: report client confirmation status in "info" fileNeilBrown
mountd can now monitor clients appearing and disappearing in /proc/fs/nfsd/clients, and will log these events, in liu of the logging of mount/unmount events for NFSv3. Currently it cannot distinguish between unconfirmed clients (which might be transient and totally uninteresting) and confirmed clients. So add a "status: " line which reports either "confirmed" or "unconfirmed", and use fsnotify to report that the info file has been modified. This requires a bit of infrastructure to keep the dentry for the "info" file. There is no need to take a counted reference as the dentry must remain around until the client is removed. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-28nfsd: simplify nfsd4_check_open_reclaimJ. Bruce Fields
The set_client() was already taken care of by process_open1(). The comments here are mostly redundant with the code. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-09-25nfsd: remove fault injection codeJ. Bruce Fields
It was an interesting idea but nobody seems to be using it, it's buggy at this point, and nfs4state.c is already complicated enough without it. The new nfsd/clients/ code provides some of the same functionality, and could probably do more if desired. This feature has been deprecated since 9d60d93198c6 ("Deprecate nfsd fault injection"). Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-05-20NFSD: Add tracepoints to the NFSD state management codeChuck Lever
Capture obvious events and replace dprintk() call sites. Introduce infrastructure so that adding more tracepoints in this code later is simplified. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>