linux.git - Linus' kernel tree

Age	Commit message	Author
2024-09-20	NFSD: Clean up extra whitespace in trace_nfsd_copy_done	Chuck Lever
	Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	NFSD: Record the callback stateid in copy tracepoints	Chuck Lever
	Match COPY operations up with CB_OFFLOAD operations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	NFSD: Display copy stateids with conventional print formatting	Chuck Lever
	Make it easier to grep for s2s COPY stateids in trace logs: Use the same display format in nfsd_copy_class as is used to display other stateids. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	NFSD: Limit the number of concurrent async COPY operations	Chuck Lever
	Nothing appears to limit the number of concurrent async COPY operations that clients can start. In addition, AFAICT each async COPY can copy an unlimited number of 4MB chunks, so can run for a long time. Thus IMO async COPY can become a DoS vector. Add a restriction mechanism that bounds the number of concurrent background COPY operations. Start simple and try to be fair -- this patch implements a per-namespace limit. An async COPY request that occurs while this limit is exceeded gets NFS4ERR_DELAY. The requesting client can choose to send the request again after a delay or fall back to a traditional read/write style copy. If there is need to make the mechanism more sophisticated, we can visit that in future patches. Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
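The restriction described above can be pictured as a bounded counter. This is only a userspace sketch using C11 atomics, not the NFSD code: `struct copy_ns`, its fields, and both helper names are hypothetical. It shows the idea of reserving a slot before starting a background COPY and refusing (where the server would answer NFS4ERR_DELAY) once the cap is reached.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-namespace state; the names are illustrative only. */
struct copy_ns {
	atomic_uint pending_async_copies;	/* background copies in flight */
	unsigned int max_async_copies;		/* assumed cap for this namespace */
};

/*
 * Try to reserve a slot for one async COPY. Returns false when the cap
 * is already reached; a server would then reply with NFS4ERR_DELAY.
 */
bool copy_slot_get(struct copy_ns *ns)
{
	/* atomic_fetch_add() returns the value held before the increment. */
	if (atomic_fetch_add(&ns->pending_async_copies, 1) >=
	    ns->max_async_copies) {
		atomic_fetch_sub(&ns->pending_async_copies, 1);
		return false;
	}
	return true;
}

/* Release the slot once the background copy has finished. */
void copy_slot_put(struct copy_ns *ns)
{
	atomic_fetch_sub(&ns->pending_async_copies, 1);
}
```

Incrementing first and backing out on failure keeps the check and the reservation in one atomic step, so two racing requests cannot both take the last slot.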
2024-09-20	NFSD: Async COPY result needs to return a write verifier	Chuck Lever
	Currently, when NFSD handles an asynchronous COPY, it returns a zero write verifier, relying on the subsequent CB_OFFLOAD callback to pass the write verifier and a stable_how4 value to the client. However, if the CB_OFFLOAD never arrives at the client (for example, if a network partition occurs just as the server sends the CB_OFFLOAD operation), the client will never receive this verifier. Thus, if the client sends a follow-up COMMIT, there is no way for the client to assess the COMMIT result. The usual recovery for a missing CB_OFFLOAD is for the client to send an OFFLOAD_STATUS operation, but that operation does not carry a write verifier in its result. Neither does it carry a stable_how4 value, so the client /must/ send a COMMIT in this case -- which will always fail because currently there's still no write verifier in the COPY result. Thus the server needs to return a normal write verifier in its COPY result even if the COPY operation is to be performed asynchronously. If the server recognizes the callback stateid in subsequent OFFLOAD_STATUS operations, then obviously it has not restarted, and the write verifier the client received in the COPY result is still valid and can be used to assess a COMMIT of the copied data, if one is needed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: avoid races with wake_up_var()	NeilBrown
	wake_up_var() needs a barrier after the important change is made in the var and before wake_up_var() is called, else it is possible that a wake up won't be sent when it should. In each case here the var is changed in an "atomic" manner, so smp_mb__after_atomic() is sufficient. In one case the important change (removing the lease) is performed after the wake_up, which is backwards. The code survives in part because the wait_var_event is given a timeout. This patch adds the required barriers and calls destroy_delegation() before waking any threads waiting for the delegation to be destroyed. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: use clear_and_wake_up_bit()	NeilBrown
	nfsd has two places that open-code clear_and_wake_up_bit(). One has the required memory barriers. The other does not. Change both to use clear_and_wake_up_bit() so we have the barriers without the noise. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	sunrpc: xprtrdma: Use ERR_CAST() to return	Yan Zhen
	Using ERR_CAST() is more reasonable and safer when it is necessary to convert the type of an error pointer and return it. Signed-off-by: Yan Zhen <yanzhen@vivo.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
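For readers unfamiliar with error pointers, the following is a userspace approximation of the idiom, not the kernel headers themselves. Every `ex_`-prefixed name is a stand-in invented for this sketch; the kernel's own helpers are ERR_PTR(), IS_ERR(), PTR_ERR() and ERR_CAST(). It shows an error encoded in a pointer of one type being passed on through a function that returns a different pointer type.

```c
#include <errno.h>
#include <stdint.h>

#define EX_MAX_ERRNO 4095	/* errnos live in the top 4095 addresses */

static inline void *ex_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ex_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int ex_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-EX_MAX_ERRNO;
}

/* Drop the pointee type but keep the encoded errno. */
static inline void *ex_err_cast(const void *ptr)
{
	return (void *)(uintptr_t)ptr;
}

struct ex_ep { int id; };	/* hypothetical endpoint object */
struct ex_xprt { int id; };	/* hypothetical transport object */

struct ex_ep *ex_ep_create(int fail)
{
	static struct ex_ep ep;

	return fail ? ex_err_ptr(-ENOMEM) : &ep;
}

struct ex_xprt *ex_xprt_create(int fail)
{
	static struct ex_xprt xprt;
	struct ex_ep *ep = ex_ep_create(fail);

	if (ex_is_err(ep))
		return ex_err_cast(ep);	/* instead of ex_err_ptr(ex_ptr_err(ep)) */
	return &xprt;
}
```

The cast helper states the intent directly and avoids a round trip through an integer, which is the readability and safety argument the commit message makes.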
2024-09-20	NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by()	Thorsten Blum
	Add the __counted_by compiler attribute to the flexible array member volumes to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Use struct_size() instead of manually calculating the number of bytes to allocate for a pnfs_block_deviceaddr with a single volume. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
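As a rough illustration of the two techniques named above, here is a userspace sketch. The struct layout, the `COUNTED_BY` wrapper macro, and the helper names are assumptions made for the example and are not the pNFS block layout definitions. The attribute is applied only when the compiler reports support for it, and the size helper mimics the overflow-saturating behaviour of the kernel's struct_size().

```c
#include <stdint.h>
#include <stdlib.h>

/* Use the counted_by attribute only where the compiler supports it. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct volume {			/* hypothetical element type */
	uint64_t start;
	uint64_t len;
};

struct deviceaddr {		/* hypothetical container */
	uint32_t nr_volumes;
	struct volume volumes[] COUNTED_BY(nr_volumes);
};

/* Header plus n elements, saturating to SIZE_MAX on overflow. */
size_t deviceaddr_size(size_t n)
{
	if (n > (SIZE_MAX - sizeof(struct deviceaddr)) / sizeof(struct volume))
		return SIZE_MAX;
	return sizeof(struct deviceaddr) + n * sizeof(struct volume);
}

struct deviceaddr *deviceaddr_alloc(uint32_t n)
{
	size_t size = deviceaddr_size(n);
	struct deviceaddr *dev;

	if (size == SIZE_MAX)
		return NULL;
	dev = calloc(1, size);
	if (dev)
		dev->nr_volumes = n;	/* set the count before using volumes[] */
	return dev;
}
```

Setting the counter immediately after allocation matters because bounds checks derived from the attribute read that field.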
2024-09-20	nfsd: call cache_put if xdr_reserve_space returns NULL	Guoqing Jiang
	If not enough buffer space is available but idmap_lookup has already triggered lookup_fn, which calls cache_get and returns successfully, then we fail to call cache_put here, which pairs with cache_get. Fixes: ddd1ea563672 ("nfsd4: use xdr_reserve_space in attribute encoding") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: add more nfsd_cb tracepoints	Jeff Layton
	Add some tracepoints in the callback client RPC operations. Also add a tracepoint to nfsd4_cb_getattr_done. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: track the main opcode for callbacks	Jeff Layton
	Keep track of the "main" opcode for the callback, and display it in the tracepoint. This makes it simpler to discern what's happening when there is more than one callback in flight. The one special case is the CB_NULL RPC. That's not a CB_COMPOUND opcode, so designate the value 0 for that. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: add more info to WARN_ON_ONCE on failed callbacks	Jeff Layton
	Currently, you get the warning and stack trace, but nothing is printed about the relevant error codes. Add that in. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: fix some spelling errors in comments	Li Lingfeng
	Fix spelling errors in comments of nfsd4_release_lockowner and nfs4_set_delegation. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: remove unused parameter of nfsd_file_mark_find_or_create	Li Lingfeng
	Commit 427f5f83a319 ("NFSD: Ensure nf_inode is never dereferenced") passes inode directly to nfsd_file_mark_find_or_create instead of getting it from nf, so there is no need to pass nf. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: use LIST_HEAD() to simplify code	Hongbo Li
	list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
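The difference between the two initialisation styles can be seen in a small userspace imitation. The `ex_`/`EX_` names below are stand-ins written for this sketch and only mirror the shape of the kernel's list macros. Both forms leave the head pointing at itself; the macro form simply does it at the point of declaration.

```c
/* Minimal circular doubly linked list head, modelled on the kernel's. */
struct ex_list_head {
	struct ex_list_head *next, *prev;
};

/* Declare and initialise in one statement. */
#define EX_LIST_HEAD_INIT(name) { &(name), &(name) }
#define EX_LIST_HEAD(name) \
	struct ex_list_head name = EX_LIST_HEAD_INIT(name)

/* Run-time initialisation of an already declared head. */
void ex_init_list_head(struct ex_list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* An empty list is one whose head points back at itself. */
int ex_list_empty(const struct ex_list_head *head)
{
	return head->next == head;
}
```

With the macro form there is no window in which the head is declared but not yet valid, which is why it is preferred for local list heads.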
2024-09-20	nfsd: map the EBADMSG to nfserr_io to avoid warning	Li Lingfeng
	Ext4 will throw -EBADMSG through ext4_readdir when a checksum error occurs, resulting in the following WARNING. Fix it by mapping EBADMSG to nfserr_io.
	nfsd_buffered_readdir
	 iterate_dir // -EBADMSG -74
	  ext4_readdir // .iterate_shared
	   ext4_dx_readdir
	    ext4_htree_fill_tree
	     htree_dirblock_to_tree
	      ext4_read_dirblock
	       __ext4_read_dirblock
	        ext4_dirblock_csum_verify
	         warn_no_space_for_csum
	          __warn_no_space_for_csum
	        return ERR_PTR(-EFSBADCRC) // -EBADMSG -74
	 nfserrno // WARNING
	[ 161.115610] ------------[ cut here ]------------
	[ 161.116465] nfsd: non-standard errno: -74
	[ 161.117315] WARNING: CPU: 1 PID: 780 at fs/nfsd/nfsproc.c:878 nfserrno+0x9d/0xd0
	[ 161.118596] Modules linked in:
	[ 161.119243] CPU: 1 PID: 780 Comm: nfsd Not tainted 5.10.0-00014-g79679361fd5d #138
	[ 161.120684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
	[ 161.123601] RIP: 0010:nfserrno+0x9d/0xd0
	[ 161.124676] Code: 0f 87 da 30 dd 00 83 e3 01 b8 00 00 00 05 75 d7 44 89 ee 48 c7 c7 c0 57 24 98 89 44 24 04 c6 05 ce 2b 61 03 01 e8 99 20 d8 00 <0f> 0b 8b 44 24 04 eb b5 4c 89 e6 48 c7 c7 a0 6d a4 99 e8 cc 15 33
	[ 161.127797] RSP: 0018:ffffc90000e2f9c0 EFLAGS: 00010286
	[ 161.128794] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
	[ 161.130089] RDX: 1ffff1103ee16f6d RSI: 0000000000000008 RDI: fffff520001c5f2a
	[ 161.131379] RBP: 0000000000000022 R08: 0000000000000001 R09: ffff8881f70c1827
	[ 161.132664] R10: ffffed103ee18304 R11: 0000000000000001 R12: 0000000000000021
	[ 161.133949] R13: 00000000ffffffb6 R14: ffff8881317c0000 R15: ffffc90000e2fbd8
	[ 161.135244] FS: 0000000000000000(0000) GS:ffff8881f7080000(0000) knlGS:0000000000000000
	[ 161.136695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[ 161.137761] CR2: 00007fcaad70b348 CR3: 0000000144256006 CR4: 0000000000770ee0
	[ 161.139041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	[ 161.140291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	[ 161.141519] PKRU: 55555554
	[ 161.142076] Call Trace:
	[ 161.142575] ? __warn+0x9b/0x140
	[ 161.143229] ? nfserrno+0x9d/0xd0
	[ 161.143872] ? report_bug+0x125/0x150
	[ 161.144595] ? handle_bug+0x41/0x90
	[ 161.145284] ? exc_invalid_op+0x14/0x70
	[ 161.146009] ? asm_exc_invalid_op+0x12/0x20
	[ 161.146816] ? nfserrno+0x9d/0xd0
	[ 161.147487] nfsd_buffered_readdir+0x28b/0x2b0
	[ 161.148333] ? nfsd4_encode_dirent_fattr+0x380/0x380
	[ 161.149258] ? nfsd_buffered_filldir+0xf0/0xf0
	[ 161.150093] ? wait_for_concurrent_writes+0x170/0x170
	[ 161.151004] ? generic_file_llseek_size+0x48/0x160
	[ 161.151895] nfsd_readdir+0x132/0x190
	[ 161.152606] ? nfsd4_encode_dirent_fattr+0x380/0x380
	[ 161.153516] ? nfsd_unlink+0x380/0x380
	[ 161.154256] ? override_creds+0x45/0x60
	[ 161.155006] nfsd4_encode_readdir+0x21a/0x3d0
	[ 161.155850] ? nfsd4_encode_readlink+0x210/0x210
	[ 161.156731] ? write_bytes_to_xdr_buf+0x97/0xe0
	[ 161.157598] ? __write_bytes_to_xdr_buf+0xd0/0xd0
	[ 161.158494] ? lock_downgrade+0x90/0x90
	[ 161.159232] ? nfs4svc_decode_voidarg+0x10/0x10
	[ 161.160092] nfsd4_encode_operation+0x15a/0x440
	[ 161.160959] nfsd4_proc_compound+0x718/0xe90
	[ 161.161818] nfsd_dispatch+0x18e/0x2c0
	[ 161.162586] svc_process_common+0x786/0xc50
	[ 161.163403] ? nfsd_svc+0x380/0x380
	[ 161.164137] ? svc_printk+0x160/0x160
	[ 161.164846] ? svc_xprt_do_enqueue.part.0+0x365/0x380
	[ 161.165808] ? nfsd_svc+0x380/0x380
	[ 161.166523] ? rcu_is_watching+0x23/0x40
	[ 161.167309] svc_process+0x1a5/0x200
	[ 161.168019] nfsd+0x1f5/0x380
	[ 161.168663] ? nfsd_shutdown_threads+0x260/0x260
	[ 161.169554] kthread+0x1c4/0x210
	[ 161.170224] ? kthread_insert_work_sanity_check+0x80/0x80
	[ 161.171246] ret_from_fork+0x1f/0x30
	Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	NFSD: remove redundant assignment operation	Li Lingfeng
	Commit 5826e09bf3dd ("NFSD: OP_CB_RECALL_ANY should recall both read and write delegations") added a new assignment statement to add RCA4_TYPE_MASK_WDATA_DLG to ra_bmval bitmask of OP_CB_RECALL_ANY. So the old one should be removed. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	.mailmap: Add an entry for my work email address	Chuck Lever
	Collect a few very old previous employers as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	NFSD: Fix NFSv4's PUTPUBFH operation	Chuck Lever
	According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH. Replace the XDR decoder for PUTPUBFH with a "noop" since we no longer want the minorversion check, and PUTPUBFH has no arguments to decode. (Ideally nfsd4_decode_noop should really be called nfsd4_decode_void). PUTPUBFH should now behave just like PUTROOTFH. Reported-by: Cedric Blancher <cedric.blancher@gmail.com> Fixes: e1a90ebd8b23 ("NFSD: Combine decode operations for v4 and v4.1") Cc: Dan Shelton <dan.f.shelton@gmail.com> Cc: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: Add quotes to client info 'callback address'	Mark Grimes
	The 'callback address' in client_info_show is output without quotes causing yaml parsers to fail on processing IPv6 addresses. Adding quotes to 'callback address' also matches that used by the 'address' field. Signed-off-by: Mark Grimes <mark.grimes@ixsystems.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	svcrdma: Handle device removal outside of the CM event handler	Chuck Lever
	Synchronously wait for all disconnects to complete to ensure the transports have divested all hardware resources before the underlying RDMA device can safely be removed. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: move error choice for incorrect object types to version-specific code.	NeilBrown
	If an NFS operation expects a particular sort of object (file, dir, link, etc) but gets a file handle for a different sort of object, it must return an error. The actual error varies among NFS versions in non-trivial ways. For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only, INVAL is suitable. For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK was found when not expected. This takes precedence over NOTDIR. For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in preference to EINVAL when none of the specific error codes apply. When nfsd_mode_check() finds a symlink where it expected a directory it needs to return an error code that can be converted to NOTDIR for v2 or v3 but will be SYMLINK for v4. It must be different from the error code returned when it finds a symlink but expects a regular file - that must be converted to EINVAL or SYMLINK. So we introduce an internal error code nfserr_symlink_not_dir which each version converts as appropriate. nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is only used by NFSv4 and only for OPEN. NFSERR_INVAL is never a suitable error if the object is the wrong type. For v4.0 we use nfserr_symlink for non-dirs even if not a symlink. For v4.1 we have nfserr_wrong_type. We handle this difference in-place in nfsd_check_obj_isreg() as there is nothing to be gained by delaying the choice to nfsd4_map_status(). As a result of these changes, nfsd_mode_check() doesn't need an rqstp arg any more. Note that NFSv4 operations are actually performed in the xdr code(!!!) so the only place that we can map the status code successfully is in nfsd4_encode_operation(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: be more systematic about selecting error codes for internal use.	NeilBrown
	Rather than using ad hoc values for internal errors (30000, 11000, ...) use 'enum' to sequentially allocate numbers starting from the first known available number - now visible as NFS4ERR_FIRST_FREE. The goal is values that are distinct from all be32 error codes. To get those we must first select integers that are not already used, then convert them with cpu_to_be32(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
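The allocation scheme described above might look roughly like the following userspace sketch. The numeric value of `EX_FIRST_FREE` is a placeholder chosen for the example, not the value of NFS4ERR_FIRST_FREE, and the enumerator names are invented. `htonl()` stands in for the kernel's cpu_to_be32().

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Internal-only status codes, numbered consecutively from the first
 * value not used by the protocol. The starting value is a placeholder.
 */
enum {
	EX_FIRST_FREE = 20000,			/* placeholder, not the real value */
	EX_INTERNAL_EOF = EX_FIRST_FREE,	/* hypothetical internal code */
	EX_INTERNAL_REPLAY,			/* EX_FIRST_FREE + 1 */
	EX_INTERNAL_DROPIT,			/* EX_FIRST_FREE + 2 */
};

/* Convert a host-order code to the big-endian form used for statuses. */
uint32_t ex_wire_code(uint32_t host_code)
{
	return htonl(host_code);
}
```

Because the enum hands out consecutive integers, uniqueness is settled before the byte-order conversion, and the conversion is a bijection, so the resulting big-endian values stay distinct from each other and from the protocol's own codes.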
2024-09-20	nfsd: Move error code mapping to per-version proc code.	NeilBrown
	There is code scattered around nfsd which chooses an error status based on the particular version of nfs being used. It is cleaner to have the version specific choices in version specific code. With this patch common code returns the most specific error code possible and the version specific code maps that if necessary. Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()" function which is called to map the resp->status before each non-trivial nfsd_proc_* or nfsd3_proc_* function returns. NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and are left for a later patch. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: move V4ROOT version check to nfsd_set_fh_dentry()	NeilBrown
	This further centralizes version number checks. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: further centralize protocol version checks.	NeilBrown
	With this patch the only places that test ->rq_vers against a specific version are nfsd_v4client() and nfsd_set_fh_dentry(). The latter sets some flags in the svc_fh, which now includes fh_64bit_cookies and fh_use_wgather. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: use nfsd_v4client() in nfsd_breaker_owns_lease()	NeilBrown
	nfsd_breaker_owns_lease() currently open-codes the same test that nfsd_v4client() performs. With this patch we use nfsd_v4client() instead. Also as i_am_nfsd() is only used in combination with kthread_data(), replace it with nfsd_current_rqst() which combines the two and returns a valid svc_rqst, or NULL. The test for NULL is moved into nfsd_v4client() for code clarity. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: Pass 'cred' instead of 'rqstp' to some functions.	NeilBrown
	nfsd_permission(), exp_rdonly(), nfsd_setuser(), and nfsexp_flags() only ever need the cred out of rqstp, so pass it explicitly instead of the whole rqstp. This makes the interfaces cleaner. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: Don't pass all of rqst into rqst_exp_find()	NeilBrown
	Rather than passing the whole rqst, pass the pieces that are actually needed. This makes the inputs to rqst_exp_find() more obvious. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: don't assume copy notify when preprocessing the stateid	Sagi Grimberg
	Move the stateid handling to nfsd4_copy_notify. If nfs4_preprocess_stateid_op did not produce an output stateid, error out. Copy notify specifically does not permit the use of special stateids, so enforce that outside generic stateid pre-processing. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	sunrpc: allow svc threads to fail initialisation cleanly	NeilBrown
	If an svc thread needs to perform some initialisation that might fail, it has no good way to handle the failure. Before the thread can exit it must call svc_exit_thread(), but that requires the service mutex to be held. The thread cannot simply take the mutex as that could deadlock if there is a concurrent attempt to shut down all threads (which is unlikely, but not impossible). nfsd currently calls svc_exit_thread() unprotected in the unlikely event that unshare_fs_struct() fails. We can clean this up by introducing svc_thread_init_status() by which an svc thread can report whether initialisation has succeeded. If it has, it continues normally into the action loop. If it has not, svc_thread_init_status() immediately aborts the thread. svc_start_kthread() waits for either of these to happen, and calls svc_exit_thread() (under the mutex) if the thread aborted. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	sunrpc: merge svc_rqst_alloc() into svc_prepare_thread()	NeilBrown
	The only caller of svc_rqst_alloc() is svc_prepare_thread(). So merge the one into the other and simplify. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	sunrpc: don't take ->sv_lock when updating ->sv_nrthreads.	NeilBrown
	As documented in svc_xprt.c, sv_nrthreads is protected by the service mutex, and it does not need ->sv_lock. (->sv_lock is needed only for sv_permsocks, sv_tempsocks, and sv_tmpcnt). So remove the unnecessary locking. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	sunrpc: change sp_nrthreads from atomic_t to unsigned int.	NeilBrown
	sp_nrthreads is only ever accessed under the service mutex (nlmsvc_mutex, nfs_callback_mutex, or nfsd_mutex), so there is no need for it to be an atomic_t. The fact that all code using it is single-threaded means that we can simplify svc_pool_victim and remove the temporary elevation of sp_nrthreads. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	sunrpc: document locking rules for svc_exit_thread()	NeilBrown
	The locking required for svc_exit_thread() is not obvious, so document it in a kdoc comment. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	nfsd: don't allocate the versions array.	NeilBrown
	Instead of using kmalloc to allocate an array for storing active version info, just declare an array to the max size - it is only 5 or so. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20	udmabuf: reuse folio array when pin folios	Huan Yang
	When invoking memfd_pin_folios, we need to provide an array in which to save each pinned folio. The current approach dynamically allocates an array (using kvmalloc), gets the folios, saves them into the udmabuf, and then frees the array. Depending on the size, kvmalloc behaves differently: Below PAGE_SIZE, the slab allocator is used, which has good allocation performance because it caches pages. From PAGE_SIZE up to the PCP order, the PCP (per-cpu-pageset) also caches buddy pages on each CPU, so a CPU need not hold a lock (zone or otherwise) to get a local page; if the PCP has a cached page, access is also fast. From PAGE_SIZE up to BUDDY_MAX, kvmalloc tries to get pages from the buddy allocator; because kvmalloc adjusts the gfp flags, if the zone freelist cannot satisfy the allocation (fast path), it does not enter the slowpath to reclaim memory. Because a lock must be held and checks performed, this may be slow, but it is still faster than vmalloc. If anything goes wrong, kvmalloc falls back to vmalloc, which obtains contiguous virtual addresses by allocating order-0 pages (PAGE_SIZE) in a loop and then mapping them into the vmalloc area. If necessary, page allocation may enter the slowpath to reclaim memory. Hence, falling back to vmalloc is slow. On create, we need to iterate over each udmabuf item and pin the folios in its range; if each item's range has a large folio count, each may fall back to vmalloc. This patch finds the largest folio range among the items and allocates a folio array of that size. When pinning each range's folios, this array is reused. Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-8-link@vivo.com
2024-09-20	udmabuf: remove udmabuf_folio	Huan Yang
	Currently, udmabuf handles folios by creating an unpin list that records each folio obtained and unpinning them on release. To maintain this, several structs have been established. However, maintaining the list requires a significant amount of memory, and iterating over it is a substantial overhead that is not friendly to the CPU cache. On create, we arrange the folio array in pin order and set the offsets according to pgcnt. So if we record each pinned folio on create, we can easily unpin it. An array can do this as well as a list. Hence, this patch sets up a pinned_folios array (of size pgcnt) to replace the udmabuf_folio struct; it records each folio pinned by memfd_pin_folios, and folios are unpinned by iterating over pinned_folios. Note that, since a folio may be pinned multiple times, each folio can be added to pinned_folios multiple times, depending on how many times it was pinned on create. Compared to udmabuf_folio (24 bytes), a folio pointer is 8 bytes. If no large folios are used, each folio is PAGE_SIZE and must be unpinned on release, so each folio must be recorded; with this patch, each such record saves 16 bytes. If large folios are used, then depending on their number the pinned_folios array may take more memory, but it still makes unpin access more cache-friendly. Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-7-link@vivo.com
2024-09-20	udmabuf: introduce udmabuf init and deinit helper	Huan Yang
	After a udmabuf is allocated, its resources need to be initialized, including various array structures. The current array structure has already been greatly expanded. Also, before a udmabuf is freed with kfree, the resources it occupies need to be released. This part is repetitive and may be overlooked. This patch adds helper functions for init and deinit, thereby reducing duplicate code. Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-6-link@vivo.com
2024-09-20	udmabuf: udmabuf_create pin folio codestyle cleanup	Huan Yang
	This patch aims to simplify the memfd folio pinning during udmabuf creation. No functional changes. This patch creates a udmabuf_pin_folios function, which pins the memfd folios and then records each pinned folio and its offset. This patch simplifies the recording of pinned folios by iterating over each pinned folio and recording each offset within it. Compared to iterating by pgcnt, this is more readable. Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-5-link@vivo.com
2024-09-20	udmabuf: fix vmap_udmabuf error page set	Huan Yang
	Currently vmap_udmabuf sets the page array from each folio. But ubuf->folios contains only each folio's head page. That means we repeatedly map the folio head page into the vmalloc area. Because udmabuf can use hugetlb, if HVO is enabled, tail pages may not exist, so we can't use a page array to map; instead, use a pfn array. With this, page usage is removed from udmabuf entirely. Fixes: 5e72b2b41a21 ("udmabuf: convert udmabuf driver to use folios") Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-4-link@vivo.com
2024-09-20	udmabuf: change folios array from kmalloc to kvmalloc	Huan Yang
	With PAGE_SIZE 4096 and MAX_PAGE_ORDER 10 on a 64-bit machine, page_alloc supports allocations of at most 4MB. Above this, it triggers this warning and returns NULL. udmabuf's size limit can be changed; if it is changed to 3072 (3GB) and a 3GB udmabuf is then allocated, creation fails.
	[ 4080.876581] ------------[ cut here ]------------
	[ 4080.876843] WARNING: CPU: 3 PID: 2015 at mm/page_alloc.c:4556 __alloc_pages+0x2c8/0x350
	[ 4080.878839] RIP: 0010:__alloc_pages+0x2c8/0x350
	[ 4080.879470] Call Trace:
	[ 4080.879473] <TASK>
	[ 4080.879473] ? __alloc_pages+0x2c8/0x350
	[ 4080.879475] ? __warn.cold+0x8e/0xe8
	[ 4080.880647] ? __alloc_pages+0x2c8/0x350
	[ 4080.880909] ? report_bug+0xff/0x140
	[ 4080.881175] ? handle_bug+0x3c/0x80
	[ 4080.881556] ? exc_invalid_op+0x17/0x70
	[ 4080.881559] ? asm_exc_invalid_op+0x1a/0x20
	[ 4080.882077] ? udmabuf_create+0x131/0x400
	Because of MAX_PAGE_ORDER, kmalloc can allocate at most 4096 * (1 << 10) = 4MB of memory; each array entry is a pointer (8 bytes), so the array can hold 524288 pages (2GB). Furthermore, costly-order allocations (order 3) are not guaranteed to succeed, due to fragmentation. This patch changes the udmabuf arrays to use kvmalloc_array, which can fall back to vmalloc, which can guarantee allocation for any size and does not affect the performance of kmalloc allocations. Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-3-link@vivo.com
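The 4MB and 2GB figures quoted in this message follow from simple arithmetic, sketched here in userspace C. The constants mirror the values given in the message (4096-byte pages, maximum order 10, 8-byte pointers); the `EX_`/`ex_` names are invented for the example.

```c
#include <stdint.h>

#define EX_PAGE_SIZE		4096ULL	/* bytes per page, as stated */
#define EX_MAX_PAGE_ORDER	10	/* largest buddy order, as stated */
#define EX_POINTER_SIZE		8ULL	/* bytes per array entry on 64-bit */

/* Largest physically contiguous allocation: 4096 << 10 = 4 MiB. */
uint64_t ex_max_contig_bytes(void)
{
	return EX_PAGE_SIZE << EX_MAX_PAGE_ORDER;
}

/* Number of pointers that fit in that allocation: 524288. */
uint64_t ex_max_entries(void)
{
	return ex_max_contig_bytes() / EX_POINTER_SIZE;
}

/* Buffer size one full array can describe, one page per entry: 2 GiB. */
uint64_t ex_max_buffer_bytes(void)
{
	return ex_max_entries() * EX_PAGE_SIZE;
}
```

A 3GB buffer needs 786432 entries, or 6MB of pointers, which exceeds the 4MB ceiling and explains the failed allocation in the trace.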
2024-09-20	udmabuf: pre-fault when first page fault	Huan Yang
	The current udmabuf mmap only maps physical memory to the corresponding virtual address when the user actually accesses that address. However, udmabuf has already obtained and pinned the folios by the time creation completes. This means that the physical memory has already been acquired, rather than being acquired dynamically on access. As a result, the page fault has lost its purpose as demand paging. Because a page fault requires trapping into kernel mode and filling in the mapping when the corresponding virtual address in the mmap is accessed, this represents considerable overhead when creating a large udmabuf. This patch fills the pfn into the page table and then pre-faults each pfn into the vma on first access. Note that if anything goes wrong, we do not return an error during this pre-fault step. However, an error will be returned if the failure occurs when the address is actually accessed. Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-2-link@vivo.com
2024-09-20	MAINTAINERS: udmabuf: Add myself as co-maintainer for udmabuf driver	Vivek Kasireddy
	I would like to help maintain the udmabuf driver, in light of the recent changes that converted the driver to use folios instead of pages. Furthermore, I also contribute to Qemu's virtio-gpu module (and UI modules), that are primary users of udmabuf driver. Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240822045806.3563883-1-vivek.kasireddy@intel.com
2024-09-20	drm/xe/pciid: Add new PCI id for ARL	Dnyaneshwar Bhadane
	Add new PCI id for ARL platform. v2: Fix typo in PCI id (SaiTeja) Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912115906.2730577-1-dnyaneshwar.bhadane@intel.com
2024-09-20	sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3	Jan Stancek
	The ENGINE API has been deprecated since OpenSSL version 3.0 [1]. Distros have started dropping support from headers, and in the future it will likely also disappear from the library. It has been superseded by the PROVIDER API, so use it instead for OPENSSL MAJOR >= 3. [1] https://github.com/openssl/openssl/blob/master/README-ENGINES.md [jarkko: fixed up alignment issues reported by checkpatch.pl --strict] Signed-off-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: R Nageswara Sastry <rnsastry@linux.ibm.com> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-09-20	sign-file,extract-cert: avoid using deprecated ERR_get_error_line()	Jan Stancek
	ERR_get_error_line() has been deprecated since OpenSSL 3.0. Use ERR_peek_error_line() instead, and combine display_openssl_errors() and drain_openssl_errors() into a single function whose parameter decides whether it should consume errors silently. Signed-off-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: R Nageswara Sastry <rnsastry@linux.ibm.com> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-09-20	sign-file,extract-cert: move common SSL helper functions to a header	Jan Stancek
	A couple of error handling helpers are repeated in both tools, so move them to a common header. Signed-off-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: R Nageswara Sastry <rnsastry@linux.ibm.com> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-09-20	KEYS: prevent NULL pointer dereference in find_asymmetric_key()	Roman Smirnov
	In find_asymmetric_key(), if all NULLs are passed in the id_{0,1,2} arguments, the kernel will first emit WARN but then have an oops because id_2 gets dereferenced anyway. Add the missing id_2 check and move WARN_ON() to the final else branch to avoid duplicate NULL checks. Found by Linux Verification Center (linuxtesting.org) with Svace static analysis tool. Cc: stable@vger.kernel.org # v5.17+ Fixes: 7d30198ee24f ("keys: X.509 public key issuer lookup without AKID") Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>