linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-01-07	svcrdma: Remove queue-shortening warnings	Chuck Lever
	These won't have much diagnostic value for site administrators. Since they can't be disabled, they become noise. What's more, the subsequent rdma_create_qp() call adjusts the Send Queue size (possibly downward) without warning, making the size reported by these pr_warns inaccurate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Remove pointer addresses shown in dprintk()	Chuck Lever
	There are a couple of dprintk() call sites in svc_rdma_accept() that show pointer addresses. These days, displayed pointer addresses are hashed and thus have little or no diagnostic value, especially for site administrators. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Optimize svc_rdma_cc_init()	Chuck Lever
	The atomic_inc_return() in svc_rdma_send_cid_init() is expensive. Some svc_rdma_chunk_ctxt's now reside in long-lived container structures. They don't need a fresh completion ID for every I/O operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: De-duplicate completion ID initialization helpers	Chuck Lever
	Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Move the svc_rdma_cc_init() call	Chuck Lever
	Now that the chunk_ctxt for Reads is no longer dynamically allocated it can be initialized once for the life of the object that contains it (struct svc_rdma_recv_ctxt). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Remove struct svc_rdma_read_info	Chuck Lever
	The remaining fields of struct svc_rdma_read_info are no longer referenced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Update the synopsis of svc_rdma_read_special()	Chuck Lever
	Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_read_special() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Update the synopsis of svc_rdma_read_call_chunk()	Chuck Lever
	Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_read_call_chunk() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Update synopsis of svc_rdma_read_multiple_chunks()	Chuck Lever
	Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_read_multiple_chunks() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Update synopsis of svc_rdma_copy_inline_range()	Chuck Lever
	Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_copy_inline_range() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Update the synopsis of svc_rdma_read_data_item()	Chuck Lever
	Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_data_item() can use that recv_ctxt to derive that information rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Update synopsis of svc_rdma_read_chunk_range()	Chuck Lever
	Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_chunk_range() can use that recv_ctxt to derive that information rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Update synopsis of svc_rdma_build_read_chunk()	Chuck Lever
	Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_chunk() can use that recv_ctxt to derive that information rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Update synopsis of svc_rdma_build_read_segment()	Chuck Lever
	Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_segment() can use the recv_ctxt to derive that information rather than the other way around. This removes one usage of the ri_readctxt field, enabling its removal in a subsequent patch. At the same time, the use of ri_rqst can similarly be replaced with a passed-in function parameter. Start with build_read_segment() because it is a common utility function at the bottom of the Read chunk path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Move read_info::ri_pageoff into struct svc_rdma_recv_ctxt	Chuck Lever
	Further clean up: move the starting byte offset field into svc_rdma_recv_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt	Chuck Lever
	Further clean up: move the page index field into svc_rdma_recv_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Start moving fields out of struct svc_rdma_read_info	Chuck Lever
	Since the request's svc_rdma_recv_ctxt will stay around for the duration of the RDMA Read operation, the contents of struct svc_rdma_read_info can reside in the request's svc_rdma_recv_ctxt rather than being allocated separately. This will eventually save a call to kmalloc() in a hot path. Start this clean-up by moving the Read chunk's svc_rdma_chunk_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Move struct svc_rdma_chunk_ctxt to svc_rdma.h	Chuck Lever
	Prepare for nestling these into the send and recv ctxts so they no longer have to be allocated dynamically. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Remove the svc_rdma_chunk_ctxt::cc_rdma field	Chuck Lever
	In every instance, the pointer address in that field is now available by other means. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Pass a pointer to the transport to svc_rdma_cc_release()	Chuck Lever
	Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Explicitly pass the transport to svc_rdma_post_chunk_ctxt()	Chuck Lever
	Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Explicitly pass the transport into Read chunk I/O paths	Chuck Lever
	Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Explicitly pass the transport into Write chunk I/O paths	Chuck Lever
	Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Acquire the svcxprt_rdma pointer from the CQ context	Chuck Lever
	Enable the removal of the svc_rdma_chunk_ctxt::cc_rdma field in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Reduce size of struct svc_rdma_rw_ctxt	Chuck Lever
	SG_CHUNK_SIZE is 128, making struct svc_rdma_rw_ctxt + the first SGL array more than 4200 bytes in length, pushing the memory allocation well into order 1. Even so, the RDMA rw core doesn't seem to use more than max_send_sge entries in that array (typically 32 or less), so that is all wasted space. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Update some svcrdma DMA-related tracepoints	Chuck Lever
	A send/recv_ctxt already records transport-related information in the cq.id, thus there is no need to record the IP addresses of the transport endpoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: DMA error tracepoints should report completion IDs	Chuck Lever
	Update the DMA error flow tracepoints to report the completion ID of the failing context. This ties the wait/failure to a particular operation or request, which is more useful than knowing only the failing transport. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: SQ error tracepoints should report completion IDs	Chuck Lever
	Update the Send Queue's error flow tracepoints to report the completion ID of the waiting or failing context. This ties the wait/failure to a particular operation or request, which is a little more useful than knowing only the transport that is about to close. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	rpcrdma: Introduce a simple cid tracepoint class	Chuck Lever
	De-duplicate some code, making it easier to add new tracepoints that report only a completion ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Add lockdep class keys for transport locks	Chuck Lever
	Two svcrdma-related transport locks can become quite contended. Collate their use and make them easy to find in /proc/lock_stat for better observability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Clean up locking	Chuck Lever
	There's no need to protect llist_entry() with a spin lock. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Add an async version of svc_rdma_write_info_free()	Chuck Lever
	DMA unmapping can take quite some time, so it should not be handled in a single-threaded completion handler. Defer releasing write_info structs to the recently-added workqueue. With this patch, DMA unmapping can be handled in parallel, and it does not cause head-of-queue blocking of Write completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Add an async version of svc_rdma_send_ctxt_put()	Chuck Lever
	DMA unmapping can take quite some time, so it should not be handled in a single-threaded completion handler. Defer releasing send_ctxts to the recently-added workqueue. With this patch, DMA unmapping can be handled in parallel, and it does not cause head-of-queue blocking of Send completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Add a utility workqueue to svcrdma	Chuck Lever
	To handle work in the background, set up an UNBOUND workqueue for svcrdma. Subsequent patches will make use of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Pre-allocate svc_rdma_recv_ctxt objects	Chuck Lever
	The original reason for allocating svc_rdma_recv_ctxt objects during Receive completion was to ensure the objects were allocated on the NUMA node closest to the underlying IB device. Since commit c5d68d25bd6b ("svcrdma: Clean up allocation of svc_rdma_recv_ctxt"), however, the device's favored node is explicitly passed to the memory allocator. To enable switching Receive completion to soft IRQ context, move memory allocation out of completion handling, since it can be costly, and it can sleep. A limited number of objects is now allocated at "accept" time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	svcrdma: Eliminate allocation of recv_ctxt objects in backchannel	Chuck Lever
	The svc_rdma_recv_ctxt free list uses a lockless list to avoid the need for a spin lock in the fast path. llist_del_first(), which is used by svc_rdma_recv_ctxt_get(), requires serialization, however, when there are multiple list producers that are unserialized. I mistakenly thought there was only one caller of svc_rdma_recv_ctxt_get() (svc_rdma_refresh_recvs()), thus explicit serialization would not be necessary. But there is another caller: svc_rdma_bc_sendto(), and these two are not serialized against each other. I haven't seen ill effects that I could directly ascribe to a lack of serialization. It's just an observation based on code audit. When DMA-mapping before sending a Reply, the passed-in struct svc_rdma_recv_ctxt is used only for its write and reply PCLs. These are currently always empty in the backchannel case. So, instead of passing a full svc_rdma_recv_ctxt object to svc_rdma_map_reply_msg(), let's pass in just the Write and Reply PCLs. This change makes it unnecessary for the backchannel to acquire a dummy svc_rdma_recv_ctxt object when sending an RPC Call. The need for svc_rdma_recv_ctxt free list serialization is now completely avoided. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	NFSv4, NFSD: move enum nfs_cb_opnum4 to include/linux/nfs4.h	ChenXiaoSong
	Callback operations enum is defined in client and server, move it to common header file. Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	nfsd: remove unnecessary NULL check	Dan Carpenter
	We check "state" for NULL on the previous line so it can't be NULL here. No need to check again. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202312031425.LffZTarR-lkp@intel.com/ Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	SUNRPC: Remove RQ_SPLICE_OK	Chuck Lever
	This flag is no longer used. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	NFSD: Modify NFSv4 to use nfsd_read_splice_ok()	Chuck Lever
	Avoid the use of an atomic bitop, and prepare for adding a run-time switch for using splice reads. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	NFSD: Replace RQ_SPLICE_OK in nfsd_read()	Chuck Lever
	RQ_SPLICE_OK is a bit of a layering violation. Also, a subsequent patch is going to provide a mechanism for always disabling splice reads. Splicing is an issue only for NFS READs, so refactor nfsd_read() to check the auth type directly instead of relying on an rq_flag setting. The new helper will be added into the NFSv4 read path in a subsequent patch. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	SUNRPC: Add a server-side API for retrieving an RPC's pseudoflavor	Chuck Lever
	NFSD will use this new API to determine whether nfsd_splice_read is safe to use. This avoids the need to add a dependency to NFSD for CONFIG_SUNRPC_GSS. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	NFSD: Document lack of f_pos_lock in nfsd_readdir()	Chuck Lever
	Al Viro notes that normal system calls hold f_pos_lock when calling ->iterate_shared and ->llseek; however nfsd_readdir() does not take that mutex when calling these methods. It should be safe however because the struct file acquired by nfsd_readdir() is not visible to other threads. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	NFSD: Remove nfsd_drc_gc() tracepoint	Chuck Lever
	This trace point was for debugging the DRC's garbage collection. In the field it's just noise. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	NFSD: Make the file_delayed_close workqueue UNBOUND	Chuck Lever
	workqueue: nfsd_file_delayed_close [nfsd] hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND There's no harm in closing a cached file descriptor on another core. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	NFSD: use read_seqbegin() rather than read_seqbegin_or_lock()	Oleg Nesterov
	The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier() is wrong. "seq" is always even and thus "or_lock" has no effect, this code can never take ->writeverf_lock for writing. I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes and nfsd_reset_write_verifier() is supposed to be very rare operation so we do not need the adaptive locking in this case. Yet the code looks wrong and sub-optimal, it can use read_seqbegin() without changing the behaviour. [ cel: Note also that it eliminates this Sparse warning: fs/nfsd/nfssvc.c:360:6: warning: context imbalance in 'nfsd_copy_write_verifier' - different lock contexts for basic block ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	nfsd: new Kconfig option for legacy client tracking	Jeff Layton
	We've had a number of attempts at different NFSv4 client tracking methods over the years, but now nfsdcld has emerged as the clear winner since the others (recoverydir and the usermodehelper upcall) are problematic. As a case in point, the recoverydir backend uses MD5 hashes to encode long form clientid strings, which means that nfsd repeatedly gets dinged on FIPS audits, since MD5 isn't considered secure. Its use of MD5 is not cryptographically significant, so there is no danger there, but allowing us to compile that out allows us to sidestep the issue entirely. As a prelude to eventually removing support for these client tracking methods, add a new Kconfig option that enables them. Mark it deprecated and make it default to N. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07	ipvlan: Remove usage of the deprecated ida_simple_xx() API	Christophe JAILLET
	ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Note that the upper bound of ida_alloc_range() is inclusive while the one of ida_simple_get() was exclusive. So calls have been updated accordingly. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07	ipvlan: Fix a typo in a comment	Christophe JAILLET
	s/diffentiate/differentiate/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-07	parisc/power: Fix power soft-off button emulation on qemu	Helge Deller
	Make sure to start the kthread to check the power button on qemu as well if the power button address was provided. This fixes the qemu built-in system_powerdown runtime command. Fixes: d0c219472980 ("parisc/power: Add power soft-off when running on qemu") Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v6.0+