summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-09-20NFSD: Limit the number of concurrent async COPY operationsChuck Lever
Nothing appears to limit the number of concurrent async COPY operations that clients can start. In addition, AFAICT each async COPY can copy an unlimited number of 4MB chunks, so can run for a long time. Thus IMO async COPY can become a DoS vector. Add a restriction mechanism that bounds the number of concurrent background COPY operations. Start simple and try to be fair -- this patch implements a per-namespace limit. An async COPY request that occurs while this limit is exceeded gets NFS4ERR_DELAY. The requesting client can choose to send the request again after a delay or fall back to a traditional read/write style copy. If there is need to make the mechanism more sophisticated, we can visit that in future patches. Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Async COPY result needs to return a write verifierChuck Lever
Currently, when NFSD handles an asynchronous COPY, it returns a zero write verifier, relying on the subsequent CB_OFFLOAD callback to pass the write verifier and a stable_how4 value to the client. However, if the CB_OFFLOAD never arrives at the client (for example, if a network partition occurs just as the server sends the CB_OFFLOAD operation), the client will never receive this verifier. Thus, if the client sends a follow-up COMMIT, there is no way for the client to assess the COMMIT result. The usual recovery for a missing CB_OFFLOAD is for the client to send an OFFLOAD_STATUS operation, but that operation does not carry a write verifier in its result. Neither does it carry a stable_how4 value, so the client /must/ send a COMMIT in this case -- which will always fail because currently there's still no write verifier in the COPY result. Thus the server needs to return a normal write verifier in its COPY result even if the COPY operation is to be performed asynchronously. If the server recognizes the callback stateid in subsequent OFFLOAD_STATUS operations, then obviously it has not restarted, and the write verifier the client received in the COPY result is still valid and can be used to assess a COMMIT of the copied data, if one is needed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: avoid races with wake_up_var()NeilBrown
wake_up_var() needs a barrier after the important change is made in the var and before wake_up_var() is called, else it is possible that a wake up won't be sent when it should. In each case here the var is changed in an "atomic" manner, so smb_mb__after_atomic() is sufficient. In one case the important change (removing the lease) is performed *after* the wake_up, which is backwards. The code survives in part because the wait_var_event is given a timeout. This patch adds the required barriers and calls destroy_delegation() *before* waking any threads waiting for the delegation to be destroyed. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: use clear_and_wake_up_bit()NeilBrown
nfsd has two places that open-code clear_and_wake_up_bit(). One has the required memory barriers. The other does not. Change both to use clear_and_wake_up_bit() so we have the barriers without the noise. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: xprtrdma: Use ERR_CAST() to returnYan Zhen
Using ERR_CAST() is more reasonable and safer, When it is necessary to convert the type of an error pointer and return it. Signed-off-by: Yan Zhen <yanzhen@vivo.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by()Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member volumes to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Use struct_size() instead of manually calculating the number of bytes to allocate for a pnfs_block_deviceaddr with a single volume. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: call cache_put if xdr_reserve_space returns NULLGuoqing Jiang
If not enough buffer space available, but idmap_lookup has triggered lookup_fn which calls cache_get and returns successfully. Then we missed to call cache_put here which pairs with cache_get. Fixes: ddd1ea563672 ("nfsd4: use xdr_reserve_space in attribute encoding") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviwed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: add more nfsd_cb tracepointsJeff Layton
Add some tracepoints in the callback client RPC operations. Also add a tracepoint to nfsd4_cb_getattr_done. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: track the main opcode for callbacksJeff Layton
Keep track of the "main" opcode for the callback, and display it in the tracepoint. This makes it simpler to discern what's happening when there is more than one callback in flight. The one special case is the CB_NULL RPC. That's not a CB_COMPOUND opcode, so designate the value 0 for that. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: add more info to WARN_ON_ONCE on failed callbacksJeff Layton
Currently, you get the warning and stack trace, but nothing is printed about the relevant error codes. Add that in. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: fix some spelling errors in commentsLi Lingfeng
Fix spelling errors in comments of nfsd4_release_lockowner and nfs4_set_delegation. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: remove unused parameter of nfsd_file_mark_find_or_createLi Lingfeng
Commit 427f5f83a319 ("NFSD: Ensure nf_inode is never dereferenced") passes inode directly to nfsd_file_mark_find_or_create instead of getting it from nf, so there is no need to pass nf. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: use LIST_HEAD() to simplify codeHongbo Li
list_head can be initialized automatically with LIST_HEAD() instead of calling INIT_LIST_HEAD(). Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: map the EBADMSG to nfserr_io to avoid warningLi Lingfeng
Ext4 will throw -EBADMSG through ext4_readdir when a checksum error occurs, resulting in the following WARNING. Fix it by mapping EBADMSG to nfserr_io. nfsd_buffered_readdir iterate_dir // -EBADMSG -74 ext4_readdir // .iterate_shared ext4_dx_readdir ext4_htree_fill_tree htree_dirblock_to_tree ext4_read_dirblock __ext4_read_dirblock ext4_dirblock_csum_verify warn_no_space_for_csum __warn_no_space_for_csum return ERR_PTR(-EFSBADCRC) // -EBADMSG -74 nfserrno // WARNING [ 161.115610] ------------[ cut here ]------------ [ 161.116465] nfsd: non-standard errno: -74 [ 161.117315] WARNING: CPU: 1 PID: 780 at fs/nfsd/nfsproc.c:878 nfserrno+0x9d/0xd0 [ 161.118596] Modules linked in: [ 161.119243] CPU: 1 PID: 780 Comm: nfsd Not tainted 5.10.0-00014-g79679361fd5d #138 [ 161.120684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qe mu.org 04/01/2014 [ 161.123601] RIP: 0010:nfserrno+0x9d/0xd0 [ 161.124676] Code: 0f 87 da 30 dd 00 83 e3 01 b8 00 00 00 05 75 d7 44 89 ee 48 c7 c7 c0 57 24 98 89 44 24 04 c6 05 ce 2b 61 03 01 e8 99 20 d8 00 <0f> 0b 8b 44 24 04 eb b5 4c 89 e6 48 c7 c7 a0 6d a4 99 e8 cc 15 33 [ 161.127797] RSP: 0018:ffffc90000e2f9c0 EFLAGS: 00010286 [ 161.128794] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 161.130089] RDX: 1ffff1103ee16f6d RSI: 0000000000000008 RDI: fffff520001c5f2a [ 161.131379] RBP: 0000000000000022 R08: 0000000000000001 R09: ffff8881f70c1827 [ 161.132664] R10: ffffed103ee18304 R11: 0000000000000001 R12: 0000000000000021 [ 161.133949] R13: 00000000ffffffb6 R14: ffff8881317c0000 R15: ffffc90000e2fbd8 [ 161.135244] FS: 0000000000000000(0000) GS:ffff8881f7080000(0000) knlGS:0000000000000000 [ 161.136695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 161.137761] CR2: 00007fcaad70b348 CR3: 0000000144256006 CR4: 0000000000770ee0 [ 161.139041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 161.140291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 161.141519] PKRU: 55555554 [ 161.142076] Call Trace: [ 161.142575] ? __warn+0x9b/0x140 [ 161.143229] ? nfserrno+0x9d/0xd0 [ 161.143872] ? report_bug+0x125/0x150 [ 161.144595] ? handle_bug+0x41/0x90 [ 161.145284] ? exc_invalid_op+0x14/0x70 [ 161.146009] ? asm_exc_invalid_op+0x12/0x20 [ 161.146816] ? nfserrno+0x9d/0xd0 [ 161.147487] nfsd_buffered_readdir+0x28b/0x2b0 [ 161.148333] ? nfsd4_encode_dirent_fattr+0x380/0x380 [ 161.149258] ? nfsd_buffered_filldir+0xf0/0xf0 [ 161.150093] ? wait_for_concurrent_writes+0x170/0x170 [ 161.151004] ? generic_file_llseek_size+0x48/0x160 [ 161.151895] nfsd_readdir+0x132/0x190 [ 161.152606] ? nfsd4_encode_dirent_fattr+0x380/0x380 [ 161.153516] ? nfsd_unlink+0x380/0x380 [ 161.154256] ? override_creds+0x45/0x60 [ 161.155006] nfsd4_encode_readdir+0x21a/0x3d0 [ 161.155850] ? nfsd4_encode_readlink+0x210/0x210 [ 161.156731] ? write_bytes_to_xdr_buf+0x97/0xe0 [ 161.157598] ? __write_bytes_to_xdr_buf+0xd0/0xd0 [ 161.158494] ? lock_downgrade+0x90/0x90 [ 161.159232] ? nfs4svc_decode_voidarg+0x10/0x10 [ 161.160092] nfsd4_encode_operation+0x15a/0x440 [ 161.160959] nfsd4_proc_compound+0x718/0xe90 [ 161.161818] nfsd_dispatch+0x18e/0x2c0 [ 161.162586] svc_process_common+0x786/0xc50 [ 161.163403] ? nfsd_svc+0x380/0x380 [ 161.164137] ? svc_printk+0x160/0x160 [ 161.164846] ? svc_xprt_do_enqueue.part.0+0x365/0x380 [ 161.165808] ? nfsd_svc+0x380/0x380 [ 161.166523] ? rcu_is_watching+0x23/0x40 [ 161.167309] svc_process+0x1a5/0x200 [ 161.168019] nfsd+0x1f5/0x380 [ 161.168663] ? nfsd_shutdown_threads+0x260/0x260 [ 161.169554] kthread+0x1c4/0x210 [ 161.170224] ? kthread_insert_work_sanity_check+0x80/0x80 [ 161.171246] ret_from_fork+0x1f/0x30 Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: remove redundant assignment operationLi Lingfeng
Commit 5826e09bf3dd ("NFSD: OP_CB_RECALL_ANY should recall both read and write delegations") added a new assignment statement to add RCA4_TYPE_MASK_WDATA_DLG to ra_bmval bitmask of OP_CB_RECALL_ANY. So the old one should be removed. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20.mailmap: Add an entry for my work email addressChuck Lever
Collect a few very old previous employers as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20NFSD: Fix NFSv4's PUTPUBFH operationChuck Lever
According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH. Replace the XDR decoder for PUTPUBFH with a "noop" since we no longer want the minorversion check, and PUTPUBFH has no arguments to decode. (Ideally nfsd4_decode_noop should really be called nfsd4_decode_void). PUTPUBFH should now behave just like PUTROOTFH. Reported-by: Cedric Blancher <cedric.blancher@gmail.com> Fixes: e1a90ebd8b23 ("NFSD: Combine decode operations for v4 and v4.1") Cc: Dan Shelton <dan.f.shelton@gmail.com> Cc: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: Add quotes to client info 'callback address'Mark Grimes
The 'callback address' in client_info_show is output without quotes causing yaml parsers to fail on processing IPv6 addresses. Adding quotes to 'callback address' also matches that used by the 'address' field. Signed-off-by: Mark Grimes <mark.grimes@ixsystems.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20svcrdma: Handle device removal outside of the CM event handlerChuck Lever
Synchronously wait for all disconnects to complete to ensure the transports have divested all hardware resources before the underlying RDMA device can safely be removed. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: move error choice for incorrect object types to version-specific code.NeilBrown
If an NFS operation expects a particular sort of object (file, dir, link, etc) but gets a file handle for a different sort of object, it must return an error. The actual error varies among NFS versions in non-trivial ways. For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only, INVAL is suitable. For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK was found when not expected. This take precedence over NOTDIR. For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in preference to EINVAL when none of the specific error codes apply. When nfsd_mode_check() finds a symlink where it expected a directory it needs to return an error code that can be converted to NOTDIR for v2 or v3 but will be SYMLINK for v4. It must be different from the error code returns when it finds a symlink but expects a regular file - that must be converted to EINVAL or SYMLINK. So we introduce an internal error code nfserr_symlink_not_dir which each version converts as appropriate. nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is only used by NFSv4 and only for OPEN. NFSERR_INVAL is never a suitable error if the object is the wrong time. For v4.0 we use nfserr_symlink for non-dirs even if not a symlink. For v4.1 we have nfserr_wrong_type. We handle this difference in-place in nfsd_check_obj_isreg() as there is nothing to be gained by delaying the choice to nfsd4_map_status(). As a result of these changes, nfsd_mode_check() doesn't need an rqstp arg any more. Note that NFSv4 operations are actually performed in the xdr code(!!!) so to the only place that we can map the status code successfully is in nfsd4_encode_operation(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: be more systematic about selecting error codes for internal use.NeilBrown
Rather than using ad hoc values for internal errors (30000, 11000, ...) use 'enum' to sequentially allocate numbers starting from the first known available number - now visible as NFS4ERR_FIRST_FREE. The goal is values that are distinct from all be32 error codes. To get those we must first select integers that are not already used, then convert them with cpu_to_be32(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: Move error code mapping to per-version proc code.NeilBrown
There is code scattered around nfsd which chooses an error status based on the particular version of nfs being used. It is cleaner to have the version specific choices in version specific code. With this patch common code returns the most specific error code possible and the version specific code maps that if necessary. Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()" function which is called to map the resp->status before each non-trivial nfsd_proc_* or nfsd3_proc_* function returns. NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and are left for a later patch. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: move V4ROOT version check to nfsd_set_fh_dentry()NeilBrown
This further centralizes version number checks. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: further centralize protocol version checks.NeilBrown
With this patch the only places that test ->rq_vers against a specific version are nfsd_v4client() and nfsd_set_fh_dentry(). The latter sets some flags in the svc_fh, which now includes: fh_64bit_cookies fh_use_wgather Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: use nfsd_v4client() in nfsd_breaker_owns_lease()NeilBrown
nfsd_breaker_owns_lease() currently open-codes the same test that nfsd_v4client() performs. With this patch we use nfsd_v4client() instead. Also as i_am_nfsd() is only used in combination with kthread_data(), replace it with nfsd_current_rqst() which combines the two and returns a valid svc_rqst, or NULL. The test for NULL is moved into nfsd_v4client() for code clarity. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: Pass 'cred' instead of 'rqstp' to some functions.NeilBrown
nfsd_permission(), exp_rdonly(), nfsd_setuser(), and nfsexp_flags() only ever need the cred out of rqstp, so pass it explicitly instead of the whole rqstp. This makes the interfaces cleaner. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: Don't pass all of rqst into rqst_exp_find()NeilBrown
Rather than passing the whole rqst, pass the pieces that are actually needed. This makes the inputs to rqst_exp_find() more obvious. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: don't assume copy notify when preprocessing the stateidSagi Grimberg
Move the stateid handling to nfsd4_copy_notify. If nfs4_preprocess_stateid_op did not produce an output stateid, error out. Copy notify specifically does not permit the use of special stateids, so enforce that outside generic stateid pre-processing. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: allow svc threads to fail initialisation cleanlyNeilBrown
If an svc thread needs to perform some initialisation that might fail, it has no good way to handle the failure. Before the thread can exit it must call svc_exit_thread(), but that requires the service mutex to be held. The thread cannot simply take the mutex as that could deadlock if there is a concurrent attempt to shut down all threads (which is unlikely, but not impossible). nfsd currently call svc_exit_thread() unprotected in the unlikely event that unshare_fs_struct() fails. We can clean this up by introducing svc_thread_init_status() by which an svc thread can report whether initialisation has succeeded. If it has, it continues normally into the action loop. If it has not, svc_thread_init_status() immediately aborts the thread. svc_start_kthread() waits for either of these to happen, and calls svc_exit_thread() (under the mutex) if the thread aborted. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: merge svc_rqst_alloc() into svc_prepare_thread()NeilBrown
The only caller of svc_rqst_alloc() is svc_prepare_thread(). So merge the one into the other and simplify. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: don't take ->sv_lock when updating ->sv_nrthreads.NeilBrown
As documented in svc_xprt.c, sv_nrthreads is protected by the service mutex, and it does not need ->sv_lock. (->sv_lock is needed only for sv_permsocks, sv_tempsocks, and sv_tmpcnt). So remove the unnecessary locking. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: change sp_nrthreads from atomic_t to unsigned int.NeilBrown
sp_nrthreads is only ever accessed under the service mutex nlmsvc_mutex nfs_callback_mutex nfsd_mutex so these is no need for it to be an atomic_t. The fact that all code using it is single-threaded means that we can simplify svc_pool_victim and remove the temporary elevation of sp_nrthreads. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: document locking rules for svc_exit_thread()NeilBrown
The locking required for svc_exit_thread() is not obvious, so document it in a kdoc comment. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: don't allocate the versions array.NeilBrown
Instead of using kmalloc to allocate an array for storing active version info, just declare an array to the max size - it is only 5 or so. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20ntb: Force physically contiguous allocation of rx ring buffersDave Jiang
Physical addresses under IOVA on x86 platform are mapped contiguously as a side effect before the patch that removed CONFIG_DMA_REMAP. The NTB rx buffer ring is a single chunk DMA buffer that is allocated against the NTB PCI device. If the receive side is using a DMA device, then the buffers are remapped against the DMA device before being submitted via the dmaengine API. This scheme becomes a problem when the physical memory is discontiguous. When dma_map_page() is called on the kernel virtual address from the dma_alloc_coherent() call, the new IOVA mapping no longer points to all the physical memory allocated due to being discontiguous. Change dma_alloc_coherent() to dma_alloc_attrs() in order to force DMA_ATTR_FORCE_CONTIGUOUS attribute. This is the best fix for the circumstance. A potential future solution may be having the DMA mapping API providing a way to alias an existing IOVA mapping to a new device perhaps. This fix is not to fix the patch pointed to by the fixes tag, but to fix the issue arised in the ntb_transport driver on x86 platforms after the said patch is applied. Reported-by: Jerry Dai <jerry.dai@intel.com> Fixes: f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP") Tested-by: Jerry Dai <jerry.dai@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb: ntb_hw_switchtec: Fix use after free vulnerability in ↵Kaixin Wang
switchtec_ntb_remove due to race condition In the switchtec_ntb_add function, it can call switchtec_ntb_init_sndev function, then &sndev->check_link_status_work is bound with check_link_status_work. switchtec_ntb_link_notification may be called to start the work. If we remove the module which will call switchtec_ntb_remove to make cleanup, it will free sndev through kfree(sndev), while the work mentioned above will be used. The sequence of operations that may lead to a UAF bug is as follows: CPU0 CPU1 | check_link_status_work switchtec_ntb_remove | kfree(sndev); | | if (sndev->link_force_down) | // use sndev Fix it by ensuring that the work is canceled before proceeding with the cleanup in switchtec_ntb_remove. Signed-off-by: Kaixin Wang <kxwang23@m.fudan.edu.cn> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb: idt: Fix the cacography in ntb_hw_idt.czhang jiao
The word 'swtich' is wrong, so fix it. Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com> Acked-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20NTB: epf: don't misuse kernel-doc markerRandy Dunlap
Use "/*" instead of "/**" for common C comments to prevent warnings from scripts/kernel-doc. ntb_hw_epf.c:15: warning: expecting prototype for Host side endpoint driver to implement Non(). Prototype was for NTB_EPF_COMMAND() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <allenbh@gmail.com> Cc: ntb@lists.linux.dev Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20NTB: ntb_transport: fix all kernel-doc warningsRandy Dunlap
Fix all kernel-doc warnings in ntb_transport.c. The function parameters for ntb_transport_create_queue() changed, so update them in the kernel-doc comments. Add a Returns: comment for ntb_transport_register_client_dev(). ntb_transport.c:382: warning: No description found for return value of 'ntb_transport_register_client_dev' ntb_transport.c:1984: warning: Excess function parameter 'rx_handler' description in 'ntb_transport_create_queue' ntb_transport.c:1984: warning: Excess function parameter 'tx_handler' description in 'ntb_transport_create_queue' ntb_transport.c:1984: warning: Excess function parameter 'event_handler' description in 'ntb_transport_create_queue' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <allenbh@gmail.com> Cc: ntb@lists.linux.dev Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb: Constify struct bus_typeChristophe JAILLET
'struct bus_type' is not modified in this driver. Constifying this structure moves some data to a read-only section, so increase overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 69682 4593 152 74427 122bb drivers/ntb/ntb_transport.o 5847 448 32 6327 18b7 drivers/ntb/core.o After: ===== text data bss dec hex filename 69858 4433 152 74443 122cb drivers/ntb/ntb_transport.o 6007 288 32 6327 18b7 drivers/ntb/core.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb_perf: Fix printk formatMax Hawking
The correct printk format is %pa or %pap, but not %pa[p]. Fixes: 99a06056124d ("NTB: ntb_perf: Fix address err in perf_copy_chunk") Signed-off-by: Max Hawking <maxahawking@sonnenkinder.org> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb: intel: Fix the NULL vs IS_ERR() bug for debugfs_create_dir()Jinjie Ruan
The debugfs_create_dir() function returns error pointers. It never returns NULL. So use IS_ERR() to check it. Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20Merge tag 'sched-rt-2024-09-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RT enablement from Thomas Gleixner: "Enable PREEMPT_RT on supported architectures: After twenty years of development we finally reached the point to enable PREEMPT_RT support in the mainline kernel. All prerequisites are merged, so enable it on the supported architectures ARM64, RISCV and X86(32/64-bit)" * tag 'sched-rt-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: riscv: Allow to enable PREEMPT_RT. arm64: Allow to enable PREEMPT_RT. x86: Allow to enable PREEMPT_RT.
2024-09-19Merge branch 'pci/tools'Bjorn Helgaas
- Remove .*.cmd files with make clean (zhang jiao) - Remove the unused BILLION macro (zhang jiao) * pci/tools: tools: PCI: Remove unused BILLION macro tools: PCI: Remove .*.cmd files with make clean
2024-09-19Merge branch 'pci/misc'Bjorn Helgaas
- Check pcie_find_root_port() return in x86 fixups to avoid NULL pointer dereferences (Samasth Norway Ananda) - Make pci_bus_type constant (Kunwu Chan) - Remove unused declarations of __pci_pme_wakeup() and pci_vpd_release() (Yue Haibing) - Remove any leftover .*.cmd files with make clean (zhang jiao) * pci/misc: PCI: Fix typos PCI/VPD: Remove pci_vpd_release() unused declarations PCI/PM: Remove __pci_pme_wakeup() unused declarations PCI: Make pci_bus_type constant x86/PCI: Check pcie_find_root_port() return for NULL
2024-09-19Merge branch 'pci/quirks'Bjorn Helgaas
- Add an ACS quirk for Qualcomm SA8775P, which doesn't advertise ACS but does provide ACS-like features (Subramanian Ananthanarayanan) - Mark Creative Labs EMU20k2 INTx masking as broken (Alex Williamson) * pci/quirks: PCI: Mark Creative Labs EMU20k2 INTx masking as broken PCI: Add ACS quirk for Qualcomm SA8775P
2024-09-19Merge branch 'pci/controller/xilinx'Bjorn Helgaas
- Fix off-by-one error in INTx IRQ handler that caused INTx interrupts to be lost or delivered as the wrong interrupt (Sean Anderson) - Rate-limit misc interrupt messages (Sean Anderson) - Turn off the clock on probe failure and device removal (Sean Anderson) - Add DT binding and driver support for enabling/disabling PHYs (Sean Anderson) - Add PCIe phy bindings for the ZCU102 (Sean Anderson) - Add support for Xilinx QDMA Soft IP PCIe Root Port Bridge to DT binding and xilinx-dma-pl driver (Thippeswamy Havalige) * pci/controller/xilinx: PCI: xilinx-xdma: Add Xilinx QDMA Root Port driver dt-bindings: PCI: xilinx-xdma: Add schemas for Xilinx QDMA PCIe Root Port Bridge arm64: zynqmp: Add PCIe phys property for ZCU102 PCI: xilinx-nwl: Add PHY support dt-bindings: pci: xilinx-nwl: Add phys property PCI: xilinx-nwl: Clean up clock on probe failure/removal PCI: xilinx-nwl: Rate-limit misc interrupt messages PCI: xilinx-nwl: Fix register misspelling PCI: xilinx-nwl: Fix off-by-one in INTx IRQ handler
2024-09-19Merge branch 'pci/controller/vmd'Bjorn Helgaas
- Fix whitespace indentation issues (Riyan Dhiman) * pci/controller/vmd: PCI: vmd: Fix indentation issue in vmd_shutdown()
2024-09-19Merge branch 'pci/controller/rcar-gen4'Bjorn Helgaas
- Make the read-only const array 'check_addr' static (Colin Ian King) - Add R-Car V4M (R8A779H0) PCIe host and endpoint to DT binding (Yoshihiro Shimoda) * pci/controller/rcar-gen4: dt-bindings: PCI: rcar-gen4-pci-ep: Add R-Car V4M compatible dt-bindings: PCI: rcar-gen4-pci-host: Add R-Car V4M compatible PCI: rcar-gen4: Make read-only const array check_addr static
2024-09-19Merge branch 'pci/controller/qcom'Bjorn Helgaas
- Drop endpoint redundant masking of global IRQ events (Manivannan Sadhasivam) - Clarify unknown global IRQ message and only log it once to avoid a flood (Manivannan Sadhasivam) - Add Manivannan Sadhasivam as maintainer of qcom endpoint driver (Manivannan Sadhasivam) - Add 'linux,pci-domain' property to endpoint DT binding (Manivannan Sadhasivam) - Assign PCI domain number for endpoint controllers (Manivannan Sadhasivam) - Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for endpoint controller (Manivannan Sadhasivam) - Add global SPI interrupt for PCIe link events to DT binding (Manivannan Sadhasivam) - Add global RC interrupt handler to handle 'Link up' events and automatically enumerate hot-added devices (Manivannan Sadhasivam) - Avoid mirroring of DBI and iATU register space so it doesn't overlap BAR MMIO space (Prudhvi Yarlagadda) - Enable controller resources like PHY only after PERST# is deasserted to partially avoid the problem that the endpoint SoC crashes when accessing things when Refclk is absent (Manivannan Sadhasivam) - Rename dw_pcie.link_gen to max_link_speed to avoid ambiguity (Manivannan Sadhasivam) - Cache maximum link speed value in dw_pcie.max_link_speed for use by vendor drivers (Manivannan Sadhasivam) - Add 16.0 GT/s equalization and RX lane margining settings (Shashank Babu Chinta Venkata) - Pass domain number to pci_bus_release_domain_nr() explicitly to avoid a NULL pointer dereference (Manivannan Sadhasivam) * pci/controller/qcom: PCI: Pass domain number to pci_bus_release_domain_nr() explicitly PCI: qcom: Add RX lane margining settings for 16.0 GT/s PCI: qcom: Add equalization settings for 16.0 GT/s PCI: dwc: Always cache the maximum link speed value in dw_pcie::max_link_speed PCI: dwc: Rename 'dw_pcie::link_gen' to 'dw_pcie::max_link_speed' PCI: qcom-ep: Enable controller resources like PHY only after refclk is available PCI: qcom: Disable mirroring of DBI and iATU register space in BAR region PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt dt-bindings: PCI: qcom,pcie-sm8450: Add 'global' interrupt PCI: qcom-ep: Modify 'global_irq' and 'perst_irq' IRQ device names PCI: endpoint: Assign PCI domain number for endpoint controllers dt-bindings: PCI: pci-ep: Document 'linux,pci-domain' property dt-bindings: PCI: pci-ep: Update Maintainers PCI: qcom-ep: Reword the error message for receiving unknown global IRQ event PCI: qcom-ep: Drop the redundant masking of global IRQ events