summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-09-20svcrdma: Handle device removal outside of the CM event handlerChuck Lever
Synchronously wait for all disconnects to complete to ensure the transports have divested all hardware resources before the underlying RDMA device can safely be removed. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: move error choice for incorrect object types to version-specific code.NeilBrown
If an NFS operation expects a particular sort of object (file, dir, link, etc) but gets a file handle for a different sort of object, it must return an error. The actual error varies among NFS versions in non-trivial ways. For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only, INVAL is suitable. For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK was found when not expected. This take precedence over NOTDIR. For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in preference to EINVAL when none of the specific error codes apply. When nfsd_mode_check() finds a symlink where it expected a directory it needs to return an error code that can be converted to NOTDIR for v2 or v3 but will be SYMLINK for v4. It must be different from the error code returns when it finds a symlink but expects a regular file - that must be converted to EINVAL or SYMLINK. So we introduce an internal error code nfserr_symlink_not_dir which each version converts as appropriate. nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is only used by NFSv4 and only for OPEN. NFSERR_INVAL is never a suitable error if the object is the wrong time. For v4.0 we use nfserr_symlink for non-dirs even if not a symlink. For v4.1 we have nfserr_wrong_type. We handle this difference in-place in nfsd_check_obj_isreg() as there is nothing to be gained by delaying the choice to nfsd4_map_status(). As a result of these changes, nfsd_mode_check() doesn't need an rqstp arg any more. Note that NFSv4 operations are actually performed in the xdr code(!!!) so to the only place that we can map the status code successfully is in nfsd4_encode_operation(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: be more systematic about selecting error codes for internal use.NeilBrown
Rather than using ad hoc values for internal errors (30000, 11000, ...) use 'enum' to sequentially allocate numbers starting from the first known available number - now visible as NFS4ERR_FIRST_FREE. The goal is values that are distinct from all be32 error codes. To get those we must first select integers that are not already used, then convert them with cpu_to_be32(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: Move error code mapping to per-version proc code.NeilBrown
There is code scattered around nfsd which chooses an error status based on the particular version of nfs being used. It is cleaner to have the version specific choices in version specific code. With this patch common code returns the most specific error code possible and the version specific code maps that if necessary. Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()" function which is called to map the resp->status before each non-trivial nfsd_proc_* or nfsd3_proc_* function returns. NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and are left for a later patch. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: move V4ROOT version check to nfsd_set_fh_dentry()NeilBrown
This further centralizes version number checks. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: further centralize protocol version checks.NeilBrown
With this patch the only places that test ->rq_vers against a specific version are nfsd_v4client() and nfsd_set_fh_dentry(). The latter sets some flags in the svc_fh, which now includes: fh_64bit_cookies fh_use_wgather Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: use nfsd_v4client() in nfsd_breaker_owns_lease()NeilBrown
nfsd_breaker_owns_lease() currently open-codes the same test that nfsd_v4client() performs. With this patch we use nfsd_v4client() instead. Also as i_am_nfsd() is only used in combination with kthread_data(), replace it with nfsd_current_rqst() which combines the two and returns a valid svc_rqst, or NULL. The test for NULL is moved into nfsd_v4client() for code clarity. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: Pass 'cred' instead of 'rqstp' to some functions.NeilBrown
nfsd_permission(), exp_rdonly(), nfsd_setuser(), and nfsexp_flags() only ever need the cred out of rqstp, so pass it explicitly instead of the whole rqstp. This makes the interfaces cleaner. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: Don't pass all of rqst into rqst_exp_find()NeilBrown
Rather than passing the whole rqst, pass the pieces that are actually needed. This makes the inputs to rqst_exp_find() more obvious. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: don't assume copy notify when preprocessing the stateidSagi Grimberg
Move the stateid handling to nfsd4_copy_notify. If nfs4_preprocess_stateid_op did not produce an output stateid, error out. Copy notify specifically does not permit the use of special stateids, so enforce that outside generic stateid pre-processing. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: allow svc threads to fail initialisation cleanlyNeilBrown
If an svc thread needs to perform some initialisation that might fail, it has no good way to handle the failure. Before the thread can exit it must call svc_exit_thread(), but that requires the service mutex to be held. The thread cannot simply take the mutex as that could deadlock if there is a concurrent attempt to shut down all threads (which is unlikely, but not impossible). nfsd currently call svc_exit_thread() unprotected in the unlikely event that unshare_fs_struct() fails. We can clean this up by introducing svc_thread_init_status() by which an svc thread can report whether initialisation has succeeded. If it has, it continues normally into the action loop. If it has not, svc_thread_init_status() immediately aborts the thread. svc_start_kthread() waits for either of these to happen, and calls svc_exit_thread() (under the mutex) if the thread aborted. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: merge svc_rqst_alloc() into svc_prepare_thread()NeilBrown
The only caller of svc_rqst_alloc() is svc_prepare_thread(). So merge the one into the other and simplify. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: don't take ->sv_lock when updating ->sv_nrthreads.NeilBrown
As documented in svc_xprt.c, sv_nrthreads is protected by the service mutex, and it does not need ->sv_lock. (->sv_lock is needed only for sv_permsocks, sv_tempsocks, and sv_tmpcnt). So remove the unnecessary locking. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: change sp_nrthreads from atomic_t to unsigned int.NeilBrown
sp_nrthreads is only ever accessed under the service mutex nlmsvc_mutex nfs_callback_mutex nfsd_mutex so these is no need for it to be an atomic_t. The fact that all code using it is single-threaded means that we can simplify svc_pool_victim and remove the temporary elevation of sp_nrthreads. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20sunrpc: document locking rules for svc_exit_thread()NeilBrown
The locking required for svc_exit_thread() is not obvious, so document it in a kdoc comment. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20nfsd: don't allocate the versions array.NeilBrown
Instead of using kmalloc to allocate an array for storing active version info, just declare an array to the max size - it is only 5 or so. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-20udmabuf: reuse folio array when pin foliosHuan Yang
When invoke memfd_pin_folios, we need offer an array to save each folio which we pinned. The current way is dynamic alloc an array(use kvmalloc), get folios, save into udmabuf and then free. Depend on the size, kvmalloc can do something different: Below PAGE_SIZE, slab allocator will be used, which have good alloc performance, due to it cached page. PAGE_SIZE - PCP Order, PCP(per-cpu-pageset) also given buddy page a cache in each CPU, so different CPU no need to hold some lock(zone or some) to get the locally page. If PCP cached page, the access also fast. PAGE_SIZE - BUDDY_MAX, try to get page from buddy, due to kvmalloc adjusted the gfp flags, if zone freelist can't alloc page(fast path), we will not enter slowpath to reclaim memory. Due to need hold lock and check, may slow, but still fast than vmalloc. Anything wrong will fallback into vmalloc to alloc memory, it obtains contiguous virtual addresses by loop alloc order 0 page(PAGE_SIZE), and then map it into vmalloc area. If necessary, page alloc may enter slowpath to reclaim memory. Hence, if fallback into vmalloc, it's slow. When create, we need to iter each udmabuf item, then pin it's range folios, if each item's range folio's count is large, we may fallback each into vmalloc. This patch find the largest range folio in items, then alloc this size's folio array. When pin range folios, reuse this array. Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-8-link@vivo.com
2024-09-20udmabuf: remove udmabuf_folioHuan Yang
Currently, udmabuf handles folio by create an unpin list to record each folio obtained from the list and unpinning them when released. To maintain this, many struct have been established. However, maintain this requires a significant amount of memory and iter the list is a substantial overhead, which is not friendly to the CPU cache. When create, we arranged the folio array in the order of pin and set the offset according to pgcnt. So, if record each pinned folio when create, then can easy unpin it. Compare to use list to record it, an array also can do this. Hence, this patch setup a pinned_folios array(size is the pgcnt) to instead of udmabuf_folio struct, it record each folio which pinned when invoke memfd_pin_folios, then unpin folio by iter pinned_folios. Note that, since a folio may be pinned multiple times, each folio can be added to pinned_folios multiple times, depend on how many times the folio has been pinned when create. Compare to udmabuf_folio(24 byte size), a folio pointer is 8 byte, if no large folio - each folio is PAGE_SIZE - and need to unpin when release. So need to record each folio, by this patch, each folio can save 16 byte. But if large folio used, depend on the large folio's number, the pinned_folios array may take more memory, but it still can makes unpin access more cache-friendly. Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-7-link@vivo.com
2024-09-20udmabuf: introduce udmabuf init and deinit helperHuan Yang
After udmabuf is allocated, its resources need to be initialized, including various array structures. The current array structure has already been greatly expanded. Also, before udmabuf needs to be kfree, the occupied resources need to be released. This part is repetitive and maybe overlooked. This patch give a helper function when init and deinit, by this, reduce duplicate code. Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-6-link@vivo.com
2024-09-20udmabuf: udmabuf_create pin folio codestyle cleanupHuan Yang
This patch aim to simplify the memfd folio pin during the udmabuf create. No functional changes. This patch create a udmabuf_pin_folios function, in this, do the memfd pin folio and then record each pinned folio, offset. This patch simplify the pinned folio record, iter by each pinned folio, and then record each offset in it. Compare to iter by pgcnt, more readable. Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-5-link@vivo.com
2024-09-20udmabuf: fix vmap_udmabuf error page setHuan Yang
Currently vmap_udmabuf set page's array by each folio. But, ubuf->folios is only contain's the folio's head page. That mean we repeatedly mapped the folio head page to the vmalloc area. Due to udmabuf can use hugetlb, if HVO enabled, tail page may not exist, so, we can't use page array to map, instead, use pfn array. By this, we removed page usage in udmabuf totally. Fixes: 5e72b2b41a21 ("udmabuf: convert udmabuf driver to use folios") Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-4-link@vivo.com
2024-09-20udmabuf: change folios array from kmalloc to kvmallocHuan Yang
When PAGE_SIZE 4096, MAX_PAGE_ORDER 10, 64bit machine, page_alloc only support 4MB. If above this, trigger this warn and return NULL. udmabuf can change size limit, if change it to 3072(3GB), and then alloc 3GB udmabuf, will fail create. [ 4080.876581] ------------[ cut here ]------------ [ 4080.876843] WARNING: CPU: 3 PID: 2015 at mm/page_alloc.c:4556 __alloc_pages+0x2c8/0x350 [ 4080.878839] RIP: 0010:__alloc_pages+0x2c8/0x350 [ 4080.879470] Call Trace: [ 4080.879473] <TASK> [ 4080.879473] ? __alloc_pages+0x2c8/0x350 [ 4080.879475] ? __warn.cold+0x8e/0xe8 [ 4080.880647] ? __alloc_pages+0x2c8/0x350 [ 4080.880909] ? report_bug+0xff/0x140 [ 4080.881175] ? handle_bug+0x3c/0x80 [ 4080.881556] ? exc_invalid_op+0x17/0x70 [ 4080.881559] ? asm_exc_invalid_op+0x1a/0x20 [ 4080.882077] ? udmabuf_create+0x131/0x400 Because MAX_PAGE_ORDER, kmalloc can max alloc 4096 * (1 << 10), 4MB memory, each array entry is pointer(8byte), so can save 524288 pages(2GB). Further more, costly order(order 3) may not be guaranteed that it can be applied for, due to fragmentation. This patch change udmabuf array use kvmalloc_array, this can fallback alloc into vmalloc, which can guarantee allocation for any size and does not affect the performance of kmalloc allocations. Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-3-link@vivo.com
2024-09-20udmabuf: pre-fault when first page faultHuan Yang
The current udmabuf mmap only fills the physical memory to the corresponding virtual address when the user actually accesses the virtual address. However, the current udmabuf has already obtained and pinned the folio upon completion of the creation.This means that the physical memory has already been acquired, rather than being accessed dynamically. As a result, the page fault has lost its purpose as a demanding page. Due to the fact that page fault requires trapping into kernel mode and filling in when accessing the corresponding virtual address in mmap, when creating a large size udmabuf, this represents a considerable overhead. This patch fill the pfn into page table, and then pre-fault each pfn into vma, when first access. Notice, if anything wrong , we do not return an error during this pre-fault step. However, an error will be returned if the failure occurs when the addr is truly accessed Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Huan Yang <link@vivo.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918025238.2957823-2-link@vivo.com
2024-09-20MAINTAINERS: udmabuf: Add myself as co-maintainer for udmabuf driverVivek Kasireddy
I would like to help maintain the udmabuf driver, in light of the recent changes that converted the driver to use folios instead of pages. Furthermore, I also contribute to Qemu's virtio-gpu module (and UI modules), that are primary users of udmabuf driver. Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240822045806.3563883-1-vivek.kasireddy@intel.com
2024-09-20sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3Jan Stancek
ENGINE API has been deprecated since OpenSSL version 3.0 [1]. Distros have started dropping support from headers and in future it will likely disappear also from library. It has been superseded by the PROVIDER API, so use it instead for OPENSSL MAJOR >= 3. [1] https://github.com/openssl/openssl/blob/master/README-ENGINES.md [jarkko: fixed up alignment issues reported by checkpatch.pl --strict] Signed-off-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: R Nageswara Sastry <rnsastry@linux.ibm.com> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-09-20sign-file,extract-cert: avoid using deprecated ERR_get_error_line()Jan Stancek
ERR_get_error_line() is deprecated since OpenSSL 3.0. Use ERR_peek_error_line() instead, and combine display_openssl_errors() and drain_openssl_errors() to a single function where parameter decides if it should consume errors silently. Signed-off-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: R Nageswara Sastry <rnsastry@linux.ibm.com> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-09-20sign-file,extract-cert: move common SSL helper functions to a headerJan Stancek
Couple error handling helpers are repeated in both tools, so move them to a common header. Signed-off-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: R Nageswara Sastry <rnsastry@linux.ibm.com> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-09-20KEYS: prevent NULL pointer dereference in find_asymmetric_key()Roman Smirnov
In find_asymmetric_key(), if all NULLs are passed in the id_{0,1,2} arguments, the kernel will first emit WARN but then have an oops because id_2 gets dereferenced anyway. Add the missing id_2 check and move WARN_ON() to the final else branch to avoid duplicate NULL checks. Found by Linux Verification Center (linuxtesting.org) with Svace static analysis tool. Cc: stable@vger.kernel.org # v5.17+ Fixes: 7d30198ee24f ("keys: X.509 public key issuer lookup without AKID") Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-09-20KEYS: Remove unused declarationsYue Haibing
These declarations are never implemented, remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-09-20ntb: Force physically contiguous allocation of rx ring buffersDave Jiang
Physical addresses under IOVA on x86 platform are mapped contiguously as a side effect before the patch that removed CONFIG_DMA_REMAP. The NTB rx buffer ring is a single chunk DMA buffer that is allocated against the NTB PCI device. If the receive side is using a DMA device, then the buffers are remapped against the DMA device before being submitted via the dmaengine API. This scheme becomes a problem when the physical memory is discontiguous. When dma_map_page() is called on the kernel virtual address from the dma_alloc_coherent() call, the new IOVA mapping no longer points to all the physical memory allocated due to being discontiguous. Change dma_alloc_coherent() to dma_alloc_attrs() in order to force DMA_ATTR_FORCE_CONTIGUOUS attribute. This is the best fix for the circumstance. A potential future solution may be having the DMA mapping API providing a way to alias an existing IOVA mapping to a new device perhaps. This fix is not to fix the patch pointed to by the fixes tag, but to fix the issue arised in the ntb_transport driver on x86 platforms after the said patch is applied. Reported-by: Jerry Dai <jerry.dai@intel.com> Fixes: f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP") Tested-by: Jerry Dai <jerry.dai@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb: ntb_hw_switchtec: Fix use after free vulnerability in ↵Kaixin Wang
switchtec_ntb_remove due to race condition In the switchtec_ntb_add function, it can call switchtec_ntb_init_sndev function, then &sndev->check_link_status_work is bound with check_link_status_work. switchtec_ntb_link_notification may be called to start the work. If we remove the module which will call switchtec_ntb_remove to make cleanup, it will free sndev through kfree(sndev), while the work mentioned above will be used. The sequence of operations that may lead to a UAF bug is as follows: CPU0 CPU1 | check_link_status_work switchtec_ntb_remove | kfree(sndev); | | if (sndev->link_force_down) | // use sndev Fix it by ensuring that the work is canceled before proceeding with the cleanup in switchtec_ntb_remove. Signed-off-by: Kaixin Wang <kxwang23@m.fudan.edu.cn> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb: idt: Fix the cacography in ntb_hw_idt.czhang jiao
The word 'swtich' is wrong, so fix it. Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com> Acked-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20NTB: epf: don't misuse kernel-doc markerRandy Dunlap
Use "/*" instead of "/**" for common C comments to prevent warnings from scripts/kernel-doc. ntb_hw_epf.c:15: warning: expecting prototype for Host side endpoint driver to implement Non(). Prototype was for NTB_EPF_COMMAND() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <allenbh@gmail.com> Cc: ntb@lists.linux.dev Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20NTB: ntb_transport: fix all kernel-doc warningsRandy Dunlap
Fix all kernel-doc warnings in ntb_transport.c. The function parameters for ntb_transport_create_queue() changed, so update them in the kernel-doc comments. Add a Returns: comment for ntb_transport_register_client_dev(). ntb_transport.c:382: warning: No description found for return value of 'ntb_transport_register_client_dev' ntb_transport.c:1984: warning: Excess function parameter 'rx_handler' description in 'ntb_transport_create_queue' ntb_transport.c:1984: warning: Excess function parameter 'tx_handler' description in 'ntb_transport_create_queue' ntb_transport.c:1984: warning: Excess function parameter 'event_handler' description in 'ntb_transport_create_queue' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <allenbh@gmail.com> Cc: ntb@lists.linux.dev Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb: Constify struct bus_typeChristophe JAILLET
'struct bus_type' is not modified in this driver. Constifying this structure moves some data to a read-only section, so increase overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 69682 4593 152 74427 122bb drivers/ntb/ntb_transport.o 5847 448 32 6327 18b7 drivers/ntb/core.o After: ===== text data bss dec hex filename 69858 4433 152 74443 122cb drivers/ntb/ntb_transport.o 6007 288 32 6327 18b7 drivers/ntb/core.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb_perf: Fix printk formatMax Hawking
The correct printk format is %pa or %pap, but not %pa[p]. Fixes: 99a06056124d ("NTB: ntb_perf: Fix address err in perf_copy_chunk") Signed-off-by: Max Hawking <maxahawking@sonnenkinder.org> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20ntb: intel: Fix the NULL vs IS_ERR() bug for debugfs_create_dir()Jinjie Ruan
The debugfs_create_dir() function returns error pointers. It never returns NULL. So use IS_ERR() to check it. Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2024-09-20crash: Fix riscv64 crash memory reserve dead loopJinjie Ruan
On RISCV64 Qemu machine with 512MB memory, cmdline "crashkernel=500M,high" will cause system stall as below: Zone ranges: DMA32 [mem 0x0000000080000000-0x000000009fffffff] Normal empty Movable zone start for each node Early memory node ranges node 0: [mem 0x0000000080000000-0x000000008005ffff] node 0: [mem 0x0000000080060000-0x000000009fffffff] Initmem setup node 0 [mem 0x0000000080000000-0x000000009fffffff] (stall here) commit 5d99cadf1568 ("crash: fix x86_32 crash memory reserve dead loop bug") fix this on 32-bit architecture. However, the problem is not completely solved. If `CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX` on 64-bit architecture, for example, when system memory is equal to CRASH_ADDR_LOW_MAX on RISCV64, the following infinite loop will also occur: -> reserve_crashkernel_generic() and high is true -> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail -> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX). As Catalin suggested, do not remove the ",high" reservation fallback to ",low" logic which will change arm64's kdump behavior, but fix it by skipping the above situation similar to commit d2f32f23190b ("crash: fix x86_32 crash memory reserve dead loop"). After this patch, it print: cannot allocate crashkernel (size:0x1f400000) Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20240812062017.2674441-1-ruanjinjie@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-20perf/riscv-sbi: Add platform specific firmware event handlingMayuresh Chitale
The SBI v2.0 specification pointed to by the link below reserves the event code 0xffff for platform specific firmware events. Update the driver to be able to parse and program such events. The platform specific firmware events must now be specified in the perf command as below: perf stat -e rCxxx ... where bits[63:62] = 0x3 of the event config indicate a platform specific firmware event and xxx indicate the actual event code which is passed as the event data. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/v2.0/riscv-sbi.pdf Link: https://lore.kernel.org/r/20240812051109.6496-1-mchitale@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-20Merge patch series "tools: Add barrier implementations for riscv"Palmer Dabbelt
Charlie Jenkins <charlie@rivosinc.com> says: Add support for riscv specific barrier implementations to the tools tree, so that fence instructions can be emitted for synchronization. * b4-shazam-merge: tools: Optimize ring buffer for riscv tools: Add riscv barrier implementation Link: https://lore.kernel.org/r/20240806-optimize_ring_buffer_read_riscv-v2-0-ca7e193ae198@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-20MAINTAINERS: update email for Joel GranadosJoel Granados
Change my contact email in MAINTAINERS and .mailmap to my kernel.org. This in order to avoid cumbersome corporate email policies. Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-09-20spi: atmel-quadspi: Avoid overwriting delay register settingsAlexander Dahl
Previously the MR and SCR registers were just set with the supposedly required values, from cached register values (cached reg content initialized to zero). All parts fixed here did not consider the current register (cache) content, which would make future support of cs_setup, cs_hold, and cs_inactive impossible. Setting SCBR in atmel_qspi_setup() erases a possible DLYBS setting from atmel_qspi_set_cs_timing(). The DLYBS setting is applied by ORing over the current setting, without resetting the bits first. All writes to MR did not consider possible settings of DLYCS and DLYBCT. Signed-off-by: Alexander Dahl <ada@thorsis.com> Fixes: f732646d0ccd ("spi: atmel-quadspi: Add support for configuring CS timing") Link: https://patch.msgid.link/20240918082744.379610-2-ada@thorsis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-20powerpc/pseries/eeh: move pseries_eeh_err_inject() outside CONFIG_DEBUG_FS blockNarayana Murty N
Makes pseries_eeh_err_inject() available even when debugfs is disabled (CONFIG_DEBUG_FS=n). It moves eeh_debugfs_break_device() and eeh_pe_inject_mmio_error() out of the CONFIG_DEBUG_FS block and renames it as eeh_break_device(). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409170509.VWC6jadC-lkp@intel.com/ Fixes: b0e2b828dfca ("powerpc/pseries/eeh: Fix pseries_eeh_err_inject") Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240917132445.3868016-1-nnmlinux@linux.ibm.com
2024-09-20powerpc/vdso32: Fix use of crtsavres for PPC64Christophe Leroy
crtsavres.S content is encloded by a #ifndef CONFIG_PPC64 To be used on VDSO32 on PPC64 it's content must available on PPC64 as well. Replace #ifndef CONFIG_PPC64 by #ifndef __powerpc64__ as __powerpc64__ is not set when building VDSO32 on PPC64. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Closes: https://lore.kernel.org/linuxppc-dev/047b7503-af0c-4bb0-b12a-2f6b1e461752@csgroup.eu/T/ Fixes: b163596a5b6f ("powerpc/vdso32: Add crtsavres") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/aded2b257018fe654db759fdfa4ab1a0b5426b1b.1726772140.git.christophe.leroy@csgroup.eu
2024-09-20io_uring: check if we need to reschedule during overflow flushJens Axboe
In terms of normal application usage, this list will always be empty. And if an application does overflow a bit, it'll have a few entries. However, nothing obviously prevents syzbot from running a test case that generates a ton of overflow entries, and then flushing them can take quite a while. Check for needing to reschedule while flushing, and drop our locks and do so if necessary. There's no state to maintain here as overflows always prune from head-of-list, hence it's fine to drop and reacquire the locks at the end of the loop. Link: https://lore.kernel.org/io-uring/66ed061d.050a0220.29194.0053.GAE@google.com/ Reported-by: syzbot+5fca234bd7eb378ff78e@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-20tools: Optimize ring buffer for riscvCharlie Jenkins
Now that the riscv tools tree supports optimized barriers, use them in the ring buffer. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20240806-optimize_ring_buffer_read_riscv-v2-2-ca7e193ae198@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-20tools: Add riscv barrier implementationCharlie Jenkins
Many of the other architectures use their custom barrier implementations. Use the barrier code from the kernel sources to optimize barriers in tools. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20240806-optimize_ring_buffer_read_riscv-v2-1-ca7e193ae198@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-20RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_tPalmer Dabbelt
I recently ended up with a warning on some compilers along the lines of CC kernel/resource.o In file included from include/linux/ioport.h:16, from kernel/resource.c:15: kernel/resource.c: In function 'gfr_start': include/linux/minmax.h:49:37: error: conversion from 'long long unsigned int' to 'resource_size_t' {aka 'unsigned int'} changes value from '17179869183' to '4294967295' [-Werror=overflow] 49 | ({ type ux = (x); type uy = (y); __cmp(op, ux, uy); }) | ^ include/linux/minmax.h:52:9: note: in expansion of macro '__cmp_once_unique' 52 | __cmp_once_unique(op, type, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_)) | ^~~~~~~~~~~~~~~~~ include/linux/minmax.h:161:27: note: in expansion of macro '__cmp_once' 161 | #define min_t(type, x, y) __cmp_once(min, type, x, y) | ^~~~~~~~~~ kernel/resource.c:1829:23: note: in expansion of macro 'min_t' 1829 | end = min_t(resource_size_t, base->end, | ^~~~~ kernel/resource.c: In function 'gfr_continue': include/linux/minmax.h:49:37: error: conversion from 'long long unsigned int' to 'resource_size_t' {aka 'unsigned int'} changes value from '17179869183' to '4294967295' [-Werror=overflow] 49 | ({ type ux = (x); type uy = (y); __cmp(op, ux, uy); }) | ^ include/linux/minmax.h:52:9: note: in expansion of macro '__cmp_once_unique' 52 | __cmp_once_unique(op, type, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_)) | ^~~~~~~~~~~~~~~~~ include/linux/minmax.h:161:27: note: in expansion of macro '__cmp_once' 161 | #define min_t(type, x, y) __cmp_once(min, type, x, y) | ^~~~~~~~~~ kernel/resource.c:1847:24: note: in expansion of macro 'min_t' 1847 | addr <= min_t(resource_size_t, base->end, | ^~~~~ cc1: all warnings being treated as errors which looks like a real problem: our phys_addr_t is only 32 bits now, so having 34-bit masks is just going to result in overflows. Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240731162159.9235-2-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-20ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODEHaibo Xu
Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE. To ensure all the values were properly initialized, switch to initialize all of them to NUMA_NO_NODE. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> (arm64 platform) Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240729035958.1957185-1-haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-20Merge branch 'next' into for-linusDmitry Torokhov
Prepare input updates for 6.12 merge window.