path: root/drivers/vdpa
Age | Commit message | Author
2024-09-21 | Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds
Pull MM updates from Andrew Morton: "Along with the usual shower of singleton patches, notable patch series in this pull request are:
- "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification.
- "Some cleanups for shmem" from Baolin Wang. No functional changes - more code reuse, better function naming, logic simplifications.
- "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only.
- "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup.
- "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage.
- "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature: it adds new fields to /proc/vmstat such as
    $ grep kstack /proc/vmstat
    kstack_1k 3
    kstack_2k 188
    kstack_4k 11391
    kstack_8k 243
    kstack_16k 0
  which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but particularly useful for "the dynamic kernel stack project".
- "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leakage of percpu memory.
- "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters".
- "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident.
- "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded.
- "mm, memcg: cg2 memory{.swap,}.peak write handlers" from David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector.
- "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalization and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness.
- "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance.
- "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo.
- "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page().
- "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown.
- "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature.
- "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have noticed yet.
- "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code.
- "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code.
- "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated.
- "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation.
- "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code.
- "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes.
- "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into single-page folios when swapping out shmem.
- "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios.
- "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios.
- "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal().
- "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type!
- "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed.
- "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store.
- "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operation is occurring during an unrelated vma tree walk.
- "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making it cleaner, more testable and better tested.
- "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests.
- "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions.
- "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats.
- "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning.
- "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs.
- "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization.
- "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates.
- "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags.
- "mm: split underused THPs" from Yu Zhao. Improve THP=always policy. This was overprovisioning THPs in sparsely accessed memory areas.
- "zram: introduce custom comp backends API" from Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning.
- "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas.
- "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups.
- "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support.
- "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory.
- "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memory.
- "mm: enable large folios swap-in support" from Barry Song.
  Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios"
* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
  zram: free secondary algorithms names
  uprobes: turn xol_area->pages[2] into xol_area->page
  uprobes: introduce the global struct vm_special_mapping xol_mapping
  Revert "uprobes: use vm_special_mapping close() functionality"
  mm: support large folios swap-in for sync io devices
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
  mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
  mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
  set_memory: add __must_check to generic stubs
  mm/vma: return the exact errno in vms_gather_munmap_vmas()
  memcg: cleanup with !CONFIG_MEMCG_V1
  mm/show_mem.c: report alloc tags in human readable units
  mm: support poison recovery from copy_present_page()
  mm: support poison recovery from do_cow_fault()
  resource, kunit: add test case for region_intersects()
  resource: make alloc_free_mem_region() works for iomem_resource
  mm: z3fold: deprecate CONFIG_Z3FOLD
  vfio/pci: implement huge_fault support
  mm/arm64: support large pfn mappings
  mm/x86: support large pfn mappings
  ...
2024-09-09 | vduse: avoid using __GFP_NOFAIL | Jason Wang
Patch series "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn", v4.

__GFP_NOFAIL carries the semantics of never failing, so its callers do not check the return value:

    %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller cannot handle allocation failures. The allocation could block indefinitely but will never return with failure. Testing for failure is pointless.

However, __GFP_NOFAIL can sometimes fail if it exceeds size limits or is used with GFP_ATOMIC/GFP_NOWAIT in a non-sleepable context. This patchset handles the illegal use of __GFP_NOFAIL together with GFP_ATOMIC, which lacks __GFP_DIRECT_RECLAIM (without this, we can't do anything to reclaim memory to satisfy the nofail requirement), and improves the related documentation and warnings. The proper size limits for __GFP_NOFAIL will be handled separately after more discussions.

This patch (of 3):

mm doesn't support non-blockable __GFP_NOFAIL allocation, because persisting in providing __GFP_NOFAIL services for non-blocking users who cannot perform direct memory reclaim may only result in an endless busy loop. Therefore, in such cases, the current mm-core may directly return a NULL pointer:

    static inline struct page *
    __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
                           struct alloc_context *ac)
    {
            ...
            if (gfp_mask & __GFP_NOFAIL) {
                    /*
                     * All existing users of the __GFP_NOFAIL are blockable, so warn
                     * of any new users that actually require GFP_NOWAIT
                     */
                    if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
                            goto fail;
                    ...
            }
            ...
    fail:
            warn_alloc(gfp_mask, ac->nodemask,
                       "page allocation failure: order:%u", order);
    got_pg:
            return page;
    }

Unfortunately, vdpa does that nofail allocation under a non-sleepable lock.
A possible way to fix that is to move the page allocation out of the lock into the caller, but having to allocate a huge number of pages and an auxiliary page array seems to be problematic as well, per Tetsuo: "You should implement proper error handling instead of using __GFP_NOFAIL if count can become large." So I chose another way, which does not release kernel bounce pages when the user tries to register userspace bounce pages. Then we can avoid allocating in paths where failure is not expected (e.g. in the release). We pay for this with more memory usage, as we don't release kernel bounce pages, but further optimizations could be done on top.
[v-songbaohua@oppo.com: Refine the changelog] Link: https://lkml.kernel.org/r/20240830202823.21478-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240830202823.21478-2-21cnbao@gmail.com Fixes: 6c77ed22880d ("vduse: Support using userspace pages as bounce buffer") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Tested-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hailong.Liu <hailong.liu@oppo.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Eugenio Pérez" <eperezma@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
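The allocator behaviour described in this commit message can be modelled in a few lines of userspace C. This is only a sketch: the flag bits, the `sk_` names and the `reclaim_helps` parameter are hypothetical stand-ins and are not the kernel's definitions. It shows the one point the message makes, namely that a nofail request without direct reclaim gets NULL back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the real values are defined by the kernel. */
#define SK_DIRECT_RECLAIM 0x1u
#define SK_NOFAIL         0x2u

static char sk_page[4096]; /* stand-in for a successfully allocated page */

/*
 * Toy model of the allocator slow path, entered after the fast path failed.
 * reclaim_helps says whether direct reclaim would free a page.
 */
void *sk_alloc_slowpath(unsigned int gfp, bool reclaim_helps)
{
    bool can_direct_reclaim = (gfp & SK_DIRECT_RECLAIM) != 0;

    if (gfp & SK_NOFAIL) {
        /* Non-blocking nofail request: nothing can be reclaimed, so fail. */
        if (!can_direct_reclaim)
            return NULL;
        /*
         * Blocking nofail request: the real allocator keeps retrying;
         * this model simply reports success.
         */
        return sk_page;
    }

    /* Ordinary request: succeeds only if it may reclaim and reclaim helps. */
    return (can_direct_reclaim && reclaim_helps) ? sk_page : NULL;
}
```

A caller that holds a non-sleepable lock cannot pass the direct-reclaim bit, so in this model it always lands in the NULL branch. That is why the fix avoids allocating on that path at all.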
2024-09-04 | dma-mapping: clearly mark DMA ops as an architecture feature | Christoph Hellwig
DMA ops are a helper for architectures and not for drivers to override the DMA implementation. Unfortunately driver authors keep ignoring this. Make the fact more clear by renaming the symbol to ARCH_HAS_DMA_OPS and having the two drivers overriding their dma_ops depend on that. These drivers should probably be marked broken, but we can give them a bit of a grace period for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # for IPU6 Acked-by: Robin Murphy <robin.murphy@arm.com>
2024-09-04 | vdpa_sim: don't select DMA_OPS | Christoph Hellwig
vdpa_sim has been fixed to not override the dma_map_ops in commit 6c3d329e6486 ("vdpa_sim: get rid of DMA ops"), so don't select the symbol and don't depend on HAS_DMA. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-29 | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost | Linus Torvalds
Pull virtio fixes from Michael Tsirkin: "The biggest thing here is the adminq change - but it looks like the only way to avoid headq blocking causing indefinite stalls. This fixes three issues:
- Prevent admin commands on one VF blocking another. This prevents a bad VF from blocking a good one, as well as fixing a scalability issue with large # of VFs
- Correctly return error on command failure on octeon. We used to treat failed commands as a success.
- Fix modpost warning when building virtio_dma_buf. Harmless, but the fix is trivial"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_pci_modern: remove admin queue serialization lock
  virtio_pci_modern: use completion instead of busy loop to wait on admin cmd result
  virtio_pci_modern: pass cmd as an identification token
  virtio_pci_modern: create admin queue of queried size
  virtio: create admin queues alongside other virtqueues
  virtio_pci: pass vq info as an argument to vp_setup_vq()
  virtio: push out code to vp_avq_index()
  virtio_pci_modern: treat vp_dev->admin_vq.info.vq pointer as static
  virtio_pci: introduce vector allocation fallback for slow path virtqueues
  virtio_pci: pass vector policy enum to vp_find_one_vq_msix()
  virtio_pci: pass vector policy enum to vp_find_vqs_msix()
  virtio_pci: simplify vp_request_msix_vectors() call a bit
  virtio_pci: push out single vq find code to vp_find_one_vq_msix()
  vdpa/octeon_ep: Fix error code in octep_process_mbox()
  virtio: add missing MODULE_DESCRIPTION() macro
2024-07-25 | Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core | Linus Torvalds
Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are:
- platform remove callback api final fixups (Uwe took many releases to get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step.
- driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no reported problems"
* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
  ARM: sa1100: make match function take a const pointer
  sysfs/cpu: Make crash_hotplug attribute world-readable
  dio: Have dio_bus_match() callback take a const *
  zorro: make match function take a const pointer
  driver core: module: make module_[add|remove]_driver take a const *
  driver core: make driver_find_device() take a const *
  driver core: make driver_[create|remove]_file take a const *
  firmware_loader: fix soundness issue in `request_internal`
  firmware_loader: annotate doctests as `no_run`
  devres: Correct code style for functions that return a pointer type
  devres: Initialize an uninitialized struct member
  devres: Fix memory leakage caused by driver API devm_free_percpu()
  devres: Fix devm_krealloc() wasting memory
  driver core: platform: Switch to use kmemdup_array()
  driver core: have match() callback in struct bus_type take a const *
  MAINTAINERS: add Rust device abstractions to DRIVER CORE
  device: rust: improve safety comments
  MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
  MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
  firmware: rust: improve safety comments
  ...
2024-07-17 | vdpa/octeon_ep: Fix error code in octep_process_mbox() | Dan Carpenter
Return -EINVAL for invalid signatures. Don't return success. Fixes: 8b6c724cdab8 ("virtio: vdpa: vDPA driver for Marvell OCTEON DPU devices") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Message-Id: <623e885b-1a05-479e-ab97-01bcf10bf5b8@stanley.mountain> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
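The class of bug fixed here, an error path that falls through and reports success, can be illustrated with a small userspace sketch. The signature value, the response layout and the `sk_` names are hypothetical; the real octep_process_mbox() is not reproduced here, and mapping a device-side failure to -EIO is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature value and response layout, for illustration only. */
#define SK_MBOX_SIGNATURE 0xFEEDFACEu

struct sk_mbox_rsp {
    uint32_t signature; /* must match SK_MBOX_SIGNATURE */
    int32_t  status;    /* 0 on success, nonzero if the device rejected the command */
};

int sk_process_mbox(const struct sk_mbox_rsp *rsp)
{
    assert(rsp != NULL);

    /* An invalid signature is an error and must never be reported as success. */
    if (rsp->signature != SK_MBOX_SIGNATURE)
        return -EINVAL;

    /* Assumption of this sketch: any device-side failure maps to -EIO. */
    if (rsp->status != 0)
        return -EIO;

    return 0;
}
```

Returning a distinct negative errno on each failure path lets the caller tell a corrupted response apart from a rejected command.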
2024-07-09 | vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() | Dragos Tatulea
VQ indices in the range [cur_num_qps, max_vqs) represent queues that have not yet been activated. .set_vq_ready should not activate these VQs. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-24-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
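The range check described above can be sketched as follows. All names are hypothetical (`active_end` plays the role of the lower bound of the non-activated range), and recording the requested ready flag for inactive queues is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_MAX_VQS 8u

struct sk_vq {
    bool ready;     /* what the driver asked for */
    bool hw_active; /* whether the queue was actually enabled */
};

struct sk_dev {
    unsigned int active_end; /* indices >= active_end are not yet activated */
    unsigned int max_vqs;    /* total number of queues, at most SK_MAX_VQS */
    struct sk_vq vqs[SK_MAX_VQS];
};

int sk_set_vq_ready(struct sk_dev *dev, unsigned int idx, bool ready)
{
    assert(dev != NULL);

    if (idx >= dev->max_vqs || idx >= SK_MAX_VQS)
        return -EINVAL;

    /* Remember the request so it can be applied once the queue becomes active. */
    dev->vqs[idx].ready = ready;

    /* Only queues below active_end may be enabled right away. */
    if (idx < dev->active_end)
        dev->vqs[idx].hw_active = ready;

    return 0;
}
```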
2024-07-09 | vdpa/mlx5: Don't reset VQs more than necessary | Dragos Tatulea
The vdpa device can be reset many times in sequence without any significant state changes in between. Previously this was not a problem: VQs were torn down only on first reset. But after VQ pre-creation was introduced, each reset will delete and re-create the hardware VQs and their associated resources. To solve this problem, avoid resetting hardware VQs if the VQs are still in a blank state. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-23-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
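A minimal sketch of the idea, assuming a single hypothetical `vqs_blank` flag that tracks whether the pre-created queues have been configured since the last reset. The names and the counter are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_dev {
    bool vqs_blank;            /* true while the hardware VQs hold no configuration */
    unsigned int hw_recreates; /* how many times the hardware VQs were rebuilt */
};

/* Device creation: VQs are pre-created in a blank state. */
void sk_dev_add(struct sk_dev *dev)
{
    assert(dev != NULL);
    dev->vqs_blank = true;
    dev->hw_recreates = 0;
}

/* Any configuration applied to the VQs makes them non-blank. */
void sk_configure(struct sk_dev *dev)
{
    assert(dev != NULL);
    dev->vqs_blank = false;
}

/* Reset: skip the expensive delete and re-create if nothing changed. */
void sk_reset(struct sk_dev *dev)
{
    assert(dev != NULL);
    if (dev->vqs_blank)
        return;
    dev->hw_recreates++;
    dev->vqs_blank = true;
}
```

With this guard, a burst of back-to-back resets costs one re-creation at most, which matches the behaviour the commit message aims for.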
2024-07-09 | vdpa/mlx5: Re-create HW VQs under certain conditions | Dragos Tatulea
There are a few conditions under which the hardware VQs need a full teardown and setup: - VQ size changed to something other than the default value. Hardware VQ size modification is not supported. - User turns off certain device features: mergeable buffers, checksum, virtio 1.0 compliance. In these cases, the TIR and RQT need to be re-created. Add a needs_teardown configuration variable and set it when detecting the above scenarios. On next DRIVER_OK, the resources will be torn down first. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-22-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
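The deferred-teardown pattern described in this message can be sketched like this. Only the `needs_teardown` name comes from the commit message; the default size, the feature bits and every `sk_` identifier are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical default queue size and feature bits, for illustration only. */
#define SK_DEFAULT_VQ_SIZE 256u
#define SK_F_MRG_RXBUF     (1ull << 0)
#define SK_F_CSUM          (1ull << 1)
#define SK_F_VERSION_1     (1ull << 2)
#define SK_F_PRECREATED    (SK_F_MRG_RXBUF | SK_F_CSUM | SK_F_VERSION_1)

struct sk_dev {
    bool needs_teardown;    /* set when pre-created VQs no longer fit the config */
    unsigned int teardowns; /* number of full teardown + setup cycles performed */
};

void sk_set_vq_num(struct sk_dev *dev, unsigned int num)
{
    assert(dev != NULL);
    /* The VQ size cannot be modified in place, so a new size forces a rebuild. */
    if (num != SK_DEFAULT_VQ_SIZE)
        dev->needs_teardown = true;
}

void sk_set_features(struct sk_dev *dev, uint64_t features)
{
    assert(dev != NULL);
    /* Turning off any feature the VQs were pre-created with forces a rebuild. */
    if ((features & SK_F_PRECREATED) != SK_F_PRECREATED)
        dev->needs_teardown = true;
}

void sk_driver_ok(struct sk_dev *dev)
{
    assert(dev != NULL);
    if (dev->needs_teardown) {
        dev->teardowns++; /* tear down first, then set up again */
        dev->needs_teardown = false;
    }
    /* The configuration would be applied and the VQs moved to Ready here. */
}
```

Deferring the rebuild to DRIVER_OK means several configuration changes in a row still cost a single teardown.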
2024-07-09 | vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time | Dragos Tatulea
Currently, hardware VQs are created right when the vdpa device gets into DRIVER_OK state. That is easier because most of the VQ state is known by then. This patch switches to creating all VQs and their associated resources at device creation time. The motivation is to reduce the vdpa device live migration downtime by moving the expensive operation of creating all the hardware VQs and their associated resources out of downtime on the destination VM. The VQs are now created in a blank state. The VQ configuration will happen later, on DRIVER_OK. Then the configuration will be applied when the VQs are moved to the Ready state. When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is needed: now that the VQ is already created a resume_vq() will be triggered too early when no mr has been configured yet. Skip calling resume_vq() in this case, let it be handled during DRIVER_OK. For virtio-vdpa, the device configuration is done earlier during .vdpa_dev_add() by vdpa_register_device(). Avoid calling setup_vq_resources() a second time in that case. On a 64 CPU, 256 GB VM with 1 vDPA device of 16 VQps, the full VQ resource creation + resume time was ~370ms. Now it's down to 60 ms (only VQ config and resume). The measurements were done on a ConnectX6DX based vDPA device. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-21-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Use suspend/resume during VQP change | Dragos Tatulea
Resume a VQ if it is already created when the number of VQ pairs increases. This is done in preparation for VQ pre-creation which is coming in a later patch. It is necessary because calling setup_vq() on an already created VQ will return early and will not enable the queue. For symmetry, suspend a VQ instead of tearing it down when the number of VQ pairs decreases. But only if the resume operation is supported. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-20-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Forward error in suspend/resume device | Dragos Tatulea
Start using the suspend/resume_vq() error return codes previously added. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-19-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2024-07-09 | vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq() | Dragos Tatulea
There are a few more places modifying the VQ to Ready directly. Let's consolidate them into resume_vq(). The redundant warnings for resume_vq() errors can also be dropped. There is one special case that needs to be handled for virtio-vdpa: the initialized flag must be set to true earlier in setup_vq() so that resume_vq() doesn't return early. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-18-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Add error code for suspend/resume VQ | Dragos Tatulea
Instead of blindly calling suspend/resume_vqs(), make them return error codes. To keep compatibility, keep suspending or resuming VQs on error and return the last error code. The assumption here is that the error code would be the same. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-17-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
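The "keep going, return the last error" policy from this message looks roughly like the sketch below. The per-VQ operation is a hypothetical stub in which queue 1 always fails, and the `sk_` names and the call counter exist only for illustration.

```c
#include <errno.h>

#define SK_NUM_VQS 4u

/* Counts how many per-VQ calls were made, to show that the loop never stops early. */
unsigned int sk_calls;

/* Hypothetical per-VQ operation: queue 1 fails, the others succeed. */
int sk_suspend_vq(unsigned int idx)
{
    sk_calls++;
    return idx == 1 ? -EIO : 0;
}

/* Suspend every VQ; on failure keep going and report the last error seen. */
int sk_suspend_vqs(void)
{
    int err = 0;

    for (unsigned int i = 0; i < SK_NUM_VQS; i++) {
        int ret = sk_suspend_vq(i);

        if (ret)
            err = ret;
    }
    return err;
}
```

Because the loop does not bail out, the remaining queues are still suspended, as they were before the change. The caller now learns that something went wrong.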
2024-07-09 | vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq() | Dragos Tatulea
Until now resume_vq() was used only for the suspend/resume scenario. This change also allows calling resume_vq() to bring it from Init to Ready state (VQ initialization). Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-16-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
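The widened transition can be pictured as a small state machine. The state names and the error handling below are assumptions made for this sketch. The commit message only states that Init to Ready is now accepted in addition to the suspend/resume case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical VQ states; SK_VQ_ERROR stands for any state that cannot be resumed. */
enum sk_vq_state { SK_VQ_INIT, SK_VQ_READY, SK_VQ_SUSPENDED, SK_VQ_ERROR };

int sk_resume_vq(enum sk_vq_state *state)
{
    assert(state != NULL);

    switch (*state) {
    case SK_VQ_INIT:      /* newly accepted: first activation of the VQ */
    case SK_VQ_SUSPENDED: /* the original suspend/resume scenario */
        *state = SK_VQ_READY;
        return 0;
    case SK_VQ_READY:     /* already running, nothing to do */
        return 0;
    default:
        return -EINVAL;   /* leave the state untouched */
    }
}
```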
2024-07-09 | vdpa/mlx5: Allow creation of blank VQs | Dragos Tatulea
Based on the filled flag, create VQs that are filled or blank. Blank VQs will be filled in later through VQ modify. Downstream patches will make use of this to pre-create blank VQs at vdpa device creation. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-15-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2024-07-09 | vdpa/mlx5: Set mkey modified flags on all VQs | Dragos Tatulea
Otherwise, when virtqueues are moved from INIT to READY the latest mkey will not be set appropriately. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-14-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Start off rqt_size with max VQPs | Dragos Tatulea
Currently rqt_size is initialized during device flag configuration. That's because it is the earliest moment when device knows if MQ (multi queue) is on or off. Shift this configuration earlier to device creation time. This implies that non-MQ devices will have a larger RQT size. But the configuration will still be correct. This is done in preparation for the pre-creation of hardware virtqueues at device add time. When that change will be added, RQT will be created at device creation time so it needs to be initialized to its max size. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-13-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Set an initial size on the VQ | Dragos Tatulea
The virtqueue size is a pre-requisite for setting up any virtqueue resources. For the upcoming optimization of creating virtqueues at device add, the virtqueue size has to be configured. The queue size check in setup_vq() will always be false. So remove it. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-12-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Add support for modifying the VQ features field | Dragos Tatulea
This is done in preparation for the pre-creation of hardware virtqueues at device add time. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-11-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Add support for modifying the virtio_version VQ field | Dragos Tatulea
This is done in preparation for the pre-creation of hardware virtqueues at device add time. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-10-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Rename init_mvqs | Dragos Tatulea
Function is used to set default values, so name it accordingly. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-9-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
2024-07-09 | vdpa/mlx5: Clear and reinitialize software VQ data on reset | Dragos Tatulea
The hardware VQ configuration is mirrored by data in struct mlx5_vdpa_virtqueue. Instead of clearing just a few fields at reset, fully clear the struct and initialize with the appropriate default values. As clear_vqs_ready() is used only during reset, get rid of it. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-8-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Initialize and reset device with one queue pair | Dragos Tatulea
The virtio spec says that a vdpa device should start off with one queue pair. The driver is already compliant. This patch moves the initialization to device add and reset times. This is done in preparation for the pre-creation of hardware virtqueues at device add time. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-7-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2024-07-09 | vdpa/mlx5: Remove duplicate suspend code | Dragos Tatulea
Use the dedicated suspend_vqs() function instead. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-6-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Iterate over active VQs during suspend/resume | Dragos Tatulea
No need to iterate over max number of VQs. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-5-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Drop redundant check in teardown_virtqueues() | Dragos Tatulea
The check is done inside teardown_vq(). Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-4-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Drop redundant code | Dragos Tatulea
Originally, the second loop initialized the CVQ. But commit acde3929492b ("vdpa/mlx5: Use consistent RQT size") initialized all the queues in the first loop, so the second iteration in init_mvqs() is never called because the first one will iterate up to max_vqs. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-3-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09 | vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical | Dragos Tatulea
... by changing the setup_vq_resources() parameter type. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-2-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09vdpa/mlx5: Clarify meaning thorough function renameDragos Tatulea
setup_driver()/teardown_driver() are a bit vague. These functions are used for virtqueue resources. Same for alloc_resources()/teardown_resources(): they represent fixed resources that are meant to exist during the device lifetime. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20240626-stage-vdpa-vq-precreate-v2-1-560c491078df@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09vDPA: add missing MODULE_DESCRIPTION() macrosJeff Johnson
With ARCH=x86, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vdpa/vdpa.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vdpa/ifcvf/ifcvf.o Add the missing invocations of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Message-Id: <20240611-md-drivers-vdpa-v1-1-efaf2de15152@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-09virtio: vdpa: vDPA driver for Marvell OCTEON DPU devicesSrujana Challa
This commit introduces a new vDPA driver specifically designed for managing the virtio control plane over the vDPA bus for OCTEON DPU devices. The driver consists of two layers: 1. Octep HW Layer (Octeon Endpoint): Responsible for handling hardware operations and configurations related to the DPU device. 2. Octep Main Layer: Compliant with the vDPA bus framework, this layer implements device operations for the vDPA bus. It handles device probing, bus attachment, vring operations, and other relevant tasks. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com> Signed-off-by: Shijith Thotton <sthotton@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240614144659.1776067-1-schalla@marvell.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-03driver core: have match() callback in struct bus_type take a const *Greg Kroah-Hartman
In the match() callback, the struct device_driver * should not be changed, so change the function callback to be a const *. This is one step of many towards making the driver core safe to have struct device_driver in read-only memory. Because the match() callback is in all busses, all busses are modified to handle this properly. This does entail switching some container_of() calls to container_of_const() to properly handle the constant *. For some busses, like PCI and USB and HV, the const * is cast away in the match callback as those busses do want to modify those structures at this point in time (they have a local lock in the driver structure.) That will have to be changed in the future if they wish to have their struct device * in read-only-memory. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alex Elder <elder@kernel.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
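A userspace sketch of what a const-taking match() looks like, assuming toy types that only mirror the shape of the driver-core ones. `to_toy_bus_driver()` is a hypothetical, single-type substitute for container_of_const(): it recovers the enclosing structure without dropping the const qualifier.

```c
#include <stddef.h>

/* Hypothetical toy types; they only mirror the shape of the real ones. */
struct toy_device_driver {
	const char *name;
};

struct toy_bus_driver {
	int id;
	struct toy_device_driver driver; /* embedded generic driver */
};

struct toy_device {
	int id;
};

/* Const-preserving counterpart of container_of() for this one type. */
static const struct toy_bus_driver *
to_toy_bus_driver(const struct toy_device_driver *drv)
{
	return (const struct toy_bus_driver *)
		((const char *)drv - offsetof(struct toy_bus_driver, driver));
}

/* match() only reads the driver, so it takes a const pointer. */
static int toy_bus_match(struct toy_device *dev,
			 const struct toy_device_driver *drv)
{
	const struct toy_bus_driver *bdrv = to_toy_bus_driver(drv);

	return dev->id == bdrv->id;
}
```

Because `bdrv` is const, an accidental write such as `bdrv->id = 0` inside the callback would be rejected by the compiler, which is the property the change is after.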
2024-05-22Merge tag 'stable/vduse-virtio-net' into vhostMichael S. Tsirkin
This adds support for virtio-net to vduse. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vp_vdpa: don't allocate unused msix vectorsYuxue Liu
When there is a ctlq and it doesn't require interrupt callbacks, the original method of calculating vectors wastes hardware MSI or MSI-X resources as well as system IRQ resources. When conducting performance testing using testpmd in the guest OS, it was found that the performance was lower compared to directly using vfio-pci to pass through the device. In scenarios where the virtio device in the guest OS does not utilize interrupts, the vdpa driver still configures the hardware's MSI-X vector. Therefore, the hardware still sends interrupts to the host OS. Because of this unnecessary action by the hardware, hardware performance decreases, and it also affects the performance of the host OS.

Before modification (interrupt mode):
32:  0  0  0  0  PCI-MSI 32768-edge  vp-vdpa[0000:00:02.0]-0
33:  0  0  0  0  PCI-MSI 32769-edge  vp-vdpa[0000:00:02.0]-1
34:  0  0  0  0  PCI-MSI 32770-edge  vp-vdpa[0000:00:02.0]-2
35:  0  0  0  0  PCI-MSI 32771-edge  vp-vdpa[0000:00:02.0]-config

After modification (interrupt mode):
32:  0  0  1  7  PCI-MSI 32768-edge  vp-vdpa[0000:00:02.0]-0
33: 36  0  3  0  PCI-MSI 32769-edge  vp-vdpa[0000:00:02.0]-1
34:  0  0  0  0  PCI-MSI 32770-edge  vp-vdpa[0000:00:02.0]-config

Before modification (virtio PMD mode for guest OS):
32:  0  0  0  0  PCI-MSI 32768-edge  vp-vdpa[0000:00:02.0]-0
33:  0  0  0  0  PCI-MSI 32769-edge  vp-vdpa[0000:00:02.0]-1
34:  0  0  0  0  PCI-MSI 32770-edge  vp-vdpa[0000:00:02.0]-2
35:  0  0  0  0  PCI-MSI 32771-edge  vp-vdpa[0000:00:02.0]-config

After modification (virtio PMD mode for guest OS):
32:  0  0  0  0  PCI-MSI 32768-edge  vp-vdpa[0000:00:02.0]-config

To verify the use of the virtio PMD mode in the guest operating system, the following patch needs to be applied to QEMU: https://lore.kernel.org/all/20240408073311.2049-1-yuxue.liu@jaguarmicro.com Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Message-Id: <20240410033020.1310-1-yuxue.liu@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
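The vector accounting described above can be sketched as follows. This is a minimal illustration, assuming one vector per virtqueue that has a callback plus one vector that is always kept for config changes (which is what the interrupt listings suggest). The `toy_*` names are hypothetical and not taken from the vp_vdpa driver.

```c
#include <stddef.h>

typedef void (*toy_vq_cb)(void *priv);

struct toy_vring {
	toy_vq_cb callback; /* NULL when the guest driver polls this queue */
};

/* Placeholder handler so a queue can be marked as interrupt driven. */
static void toy_noop_cb(void *priv)
{
	(void)priv;
}

/* One vector per queue with a callback, plus one for config changes. */
static unsigned int toy_count_vectors(const struct toy_vring *vrings,
				      size_t nvrings)
{
	unsigned int vectors = 1; /* config interrupt (assumed always needed) */

	for (size_t i = 0; i < nvrings; i++) {
		if (vrings[i].callback)
			vectors++;
	}
	return vectors;
}
```

For three queues where only the first two have callbacks, this yields three vectors; with no callbacks at all (the PMD case), only the config vector remains.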
2024-05-22vdpa: Convert sprintf/snprintf to sysfs_emitLi Zhijian
Per filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. coccinelle complains that there are still a couple of functions that use snprintf(). Convert them to sysfs_emit(). Any sprintf() calls are converted as well. Generally, this patch is generated by: make coccicheck M=<path/to/file> MODE=patch COCCI=scripts/coccinelle/api/device_attr_show.cocci No functional change intended. CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com> CC: virtualization@lists.linux.dev Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Message-Id: <20240314095853.1326111-1-lizhijian@fujitsu.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
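The shape of such a conversion can be illustrated in userspace. `toy_sysfs_emit()` below is a hypothetical stand-in for the kernel's sysfs_emit(), assuming the buffer is exactly one page long; `toy_show_value()` is an invented show()-style helper, not a function from drivers/vdpa.

```c
#include <stdarg.h>
#include <stdio.h>

#define TOY_PAGE_SIZE 4096

/*
 * Hypothetical userspace stand-in for sysfs_emit(): format into a buffer
 * that is assumed to be one page long and return the bytes written.
 */
static int toy_sysfs_emit(char *buf, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf, TOY_PAGE_SIZE, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	if (len >= TOY_PAGE_SIZE)
		len = TOY_PAGE_SIZE - 1; /* output was truncated */
	return len;
}

/*
 * A show()-style helper. Before the conversion this kind of function
 * would call snprintf(buf, PAGE_SIZE, ...) or sprintf(buf, ...) itself.
 */
static int toy_show_value(char *buf, unsigned int value)
{
	return toy_sysfs_emit(buf, "%u\n", value);
}
```

The caller no longer passes a size, so the page-size bound lives in one place instead of being repeated (or forgotten) in every show() function.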
2024-05-22vp_vdpa: Fix return value check vp_vdpa_request_irqYuxue Liu
In the vp_vdpa_set_status function, when setting the device status to VIRTIO_CONFIG_S_DRIVER_OK, the vp_vdpa_request_irq function may fail. In such cases, the device status should not be set to DRIVER_OK. Add exception printing to remind the user. Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com> Message-Id: <20240325105448.235-1-gavin.liu@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
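A small sketch of the control flow this describes, with hypothetical `toy_*` types and a test knob in place of a real interrupt request. Only the VIRTIO_CONFIG_S_DRIVER_OK value (4) comes from the virtio specification; everything else is an assumption for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4 /* status bit from the virtio spec */

struct toy_vp_dev {
	uint8_t status;
	int irq_should_fail; /* test knob, purely hypothetical */
};

/* Stand-in for the IRQ setup step; returns 0 on success. */
static int toy_request_irq(struct toy_vp_dev *dev)
{
	return dev->irq_should_fail ? -1 : 0;
}

/* Only commit DRIVER_OK once the interrupts were set up successfully. */
static void toy_set_status(struct toy_vp_dev *dev, uint8_t status)
{
	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
	    !(dev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
		if (toy_request_irq(dev) != 0) {
			fprintf(stderr, "toy: IRQ request failed, status unchanged\n");
			return; /* leave the old status in place */
		}
	}
	dev->status = status;
}
```

If the IRQ step fails, the device keeps its previous status and a message is printed, so the failure is visible rather than silently ignored.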
2024-04-22vDPA: code clean for vhost_vdpa uapiZhu Lingshan
This commit cleans up the uapi for vhost_vdpa by better naming some of the enums which report blk information to user space, and they are not in any official releases yet. Fixes: 1ac61ddfee93 ("vDPA: report virtio-blk flush info to user space") Fixes: ae1374b7f72c ("vDPA: report virtio-block read-only info to user space") Fixes: 330b8aea6924 ("vDPA: report virtio-block max segment size to user space") Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240415111047.1047774-1-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vduse: enable Virtio-net device typeMaxime Coquelin
This patch adds the Virtio-net device type to the supported device types. Initialization fails if the device does not support the VIRTIO_F_VERSION_1 feature, in order to guarantee the configuration space is read-only. It also fails with -EPERM if CAP_NET_ADMIN is missing. Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20240109111025.1320976-4-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
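The two failure conditions can be sketched as a single check. This is an illustration under assumptions: `has_net_admin` stands in for a capability check, the order of the two tests is a guess, and -EINVAL for the missing feature is an assumption, since the message only states that initialization fails. The device ID and feature bit values are those of the virtio specification.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_ID_NET      1  /* device ID from the virtio spec */
#define VIRTIO_F_VERSION_1 32 /* feature bit from the virtio spec */

/* has_net_admin is a hypothetical stand-in for a CAP_NET_ADMIN check. */
static int toy_check_net_create(uint32_t device_id, uint64_t features,
				bool has_net_admin)
{
	if (device_id != VIRTIO_ID_NET)
		return 0; /* these rules only apply to network devices */
	if (!has_net_admin)
		return -EPERM;
	if (!(features & (1ULL << VIRTIO_F_VERSION_1)))
		return -EINVAL; /* assumed error code */
	return 0;
}
```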
2024-03-19vduse: Temporarily fail if control queue feature requestedMaxime Coquelin
Virtio-net driver control queue implementation is not safe when used with VDUSE. If the VDUSE application does not reply to control queue messages, it currently ends up hanging the kernel thread sending this command. Some work is on-going to make the control queue implementation robust with VDUSE. Until it is completed, let's fail features check if control-queue feature is requested. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20240109111025.1320976-3-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com>
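As a rough sketch, failing the feature check when the control queue is requested could look like this. `toy_net_features_ok()` is a hypothetical helper; only the numeric values (device ID 1, feature bit 17) are taken from the virtio specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_ID_NET        1  /* device ID from the virtio spec */
#define VIRTIO_NET_F_CTRL_VQ 17 /* feature bit from the virtio spec */

/* Reject network devices that offer the control virtqueue feature. */
static bool toy_net_features_ok(uint32_t device_id, uint64_t features)
{
	if (device_id == VIRTIO_ID_NET &&
	    (features & (1ULL << VIRTIO_NET_F_CTRL_VQ)))
		return false;
	return true;
}
```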
2024-03-19vduse: validate block features only with block devicesMaxime Coquelin
This patch is preliminary work to enable network device type support to VDUSE. As VIRTIO_BLK_F_CONFIG_WCE shares the same value as VIRTIO_NET_F_HOST_TSO4, we need to restrict its check to Virtio-blk device type. Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20240109111025.1320976-2-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
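The bit collision is easy to see in a sketch. Both feature names map to bit 11 in the virtio specification, so a check that is meant for block devices must first look at the device type. `toy_blk_features_ok()` is a hypothetical helper, and rejecting VIRTIO_BLK_F_CONFIG_WCE for block devices is an assumption about what the existing check does.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_ID_NET           1  /* device IDs from the virtio spec */
#define VIRTIO_ID_BLOCK         2
#define VIRTIO_BLK_F_CONFIG_WCE 11 /* block feature bit */
#define VIRTIO_NET_F_HOST_TSO4  11 /* net feature bit, same number */

/* Apply the block-specific check only when the device is a block device. */
static bool toy_blk_features_ok(uint32_t device_id, uint64_t features)
{
	if (device_id == VIRTIO_ID_BLOCK &&
	    (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)))
		return false;
	return true;
}
```

Without the device type test, a network device offering VIRTIO_NET_F_HOST_TSO4 would be rejected for a reason that only makes sense for block devices.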
2024-03-19vDPA: report virtio-blk flush info to user spaceZhu Lingshan
This commit reports to user space whether a virtio-blk device supports the cache flush command. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-11-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block read-only info to user spaceZhu Lingshan
This commit reports read-only information of virtio-blk devices to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-10-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block write zeroes configuration to user spaceZhu Lingshan
This commit reports the write zeroes configuration of virtio-block devices to user space, including: 1) maximum write zeroes sectors size 2) maximum write zeroes segment number Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-9-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block discarding configuration to user spaceZhu Lingshan
This commit reports the virtio-blk discard configuration to user space, including: 1) the maximum discard sectors 2) the maximum number of discard segments for the block driver to use 3) the alignment for splitting a discard request Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-8-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block topology info to user spaceZhu Lingshan
This commit allows vDPA to report topology information of virtio-blk devices to user space, including: 1) the number of logical blocks per physical block 2) offset of first aligned logical block 3) suggested minimum I/O size in blocks 4) optimal (suggested maximum) I/O size in blocks Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-7-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block MQ info to user spaceZhu Lingshan
This commit allows vDPA to report the virtio-block multi-queue configuration to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-6-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block max segments in a request to user spaceZhu Lingshan
This commit allows vDPA to report the maximum number of segments in a request of virtio-block devices to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block block-size to user spaceZhu Lingshan
This commit allows reporting the block size of a virtio-block device to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-4-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>