summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2023-04-21vhost_vdpa: fix unmap process in no-batch modeCindy Lu
While using the vdpa device with vIOMMU enabled in the guest VM, when the vdpa device bind to vfio-pci and run testpmd then system will fail to unmap. The test process is Load guest VM --> attach to virtio driver--> bind to vfio-pci driver So the mapping process is 1)batched mode map to normal MR 2)batched mode unmapped the normal MR 3)unmapped all the memory 4)mapped to iommu MR This error happened in step 3). The iotlb was freed in step 2) and the function vhost_vdpa_process_iotlb_msg will return fail Which causes failure. To fix this, we will not remove the AS while the iotlb->nmaps is 0. This will free in the vhost_vdpa_clean Cc: stable@vger.kernel.org Fixes: aaca8373c4b1 ("vhost-vdpa: support ASID based IOTLB API") Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20230420151734.860168-1-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim_blk: support shared backendStefano Garzarella
The vdpa_sim_blk simulator uses a ramdisk as the backend. To test live migration, we need two devices that share the backend to have the data synchronized with each other. Add a new module parameter to make the buffer shared between all devices. The shared_buffer_mutex is used just to ensure that each operation is atomic, but it is up to the user to use the devices knowing that the underlying ramdisk is shared. For example, when we do a migration, the VMM (e.g., QEMU) will guarantee to write to the destination device, only after completing operations with the source device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230407133658.66339-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim: move buffer allocation in the devicesStefano Garzarella
Currently, the vdpa_sim core does not use the buffer, but only allocates it. The buffer is used by devices differently, and some future devices may not use it. So let's move all its management inside the devices. Add a new `free` device callback called to clean up the resources allocated by the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230407133658.66339-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/snet: use likely/unlikely macros in hot functionsAlvaro Karsz
- kick callback: most likely that the VQ is ready. - interrupt handlers: most likely that the callback is not NULL. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230409120242.3460074-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/snet: implement kick_vq_with_data callbackAlvaro Karsz
Implement the kick_vq_with_data vDPA callback. On kick, we pass the next available data to the DPU by writing it in the kick offset. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230417083853.375076-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature supportAlvaro Karsz
Add VIRTIO_F_NOTIFICATION_DATA support for vDPA transport. If this feature is negotiated, the driver passes extra data when kicking a virtqueue. A device that offers this feature needs to implement the kick_vq_with_data callback. kick_vq_with_data receives the vDPA device and data. data includes: 16 bits vqn and 16 bits next available index for split virtqueues. 16 bits vqs, 15 least significant bits of next available index and 1 bit next_wrap for packed virtqueues. This patch follows a patch [1] by Viktor Prutyanov which adds support for the MMIO, channel I/O and modern PCI transports. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230413081855.36643-3-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21virtio: add VIRTIO_F_NOTIFICATION_DATA feature supportViktor Prutyanov
According to VirtIO spec v1.2, VIRTIO_F_NOTIFICATION_DATA feature indicates that the driver passes extra data along with the queue notifications. In a split queue case, the extra data is 16-bit available index. In a packed queue case, the extra data is 1-bit wrap counter and 15-bit available index. Add support for this feature for MMIO, channel I/O and modern PCI transports. Signed-off-by: Viktor Prutyanov <viktor@daynix.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230413081855.36643-2-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/snet: support the suspend vDPA callbackAlvaro Karsz
When suspend is called, the driver sends a suspend command to the DPU through the control mechanism. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230413073337.31367-3-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vdpa/snet: support getting and setting VQ stateAlvaro Karsz
This patch adds the get_vq_state and set_vq_state vDPA callbacks. In order to get the VQ state, the state needs to be read from the DPU. In order to allow that, the old messaging mechanism is replaced with a new, flexible control mechanism. This mechanism allows to read data from the DPU. The mechanism can be used if the negotiated config version is 2 or higher. If the new mechanism is used when the config version is 1, it will call snet_send_ctrl_msg_old, which is config 1 compatible. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230413073337.31367-2-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21virtio_ring: don't update event idx on get_bufAlbert Huang
In virtio_net, if we disable napi_tx, when we trigger a tx interrupt, the vq->event_triggered will be set to true. It is then never reset until we explicitly call virtqueue_enable_cb_delayed or virtqueue_enable_cb_prepare. If we disable the napi_tx, virtqueue_enable_cb* will only be called when the tx ring is getting relatively empty. Since event_triggered is true, VRING_AVAIL_F_NO_INTERRUPT or VRING_PACKED_EVENT_FLAG_DISABLE will not be set. As a result we update vring_used_event(&vq->split.vring) or vq->packed.vring.driver->off_wrap every time we call virtqueue_get_buf_ctx. This causes more interrupts. To summarize: 1) event_triggered was set to true in vring_interrupt() 2) after this nothing will happen in virtqueue_disable_cb() so VRING_AVAIL_F_NO_INTERRUPT is not set in avail_flags_shadow 3) virtqueue_get_buf_ctx_split() will still think the cb is enabled and then it will publish a new event index To fix: update VRING_AVAIL_F_NO_INTERRUPT or VRING_PACKED_EVENT_FLAG_DISABLE in the vq when we call virtqueue_disable_cb even when event_triggered is true. Tested with iperf: iperf3 tcp stream: vm1 -----------------> vm2 vm2 just receives tcp data stream from vm1, and sends acks to vm1, there are many tx interrupts in vm2. with the patch applied there are just a few tx interrupts. v2->v3: -update the interrupt disable flag even with the event_triggered is set, -instead of checking whether event_triggered is set in -virtqueue_get_buf_ctx_{packed/split}, will cause the drivers which have -not called virtqueue_{enable/disable}_cb to miss notifications. v3->v4: -remove change for -"if (vq->packed.event_flags_shadow != VRING_PACKED_EVENT_FLAG_DISABLE)" -in virtqueue_disable_cb_packed Fixes: 8d622d21d248 ("virtio: fix up virtio_disable_cb") Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230329102300.61000-1-huangjie.albert@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim: add support for user VAStefano Garzarella
The new "use_va" module parameter (default: true) is used in vdpa_alloc_device() to inform the vDPA framework that the device supports VA. vringh is initialized to use VA only when "use_va" is true and the user's mm has been bound. So, only when the bus supports user VA (e.g. vhost-vdpa). vdpasim_mm_work_fn work is used to serialize the binding to a new address space when the .bind_mm callback is invoked, and unbinding when the .unbind_mm callback is invoked. Call mmget_not_zero()/kthread_use_mm() inside the worker function to pin the address space only as long as needed, following the documentation of mmget() in include/linux/sched/mm.h: * Never use this function to pin this address space for an * unbounded/indefinite amount of time. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131734.45943-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim: replace the spinlock with a mutex to protect the stateStefano Garzarella
The spinlock we use to protect the state of the simulator is sometimes held for a long time (for example, when devices handle requests). This also prevents us from calling functions that might sleep (such as kthread_flush_work() in the next patch), and thus having to release and retake the lock. For these reasons, let's replace the spinlock with a mutex that gives us more flexibility. Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131730.45920-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim: use kthread workerStefano Garzarella
Let's use our own kthread to run device jobs. This allows us more flexibility, especially we can attach the kthread to the user address space when vDPA uses user's VA. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131725.45908-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa_sim: make devices agnostic for work managementStefano Garzarella
Let's move work management inside the vdpa_sim core. This way we can easily change how we manage the works, without having to change the devices each time. Acked-by: Eugenio Pérez Martin <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131721.45886-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vringh: support VA with iotlbStefano Garzarella
vDPA supports the possibility to use user VA in the iotlb messages. So, let's add support for user VA in vringh to use it in the vDPA simulators. Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131716.45855-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vringh: define the stride used for translationStefano Garzarella
Define a macro to be reused in the different parts of the code. Useful for the next patches where we add more arrays to manage also translations with user VA. Suggested-by: Eugenio Perez Martin <eperezma@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-5-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vringh: replace kmap_atomic() with kmap_local_page()Stefano Garzarella
kmap_atomic() is deprecated in favor of kmap_local_page() since commit f3ba3c710ac5 ("mm/highmem: Provide kmap_local*"). With kmap_local_page() the mappings are per thread, CPU local, can take page-faults, and can be called from any context (including interrupts). Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. kmap_atomic() is implemented like a kmap_local_page() which also disables page-faults and preemption (the latter only for !PREEMPT_RT kernels, otherwise it only disables migration). The code within the mappings/un-mappings in getu16_iotlb() and putu16_iotlb() don't depend on the above-mentioned side effects of kmap_atomic(), so that mere replacements of the old API with the new one is all that is required (i.e., there is no need to explicitly add calls to pagefault_disable() and/or preempt_disable()). This commit reuses a "boiler plate" commit message from Fabio, who has already did this change in several places. Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-4-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-vdpa: use bind_mm/unbind_mm device callbacksStefano Garzarella
When the user call VHOST_SET_OWNER ioctl and the vDPA device has `use_va` set to true, let's call the bind_mm callback. In this way we can bind the device to the user address space and directly use the user VA. The unbind_mm callback is called during the release after stopping the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vringh: fix typos in the vringh_init_* documentationStefano Garzarella
Replace `userpace` with `userspace`. Cc: Simon Horman <simon.horman@corigine.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230331080208.17002-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support specifying bounce buffer size via sysfsXie Yongji
As discussed in [1], this adds sysfs interface to support specifying bounce buffer size in virtio-vdpa case. It would be a performance tuning parameter for high throughput workloads. [1] https://lore.kernel.org/netdev/e8f25a35-9d45-69f9-795d-bdbbb90337a3@redhat.com/ Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-12-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Delay iova domain creationXie Yongji
Delay creating iova domain until the vduse device is registered to vdpa bus. This is a preparation for adding sysfs interface to support specifying bounce buffer size for the iova domain. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-11-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Signal vq trigger eventfd directly if possibleXie Yongji
Now the vdpa callback will associate an trigger eventfd in some cases. For performance reasons, VDUSE can signal it directly during irq injection. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-10-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa: Add eventfd for the vdpa callbackXie Yongji
Add eventfd for the vdpa callback so that user can signal it directly instead of triggering the callback. It will be used for vhost-vdpa case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-9-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Add sysfs interface for irq callback affinityXie Yongji
Add sysfs interface for each vduse virtqueue to get/set the affinity for irq callback. This might be useful for performance tuning when the irq callback affinity mask contains more than one CPU. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-8-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support get_vq_affinity callbackXie Yongji
This implements get_vq_affinity callback so that the virtio-blk driver can build the blk-mq queues based on the irq callback affinity. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-7-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support set_vq_affinity callbackXie Yongji
Since virtio-vdpa bus driver already support interrupt affinity spreading mechanism, let's implement the set_vq_affinity callback to bring it to vduse device. After we get the virtqueue's affinity, we can spread IRQs between CPUs in the affinity mask, in a round-robin manner, to run the irq callback. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-6-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Refactor allocation for vduse virtqueuesXie Yongji
Allocate memory for vduse virtqueues one by one instead of doing one allocation for all of them. This is a preparation for adding sysfs interface for virtqueues. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-5-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21virtio-vdpa: Support interrupt affinity spreading mechanismXie Yongji
To support interrupt affinity spreading mechanism, this makes use of group_cpus_evenly() to create an irq callback affinity mask for each virtqueue of vdpa device. Then we will unify set_vq_affinity callback to pass the affinity to the vdpa device driver. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-4-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vdpa: Add set/get_vq_affinity callbacks in vdpa_config_opsXie Yongji
This introduces set/get_vq_affinity callbacks in vdpa_config_ops to support virtqueue affinity management for vdpa device drivers. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-3-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/mlx5: Extend driver support for new featuresEli Cohen
Extend the possible list for features that can be supported by firmware. Note that different versions of firmware may or may not support these features. The driver is made aware of them by querying the firmware. While doing this, improve the code so we use enum names instead of hard coded numerical values. The new features supported by the driver are the following: VIRTIO_NET_F_MRG_RXBUF VIRTIO_NET_F_HOST_ECN VIRTIO_NET_F_GUEST_ECN VIRTIO_NET_F_GUEST_TSO6 VIRTIO_NET_F_GUEST_TSO4 Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230321112809.221432-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>
2023-04-21vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by defaultEli Cohen
Following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF. Current firmware versions show degradation in packet rate when using MRG_RXBUF. Users who favor memory saving over packet rate could enable this feature but we want to keep it off by default. One can still enable it when creating the vdpa device using vdpa tool by providing features that include it. For example: $ vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 device_features 0x300cb982b Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230321112809.221432-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
2023-04-21virtio_ring: Allow non power of 2 sizes for packed virtqueueFeng Liu
According to the Virtio Specification, the Queue Size parameter of a virtqueue corresponds to the maximum number of descriptors in that queue, and it does not have to be a power of 2 for packed virtqueues. However, the virtio_pci_modern driver enforced a power of 2 check for virtqueue sizes, which is unnecessary and restrictive for packed virtuqueue. Split virtqueue still needs to check the virtqueue size is power_of_2 which has been done in vring_alloc_queue_split of the virtio_ring layer. To validate this change, we tested various virtqueue sizes for packed rings, including 128, 256, 512, 100, 200, 500, and 1000, with CONFIG_PAGE_POISONING enabled, and all tests passed successfully. Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20230315185458.11638-2-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Reduce vhost_scsi_mutex useMike Christie
We on longer need to hold the vhost_scsi_mutex the entire time we set/clear the endpoint. The tv_tpg_mutex handles tpg accesses not related to the tpg list, the port link/unlink functions use the tv_tpg_mutex while accessing the tpg->vhost_scsi pointer, vhost_scsi_do_plug will no longer queue events after the virtqueue's backend has been cleared and flushed, and we don't drop our refcount to the tpg until after we have stopped cmds and wait for outstanding cmds to complete. This moves the vhost_scsi_mutex use to it's documented use of being used to access the tpg list. We then don't need to hold it while a flush is being performed causing other device's vhost_scsi_set_endpoint and vhost_scsi_make_tpg/vhost_scsi_drop_tpg calls to have to wait on a flakey device. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-8-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Drop vhost_scsi_mutex use in port calloutsMike Christie
We are using the vhost_scsi_mutex to make sure vhost_scsi_port_link and vhost_scsi_port_unlink see if vhost_scsi_clear_endpoint has cleared tpg->vhost_scsi and it can't be freed while they are using. However, we currently set the tpg->vhost_scsi pointer while holding tv_tpg_mutex. So, we can just hold that while calling vhost_scsi_hotplug/hotunplug. We then don't need to hold the vhost_scsi_mutex while vhost_scsi_clear_endpoint is holding it and doing a flush which could cause the LUN map/unmap to have to wait on another device's flush. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-7-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Check for a cleared backend before queueing an eventMike Christie
We currenly hold the vhost_scsi_mutex while clearing the endpoint and while performing vhost_scsi_do_plug, so tpg->vhost_scsi can't be freed from uder us, and to make sure anything queued is handled by the full call in vhost_scsi_clear_endpoint. This patch removes the need for the vhost_scsi_mutex for the latter case. In the next patches, we won't hold the vhost_scsi_mutex while flushing so this patch adds a check for the clearing of the virtqueue from vhost_scsi_clear_endpoint. We then know that once vhost_scsi_clear_endpoint has cleared the backend that no new events will be queued, and the flush after the vhost_vq_set_backend(vq, NULL) call will see everything that's been queued to that point. So the flush will then handle all events without the need for the vhost_scsi_mutex. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-6-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Drop device mutex use in vhost_scsi_do_plugMike Christie
We don't need the device mutex in vhost_scsi_do_plug because: 1. we have the vhost_scsi_mutex so the tpg->vhost_scsi pointer will not change on us and the vhost_scsi can't be freed from under us if it was set. 2. vhost_scsi_clear_endpoint will stop the virtqueues and flush them while holding the vhost_scsi_mutex so we know once vhost_scsi_clear_endpoint has completed that vhost_scsi_do_plug can't send new events and any queued ones have completed. So this patch drops the device mutex use in vhost_scsi_do_plug. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-5-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Delay releasing our refcount on the tpgMike Christie
We currently hold the vhost_scsi_mutex the entire time we are running vhost_scsi_clear_endpoint. One of the reasons for this is that it prevents userspace from being able to free the se_tpg from under us after we have called target_undepend_item. However, it forces management operations for for other devices to have to wait on a flakey device's: vhost_scsi_clear_endpoint -> vhost_scsi_flush() call which can which can take a long time. This moves the target_undepend_item call and the tpg unsetup code to after we have stopped new IO from starting up and after we have waited on running IO. We can then release our refcount on the tpg and session knowing our device is no longer accessing them. We can then drop the vhost_scsi_mutex use during thee flush call in later patches in this set, when we have removed other reasons for holding it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21virtio_ring: Use const to annotate read-only pointer paramsFeng Liu
Add const to make the read-only pointer parameters clear, similar to many existing functions. To implement this change, the commit also introduces the use of `container_of_const` to implement `to_vvq`, which ensures the const-ness of read-only parameters and avoids accidental modification of their members. Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Message-Id: <20230310053428.3376-4-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21virtio_ring: Avoid using inline for small functionsFeng Liu
According to kernel coding style [1], defining inline functions is not necessary and beneficial for simple functions. Hence clean up the code by removing the inline keyword. It is verified with GCC 12.2.0, the generated code with/without inline is same. Additionally tested with pktgen and iperf, and verified the result, the pps test results are the same in the cases of with/without inline. Iperf and pps of pktgen for virtio-net didn't change before and after the change. [1] https://www.kernel.org/doc/html/v6.2-rc3/process/coding-style.html#the-inline-disease Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20230310053428.3376-3-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost: use struct_size and size_add to compute flex array sizesJacob Keller
The vhost_get_avail_size and vhost_get_used_size functions compute the size of structures with flexible array members with an additional 2 bytes if the VIRTIO_RING_F_EVENT_IDX feature flag is set. Convert these functions to use struct_size() and size_add() instead of coding the calculation by hand. This ensures that the calculations will saturate at SIZE_MAX rather than overflowing. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Message-Id: <20230227214127.3678392-1-jacob.e.keller@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/mlx5: Avoid losing link state updatesEli Cohen
Current code ignores link state updates if VIRTIO_NET_F_STATUS was not negotiated. However, link state updates could be received before feature negotiation was completed , therefore causing link state events to be lost, possibly leaving the link state down. Modify the code so link state notifier is registered after DRIVER_OK was negotiated and carry the registration only if VIRTIO_NET_F_STATUS was negotiated. Unregister the notifier when the device is reset. Fixes: 033779a708f0 ("vdpa/mlx5: make MTU/STATUS presence conditional on feature bits") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230417110343.138319-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-15Merge tag 'ubifs-for-linus-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI fixes from Richard Weinberger: - Fix failure to attach when vid_hdr offset equals the (sub)page size - Fix for a deadlock in UBI's worker thread * tag 'ubifs-for-linus-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size ubi: Fix deadlock caused by recursively holding work_sem
2023-04-15Merge tag 'i2c-for-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Just two driver fixes" * tag 'i2c-for-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: ocores: generate stop condition after timeout in polling mode i2c: mchp-pci1xxxx: Update Timing registers
2023-04-15Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "One small fix to SCSI Enclosure Services to fix a regression caused by another recent fix" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ses: Handle enclosure with just a primary component gracefully
2023-04-15Merge tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fix from Jens Axboe: "A single NVMe quirk entry addition" * tag 'block-6.3-2023-04-14' of git://git.kernel.dk/linux: nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
2023-04-14Merge tag 'acpi-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These add two ACPI-related quirks: - Add a quirk to force StorageD3Enable on AMD Picasso systems (Mario Limonciello) - Add an ACPI IRQ override quirk for ASUS ExpertBook B1502CBA (Paul Menzel)" * tag 'acpi-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CBA ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable
2023-04-14Merge tag 'pm-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Make the amd-pstate cpufreq driver take all of the possible combinations of the 'old' and 'new' status values correctly while changing the operation mode via sysfs (Wyes Karny)" * tag 'pm-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: amd-pstate: Fix amd_pstate mode switch
2023-04-14Merge tag 'thermal-6.3-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Modify the Intel thermal throttling code to avoid updating unsupported status clearing mask bits which causes the kernel to complain about unchecked MSR access (Srinivas Pandruvada)" * tag 'thermal-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: Avoid updating unsupported THERM_STATUS_CLEAR mask bits
2023-04-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "We had a fairly slow cycle on the rc side this time, here are the accumulated fixes, mostly in drivers: - irdma should not generate extra completions during flushing - Fix several memory leaks - Do not get confused in irdma's iwarp mode if IPv6 is present - Correct a link speed calculation in mlx5 - Increase the EQ/WQ limits on erdma as they are too small for big applications - Use the right math for erdma's inline mtt feature - Make erdma probing more robust to boot time ordering differences - Fix a KMSAN crash in CMA due to uninitialized qkey" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/core: Fix GID entry ref leak when create_ah fails RDMA/cma: Allow UD qp_type to join multicast only RDMA/erdma: Defer probing if netdevice can not be found RDMA/erdma: Inline mtt entries into WQE if supported RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192 RDMA/erdma: Fix some typos IB/mlx5: Add support for 400G_8X lane speed RDMA/irdma: Add ipv4 check to irdma_find_listener() RDMA/irdma: Increase iWARP CM default rexmit count RDMA/irdma: Fix memory leak of PBLE objects RDMA/irdma: Do not generate SW completions for NOPs
2023-04-14Merge branch 'acpi-x86'Rafael J. Wysocki
Merge a quirk to force StorageD3Enable on AMD Picasso systems (Mario Limonciello). * acpi-x86: ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable