|
As discussed in [1], this adds sysfs interface to support
specifying bounce buffer size in virtio-vdpa case. It would
be a performance tuning parameter for high throughput workloads.
[1] https://lore.kernel.org/netdev/e8f25a35-9d45-69f9-795d-bdbbb90337a3@redhat.com/
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-12-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Delay creating iova domain until the vduse device is
registered to vdpa bus.
This is a preparation for adding sysfs interface to
support specifying bounce buffer size for the iova
domain.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-11-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Now the vdpa callback will associate a trigger
eventfd in some cases. For performance reasons,
VDUSE can signal it directly during irq injection.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-10-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Add eventfd for the vdpa callback so that user
can signal it directly instead of triggering the
callback. It will be used for vhost-vdpa case.
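The idea can be illustrated with a small userspace analogue built on eventfd(2). This is only a sketch under stated assumptions: `struct notify_cb` and `notify()` are hypothetical names, not the kernel's vdpa callback structure, and the kernel side would use its own in-kernel eventfd interface rather than write(2).

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical userspace analogue of a callback with an optional
 * trigger eventfd. */
struct notify_cb {
	void (*callback)(void *priv);
	void *priv;
	int trigger_fd;   /* -1 when no eventfd is associated */
};

/* Signal the eventfd directly when one is set, otherwise fall back
 * to running the callback. Returns 0 on success, -1 on failure. */
static int notify(const struct notify_cb *cb)
{
	if (cb->trigger_fd >= 0) {
		uint64_t one = 1;
		ssize_t n = write(cb->trigger_fd, &one, sizeof(one));

		return n == (ssize_t)sizeof(one) ? 0 : -1;
	}
	if (cb->callback) {
		cb->callback(cb->priv);
		return 0;
	}
	return -1;
}
```

Each successful `notify()` on an eventfd adds one to its counter, so a reader of the descriptor sees how many notifications were coalesced since the last read.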
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-9-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
Add sysfs interface for each vduse virtqueue to
get/set the affinity for irq callback. This might
be useful for performance tuning when the irq callback
affinity mask contains more than one CPU.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-8-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
This implements get_vq_affinity callback so that
the virtio-blk driver can build the blk-mq queues
based on the irq callback affinity.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-7-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
Since the virtio-vdpa bus driver already supports interrupt
affinity spreading mechanism, let's implement the
set_vq_affinity callback to bring it to vduse device.
After we get the virtqueue's affinity, we can spread
IRQs between CPUs in the affinity mask, in a round-robin
manner, to run the irq callback.
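A minimal userspace sketch of the round-robin selection described above, assuming the affinity mask fits in a 64-bit word. The names `next_cpu_in_mask`, `struct vq_irq_state` and `pick_irq_cpu` are hypothetical and are not taken from the VDUSE driver, which works on kernel cpumasks.

```c
#include <stdint.h>

/* Hypothetical helper: return the first CPU set in mask at or after
 * start (0..63), wrapping around; returns -1 if the mask is empty. */
static int next_cpu_in_mask(uint64_t mask, int start)
{
	for (int i = 0; i < 64; i++) {
		int cpu = (start + i) % 64;

		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;
}

/* Per-virtqueue state (illustrative only). */
struct vq_irq_state {
	uint64_t affinity;  /* CPUs allowed to run the irq callback */
	int last_cpu;       /* CPU used for the previous injection, -1 if none */
};

/* Pick the CPU for the next irq callback, rotating through the mask. */
static int pick_irq_cpu(struct vq_irq_state *s)
{
	int cpu = next_cpu_in_mask(s->affinity, (s->last_cpu + 1) % 64);

	if (cpu >= 0)
		s->last_cpu = cpu;
	return cpu;
}
```

With an affinity of CPUs 0, 2 and 4, successive calls to `pick_irq_cpu()` cycle through 0, 2, 4 and then wrap back to 0.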
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-6-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
Allocate memory for vduse virtqueues one by one instead of
doing one allocation for all of them.
This is a preparation for adding sysfs interface for virtqueues.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-5-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
To support interrupt affinity spreading mechanism,
this makes use of group_cpus_evenly() to create
an irq callback affinity mask for each virtqueue
of vdpa device. Then we will unify set_vq_affinity
callback to pass the affinity to the vdpa device driver.
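As a rough illustration of building one affinity mask per virtqueue, the sketch below splits CPUs into contiguous groups of near-equal size. It is a deliberate simplification: the real group_cpus_evenly() also takes NUMA nodes and CPU locality into account, which this hypothetical `split_cpus_evenly()` ignores.

```c
#include <stdint.h>

/* Fill masks[0..ngroups-1] so that CPUs 0..ncpus-1 are split into
 * contiguous groups whose sizes differ by at most one.
 * Assumes 0 < ncpus <= 64 and ngroups > 0. */
static void split_cpus_evenly(unsigned int ncpus, unsigned int ngroups,
			      uint64_t *masks)
{
	unsigned int base = ncpus / ngroups;   /* minimum CPUs per group */
	unsigned int extra = ncpus % ngroups;  /* first 'extra' groups get one more */
	unsigned int cpu = 0;

	for (unsigned int g = 0; g < ngroups; g++) {
		unsigned int n = base + (g < extra ? 1 : 0);

		masks[g] = 0;
		for (unsigned int i = 0; i < n; i++)
			masks[g] |= UINT64_C(1) << cpu++;
	}
}
```

For 8 CPUs and 3 virtqueues this yields groups of 3, 3 and 2 CPUs; each mask would then be handed to the device driver through set_vq_affinity.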
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-4-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
This introduces set/get_vq_affinity callbacks in
vdpa_config_ops to support virtqueue affinity
management for vdpa device drivers.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-3-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Export group_cpus_evenly() so that some modules
can make use of it to group CPUs evenly according
to NUMA and CPU locality.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-2-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Extend the possible list for features that can be supported by firmware.
Note that different versions of firmware may or may not support these
features. The driver is made aware of them by querying the firmware.
While doing this, improve the code so we use enum names instead of hard
coded numerical values.
The new features supported by the driver are the following:
VIRTIO_NET_F_MRG_RXBUF
VIRTIO_NET_F_HOST_ECN
VIRTIO_NET_F_GUEST_ECN
VIRTIO_NET_F_GUEST_TSO6
VIRTIO_NET_F_GUEST_TSO4
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230321112809.221432-3-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>
|
|
Following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF.
Current firmware versions show degradation in packet rate when using
MRG_RXBUF. Users who favor memory saving over packet rate could enable
this feature but we want to keep it off by default.
One can still enable it when creating the vdpa device using vdpa tool by
providing features that include it.
For example:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 device_features 0x300cb982b
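To check whether a given device_features value includes MRG_RXBUF, one can test the corresponding bit. The sketch below uses the bit positions from the virtio specification; `has_feature()` is a hypothetical helper, not part of the driver or the vdpa tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined by the virtio specification. */
enum {
	VIRTIO_NET_F_GUEST_TSO4 = 7,
	VIRTIO_NET_F_GUEST_TSO6 = 8,
	VIRTIO_NET_F_GUEST_ECN  = 9,
	VIRTIO_NET_F_HOST_ECN   = 13,
	VIRTIO_NET_F_MRG_RXBUF  = 15,
};

/* Hypothetical helper: true if the given feature bit is set. */
static bool has_feature(uint64_t features, unsigned int bit)
{
	return (features >> bit) & 1;
}
```

For the value in the example above, `has_feature(0x300cb982bULL, VIRTIO_NET_F_MRG_RXBUF)` is true, since bit 15 is set.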
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230321112809.221432-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
|
|
According to the Virtio Specification, the Queue Size parameter of a
virtqueue corresponds to the maximum number of descriptors in that
queue, and it does not have to be a power of 2 for packed virtqueues.
However, the virtio_pci_modern driver enforced a power of 2 check for
virtqueue sizes, which is unnecessary and restrictive for packed
virtqueues.
Split virtqueues still need to check that the virtqueue size is a power of 2,
which has been done in vring_alloc_queue_split of the virtio_ring layer.
To validate this change, we tested various virtqueue sizes for packed
rings, including 128, 256, 512, 100, 200, 500, and 1000, with
CONFIG_PAGE_POISONING enabled, and all tests passed successfully.
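The resulting rule can be sketched as follows. This is an illustration only: `vq_size_valid()` is a hypothetical function, and the real checks live in different layers (virtio_pci_modern and virtio_ring) rather than in a single helper.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical validation: split rings need a power-of-2 size,
 * packed rings only need a non-zero size. */
static bool vq_size_valid(uint32_t num, bool packed)
{
	if (num == 0)
		return false;
	return packed || is_power_of_2(num);
}
```

Under this rule a size of 100 is accepted for a packed ring but rejected for a split ring, while 128 is accepted for both.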
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Message-Id: <20230315185458.11638-2-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
We no longer need to hold the vhost_scsi_mutex the entire time we
set/clear the endpoint. The tv_tpg_mutex handles tpg accesses not related
to the tpg list, the port link/unlink functions use the tv_tpg_mutex while
accessing the tpg->vhost_scsi pointer, vhost_scsi_do_plug will no longer
queue events after the virtqueue's backend has been cleared and flushed,
and we don't drop our refcount to the tpg until after we have stopped
cmds and wait for outstanding cmds to complete.
This moves the vhost_scsi_mutex use to its documented use of being used
to access the tpg list. We then don't need to hold it while a flush is
being performed causing other devices' vhost_scsi_set_endpoint
and vhost_scsi_make_tpg/vhost_scsi_drop_tpg calls to have to wait on a
flakey device.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-8-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
We are using the vhost_scsi_mutex to make sure vhost_scsi_port_link and
vhost_scsi_port_unlink see if vhost_scsi_clear_endpoint has cleared
tpg->vhost_scsi and it can't be freed while they are using it.
However, we currently set the tpg->vhost_scsi pointer while holding
tv_tpg_mutex. So, we can just hold that while calling
vhost_scsi_hotplug/hotunplug. We then don't need to hold the
vhost_scsi_mutex while vhost_scsi_clear_endpoint is holding it and doing
a flush which could cause the LUN map/unmap to have to wait on another
device's flush.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-7-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
We currently hold the vhost_scsi_mutex while clearing the endpoint and
while performing vhost_scsi_do_plug, so tpg->vhost_scsi can't be freed
from under us, and to make sure anything queued is handled by the
full call in vhost_scsi_clear_endpoint.
This patch removes the need for the vhost_scsi_mutex for the latter
case. In the next patches, we won't hold the vhost_scsi_mutex while
flushing so this patch adds a check for the clearing of the virtqueue
from vhost_scsi_clear_endpoint. We then know that once
vhost_scsi_clear_endpoint has cleared the backend that no new events
will be queued, and the flush after the vhost_vq_set_backend(vq, NULL)
call will see everything that's been queued to that point. So the flush
will then handle all events without the need for the vhost_scsi_mutex.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-6-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
We don't need the device mutex in vhost_scsi_do_plug because:
1. we have the vhost_scsi_mutex so the tpg->vhost_scsi pointer will not
change on us and the vhost_scsi can't be freed from under us if it was
set.
2. vhost_scsi_clear_endpoint will stop the virtqueues and flush them while
holding the vhost_scsi_mutex so we know once vhost_scsi_clear_endpoint
has completed that vhost_scsi_do_plug can't send new events and any
queued ones have completed.
So this patch drops the device mutex use in vhost_scsi_do_plug.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-5-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
We currently hold the vhost_scsi_mutex the entire time we are running
vhost_scsi_clear_endpoint. One of the reasons for this is that it prevents
userspace from being able to free the se_tpg from under us after we have
called target_undepend_item. However, it forces management operations
for other devices to have to wait on a flakey device's:
vhost_scsi_clear_endpoint -> vhost_scsi_flush()
call which can take a long time.
This moves the target_undepend_item call and the tpg unsetup code to after
we have stopped new IO from starting up and after we have waited on
running IO. We can then release our refcount on the tpg and session
knowing our device is no longer accessing them. We can then drop the
vhost_scsi_mutex use during the flush call in later patches in this set,
when we have removed other reasons for holding it.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-4-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Add const to make the read-only pointer parameters clear, similar to
many existing functions.
To implement this change, the commit also introduces the use of
`container_of_const` to implement `to_vvq`, which ensures the const-ness
of read-only parameters and avoids accidental modification of their
members.
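The const-preserving conversion can be sketched in userspace with C11 `_Generic`. The structures below are simplified stand-ins rather than the real kernel definitions, and the macros are written to mirror the idea rather than copied from the kernel headers; `__typeof__` is a compiler extension supported by GCC and Clang.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical layout). */
struct virtqueue { unsigned int index; };
struct vring_virtqueue { int flags; struct virtqueue vq; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Yield a const pointer when given a const pointer, and a
 * non-const pointer otherwise. */
#define container_of_const(ptr, type, member)                            \
	_Generic(ptr,                                                    \
		const __typeof__(*(ptr)) *:                              \
			((const type *)container_of(ptr, type, member)), \
		default: ((type *)container_of(ptr, type, member)))

#define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)

/* A read-only accessor can now take a const pointer; writing through
 * vvq here would be rejected at compile time. */
static int vvq_flags(const struct virtqueue *vq)
{
	const struct vring_virtqueue *vvq = to_vvq(vq);

	return vvq->flags;
}
```

Assigning the result of `to_vvq()` on a const pointer to a non-const `struct vring_virtqueue *` produces a compiler diagnostic, which is the accidental modification the change guards against.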
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Message-Id: <20230310053428.3376-4-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
According to kernel coding style [1], defining inline functions is neither
necessary nor beneficial for simple functions. Hence clean up the code
by removing the inline keyword.
Verified with GCC 12.2.0 that the generated code with/without inline
is the same. Additionally tested with pktgen and iperf, and verified the
result, the pps test results are the same in the cases of with/without
inline.
Iperf and pps of pktgen for virtio-net didn't change before and after
the change.
[1]
https://www.kernel.org/doc/html/v6.2-rc3/process/coding-style.html#the-inline-disease
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20230310053428.3376-3-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
When we get help information, we should return directly, and we should not
execute test cases. Move the exit() directly into the help() function and
remove it from case '?'.
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_822CEBEB925205EA1573541CD1C2604F4805@qq.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Replace eight spaces with Tab.
Signed-off-by: Rong Tao <rtoax@foxmail.com>
Message-Id: <tencent_89579C514BC4020324A1A4ACA44B5B95BB07@qq.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Group some variables based on their sizes to reduce holes and avoid padding.
On x86_64, this shrinks the size of 'struct virtqueue'
from 72 to 68 bytes.
It saves a few bytes of memory.
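The effect of grouping members by size can be seen with two hypothetical layouts. These are not the fields of 'struct virtqueue'; they only illustrate how interleaving small and large members creates padding holes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout with small members interleaved between larger ones. */
struct scattered {
	bool flag_a;      /* 1 byte, then padding before the 4-byte field */
	uint32_t index;
	bool flag_b;      /* 1 byte, then padding before the 4-byte field */
	uint32_t count;
};

/* Same members, grouped by size. */
struct grouped {
	uint32_t index;
	uint32_t count;
	bool flag_a;
	bool flag_b;      /* both flags share one padded slot at the end */
};
```

On common ABIs where `uint32_t` is 4-byte aligned, `sizeof(struct scattered)` is 16 while `sizeof(struct grouped)` is 12. A tool such as pahole can report the holes in a compiled structure.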
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Message-Id: <8f3d2e49270a2158717e15008e7ed7228196ba02.1676707807.git.christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Lafreniere <peter@n8pjl.ca>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
|
|
The vhost_get_avail_size and vhost_get_used_size functions compute the size
of structures with flexible array members with an additional 2 bytes if the
VIRTIO_RING_F_EVENT_IDX feature flag is set. Convert these functions to use
struct_size() and size_add() instead of coding the calculation by hand.
This ensures that the calculations will saturate at SIZE_MAX rather than
overflowing.
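The saturating behaviour can be sketched in userspace as below. `sat_add()` and `sat_mul()` are hypothetical stand-ins for the kernel's overflow helpers, and `struct avail_ring` is a simplified layout assumed for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two sizes, saturating at SIZE_MAX instead of wrapping. */
static size_t sat_add(size_t a, size_t b)
{
	return (a > SIZE_MAX - b) ? SIZE_MAX : a + b;
}

/* Multiply two sizes, saturating at SIZE_MAX instead of wrapping. */
static size_t sat_mul(size_t a, size_t b)
{
	return (b != 0 && a > SIZE_MAX / b) ? SIZE_MAX : a * b;
}

/* Simplified stand-in: fixed header followed by a flexible array. */
struct avail_ring {
	uint16_t flags;
	uint16_t idx;
	uint16_t ring[];
};

/* Size of a ring with num entries, plus 2 bytes for the event
 * field when event_idx is non-zero. */
static size_t avail_size(size_t num, int event_idx)
{
	size_t size = sat_add(sizeof(struct avail_ring),
			      sat_mul(num, sizeof(uint16_t)));

	return sat_add(size, event_idx ? 2 : 0);
}
```

A saturated result of SIZE_MAX cannot be satisfied by any allocation or access check, so an absurd element count fails cleanly instead of wrapping around to a small size.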
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Message-Id: <20230227214127.3678392-1-jacob.e.keller@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Current code ignores link state updates if VIRTIO_NET_F_STATUS was not
negotiated. However, link state updates could be received before feature
negotiation was completed, therefore causing link state events to be
lost, possibly leaving the link state down.
Modify the code so the link state notifier is registered after DRIVER_OK was
negotiated and carry out the registration only if
VIRTIO_NET_F_STATUS was negotiated. Unregister the notifier when the
device is reset.
Fixes: 033779a708f0 ("vdpa/mlx5: make MTU/STATUS presence conditional on feature bits")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230417110343.138319-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Some powernv machines use IGB for networking, so build the driver in to
enable net booting such machines.
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230420052149.1328094-1-mpe@ellerman.id.au
|
|
Unlikely that anyone is still regularly using JFS, drop it from the
defconfig.
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230420051609.1324201-2-mpe@ellerman.id.au
|
|
The ext4 code will mount ext2 filesystems, no need to build in both.
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230420051609.1324201-1-mpe@ellerman.id.au
|
|
Rather than trying to keep multiple configs up to date, make
pseries_defconfig an alias for ppc64le_guest_defconfig.
NOTE, pseries_defconfig was a big endian config, but this commit
switches it to little endian.
Almost all distros are ppc64le these days, so little endian is much more
likely to be what a user wants when they build for "pseries".
For an actual big endian guest, use ppc64_guest_defconfig.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-32-mpe@ellerman.id.au
|
|
Rather than trying to keep multiple configs up to date, make
pseries_le_defconfig an alias for ppc64le_guest_defconfig.
ppc64le_guest_defconfig should work in all cases that
pseries_le_defconfig currently does, but if not we can update it.
Move pseries_le_defconfig down in the Makefile, so it appears after
ppc64le_guest_defconfig in the help output.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-31-mpe@ellerman.id.au
|
|
Incorporate the generic kvm_guest.config into the powerpc guest configs,
ppc64[le]_guest_defconfig.
This brings in some useful options, in particular 9P support, and also
means future additions to the generic file will be automatically picked
up by the powerpc configs.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-30-mpe@ellerman.id.au
|
|
These drivers are sometimes required to have functional networking in a
guest, so build them in when building ppc64[le]_guest_defconfig.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-29-mpe@ellerman.id.au
|
|
Add device mapper options for test coverage and in case folks are
booting systems that require them.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-28-mpe@ellerman.id.au
|
|
Like pseries & powernv_defconfig, enable PSTORE.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-27-mpe@ellerman.id.au
|
|
Most other configs, and distros enable it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-26-mpe@ellerman.id.au
|
|
Copy powernv_defconfig and enable BLK_DEV_NVME.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-25-mpe@ellerman.id.au
|
|
No reason to use this anymore.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-24-mpe@ellerman.id.au
|
|
Modern distros use SHA512 for module signing.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-23-mpe@ellerman.id.au
|
|
Distros enable it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-22-mpe@ellerman.id.au
|
|
Distros enable it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-21-mpe@ellerman.id.au
|
|
Fedora enables DEBUG_VM, which has led to occasions where a VM_BUG_ON()
is not caught by upstream testing, but rather is first found in Fedora,
which is not how it's meant to be.
PAGE_OWNER & PAGE_POISONING both need to be enabled on the kernel
command line, so should not add much overhead in normal operation.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-20-mpe@ellerman.id.au
|
|
This is enabled in some of the other powerpc configs, and can be useful
for debugging, so enable it in ppc64[le]_defconfig.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-19-mpe@ellerman.id.au
|
|
All built as modules, so the tests only happen when the modules are
loaded, not affecting normal boot time.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-18-mpe@ellerman.id.au
|
|
Fedora, CentOS, RHEL & SUSE all enable it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-17-mpe@ellerman.id.au
|
|
Multiple distros enable these.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-16-mpe@ellerman.id.au
|
|
Fedora & CentOS enable these.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-15-mpe@ellerman.id.au
|
|
Most distros enable these. In particular Fedora uses zram in the default
install.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-14-mpe@ellerman.id.au
|
|
Most distros enable this.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-13-mpe@ellerman.id.au
|
|
Distros enable these options.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230414132415.821564-12-mpe@ellerman.id.au
|