summaryrefslogtreecommitdiff
path: root/drivers/vhost
AgeCommit message (Collapse)Author
2021-04-07iommu: remove DOMAIN_ATTR_GEOMETRYChristoph Hellwig
The geometry information can be trivially queried from the iommu_domain struture. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Li Yang <leoyang.li@nxp.com> Link: https://lore.kernel.org/r/20210401155256.298656-16-hch@lst.de Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-03-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Some fixes and cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-vdpa: set v->config_ctx to NULL if eventfd_ctx_fdget() fails vhost-vdpa: fix use-after-free of v->config_ctx vhost: Fix vhost_vq_reset() vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation vdpa_sim: Skip typecasting from void* virtio: remove export for virtio_config_{enable, disable} virtio-mmio: Use to_virtio_mmio_device() to simply code vdpa: set the virtqueue num during register
2021-03-14vhost-vdpa: set v->config_ctx to NULL if eventfd_ctx_fdget() failsStefano Garzarella
In vhost_vdpa_set_config_call() if eventfd_ctx_fdget() fails the 'v->config_ctx' contains an error instead of a valid pointer. Since we consider 'v->config_ctx' valid if it is not NULL, we should set it to NULL in this case to avoid to use an invalid pointer in other functions such as vhost_vdpa_config_put(). Fixes: 776f395004d8 ("vhost_vdpa: Support config interrupt in vdpa") Cc: lingshan.zhu@intel.com Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210311135257.109460-3-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-03-14vhost-vdpa: fix use-after-free of v->config_ctxStefano Garzarella
When the 'v->config_ctx' eventfd_ctx reference is released we didn't set it to NULL. So if the same character device (e.g. /dev/vhost-vdpa-0) is re-opened, the 'v->config_ctx' is invalid and calling again vhost_vdpa_config_put() causes use-after-free issues like the following refcount_t underflow: refcount_t: underflow; use-after-free. WARNING: CPU: 2 PID: 872 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0 RIP: 0010:refcount_warn_saturate+0xae/0xf0 Call Trace: eventfd_ctx_put+0x5b/0x70 vhost_vdpa_release+0xcd/0x150 [vhost_vdpa] __fput+0x8e/0x240 ____fput+0xe/0x10 task_work_run+0x66/0xa0 exit_to_user_mode_prepare+0x118/0x120 syscall_exit_to_user_mode+0x21/0x50 ? __x64_sys_close+0x12/0x40 do_syscall_64+0x45/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 776f395004d8 ("vhost_vdpa: Support config interrupt in vdpa") Cc: lingshan.zhu@intel.com Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210311135257.109460-2-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-03-14vhost: Fix vhost_vq_reset()Laurent Vivier
vhost_reset_is_le() is vhost_init_is_le(), and in the case of cross-endian legacy, vhost_init_is_le() depends on vq->user_be. vq->user_be is set by vhost_disable_cross_endian(). But in vhost_vq_reset(), we have: vhost_reset_is_le(vq); vhost_disable_cross_endian(vq); And so user_be is used before being set. To fix that, reverse the lines order as there is no other dependency between them. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Link: https://lore.kernel.org/r/20210312140913.788592-1-lvivier@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-14vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocationGautam Dawar
When qemu with vhost-vdpa netdevice is run for the first time, it works well. But after the VM is powered off, the next qemu run causes kernel panic due to a NULL pointer dereference in irq_bypass_register_producer(). When the VM is powered off, vhost_vdpa_clean_irq() misses on calling irq_bypass_unregister_producer() for irq 0 because of the existing check. This leaves stale producer nodes, which are reset in vhost_vring_call_reset() when vhost_dev_init() is invoked during the second qemu run. As the node member of struct irq_bypass_producer is also initialized to zero, traversal on the producers list causes crash due to NULL pointer dereference. Fixes: 2cf1ba9a4d15c ("vhost_vdpa: implement IRQ offloading in vhost_vdpa") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211711 Signed-off-by: Gautam Dawar <gdawar.xilinx@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210224114845.104173-1-gdawar.xilinx@gmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-04scsi: target: vhost-scsi: Use LIO wq cmd submission helperMike Christie
Convert vhost-scsi to use the LIO wq cmd submission helper. Link: https://lore.kernel.org/r/20210227170006.5077-18-michael.christie@oracle.com Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: target: core: Add gfp_t arg to target_cmd_init_cdb()Mike Christie
tcm_loop could be used like a normal block device, so we can't use GFP_KERNEL and should use GFP_NOIO. This adds a gfp_t arg to target_cmd_init_cdb() and converts the users. For every driver but loop GFP_KERNEL is kept. This will also be useful in subsequent patches where loop needs to do target_submit_prep() from interrupt context to get a ref to the se_device, and so it will need to use GFP_ATOMIC. Link: https://lore.kernel.org/r/20210227170006.5077-16-michael.christie@oracle.com Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: target: vhost-scsi: Convert to new submission APIMike Christie
target_submit_cmd_map_sgls() is being removed, so convert vhost-scsi to the new submission API. This has it use target_init_cmd(), target_submit_prep(), target_submit() because we need to have LIO core map sgls which is now done in target_submit_prep(), and in the next patches we will do the target_submit step from the LIO workqueue. Note: vhost-scsi never calls target_stop_session() so target_submit_cmd_map_sgls() never failed (in the new API target_init_cmd() handles target_stop_session() being called when cmds are being submitted). If it were to have used target_stop_session() and got an error, we would have hit a refcount bug like xen and usb, because it does: if (rc < 0) { transport_send_check_condition_and_sense(se_cmd, TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE, 0); transport_generic_free_cmd(se_cmd, 0); } transport_send_check_condition_and_sense() calls queue_status which does transport_generic_free_cmd(), and then we do an extra transport_generic_free_cmd() call above which would have dropped the refcount to -1 and the refcount code would spit out errors. Link: https://lore.kernel.org/r/20210227170006.5077-12-michael.christie@oracle.com Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: sbitmap: Move allocation hint into sbitmapMing Lei
Allocation hint should have belonged to sbitmap. Also, when sbitmap's depth is high and there is no need to use mulitple wakeup queues, user can benefit from percpu allocation hint too. Move allocation hint into sbitmap, then SCSI device queue can benefit from allocation hint when converting to plain sbitmap. Convert vhost/scsi.c to use sbitmap allocation with percpu alloc hint. This is more efficient than the previous approach. Link: https://lore.kernel.org/r/20210122023317.687987-5-ming.lei@redhat.com Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: virtualization@lists.linux-foundation.org Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04scsi: sbitmap: Maintain allocation round_robin in sbitmapMing Lei
Currently the allocation round_robin info is maintained by sbitmap_queue. However, bit allocation really belongs to sbitmap. Move it there. Link: https://lore.kernel.org/r/20210122023317.687987-3-ming.lei@redhat.com Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: virtualization@lists.linux-foundation.org Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-02-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - new vdpa features to allow creation and deletion of new devices - virtio-blk support per-device queue depth - fixes, cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (31 commits) virtio-input: add multi-touch support virtio_mmio: fix one typo vdpa/mlx5: fix param validation in mlx5_vdpa_get_config() virtio_net: Fix fall-through warnings for Clang virtio_input: Prevent EV_MSC/MSC_TIMESTAMP loop storm for MT. virtio-blk: support per-device queue depth virtio_vdpa: don't warn when fail to disable vq virtio-pci: introduce modern device module virito-pci-modern: rename map_capability() to vp_modern_map_capability() virtio-pci-modern: introduce helper to get notification offset virtio-pci-modern: introduce helper for getting queue nums virtio-pci-modern: introduce helper for setting/geting queue size virtio-pci-modern: introduce helper to set/get queue_enable virtio-pci-modern: introduce vp_modern_queue_address() virtio-pci-modern: introduce vp_modern_set_queue_vector() virtio-pci-modern: introduce vp_modern_generation() virtio-pci-modern: introduce helpers for setting and getting features virtio-pci-modern: introduce helpers for setting and getting status virtio-pci-modern: introduce helper to set config vector virtio-pci-modern: introduce vp_modern_remove() ...
2021-02-23vhost scsi: alloc vhost_scsi with kvzalloc() to avoid delayDongli Zhang
The size of 'struct vhost_scsi' is order-10 (~2.3MB). It may take long time delay by kzalloc() to compact memory pages by retrying multiple times when there is a lack of high-order pages. As a result, there is latency to create a VM (with vhost-scsi) or to hotadd vhost-scsi-based storage. The prior commit 595cb754983d ("vhost/scsi: use vmalloc for order-10 allocation") prefers to fallback only when really needed, while this patch allocates with kvzalloc() with __GFP_NORETRY implicitly set to avoid retrying memory pages compact for multiple times. The __GFP_NORETRY is implicitly set if the size to allocate is more than PAGE_SZIE and when __GFP_RETRY_MAYFAIL is not explicitly set. Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Link: https://lore.kernel.org/r/20210123080853.4214-1-dongli.zhang@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-01-19vhost_net: avoid tx queue stuck when sendmsg failsYunjian Wang
Currently the driver doesn't drop a packet which can't be sent by tun (e.g bad packet). In this case, the driver will always process the same packet lead to the tx queue stuck. To fix this issue: 1. in the case of persistent failure (e.g bad packet), the driver can skip this descriptor by ignoring the error. 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM), the driver schedules the worker to try again. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/1610685980-38608-1-git-send-email-wangyunjian@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Trivial conflict in CAN on file rename. Conflicts: drivers/net/can/m_can/tcan4x5x-core.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07tap/tun: add skb_zcopy_init() helper for initialization.Jonathan Lemon
Replace direct assignments with skb_zcopy_init() for zerocopy cases where a new skb is initialized, without changing the reference counts. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07skbuff: Add skb parameter to the ubuf zerocopy callbackJonathan Lemon
Add an optional skb parameter to the zerocopy callback parameter, which is passed down from skb_zcopy_clear(). This gives access to the original skb, which is needed for upcoming RX zero-copy error handling. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-05Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull vhost bugfix from Michael Tsirkin: "This fixes configs with vhost vsock behind a viommu" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/vsock: add IOTLB API support
2021-01-05Merge tag 'net-5.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter, wireless and bpf trees. Current release - regressions: - mt76: fix NULL pointer dereference in mt76u_status_worker and mt76s_process_tx_queue - net: ipa: fix interconnect enable bug Current release - always broken: - netfilter: fixes possible oops in mtype_resize in ipset - ath11k: fix number of coding issues found by static analysis tools and spurious error messages Previous releases - regressions: - e1000e: re-enable s0ix power saving flows for systems with the Intel i219-LM Ethernet controllers to fix power use regression - virtio_net: fix recursive call to cpus_read_lock() to avoid a deadlock - ipv4: ignore ECN bits for fib lookups in fib_compute_spec_dst() - sysfs: take the rtnl lock around XPS configuration - xsk: fix memory leak for failed bind and rollback reservation at NETDEV_TX_BUSY - r8169: work around power-saving bug on some chip versions Previous releases - always broken: - dcb: validate netlink message in DCB handler - tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS to prevent unnecessary retries - vhost_net: fix ubuf refcount when sendmsg fails - bpf: save correct stopping point in file seq iteration - ncsi: use real net-device for response handler - neighbor: fix div by zero caused by a data race (TOCTOU) - bareudp: fix use of incorrect min_headroom size and a false positive lockdep splat from the TX lock - mvpp2: - clear force link UP during port init procedure in case bootloader had set it - add TCAM entry to drop flow control pause frames - fix PPPoE with ipv6 packet parsing - fix GoP Networking Complex Control config of port 3 - fix pkt coalescing IRQ-threshold configuration - xsk: fix race in SKB mode transmit with shared cq - ionic: account for vlan tag len in rx buffer len - stmmac: ignore the second clock input, current clock framework does not handle exclusive clock use well, other drivers may reconfigure the second clock Misc: - ppp: change PPPIOCUNBRIDGECHAN ioctl request number to follow existing scheme" * tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event r8169: work around power-saving bug on some chip versions net: usb: qmi_wwan: add Quectel EM160R-GL selftests: mlxsw: Set headroom size of correct port net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag ibmvnic: fix: NULL pointer dereference. docs: networking: packet_mmap: fix old config reference docs: networking: packet_mmap: fix formatting for C macros vhost_net: fix ubuf refcount incorrectly when sendmsg fails bareudp: Fix use of incorrect min_headroom size bareudp: set NETIF_F_LLTX flag net: hdlc_ppp: Fix issues when mod_timer is called while timer is running atlantic: remove architecture depends erspan: fix version 1 check in gre_parse_header() net: hns: fix return value check in __lb_other_process() net: sched: prevent invalid Scell_log shift count net: neighbor: fix a crash caused by mod zero ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst() ...
2021-01-04vhost_net: fix ubuf refcount incorrectly when sendmsg failsYunjian Wang
Currently the vhost_zerocopy_callback() maybe be called to decrease the refcount when sendmsg fails in tun. The error handling in vhost handle_tx_zerocopy() will try to decrease the same refcount again. This is wrong. To fix this issue, we only call vhost_net_ubuf_put() when vq->heads[nvq->desc].len == VHOST_DMA_IN_PROGRESS. Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/1609207308-20544-1-git-send-email-wangyunjian@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-27vhost/vsock: add IOTLB API supportStefano Garzarella
This patch enables the IOTLB API support for vhost-vsock devices, allowing the userspace to emulate an IOMMU for the guest. These changes were made following vhost-net, in details this patch: - exposes VIRTIO_F_ACCESS_PLATFORM feature and inits the iotlb device if the feature is acked - implements VHOST_GET_BACKEND_FEATURES and VHOST_SET_BACKEND_FEATURES ioctls - calls vq_meta_prefetch() before vq processing to prefetch vq metadata address in IOTLB - provides .read_iter, .write_iter, and .poll callbacks for the chardev; they are used by the userspace to exchange IOTLB messages This patch was tested specifying "intel_iommu=strict" in the guest kernel command line. I used QEMU with a patch applied [1] to fix a simple issue (that patch was merged in QEMU v5.2.0): $ qemu -M q35,accel=kvm,kernel-irqchip=split \ -drive file=fedora.qcow2,format=qcow2,if=virtio \ -device intel-iommu,intremap=on,device-iotlb=on \ -device vhost-vsock-pci,guest-cid=3,iommu_platform=on,ats=on [1] https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg09077.html Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201223143638.123417-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-12-24Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - vdpa sim refactoring - virtio mem: Big Block Mode support - misc cleanus, fixes * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (61 commits) vdpa: Use simpler version of ida allocation vdpa: Add missing comment for virtqueue count uapi: virtio_ids: add missing device type IDs from OASIS spec uapi: virtio_ids.h: consistent indentions vhost scsi: fix error return code in vhost_scsi_set_endpoint() virtio_ring: Fix two use after free bugs virtio_net: Fix error code in probe() virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed() tools/virtio: add barrier for aarch64 tools/virtio: add krealloc_array tools/virtio: include asm/bug.h vdpa/mlx5: Use write memory barrier after updating CQ index vdpa: split vdpasim to core and net modules vdpa_sim: split vdpasim_virtqueue's iov field in out_iov and in_iov vdpa_sim: make vdpasim->buffer size configurable vdpa_sim: use kvmalloc to allocate vdpasim->buffer vdpa_sim: set vringh notify callback vdpa_sim: add set_config callback in vdpasim_dev_attr vdpa_sim: add get_config callback in vdpasim_dev_attr vdpa_sim: make 'config' generic and usable for any device type ...
2020-12-18vhost scsi: fix error return code in vhost_scsi_set_endpoint()Zhang Changzhong
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 25b98b64e284 ("vhost scsi: alloc cmds per vq instead of session") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1607071411-33484-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vhost_vdpa: switch to vmemdup_user()Tian Tao
Replace opencoded alloc and copy with vmemdup_user() Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Link: https://lore.kernel.org/r/1605057288-60400-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-12-15vhost: vringh: use krealloc_array()Bartosz Golaszewski
Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/20201109110654.12547-5-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christian Knig <christian.koenig@amd.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: David Rientjes <rientjes@google.com> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: James Morse <james.morse@arm.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Maxime Ripard <mripard@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Takashi Iwai <tiwai@suse.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-02vhost_vdpa: return -EFAULT if copy_to_user() failsDan Carpenter
The copy_to_user() function returns the number of bytes remaining to be copied but this should return -EFAULT to the user. Fixes: 1b48dc03e575 ("vhost: vdpa: report iova range") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8c32z5EtDsMyyIL@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-11-25vhost-vdpa: fix page pinning leakage in error path (rework)Si-Wei Liu
Pinned pages are not properly accounted particularly when mapping error occurs on IOTLB update. Clean up dangling pinned pages for the error path. The memory usage for bookkeeping pinned pages is reverted to what it was before: only one single free page is needed. This helps reduce the host memory demand for VM with a large amount of memory, or in the situation where host is running short of free memory. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1604618793-4681-1-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-11-25vringh: fix vringh_iov_push_*() documentationStefano Garzarella
vringh_iov_push_*() functions don't have 'dst' parameter, but have the 'src' parameter. Replace 'dst' description with 'src' description. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201116161653.102904-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-25vhost scsi: fix lun reset completion handlingMike Christie
vhost scsi owns the scsi se_cmd but lio frees the se_cmd->se_tmr before calling release_cmd, so while with normal cmd completion we can access the se_cmd from the vhost work, we can't do the same with se_cmd->se_tmr. This has us copy the tmf response in vhost_scsi_queue_tm_rsp to our internal vhost-scsi tmf struct for when it gets sent to the guest from our worker thread. Fixes: efd838fec17b ("vhost scsi: Add support for LUN resets.") Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/1605887459-3864-1-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-15vhost scsi: Add support for LUN resets.Mike Christie
In newer versions of virtio-scsi we just reset the timer when an a command times out, so TMFs are never sent for the cmd time out case. However, in older kernels and for the TMF inject cases, we can still get resets and we end up just failing immediately so the guest might see the device get offlined and IO errors. For the older kernel cases, we want the same end result as the modern virtio-scsi driver where we let the lower levels fire their error handling and handle the problem. And at the upper levels we want to wait. This patch ties the LUN reset handling into the LIO TMF code which will just wait for outstanding commands to complete like we are doing in the modern virtio-scsi case. Note: I did not handle the ABORT case to keep this simple. For ABORTs LIO just waits on the cmd like how it does for the RESET case. If an ABORT fails, the guest OS ends up escalating to LUN RESET, so in the end we get the same behavior where we wait on the outstanding cmds. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1604986403-4931-6-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-11-15vhost scsi: add lun parser helperMike Christie
Move code to parse lun from req's lun_buf to helper, so tmf code can use it in the next patch. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-11-15vhost scsi: fix cmd completion raceMike Christie
We might not do the final se_cmd put from vhost_scsi_complete_cmd_work. When the last put happens a little later then we could race where vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends more IO, and vhost_scsi_handle_vq runs but does not find any free cmds. This patch has us delay completing the cmd until the last lio core ref is dropped. We then know that once we signal to the guest that the cmd is completed that if it queues a new command it will find a free cmd. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Maurizio Lombardi <mlombard@redhat.com> Link: https://lore.kernel.org/r/1604986403-4931-4-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-11-15vhost scsi: alloc cmds per vq instead of sessionMike Christie
We currently are limited to 256 cmds per session. This leads to problems where if the user has increased virtqueue_size to more than 2 or cmd_per_lun to more than 256 vhost_scsi_get_tag can fail and the guest will get IO errors. This patch moves the cmd allocation to per vq so we can easily match whatever the user has specified for num_queues and virtqueue_size/cmd_per_lun. It also makes it easier to control how much memory we preallocate. For cases, where perf is not as important and we can use the current defaults (1 vq and 128 cmds per vq) memory use from preallocate cmds is cut in half. For cases, where we are willing to use more memory for higher perf, cmd mem use will now increase as the num queues and queue depth increases. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1604986403-4931-3-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Maurizio Lombardi <mlombard@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-11-15vhost: add helper to check if a vq has been setupMike Christie
This adds a helper check if a vq has been setup. The next patches will use this when we move the vhost scsi cmd preallocation from per session to per vq. In the per vq case, we only want to allocate cmds for vqs that have actually been setup and not for all the possible vqs. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1604986403-4931-2-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-30vdpa: handle irq bypass register failure caseZhu Lingshan
LKP considered variable 'ret' in vhost_vdpa_setup_vq_irq() as a unused variable, so suggest we remove it. Actually it stores return value of irq_bypass_register_producer(), but we did not check it, we should handle the failure case. This commit will print a message if irq bypass register producer fail, in this case, vqs still remain functional. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20201023104046.404794-1-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-10-30Revert "vhost-vdpa: fix page pinning leakage in error path"Michael S. Tsirkin
This reverts commit 7ed9e3d97c32d969caded2dfb6e67c1a2cc5a0b1. The patch creates a DoS risk since it can result in a high order memory allocation. Fixes: 7ed9e3d97c32d ("vhost-vdpa: fix page pinning leakage in error path") Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-10-30vhost_vdpa: Return -EFAULT if copy_from_user() failsDan Carpenter
The copy_to/from_user() functions return the number of bytes which we weren't able to copy but the ioctl should return -EFAULT if they fail. Fixes: a127c5bbb6a8 ("vhost-vdpa: fix backend feature ioctls") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20201023120853.GI282278@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org Acked-by: Jason Wang <jasowang@redhat.com>
2020-10-23vhost: vdpa: report iova rangeJason Wang
This patch introduces a new ioctl for vhost-vdpa device that can report the iova range by the device. For device that implements get_iova_range() method, we fetch it from the vDPA device. If device doesn't implement get_iova_range() but depends on platform IOMMU, we will query via DOMAIN_ATTR_GEOMETRY, otherwise [0, ULLONG_MAX] is assumed. For safety, this patch also rules out the map request which is not in the valid range. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20201023090043.14430-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-10-21vhost_vdpa: remove unnecessary spin_lock in vhost_vring_callZhu Lingshan
This commit removed unnecessary spin_locks in vhost_vring_call and related operations. Because we manipulate irq offloading contents in vhost_vdpa ioctl code path which is already protected by dev mutex and vq mutex. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Link: https://lore.kernel.org/r/20200909065234.3313-1-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-10-21vringh: fix __vringh_iov() when riov and wiov are differentStefano Garzarella
If riov and wiov are both defined and they point to different objects, only riov is initialized. If the wiov is not initialized by the caller, the function fails returning -EINVAL and printing "Readable desc 0x... after writable" error message. This issue happens when descriptors have both readable and writable buffers (eg. virtio-blk devices has virtio_blk_outhdr in the readable buffer and status as last byte of writable buffer) and we call __vringh_iov() to get both type of buffers in two different iovecs. Let's replace the 'else if' clause with 'if' to initialize both riov and wiov if they are not NULL. As checkpatch pointed out, we also avoid crashing the kernel when riov and wiov are both NULL, replacing BUG() with WARN_ON() and returning -EINVAL. Fixes: f87d0fbb5798 ("vringh: host-side implementation of virtio rings.") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201008204256.162292-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-10-21vhost_vdpa: Fix duplicate included kernel.hTian Tao
linux/kernel.h is included more than once, Remove the one that isn't necessary. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Link: https://lore.kernel.org/r/1600131102-24672-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-10-21vhost: reduce stack usage in log_usedLi Wang
Fix the warning: [-Werror=-Wframe-larger-than=] drivers/vhost/vhost.c: In function log_used: drivers/vhost/vhost.c:1906:1: warning: the frame size of 1040 bytes is larger than 1024 bytes Signed-off-by: Li Wang <li.wang@windriver.com> Link: https://lore.kernel.org/r/1600106889-25013-1-git-send-email-li.wang@windriver.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-10-04vhost-vdpa: fix page pinning leakage in error pathSi-Wei Liu
Pinned pages are not properly accounted particularly when mapping error occurs on IOTLB update. Clean up dangling pinned pages for the error path. As the inflight pinned pages, specifically for memory region that strides across multiple chunks, would need more than one free page for book keeping and accounting. For simplicity, pin pages for all memory in the IOVA range in one go rather than have multiple pin_user_pages calls to make up the entire region. This way it's easier to track and account the pages already mapped, particularly for clean-up in the error path. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1601701330-16837-3-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-10-04vhost-vdpa: fix vhost_vdpa_map() on error conditionSi-Wei Liu
vhost_vdpa_map() should remove the iotlb entry just added if the corresponding mapping fails to set up properly. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1601701330-16837-2-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-10-04vhost: Don't call log_access_ok() when using IOTLBGreg Kurz
When the IOTLB device is enabled, the log_guest_addr that is passed by userspace to the VHOST_SET_VRING_ADDR ioctl, and which is then written to vq->log_addr, is a GIOVA. All writes to this address are translated by log_user() to writes to an HVA, and then ultimately logged through the corresponding GPAs in log_write_hva(). No logging will ever occur with vq->log_addr in this case. It is thus wrong to pass vq->log_addr and log_guest_addr to log_access_vq() which assumes they are actual GPAs. Introduce a new vq_log_used_access_ok() helper that only checks accesses to the log for the used structure when there isn't an IOTLB device around. Signed-off-by: Greg Kurz <groug@kaod.org> Link: https://lore.kernel.org/r/160171933385.284610.10189082586063280867.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-10-04vhost: Use vhost_get_used_size() in vhost_vring_set_addr()Greg Kurz
The open-coded computation of the used size doesn't take the event into account when the VIRTIO_RING_F_EVENT_IDX feature is present. Fix that by using vhost_get_used_size(). Fixes: 8ea8cf89e19a ("vhost: support event index") Cc: stable@vger.kernel.org Signed-off-by: Greg Kurz <groug@kaod.org> Link: https://lore.kernel.org/r/160171932300.284610.11846106312938909461.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-10-04vhost: Don't call access_ok() when using IOTLBGreg Kurz
When the IOTLB device is enabled, the vring addresses we get from userspace are GIOVAs. It is thus wrong to pass them down to access_ok() which only takes HVAs. Access validation is done at prefetch time with IOTLB. Teach vq_access_ok() about that by moving the (vq->iotlb) check from vhost_vq_access_ok() to vq_access_ok(). This prevents vhost_vring_set_addr() to fail when verifying the accesses. No behavior change for vhost_vq_access_ok(). BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1883084 Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Cc: jasowang@redhat.com CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/160171931213.284610.2052489816407219136.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-09-30vhost vdpa: fix vhost_vdpa_open error handlingMike Christie
We must free the vqs array in the open failure path, because vhost_vdpa_release will not be called. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1600712588-9514-2-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-09-24vhost-vdpa: fix backend feature ioctlsJason Wang
Commit 653055b9acd4 ("vhost-vdpa: support get/set backend features") introduces two malfunction backend features ioctls: 1) the ioctls was blindly added to vring ioctl instead of vdpa device ioctl 2) vhost_set_backend_features() was called when dev mutex has already been held which will lead a deadlock This patch fixes the above issues. Cc: Eli Cohen <elic@nvidia.com> Reported-by: Zhu Lingshan <lingshan.zhu@intel.com> Fixes: 653055b9acd4 ("vhost-vdpa: support get/set backend features") Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200907104343.31141-1-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-09-24vhost: Fix documentationEli Cohen
Fix documentation to match actual function prototypes "end" used instead of "last". Fix that. Signed-off-by: Eli Cohen <eli@mellanox.com> Link: https://lore.kernel.org/r/20200630052925.GA157062@mtl-vdi-166.wap.labs.mlnx Signed-off-by: Michael S. Tsirkin <mst@redhat.com>