summaryrefslogtreecommitdiff
path: root/include/uapi/linux/vfio.h
AgeCommit message (Collapse)Author
2025-04-01Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Two significant new items: - Allow reporting IOMMU HW events to userspace when the events are clearly linked to a device. This is linked to the VIOMMU object and is intended to be used by a VMM to forward HW events to the virtual machine as part of emulating a vIOMMU. ARM SMMUv3 is the first driver to use this mechanism. Like the existing fault events the data is delivered through a simple FD returning event records on read(). - PASID support in VFIO. The "Process Address Space ID" is a PCI feature that allows the device to tag all PCI DMA operations with an ID. The IOMMU will then use the ID to select a unique translation for those DMAs. This is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to manage each PASID entry. The support is generic so any VFIO user could attach any translation to a PASID, and the support should work on ARM SMMUv3 as well. AMD requires additional driver work. Some minor updates, along with fixes: - Prevent using nested parents with fault's, no driver support today - Put a single "cookie_type" value in the iommu_domain to indicate what owns the various opaque owner fields" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits) iommufd: Test attach before detaching pasid iommufd: Fix iommu_vevent_header tables markup iommu: Convert unreachable() to BUG() iommufd: Balance veventq->num_events inc/dec iommufd: Initialize the flags of vevent in iommufd_viommu_report_event() iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices ida: Add ida_find_first_range() iommufd/selftest: Add coverage for iommufd pasid attach/detach iommufd/selftest: Add test ops to test pasid attach/detach iommufd/selftest: Add a helper to get test device iommufd/selftest: Add set_dev_pasid in mock iommu iommufd: Allow allocating PASID-compatible domain iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support iommufd: Enforce PASID-compatible domain for RID iommufd: Support pasid attach/replace iommufd: Enforce PASID-compatible domain in PASID path iommufd/device: Add pasid_attach array to track per-PASID attach ...
2025-03-25vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasidYi Liu
This extends the VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT ioctls to attach/detach a given pasid of a vfio device to/from an IOAS/HWPT. Link: https://patch.msgid.link/r/20250321180143.8468-4-yi.l.liu@intel.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-18s390/vfio-ap: Signal eventfd when guest AP configuration is changedRorie Reyes
In this patch, an eventfd object is created by the vfio_ap device driver and used to notify userspace when a guests's AP configuration is dynamically changed. Such changes may occur whenever: * An adapter, domain or control domain is assigned to or unassigned from a mediated device that is attached to the guest. * A queue assigned to the mediated device that is attached to a guest is bound to or unbound from the vfio_ap device driver. This can occur either by manually binding/unbinding the queue via the vfio_ap driver's sysfs bind/unbind attribute interfaces, or because an adapter, domain or control domain assigned to the mediated device is added to or removed from the host's AP configuration via an SE/HMC The purpose of this patch is to provide immediate notification of changes made to a guest's AP configuration by the vfio_ap driver. This will enable the guest to take immediate action rather than relying on polling or some other inefficient mechanism to detect changes to its AP configuration. Note that there are corresponding QEMU patches that will be shipped along with this patch (see vfio-ap: Report vfio-ap configuration changes) that will pick up the eventfd signal. Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Tested-by: Anthony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/r/20250107183645.90082-1-rreyes@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-11-05vfio: Remove VFIO_TYPE1_NESTING_IOMMUJason Gunthorpe
This control causes the ARM SMMU drivers to choose a stage 2 implementation for the IO pagetable (vs the stage 1 usual default), however this choice has no significant visible impact to the VFIO user. Further qemu never implemented this and no other userspace user is known. The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide SMMU translation services to the guest operating system" however the rest of the API to set the guest table pointer for the stage 1 and manage invalidation was never completed, or at least never upstreamed, rendering this part useless dead code. Upstream has now settled on iommufd as the uAPI for controlling nested translation. Choosing the stage 2 implementation should be done by through the IOMMU_HWPT_ALLOC_NEST_PARENT flag during domain allocation. Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the enable_nesting iommu_domain_op. Just in-case there is some userspace using this continue to treat requesting it as a NOP, but do not advertise support any more. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-04vfio/migration: Add debugfs to live migration driverLongfang Liu
There are multiple devices, software and operational steps involved in the process of live migration. An error occurred on any node may cause the live migration operation to fail. This complex process makes it very difficult to locate and analyze the cause when the function fails. In order to quickly locate the cause of the problem when the live migration fails, I added a set of debugfs to the vfio live migration driver. +-------------------------------------------+ | | | | | QEMU | | | | | +---+----------------------------+----------+ | ^ | ^ | | | | | | | | v | v | +---------+--+ +---------+--+ |src vfio_dev| |dst vfio_dev| +--+---------+ +--+---------+ | ^ | ^ | | | | v | | | +-----------+----+ +-----------+----+ |src dev debugfs | |dst dev debugfs | +----------------+ +----------------+ The entire debugfs directory will be based on the definition of the CONFIG_DEBUG_FS macro. If this macro is not enabled, the interfaces in vfio.h will be empty definitions, and the creation and initialization of the debugfs directory will not be executed. vfio | +---<dev_name1> | +---migration | +--state | +---<dev_name2> +---migration +--state debugfs will create a public root directory "vfio" file. then create a dev_name() file for each live migration device. First, create a unified state acquisition file of "migration" in this device directory. Then, create a public live migration state lookup file "state". Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20231106072225.28577-2-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-09-28vfio: use __aligned_u64 in struct vfio_device_ioeventfdStefan Hajnoczi
The memory layout of struct vfio_device_ioeventfd is architecture-dependent due to a u64 field and a struct size that is not a multiple of 8 bytes: - On x86_64 the struct size is padded to a multiple of 8 bytes. - On x32 the struct size is only a multiple of 4 bytes, not 8. - Other architectures may vary. Use __aligned_u64 to make memory layout consistent. This reduces the chance that 32-bit userspace on a 64-bit kernel breakage. This patch increases the struct size on x32 but this is safe because of the struct's argsz field. The kernel may grow the struct as long as it still supports smaller argsz values from userspace (e.g. applications compiled against older kernel headers). The code that uses struct vfio_device_ioeventfd already works correctly when the struct size grows, so only the struct definition needs to be changed. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20230918205617.1478722-4-stefanha@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-09-28vfio: use __aligned_u64 in struct vfio_device_gfx_plane_infoStefan Hajnoczi
The memory layout of struct vfio_device_gfx_plane_info is architecture-dependent due to a u64 field and a struct size that is not a multiple of 8 bytes: - On x86_64 the struct size is padded to a multiple of 8 bytes. - On x32 the struct size is only a multiple of 4 bytes, not 8. - Other architectures may vary. Use __aligned_u64 to make memory layout consistent. This reduces the chance of 32-bit userspace on a 64-bit kernel breakage. This patch increases the struct size on x32 but this is safe because of the struct's argsz field. The kernel may grow the struct as long as it still supports smaller argsz values from userspace (e.g. applications compiled against older kernel headers). Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20230918205617.1478722-3-stefanha@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-09-28vfio: trivially use __aligned_u64 for ioctl structsStefan Hajnoczi
u64 alignment behaves differently depending on the architecture and so <uapi/linux/types.h> offers __aligned_u64 to achieve consistent behavior in kernel<->userspace ABIs. There are structs in <uapi/linux/vfio.h> that can trivially be updated to __aligned_u64 because the struct sizes are multiples of 8 bytes. There is no change in memory layout on any CPU architecture and therefore this change is safe. The commits that follow this one handle the trickier cases where explanation about ABI breakage is necessary. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20230918205617.1478722-2-stefanha@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-09-28vfio: add bus master feature to device feature ioctlNipun Gupta
add bus mastering control to VFIO_DEVICE_FEATURE IOCTL. The VFIO user can use this feature to enable or disable the Bus Mastering of a device bound to VFIO. Co-developed-by: Shubham Rohila <shubham.rohila@amd.com> Signed-off-by: Shubham Rohila <shubham.rohila@amd.com> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com> Link: https://lore.kernel.org/r/20230915045423.31630-2-nipun.gupta@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-30Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "On top of the vfio updates is built some new iommufd functionality: - IOMMU_HWPT_ALLOC allows userspace to directly create the low level IO Page table objects and affiliate them with IOAS objects that hold the translation mapping. This is the basic functionality for the normal IOMMU_DOMAIN_PAGING domains. - VFIO_DEVICE_ATTACH_IOMMUFD_PT can be used to replace the current translation. This is wired up to through all the layers down to the driver so the driver has the ability to implement a hitless replacement. This is necessary to fully support guest behaviors when emulating HW (eg guest atomic change of translation) - IOMMU_GET_HW_INFO returns information about the IOMMU driver HW that owns a VFIO device. This includes support for the Intel iommu, and patches have been posted for all the other server IOMMU. Along the way are a number of internal items: - New iommufd kernel APIs: iommufd_ctx_has_group(), iommufd_device_to_ictx(), iommufd_device_to_id(), iommufd_access_detach(), iommufd_ctx_from_fd(), iommufd_device_replace() - iommufd now internally tracks iommu_groups as it needs some per-group data - Reorganize how the internal hwpt allocation flows to have more robust locking - Improve the access interfaces to support detach and replace of an IOAS from an access - New selftests and a rework of how the selftests creates a mock iommu driver to be more like a real iommu driver" Link: https://lore.kernel.org/lkml/ZO%2FTe6LU1ENf58ZW@nvidia.com/ * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (34 commits) iommufd/selftest: Don't leak the platform device memory when unloading the module iommu/vt-d: Implement hw_info for iommu capability query iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl iommufd: Add IOMMU_GET_HW_INFO iommu: Add new iommu op to get iommu hardware information iommu: Move dev_iommu_ops() to private header iommufd: Remove iommufd_ref_to_users() iommufd/selftest: Make the mock iommu driver into a real driver vfio: Support IO page table replacement iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage iommufd: Add iommufd_access_replace() API iommufd: Use iommufd_access_change_ioas in iommufd_access_destroy_object iommufd: Add iommufd_access_change_ioas(_id) helpers iommufd: Allow passing in iopt_access_list_id to iopt_remove_access() vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages() iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC iommufd/selftest: Return the real idev id from selftest mock_domain iommufd: Add IOMMU_HWPT_ALLOC iommufd/selftest: Test iommufd_device_replace() iommufd: Make destroy_rwsem use a lock class per object type ...
2023-08-17vfio: align capability structuresStefan Hajnoczi
The VFIO_DEVICE_GET_INFO, VFIO_DEVICE_GET_REGION_INFO, and VFIO_IOMMU_GET_INFO ioctls fill in an info struct followed by capability structs: +------+---------+---------+-----+ | info | caps[0] | caps[1] | ... | +------+---------+---------+-----+ Both the info and capability struct sizes are not always multiples of sizeof(u64), leaving u64 fields in later capability structs misaligned. Userspace applications currently need to handle misalignment manually in order to support CPU architectures and programming languages with strict alignment requirements. Make life easier for userspace by ensuring alignment in the kernel. This is done by padding info struct definitions and by copying out zeroes after capability structs that are not aligned. The new layout is as follows: +------+---------+---+---------+-----+ | info | caps[0] | 0 | caps[1] | ... | +------+---------+---+---------+-----+ In this example caps[0] has a size that is not multiples of sizeof(u64), so zero padding is added to align the subsequent structure. Adding zero padding between structs does not break the uapi. The memory layout is specified by the info.cap_offset and caps[i].next fields filled in by the kernel. Applications use these field values to locate structs and are therefore unaffected by the addition of zero padding. Note that code that copies out info structs with padding is updated to always zero the struct and copy out as many bytes as userspace requested. This makes the code shorter and avoids potential information leaks by ensuring padding is initialized. Originally-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230809203144.2880050-1-stefanha@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-28vfio: Support IO page table replacementNicolin Chen
Now both the physical path and the emulated path can support an IO page table replacement. Call iommufd_device_replace/iommufd_access_replace(), when vdev->iommufd_attached is true. Also update the VFIO_DEVICE_ATTACH_IOMMUFD_PT kdoc in the uAPI header. Link: https://lore.kernel.org/r/b5f01956ff161f76aa52c95b0fa1ad6eaca95c4a.1690523699.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-25vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PTYi Liu
This adds ioctl for userspace to attach device cdev fd to and detach from IOAS/hw_pagetable managed by iommufd. VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS or hw_pagetable managed by iommufd. Attach can be undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close. VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached IOAS or hw_pagetable managed by iommufd. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-24-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25vfio: Add VFIO_DEVICE_BIND_IOMMUFDYi Liu
This adds ioctl for userspace to bind device cdev fd to iommufd. VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA control provided by the iommufd. open_device op is called after bind_iommufd op. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230718135551.6592-23-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESETYi Liu
This is the way user to invoke hot-reset for the devices opened by cdev interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing hot-reset for cdev devices. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718105542.4138-11-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdevYi Liu
This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx of the cdev device to check the ownership of the other affected devices. When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate the values returned are IOMMUFD devids rather than group IDs as used when accessing vfio devices through the conventional vfio group interface. Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported in this mode if all of the devices affected by the hot-reset are owned by either virtue of being directly bound to the same iommufd context as the calling device, or implicitly owned via a shared IOMMU group. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718105542.4138-9-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-06-30Merge tag 'vfio-v6.5-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Adjust log levels for common messages (Oleksandr Natalenko, Alex Williamson) - Support for dynamic MSI-X allocation (Reinette Chatre) - Enable and report PCIe AtomicOp Completer capabilities (Alex Williamson) - Cleanup Kconfigs for vfio bus drivers (Alex Williamson) - Add support for CDX bus based devices (Nipun Gupta) - Fix race with concurrent mdev initialization (Eric Farman) * tag 'vfio-v6.5-rc1' of https://github.com/awilliam/linux-vfio: vfio/mdev: Move the compat_class initialization to module init vfio/cdx: add support for CDX bus vfio/fsl: Create Kconfig sub-menu vfio/platform: Cleanup Kconfig vfio/pci: Cleanup Kconfig vfio/pci-core: Add capability for AtomicOp completer support vfio/pci: Also demote hiding standard cap messages vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X vfio/pci: Support dynamic MSI-X vfio/pci: Probe and store ability to support dynamic MSI-X vfio/pci: Use bitfield for struct vfio_pci_core_device flags vfio/pci: Update stale comment vfio/pci: Remove interrupt context counter vfio/pci: Use xarray for interrupt context storage vfio/pci: Move to single error path vfio/pci: Prepare for dynamic interrupt context storage vfio/pci: Remove negative check on unsigned vector vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable vfio/pci: demote hiding ecap messages to debug level
2023-06-16vfio/cdx: add support for CDX busNipun Gupta
vfio-cdx driver enables IOCTLs for user space to query MMIO regions for CDX devices and mmap them. This change also adds support for reset of CDX devices. With VFIO enabled on CDX devices, user-space applications can also exercise DMA securely via IOMMU on these devices. This change adds the VFIO CDX driver and enables the following ioctls for CDX devices: - VFIO_DEVICE_GET_INFO: - VFIO_DEVICE_GET_REGION_INFO - VFIO_DEVICE_RESET Signed-off-by: Nipun Gupta <nipun.gupta@amd.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Tested-by: Nikhil Agarwal <nikhil.agarwal@amd.com> Link: https://lore.kernel.org/r/20230531124557.11009-1-nipun.gupta@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-06-16vfio/pci-core: Add capability for AtomicOp completer supportAlex Williamson
Test and enable PCIe AtomicOp completer support of various widths and report via device-info capability to userspace. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Robin Voetter <robin@streamhpc.com> Tested-by: Robin Voetter <robin@streamhpc.com> Link: https://lore.kernel.org/r/20230519214748.402003-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-06-06s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctlTony Krowiak
Realize the VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve the information for the VFIO device request IRQ. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20230530223538.279198-2-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-23vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-XReinette Chatre
Dynamic MSI-X is supported. Clear VFIO_IRQ_INFO_NORESIZE to provide guidance to user space. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/fd1ef2bf6ae972da8e2805bc95d5155af5a8fb0a.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-02-09vfio/type1: exclude mdevs from VFIO_UPDATE_VADDRSteve Sistare
Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. Their kernel threads could be blocked indefinitely by a misbehaving userland while trying to pin/unpin pages while vaddrs are being updated. Do not allow groups to be added to the container while vaddr's are invalid, so we never need to block user threads from pinning, and can delete the vaddr-waiting code in a subsequent patch. Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") Cc: stable@vger.kernel.org Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1675184289-267876-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-12-06vfio: Extend the device migration protocol with PRE_COPYJason Gunthorpe
The optional PRE_COPY states open the saving data transfer FD before reaching STOP_COPY and allows the device to dirty track internal state changes with the general idea to reduce the volume of data transferred in the STOP_COPY stage. While in PRE_COPY the device remains RUNNING, but the saving FD is open. Only if the device also supports RUNNING_P2P can it support PRE_COPY_P2P, which halts P2P transfers while continuing the saving FD. PRE_COPY, with P2P support, requires the driver to implement 7 new arcs and exists as an optional FSM branch between RUNNING and STOP_COPY: RUNNING -> PRE_COPY -> PRE_COPY_P2P -> STOP_COPY A new ioctl VFIO_MIG_GET_PRECOPY_INFO is provided to allow userspace to query the progress of the precopy operation in the driver with the idea it will judge to move to STOP_COPY at least once the initial data set is transferred, and possibly after the dirty size has shrunk appropriately. This ioctl is valid only in PRE_COPY states and kernel driver should return -EINVAL from any other migration state. Compared to the v1 clarification, STOP_COPY -> PRE_COPY is blocked and to be defined in future. We also split the pending_bytes report into the initial and sustaining values, e.g.: initial_bytes and dirty_bytes. initial_bytes: Amount of initial precopy data. dirty_bytes: Device state changes relative to data previously retrieved. These fields are not required to have any bearing to STOP_COPY phase. It is recommended to leave PRE_COPY for STOP_COPY only after the initial_bytes field reaches zero. Leaving PRE_COPY earlier might make things slower. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20221206083438.37807-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-14vfio: Add an option to get migration data sizeYishai Hadas
Add an option to get migration data size by introducing a new migration feature named VFIO_DEVICE_FEATURE_MIG_DATA_SIZE. Upon VFIO_DEVICE_FEATURE_GET the estimated data length that will be required to complete STOP_COPY is returned. This option may better enable user space to consider before moving to STOP_COPY whether it can meet the downtime SLA based on the returned data. The patch also includes the implementation for mlx5 and hisi for this new option to make it feature complete for the existing drivers in this area. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20221106174630.25909-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio: Introduce DMA logging uAPIsYishai Hadas
DMA logging allows a device to internally record what DMAs the device is initiating and report them back to userspace. It is part of the VFIO migration infrastructure that allows implementing dirty page tracking during the pre copy phase of live migration. Only DMA WRITEs are logged, and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE. This patch introduces the DMA logging involved uAPIs. It uses the FEATURE ioctl with its GET/SET/PROBE options as of below. It exposes a PROBE option to detect if the device supports DMA logging. It exposes a SET option to start device DMA logging in given IOVAs ranges. It exposes a SET option to stop device DMA logging that was previously started. It exposes a GET option to read back and clear the device DMA log. Extra details exist as part of vfio.h per a specific option. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio: Add the device features for the low power entry and exitAbhishek Sahu
This patch adds the following new device features for the low power entry and exit in the header file. The implementation for the same will be added in the subsequent patches. - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT For vfio-pci based devices, with the standard PCI PM registers, all power states cannot be achieved. The platform-based power management needs to be involved to go into the lowest power state. For doing low power entry and exit with platform-based power management, these device features can be used. The entry device feature has two variants. These two variants are mainly to support the different behaviour for the low power entry. If there is any access for the VFIO device on the host side, then the device will be moved out of the low power state without the user's guest driver involvement. Some devices (for example NVIDIA VGA or 3D controller) require the user's guest driver involvement for each low-power entry. In the first variant, the host can return the device to low power automatically. The device will continue to attempt to reach low power until the low power exit feature is called. In the second variant, if the device exits low power due to an access, the host kernel will signal the user via the provided eventfd and will not return the device to low power without a subsequent call to one of the low power entry features. A call to the low power exit feature is optional if the user provided eventfd is signaled. These device features only support VFIO_DEVICE_FEATURE_SET and VFIO_DEVICE_FEATURE_PROBE operations. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-16include/uapi/linux/vfio.h: Fix trivial typo - _IORW should be _IOWR insteadThomas Huth
There is no macro called _IORW, so use _IOWR in the comment instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20220516101202.88373-1-thuth@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-03-03vfio: Remove migration protocol v1 documentationJason Gunthorpe
v1 was never implemented and is replaced by v2. The old uAPI documentation is removed from the header file. The old uAPI definitions are still kept in the header file to ease transition for userspace copying these headers. They will be fully removed down the road. Link: https://lore.kernel.org/all/20220224142024.147653-12-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-03vfio: Extend the device migration protocol with RUNNING_P2PJason Gunthorpe
The RUNNING_P2P state is designed to support multiple devices in the same VM that are doing P2P transactions between themselves. When in RUNNING_P2P the device must be able to accept incoming P2P transactions but should not generate outgoing P2P transactions. As an optional extension to the mandatory states it is defined as in between STOP and RUNNING: STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP For drivers that are unable to support RUNNING_P2P the core code silently merges RUNNING_P2P and RUNNING together. Unless driver support is present, the new state cannot be used in SET_STATE. Drivers that support this will be required to implement 4 FSM arcs beyond the basic FSM. 2 of the basic FSM arcs become combination transitions. Compared to the v1 clarification, NDMA is redefined into FSM states and is described in terms of the desired P2P quiescent behavior, noting that halting all DMA is an acceptable implementation. Link: https://lore.kernel.org/all/20220224142024.147653-11-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-03vfio: Define device migration protocol v2Jason Gunthorpe
Replace the existing region based migration protocol with an ioctl based protocol. The two protocols have the same general semantic behaviors, but the way the data is transported is changed. This is the STOP_COPY portion of the new protocol, it defines the 5 states for basic stop and copy migration and the protocol to move the migration data in/out of the kernel. Compared to the clarification of the v1 protocol Alex proposed: https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen This has a few deliberate functional differences: - ERROR arcs allow the device function to remain unchanged. - The protocol is not required to return to the original state on transition failure. Instead userspace can execute an unwind back to the original state, reset, or do something else without needing kernel support. This simplifies the kernel design and should userspace choose a policy like always reset, avoids doing useless work in the kernel on error handling paths. - PRE_COPY is made optional, userspace must discover it before using it. This reflects the fact that the majority of drivers we are aware of right now will not implement PRE_COPY. - segmentation is not part of the data stream protocol, the receiver does not have to reproduce the framing boundaries. The hybrid FSM for the device_state is described as a Mealy machine by documenting each of the arcs the driver is required to implement. Defining the remaining set of old/new device_state transitions as 'combination transitions' which are naturally defined as taking multiple FSM arcs along the shortest path within the FSM's digraph allows a complete matrix of transitions. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is defined to replace writing to the device_state field in the region. This allows returning a brand new FD whenever the requested transition opens a data transfer session. The VFIO core code implements the new feature and provides a helper function to the driver. Using the helper the driver only has to implement 6 of the FSM arcs and the other combination transitions are elaborated consistently from those arcs. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to report the capability for migration and indicate which set of states and arcs are supported by the device. The FSM provides a lot of flexibility to make backwards compatible extensions but the VFIO_DEVICE_FEATURE also allows for future breaking extensions for scenarios that cannot support even the basic STOP_COPY requirements. The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e. VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state of the VFIO device. Data transfer sessions are now carried over a file descriptor, instead of the region. The FD functions for the lifetime of the data transfer session. read() and write() transfer the data with normal Linux stream FD semantics. This design allows future expansion to support poll(), io_uring, and other performance optimizations. The complicated mmap mode for data transfer is discarded as current qemu doesn't take meaningful advantage of it, and the new qemu implementation avoids substantially all the performance penalty of using a read() on the region. Link: https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-05-05vfio/pci: Revert nvlink removal uAPI breakageAlex Williamson
Revert the uAPI changes from the below commit with notice that these regions and capabilities are no longer provided. Fixes: b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2") Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <162014341432.3807030.11054087109120670135.stgit@omen>
2021-04-06vfio/pci: remove vfio_pci_nvlink2Christoph Hellwig
This driver never had any open userspace (which for VFIO would include VM kernel drivers) that use it, and thus should never have been added by our normal userspace ABI rules. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Message-Id: <20210326061311.1497642-2-hch@lst.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01vfio: interfaces to update vaddrSteve Sistare
Define interfaces that allow the underlying memory object of an iova range to be mapped to a new host virtual address in the host process: - VFIO_DMA_UNMAP_FLAG_VADDR for VFIO_IOMMU_UNMAP_DMA - VFIO_DMA_MAP_FLAG_VADDR flag for VFIO_IOMMU_MAP_DMA - VFIO_UPDATE_VADDR extension for VFIO_CHECK_EXTENSION Unmap vaddr invalidates the host virtual address in an iova range, and blocks vfio translation of host virtual addresses. DMA to already-mapped pages continues. Map vaddr updates the base VA and resumes translation. See comments in uapi/linux/vfio.h for more details. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01vfio: option to unmap allSteve Sistare
For the UNMAP_DMA ioctl, delete all mappings if VFIO_DMA_UNMAP_FLAG_ALL is set. Define the corresponding VFIO_UNMAP_ALL extension. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-03vfio-ccw: Wire in the request callbackEric Farman
The device is being unplugged, so pass the request to userspace to ask for a graceful cleanup. This should free up the thread that would otherwise loop waiting for the device to be fully released. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12Merge branches 'v5.10/vfio/fsl-mc-v6' and 'v5.10/vfio/zpci-info-v3' into ↵Alex Williamson
v5.10/vfio/next
2020-10-07vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFOMatthew Rosato
Allow the VFIO_DEVICE_GET_INFO ioctl to include a capability chain. Add a flag indicating capability chain support, and introduce the definitions for the first set of capabilities which are specified to s390 zPCI devices. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-07vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devicesBharat Bhushan
DPAA2 (Data Path Acceleration Architecture) consists in mechanisms for processing Ethernet packets, queue management, accelerators, etc. The Management Complex (mc) is a hardware entity that manages the DPAA2 hardware resources. It provides an object-based abstraction for software drivers to use the DPAA2 hardware. The MC mediates operations such as create, discover, destroy of DPAA2 objects. The MC provides memory-mapped I/O command interfaces (MC portals) which DPAA2 software drivers use to operate on DPAA2 objects. A DPRC is a container object that holds other types of DPAA2 objects. Each object in the DPRC is a Linux device and bound to a driver. The MC-bus driver is a platform driver (different from PCI or platform bus). The DPRC driver does runtime management of a bus instance. It performs the initial scan of the DPRC and handles changes in the DPRC configuration (adding/removing objects). All objects inside a container share the same hardware isolation context, meaning that only an entire DPRC can be assigned to a virtual machine. When a container is assigned to a virtual machine, all the objects within that container are assigned to that virtual machine. The DPRC container assigned to the virtual machine is not allowed to change contents (add/remove objects) by the guest. The restriction is set by the host and enforced by the mc hardware. The DPAA2 objects can be directly assigned to the guest. However the MC portals (the memory mapped command interface to the MC) need to be emulated because there are commands that configure the interrupts and the isolation IDs which are virtual in the guest. Example: echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind The dprc.2 is bound to the VFIO driver and all the objects within dprc.2 are going to be bound to the VFIO driver. This patch adds the infrastructure for VFIO support for fsl-mc devices. Subsequent patches will add support for binding and secure assigning these devices using VFIO. More details about the DPAA2 objects can be found here: Documentation/networking/device_drivers/freescale/dpaa2/overview.rst Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22Merge branches 'v5.10/vfio/bardirty', 'v5.10/vfio/dma_avail', ↵Alex Williamson
'v5.10/vfio/misc', 'v5.10/vfio/no-cmd-mem' and 'v5.10/vfio/yan_zhao_fixes' into v5.10/vfio/next
2020-09-21vfio iommu: Add dma available capabilityMatthew Rosato
Commit 492855939bdb ("vfio/type1: Limit DMA mappings per container") added the ability to limit the number of memory backed DMA mappings. However on s390x, when lazy mapping is in use, we use a very large number of concurrent mappings. Let's provide the current allowable number of DMA mappings to userspace via the IOMMU info chain so that userspace can take appropriate mitigation. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21vfio: Fix typo of the device_stateZenghui Yu
A typo fix ("_RUNNNG" => "_RUNNING") in comment block of the uapi header. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-18vfio/type1: Fix migration info capability IDAlex Williamson
ID 1 is already used by the IOVA range capability, use ID 2. Reported-by: Liu Yi L <yi.l.liu@intel.com> Fixes: ad721705d09c ("vfio iommu: Add migration capability to report supported features") Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-08Merge tag 's390-5.8-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for multi-function devices in pci code. - Enable PF-VF linking for architectures using the pdev->no_vf_scan flag (currently just s390). - Add reipl from NVMe support. - Get rid of critical section cleanup in entry.S. - Refactor PNSO CHSC (perform network subchannel operation) in cio and qeth. - QDIO interrupts and error handling fixes and improvements, more refactoring changes. - Align ioremap() with generic code. - Accept requests without the prefetch bit set in vfio-ccw. - Enable path handling via two new regions in vfio-ccw. - Other small fixes and improvements all over the code. * tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits) vfio-ccw: make vfio_ccw_regops variables declarations static vfio-ccw: Add trace for CRW event vfio-ccw: Wire up the CRW irq and CRW region vfio-ccw: Introduce a new CRW region vfio-ccw: Refactor IRQ handlers vfio-ccw: Introduce a new schib region vfio-ccw: Refactor the unregister of the async regions vfio-ccw: Register a chp_event callback for vfio-ccw vfio-ccw: Introduce new helper functions to free/destroy regions vfio-ccw: document possible errors vfio-ccw: Enable transparent CCW IPL from DASD s390/pci: Log new handle in clp_disable_fh() s390/cio, s390/qeth: cleanup PNSO CHSC s390/qdio: remove q->first_to_kick s390/qdio: fix up qdio_start_irq() kerneldoc s390: remove critical section cleanup from entry.S s390: add machine check SIGP s390/pci: ioremap() align with generic code s390/ap: introduce new ap function ap_get_qdev() Documentation/s390: Update / remove developerWorks web links ...
2020-06-03vfio-ccw: Introduce a new CRW regionFarhan Ali
This region provides a mechanism to pass a Channel Report Word that affect vfio-ccw devices, and needs to be passed to the guest for its awareness and/or processing. The base driver (see crw_collect_info()) provides space for two CRWs, as a subchannel event may have two CRWs chained together (one for the ssid, one for the subchannel). As vfio-ccw will deal with everything at the subchannel level, provide space for a single CRW to be transferred in one shot. Signed-off-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20200505122745.53208-7-farman@linux.ibm.com> [CH: added padding to ccw_crw_region] Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-06-02vfio-ccw: Introduce a new schib regionFarhan Ali
The schib region can be used by userspace to get the subchannel- information block (SCHIB) for the passthrough subchannel. This can be useful to get information such as channel path information via the SCHIB.PMCW fields. Signed-off-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20200505122745.53208-5-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-05-28vfio iommu: Add migration capability to report supported featuresKirti Wankhede
Added migration capability in IOMMU info chain. User application should check IOMMU info chain for migration capability to use dirty page tracking feature provided by kernel module. User application must check page sizes supported and maximum dirty bitmap size returned by this capability structure for ioctls used to get dirty bitmap. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmapKirti Wankhede
DMA mapped pages, including those pinned by mdev vendor drivers, might get unpinned and unmapped while migration is active and device is still running. For example, in pre-copy phase while guest driver could access those pages, host device or vendor driver can dirty these mapped pages. Such pages should be marked dirty so as to maintain memory consistency for a user making use of dirty page tracking. To get bitmap during unmap, user should allocate memory for bitmap, set it all zeros, set size of allocated memory, set page size to be considered for bitmap and set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28vfio iommu: Add ioctl definition for dirty pages trackingKirti Wankhede
IOMMU container maintains a list of all pages pinned by vfio_pin_pages API. All pages pinned by vendor driver through this API should be considered as dirty during migration. When container consists of IOMMU capable device and all pages are pinned and mapped, then all pages are marked dirty. Added support to start/stop dirtied pages tracking and to get bitmap of all dirtied pages for requested IO virtual address range. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-28vfio: UAPI for migration interface for device stateKirti Wankhede
- Defined MIGRATION region type and sub-type. - Defined vfio_device_migration_info structure which will be placed at the 0th offset of migration region to get/set VFIO device related information. Defined members of structure and usage on read/write access. - Defined device states and state transition details. - Defined sequence to be followed while saving and resuming VFIO device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first userAlex Williamson
The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device agnostic ioctl for setting, retrieving, and probing device features. This implementation provides a 16-bit field for specifying a feature index, where the data porition of the ioctl is determined by the semantics for the given feature. Additional flag bits indicate the direction and nature of the operation; SET indicates user data is provided into the device feature, GET indicates the device feature is written out into user data. The PROBE flag augments determining whether the given feature is supported, and if provided, whether the given operation on the feature is supported. The first user of this ioctl is for setting the vfio-pci VF token, where the user provides a shared secret key (UUID) on a SR-IOV PF device, which users must provide when opening associated VF devices. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>