diff options
Diffstat (limited to 'Documentation/driver-api/vfio.rst')
-rw-r--r-- | Documentation/driver-api/vfio.rst | 245 |
1 files changed, 216 insertions, 29 deletions
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst index f1a4d3c3ba0b..633d11c7fa71 100644 --- a/Documentation/driver-api/vfio.rst +++ b/Documentation/driver-api/vfio.rst @@ -2,7 +2,7 @@ VFIO - "Virtual Function I/O" [1]_ ================================== -Many modern system now provide DMA and interrupt remapping facilities +Many modern systems now provide DMA and interrupt remapping facilities to help ensure I/O devices behave within the boundaries they've been allotted. This includes x86 hardware with AMD-Vi and Intel VT-d, POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC @@ -239,51 +239,230 @@ group and can access them as follows:: /* Gratuitous device reset and go... */ ioctl(device, VFIO_DEVICE_RESET); +IOMMUFD and vfio_iommu_type1 +---------------------------- + +IOMMUFD is the new user API to manage I/O page tables from userspace. +It intends to be the portal of delivering advanced userspace DMA +features (nested translation [5]_, PASID [6]_, etc.) while also providing +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use +cases. Eventually the vfio_iommu_type1 driver, as well as the legacy +vfio container and group model is intended to be deprecated. + +The IOMMUFD backwards compatibility interface can be enabled two ways. +In the first method, the kernel can be configured with +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem +transparently provides the entire infrastructure for the VFIO +container and IOMMU backend interfaces. The compatibility mode can +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is +simply symlink'd to /dev/iommu. Note that at the time of writing, the +compatibility mode is not entirely feature complete relative to +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface. Therefore +it is not generally advisable at this time to switch from native VFIO +implementations to the IOMMUFD compatibility interfaces. + +Long term, VFIO users should migrate to device access through the cdev +interface described below, and native access through the IOMMUFD +provided interfaces. + +VFIO Device cdev +---------------- + +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD +in a VFIO group. + +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd +by directly opening a character device /dev/vfio/devices/vfioX where +"X" is the number allocated uniquely by VFIO for registered devices. +cdev interface does not support noiommu devices, so user should use +the legacy group interface if noiommu is wanted. + +The cdev only works with IOMMUFD. Both VFIO drivers and applications +must adapt to the new cdev security model which requires using +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to +actually use the device. Once BIND succeeds then a VFIO device can +be fully accessed by the user. + +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers. +Hence those modules can be fully compiled out in an environment +where no legacy VFIO application exists. + +So far SPAPR does not support IOMMUFD yet. So it cannot support device +cdev either. + +vfio device cdev access is still bound by IOMMU group semantics, ie. there +can be only one DMA owner for the group. Devices belonging to the same +group can not be bound to multiple iommufd_ctx or shared between native +kernel and vfio bus driver or other driver supporting the driver_managed_dma +flag. A violation of this ownership requirement will fail at the +VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access. + +Device cdev Example +------------------- + +Assume user wants to access PCI device 0000:6a:01.0:: + + $ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/ + vfio0 + +This device is therefore represented as vfio0. The user can verify +its existence:: + + $ ls -l /dev/vfio/devices/vfio0 + crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0 + $ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev + 511:0 + $ ls -l /dev/char/511\:0 + lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0 + +Then provide the user with access to the device if unprivileged +operation is desired:: + + $ chown user:user /dev/vfio/devices/vfio0 + +Finally the user could get cdev fd by:: + + cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR); + +An opened cdev_fd doesn't give the user any permission of accessing +the device except binding the cdev_fd to an iommufd. After that point +then the device is fully accessible including attaching it to an +IOMMUFD IOAS/HWPT to enable userspace DMA:: + + struct vfio_device_bind_iommufd bind = { + .argsz = sizeof(bind), + .flags = 0, + }; + struct iommu_ioas_alloc alloc_data = { + .size = sizeof(alloc_data), + .flags = 0, + }; + struct vfio_device_attach_iommufd_pt attach_data = { + .argsz = sizeof(attach_data), + .flags = 0, + }; + struct iommu_ioas_map map = { + .size = sizeof(map), + .flags = IOMMU_IOAS_MAP_READABLE | + IOMMU_IOAS_MAP_WRITEABLE | + IOMMU_IOAS_MAP_FIXED_IOVA, + .__reserved = 0, + }; + + iommufd = open("/dev/iommu", O_RDWR); + + bind.iommufd = iommufd; + ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind); + + ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data); + attach_data.pt_id = alloc_data.out_ioas_id; + ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data); + + /* Allocate some space and setup a DMA mapping */ + map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); + map.iova = 0; /* 1MB starting at 0x0 from device view */ + map.length = 1024 * 1024; + map.ioas_id = alloc_data.out_ioas_id;; + + ioctl(iommufd, IOMMU_IOAS_MAP, &map); + + /* Other device operations as stated in "VFIO Usage Example" */ + VFIO User API ------------------------------------------------------------------------------- -Please see include/linux/vfio.h for complete API documentation. +Please see include/uapi/linux/vfio.h for complete API documentation. VFIO bus driver API ------------------------------------------------------------------------------- VFIO bus drivers, such as vfio-pci make use of only a few interfaces into VFIO core. When devices are bound and unbound to the driver, -the driver should call vfio_add_group_dev() and vfio_del_group_dev() -respectively:: - - extern int vfio_add_group_dev(struct device *dev, - const struct vfio_device_ops *ops, - void *device_data); - - extern void *vfio_del_group_dev(struct device *dev); - -vfio_add_group_dev() indicates to the core to begin tracking the -iommu_group of the specified dev and register the dev as owned by -a VFIO bus driver. The driver provides an ops structure for callbacks +Following interfaces are called when devices are bound to and +unbound from the driver:: + + int vfio_register_group_dev(struct vfio_device *device); + int vfio_register_emulated_iommu_dev(struct vfio_device *device); + void vfio_unregister_group_dev(struct vfio_device *device); + +The driver should embed the vfio_device in its own structure and use +vfio_alloc_device() to allocate the structure, and can register +@init/@release callbacks to manage any private state wrapping the +vfio_device:: + + vfio_alloc_device(dev_struct, member, dev, ops); + void vfio_put_device(struct vfio_device *device); + +vfio_register_group_dev() indicates to the core to begin tracking the +iommu_group of the specified dev and register the dev as owned by a VFIO bus +driver. Once vfio_register_group_dev() returns it is possible for userspace to +start accessing the driver, thus the driver should ensure it is completely +ready before calling it. The driver provides an ops structure for callbacks similar to a file operations structure:: struct vfio_device_ops { - int (*open)(void *device_data); - void (*release)(void *device_data); - ssize_t (*read)(void *device_data, char __user *buf, + char *name; + int (*init)(struct vfio_device *vdev); + void (*release)(struct vfio_device *vdev); + int (*bind_iommufd)(struct vfio_device *vdev, + struct iommufd_ctx *ictx, u32 *out_device_id); + void (*unbind_iommufd)(struct vfio_device *vdev); + int (*attach_ioas)(struct vfio_device *vdev, u32 *pt_id); + void (*detach_ioas)(struct vfio_device *vdev); + int (*open_device)(struct vfio_device *vdev); + void (*close_device)(struct vfio_device *vdev); + ssize_t (*read)(struct vfio_device *vdev, char __user *buf, size_t count, loff_t *ppos); - ssize_t (*write)(void *device_data, const char __user *buf, - size_t size, loff_t *ppos); - long (*ioctl)(void *device_data, unsigned int cmd, + ssize_t (*write)(struct vfio_device *vdev, const char __user *buf, + size_t count, loff_t *size); + long (*ioctl)(struct vfio_device *vdev, unsigned int cmd, unsigned long arg); - int (*mmap)(void *device_data, struct vm_area_struct *vma); + int (*mmap)(struct vfio_device *vdev, struct vm_area_struct *vma); + void (*request)(struct vfio_device *vdev, unsigned int count); + int (*match)(struct vfio_device *vdev, char *buf); + void (*dma_unmap)(struct vfio_device *vdev, u64 iova, u64 length); + int (*device_feature)(struct vfio_device *device, u32 flags, + void __user *arg, size_t argsz); }; -Each function is passed the device_data that was originally registered -in the vfio_add_group_dev() call above. This allows the bus driver -an easy place to store its opaque, private data. The open/release -callbacks are issued when a new file descriptor is created for a -device (via VFIO_GROUP_GET_DEVICE_FD). The ioctl interface provides -a direct pass through for VFIO_DEVICE_* ioctls. The read/write/mmap -interfaces implement the device region access defined by the device's -own VFIO_DEVICE_GET_REGION_INFO ioctl. +Each function is passed the vdev that was originally registered +in the vfio_register_group_dev() or vfio_register_emulated_iommu_dev() +call above. This allows the bus driver to obtain its private data using +container_of(). + +:: + + - The init/release callbacks are issued when vfio_device is initialized + and released. + + - The open/close device callbacks are issued when the first + instance of a file descriptor for the device is created (eg. + via VFIO_GROUP_GET_DEVICE_FD) for a user session. + + - The ioctl callback provides a direct pass through for some VFIO_DEVICE_* + ioctls. + - The [un]bind_iommufd callbacks are issued when the device is bound to + and unbound from iommufd. + + - The [de]attach_ioas callback is issued when the device is attached to + and detached from an IOAS managed by the bound iommufd. However, the + attached IOAS can also be automatically detached when the device is + unbound from iommufd. + + - The read/write/mmap callbacks implement the device region access defined + by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl. + + - The request callback is issued when device is going to be unregistered, + such as when trying to unbind the device from the vfio bus driver. + + - The dma_unmap callback is issued when a range of iovas are unmapped + in the container or IOAS attached by the device. Drivers which make + use of the vfio page pinning interface must implement this callback in + order to unpin pages within the dma_unmap range. Drivers must tolerate + this callback even before calls to open_device(). PPC64 sPAPR implementation note ------------------------------- @@ -518,3 +697,11 @@ This implementation has some specifics: \-0d.1 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90) + +.. [5] Nested translation is an IOMMU feature which supports two stage + address translations. This improves the address translation efficiency + in IOMMU virtualization. + +.. [6] PASID stands for Process Address Space ID, introduced by PCI + Express. It is a prerequisite for Shared Virtual Addressing (SVA) + and Scalable I/O Virtualization (Scalable IOV). |