summaryrefslogtreecommitdiff
path: root/drivers/vfio
AgeCommit message (Collapse)Author
2022-08-01vfio/pci: fix the wrong wordBo Liu
This patch fixes a wrong word in comment. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20220801013918.2520-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-08-01Merge remote-tracking branch 'kvm/next' into kvm-next-5.20Paolo Bonzini
KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID
2022-07-25vfio: Replace phys_pfn with pages for vfio_pin_pages()Nicolin Chen
Most of the callers of vfio_pin_pages() want "struct page *" and the low-level mm code to pin pages returns a list of "struct page *" too. So there's no gain in converting "struct page *" to PFN in between. Replace the output parameter "phys_pfn" list with a "pages" list, to simplify callers. This also allows us to replace the vfio_iommu_type1 implementation with a more efficient one. And drop the pfn_valid check in the gvt code, as there is no need to do such a check at a page-backed struct page pointer. For now, also update vfio_iommu_type1 to fit this new parameter too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Eric Farman <farman@linux.ibm.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-11-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-25vfio: Rename user_iova of vfio_dma_rw()Nicolin Chen
Following the updated vfio_pin/unpin_pages(), use the simpler "iova". Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-9-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-25vfio: Pass in starting IOVA to vfio_pin/unpin_pages APINicolin Chen
The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA. Among all three callers, there was only one caller possibly passing in a non-contiguous PFN list, which is now ensured to have contiguous PFN inputs too. Pass in the starting address with "iova" alone to simplify things, so callers no longer need to maintain a PFN list or to pin/unpin one page at a time. This also allows VFIO to use more efficient implementations of pin/unpin_pages. For now, also update vfio_iommu_type1 to fit this new parameter too, while keeping its input intact (being user_iova) since we don't want to spend too much effort swapping its parameters and local variables at that level. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Eric Farman <farman@linux.ibm.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-6-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-23vfio: Make vfio_unpin_pages() return voidNicolin Chen
There's only one caller that checks its return value with a WARN_ON_ONCE, while all other callers don't check the return value at all. Above that, an undo function should not fail. So, simplify the API to return void by embedding similar WARN_ONs. Also for users to pinpoint which condition fails, separate WARN_ON lines, yet remove the "driver->ops->unpin_pages" check, since it's unreasonable for callers to unpin on something totally random that wasn't even pinned. And remove NULL pointer checks for they would trigger oops vs. warnings. Note that npage is already validated in the vfio core, thus drop the same check in the type1 code. Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-2-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-22vfio/spapr_tce: Fix the commentAlexey Kardashevskiy
Grepping for "iommu_ops" finds this spot and gives wrong impression that iommu_ops is used in here, fix the comment. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Link: https://lore.kernel.org/r/20220714080912.3713509-1-aik@ozlabs.ru [aw: convert to multi-line comment] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-20vfio: Replace the iommu notifier with a device listJason Gunthorpe
Instead of bouncing the function call to the driver op through a blocking notifier just have the iommu layer call it directly. Register each device that is being attached to the iommu with the lower driver which then threads them on a linked list and calls the appropriate driver op at the right time. Currently the only use is if dma_unmap() is defined. Also, fully lock all the debugging tests on the pinning path that a dma_unmap is registered. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-20vfio: Replace the DMA unmapping notifier with a callbackJason Gunthorpe
Instead of having drivers register the notifier with explicit code just have them provide a dma_unmap callback op in their driver ops and rely on the core code to wire it up. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-16vfio: do not set FMODE_LSEEK flagJason A. Donenfeld
This file does not support llseek, so don't set the flag advertising it. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-07-11vfio-pci/zdev: different maxstbl for interpreted devicesMatthew Rosato
When doing load/store interpretation, the maximum store block length is determined by the underlying firmware, not the host kernel API. Reflect that in the associated Query PCI Function Group clp capability and let userspace decide which is appropriate to present to the guest. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-20-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio-pci/zdev: add function handle to clp base capabilityMatthew Rosato
The function handle is a system-wide unique identifier for a zPCI device. With zPCI instruction interpretation, the host will no longer be executing the zPCI instructions on behalf of the guest. As a result, the guest needs to use the real function handle in order for firmware to associate the instruction with the proper PCI function. Let's provide that handle to the guest. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-19-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio-pci/zdev: add open/close device hooksMatthew Rosato
During vfio-pci open_device, pass the KVM associated with the vfio group (if one exists). This is needed in order to pass a special indicator (GISA) to firmware to allow zPCI interpretation facilities to be used for only the specific KVM associated with the vfio-pci device. During vfio-pci close_device, unregister the notifier. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-18-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVMMatthew Rosato
The current contents of vfio-pci-zdev are today only useful in a KVM environment; let's tie everything currently under vfio-pci-zdev to this Kconfig statement and require KVM in this case, reducing complexity (e.g. symbol lookups). Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-11-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-07Merge branches 'v5.20/vfio/spapr_tce-unused-arg-v1', ↵Alex Williamson
'v5.20/vfio/comment-typo-v1' and 'v5.20/vfio/vfio-ccw-rework-v4' into v5.20/vfio/next
2022-07-07vfio/spapr_tce: Remove the unused parameters containerDeming Wang
The parameter of container has been unused for tce_iommu_unuse_page. So, we should delete it. Signed-off-by: Deming Wang <wangdeming@inspur.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Link: https://lore.kernel.org/r/20220702064613.5293-1-wangdeming@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-06vfio/pci: fix the wrong wordBo Liu
This patch fixes a wrong word in comment. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20220704023649.3913-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-05vfio: Move IOMMU_CAP_CACHE_COHERENCY test to after we know we have a groupJason Gunthorpe
The test isn't going to work if a group doesn't exist. Normally this isn't a problem since VFIO isn't going to create a device if there is no group, but the special CONFIG_VFIO_NOIOMMU behavior allows bypassing this prevention. The new cap test effectively forces a group and breaks this config option. Move the cap test to vfio_group_find_or_alloc() which is the earliest time we know we have a group available and thus are not running in noiommu mode. Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-e8934b490f36+f4-vfio_cap_fix_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-30Merge branches 'v5.20/vfio/migration-enhancements-v3', ↵Alex Williamson
'v5.20/vfio/simplify-bus_type-determination-v3', 'v5.20/vfio/check-vfio_register_iommu_driver-return-v2', 'v5.20/vfio/check-iommu_group_set_name_return-v1', 'v5.20/vfio/clear-caps-buf-v3', 'v5.20/vfio/remove-useless-judgement-v1' and 'v5.20/vfio/move-device_open-count-v2' into v5.20/vfio/next
2022-06-30vfio: Move "device->open_count--" out of group_rwsem in vfio_device_open()Yi Liu
We do not protect the vfio_device::open_count with group_rwsem elsewhere (see vfio_device_fops_release as a comparison, where we already drop group_rwsem before open_count--). So move the group_rwsem unlock prior to open_count--. This change now also drops group_rswem before setting device->kvm = NULL, but that's also OK (again, just like vfio_device_fops_release). The setting of device->kvm before open_device is technically done while holding the group_rwsem, this is done to protect the group kvm value we are copying from, and we should not be relying on that to protect the contents of device->kvm; instead we assume this value will not change until after the device is closed and while under the dev_set->lock. Cc: Matthew Rosato <mjrosato@linux.ibm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220627074119.523274-1-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-30vfio: remove useless judgementLi Zhe
In function vfio_dma_do_unmap(), we currently prevent process to unmap vfio dma region whose mm_struct is different from the vfio_dma->task. In our virtual machine scenario which is using kvm and qemu, this judgement stops us from liveupgrading our qemu, which uses fork() && exec() to load the new binary but the new process cannot do the VFIO_IOMMU_UNMAP_DMA action during vm exit because of this judgement. This judgement is added in commit 8f0d5bb95f76 ("vfio iommu type1: Add task structure to vfio_dma") for the security reason. But it seems that no other task who has no family relationship with old and new process can get the same vfio_dma struct here for the reason of resource isolation. So this patch delete it. Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca> Link: https://lore.kernel.org/r/20220627035109.73745-1-lizhe.67@bytedance.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-30vfio: Clear the caps->buf to NULL after freeSchspa Shi
On buffer resize failure, vfio_info_cap_add() will free the buffer, report zero for the size, and return -ENOMEM. As additional hardening, also clear the buffer pointer to prevent any chance of a double free. Signed-off-by: Schspa Shi <schspa@gmail.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20220629022948.55608-1-schspa@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-30vfio: check iommu_group_set_name() return valueLiam Ni
As iommu_group_set_name() can fail, we should check the return value. Signed-off-by: Liam Ni <zhiguangni01@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220625114239.9301-1-zhiguangni01@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-30vfio: Split migration ops from main device opsYishai Hadas
vfio core checks whether the driver sets some migration op (e.g. set_state/get_state) and accordingly calls its op. However, currently mlx5 driver sets the above ops without regards to its migration caps. This might lead to unexpected usage/Oops if user space may call to the above ops even if the driver doesn't support migration. As for example, the migration state_mutex is not initialized in that case. The cleanest way to manage that seems to split the migration ops from the main device ops, this will let the driver setting them separately from the main ops when it's applicable. As part of that, validate ops construction on registration and include a check for VFIO_MIGRATION_STOP_COPY since the uAPI claims it must be set in migration_flags. HISI driver was changed as well to match this scheme. This scheme may enable down the road to come with some extra group of ops (e.g. DMA log) that can be set without regards to the other options based on driver caps. Fixes: 6fadb021266d ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220628155910.171454-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-30vfio/mlx5: Protect mlx5vf_disable_fds() upon close deviceYishai Hadas
Protect mlx5vf_disable_fds() upon close device to be called under the state mutex as done in all other places. This will prevent a race with any other flow which calls mlx5vf_disable_fds() as of health/recovery upon MLX5_PF_NOTIFY_DISABLE_VF event. Encapsulate this functionality in a separate function named mlx5vf_cmd_close_migratable() to consider migration caps and for further usage upon close device. Fixes: 6fadb021266d ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220628155910.171454-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-27vfio: check vfio_register_iommu_driver() return valueBo Liu
As vfio_register_iommu_driver() can fail, we should check the return value. Signed-off-by: Bo Liu <liubo03@inspur.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20220622045651.5416-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-27vfio: Use device_iommu_capable()Robin Murphy
Use the new interface to check the capabilities for our device specifically. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4ea5eb64246f1ee188d1a61c3e93b37756932eb7.1656092606.git.robin.murphy@arm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-27vfio/type1: Simplify bus_type determinationRobin Murphy
Since IOMMU groups are mandatory for drivers to support, it stands to reason that any device which has been successfully added to a group must be on a bus supported by that IOMMU driver, and therefore a domain viable for any device in the group must be viable for all devices in the group. This already has to be the case for the IOMMU API's internal default domain, for instance. Thus even if the group contains devices on different buses, that can only mean that the IOMMU driver actually supports such an odd topology, and so without loss of generality we can expect the bus type of any device in a group to be suitable for IOMMU API calls. Furthermore, scrutiny reveals a lack of protection for the bus being removed while vfio_iommu_type1_attach_group() is using it; the reference that VFIO holds on the iommu_group ensures that data remains valid, but does not prevent the group's membership changing underfoot. We can address both concerns by recycling vfio_bus_type() into some superficially similar logic to indirect the IOMMU API calls themselves. Each call is thus protected from races by the IOMMU group's own locking, and we no longer need to hold group-derived pointers beyond that scope. It also gives us an easy path for the IOMMU API's migration of bus-based interfaces to device-based, of which we can already take the first step with device_iommu_capable(). As with domains, any capability must in practice be consistent for devices in a given group - and after all it's still the same capability which was expected to be consistent across an entire bus! - so there's no need for any complicated validation. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/194a12d3434d7b38f84fa96503c7664451c8c395.1656092606.git.robin.murphy@arm.com [aw: add comment to vfio_iommu_device_capable()] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-27vfio: de-extern-ify function prototypesAlex Williamson
The use of 'extern' in function prototypes has been disrecommended in the kernel coding style for several years now, remove them from all vfio related files so contributors no longer need to decide between style and consistency. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/165471414407.203056.474032786990662279.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-27vfio/pci: Remove console driversAlex Williamson
Console drivers can create conflicts with PCI resources resulting in userspace getting mmap failures to memory BARs. This is especially evident when trying to re-use the system primary console for userspace drivers. Use the aperture helpers to remove these conflicts. v3: * call aperture_remove_conflicting_pci_devices() Reported-by: Laszlo Ersek <lersek@redhat.com> Suggested-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220622140134.12763-4-tzimmermann@suse.de
2022-06-01Merge tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull vfio updates from Alex Williamson: - Improvements to mlx5 vfio-pci variant driver, including support for parallel migration per PF (Yishai Hadas) - Remove redundant iommu_present() check (Robin Murphy) - Ongoing refactoring to consolidate the VFIO driver facing API to use vfio_device (Jason Gunthorpe) - Use drvdata to store vfio_device among all vfio-pci and variant drivers (Jason Gunthorpe) - Remove redundant code now that IOMMU core manages group DMA ownership (Jason Gunthorpe) - Remove vfio_group from external API handling struct file ownership (Jason Gunthorpe) - Correct typo in uapi comments (Thomas Huth) - Fix coccicheck detected deadlock (Wan Jiabing) - Use rwsem to remove races and simplify code around container and kvm association to groups (Jason Gunthorpe) - Harden access to devices in low power states and use runtime PM to enable d3cold support for unused devices (Abhishek Sahu) - Fix dma_owner handling of fake IOMMU groups (Jason Gunthorpe) - Set driver_managed_dma on vfio-pci variant drivers (Jason Gunthorpe) - Pass KVM pointer directly rather than via notifier (Matthew Rosato) * tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfio: (38 commits) vfio: remove VFIO_GROUP_NOTIFY_SET_KVM vfio/pci: Add driver_managed_dma to the new vfio_pci drivers vfio: Do not manipulate iommu dma_owner for fake iommu groups vfio/pci: Move the unused device into low power state with runtime PM vfio/pci: Virtualize PME related registers bits and initialize to zero vfio/pci: Change the PF power state to D0 before enabling VFs vfio/pci: Invalidate mmaps and block the access in D3hot power state vfio: Change struct vfio_group::container_users to a non-atomic int vfio: Simplify the life cycle of the group FD vfio: Fully lock struct vfio_group::container vfio: Split up vfio_group_get_device_fd() vfio: Change struct vfio_group::opened from an atomic to bool vfio: Add missing locking for struct vfio_group::kvm kvm/vfio: Fix potential deadlock problem in vfio include/uapi/linux/vfio.h: Fix trivial typo - _IORW should be _IOWR instead vfio/pci: Use the struct file as the handle not the vfio_group kvm/vfio: Remove vfio_group from kvm vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm() vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent() vfio: Remove vfio_external_group_match_file() ...
2022-05-31Merge tag 'iommu-updates-v5.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Intel VT-d driver updates: - Domain force snooping improvement. - Cleanups, no intentional functional changes. - ARM SMMU driver updates: - Add new Qualcomm device-tree compatible strings - Add new Nvidia device-tree compatible string for Tegra234 - Fix UAF in SMMUv3 shared virtual addressing code - Force identity-mapped domains for users of ye olde SMMU legacy binding - Minor cleanups - Fix a BUG_ON in the vfio_iommu_group_notifier: - Groundwork for upcoming iommufd framework - Introduction of DMA ownership so that an entire IOMMU group is either controlled by the kernel or by user-space - MT8195 and MT8186 support in the Mediatek IOMMU driver - Make forcing of cache-coherent DMA more coherent between IOMMU drivers - Fixes for thunderbolt device DMA protection - Various smaller fixes and cleanups * tag 'iommu-updates-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (88 commits) iommu/amd: Increase timeout waiting for GA log enablement iommu/s390: Tolerate repeat attach_dev calls iommu/vt-d: Remove hard coding PGSNP bit in PASID entries iommu/vt-d: Remove domain_update_iommu_snooping() iommu/vt-d: Check domain force_snooping against attached devices iommu/vt-d: Block force-snoop domain attaching if no SC support iommu/vt-d: Size Page Request Queue to avoid overflow condition iommu/vt-d: Fold dmar_insert_one_dev_info() into its caller iommu/vt-d: Change return type of dmar_insert_one_dev_info() iommu/vt-d: Remove unneeded validity check on dev iommu/dma: Explicitly sort PCI DMA windows iommu/dma: Fix iova map result check bug iommu/mediatek: Fix NULL pointer dereference when printing dev_name iommu: iommu_group_claim_dma_owner() must always assign a domain iommu/arm-smmu: Force identity domains for legacy binding iommu/arm-smmu: Support Tegra234 SMMU dt-bindings: arm-smmu: Add compatible for Tegra234 SOC dt-bindings: arm-smmu: Document nvidia,memory-controller property iommu/arm-smmu-qcom: Add SC8280XP support dt-bindings: arm-smmu: Add compatible for Qualcomm SC8280XP ...
2022-05-24vfio: remove VFIO_GROUP_NOTIFY_SET_KVMMatthew Rosato
Rather than relying on a notifier for associating the KVM with the group, let's assume that the association has already been made prior to device_open. The first time a device is opened associate the group KVM with the device. This fixes a user-triggerable oops in GVT. Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Zhi Wang <zhi.a.wang@intel.com> Link: https://lore.kernel.org/r/20220519183311.582380-2-mjrosato@linux.ibm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-23vfio/pci: Add driver_managed_dma to the new vfio_pci driversJason Gunthorpe
When the iommu series adding driver_managed_dma was rebased it missed that new VFIO drivers were added and did not update them too. Without this vfio will claim the groups are not viable. Add driver_managed_dma to mlx5 and hisi. Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices") Reported-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-f9dfa642fab0+2b3-vfio_managed_dma_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-23vfio: Do not manipulate iommu dma_owner for fake iommu groupsJason Gunthorpe
Since asserting dma ownership now causes the group to have its DMA blocked the iommu layer requires a working iommu. This means the dma_owner APIs cannot be used on the fake groups that VFIO creates. Test for this and avoid calling them. Otherwise asserting dma ownership will fail for VFIO mdev devices as a BLOCKING iommu_domain cannot be allocated due to the NULL iommu ops. Fixes: 0286300e6045 ("iommu: iommu_group_claim_dma_owner() must always assign a domain") Reported-by: Eric Farman <farman@linux.ibm.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/0-v1-9cfc47edbcd4+13546-vfio_dma_owner_fix_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-20Merge branches 'apple/dart', 'arm/mediatek', 'arm/msm', 'arm/smmu', ↵Joerg Roedel
'ppc/pamu', 'x86/vt-d', 'x86/amd' and 'vfio-notifier-fix' into next
2022-05-18vfio/pci: Move the unused device into low power state with runtime PMAbhishek Sahu
Currently, there is very limited power management support available in the upstream vfio_pci_core based drivers. If there are no users of the device, then the PCI device will be moved into D3hot state by writing directly into PCI PM registers. This D3hot state help in saving power but we can achieve zero power consumption if we go into the D3cold state. The D3cold state cannot be possible with native PCI PM. It requires interaction with platform firmware which is system-specific. To go into low power states (including D3cold), the runtime PM framework can be used which internally interacts with PCI and platform firmware and puts the device into the lowest possible D-States. This patch registers vfio_pci_core based drivers with the runtime PM framework. 1. The PCI core framework takes care of most of the runtime PM related things. For enabling the runtime PM, the PCI driver needs to decrement the usage count and needs to provide 'struct dev_pm_ops' at least. The runtime suspend/resume callbacks are optional and needed only if we need to do any extra handling. Now there are multiple vfio_pci_core based drivers. Instead of assigning the 'struct dev_pm_ops' in individual parent driver, the vfio_pci_core itself assigns the 'struct dev_pm_ops'. There are other drivers where the 'struct dev_pm_ops' is being assigned inside core layer (For example, wlcore_probe() and some sound based driver, etc.). 2. This patch provides the stub implementation of 'struct dev_pm_ops'. The subsequent patch will provide the runtime suspend/resume callbacks. All the config state saving, and PCI power management related things will be done by PCI core framework itself inside its runtime suspend/resume callbacks (pci_pm_runtime_suspend() and pci_pm_runtime_resume()). 3. Inside pci_reset_bus(), all the devices in dev_set needs to be runtime resumed. vfio_pci_dev_set_pm_runtime_get() will take care of the runtime resume and its error handling. 4. Inside vfio_pci_core_disable(), the device usage count always needs to be decremented which was incremented in vfio_pci_core_enable(). 5. Since the runtime PM framework will provide the same functionality, so directly writing into PCI PM config register can be replaced with the use of runtime PM routines. Also, the use of runtime PM can help us in more power saving. In the systems which do not support D3cold, With the existing implementation: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3hot So, with runtime PM, the upstream bridge or root port will also go into lower power state which is not possible with existing implementation. In the systems which support D3cold, // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3cold // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3cold So, with runtime PM, both the PCI device and upstream bridge will go into D3cold state. 6. If 'disable_idle_d3' module parameter is set, then also the runtime PM will be enabled, but in this case, the usage count should not be decremented. 7. vfio_pci_dev_set_try_reset() return value is unused now, so this function return type can be changed to void. 8. Use the runtime PM API's in vfio_pci_core_sriov_configure(). The device can be in low power state either with runtime power management (when there is no user) or PCI_PM_CTRL register write by the user. In both the cases, the PF should be moved to D0 state. For preventing any runtime usage mismatch, pci_num_vf() has been called explicitly during disable. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-5-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18vfio/pci: Virtualize PME related registers bits and initialize to zeroAbhishek Sahu
If any PME event will be generated by PCI, then it will be mostly handled in the host by the root port PME code. For example, in the case of PCIe, the PME event will be sent to the root port and then the PME interrupt will be generated. This will be handled in drivers/pci/pcie/pme.c at the host side. Inside this, the pci_check_pme_status() will be called where PME_Status and PME_En bits will be cleared. So, the guest OS which is using vfio-pci device will not come to know about this PME event. To handle these PME events inside guests, we need some framework so that if any PME events will happen, then it needs to be forwarded to virtual machine monitor. We can virtualize PME related registers bits and initialize these bits to zero so vfio-pci device user will assume that it is not capable of asserting the PME# signal from any power state. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18vfio/pci: Change the PF power state to D0 before enabling VFsAbhishek Sahu
According to [PCIe v5 9.6.2] for PF Device Power Management States "The PF's power management state (D-state) has global impact on its associated VFs. If a VF does not implement the Power Management Capability, then it behaves as if it is in an equivalent power state of its associated PF. If a VF implements the Power Management Capability, the Device behavior is undefined if the PF is placed in a lower power state than the VF. Software should avoid this situation by placing all VFs in lower power state before lowering their associated PF's power state." From the vfio driver side, user can enable SR-IOV when the PF is in D3hot state. If VF does not implement the Power Management Capability, then the VF will be actually in D3hot state and then the VF BAR access will fail. If VF implements the Power Management Capability, then VF will assume that its current power state is D0 when the PF is D3hot and in this case, the behavior is undefined. To support PF power management, we need to create power management dependency between PF and its VF's. The runtime power management support may help with this where power management dependencies are supported through device links. But till we have such support in place, we can disallow the PF to go into low power state, if PF has VF enabled. There can be a case, where user first enables the VF's and then disables the VF's. If there is no user of PF, then the PF can put into D3hot state again. But with this patch, the PF will still be in D0 state after disabling VF's since detecting this case inside vfio_pci_core_sriov_configure() requires access to struct vfio_device::open_count along with its locks. But the subsequent patches related to runtime PM will handle this case since runtime PM maintains its own usage count. Also, vfio_pci_core_sriov_configure() can be called at any time (with and without vfio pci device user), so the power state change and SR-IOV enablement need to be protected with the required locks. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-3-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18vfio/pci: Invalidate mmaps and block the access in D3hot power stateAbhishek Sahu
According to [PCIe v5 5.3.1.4.1] for D3hot state "Configuration and Message requests are the only TLPs accepted by a Function in the D3Hot state. All other received Requests must be handled as Unsupported Requests, and all received Completions may optionally be handled as Unexpected Completions." Currently, if the vfio PCI device has been put into D3hot state and if user makes non-config related read/write request in D3hot state, these requests will be forwarded to the host and this access may cause issues on a few systems. This patch leverages the memory-disable support added in commit 'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")' to generate page fault on mmap access and return error for the direct read/write. If the device is D3hot state, then the error will be returned for MMIO access. The IO access generally does not make the system unresponsive so the IO access can still happen in D3hot state. The default value should be returned in this case without bringing down the complete system. Also, the power related structure fields need to be protected so we can use the same 'memory_lock' to protect these fields also. This protection is mainly needed when user changes the PCI power state by writing into PCI_PM_CTRL register. vfio_lock_and_set_power_state() wrapper function will take the required locks and then it will invoke the vfio_pci_set_power_state(). Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Change struct vfio_group::container_users to a non-atomic intJason Gunthorpe
Now that everything is fully locked there is no need for container_users to remain as an atomic, change it to an unsigned int. Use 'if (group->container)' as the test to determine if the container is present or not instead of using container_users. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/6-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Simplify the life cycle of the group FDJason Gunthorpe
Once userspace opens a group FD it is prevented from opening another instance of that same group FD until all the prior group FDs and users of the container are done. The first is done trivially by checking the group->opened during group FD open. However, things get a little weird if userspace creates a device FD and then closes the group FD. The group FD still cannot be re-opened, but this time it is because the group->container is still set and container_users is elevated by the device FD. Due to this mismatched lifecycle we have the vfio_group_try_dissolve_container() which tries to auto-free a container after the group FD is closed but the device FD remains open. Instead have the device FD hold onto a reference to the single group FD. This directly prevents vfio_group_fops_release() from being called when any device FD exists and makes the lifecycle model more understandable. vfio_group_try_dissolve_container() is removed as the only place a container is auto-deleted is during vfio_group_fops_release(). At this point the container_users is either 1 or 0 since all device FDs must be closed. Change group->opened to group->opened_file which points to the single struct file * that is open for the group. If the group->open_file is NULL then group->container == NULL. If all device FDs have closed then the group's notifier list must be empty. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/5-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Fully lock struct vfio_group::containerJason Gunthorpe
This is necessary to avoid various user triggerable races, for instance racing SET_CONTAINER/UNSET_CONTAINER: ioctl(VFIO_GROUP_SET_CONTAINER) ioctl(VFIO_GROUP_UNSET_CONTAINER) vfio_group_unset_container int users = atomic_cmpxchg(&group->container_users, 1, 0); // users == 1 container_users == 0 __vfio_group_unset_container(group); container = group->container; vfio_group_set_container() if (!atomic_read(&group->container_users)) down_write(&container->group_lock); group->container = container; up_write(&container->group_lock); down_write(&container->group_lock); group->container = NULL; up_write(&container->group_lock); vfio_container_put(container); /* woops we lost/leaked the new container */ This can then go on to NULL pointer deref since container == 0 and container_users == 1. Wrap all touches of container, except those on a performance path with a known open device, with the group_rwsem. The only user of vfio_group_add_container_user() holds the user count for a simple operation, change it to just hold the group_lock over the operation and delete vfio_group_add_container_user(). Containers now only gain a user when a device FD is opened. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/4-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Split up vfio_group_get_device_fd()Jason Gunthorpe
The split follows the pairing with the destroy functions: - vfio_group_get_device_fd() destroyed by close() - vfio_device_open() destroyed by vfio_device_fops_release() - vfio_device_assign_container() destroyed by vfio_group_try_dissolve_container() The next patch will put a lock around vfio_device_assign_container(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/3-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Change struct vfio_group::opened from an atomic to boolJason Gunthorpe
This is not a performance path, just use the group_rwsem to protect the value. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/2-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Add missing locking for struct vfio_group::kvmJason Gunthorpe
Without locking userspace can trigger a UAF by racing KVM_DEV_VFIO_GROUP_DEL with VFIO_GROUP_GET_DEVICE_FD: CPU1 CPU2 ioctl(KVM_DEV_VFIO_GROUP_DEL) ioctl(VFIO_GROUP_GET_DEVICE_FD) vfio_group_get_device_fd open_device() intel_vgpu_open_device() vfio_register_notifier() vfio_register_group_notifier() blocking_notifier_call_chain(&group->notifier, VFIO_GROUP_NOTIFY_SET_KVM, group->kvm); set_kvm() group->kvm = NULL close() kfree(kvm) intel_vgpu_group_notifier() vdev->kvm = data [..] kvm_get_kvm(vgpu->kvm); // UAF! Add a simple rwsem in the group to protect the kvm while the notifier is using it. Note this doesn't fix the race internal to i915 where userspace can trigger two VFIO_GROUP_NOTIFY_SET_KVM's before we reach a consumer of vgpu->kvm and trigger this same UAF, it just makes the notifier self-consistent. Fixes: ccd46dbae77d ("vfio: support notifier chain in vfio_group") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/1-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio/pci: Use the struct file as the handle not the vfio_groupJason Gunthorpe
VFIO PCI does a security check as part of hot reset to prove that the user has permission to manipulate all the devices that will be impacted by the reset. Use a new API vfio_file_has_dev() to perform this security check against the struct file directly and remove the vfio_group from VFIO PCI. Since VFIO PCI was the last user of vfio_group_get_external_user() and vfio_group_put_external_user() remove it as well. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm()Jason Gunthorpe
Just change the argument from struct vfio_group to struct file *. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent()Jason Gunthorpe
Instead of a general extension check change the function into a limited test if the iommu_domain has enforced coherency, which is the only thing kvm needs to query. Make the new op self contained by properly refcounting the container before touching it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio: Remove vfio_external_group_match_file()Jason Gunthorpe
vfio_group_fops_open() ensures there is only ever one struct file open for any struct vfio_group at any time: /* Do we need multiple instances of the group open? Seems not. */ opened = atomic_cmpxchg(&group->opened, 0, 1); if (opened) { vfio_group_put(group); return -EBUSY; Therefor the struct file * can be used directly to search the list of VFIO groups that KVM keeps instead of using the vfio_external_group_match_file() callback to try to figure out if the passed in FD matches the list or not. Delete vfio_external_group_match_file(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>