summaryrefslogtreecommitdiff
path: root/drivers/vfio
AgeCommit message (Collapse)Author
2025-01-10drivers: core: remove device_link argument from ↵Heiner Kallweit
class_compat_[create|remove]_link After 7e722083fcc3 ("i2c: Remove I2C_COMPAT config symbol and related code") there's no caller left passing a non-null device_link argument. So remove this argument to simplify the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/db49131d-fd79-4f23-93f2-0ab541a345fa@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-06vfio/pci: Expose setup ROM at ROM bar when neededYunxiang Li
If ROM bar is missing for any reason, we can fallback to using pdev->rom to expose the ROM content to the guest. This fixes some passthrough use cases where the upstream bridge does not have enough address window. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Link: https://lore.kernel.org/r/20250102185013.15082-3-Yunxiang.Li@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-01-06vfio/pci: Remove shadow ROM specific code pathsYunxiang Li
After commit 0c0e0736acad ("PCI: Set ROM shadow location in arch code, not in PCI core"), the shadow ROM works the same as regular ROM BARs so these code paths are no longer needed. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Link: https://lore.kernel.org/r/20250102185013.15082-2-Yunxiang.Li@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-01-06vfio/pci: Remove #ifdef iowrite64 and #ifdef ioread64Ramesh Thomas
Remove the #ifdef iowrite64 and #ifdef ioread64 checks around calls to 64 bit IO access. Since default implementations have been enabled, the checks are not required. Signed-off-by: Ramesh Thomas <ramesh.thomas@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241210131938.303500-3-ramesh.thomas@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-01-06vfio/pci: Enable iowrite64 and ioread64 for vfio pciRamesh Thomas
Definitions of ioread64 and iowrite64 macros in asm/io.h called by vfio pci implementations are enclosed inside check for CONFIG_GENERIC_IOMAP. They don't get defined if CONFIG_GENERIC_IOMAP is defined. Include linux/io-64-nonatomic-lo-hi.h to define iowrite64 and ioread64 macros when they are not defined. io-64-nonatomic-lo-hi.h maps the macros to generic implementation in lib/iomap.c. The generic implementation does 64 bit rw if readq/writeq is defined for the architecture, otherwise it would do 32 bit back to back rw. Note that there are two versions of the generic implementation that differs in the order the 32 bit words are written if 64 bit support is not present. This is not the little/big endian ordering, which is handled separately. This patch uses the lo followed by hi word ordering which is consistent with current back to back implementation in the vfio/pci code. Signed-off-by: Ramesh Thomas <ramesh.thomas@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-01-03vfio/pci: Fallback huge faults for unaligned pfnAlex Williamson
The PFN must also be aligned to the fault order to insert a huge pfnmap. Test the alignment and fallback when unaligned. Fixes: f9e54c3a2f5b ("vfio/pci: implement huge_fault support") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219619 Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com> Reported-by: Precific <precification@posteo.de> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Precific <precification@posteo.de> Link: https://lore.kernel.org/r/20250102183416.1841878-1-alex.williamson@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-12-11Merge tag 'vfio-v6.13-rc3' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull vfio fix from Alex Williamson: - Fix migration dirty page tracking support in the mlx5-vfio-pci variant driver in configurations where the system page size exceeds the device maximum message size, and anticipate device updates where the opposite may also be required (Yishai Hadas) * tag 'vfio-v6.13-rc3' of https://github.com/awilliam/linux-vfio: vfio/mlx5: Align the page tracking max message size with the device capability
2024-12-05vfio/mlx5: Align the page tracking max message size with the device capabilityYishai Hadas
Align the page tracking maximum message size with the device's capability instead of relying on PAGE_SIZE. This adjustment resolves a mismatch on systems where PAGE_SIZE is 64K, but the firmware only supports a maximum message size of 4K. Now that we rely on the device's capability for max_message_size, we must account for potential future increases in its value. Key considerations include: - Supporting message sizes that exceed a single system page (e.g., an 8K message on a 4K system). - Ensuring the RQ size is adjusted to accommodate at least 4 WQEs/messages, in line with the device specification. The above has been addressed as part of the patch. Fixes: 79c3cf279926 ("vfio/mlx5: Init QP based resources for dirty tracking") Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: Yingshun Cui <yicui@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20241205122654.235619-1-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-12-02module: Convert symbol namespace to string literalPeter Zijlstra
Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-01Get rid of 'remove_new' relic from platform driver structLinus Torvalds
The continual trickle of small conversion patches is grating on me, and is really not helping. Just get rid of the 'remove_new' member function, which is just an alias for the plain 'remove', and had a comment to that effect: /* * .remove_new() is a relic from a prototype conversion of .remove(). * New drivers are supposed to implement .remove(). Once all drivers are * converted to not use .remove_new any more, it will be dropped. */ This was just a tree-wide 'sed' script that replaced '.remove_new' with '.remove', with some care taken to turn a subsequent tab into two tabs to make things line up. I did do some minimal manual whitespace adjustment for places that used spaces to line things up. Then I just removed the old (sic) .remove_new member function, and this is the end result. No more unnecessary conversion noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-27Merge tag 'vfio-v6.13-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Constify an unmodified structure used in linking vfio and kvm (Christophe JAILLET) - Add ID for an additional hardware SKU supported by the nvgrace-gpu vfio-pci variant driver (Ankit Agrawal) - Fix incorrect signed cast in QAT vfio-pci variant driver, negating test in check_add_overflow(), though still caught by later tests (Giovanni Cabiddu) - Additional debugfs attributes exposed in hisi_acc vfio-pci variant driver for migration debugging (Longfang Liu) - Migration support is added to the virtio vfio-pci variant driver, becoming the primary feature of the driver while retaining emulation of virtio legacy support as a secondary option (Yishai Hadas) - Fixes to a few unwind flows in the mlx5 vfio-pci driver discovered through reviews of the virtio variant driver (Yishai Hadas) - Fix an unlikely issue where a PCI device exposed to userspace with an unknown capability at the base of the extended capability chain can overflow an array index (Avihai Horon) * tag 'vfio-v6.13-rc1' of https://github.com/awilliam/linux-vfio: vfio/pci: Properly hide first-in-list PCIe extended capability vfio/mlx5: Fix unwind flows in mlx5vf_pci_save/resume_device_data() vfio/mlx5: Fix an unwind issue in mlx5vf_add_migration_pages() vfio/virtio: Enable live migration once VIRTIO_PCI was configured vfio/virtio: Add PRE_COPY support for live migration vfio/virtio: Add support for the basic live migration functionality virtio-pci: Introduce APIs to execute device parts admin commands virtio: Manage device and driver capabilities via the admin commands virtio: Extend the admin command to include the result size virtio_pci: Introduce device parts access commands Documentation: add debugfs description for hisi migration hisi_acc_vfio_pci: register debugfs for hisilicon migration driver hisi_acc_vfio_pci: create subfunction for data reading hisi_acc_vfio_pci: extract public functions for container_of vfio/qat: fix overflow check in qat_vf_resume_write() vfio/nvgrace-gpu: Add a new GH200 SKU to the devid table kvm/vfio: Constify struct kvm_device_ops
2024-11-25vfio/pci: Properly hide first-in-list PCIe extended capabilityAvihai Horon
There are cases where a PCIe extended capability should be hidden from the user. For example, an unknown capability (i.e., capability with ID greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally chosen to be hidden from the user. Hiding a capability is done by virtualizing and modifying the 'Next Capability Offset' field of the previous capability so it points to the capability after the one that should be hidden. The special case where the first capability in the list should be hidden is handled differently because there is no previous capability that can be modified. In this case, the capability ID and version are zeroed while leaving the next pointer intact. This hides the capability and leaves an anchor for the rest of the capability list. However, today, hiding the first capability in the list is not done properly if the capability is unknown, as struct vfio_pci_core_device->pci_config_map is set to the capability ID during initialization but the capability ID is not properly checked later when used in vfio_config_do_rw(). This leads to the following warning [1] and to an out-of-bounds access to ecap_perms array. Fix it by checking cap_id in vfio_config_do_rw(), and if it is greater than PCI_EXT_CAP_ID_MAX, use an alternative struct perm_bits for direct read only access instead of the ecap_perms array. Note that this is safe since the above is the only case where cap_id can exceed PCI_EXT_CAP_ID_MAX (except for the special capabilities, which are already checked before). [1] WARNING: CPU: 118 PID: 5329 at drivers/vfio/pci/vfio_pci_config.c:1900 vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] CPU: 118 UID: 0 PID: 5329 Comm: simx-qemu-syste Not tainted 6.12.0+ #1 (snip) Call Trace: <TASK> ? show_regs+0x69/0x80 ? __warn+0x8d/0x140 ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] ? report_bug+0x18f/0x1a0 ? handle_bug+0x63/0xa0 ? exc_invalid_op+0x19/0x70 ? asm_exc_invalid_op+0x1b/0x20 ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] ? vfio_pci_config_rw+0x244/0x430 [vfio_pci_core] vfio_pci_rw+0x101/0x1b0 [vfio_pci_core] vfio_pci_core_read+0x1d/0x30 [vfio_pci_core] vfio_device_fops_read+0x27/0x40 [vfio] vfs_read+0xbd/0x340 ? vfio_device_fops_unl_ioctl+0xbb/0x740 [vfio] ? __rseq_handle_notify_resume+0xa4/0x4b0 __x64_sys_pread64+0x96/0xc0 x64_sys_call+0x1c3d/0x20d0 do_syscall_64+0x4d/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241124142739.21698-1-avihaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-21Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Several new features and uAPI for iommufd: - IOMMU_IOAS_MAP_FILE allows passing in a file descriptor as the backing memory for an iommu mapping. To date VFIO/iommufd have used VMA's and pin_user_pages(), this now allows using memfds and memfd_pin_folios(). Notably this creates a pure folio path from the memfd to the iommu page table where memory is never broken down to PAGE_SIZE. - IOMMU_IOAS_CHANGE_PROCESS moves the pinned page accounting between two processes. Combined with the above this allows iommufd to support a VMM re-start using exec() where something like qemu would exec() a new version of itself and fd pass the memfds/iommufd/etc to the new process. The memfd allows DMA access to the memory to continue while the new process is getting setup, and the CHANGE_PROCESS updates all the accounting. - Support for fault reporting to userspace on non-PRI HW, such as ARM stall-mode embedded devices. - IOMMU_VIOMMU_ALLOC introduces the concept of a HW/driver backed virtual iommu. This will be used by VMMs to access hardware features that are contained with in a VM. The first use is to inform the kernel of the virtual SID to physical SID mapping when issuing SID based invalidation on ARM. Further uses will tie HW features that are directly accessed by the VM, such as invalidation queue assignment and others. - IOMMU_VDEVICE_ALLOC informs the kernel about the mapping of virtual device to physical device within a VIOMMU. Minimially this is used to translate VM issued cache invalidation commands from virtual to physical device IDs. - Enhancements to IOMMU_HWPT_INVALIDATE and IOMMU_HWPT_ALLOC to work with the VIOMMU - ARM SMMuv3 support for nested translation. Using the VIOMMU and VDEVICE the driver can model this HW's behavior for nested translation. This includes a shared branch from Will" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (51 commits) iommu/arm-smmu-v3: Import IOMMUFD module namespace iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest iommufd: Add IOMMU_IOAS_CHANGE_PROCESS iommufd: Lock all IOAS objects iommufd: Export do_update_pinned iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Use S2FWB for NESTED domains iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC Documentation: userspace-api: iommufd: Update vDEVICE iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command iommufd/selftest: Add mock_viommu_cache_invalidate iommufd/viommu: Add iommufd_viommu_find_dev helper iommu: Add iommu_copy_struct_from_full_user_array helper iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE iommu/viommu: Add cache_invalidate to iommufd_viommu_ops iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl ...
2024-11-14vfio/mlx5: Fix unwind flows in mlx5vf_pci_save/resume_device_data()Yishai Hadas
Fix unwind flows in mlx5vf_pci_save_device_data() and mlx5vf_pci_resume_device_data() to avoid freeing the migf pointer at the 'end' label, as this will be handled by fput(migf->filp) through mlx5vf_release_file(). To ensure mlx5vf_release_file() functions correctly, move the initialization of migf fields (such as migf->lock) to occur before any potential unwind flow, as these fields may be accessed within mlx5vf_release_file(). Fixes: 9945a67ea4b3 ("vfio/mlx5: Refactor PD usage") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241114095318.16556-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-14vfio/mlx5: Fix an unwind issue in mlx5vf_add_migration_pages()Yishai Hadas
Fix an unwind issue in mlx5vf_add_migration_pages(). If a set of pages is allocated but fails to be added to the SG table, they need to be freed to prevent a memory leak. Any pages successfully added to the SG table will be freed as part of mlx5vf_free_data_buffer(). Fixes: 6fadb021266d ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20241114095318.16556-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-13vfio/virtio: Enable live migration once VIRTIO_PCI was configuredYishai Hadas
Now that the driver supports live migration, only the legacy IO functionality depends on config VIRTIO_PCI_ADMIN_LEGACY. As part of that we introduce a bool configuration option as a sub menu under the driver's main live migration feature named VIRTIO_VFIO_PCI_ADMIN_LEGACY, to control the legacy IO functionality. This will let users configuring the kernel, know which features from the description might be available in the resulting driver. As of that, move the legacy IO into a separate file to be compiled only once CONFIG_VIRTIO_VFIO_PCI_ADMIN_LEGACY was configured and let the live migration depends only on VIRTIO_PCI. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20241113115200.209269-8-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-13vfio/virtio: Add PRE_COPY support for live migrationYishai Hadas
Add PRE_COPY support for live migration. This functionality may reduce the downtime upon STOP_COPY as of letting the target machine to get some 'initial data' from the source once the machine is still in its RUNNING state and let it prepares itself pre-ahead to get the final STOP_COPY data. As the Virtio specification does not support reading partial or incremental device contexts. This means that during the PRE_COPY state, the vfio-virtio driver reads the full device state. As the device state can be changed and the benefit is highest when the pre copy data closely matches the final data we read it in a rate limiter mode. This means we avoid reading new data from the device for a specified time interval after the last read. With PRE_COPY enabled, we observed a downtime reduction of approximately 70-75% in various scenarios compared to when PRE_COPY was disabled, while keeping the total migration time nearly the same. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20241113115200.209269-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-13vfio/virtio: Add support for the basic live migration functionalityYishai Hadas
Add support for basic live migration functionality in VFIO over virtio-net devices, aligned with the virtio device specification 1.4. This includes the following VFIO features: VFIO_MIGRATION_STOP_COPY, VFIO_MIGRATION_P2P. The implementation registers with the VFIO subsystem using vfio_pci_core and then incorporates the virtio-specific logic for the migration process. The migration follows the definitions in uapi/vfio.h and leverages the virtio VF-to-PF admin queue command channel for execution device parts related commands. Additional Notes: ----------------- The kernel protocol between the source and target devices contains a header with metadata, including record size, tag, and flags. The record size allows the target to recognize and read a complete image from the source before passing the device part data. This adheres to the virtio device specification, which mandates that partial device parts cannot be supplied. The tag and flags serve as placeholders for future extensions of the kernel protocol between the source and target, ensuring backward and forward compatibility. Both the source and target comply with the virtio device specification by using a device part object with a unique ID as part of the migration process. Since this resource is limited to a maximum of 255, its lifecycle is confined to periods with an active live migration flow. According to the virtio specification, a device has only two modes: RUNNING and STOPPED. As a result, certain VFIO transitions (i.e., RUNNING_P2P->STOP, STOP->RUNNING_P2P) are treated as no-ops. When transitioning to RUNNING_P2P, the device state is set to STOP, and it will remain STOPPED until the transition out of RUNNING_P2P->RUNNING, at which point it returns to RUNNING. During transition to STOP, the virtio device only stops initiating outgoing requests(e.g. DMA, MSIx, etc.) but still must accept incoming operations. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20241113115200.209269-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-13hisi_acc_vfio_pci: register debugfs for hisilicon migration driverLongfang Liu
On the debugfs framework of VFIO, if the CONFIG_VFIO_DEBUGFS macro is enabled, the debug function is registered for the live migration driver of the HiSilicon accelerator device. After registering the HiSilicon accelerator device on the debugfs framework of live migration of vfio, a directory file "hisi_acc" of debugfs is created, and then three debug function files are created in this directory: vfio | +---<dev_name1> | +---migration | +--state | +--hisi_acc | +--dev_data | +--migf_data | +--cmd_state | +---<dev_name2> +---migration +--state +--hisi_acc +--dev_data +--migf_data +--cmd_state dev_data file: read device data that needs to be migrated from the current device in real time migf_data file: read the migration data of the last live migration from the current driver. cmd_state: used to get the cmd channel state for the device. +----------------+ +--------------+ +---------------+ | migration dev | | src dev | | dst dev | +-------+--------+ +------+-------+ +-------+-------+ | | | | +------v-------+ +-------v-------+ | | saving_migf | | resuming_migf | read | | file | | file | | +------+-------+ +-------+-------+ | | copy | | +------------+----------+ | | +-------v--------+ +-------v--------+ | data buffer | | debug_migf | +-------+--------+ +-------+--------+ | | cat | cat | +-------v--------+ +-------v--------+ | dev_data | | migf_data | +----------------+ +----------------+ When accessing debugfs, user can obtain the most recent status data of the device through the "dev_data" file. It can read recent complete status data of the device. If the current device is being migrated, it will wait for it to complete. The data for the last completed migration function will be stored in debug_migf. Users can read it via "migf_data". Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20241112073322.54550-4-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-12hisi_acc_vfio_pci: create subfunction for data readingLongfang Liu
This patch generates the code for the operation of reading data from the device into a sub-function. Then, it can be called during the device status data saving phase of the live migration process and the device status data reading function in debugfs. Thereby reducing the redundant code of the driver. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20241112073322.54550-3-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-12hisi_acc_vfio_pci: extract public functions for container_ofLongfang Liu
In the current driver, vdev is obtained from struct hisi_acc_vf_core_device through the container_of function. This method is used in many places in the driver. In order to reduce this repetitive operation, It was extracted into a public function. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20241112073322.54550-2-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-05vfio: Remove VFIO_TYPE1_NESTING_IOMMUJason Gunthorpe
This control causes the ARM SMMU drivers to choose a stage 2 implementation for the IO pagetable (vs the stage 1 usual default), however this choice has no significant visible impact to the VFIO user. Further qemu never implemented this and no other userspace user is known. The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide SMMU translation services to the guest operating system" however the rest of the API to set the guest table pointer for the stage 1 and manage invalidation was never completed, or at least never upstreamed, rendering this part useless dead code. Upstream has now settled on iommufd as the uAPI for controlling nested translation. Choosing the stage 2 implementation should be done by through the IOMMU_HWPT_ALLOC_NEST_PARENT flag during domain allocation. Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the enable_nesting iommu_domain_op. Just in-case there is some userspace using this continue to treat requesting it as a NOP, but do not advertise support any more. Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-11-03assorted variants of irqfd setup: convert to CLASS(fd)Al Viro
in all of those failure exits prior to fdget() are plain returns and the only thing done after fdput() is (on failure exits) a kfree(), which can be done before fdput() just fine. NOTE: in acrn_irqfd_assign() 'fail:' failure exit is wrong for eventfd_ctx_fileget() failure (we only want fdput() there) and once we stop doing that, it doesn't need to check if eventfd is NULL or ERR_PTR(...) there. NOTE: in privcmd we move fdget() up before the allocation - more to the point, before the copy_from_user() attempt. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03fdget(), more trivial conversionsAl Viro
all failure exits prior to fdget() leave the scope, all matching fdput() are immediately followed by leaving the scope. [xfs_ioc_commit_range() chunk moved here as well] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-30vfio/qat: fix overflow check in qat_vf_resume_write()Giovanni Cabiddu
The unsigned variable `size_t len` is cast to the signed type `loff_t` when passed to the function check_add_overflow(). This function considers the type of the destination, which is of type loff_t (signed), potentially leading to an overflow. This issue is similar to the one described in the link below. Remove the cast. Note that even if check_add_overflow() is bypassed, by setting `len` to a value that is greater than LONG_MAX (which is considered as a negative value after the cast), the function copy_from_user(), invoked a few lines later, will not perform any copy and return `len` as (len > INT_MAX) causing qat_vf_resume_write() to fail with -EFAULT. Fixes: bb208810b1ab ("vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices") CC: stable@vger.kernel.org # 6.10+ Link: https://lore.kernel.org/all/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mountain Reported-by: Zijie Zhao <zzjas98@gmail.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Xin Zeng <xin.zeng@intel.com> Link: https://lore.kernel.org/r/20241021123843.42979-1-giovanni.cabiddu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-10-30vfio/nvgrace-gpu: Add a new GH200 SKU to the devid tableAnkit Agrawal
NVIDIA is planning to productize a new Grace Hopper superchip SKU with device ID 0x2348. Add the SKU devid to nvgrace_gpu_vfio_pci_table. Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20241013075216.19229-1-ankita@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-27[tree-wide] finally take no_llseek outAl Viro
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-24Merge tag 'vfio-v6.12-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: "Just a few cleanups this cycle: - Remove several unused structure and function declarations, and unused variables (Dr. David Alan Gilbert, Yue Haibing, Zhang Zekun) - Constify unmodified structure in mdev (Hongbo Li) - Convert to unsigned type to catch overflow with less fanfare than passing a negative value to kcalloc() (Dan Carpenter)" * tag 'vfio-v6.12-rc1' of https://github.com/awilliam/linux-vfio: vfio/pci: clean up a type in vfio_pci_ioctl_pci_hot_reset_groups() vfio/mdev: Constify struct kobj_type vfio: mdev: Remove unused function declarations vfio/fsl-mc: Remove unused variable 'hwirq' vfio/pci: Remove unused struct 'vfio_pci_mmap_vma'
2024-09-23Merge tag 'pull-stable-struct_fd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it.
2024-09-17vfio/pci: implement huge_fault supportAlex Williamson
With the addition of pfnmap support in vmf_insert_pfn_{pmd,pud}() we can take advantage of PMD and PUD faults to PCI BAR mmaps and create more efficient mappings. PCI BARs are always a power of two and will typically get at least PMD alignment without userspace even trying. Userspace alignment for PUD mappings is also not too difficult. Consolidate faults through a single handler with a new wrapper for standard single page faults. The pre-faulting behavior of commit d71a989cf5d9 ("vfio/pci: Insert full vma on mmap'd MMIO fault") is removed in this refactoring since huge_fault will cover the bulk of the faults and results in more efficient page table usage. We also want to avoid that pre-faulted single page mappings preempt huge page mappings. Link: https://lkml.kernel.org/r/20240826204353.2228736-20-peterx@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17vfio: use the new follow_pfnmap APIPeter Xu
Use the new API that can understand huge pfn mappings. Link: https://lkml.kernel.org/r/20240826204353.2228736-14-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-12vfio/pci: clean up a type in vfio_pci_ioctl_pci_hot_reset_groups()Dan Carpenter
The "array_count" value comes from the copy_from_user() in vfio_pci_ioctl_pci_hot_reset(). If the user passes a value larger than INT_MAX then we'll pass a negative value to kcalloc() which triggers an allocation failure and a stack trace. It's better to make the type unsigned so that if (array_count > count) returns -EINVAL instead. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/262ada03-d848-4369-9c37-81edeeed2da2@stanley.mountain Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-06vfio/mdev: Constify struct kobj_typeHongbo Li
This 'struct kobj_type' is not modified. It is only used in kobject_init_and_add() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure and moving it to a read-only section, and this can increase over all security. ``` [Before] text data bss dec hex filename 2372 600 0 2972 b9c drivers/vfio/mdev/mdev_sysfs.o [After] text data bss dec hex filename 2436 568 0 3004 bbc drivers/vfio/mdev/mdev_sysfs.o ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240904011837.2010444-1-lihongbo22@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-03vfio: mdev: Remove unused function declarationsZhang Zekun
The definition of mdev_bus_register() and mdev_bus_unregister() have been removed since commit 6c7f98b334a3 ("vfio/mdev: Remove vfio_mdev.c"). So, let's remove the unused declarations. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240812120823.10968-1-zhangzekun11@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-03vfio/fsl-mc: Remove unused variable 'hwirq'Yue Haibing
Commit 7447d911af69 ("vfio/fsl-mc: Block calling interrupt handler without trigger") left this variable unused, so remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20240730141133.525771-1-yuehaibing@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-03vfio/pci: Remove unused struct 'vfio_pci_mmap_vma'Dr. David Alan Gilbert
'vfio_pci_mmap_vma' has been unused since commit aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240727160307.1000476-1-linux@treblig.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-08-12introduce fd_file(), convert all accessors to it.Al Viro
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-07-25Merge tag 'driver-core-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
2024-07-19Merge tag 'powerpc-6.11-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Remove support for 40x CPUs & platforms - Add support to the 64-bit BPF JIT for cpu v4 instructions - Fix PCI hotplug driver crash on powernv - Fix doorbell emulation for KVM on PAPR guests (nestedv2) - Fix KVM nested guest handling of some less used SPRs - Online NUMA nodes with no CPU/memory if they have a PCI device attached - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, and Vaibhav Jain. * tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (57 commits) Documentation/powerpc: Mention 40x is removed powerpc: Remove 40x leftovers macintosh/therm_windtunnel: fix module unload. powerpc: Check only single values are passed to CPU/MMU feature checks powerpc/xmon: Fix disassembly CPU feature checks powerpc: Drop clang workaround for builtin constant checks powerpc64/bpf: jit support for signed division and modulo powerpc64/bpf: jit support for sign extended mov powerpc64/bpf: jit support for sign extended load powerpc64/bpf: jit support for unconditional byte swap powerpc64/bpf: jit support for 32bit offset jmp instruction powerpc/pci: Hotplug driver bridge support pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC powerpc: add missing MODULE_DESCRIPTION() macros macintosh/mac_hid: add MODULE_DESCRIPTION() KVM: PPC: add missing MODULE_DESCRIPTION() macros powerpc/kexec: Use of_property_read_reg() powerpc/64s/radix/kfence: map __kfence_pool at page granularity powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API ...
2024-07-19Merge tag 'vfio-v6.11-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Add support for 8-byte accesses when using read/write through the device regions. This fills a gap for userspace drivers that might not be able to use access through mmap to perform native register width accesses (Gerd Bayer) - Add missing MODULE_DESCRIPTION to vfio-mdev sample drivers and replace a non-standard MODULE_INFO usage (Jeff Johnson) * tag 'vfio-v6.11-rc1' of https://github.com/awilliam/linux-vfio: vfio-mdev: add missing MODULE_DESCRIPTION() macros vfio/pci: Fix typo in macro to declare accessors vfio/pci: Support 8-byte PCI loads and stores vfio/pci: Extract duplicated code into macro
2024-07-19Merge tag 'iommu-updates-v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Will Deacon: "Core: - Support for the "ats-supported" device-tree property - Removal of the 'ops' field from 'struct iommu_fwspec' - Introduction of iommu_paging_domain_alloc() and partial conversion of existing users - Introduce 'struct iommu_attach_handle' and provide corresponding IOMMU interfaces which will be used by the IOMMUFD subsystem - Remove stale documentation - Add missing MODULE_DESCRIPTION() macro - Misc cleanups Allwinner Sun50i: - Ensure bypass mode is disabled on H616 SoCs - Ensure page-tables are allocated below 4GiB for the 32-bit page-table walker - Add new device-tree compatible strings AMD Vi: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte Arm SMMUv2: - Print much more useful information on context faults - Fix Qualcomm TBU probing when CONFIG_ARM_SMMU_QCOM_DEBUG=n - Add new Qualcomm device-tree bindings Arm SMMUv3: - Support for hardware update of access/dirty bits and reporting via IOMMUFD - More driver rework from Jason, this time updating the PASID/SVA support to prepare for full IOMMUFD support - Add missing MODULE_DESCRIPTION() macro - Minor fixes and cleanups NVIDIA Tegra: - Fix for benign fwspec initialisation issue exposed by rework on the core branch Intel VT-d: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte - Use READ_ONCE() to read volatile descriptor status - Remove support for handling Execute-Requested requests - Avoid calling iommu_domain_alloc() - Minor fixes and refactoring Qualcomm MSM: - Updates to the device-tree bindings" * tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (72 commits) iommu/tegra-smmu: Pass correct fwnode to iommu_fwspec_init() iommu/vt-d: Fix identity map bounds in si_domain_init() iommu: Move IOMMU_DIRTY_NO_CLEAR define dt-bindings: iommu: Convert msm,iommu-v0 to yaml iommu/vt-d: Fix aligned pages in calculate_psi_aligned_address() iommu/vt-d: Limit max address mask to MAX_AGAW_PFN_WIDTH docs: iommu: Remove outdated Documentation/userspace-api/iommu.rst arm64: dts: fvp: Enable PCIe ATS for Base RevC FVP iommu/of: Support ats-supported device-tree property dt-bindings: PCI: generic: Add ats-supported property iommu: Remove iommu_fwspec ops OF: Simplify of_iommu_configure() ACPI: Retire acpi_iommu_fwspec_ops() iommu: Resolve fwspec ops automatically iommu/mediatek-v1: Clean up redundant fwspec checks RDMA/usnic: Use iommu_paging_domain_alloc() wifi: ath11k: Use iommu_paging_domain_alloc() wifi: ath10k: Use iommu_paging_domain_alloc() drm/msm: Use iommu_paging_domain_alloc() vhost-vdpa: Use iommu_paging_domain_alloc() ...
2024-07-10vfio/pci: Init the count variable in collecting hot-reset devicesYi Liu
The count variable is used without initialization, it results in mistakes in the device counting and crashes the userspace if the get hot reset info path is triggered. Fixes: f6944d4a0b87 ("vfio/pci: Collect hot-reset devices to local buffer") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219010 Reported-by: Žilvinas Žaltiena <zaltys@natrix.lt> Cc: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240710004150.319105-1-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-07-04vfio/type1: Use iommu_paging_domain_alloc()Lu Baolu
Replace iommu_domain_alloc() with iommu_paging_domain_alloc(). Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240610085555.88197-4-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03driver core: have match() callback in struct bus_type take a const *Greg Kroah-Hartman
In the match() callback, the struct device_driver * should not be changed, so change the function callback to be a const *. This is one step of many towards making the driver core safe to have struct device_driver in read-only memory. Because the match() callback is in all busses, all busses are modified to handle this properly. This does entail switching some container_of() calls to container_of_const() to properly handle the constant *. For some busses, like PCI and USB and HV, the const * is cast away in the match callback as those busses do want to modify those structures at this point in time (they have a local lock in the driver structure.) That will have to be changed in the future if they wish to have their struct device * in read-only-memory. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alex Elder <elder@kernel.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-28vfio/spapr: Always clear TCEs before unsetting the windowShivaprasad G Bhat
The PAPR expects the TCE table to have no entries at the time of unset window(i.e. remove-pe). The TCE clear right now is done before freeing the iommu table. On pSeries, the unset window makes those entries inaccessible to the OS and the H_PUT/GET calls fail on them with H_CONSTRAINED. On PowerNV, this has no side effect as the TCE clear can be done before the DMA window removal as well. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/171923273535.1397.1236742071894414895.stgit@linux.ibm.com
2024-06-21vfio/pci: Support 8-byte PCI loads and storesBen Segal
Many PCI adapters can benefit or even require full 64bit read and write access to their registers. In order to enable work on user-space drivers for these devices add two new variations vfio_pci_core_io{read|write}64 of the existing access methods when the architecture supports 64-bit ioreads and iowrites. Signed-off-by: Ben Segal <bpsegal@us.ibm.com> Co-developed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Link: https://lore.kernel.org/r/20240619115847.1344875-3-gbayer@linux.ibm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-06-21vfio/pci: Extract duplicated code into macroGerd Bayer
vfio_pci_core_do_io_rw() repeats the same code for multiple access widths. Factor this out into a macro Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Link: https://lore.kernel.org/r/20240619115847.1344875-2-gbayer@linux.ibm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-06-12vfio/pci: Insert full vma on mmap'd MMIO faultAlex Williamson
In order to improve performance of typical scenarios we can try to insert the entire vma on fault. This accelerates typical cases, such as when the MMIO region is DMA mapped by QEMU. The vfio_iommu_type1 driver will fault in the entire DMA mapped range through fixup_user_fault(). In synthetic testing, this improves the time required to walk a PCI BAR mapping from userspace by roughly 1/3rd. This is likely an interim solution until vmf_insert_pfn_{pmd,pud}() gain support for pfnmaps. Suggested-by: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/all/Zl6XdUkt%2FzMMGOLF@yzhao56-desk.sh.intel.com/ Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20240607035213.2054226-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-05-31vfio/pci: Use unmap_mapping_range()Alex Williamson
With the vfio device fd tied to the address space of the pseudo fs inode, we can use the mm to track all vmas that might be mmap'ing device BARs, which removes our vma_list and all the complicated lock ordering necessary to manually zap each related vma. Note that we can no longer store the pfn in vm_pgoff if we want to use unmap_mapping_range() to zap a selective portion of the device fd corresponding to BAR mappings. This also converts our mmap fault handler to use vmf_insert_pfn() because we no longer have a vma_list to avoid the concurrency problem with io_remap_pfn_range(). The goal is to eventually use the vm_ops huge_fault handler to avoid the additional faulting overhead, but vmf_insert_pfn_{pmd,pud}() need to learn about pfnmaps first. Also, Jason notes that a race exists between unmap_mapping_range() and the fops mmap callback if we were to call io_remap_pfn_range() to populate the vma on mmap. Specifically, mmap_region() does call_mmap() before it does vma_link_file() which gives a window where the vma is populated but invisible to unmap_mapping_range(). Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240530045236.1005864-3-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-05-31vfio: Create vfio_fs_type with inode per deviceAlex Williamson
By linking all the device fds we provide to userspace to an address space through a new pseudo fs, we can use tools like unmap_mapping_range() to zap all vmas associated with a device. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240530045236.1005864-2-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>