summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2022-12-09iommufd: Improve a few unclear bits of codeJason Gunthorpe
Correct a few items noticed late in review: - We should assert that the math in batch_clear_carry() doesn't underflow - user->locked should be -1 not 0 sicne we just did mmput - npages should not have been recalculated, it already has that value No functional change. Fixes: 8d160cd4d506 ("iommufd: Algorithms for PFN storage") Link: https://lore.kernel.org/r/2-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reported-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09iommufd: Fix comment typosJason Gunthorpe
Repair some typos in comments that were noticed late in the review cycle. Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages") Link: https://lore.kernel.org/r/1-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reported-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05vfio: Move vfio group specific code into group.cYi Liu
This prepares for compiling out vfio group after vfio device cdev is added. No vfio_group decode code should be in vfio_main.c, and neither device->group reference should be in vfio_main.c. No functional change is intended. Link: https://lore.kernel.org/r/20221201145535.589687-11-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Yu He <yu.he@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05vfio: Refactor dma APIs for emulated devicesYi Liu
To use group helpers instead of opening group related code in the API. This prepares moving group specific code out of vfio_main.c. Link: https://lore.kernel.org/r/20221201145535.589687-10-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05vfio: Wrap vfio group module init/clean code into helpersYi Liu
This wraps the init/clean code of vfio group global variable to be helpers, and prepares for further moving vfio group specific code into separate file. As container is used by group, so vfio_container_init/cleanup() is moved into vfio_group_init/cleanup(). Link: https://lore.kernel.org/r/20221201145535.589687-9-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05vfio: Refactor vfio_device open and closeYi Liu
This refactor makes the vfio_device_open() to accept device, iommufd_ctx pointer and kvm pointer. These parameters are generic items in today's group path and future device cdev path. Caller of vfio_device_open() should take care the necessary protections. e.g. the current group path need to hold the group_lock to ensure the iommufd_ctx and kvm pointer are valid. This refactor also wraps the group spefcific codes in the device open and close paths to be paired helpers like: - vfio_device_group_open/close(): call vfio_device_open/close() - vfio_device_group_use/unuse_iommu(): this pair is container specific. iommufd vs. container is selected in vfio_device_first_open(). Such helpers are supposed to be moved to group.c. While iommufd related codes will be kept in the generic helpers since future device cdev path also need to handle iommufd. Link: https://lore.kernel.org/r/20221201145535.589687-8-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05vfio: Make vfio_device_open() truly device specificYi Liu
Then move group related logic into vfio_device_open_file(). Accordingly introduce a vfio_device_close() to pair up. Link: https://lore.kernel.org/r/20221201145535.589687-7-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05vfio: Swap order of vfio_device_container_register() and open_device()Yi Liu
This makes the DMA unmap callback registration to container be consistent across the vfio iommufd compat mode and the legacy container mode. In the vfio iommufd compat mode, this registration is done in the vfio_iommufd_bind() when creating access which has an unmap callback. This is prior to calling the open_device() op. The existing mdev drivers have been converted to be OK with this order. So it is ok to swap the order of vfio_device_container_register() and open_device() for legacy mode. This also prepares for further moving group specific code into separate source file. Link: https://lore.kernel.org/r/20221201145535.589687-6-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05vfio: Set device->group in helper functionYi Liu
This avoids referencing device->group in __vfio_register_dev(). Link: https://lore.kernel.org/r/20221201145535.589687-5-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05vfio: Create wrappers for group register/unregisterYi Liu
This avoids decoding group fields in the common functions used by vfio_device registration, and prepares for further moving the vfio group specific code into separate file. Link: https://lore.kernel.org/r/20221201145535.589687-4-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05vfio: Move the sanity check of the group to vfio_create_group()Jason Gunthorpe
This avoids opening group specific code in __vfio_register_dev() for the sanity check if an (existing) group is not corrupted by having two copies of the same struct device in it. It also simplifies the error unwind for this sanity check since the failure can be detected in the group allocation. This also prepares for moving the group specific code into separate group.c. Grabbed from: https://lore.kernel.org/kvm/20220922152338.2a2238fe.alex.williamson@redhat.com/ Link: https://lore.kernel.org/r/20221201145535.589687-3-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
2022-12-05vfio: Simplify vfio_create_group()Jason Gunthorpe
The vfio.group_lock is now only used to serialize vfio_group creation and destruction, we don't need a micro-optimization of searching, unlocking, then allocating and searching again. Just hold the lock the whole time. Grabbed from: https://lore.kernel.org/kvm/20220922152338.2a2238fe.alex.williamson@redhat.com/ Link: https://lore.kernel.org/r/20221201145535.589687-2-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
2022-12-02Merge tag 'v6.1-rc7' into iommufd.git for-nextJason Gunthorpe
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version. The rc fix was done a different way when iommufd patches reworked this code. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02iommufd: Allow iommufd to supply /dev/vfio/vfioJason Gunthorpe
If the VFIO container is compiled out, give a kconfig option for iommufd to provide the miscdev node with the same name and permissions as vfio uses. The compatibility node supports the same ioctls as VFIO and automatically enables the VFIO compatible pinned page accounting mode. Link: https://lore.kernel.org/r/10-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio: Make vfio_container optionally compiledJason Gunthorpe
Add a kconfig CONFIG_VFIO_CONTAINER that controls compiling the container code. If 'n' then only iommufd will provide the container service. All the support for vfio iommu drivers, including type1, will not be built. This allows a compilation check that no inappropriate dependencies between the device/group and container have been created. Link: https://lore.kernel.org/r/9-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio: Move container related MODULE_ALIAS statements into container.cJason Gunthorpe
The miscdev is in container.c, so should these related MODULE_ALIAS statements. This is necessary for the next patch to be able to fully disable /dev/vfio/vfio. Fixes: cdc71fe4ecbf ("vfio: Move container code into drivers/vfio/container.c") Link: https://lore.kernel.org/r/8-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Reported-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio-iommufd: Support iommufd for emulated VFIO devicesJason Gunthorpe
Emulated VFIO devices are calling vfio_register_emulated_iommu_dev() and consist of all the mdev drivers. Like the physical drivers, support for iommufd is provided by the driver supplying the correct standard ops. Provide ops from the core that duplicate what vfio_register_emulated_iommu_dev() does. Emulated drivers are where it is more likely to see variation in the iommfd support ops. For instance IDXD will probably need to setup both a iommfd_device context linked to a PASID and an iommufd_access context to support all their mdev operations. Link: https://lore.kernel.org/r/7-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio-iommufd: Support iommufd for physical VFIO devicesJason Gunthorpe
This creates the iommufd_device for the physical VFIO drivers. These are all the drivers that are calling vfio_register_group_dev() and expect the type1 code to setup a real iommu_domain against their parent struct device. The design gives the driver a choice in how it gets connected to iommufd by providing bind_iommufd/unbind_iommufd/attach_ioas callbacks to implement as required. The core code provides three default callbacks for physical mode using a real iommu_domain. This is suitable for drivers using vfio_register_group_dev() Link: https://lore.kernel.org/r/6-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio-iommufd: Allow iommufd to be used in place of a container fdJason Gunthorpe
This makes VFIO_GROUP_SET_CONTAINER accept both a vfio container FD and an iommufd. In iommufd mode an IOAS will exist after the SET_CONTAINER, but it will not be attached to any groups. For VFIO this means that the VFIO_GROUP_GET_STATUS and VFIO_GROUP_FLAGS_VIABLE works subtly differently. With the container FD the iommu_group_claim_dma_owner() is done during SET_CONTAINER but for IOMMUFD this is done during VFIO_GROUP_GET_DEVICE_FD. Meaning that VFIO_GROUP_FLAGS_VIABLE could be set but GET_DEVICE_FD will fail due to viability. As GET_DEVICE_FD can fail for many reasons already this is not expected to be a meaningful difference. Reorganize the tests for if the group has an assigned container or iommu into a vfio_group_has_iommu() function and consolidate all the duplicated WARN_ON's etc related to this. Call container functions only if a container is actually present on the group. Link: https://lore.kernel.org/r/5-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()Jason Gunthorpe
iommufd doesn't establish the iommu_domains until after the device FD is opened, even if the container has been set. This design is part of moving away from the group centric iommu APIs. This is fine, except that the normal sequence of establishing the kvm wbinvd won't work: group = open("/dev/vfio/XX") ioctl(group, VFIO_GROUP_SET_CONTAINER) ioctl(kvm, KVM_DEV_VFIO_GROUP_ADD) ioctl(group, VFIO_GROUP_GET_DEVICE_FD) As the domains don't start existing until GET_DEVICE_FD. Further, GET_DEVICE_FD requires that KVM_DEV_VFIO_GROUP_ADD already be done as that is what sets the group->kvm and thus device->kvm for the driver to use during open. Now that we have device centric cap ops and the new IOMMU_CAP_ENFORCE_CACHE_COHERENCY we know what the iommu_domain will be capable of without having to create it. Use this to compute vfio_file_enforced_coherent() and resolve the ordering problems. VFIO always tries to upgrade domains to enforce cache coherency, it never attaches a device that supports enforce cache coherency to a less capable domain, so the cap test is a sufficient proxy for the ultimate outcome. iommufd also ensures that devices that set the cap will be connected to enforcing domains. Link: https://lore.kernel.org/r/4-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio: Rename vfio_device_assign/unassign_container()Jason Gunthorpe
These functions don't really assign anything anymore, they just increment some refcounts and do a sanity check. Call them vfio_group_[un]use_container() Link: https://lore.kernel.org/r/3-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio: Move vfio_device_assign_container() into vfio_device_first_open()Jason Gunthorpe
The only thing this function does is assert the group has an assigned container and incrs refcounts. The overall model we have is that once a container_users refcount is incremented it cannot be de-assigned from the group - vfio_group_ioctl_unset_container() will fail and the group FD cannot be closed. Thus we do not need to check this on every device FD open, just the first. Reorganize the code so that only the first open and last close manages the container. Link: https://lore.kernel.org/r/2-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio: Move vfio_device driver open/close code to a functionJason Gunthorpe
This error unwind is getting complicated. Move all the code into two pair'd function. The functions should be called when the open_count == 1 after incrementing/before decrementing. Link: https://lore.kernel.org/r/1-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio/ap: Validate iova during dma_unmap and trigger irq disableMatthew Rosato
Currently, each mapped iova is stashed in its associated vfio_ap_queue; when we get an unmap request, validate that it matches with one or more of these stashed values before attempting unpins. Each stashed iova represents IRQ that was enabled for a queue. Therefore, if a match is found, trigger IRQ disable for this queue to ensure that underlying firmware will no longer try to use the associated pfn after the page is unpinned. IRQ disable will also handle the associated unpin. Link: https://lore.kernel.org/r/20221202135402.756470-3-yi.l.liu@intel.com Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02i915/gvt: Move gvt mapping cache initialization to intel_vgpu_init_dev()Yi Liu
vfio container registers .dma_unmap() callback after the device is opened. So it's fine for mdev drivers to initialize internal mapping cache in .open_device(). See vfio_device_container_register(). Now with iommufd an access ops with an unmap callback is registered when the device is bound to iommufd which is before .open_device() is called. This implies gvt's .dma_unmap() could be called before its internal mapping cache is initialized. The fix is moving gvt mapping cache initialization to vGPU init. While at it also move ptable initialization together. Link: https://lore.kernel.org/r/20221202135402.756470-2-yi.l.liu@intel.com Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: Add additional invariant assertionsJason Gunthorpe
These are on performance paths so we protect them using the CONFIG_IOMMUFD_TEST to not take a hit during normal operation. These are useful when running the test suite and syzkaller to find data structure inconsistencies early. Link: https://lore.kernel.org/r/18-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: Add some fault injection pointsJason Gunthorpe
This increases the coverage the fail_nth test gets, as well as via syzkaller. Link: https://lore.kernel.org/r/17-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: Add kernel support for testing iommufdJason Gunthorpe
Provide a mock kernel module for the iommu_domain that allows it to run without any HW and the mocking provides a way to directly validate that the PFNs loaded into the iommu_domain are correct. This exposes the access kAPI toward userspace to allow userspace to explore the functionality of pages.c and io_pagetable.c The mock also simulates the rare case of PAGE_SIZE > iommu page size as the mock will operate at a 2K iommu page size. This allows exercising all of the calculations to support this mismatch. This is also intended to support syzkaller exploring the same space. However, it is an unusually invasive config option to enable all of this. The config option should not be enabled in a production kernel. Link: https://lore.kernel.org/r/16-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390 Tested-by: Eric Auger <eric.auger@redhat.com> # aarch64 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: vfio container FD ioctl compatibilityJason Gunthorpe
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by mapping them into io_pagetable operations. A userspace application can test against iommufd and confirm compatibility then simply make a small change to open /dev/iommu instead of /dev/vfio/vfio. For testing purposes /dev/vfio/vfio can be symlinked to /dev/iommu and then all applications will use the compatibility path with no code changes. A later series allows /dev/vfio/vfio to be directly provided by iommufd, which allows the rlimit mode to work the same as well. This series just provides the iommufd side of compatibility. Actually linking this to VFIO_SET_CONTAINER is a followup series, with a link in the cover letter. Internally the compatibility API uses a normal IOAS object that, like vfio, is automatically allocated when the first device is attached. Userspace can also query or set this IOAS object directly using the IOMMU_VFIO_IOAS ioctl. This allows mixing and matching new iommufd only features while still using the VFIO style map/unmap ioctls. While this is enough to operate qemu, it has a few differences: - Resource limits rely on memory cgroups to bound what userspace can do instead of the module parameter dma_entry_limit. - VFIO P2P is not implemented. The DMABUF patches for vfio are a start at a solution where iommufd would import a special DMABUF. This is to avoid further propogating the follow_pfn() security problem. - A full audit for pedantic compatibility details (eg errnos, etc) has not yet been done - powerpc SPAPR is left out, as it is not connected to the iommu_domain framework. It seems interest in SPAPR is minimal as it is currently non-working in v6.1-rc1. They will have to convert to the iommu subsystem framework to enjoy iommfd. The following are not going to be implemented and we expect to remove them from VFIO type1: - SW access 'dirty tracking'. As discussed in the cover letter this will be done in VFIO. - VFIO_TYPE1_NESTING_IOMMU https://lore.kernel.org/all/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/ - VFIO_DMA_MAP_FLAG_VADDR https://lore.kernel.org/all/Yz777bJZjTyLrHEQ@nvidia.com/ Link: https://lore.kernel.org/r/15-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: Add kAPI toward external drivers for kernel accessJason Gunthorpe
Kernel access is the mode that VFIO "mdevs" use. In this case there is no struct device and no IOMMU connection. iommufd acts as a record keeper for accesses and returns the actual struct pages back to the caller to use however they need. eg with kmap or the DMA API. Each caller must create a struct iommufd_access with iommufd_access_create(), similar to how iommufd_device_bind() works. Using this struct the caller can access blocks of IOVA using iommufd_access_pin_pages() or iommufd_access_rw(). Callers must provide a callback that immediately unpins any IOVA being used within a range. This happens if userspace unmaps the IOVA under the pin. The implementation forwards the access requests directly to the iopt infrastructure that manages the iopt_pages_access. Link: https://lore.kernel.org/r/14-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: Add kAPI toward external drivers for physical devicesJason Gunthorpe
Add the four functions external drivers need to connect physical DMA to the IOMMUFD: iommufd_device_bind() / iommufd_device_unbind() Register the device with iommufd and establish security isolation. iommufd_device_attach() / iommufd_device_detach() Connect a bound device to a page table Binding a device creates a device object ID in the uAPI, however the generic API does not yet provide any IOCTLs to manipulate them. Link: https://lore.kernel.org/r/13-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: Add a HW pagetable objectJason Gunthorpe
The hw_pagetable object exposes the internal struct iommu_domain's to userspace. An iommu_domain is required when any DMA device attaches to an IOAS to control the io page table through the iommu driver. For compatibility with VFIO the hw_pagetable is automatically created when a DMA device is attached to the IOAS. If a compatible iommu_domain already exists then the hw_pagetable associated with it is used for the attachment. In the initial series there is no iommufd uAPI for the hw_pagetable object. The next patch provides driver facing APIs for IO page table attachment that allows drivers to accept either an IOAS or a hw_pagetable ID and for the driver to return the hw_pagetable ID that was auto-selected from an IOAS. The expectation is the driver will provide uAPI through its own FD for attaching its device to iommufd. This allows userspace to learn the mapping of devices to iommu_domains and to override the automatic attachment. The future HW specific interface will allow userspace to create hw_pagetable objects using iommu_domains with IOMMU driver specific parameters. This infrastructure will allow linking those domains to IOAS's and devices. Link: https://lore.kernel.org/r/12-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: IOCTLs for the io_pagetableJason Gunthorpe
Connect the IOAS to its IOCTL interface. This exposes most of the functionality in the io_pagetable to userspace. This is intended to be the core of the generic interface that IOMMUFD will provide. Every IOMMU driver should be able to implement an iommu_domain that is compatible with this generic mechanism. It is also designed to be easy to use for simple non virtual machine monitor users, like DPDK: - Universal simple support for all IOMMUs (no PPC special path) - An IOVA allocator that considers the aperture and the allowed/reserved ranges - io_pagetable allows any number of iommu_domains to be connected to the IOAS - Automatic allocation and re-use of iommu_domains Along with room in the design to add non-generic features to cater to specific HW functionality. Link: https://lore.kernel.org/r/11-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: Data structure to provide IOVA to PFN mappingJason Gunthorpe
This is the remainder of the IOAS data structure. Provide an object called an io_pagetable that is composed of iopt_areas pointing at iopt_pages, along with a list of iommu_domains that mirror the IOVA to PFN map. At the top this is a simple interval tree of iopt_areas indicating the map of IOVA to iopt_pages. An xarray keeps track of a list of domains. Based on the attached domains there is a minimum alignment for areas (which may be smaller than PAGE_SIZE), an interval tree of reserved IOVA that can't be mapped and an IOVA of allowed IOVA that can always be mappable. The concept of an 'access' refers to something like a VFIO mdev that is accessing the IOVA and using a 'struct page *' for CPU based access. Externally an API is provided that matches the requirements of the IOCTL interface for map/unmap and domain attachment. The API provides a 'copy' primitive to establish a new IOVA map in a different IOAS from an existing mapping by re-using the iopt_pages. This is the basic mechanism to provide single pinning. This is designed to support a pre-registration flow where userspace would setup an dummy IOAS with no domains, map in memory and then establish an access to pin all PFNs into the xarray. Copy can then be used to create new IOVA mappings in a different IOAS, with iommu_domains attached. Upon copy the PFNs will be read out of the xarray and mapped into the iommu_domains, avoiding any pin_user_pages() overheads. Link: https://lore.kernel.org/r/10-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: Algorithms for PFN storageJason Gunthorpe
The iopt_pages which represents a logical linear list of full PFNs held in different storage tiers. Each area points to a slice of exactly one iopt_pages, and each iopt_pages can have multiple areas and accesses. The three storage tiers are managed to meet these objectives: - If no iommu_domain or in-kerenel access exists then minimal memory should be consumed by iomufd - If a page has been pinned then an iopt_pages will not pin it again - If an in-kernel access exists then the xarray must provide the backing storage to avoid allocations on domain removals - Otherwise any iommu_domain will be used for storage In a common configuration with only an iommu_domain the iopt_pages does not allocate significant memory itself. The external interface for pages has several logical operations: iopt_area_fill_domain() will load the PFNs from storage into a single domain. This is used when attaching a new domain to an existing IOAS. iopt_area_fill_domains() will load the PFNs from storage into multiple domains. This is used when creating a new IOVA map in an existing IOAS iopt_pages_add_access() creates an iopt_pages_access that tracks an in-kernel access of PFNs. This is some external driver that might be accessing the IOVA using the CPU, or programming PFNs with the DMA API. ie a VFIO mdev. iopt_pages_rw_access() directly perform a memcpy on the PFNs, without the overhead of iopt_pages_add_access() iopt_pages_fill_xarray() will load PFNs into the xarray and return a 'struct page *' array. It is used by iopt_pages_access's to extract PFNs for in-kernel use. iopt_pages_fill_from_xarray() is a fast path when it is known the xarray is already filled. As an iopt_pages can be referred to in slices by many areas and accesses it uses interval trees to keep track of which storage tiers currently hold the PFNs. On a page-by-page basis any request for a PFN will be satisfied from one of the storage tiers and the PFN copied to target domain/array. Unfill actions are similar, on a page by page basis domains are unmapped, xarray entries freed or struct pages fully put back. Significant complexity is required to fully optimize all of these data motions. The implementation calculates the largest consecutive range of same-storage indexes and operates in blocks. The accumulation of PFNs always generates the largest contiguous PFN range possible to optimize and this gathering can cross storage tier boundaries. For cases like 'fill domains' care is taken to avoid duplicated work and PFNs are read once and pushed into all domains. The map/unmap interaction with the iommu_domain always works in contiguous PFN blocks. The implementation does not require or benefit from any split/merge optimization in the iommu_domain driver. This design suggests several possible improvements in the IOMMU API that would greatly help performance, particularly a way for the driver to map and read the pfns lists instead of working with one driver call per page to read, and one driver call per contiguous range to store. Link: https://lore.kernel.org/r/9-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: PFN handling for iopt_pagesJason Gunthorpe
The top of the data structure provides an IO Address Space (IOAS) that is similar to a VFIO container. The IOAS allows map/unmap of memory into ranges of IOVA called iopt_areas. Multiple IOMMU domains (IO page tables) and in-kernel accesses (like VFIO mdevs) can be attached to the IOAS to access the PFNs that those IOVA areas cover. The IO Address Space (IOAS) datastructure is composed of: - struct io_pagetable holding the IOVA map - struct iopt_areas representing populated portions of IOVA - struct iopt_pages representing the storage of PFNs - struct iommu_domain representing each IO page table in the system IOMMU - struct iopt_pages_access representing in-kernel accesses of PFNs (ie VFIO mdevs) - struct xarray pinned_pfns holding a list of pages pinned by in-kernel accesses This patch introduces the lowest part of the datastructure - the movement of PFNs in a tiered storage scheme: 1) iopt_pages::pinned_pfns xarray 2) Multiple iommu_domains 3) The origin of the PFNs, i.e. the userspace pointer PFN have to be copied between all combinations of tiers, depending on the configuration. The interface is an iterator called a 'pfn_reader' which determines which tier each PFN is stored and loads it into a list of PFNs held in a struct pfn_batch. Each step of the iterator will fill up the pfn_batch, then the caller can use the pfn_batch to send the PFNs to the required destination. Repeating this loop will read all the PFNs in an IOVA range. The pfn_reader and pfn_batch also keep track of the pinned page accounting. While PFNs are always stored and accessed as full PAGE_SIZE units the iommu_domain tier can store with a sub-page offset/length to support IOMMUs with a smaller IOPTE size than PAGE_SIZE. Link: https://lore.kernel.org/r/8-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30iommufd: File descriptor, context, kconfig and makefilesJason Gunthorpe
This is the basic infrastructure of a new miscdevice to hold the iommufd IOCTL API. It provides: - A miscdevice to create file descriptors to run the IOCTL interface over - A table based ioctl dispatch and centralized extendable pre-validation step - An xarray mapping userspace ID's to kernel objects. The design has multiple inter-related objects held within in a single IOMMUFD fd - A simple usage count to build a graph of object relations and protect against hostile userspace racing ioctls The only IOCTL provided in this patch is the generic 'destroy any object by handle' operation. Link: https://lore.kernel.org/r/6-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-29iommu: Add device-centric DMA ownership interfacesLu Baolu
These complement the group interfaces used by VFIO and are for use by iommufd. The main difference is that multiple devices in the same group can all share the ownership by passing the same ownership pointer. Move the common code into shared functions. Link: https://lore.kernel.org/r/2-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-29iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCYJason Gunthorpe
This queries if a domain linked to a device should expect to support enforce_cache_coherency() so iommufd can negotiate the rules for when a domain should be shared or not. For iommufd a device that declares IOMMU_CAP_ENFORCE_CACHE_COHERENCY will not be attached to a domain that does not support it. Link: https://lore.kernel.org/r/1-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-27Merge tag 'usb-6.1-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes for 6.1-rc7 that resolve some reported problems: - cdnsp driver fixes for reported problems - dwc3 fixes for some small reported problems - uvc gadget driver fix for reported regression All of these have been in linux-next with no reported problems" * tag 'usb-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: cdnsp: fix issue with ZLP - added TD_SIZE = 1 usb: dwc3: gadget: Clear ep descriptor last usb: dwc3: exynos: Fix remove() function usb: cdnsp: Fix issue with Clear Feature Halt Endpoint usb: dwc3: gadget: Disable GUSB2PHYCFG.SUSPHY for End Transfer usb: gadget: uvc: also use try_format in set_format
2022-11-27Merge tag 'char-misc-6.1-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small driver fixes for 6.1-rc7, they include: - build warning fix for the vdso when using new versions of grep - iio driver fixes for reported issues - small nvmem driver fixes - fpga Kconfig fix - interconnect dt binding fix All of these have been in linux-next with no reported issues" * tag 'char-misc-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: lib/vdso: use "grep -E" instead of "egrep" nvmem: lan9662-otp: Change return type of lan9662_otp_wait_flag_clear() nvmem: rmem: Fix return value check in rmem_read() fpga: m10bmc-sec: Fix kconfig dependencies dt-bindings: iio: adc: Remove the property "aspeed,trim-data-valid" iio: adc: aspeed: Remove the trim valid dts property. iio: core: Fix entry not deleted when iio_register_sw_trigger_type() fails iio: accel: bma400: Fix memory leak in bma400_get_steps_reg() iio: light: rpr0521: add missing Kconfig dependencies iio: health: afe4404: Fix oob read in afe4404_[read|write]_raw iio: health: afe4403: Fix oob read in afe4403_read_raw iio: light: apds9960: fix wrong register for gesture gain dt-bindings: interconnect: qcom,msm8998-bwmon: Correct SC7280 CPU compatible
2022-11-27Merge tag 'timers_urgent_for_v6.1_rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Return the proper timer register width (31 bits) for a 32-bit signed register in order to avoid a timer interrupt storm on ARM XGene-1 hardware running in NO_HZ mode * tag 'timers_urgent_for_v6.1_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error
2022-11-27Merge tag 'x86_urgent_for_v6.1_rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - ioremap: mask out the bits which are not part of the physical address *after* the size computation is done to prevent any hypothetical ioremap failures - Change the MSR save/restore functionality during suspend to rely on flags denoting that the related MSRs are actually supported vs reading them and assuming they are (an Atom one allows reading but not writing, thus breaking this scheme at resume time) - prevent IV reuse in the AES-GCM communication scheme between SNP guests and the AMD secure processor * tag 'x86_urgent_for_v6.1_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioremap: Fix page aligned size calculation in __ioremap_caller() x86/pm: Add enumeration check before spec MSRs save/restore setup x86/tsx: Add a feature bit for TSX control MSR support virt/sev-guest: Prevent IV reuse in the SNP guest driver
2022-11-25Merge tag 'for-v6.1-rc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply fixes from Sebastian Reichel: - rk817: Two error handling fixes - ip5xxx: fix inter overflow in current calculation - ab8500: fix thermal zone probing * tag 'for-v6.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: power: supply: ab8500: Defer thermal zone probe power: supply: ip5xxx: Fix integer overflow in current_now calculation power: supply: rk817: Change rk817_chg_cur_to_reg to int power: supply: rk817: check correct variable
2022-11-25Merge tag 'block-6.1-2022-11-25' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - A few fixes for s390 sads (Stefan, Colin) - Ensure that ublk doesn't reorder requests, as that can be problematic on devices that need specific ordering (Ming) - Fix a queue reference leak in disk allocation handling (Christoph) * tag 'block-6.1-2022-11-25' of git://git.kernel.dk/linux: ublk_drv: don't forward io commands in reserve order s390/dasd: fix possible buffer overflow in copy_pair_show s390/dasd: fix no record found for raw_track_access s390/dasd: increase printing of debug data payload s390/dasd: Fix spelling mistake "Ivalid" -> "Invalid" blk-mq: fix queue reference leak on blk_mq_alloc_disk_for_queue failure
2022-11-25Merge tag 'regulator-fix-v6.1-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "This is more changes than I'd like this late although the diffstat is still fairly small, I kept on holding off as new fixes came in to give things time to soak in -next but should probably have tagged and sent an additional pull request earlier. There's some relatively large fixes to the twl6030 driver to fix issues with the TWL6032 variant which resulted from some work on the core TWL6030 driver, a couple of fixes for error handling paths (mostly in the core), and a nice stability fix for the sgl51000 driver that's been pulled out of a BSP" * tag 'regulator-fix-v6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: twl6030: fix get status of twl6032 regulators regulator: twl6030: re-add TWL6032_SUBCLASS regulator: slg51000: Wait after asserting CS pin regulator: core: fix UAF in destroy_regulator() regulator: rt5759: fix OOB in validate_desc() regulator: core: fix kobject release warning and memory leak in regulator_register()
2022-11-25Merge tag 'pm-6.1-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These revert a recent change in the schedutil cpufreq governor that had not been expected to make any functional difference, but turned out to introduce a performance regression, fix an initialization issue in the amd-pstate driver and make it actually replace the venerable ACPI cpufreq driver on the supported systems by default. Specifics: - Revert a recent schedutil cpufreq governor change that introduced a performace regression on Pixel 6 (Sam Wu) - Fix amd-pstate driver initialization after running the kernel via kexec (Wyes Karny) - Turn amd-pstate into a built-in driver which allows it to take precedence over acpi-cpufreq by default on supported systems and amend it with a mechanism to disable this behavior (Perry Yuan) - Update amd-pstate documentation in accordance with the other changes made to it (Perry Yuan)" * tag 'pm-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Documentation: add amd-pstate kernel command line options Documentation: amd-pstate: add driver working mode introduction cpufreq: amd-pstate: add amd-pstate driver parameter for mode selection cpufreq: amd-pstate: change amd-pstate driver to be built-in type cpufreq: amd-pstate: cpufreq: amd-pstate: reset MSR_AMD_PERF_CTL register at init Revert "cpufreq: schedutil: Move max CPU capacity to sugov_policy"
2022-11-25Merge tag 's390-6.1-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Fix size of incorrectly increased from four to eight bytes TOD field of crash dump save area. As result in case of kdump NT_S390_TODPREG ELF notes section contains correct value and "detected read beyond size of field" compiler warning goes away. - Fix memory leak in cryptographic Adjunct Processors (AP) module on initialization failure path. - Add Gerald Schaefer <gerald.schaefer@linux.ibm.com> and Alexander Gordeev <agordeev@linux.ibm.com> as S390 memory management maintainers. Also rename the S390 section to S390 ARCHITECTURE to be a bit more precise. * tag 's390-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: MAINTAINERS: add S390 MM section s390/crashdump: fix TOD programmable field size s390/ap: fix memory leak in ap_init_qci_info()
2022-11-25Merge tag 'hyperv-fixes-signed-20221125' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix IRTE allocation in Hyper-V PCI controller (Dexuan Cui) - Fix handling of SCSI srb_status and capacity change events (Michael Kelley) - Restore VP assist page after CPU offlining and onlining (Vitaly Kuznetsov) - Fix some memory leak issues in VMBus (Yang Yingliang) * tag 'hyperv-fixes-signed-20221125' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: fix possible memory leak in vmbus_device_register() Drivers: hv: vmbus: fix double free in the error path of vmbus_add_channel_work() PCI: hv: Only reuse existing IRTE allocation for Multi-MSI scsi: storvsc: Fix handling of srb_status and capacity change events x86/hyperv: Restore VP assist page after cpu offlining/onlining
2022-11-25Merge tag 'drm-fixes-2022-11-25' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly fixes, amdgpu has not quite settled down. Most of the changes are small, and the non-amdgpu ones are all fine. There are a bunch of DP MST DSC fixes that fix some issues introduced in a previous larger MST rework. The biggest one is mainly propagating some error values properly instead of bool returns, and I think it just looks large but doesn't really change anything too much, except propagating errors that are required to avoid deadlocks. I've gone over it and a few others and they've had some decent testing over the last few weeks. Summary: amdgpu: - amdgpu gang submit fix - DCN 3.1.4 fixes - DP MST DSC deadlock fixes - HMM userptr fixes - Fix Aldebaran CU occupancy reporting - GFX11 fixes - PSP suspend/resume fix - DCE12 KASAN fix - DCN 3.2.x fixes - Rotated cursor fix - SMU 13.x fix - DELL platform suspend/resume fixes - VCN4 SR-IOV fix - Display regression fix for polled connectors i915: - Fix GVT KVM reference count handling - Never purge busy TTM objects - Fix warn in intel_display_power_*_domain() functions dma-buf: - Use dma_fence_unwrap_for_each when importing sync files - Fix race in dma_heap_add() fbcon: - Fix use of uninitialized memory in logo" * tag 'drm-fixes-2022-11-25' of git://anongit.freedesktop.org/drm/drm: (30 commits) drm/amdgpu/vcn: re-use original vcn0 doorbell value drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info correctly when the edid is read" drm/amd/display: No display after resume from WB/CB drm/amdgpu: fix use-after-free during gpu recovery drm/amd/pm: update driver if header for smu_13_0_7 drm/amd/display: Fix rotated cursor offset calculation drm/amd/display: Use new num clk levels struct for max mclk index drm/amd/display: Avoid setting pixel rate divider to N/A drm/amd/display: Use viewport height for subvp mall allocation size drm/amd/display: Update soc bounding box for dcn32/dcn321 drm/amd/dc/dce120: Fix audio register mapping, stop triggering KASAN drm/amdgpu/psp: don't free PSP buffers on suspend fbcon: Use kzalloc() in fbcon_prepare_logo() dma-buf: fix racing conflict of dma_heap_add() drm/amd/amdgpu: reserve vm invalidation engine for firmware drm/amdgpu: Enable Aldebaran devices to report CU Occupancy drm/amdgpu: fix userptr HMM range handling v2 drm/amdgpu: always register an MMU notifier for userptr drm/amdgpu/dm/mst: Fix uninitialized var in pre_compute_mst_dsc_configs_for_state() drm/amdgpu/dm/dp_mst: Don't grab mst_mgr->lock when computing DSC state ...