From 791c2b17fb4023f21c3cbf5f268af01d9b8cb7cc Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Thu, 13 Apr 2023 14:40:25 +0100 Subject: iommu: Optimise PCI SAC address trick Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for PCI SAC address trick") and its subsequent revert, this mechanism no longer serves its original purpose, but now only works around broken hardware/drivers in a way that is unfortunately too impactful to remove. This does not, however, prevent us from solving the performance impact which that workaround has on large-scale systems that don't need it. Once the 32-bit IOVA space fills up and a workload starts allocating and freeing on both sides of the boundary, the opportunistic SAC allocation can then end up spending significant time hunting down scattered fragments of free 32-bit space, or just reestablishing max32_alloc_size. This can easily be exacerbated by a change in allocation pattern, such as by changing the network MTU, which can increase pressure on the 32-bit space by leaving a large quantity of cached IOVAs which are now the wrong size to be recycled, but also won't be freed since the non-opportunistic allocations can still be satisfied from the whole 64-bit space without triggering the reclaim path. However, in the context of a workaround where smaller DMA addresses aren't simply a preference but a necessity, if we get to that point at all then in fact it's already the endgame. The nature of the allocator is currently such that the first IOVA we give to a device after the 32-bit space runs out will be the highest possible address for that device, ever. If that works, then great, we know we can optimise for speed by always allocating from the full range. And if it doesn't, then the worst has already happened and any brokenness is now showing, so there's little point in continuing to try to hide it. To that end, implement a flag to refine the SAC business into a per-device policy that can automatically get itself out of the way if and when it stops being useful. CC: Linus Torvalds CC: Jakub Kicinski Reviewed-by: John Garry Signed-off-by: Robin Murphy Tested-by: Vasant Hegde Tested-by: Jakub Kicinski Link: https://lore.kernel.org/r/b8502b115b915d2a3fabde367e099e39106686c8.1681392791.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux/iommu.h') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index d31642596675..b1dcb1b9b170 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -409,6 +409,7 @@ struct iommu_fault_param { * @priv: IOMMU Driver private data * @max_pasids: number of PASIDs this device can consume * @attach_deferred: the dma domain attachment is deferred + * @pci_32bit_workaround: Limit DMA allocations to 32-bit IOVAs * * TODO: migrate other per device data pointers under iommu_dev_data, e.g. * struct iommu_group *iommu_group; @@ -422,6 +423,7 @@ struct dev_iommu { void *priv; u32 max_pasids; u32 attach_deferred:1; + u32 pci_32bit_workaround:1; }; int iommu_device_register(struct iommu_device *iommu, -- cgit From 4298780126c298f20ae4bc8676591eaf8c48767e Mon Sep 17 00:00:00 2001 From: Jacob Pan Date: Wed, 9 Aug 2023 20:47:54 +0800 Subject: iommu: Generalize PASID 0 for normal DMA w/o PASID PCIe Process address space ID (PASID) is used to tag DMA traffic, it provides finer grained isolation than requester ID (RID). For each device/RID, 0 is a special PASID for the normal DMA (no PASID). This is universal across all architectures that supports PASID, therefore warranted to be reserved globally and declared in the common header. Consequently, we can avoid the conflict between different PASID use cases in the generic code. e.g. SVA and DMA API with PASIDs. This paved away for device drivers to choose global PASID policy while continue doing normal DMA. Noting that VT-d could support none-zero RID/NO_PASID, but currently not used. Reviewed-by: Lu Baolu Reviewed-by: Kevin Tian Reviewed-by: Jean-Philippe Brucker Reviewed-by: Jason Gunthorpe Signed-off-by: Jacob Pan Link: https://lore.kernel.org/r/20230802212427.1497170-2-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux/iommu.h') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index d31642596675..2870bc29d456 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -196,6 +196,7 @@ enum iommu_dev_features { IOMMU_DEV_FEAT_IOPF, }; +#define IOMMU_NO_PASID (0U) /* Reserved for DMA w/o PASID */ #define IOMMU_PASID_INVALID (-1U) typedef unsigned int ioasid_t; -- cgit From 2dcebc7ddce7ffd4015824227c7623558b89d721 Mon Sep 17 00:00:00 2001 From: Jacob Pan Date: Wed, 9 Aug 2023 20:47:55 +0800 Subject: iommu: Move global PASID allocation from SVA to core Intel ENQCMD requires a single PASID to be shared between multiple devices, as the PASID is stored in a single MSR register per-process and userspace can use only that one PASID. This means that the PASID allocation for any ENQCMD using device driver must always come from a shared global pool, regardless of what kind of domain the PASID will be used with. Split the code for the global PASID allocator into iommu_alloc/free_global_pasid() so that drivers can attach non-SVA domains to PASIDs as well. This patch moves global PASID allocation APIs from SVA to IOMMU APIs. Reserved PASIDs, currently only RID_PASID, are excluded from the global PASID allocation. It is expected that device drivers will use the allocated PASIDs to attach to appropriate IOMMU domains for use. Reviewed-by: Lu Baolu Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Signed-off-by: Jacob Pan Link: https://lore.kernel.org/r/20230802212427.1497170-3-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux/iommu.h') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 2870bc29d456..6b821cb715d5 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -197,6 +197,7 @@ enum iommu_dev_features { }; #define IOMMU_NO_PASID (0U) /* Reserved for DMA w/o PASID */ +#define IOMMU_FIRST_GLOBAL_PASID (1U) /*starting range for allocation */ #define IOMMU_PASID_INVALID (-1U) typedef unsigned int ioasid_t; @@ -728,6 +729,8 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct iommu_domain * iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid, unsigned int type); +ioasid_t iommu_alloc_global_pasid(struct device *dev); +void iommu_free_global_pasid(ioasid_t pasid); #else /* CONFIG_IOMMU_API */ struct iommu_ops {}; @@ -1089,6 +1092,13 @@ iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid, { return NULL; } + +static inline ioasid_t iommu_alloc_global_pasid(struct device *dev) +{ + return IOMMU_PASID_INVALID; +} + +static inline void iommu_free_global_pasid(ioasid_t pasid) {} #endif /* CONFIG_IOMMU_API */ /** -- cgit From a48ce36e2786f8bb33e9f42ea2d0c23ea4af3e0d Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 9 Aug 2023 20:48:02 +0800 Subject: iommu: Prevent RESV_DIRECT devices from blocking domains The IOMMU_RESV_DIRECT flag indicates that a memory region must be mapped 1:1 at all times. This means that the region must always be accessible to the device, even if the device is attached to a blocking domain. This is equal to saying that IOMMU_RESV_DIRECT flag prevents devices from being attached to blocking domains. This also implies that devices that implement RESV_DIRECT regions will be prevented from being assigned to user space since taking the DMA ownership immediately switches to a blocking domain. The rule of preventing devices with the IOMMU_RESV_DIRECT regions from being assigned to user space has existed in the Intel IOMMU driver for a long time. Now, this rule is being lifted up to a general core rule, as other architectures like AMD and ARM also have RMRR-like reserved regions. This has been discussed in the community mailing list and refer to below link for more details. Other places using unmanaged domains for kernel DMA must follow the iommu_get_resv_regions() and setup IOMMU_RESV_DIRECT - we do not restrict them in the core code. Cc: Robin Murphy Cc: Alex Williamson Cc: Kevin Tian Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/linux-iommu/BN9PR11MB5276E84229B5BD952D78E9598C639@BN9PR11MB5276.namprd11.prod.outlook.com Signed-off-by: Lu Baolu Reviewed-by: Jason Gunthorpe Acked-by: Joerg Roedel Link: https://lore.kernel.org/r/20230724060352.113458-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux/iommu.h') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6b821cb715d5..54bae452975f 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -411,6 +411,7 @@ struct iommu_fault_param { * @priv: IOMMU Driver private data * @max_pasids: number of PASIDs this device can consume * @attach_deferred: the dma domain attachment is deferred + * @require_direct: device requires IOMMU_RESV_DIRECT regions * * TODO: migrate other per device data pointers under iommu_dev_data, e.g. * struct iommu_group *iommu_group; @@ -424,6 +425,7 @@ struct dev_iommu { void *priv; u32 max_pasids; u32 attach_deferred:1; + u32 require_direct:1; }; int iommu_device_register(struct iommu_device *iommu, -- cgit