path: root/drivers/iommu
Age    Commit message    Author
2023-11-29    iommufd: Add iommufd_ctx to iommufd_put_object()    (Jason Gunthorpe)
Will be used in the next patch. Link: https://lore.kernel.org/r/1-v2-ca9e00171c5b+123-iommufd_syz4_jgg@nvidia.com/ Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-11-27    Merge branch 'iommu/fixes' into core    (Joerg Roedel)
2023-11-27    iommu/apple-dart: Use readl instead of readl_relaxed for consistency    (Sven Peter)
While the readl_relaxed in apple_dart_suspend is correct, the rest of the driver uses the non-relaxed variants everywhere, and the single readl_relaxed is inconsistent and possibly confusing. Signed-off-by: Sven Peter <sven@svenpeter.dev> Acked-by: Hector Martin <marcan@marcan.st> Reviewed-by: Neal Gompa <neal@gompa.dev> Link: https://lore.kernel.org/r/20231126162009.17934-1-sven@svenpeter.dev Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu/apple-dart: Add support for t8103 USB4 DART    (Sven Peter)
This variant of the regular t8103 DART is used for the two USB4/Thunderbolt PCIe controllers. It supports 64 instead of 16 streams which requires a slightly different MMIO layout. Acked-by: Hector Martin <marcan@marcan.st> Signed-off-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20231126151701.16534-4-sven@svenpeter.dev Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu/apple-dart: Write to all DART_T8020_STREAM_SELECT    (Sven Peter)
We're about to add support for a DART variant that uses more than 16 streams and requires writing to two separate stream select registers when issuing TLB flushes. Acked-by: Hector Martin <marcan@marcan.st> Signed-off-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20231126151701.16534-3-sven@svenpeter.dev Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu: Extend LPAE page table format to support custom allocators    (Boris Brezillon)
We need that in order to implement the VM_BIND ioctl in the GPU driver targeting new Mali GPUs. VM_BIND is about executing MMU map/unmap requests asynchronously, possibly after waiting for external dependencies encoded as dma_fences. We intend to use the drm_sched framework to automate the dependency tracking and VM job dequeuing logic, but this comes with its own set of constraints, one of them being the fact that we are not allowed to allocate memory in drm_gpu_scheduler_ops::run_job(), to avoid this sort of deadlock:

 - a VM_BIND map job needs to allocate a page table to map some memory to the VM; no memory is available, so kswapd is kicked
 - the GPU driver shrinker backend ends up waiting on the fence attached to the VM map job, or on any other job fence depending on this VM operation

With custom allocators, we will be able to pre-reserve enough pages to guarantee the map/unmap operations we queued will take place without going through the system allocator. But we can also optimize allocation/reservation by not freeing pages immediately, so any upcoming page table allocation requests can be serviced by some free page table pool kept at the driver level. It might also be valuable for other aspects of GPU and similar use-cases, like fine-grained memory accounting and resource limiting. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20231124142434.1577550-3-boris.brezillon@collabora.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu: Allow passing custom allocators to pgtable drivers    (Boris Brezillon)
This will be useful for GPU drivers that want to keep page tables in a pool so they can:

 - keep freed page tables in a free pool and speed up upcoming page table allocations
 - batch page table allocation instead of allocating one page at a time
 - pre-reserve pages for the page tables needed for map/unmap operations, to ensure those operations don't try to allocate memory in paths where they're not allowed to block or fail

It might also be valuable for other aspects of GPU and similar use-cases, like fine-grained memory accounting and resource limiting. We will extend the Arm LPAE format to support custom allocators in a separate commit. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20231124142434.1577550-2-boris.brezillon@collabora.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
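For illustration, a minimal sketch of what a driver-supplied allocator pair could look like, serving page-table pages from a pool reserved before the job runs. The struct gpu_vm cookie and the gpu_vm_pool_get()/gpu_vm_pool_put() helpers are hypothetical, and the exact io-pgtable hook signatures may differ from what is shown:

    /* Hypothetical pool-backed io-pgtable allocator (sketch). */
    static void *gpu_pt_alloc(void *cookie, size_t size, gfp_t gfp)
    {
        struct gpu_vm *vm = cookie;

        /* Serve page-table pages from memory reserved up front. */
        return gpu_vm_pool_get(vm, size, gfp);
    }

    static void gpu_pt_free(void *cookie, void *pages, size_t size)
    {
        struct gpu_vm *vm = cookie;

        /* Keep the pages around for the next allocation request. */
        gpu_vm_pool_put(vm, pages, size);
    }

A driver would hand these callbacks to the pgtable code through its configuration structure when setting up the page table, so that map/unmap in the run_job() path never reaches the system allocator.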
2023-11-27    iommu/vt-d: Set variable intel_dirty_ops to static    (Kunwu Chan)
Fix the following warning: drivers/iommu/intel/iommu.c:302:30: warning: symbol 'intel_dirty_ops' was not declared. Should it be static? This variable is only used in its defining file, so it should be static. Fixes: f35f22cc760e ("iommu/vt-d: Access/Dirty bit support for SS domains") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20231120101025.1103404-1-chentao@kylinos.cn Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu/vt-d: Fix incorrect cache invalidation for mm notification    (Lu Baolu)
Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs") moved the secondary TLB invalidations into the TLB invalidation functions to ensure that all secondary TLB invalidations happen at the same time as the CPU invalidation and added a flush-all type of secondary TLB invalidation for the batched mode, where a range of [0, -1UL) is used to indicate that the range extends to the end of the address space. However, using an end address of -1UL caused an overflow in the Intel IOMMU driver, where the end address was rounded up to the next page. As a result, both the IOTLB and device ATC were not invalidated correctly. Add a flush all helper function and call it when the invalidation range is from 0 to -1UL, ensuring that the entire caches are invalidated correctly. Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs") Cc: stable@vger.kernel.org Cc: Huang Ying <ying.huang@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Tested-by: Luo Yuzhang <yuzhang.luo@intel.com> # QAT Tested-by: Tony Zhu <tony.zhu@intel.com> # DSA Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20231117090933.75267-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
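The guard is easiest to see in a hedged sketch; the flush_svm_all()/flush_svm_range() helper names here are assumptions, not the literal upstream code:

    static void cache_invalidate_range(struct intel_svm *svm,
                                       unsigned long start, unsigned long end)
    {
        if (start == 0 && end == -1UL) {
            /* Flush-all: invalidate the entire IOTLB and device ATC. */
            flush_svm_all(svm);          /* hypothetical flush-all helper */
            return;
        }
        /*
         * Safe now: end != -1UL, so aligning it up to the next page
         * boundary can no longer wrap around to zero.
         */
        flush_svm_range(svm, start, ALIGN(end, VTD_PAGE_SIZE));
    }

Without the early return, ALIGN(-1UL, VTD_PAGE_SIZE) overflows to 0 and the computed invalidation size becomes meaningless.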
2023-11-27    iommu/vt-d: Add MTL to quirk list to skip TE disabling    (Abdul Halim, Mohd Syazwan)
The VT-d spec requires (10.4.4 Global Command Register, TE field) that: Hardware implementations supporting DMA draining must drain any in-flight DMA read/write requests queued within the Root-Complex before switching address translation on or off and reflecting the status of the command through the TES field in the Global Status register. Unfortunately, some integrated graphic devices fail to do so after some kind of power state transition. As a result, the system might get stuck in iommu_disable_translation(), waiting for the completion of the TE transition. Add MTL to the quirk list for those devices and skip TE disabling if the quirk hits. Fixes: b1012ca8dc4f ("iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu") Cc: stable@vger.kernel.org Signed-off-by: Abdul Halim, Mohd Syazwan <mohd.syazwan.abdul.halim@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20231116022324.30120-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu/vt-d: Make context clearing consistent with context mapping    (Lu Baolu)
In the iommu probe_device path, domain_context_mapping() allows setting up the context entry for a non-PCI device. However, in the iommu release_device path, domain_context_clear() only clears context entries for PCI devices. Make domain_context_clear() behave consistently with domain_context_mapping() by clearing context entries for both PCI and non-PCI devices. Fixes: 579305f75d34 ("iommu/vt-d: Update to use PCI DMA aliases") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu/vt-d: Disable PCI ATS in legacy passthrough mode    (Lu Baolu)
When IOMMU hardware operates in legacy mode, the TT field of the context entry determines the translation type, with three supported types (Section 9.3 Context Entry):

 - DMA translation without device TLB support
 - DMA translation with device TLB support
 - Passthrough mode with translated and translation requests blocked

Device TLB support is absent when hardware is configured in passthrough mode. Disable the PCI ATS feature when IOMMU is configured for passthrough translation type in legacy (non-scalable) mode. Fixes: 0faa19a1515f ("iommu/vt-d: Decouple PASID & PRI enabling from SVA") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu/vt-d: Omit devTLB invalidation requests when TES=0    (Lu Baolu)
The latest VT-d spec indicates that when remapping hardware is disabled (TES=0 in Global Status Register), upstream ATS Invalidation Completion requests are treated as UR (Unsupported Request). Consequently, the spec recommends in section 4.3 Handling of Device-TLB Invalidations that software refrain from submitting any Device-TLB invalidation requests when address remapping hardware is disabled. Verify address remapping hardware is enabled prior to submitting Device-TLB invalidation requests. Fixes: 792fb43ce2c9 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
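A hedged sketch of the guard; the real change lives in the queued-invalidation submission path, and the signature shown is simplified:

    void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
                            u16 qdep, u64 addr, unsigned int mask)
    {
        /*
         * Per VT-d spec section 4.3, with TES=0 any ATS invalidation
         * completion is treated as UR, so submit nothing at all.
         */
        if (!(iommu->gcmd & DMA_GCMD_TE))
            return;

        /* ... build and submit the device-TLB invalidation descriptor ... */
    }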
2023-11-27    iommu/vt-d: Support enforce_cache_coherency only for empty domains    (Lu Baolu)
The enforce_cache_coherency callback ensures DMA cache coherency for devices attached to the domain. Intel IOMMU supports enforced DMA cache coherency when the Snoop Control bit in the IOMMU's extended capability register is set. Supporting it differs between legacy and scalable modes. In legacy mode, it's supported at page-level granularity by setting the SNP field in second-stage page-table entries. In scalable mode, it's supported at PASID-table granularity by setting the PGSNP field in PASID-table entries. In legacy mode, mappings created before attaching to a device have SNP fields cleared, while mappings created after the callback have them set. This means partial DMAs are cache coherent while others are not. One possible fix is replaying mappings and flipping SNP bits when attaching a domain to a device. But this seems to be over-engineered, given that all real use cases just attach an empty domain to a device. To meet practical needs while reducing mode differences, only support enforce_cache_coherency on a domain without mappings if the SNP field is used. Fixes: fc0051cb9590 ("iommu/vt-d: Check domain force_snooping against attached devices") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
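In sketch form, with the has_mappings field name an assumption based on the description above:

    static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
    {
        struct dmar_domain *dmar_domain = to_dmar_domain(domain);

        if (!domain_support_force_snooping(dmar_domain))
            return false;

        /* Legacy mode sets SNP per PTE: only safe on an empty domain. */
        if (!dmar_domain->use_first_level && dmar_domain->has_mappings)
            return false;

        domain_set_force_snooping(dmar_domain);
        dmar_domain->force_snooping = true;
        return true;
    }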
2023-11-27    iommu: Clean up open-coded ownership checks    (Robin Murphy)
Some drivers already implement their own defence against the possibility of being given someone else's device. Since this is now taken care of by the core code (and via a slightly different path from the original fwspec-based idea), let's clean them up. Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/58a9879ce3f03562bb061e6714fe6efb554c3907.1700589539.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu: Retire bus ops    (Robin Murphy)
With the rest of the API internals converted, it's time to finally tackle probe_device and how we bootstrap the per-device ops association to begin with. This ends up being disappointingly straightforward, since fwspec users are already doing it in order to find their of_xlate callback, and it works out that we can easily do the equivalent for other drivers too. Then shuffle the remaining awareness of iommu_ops into the couple of core headers that still need it, and breathe a sigh of relief. Ding dong the bus ops are gone! CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/a59011ef65b4b6657cb0b7a388d786b779b61305.1700589539.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu/arm-smmu: Don't register fwnode for legacy binding    (Robin Murphy)
When using the legacy binding we bypass the of_xlate mechanism, so avoid registering the instance fwnodes which act as keys for that. This will help __iommu_probe_device() to retrieve the registered ops the same way as for x86 etc. when no fwspec has previously been set up by of_xlate. Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/18b0f812a42a74dd6924aea24e68ab409d6e1b52.1700589539.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu: Decouple iommu_domain_alloc() from bus ops    (Robin Murphy)
As the final remaining piece of bus-dependent API, iommu_domain_alloc() can now take responsibility for the "one iommu_ops per bus" rule for itself. It turns out we can't safely make the internal allocation call any more group-based or device-based yet - that will have to wait until the external callers can pass the right thing - but we can at least get as far as deriving "bus ops" based on which driver is actually managing devices on the given bus, rather than whichever driver won the race to register first. This will then leave us able to convert the last of the core internals over to the IOMMU-instance model, allow multiple drivers to register and actually coexist (modulo the above limitation for unmanaged domain users in the short term), and start trying to solve the long-standing iommu_probe_device() mess. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/6c7313009aae0e39ae2855920990ebf85af4662f.1700589539.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu: Validate that devices match domains    (Robin Murphy)
Before we can allow drivers to coexist, we need to make sure that one driver's domain ops can't misinterpret another driver's dev_iommu_priv data. To that end, add a token to the domain so we can remember how it was allocated - for now this may as well be the device ops, since they still correlate 1:1 with drivers. We can trust ourselves for internal default domain attachment, so add checks to cover all the public attach interfaces. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/097c6f30480e4efe12195d00ba0e84ea4837fb4c.1700589539.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
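The shape of the check, as a sketch; the 'owner' field name is an assumption based on the token described above:

    static int iommu_check_dev_matches_domain(struct iommu_domain *domain,
                                              struct device *dev)
    {
        /* The domain remembers which driver's ops allocated it. */
        if (domain->owner != dev_iommu_ops(dev))
            return -EINVAL;  /* domain and device from different drivers */
        return 0;
    }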
2023-11-27    iommu: Decouple iommu_present() from bus ops    (Robin Murphy)
Much as I'd like to remove iommu_present(), the final remaining users are proving stubbornly difficult to clean up, so kick that can down the road and just rework it to preserve the current behaviour without depending on bus ops. Since commit 57365a04c921 ("iommu: Move bus setup to IOMMU device registration"), any registered IOMMU instance is already considered "present" for every entry in iommu_buses, so it's simply a case of validating the bus and checking we have at least one IOMMU. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/caa93680bb9d35a8facbcd8ff46267ca67335229.1700589539.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
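Which reduces iommu_present() to roughly the following (a sketch under the assumptions above):

    bool iommu_present(const struct bus_type *bus)
    {
        bool ret = false;
        int i;

        /* Valid bus plus at least one registered IOMMU instance. */
        for (i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
            if (iommu_buses[i] == bus) {
                spin_lock(&iommu_device_lock);
                ret = !list_empty(&iommu_device_list);
                spin_unlock(&iommu_device_lock);
            }
        }
        return ret;
    }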
2023-11-27    iommu: Factor out some helpers    (Robin Murphy)
The pattern for picking the first device out of the group list is repeated a few times now, so it's clearly worth factoring out, which also helps hide the iommu_group_dev detail from places that don't need to know. Similarly, the safety check for dev_iommu_ops() at certain public interfaces starts looking a bit repetitive, and might not be completely obvious at first glance, so let's factor that out for clarity as well, in preparation for more uses of both. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/566cbd161546caa6aed49662c9b3e8f09dc9c3cf.1700589539.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
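The two helpers being factored out look roughly like this; a sketch, and the exact upstream names may differ:

    /* Pick the first device out of the group's device list. */
    static struct device *iommu_group_first_dev(struct iommu_group *group)
    {
        lockdep_assert_held(&group->mutex);
        return list_first_entry(&group->devices,
                                struct group_device, list)->dev;
    }

    /* Safety check used at public interfaces before dev_iommu_ops(). */
    static bool dev_has_iommu(struct device *dev)
    {
        return dev->iommu && dev->iommu->iommu_dev;
    }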
2023-11-27    iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush    (Niklas Schnelle)
Add ops->flush_iotlb_all operation to enable virtio-iommu for the dma-iommu deferred flush scheme. This results in a significant increase in performance in exchange for a window in which devices can still access previously IOMMU mapped memory when running with CONFIG_IOMMU_DEFAULT_DMA_LAZY. The previous strict behavior can be achieved with iommu.strict=1 on the kernel command line or CONFIG_IOMMU_DEFAULT_DMA_STRICT. Link: https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/ Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20231120-viommu-sync-map-v3-2-50a57ecf78b5@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu/virtio: Make use of ops->iotlb_sync_map    (Niklas Schnelle)
Pull out the sync operation from viommu_map_pages() by implementing ops->iotlb_sync_map. This allows the common IOMMU code to map multiple elements of an sg with a single sync (see iommu_map_sg()). Link: https://lore.kernel.org/lkml/20230726111433.1105665-1-schnelle@linux.ibm.com/ Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20231120-viommu-sync-map-v3-1-50a57ecf78b5@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
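Both virtio-iommu ops added in this pair of commits boil down to one sync of the request queue; a hedged sketch:

    static int viommu_iotlb_sync_map(struct iommu_domain *domain,
                                     unsigned long iova, size_t size)
    {
        struct viommu_domain *vdomain = to_viommu_domain(domain);

        /* Nothing to sync if no endpoint is attached yet. */
        if (!vdomain->nr_endpoints)
            return 0;
        return viommu_sync_req(vdomain->viommu);
    }

    static void viommu_flush_iotlb_all(struct iommu_domain *domain)
    {
        struct viommu_domain *vdomain = to_viommu_domain(domain);

        if (!vdomain->nr_endpoints)
            return;
        viommu_sync_req(vdomain->viommu);
    }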
2023-11-27    iommu/amd: Set variable amd_dirty_ops to static    (Kunwu Chan)
Fix the following warning: drivers/iommu/amd/iommu.c:67:30: warning: symbol 'amd_dirty_ops' was not declared. Should it be static? This variable is only used in its defining file, so it should be static. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20231120095342.1102999-1-chentao@kylinos.cn Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu: Avoid more races around device probe    (Robin Murphy)
It turns out there are more subtle races beyond just the main part of __iommu_probe_device() itself running in parallel - the dev_iommu_free() on the way out of an unsuccessful probe can still manage to trip up concurrent accesses to a device's fwspec. Thus, extend the scope of iommu_probe_device_lock() to also serialise fwspec creation and initial retrieval. Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com> Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/ Fixes: 01657bc14a39 ("iommu: Avoid races around device probe") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: André Draszik <andre.draszik@linaro.org> Tested-by: André Draszik <andre.draszik@linaro.org> Link: https://lore.kernel.org/r/16f433658661d7cadfea51e7c65da95826112a2b.1700071477.git.robin.murphy@arm.com Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27    iommu: Flow ERR_PTR out from __iommu_domain_alloc()    (Jason Gunthorpe)
Most of the calling code now has error handling that can carry an error code further up the call chain. Keep the exported interface iommu_domain_alloc() returning NULL and reflow the internal code to use ERR_PTR not NULL for domain allocation failure. Optionally allow drivers to return ERR_PTR from any of the alloc ops. Many of the new ops (user, sva, etc) already return ERR_PTR, so having two rules is confusing and hard on drivers. This fixes a bug in DART that was returning ERR_PTR. Fixes: 482feb5c6492 ("iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging()") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/linux-iommu/b85e0715-3224-4f45-ad6b-ebb9f08c015d@moroto.mountain/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/0-v2-55ae413017b8+97-domain_alloc_err_ptr_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
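The resulting split between the exported wrapper and the internals, in sketch form (the __iommu_domain_alloc() signature shown here is simplified):

    /* Internals use ERR_PTR; the exported API keeps its NULL contract. */
    struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
    {
        struct iommu_domain *domain = __iommu_domain_alloc(bus);

        if (IS_ERR(domain))
            return NULL;  /* legacy callers only test for NULL */
        return domain;
    }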
2023-11-27    iommu: Map reserved memory as cacheable if device is coherent    (Laurentiu Tudor)
Check if the device is marked as DMA coherent in the DT and if so, map its reserved memory as cacheable in the IOMMU. This fixes the recently added IOMMU reserved memory support which uses IOMMU_RESV_DIRECT without properly building the PROT for the mapping. Fixes: a5bf3cfce8cb ("iommu: Implement of_iommu_get_resv_regions()") Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20230926152600.8749-1-laurentiu.tudor@nxp.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
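A sketch of the idea, building the mapping prot from the DT coherency attribute; the wrapper function here is illustrative, not the literal upstream diff:

    static struct iommu_resv_region *
    alloc_direct_resv_region(struct device *dev, phys_addr_t start,
                             size_t length)
    {
        int prot = IOMMU_READ | IOMMU_WRITE;

        /* Coherent masters get their direct mappings cacheable too. */
        if (of_dma_is_coherent(dev->of_node))
            prot |= IOMMU_CACHE;

        return iommu_alloc_resv_region(start, length, prot,
                                       IOMMU_RESV_DIRECT, GFP_KERNEL);
    }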
2023-11-21    x86/apic: Drop apic::delivery_mode    (Andrew Cooper)
This field is set to APIC_DELIVERY_MODE_FIXED in all cases, and is read exactly once. Fold the constant in uv_program_mmr() and drop the field. Searching for the origin of the stale HyperV comment reveals commit a31e58e129f7 ("x86/apic: Switch all APICs to Fixed delivery mode") which notes: As a consequence of this change, the apic::irq_delivery_mode field is now pointless, but this needs to be cleaned up in a separate patch. 6 years is long enough for this technical debt to have survived. [ bp: Fold in https://lore.kernel.org/r/20231121123034.1442059-1-andrew.cooper3@citrix.com ] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20231102-x86-apic-v1-1-bf049a2a0ed6@citrix.com
2023-11-09    Merge tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu    (Linus Torvalds)
Pull iommu updates from Joerg Roedel:

 "Core changes:
   - Make default-domains mandatory for all IOMMU drivers
   - Remove group refcounting
   - Add generic_single_device_group() helper and consolidate drivers
   - Cleanup map/unmap ops
   - Scaling improvements for the IOVA rcache depot
   - Convert dart & iommufd to the new domain_alloc_paging()

  ARM-SMMU:
   - Device-tree binding update:
      - Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC
   - SMMUv2:
      - Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs
   - SMMUv3:
      - Large refactoring of the context descriptor code to move the CD table into the master, paving the way for '->set_dev_pasid()' support on non-SVA domains
      - Minor cleanups to the SVA code

  Intel VT-d:
   - Enable debugfs to dump domain attached to a pasid
   - Remove an unnecessary inline function

  AMD IOMMU:
   - Initial patches for SVA support (not complete yet)

  S390 IOMMU:
   - DMA-API conversion and optimized IOTLB flushing

  And some smaller fixes and improvements"

* tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits)
  iommu/dart: Remove the force_bypass variable
  iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging()
  iommu/dart: Convert to domain_alloc_paging()
  iommu/dart: Move the blocked domain support to a global static
  iommu/dart: Use static global identity domains
  iommufd: Convert to alloc_domain_paging()
  iommu/vt-d: Use ops->blocked_domain
  iommu/vt-d: Update the definition of the blocking domain
  iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain
  Revert "iommu/vt-d: Remove unused function"
  iommu/amd: Remove DMA_FQ type from domain allocation path
  iommu: change iommu_map_sgtable to return signed values
  iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()
  iommu/vt-d: debugfs: Support dumping a specified page table
  iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}
  iommu/vt-d: debugfs: Dump entry pointing to huge page
  iommu/vt-d: Remove unused function
  iommu/arm-smmu-v3-sva: Remove bond refcount
  iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle
  iommu/arm-smmu-v3: Rename cdcfg to cd_table
  ...
2023-11-01    Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd    (Linus Torvalds)
Pull iommufd updates from Jason Gunthorpe:

 "This brings three new iommufd capabilities:

   - Dirty tracking for DMA. AMD/ARM/Intel CPUs can now record if a DMA writes to a page in the IOPTEs within the IO page table. This can be used to generate a record of what memory is being dirtied by DMA activities during a VM migration process. A VMM like qemu will combine the IOMMU dirty bits with the CPU's dirty log to determine what memory to transfer. VFIO already has a DMA dirty tracking framework that requires PCI devices to implement tracking HW internally. The iommufd version provides an alternative that the VMM can select, if available. The two are designed to have very similar APIs.

   - Userspace controlled attributes for hardware page tables (HWPT/iommu_domain). There are currently a few generic attributes for HWPTs (support dirty tracking, and parent of a nest). This is an entry point for the userspace iommu driver to control the HW in detail.

   - Nested translation support for HWPTs. This is a 2D translation scheme similar to the CPU where a DMA goes through a first stage to determine an intermediate address which is then translated through a second stage to a physical address. Like for CPU translation the first stage table would exist in VM controlled memory and the second stage is in the kernel and matches the VM's guest to physical map. As every IOMMU has a unique set of parameters to describe the S1 IO page table and its associated parameters the userspace IOMMU driver has to marshal the information into the correct format. This is 1/3 of the feature; it allows creating the nested translation and binding it to VFIO devices, however the API to support IOTLB and ATC invalidation of the stage 1 io page table, and forwarding of IO faults are still in progress.

  The series includes AMD and Intel support for dirty tracking. Intel support for nested translation.

  Along the way are a number of internal items:

   - New iommu core items: ops->domain_alloc_user(), ops->set_dirty_tracking, ops->read_and_clear_dirty(), IOMMU_DOMAIN_NESTED, and iommu_copy_struct_from_user
   - UAF fix in iopt_area_split()
   - Spelling fixes and some test suite improvement"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (52 commits)
  iommufd: Organize the mock domain alloc functions closer to Joerg's tree
  iommufd/selftest: Fix page-size check in iommufd_test_dirty()
  iommufd: Add iopt_area_alloc()
  iommufd: Fix missing update of domains_itree after splitting iopt_area
  iommu/vt-d: Disallow read-only mappings to nest parent domain
  iommu/vt-d: Add nested domain allocation
  iommu/vt-d: Set the nested domain to a device
  iommu/vt-d: Make domain attach helpers to be extern
  iommu/vt-d: Add helper to setup pasid nested translation
  iommu/vt-d: Add helper for nested domain allocation
  iommu/vt-d: Extend dmar_domain to support nested domain
  iommufd: Add data structure for Intel VT-d stage-1 domain allocation
  iommu/vt-d: Enhance capability check for nested parent domain allocation
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
  iommufd/selftest: Add nested domain allocation for mock domain
  iommu: Add iommu_copy_struct_from_user helper
  iommufd: Add a nested HW pagetable object
  iommu: Pass in parent domain with user_data to domain_alloc_user op
  iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
  iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
  ...
2023-11-01    Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic    (Linus Torvalds)
Pull ia64 removal and asm-generic updates from Arnd Bergmann:

 - The ia64 architecture gets its well-earned retirement as planned, now that there is one last (mostly) working release that will be maintained as an LTS kernel.

 - The architecture specific system call tables are updated for the added map_shadow_stack() syscall and to remove references to the long-gone sys_lookup_dcookie() syscall.

* tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  hexagon: Remove unusable symbols from the ptrace.h uapi
  asm-generic: Fix spelling of architecture
  arch: Reserve map_shadow_stack() syscall number for all architectures
  syscalls: Cleanup references to sys_lookup_dcookie()
  Documentation: Drop or replace remaining mentions of IA64
  lib/raid6: Drop IA64 support
  Documentation: Drop IA64 from feature descriptions
  kernel: Drop IA64 support from sig_fault handlers
  arch: Remove Itanium (IA-64) architecture
2023-10-30    iommufd: Organize the mock domain alloc functions closer to Joerg's tree    (Jason Gunthorpe)
Patches in Joerg's iommu tree convert the mock driver to use domain_alloc_paging(), and they clash badly with the way the selftest changes for nesting were structured. Massage the selftest so that it looks closer to the code after the domain_alloc_paging() conversion to ease the merge. Change __mock_domain_alloc_paging() into mock_domain_alloc_paging() in the same way as the iommu tree. The merge resolution then trivially takes both and deletes mock_domain_alloc(). Link: https://lore.kernel.org/r/0-v1-90a855762c96+19de-mock_merge_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-30    iommufd/selftest: Fix page-size check in iommufd_test_dirty()    (Joao Martins)
iommufd_test_dirty()/IOMMU_TEST_OP_DIRTY sets the dirty bits in the mock domain implementation that the userspace side validates against what it obtains via the UAPI. However, in introducing iommufd_test_dirty() it forgot to validate that page_size is non-zero, leading to two possible divide-by-zero problems: one at the beginning when calculating @max, and another while calculating the IOVA in the XArray PFN tracking list. While at it, also require a non-zero length, as we can't be allocating a 0-sized bitmap. Link: https://lore.kernel.org/r/20231030113446.7056-1-joao.m.martins@oracle.com Reported-by: syzbot+25dc7383c30ecdc83c38@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-iommu/00000000000005f6aa0608b9220f@google.com/ Fixes: a9af47e382a4 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
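The validation amounts to a guard of this shape (a sketch, not the literal diff):

    static int check_dirty_args(unsigned long length, unsigned long page_size)
    {
        /* page_size is a divisor below; length sizes the bitmap. */
        if (!page_size || !length)
            return -EINVAL;
        return 0;
    }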
2023-10-30    iommufd: Add iopt_area_alloc()    (Jason Gunthorpe)
We never initialize the two interval tree nodes, and zero fill is not the same as RB_CLEAR_NODE. This can hide issues where we missed adding the area to the trees. Factor out the allocation and clear the two nodes. Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping") Link: https://lore.kernel.org/r/20231030145035.GG691768@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
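Reconstructed from the description, the helper is essentially:

    static struct iopt_area *iopt_area_alloc(void)
    {
        struct iopt_area *area;

        area = kzalloc(sizeof(*area), GFP_KERNEL_ACCOUNT);
        if (!area)
            return NULL;
        /* Zero-fill is not the same as RB_CLEAR_NODE for rb_nodes. */
        RB_CLEAR_NODE(&area->node.rb);
        RB_CLEAR_NODE(&area->pages_node.rb);
        return area;
    }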
2023-10-30    iommufd: Fix missing update of domains_itree after splitting iopt_area    (Koichiro Den)
In iopt_area_split(), if the original iopt_area has filled a domain and is linked to domains_itree, pages_nodes have to be properly reinserted. Otherwise the domains_itree becomes corrupted and we will UAF. Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping") Link: https://lore.kernel.org/r/20231027162941.2864615-2-den@valinux.co.jp Cc: stable@vger.kernel.org Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-27    Merge branches 'iommu/fixes', 'arm/tegra', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd', 'core' and 's390' into next    (Joerg Roedel)
2023-10-27    iommu: Avoid unnecessary cache invalidations    (Lu Baolu)
The iommu_create_device_direct_mappings() only needs to flush the caches when the mappings are changed in the affected domain. This is not true for non-DMA domains, or for devices attached to the domain that have no reserved regions. To avoid unnecessary cache invalidations, add a check before iommu_flush_iotlb_all(). Fixes: a48ce36e2786 ("iommu: Prevent RESV_DIRECT devices from blocking domains") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Henry Willard <henry.willard@oracle.com> Link: https://lore.kernel.org/r/20231026084942.17387-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    Merge tag 'v6.6-rc7' into core    (Joerg Roedel)
Linux 6.6-rc7
2023-10-26    iommu/dart: Remove the force_bypass variable    (Jason Gunthorpe)
This flag just caches if the IO page size is larger than the CPU PAGE_SIZE. This only needs to be checked in two places so remove the confusingly named cache. dart would like to not support paging domains at all if the IO page size is larger than the CPU page size. In this case we should ideally fail domain_alloc_paging(), as there is no point in creating a domain that can never be attached. Move the test into apple_dart_finalize_domain(). The check in apple_dart_mod_streams() will prevent the domain from being attached to the wrong dart. There is no HW limitation that prevents BLOCKED domains from working, remove that test. The check in apple_dart_of_xlate() is redundant since the pgsize is checked immediately after. Remove it. Remove the variable. Suggested-by: Janne Grunau <j@jannau.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Janne Grunau <j@jannau.net> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/9-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging()    (Jason Gunthorpe)
In many cases the dev argument will now be !NULL so we should use it to finalize the domain at allocation. Make apple_dart_finalize_domain() accept the correct type. Reviewed-by: Janne Grunau <j@jannau.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/8-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    iommu/dart: Convert to domain_alloc_paging()    (Jason Gunthorpe)
Since the IDENTITY and BLOCKED behaviors were moved to global statics all that remains is the paging domain. Rename to apple_dart_attach_dev_paging() and remove the left over type check. Reviewed-by: Janne Grunau <j@jannau.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/7-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    iommu/dart: Move the blocked domain support to a global static    (Jason Gunthorpe)
Move to the new static global for blocked domains. Move the blocked specific code to apple_dart_attach_dev_blocked(). Reviewed-by: Janne Grunau <j@jannau.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/6-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    iommu/dart: Use static global identity domains    (Jason Gunthorpe)
Move to the new static global for identity domains. Move the identity specific code to apple_dart_attach_dev_identity(). Reviewed-by: Janne Grunau <j@jannau.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/5-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    iommufd: Convert to alloc_domain_paging()    (Jason Gunthorpe)
Move the global static blocked domain to the ops and convert the unmanaged domain to domain_alloc_paging. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/4-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    iommu/vt-d: Use ops->blocked_domain    (Jason Gunthorpe)
Trivially migrate to the ops->blocked_domain for the existing global static. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/3-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    iommu/vt-d: Update the definition of the blocking domain    (Jason Gunthorpe)
The global static should pre-define the type and the NOP free function can be now left as NULL. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/2-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain    (Jason Gunthorpe)
Following the pattern of identity domains, just assign the BLOCKED domain global statics to a value in ops. Update the core code to use the global static directly. Update powerpc to use the new scheme and remove its empty domain_alloc callback. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/1-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26    iommu/vt-d: Disallow read-only mappings to nest parent domain    (Lu Baolu)
When remapping hardware is configured by system software in scalable mode as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry, it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled) in first-stage page-table entries even when second-stage mappings indicate that corresponding first-stage page-table is Read-Only. As a result, contents of pages designated by the VMM as Read-Only can be modified by the IOMMU via PML5E (PML4E for 4-level tables) access as part of the address translation process due to DMAs issued by the guest. This disallows read-only mappings in a domain that is supposed to be used as a nested parent. Reference from Sapphire Rapids Specification Update [1], errata details, SPR17. Userspace can discover this limitation by checking the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the IOMMU_GET_HW_INFO ioctl. [1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html Link: https://lore.kernel.org/r/20231026044216.64964-9-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
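A hedged sketch of the enforcement in the map path; the nested_parent flag name follows the series described here, and the helper wrapping is illustrative:

    static int nest_parent_check_prot(struct dmar_domain *dmar_domain,
                                      int iommu_prot)
    {
        /* SPR17 erratum: A/D-bit writes can dirty RO second-stage pages. */
        if (dmar_domain->nested_parent && !(iommu_prot & IOMMU_WRITE))
            return -EINVAL;
        return 0;
    }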
2023-10-26    iommu/vt-d: Add nested domain allocation    (Lu Baolu)
This adds support for the IOMMU_HWPT_DATA_VTD_S1 type. A 'nested_parent' flag is added to mark the nested parent domain so that the input parent domain can be sanitized. Link: https://lore.kernel.org/r/20231026044216.64964-8-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26    iommu/vt-d: Set the nested domain to a device    (Yi Liu)
This adds the helper for setting the nested domain to a device, hence enabling nested domain usage on Intel VT-d. Link: https://lore.kernel.org/r/20231026044216.64964-7-yi.l.liu@intel.com Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>