summaryrefslogtreecommitdiff
path: root/drivers/iommu/intel
AgeCommit message (Collapse)Author
2024-02-21iommu/vt-d: Fix constant-out-of-range warningArnd Bergmann
On 32-bit builds, the vt-d driver causes a warning with clang: drivers/iommu/intel/nested.c:112:13: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned long' is always false [-Werror,-Wtautological-constant-out-of-range-compare] 112 | if (npages == U64_MAX) | ~~~~~~ ^ ~~~~~~~ Make the variable a 64-bit type, which matches both the caller and the use anyway. Fixes: f6f3721244a8 ("iommu/vt-d: Add iotlb flush for nested domain") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240213095832.455245-1-arnd@kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21iommu/vt-d: Set SSADE when attaching to a parent with dirty trackingYi Liu
Should set the SSADE (Second Stage Access/Dirty bit Enable) bit of the pasid entry when attaching a device to a nested domain if its parent has already enabled dirty tracking. Fixes: 111bf85c68f6 ("iommu/vt-d: Add helper to setup pasid nested translation") Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20240208091414.28133-1-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21iommu/vt-d: Add missing dirty tracking set for parent domainYi Liu
Setting dirty tracking for a s2 domain requires to loop all the related devices and set the dirty tracking enable bit in the PASID table entry. This includes the devices that are attached to the nested domains of a s2 domain if this s2 domain is used as parent. However, the existing dirty tracking set only loops s2 domain's own devices. It will miss dirty page logs in the parent domain. Now, the parent domain tracks the nested domains, so it can loop the nested domains and the devices attached to the nested domains to ensure dirty tracking on the parent is set completely. Fixes: b41e38e22539 ("iommu/vt-d: Add nested domain allocation") Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240208082307.15759-9-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21iommu/vt-d: Wrap the dirty tracking loop to be a helperYi Liu
Add device_set_dirty_tracking() to loop all the devices and set the dirty tracking per the @enable parameter. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20240208082307.15759-8-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()Yi Liu
The only usage of input @domain is to get the domain id (DID) to flush cache after setting dirty tracking. However, DID can be obtained from the pasid entry. So no need to pass in domain. This can make this helper cleaner when adding the missing dirty tracking for the parent domain, which needs to use the DID of nested domain. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240208082307.15759-7-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21iommu/vt-d: Add missing device iotlb flush for parent domainYi Liu
ATS-capable devices cache the result of nested translation. This result relies on the mappings in s2 domain (a.k.a. parent). When there are modifications in the s2 domain, the related nested translation caches on the device should be flushed. This includes the devices that are attached to the s1 domain. However, the existing code ignores this fact to only loops its own devices. As there is no easy way to identify the exact set of nested translations affected by the change of s2 domain. So, this just flushes the entire device iotlb on the device. As above, driver loops the s2 domain's s1_domains list and loops the devices list of each s1_domain to flush the entire device iotlb on the devices. Fixes: b41e38e22539 ("iommu/vt-d: Add nested domain allocation") Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240208082307.15759-6-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21iommu/vt-d: Update iotlb in nested domain attachYi Liu
Should call domain_update_iotlb() to update the has_iotlb_device flag of the domain after attaching device to nested domain. Without it, this flag is not set properly and would result in missing device TLB flush. Fixes: 9838f2bb6b6b ("iommu/vt-d: Set the nested domain to a device") Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240208082307.15759-5-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21iommu/vt-d: Add missing iotlb flush for parent domainYi Liu
If a domain is used as the parent in nested translation its mappings might be cached using DID of the nested domain. But the existing code ignores this fact to only invalidate the iotlb entries tagged by the domain's own DID. Loop the s1_domains list, if any, to invalidate all iotlb entries related to the target s2 address range. According to VT-d spec there is no need for software to explicitly flush the affected s1 cache. It's implicitly done by HW when s2 cache is invalidated. Fixes: b41e38e22539 ("iommu/vt-d: Add nested domain allocation") Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240208082307.15759-4-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21iommu/vt-d: Add __iommu_flush_iotlb_psi()Yi Liu
Add __iommu_flush_iotlb_psi() to do the psi iotlb flush with a DID input rather than calculating it within the helper. This is useful when flushing cache for parent domain which reuses DIDs of its nested domains. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240208082307.15759-3-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-21iommu/vt-d: Track nested domains in parentYi Liu
Today the parent domain (s2_domain) is unaware of which DID's are used by and which devices are attached to nested domains (s1_domain) nested on it. This leads to a problem that some operations (flush iotlb/devtlb and enable dirty tracking) on parent domain only apply to DID's and devices directly tracked in the parent domain hence are incomplete. This tracks the nested domains in list in parent domain. With this, operations on parent domain can loop the nested domains and refer to the devices and iommu_array to ensure the operations on parent domain take effect on all the affected devices and iommus. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240208082307.15759-2-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-01-18Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "This brings the first of three planned user IO page table invalidation operations: - IOMMU_HWPT_INVALIDATE allows invalidating the IOTLB integrated into the iommu itself. The Intel implementation will also generate an ATC invalidation to flush the device IOTLB as it unambiguously knows the device, but other HW will not. It goes along with the prior PR to implement userspace IO page tables (aka nested translation for VMs) to allow Intel to have full functionality for simple cases. An Intel implementation of the operation is provided. Also fix a small bug in the selftest mock iommu driver probe" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd/selftest: Check the bus type during probe iommu/vt-d: Add iotlb flush for nested domain iommufd: Add data structure for Intel VT-d stage-1 cache invalidation iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op iommufd/selftest: Add mock_domain_cache_invalidate_user support iommu: Add iommu_copy_struct_from_user_array helper iommufd: Add IOMMU_HWPT_INVALIDATE iommu: Add cache_invalidate_user op
2024-01-18Merge tag 'iommu-updates-v6.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Fix race conditions in device probe path - Retire IOMMU bus_ops - Support for passing custom allocators to page table drivers - Clean up Kconfig around IOMMU_SVA - Support for sharing SVA domains with all devices bound to a mm - Firmware data parsing cleanup - Tracing improvements for iommu-dma code - Some smaller fixes and cleanups ARM-SMMU drivers: - Device-tree binding updates: - Add additional compatible strings for Qualcomm SoCs - Document Adreno clocks for Qualcomm's SM8350 SoC - SMMUv2: - Implement support for the ->domain_alloc_paging() callback - Ensure Secure context is restored following suspend of Qualcomm SMMU implementation - SMMUv3: - Disable stalling mode for the "quiet" context descriptor - Minor refactoring and driver cleanups Intel VT-d driver: - Cleanup and refactoring AMD IOMMU driver: - Improve IO TLB invalidation logic - Small cleanups and improvements Rockchip IOMMU driver: - DT binding update to add Rockchip RK3588 Apple DART driver: - Apple M1 USB4/Thunderbolt DART support - Cleanups Virtio IOMMU driver: - Add support for iotlb_sync_map - Enable deferred IO TLB flushes" * tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (66 commits) iommu: Don't reserve 0-length IOVA region iommu/vt-d: Move inline helpers to header files iommu/vt-d: Remove unused vcmd interfaces iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through() iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directly iommu/sva: Fix memory leak in iommu_sva_bind_device() dt-bindings: iommu: rockchip: Add Rockchip RK3588 iommu/dma: Trace bounce buffer usage when mapping buffers iommu/arm-smmu: Convert to domain_alloc_paging() iommu/arm-smmu: Pass arm_smmu_domain to internal functions iommu/arm-smmu: Implement IOMMU_DOMAIN_BLOCKED iommu/arm-smmu: Convert to a global static identity domain iommu/arm-smmu: Reorganize arm_smmu_domain_add_master() iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Master cannot be NULL in arm_smmu_write_strtab_ent() iommu/arm-smmu-v3: Add a type for the STE iommu/arm-smmu-v3: disable stall for quiet_cd iommu/qcom: restore IOMMU state if needed iommu/arm-smmu-qcom: Add QCM2290 MDSS compatible iommu/arm-smmu-qcom: Add missing GMU entry to match table ...
2024-01-11iommu/vt-d: Add iotlb flush for nested domainLu Baolu
This implements the .cache_invalidate_user() callback to support iotlb flush for nested domain. Link: https://lore.kernel.org/r/20240111041015.47920-9-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Co-developed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-01-08Merge tag 'x86-apic-2024-01-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: - Clean up 'struct apic': - Drop ::delivery_mode - Drop 'enum apic_delivery_modes' - Drop 'struct local_apic' - Fix comments * tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Remove unfinished sentence from comment x86/apic: Drop struct local_apic x86/apic: Drop enum apic_delivery_modes x86/apic: Drop apic::delivery_mode
2024-01-03Merge branches 'apple/dart', 'arm/rockchip', 'arm/smmu', 'virtio', ↵Joerg Roedel
'x86/vt-d', 'x86/amd' and 'core' into next
2023-12-19iommu/vt-d: Move inline helpers to header filesLu Baolu
Move inline helpers to header files so that other files can use them without duplicating the code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231116015048.29675-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-19iommu/vt-d: Remove unused vcmd interfacesLu Baolu
Commit 99b5726b4423 ("iommu: Remove ioasid infrastructure") has removed ioasid allocation interfaces from the iommu subsystem. As a result, these vcmd interfaces have become obsolete. Remove them to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20231116015048.29675-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-19iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through()Lu Baolu
The domain parameter of this helper is unused and can be deleted to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231116015048.29675-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-19iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directlyLu Baolu
The device_to_iommu() helper was originally designed to look up the DMAR ACPI table to retrieve the iommu device and the request ID for a given device. However, it was also being used in other places where there was no need to lookup the ACPI table at all. Retrieve the iommu device directly from the per-device iommu private data in functions called after device is probed. Rename the original device_to_iommu() function to a more meaningful name, device_lookup_iommu(), to avoid mis-using it. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20231116015048.29675-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-12iommu: Mark dev_iommu_priv_set() with a lockdepJason Gunthorpe
A perfect driver would only call dev_iommu_priv_set() from its probe callback. We've made it functionally correct to call it from the of_xlate by adding a lock around that call. lockdep assert that iommu_probe_device_lock is held to discourage misuse. Exclude PPC kernels with CONFIG_FSL_PAMU turned on because FSL_PAMU uses a global static for its priv and abuses priv for its domain. Remove the pointless stores of NULL, all these are on paths where the core code will free dev->iommu after the op returns. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Hector Martin <marcan@marcan.st> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v2-16e4def25ebb+820-iommu_fwspec_p1_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-12iommu/vt-d: Remove mm->pasid in intel_sva_bind_mm()Tina Zhang
The pasid is passed in as a parameter through .set_dev_pasid() callback. Thus, intel_sva_bind_mm() can directly use it instead of retrieving the pasid value from mm->pasid. Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Tina Zhang <tina.zhang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20231027000525.1278806-3-tina.zhang@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27iommu/vt-d: Set variable intel_dirty_ops to staticKunwu Chan
Fix the following warning: drivers/iommu/intel/iommu.c:302:30: warning: symbol 'intel_dirty_ops' was not declared. Should it be static? This variable is only used in its defining file, so it should be static. Fixes: f35f22cc760e ("iommu/vt-d: Access/Dirty bit support for SS domains") Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20231120101025.1103404-1-chentao@kylinos.cn Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27iommu/vt-d: Fix incorrect cache invalidation for mm notificationLu Baolu
Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs") moved the secondary TLB invalidations into the TLB invalidation functions to ensure that all secondary TLB invalidations happen at the same time as the CPU invalidation and added a flush-all type of secondary TLB invalidation for the batched mode, where a range of [0, -1UL) is used to indicates that the range extends to the end of the address space. However, using an end address of -1UL caused an overflow in the Intel IOMMU driver, where the end address was rounded up to the next page. As a result, both the IOTLB and device ATC were not invalidated correctly. Add a flush all helper function and call it when the invalidation range is from 0 to -1UL, ensuring that the entire caches are invalidated correctly. Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs") Cc: stable@vger.kernel.org Cc: Huang Ying <ying.huang@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Tested-by: Luo Yuzhang <yuzhang.luo@intel.com> # QAT Tested-by: Tony Zhu <tony.zhu@intel.com> # DSA Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20231117090933.75267-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27iommu/vt-d: Add MTL to quirk list to skip TE disablingAbdul Halim, Mohd Syazwan
The VT-d spec requires (10.4.4 Global Command Register, TE field) that: Hardware implementations supporting DMA draining must drain any in-flight DMA read/write requests queued within the Root-Complex before switching address translation on or off and reflecting the status of the command through the TES field in the Global Status register. Unfortunately, some integrated graphic devices fail to do so after some kind of power state transition. As the result, the system might stuck in iommu_disable_translation(), waiting for the completion of TE transition. Add MTL to the quirk list for those devices and skips TE disabling if the qurik hits. Fixes: b1012ca8dc4f ("iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu") Cc: stable@vger.kernel.org Signed-off-by: Abdul Halim, Mohd Syazwan <mohd.syazwan.abdul.halim@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20231116022324.30120-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27iommu/vt-d: Make context clearing consistent with context mappingLu Baolu
In the iommu probe_device path, domain_context_mapping() allows setting up the context entry for a non-PCI device. However, in the iommu release_device path, domain_context_clear() only clears context entries for PCI devices. Make domain_context_clear() behave consistently with domain_context_mapping() by clearing context entries for both PCI and non-PCI devices. Fixes: 579305f75d34 ("iommu/vt-d: Update to use PCI DMA aliases") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27iommu/vt-d: Disable PCI ATS in legacy passthrough modeLu Baolu
When IOMMU hardware operates in legacy mode, the TT field of the context entry determines the translation type, with three supported types (Section 9.3 Context Entry): - DMA translation without device TLB support - DMA translation with device TLB support - Passthrough mode with translated and translation requests blocked Device TLB support is absent when hardware is configured in passthrough mode. Disable the PCI ATS feature when IOMMU is configured for passthrough translation type in legacy (non-scalable) mode. Fixes: 0faa19a1515f ("iommu/vt-d: Decouple PASID & PRI enabling from SVA") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27iommu/vt-d: Omit devTLB invalidation requests when TES=0Lu Baolu
The latest VT-d spec indicates that when remapping hardware is disabled (TES=0 in Global Status Register), upstream ATS Invalidation Completion requests are treated as UR (Unsupported Request). Consequently, the spec recommends in section 4.3 Handling of Device-TLB Invalidations that software refrain from submitting any Device-TLB invalidation requests when address remapping hardware is disabled. Verify address remapping hardware is enabled prior to submitting Device- TLB invalidation requests. Fixes: 792fb43ce2c9 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27iommu/vt-d: Support enforce_cache_coherency only for empty domainsLu Baolu
The enforce_cache_coherency callback ensures DMA cache coherency for devices attached to the domain. Intel IOMMU supports enforced DMA cache coherency when the Snoop Control bit in the IOMMU's extended capability register is set. Supporting it differs between legacy and scalable modes. In legacy mode, it's supported page-level by setting the SNP field in second-stage page-table entries. In scalable mode, it's supported in PASID-table granularity by setting the PGSNP field in PASID-table entries. In legacy mode, mappings before attaching to a device have SNP fields cleared, while mappings after the callback have them set. This means partial DMAs are cache coherent while others are not. One possible fix is replaying mappings and flipping SNP bits when attaching a domain to a device. But this seems to be over-engineered, given that all real use cases just attach an empty domain to a device. To meet practical needs while reducing mode differences, only support enforce_cache_coherency on a domain without mappings if SNP field is used. Fixes: fc0051cb9590 ("iommu/vt-d: Check domain force_snooping against attached devices") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-21x86/apic: Drop apic::delivery_modeAndrew Cooper
This field is set to APIC_DELIVERY_MODE_FIXED in all cases, and is read exactly once. Fold the constant in uv_program_mmr() and drop the field. Searching for the origin of the stale HyperV comment reveals commit a31e58e129f7 ("x86/apic: Switch all APICs to Fixed delivery mode") which notes: As a consequence of this change, the apic::irq_delivery_mode field is now pointless, but this needs to be cleaned up in a separate patch. 6 years is long enough for this technical debt to have survived. [ bp: Fold in https://lore.kernel.org/r/20231121123034.1442059-1-andrew.cooper3@citrix.com ] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20231102-x86-apic-v1-1-bf049a2a0ed6@citrix.com
2023-11-09Merge tag 'iommu-updates-v6.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Make default-domains mandatory for all IOMMU drivers - Remove group refcounting - Add generic_single_device_group() helper and consolidate drivers - Cleanup map/unmap ops - Scaling improvements for the IOVA rcache depot - Convert dart & iommufd to the new domain_alloc_paging() ARM-SMMU: - Device-tree binding update: - Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC - SMMUv2: - Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs - SMMUv3: - Large refactoring of the context descriptor code to move the CD table into the master, paving the way for '->set_dev_pasid()' support on non-SVA domains - Minor cleanups to the SVA code Intel VT-d: - Enable debugfs to dump domain attached to a pasid - Remove an unnecessary inline function AMD IOMMU: - Initial patches for SVA support (not complete yet) S390 IOMMU: - DMA-API conversion and optimized IOTLB flushing And some smaller fixes and improvements" * tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits) iommu/dart: Remove the force_bypass variable iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging() iommu/dart: Convert to domain_alloc_paging() iommu/dart: Move the blocked domain support to a global static iommu/dart: Use static global identity domains iommufd: Convert to alloc_domain_paging() iommu/vt-d: Use ops->blocked_domain iommu/vt-d: Update the definition of the blocking domain iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain Revert "iommu/vt-d: Remove unused function" iommu/amd: Remove DMA_FQ type from domain allocation path iommu: change iommu_map_sgtable to return signed values iommu/virtio: Add __counted_by for struct viommu_request and use struct_size() iommu/vt-d: debugfs: Support dumping a specified page table iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid} iommu/vt-d: debugfs: Dump entry pointing to huge page iommu/vt-d: Remove unused function iommu/arm-smmu-v3-sva: Remove bond refcount iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle iommu/arm-smmu-v3: Rename cdcfg to cd_table ...
2023-11-01Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "This brings three new iommufd capabilities: - Dirty tracking for DMA. AMD/ARM/Intel CPUs can now record if a DMA writes to a page in the IOPTEs within the IO page table. This can be used to generate a record of what memory is being dirtied by DMA activities during a VM migration process. A VMM like qemu will combine the IOMMU dirty bits with the CPU's dirty log to determine what memory to transfer. VFIO already has a DMA dirty tracking framework that requires PCI devices to implement tracking HW internally. The iommufd version provides an alternative that the VMM can select, if available. The two are designed to have very similar APIs. - Userspace controlled attributes for hardware page tables (HWPT/iommu_domain). There are currently a few generic attributes for HWPTs (support dirty tracking, and parent of a nest). This is an entry point for the userspace iommu driver to control the HW in detail. - Nested translation support for HWPTs. This is a 2D translation scheme similar to the CPU where a DMA goes through a first stage to determine an intermediate address which is then translated trough a second stage to a physical address. Like for CPU translation the first stage table would exist in VM controlled memory and the second stage is in the kernel and matches the VM's guest to physical map. As every IOMMU has a unique set of parameter to describe the S1 IO page table and its associated parameters the userspace IOMMU driver has to marshal the information into the correct format. This is 1/3 of the feature, it allows creating the nested translation and binding it to VFIO devices, however the API to support IOTLB and ATC invalidation of the stage 1 io page table, and forwarding of IO faults are still in progress. The series includes AMD and Intel support for dirty tracking. Intel support for nested translation. Along the way are a number of internal items: - New iommu core items: ops->domain_alloc_user(), ops->set_dirty_tracking, ops->read_and_clear_dirty(), IOMMU_DOMAIN_NESTED, and iommu_copy_struct_from_user - UAF fix in iopt_area_split() - Spelling fixes and some test suite improvement" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (52 commits) iommufd: Organize the mock domain alloc functions closer to Joerg's tree iommufd/selftest: Fix page-size check in iommufd_test_dirty() iommufd: Add iopt_area_alloc() iommufd: Fix missing update of domains_itree after splitting iopt_area iommu/vt-d: Disallow read-only mappings to nest parent domain iommu/vt-d: Add nested domain allocation iommu/vt-d: Set the nested domain to a device iommu/vt-d: Make domain attach helpers to be extern iommu/vt-d: Add helper to setup pasid nested translation iommu/vt-d: Add helper for nested domain allocation iommu/vt-d: Extend dmar_domain to support nested domain iommufd: Add data structure for Intel VT-d stage-1 domain allocation iommu/vt-d: Enhance capability check for nested parent domain allocation iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs iommufd/selftest: Add nested domain allocation for mock domain iommu: Add iommu_copy_struct_from_user helper iommufd: Add a nested HW pagetable object iommu: Pass in parent domain with user_data to domain_alloc_user op iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable ...
2023-11-01Merge tag 'asm-generic-6.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull ia64 removal and asm-generic updates from Arnd Bergmann: - The ia64 architecture gets its well-earned retirement as planned, now that there is one last (mostly) working release that will be maintained as an LTS kernel. - The architecture specific system call tables are updated for the added map_shadow_stack() syscall and to remove references to the long-gone sys_lookup_dcookie() syscall. * tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: hexagon: Remove unusable symbols from the ptrace.h uapi asm-generic: Fix spelling of architecture arch: Reserve map_shadow_stack() syscall number for all architectures syscalls: Cleanup references to sys_lookup_dcookie() Documentation: Drop or replace remaining mentions of IA64 lib/raid6: Drop IA64 support Documentation: Drop IA64 from feature descriptions kernel: Drop IA64 support from sig_fault handlers arch: Remove Itanium (IA-64) architecture
2023-10-27Merge branches 'iommu/fixes', 'arm/tegra', 'arm/smmu', 'virtio', 'x86/vt-d', ↵Joerg Roedel
'x86/amd', 'core' and 's390' into next
2023-10-26Merge tag 'v6.6-rc7' into coreJoerg Roedel
Linux 6.6-rc7
2023-10-26iommu/vt-d: Use ops->blocked_domainJason Gunthorpe
Trivially migrate to the ops->blocked_domain for the existing global static. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/3-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26iommu/vt-d: Update the definition of the blocking domainJason Gunthorpe
The global static should pre-define the type and the NOP free function can be now left as NULL. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/2-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26iommu/vt-d: Disallow read-only mappings to nest parent domainLu Baolu
When remapping hardware is configured by system software in scalable mode as Nested (PGTT=011b) and with PWSNP field Set in the PASID-table-entry, it may Set Accessed bit and Dirty bit (and Extended Access bit if enabled) in first-stage page-table entries even when second-stage mappings indicate that corresponding first-stage page-table is Read-Only. As the result, contents of pages designated by VMM as Read-Only can be modified by IOMMU via PML5E (PML4E for 4-level tables) access as part of address translation process due to DMAs issued by Guest. This disallows read-only mappings in the domain that is supposed to be used as nested parent. Reference from Sapphire Rapids Specification Update [1], errata details, SPR17. Userspace should know this limitation by checking the IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 flag reported in the IOMMU_GET_HW_INFO ioctl. [1] https://www.intel.com/content/www/us/en/content-details/772415/content-details.html Link: https://lore.kernel.org/r/20231026044216.64964-9-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Add nested domain allocationLu Baolu
This adds the support for IOMMU_HWPT_DATA_VTD_S1 type. And 'nested_parent' is added to mark the nested parent domain to sanitize the input parent domain. Link: https://lore.kernel.org/r/20231026044216.64964-8-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Set the nested domain to a deviceYi Liu
This adds the helper for setting the nested domain to a device hence enable nested domain usage on Intel VT-d. Link: https://lore.kernel.org/r/20231026044216.64964-7-yi.l.liu@intel.com Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Make domain attach helpers to be externYi Liu
This makes the helpers visible to nested.c. Link: https://lore.kernel.org/r/20231026044216.64964-6-yi.l.liu@intel.com Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Add helper to setup pasid nested translationLu Baolu
The configurations are passed in from the user when the user domain is allocated. This helper interprets these configurations according to the data structure defined in uapi/linux/iommufd.h. The EINVAL error will be returned if any of configurations are not compatible with the hardware capabilities. The caller can retry with another compatible user domain. The encoding of fields of each pasid entry is defined in section 9.6 of the VT-d spec. Link: https://lore.kernel.org/r/20231026044216.64964-5-yi.l.liu@intel.com Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Add helper for nested domain allocationLu Baolu
This adds helper for accepting user parameters and allocate a nested domain. Link: https://lore.kernel.org/r/20231026044216.64964-4-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Extend dmar_domain to support nested domainLu Baolu
The nested domain fields are exclusive to those that used for a DMA remapping domain. Use union to avoid memory waste. Link: https://lore.kernel.org/r/20231026044216.64964-3-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Enhance capability check for nested parent domain allocationYi Liu
This adds the scalable mode check before allocating the nested parent domain as checking nested capability is not enough. User may turn off scalable mode which also means no nested support even if the hardware supports it. Fixes: c97d1b20d383 ("iommu/vt-d: Add domain_alloc_user op") Link: https://lore.kernel.org/r/20231024150011.44642-1-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu: Pass in parent domain with user_data to domain_alloc_user opYi Liu
domain_alloc_user op already accepts user flags for domain allocation, add a parent domain pointer and a driver specific user data support as well. The user data would be tagged with a type for iommu drivers to add their own driver specific user data per hw_pagetable. Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an iommufd core uAPI structure. Make the user data opaque to the core, since a userspace driver must match the kernel driver. In the future, if drivers share some common parameter, there would be a generic parameter as well. Link: https://lore.kernel.org/r/20231026043938.63898-7-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-25Revert "iommu/vt-d: Remove unused function"Lu Baolu
This reverts commit c61c255e114c52682546447ed44d3470b5708134. The pasid_set_wpe() helper, which was removed by the reverted commit, is still used by the nesting translation support in the iommufd tree. To avoid a merge conflict, revert the commit. Link: https://lore.kernel.org/linux-kernel/20231025153455.283c5b12@canb.auug.org.au/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20231025131854.375388-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-24iommu/vt-d: Access/Dirty bit support for SS domainsJoao Martins
IOMMU advertises Access/Dirty bits for second-stage page table if the extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS). The first stage table is compatible with CPU page table thus A/D bits are implicitly supported. Relevant Intel IOMMU SDM ref for first stage table "3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table "3.7.2 Accessed and Dirty Flags". First stage page table is enabled by default so it's allowed to set dirty tracking and no control bits needed, it just returns 0. To use SSADS, set bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB via pasid_flush_caches() following the manual. Relevant SDM refs: "3.7.2 Accessed and Dirty Flags" "6.5.3.3 Guidance to Software for Invalidations, Table 23. Guidance to Software for Invalidations" PTE dirty bit is located in bit 9 and it's cached in the IOTLB so flush IOTLB to make sure IOMMU attempts to set the dirty bit again. Note that iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus the caller of the iommu op will flush the IOTLB. Relevant manuals over the hardware translation is chapter 6 with some special mention to: "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations" "6.2.4 IOTLB" Select IOMMUFD_DRIVER only if IOMMUFD is enabled, given that IOMMU dirty tracking requires IOMMUFD. Link: https://lore.kernel.org/r/20231024135109.73787-13-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-16iommu/vt-d: debugfs: Support dumping a specified page tableJingqi Liu
The original debugfs only dumps all page tables without pasid. With pasid supported, the page table with pasid also needs to be dumped. This patch supports dumping a specified page table in legacy mode or scalable mode with or without a specified pasid. For legacy mode, according to bus number and DEVFN, traverse the root table and context table to get the pointer of page table in the context table entry, then dump the specified page table. For scalable mode, according to bus number, DEVFN and pasid, traverse the root table, context table, pasid directory and pasid table to get the pointer of page table in the pasid table entry, then dump the specified page table.. Examples are as follows: 1) Dump the page table of device "0000:00:1f.0" that only supports legacy mode. $ sudo cat /sys/kernel/debug/iommu/intel/0000:00:1f.0/domain_translation_struct 2) Dump the page table of device "0000:00:0a.0" with PASID "1" that supports scalable mode. $ sudo cat /sys/kernel/debug/iommu/intel/0000:00:0a.0/1/domain_translation_struct Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20231013135811.73953-4-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-16iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}Jingqi Liu
Add a debugfs directory per pair of {device, pasid} if the mappings of its page table are created and destroyed by the iommu_map/unmap() interfaces. i.e. /sys/kernel/debug/iommu/intel/<device source id>/<pasid>. Create a debugfs file in the directory for users to dump the page table corresponding to {device, pasid}. e.g. /sys/kernel/debug/iommu/intel/0000:00:02.0/1/domain_translation_struct. For the default domain without pasid, it creates a debugfs file in the debugfs device directory for users to dump its page table. e.g. /sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct. When setting a domain to a PASID of device, create a debugfs file in the pasid debugfs directory for users to dump the page table of the specified pasid. Remove the debugfs device directory of the device when releasing a device. e.g. /sys/kernel/debug/iommu/intel/0000:00:01.0 Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20231013135811.73953-3-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-16iommu/vt-d: debugfs: Dump entry pointing to huge pageJingqi Liu
For the page table entry pointing to a huge page, the data below the level of the huge page is meaningless and does not need to be dumped. Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20231013135811.73953-2-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>