Age | Commit message | Author
2025-07-24 | Merge branch 'ti/omap' into next | Will Deacon
* ti/omap:
  iommu/omap: Use syscon_regmap_lookup_by_phandle_args
  iommu/omap: Drop redundant check if ti,syscon-mmuconfig exists
2025-07-24 | Merge branch 'mediatek' into next | Will Deacon
* mediatek:
  iommu/mediatek-v1: Tidy up probe_finalize
2025-07-24 | Merge branch 'amd/amd-vi' into next | Will Deacon
* amd/amd-vi:
  iommu/amd: Fix geometry.aperture_end for V2 tables
  iommu/amd: Wrap debugfs ABI testing symbols snippets in literal code blocks
  iommu/amd: Add documentation for AMD IOMMU debugfs support
  iommu/amd: Add debugfs support to dump IRT Table
  iommu/amd: Add debugfs support to dump device table
  iommu/amd: Add support for device id user input
  iommu/amd: Add debugfs support to dump IOMMU command buffer
  iommu/amd: Add debugfs support to dump IOMMU Capability registers
  iommu/amd: Add debugfs support to dump IOMMU MMIO registers
  iommu/amd: Refactor AMD IOMMU debugfs initial setup
  iommu/amd: Enable PASID and ATS capabilities in the correct order
  iommu/amd: Add efr[HATS] max v1 page table level
  iommu/amd: Add HATDis feature support
2025-07-24 | Merge branch 'intel/vt-d' into next | Will Deacon
* intel/vt-d:
  iommu/vt-d: Fix UAF on sva unbind with pending IOPFs
  iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain
  iommu/vt-d: Deduplicate cache_tag_flush_all by reusing flush_range
  iommu/vt-d: Fix missing PASID in dev TLB flush with cache_tag_flush_all
  iommu/vt-d: Split paging_domain_compatible()
  iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
  iommu/vt-d: Create unique domain ops for each stage
  iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags()
  iommu/vt-d: Do not wipe out the page table NID when devices detach
  iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free()
  iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()
  iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
  iommu/vt-d: Remove the CONFIG_X86 wrapping from iommu init hook
2025-07-24 | Merge branch 'samsung/exynos' into next | Will Deacon
* samsung/exynos:
  iommu/exynos: add support for reserved regions
2025-07-24 | Merge branch 'core' into next | Will Deacon
* core:
  iommu/qcom: Fix pgsize_bitmap
  iommu/intel: Convert to msi_create_parent_irq_domain() helper
  iommu/amd: Convert to msi_create_parent_irq_domain() helper
  iommu: Remove ops->pgsize_bitmap
  iommu/msm: Remove ops->pgsize_bitmap
  iommu/qcom: Remove iommu_ops pgsize_bitmap
  iommu/mtk: Remove iommu_ops pgsize_bitmap
  iommu: Remove iommu_ops pgsize_bitmap from simple drivers
  iommu: Remove ops.pgsize_bitmap from drivers that don't use it
  iommu/arm-smmu: Remove iommu_ops pgsize_bitmap
  iommu/arm-smmu-v3: Remove iommu_ops pgsize_bitmap
2025-07-23 | iommu/vt-d: Fix UAF on sva unbind with pending IOPFs | Lu Baolu
Commit 17fce9d2336d ("iommu/vt-d: Put iopf enablement in domain attach path") disables IOPF on the device by removing the device from its IOMMU's IOPF queue when the last IOPF-capable domain is detached from the device. Unfortunately, it did this in the wrong place, where there may still be pending IOPFs. As a result, a use-after-free can be triggered, eventually leading to a kernel panic with a trace similar to the following:

  refcount_t: underflow; use-after-free.
  WARNING: CPU: 3 PID: 313 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
  Workqueue: iopf_queue/dmar0-iopfq iommu_sva_handle_iopf
  Call Trace:
   <TASK>
   iopf_free_group+0xe/0x20
   process_one_work+0x197/0x3d0
   worker_thread+0x23a/0x350
   ? rescuer_thread+0x4a0/0x4a0
   kthread+0xf8/0x230
   ? finish_task_switch.isra.0+0x81/0x260
   ? kthreads_online_cpu+0x110/0x110
   ? kthreads_online_cpu+0x110/0x110
   ret_from_fork+0x13b/0x170
   ? kthreads_online_cpu+0x110/0x110
   ret_from_fork_asm+0x11/0x20
   </TASK>
  ---[ end trace 0000000000000000 ]---

The intel_pasid_tear_down_entry() function is responsible for blocking the hardware from generating new page faults and flushing all in-flight ones. Therefore, moving iopf_for_domain_remove() after this function should resolve the issue.

Fixes: 17fce9d2336d ("iommu/vt-d: Put iopf enablement in domain attach path")
Reported-by: Ethan Milon <ethan.milon@eviden.com>
Closes: https://lore.kernel.org/r/e8b37f3e-8539-40d4-8993-43a1f3ffe5aa@eviden.com
Suggested-by: Ethan Milon <ethan.milon@eviden.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250723072045.1853328-1-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
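The ordering argument above can be illustrated with a toy model. This is a hypothetical sketch, not the driver's code: tear_down_entry() and iopf_remove() are invented stand-ins for intel_pasid_tear_down_entry() and iopf_for_domain_remove(), and a simple in-flight fault counter replaces the real IOPF group handling.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not driver code: a device with in-flight page faults
 * and membership in its IOMMU's IOPF queue.
 */
struct dev_sketch {
	bool faults_blocked; /* hardware can no longer raise new faults */
	int inflight_faults; /* faults queued but not yet handled */
	bool on_iopf_queue;  /* device still registered with the IOPF queue */
	bool use_after_free; /* set if the queue is left with faults pending */
};

/* Stand-in for intel_pasid_tear_down_entry(): block new faults, flush in-flight ones. */
static void tear_down_entry(struct dev_sketch *d)
{
	d->faults_blocked = true;
	d->inflight_faults = 0;
}

/* Stand-in for iopf_for_domain_remove(): leave the IOPF queue. */
static void iopf_remove(struct dev_sketch *d)
{
	if (d->inflight_faults > 0)
		d->use_after_free = true; /* pending work would touch freed state */
	d->on_iopf_queue = false;
}

/* Buggy order: leave the queue while faults may still be pending. */
static void detach_buggy(struct dev_sketch *d)
{
	iopf_remove(d);
	tear_down_entry(d);
}

/* Fixed order: tear down the PASID entry first, then leave the queue. */
static void detach_fixed(struct dev_sketch *d)
{
	tear_down_entry(d);
	iopf_remove(d);
}
```

Only the relative order of the two calls differs between detach_buggy() and detach_fixed(), which is the whole substance of the fix.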
2025-07-21 | iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain | Lu Baolu
Commit 12724ce3fe1a ("iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes") sets iotlb_sync_map dynamically. This causes synchronization issues due to the lack of locking on the map and attach paths, which race with iommufd userspace operations.

Invalidation changes must precede device attachment to ensure all flushes complete before the hardware walks the page tables, preventing coherence issues.

Make domain->iotlb_sync_map static, set once during domain allocation. If an IOMMU requires iotlb_sync_map but the domain lacks it, the attach is rejected. This won't reduce domain sharing: RWBF and shadowing page table caching are legacy uses with legacy hardware. Mixed configurations (some IOMMUs in caching mode, others not) are unlikely in real-world scenarios.

Fixes: 12724ce3fe1a ("iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes")
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250721051657.1695788-1-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
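A minimal sketch of the "decide at allocation, only check at attach" rule described above. All names here (struct iommu_caps, struct paging_domain, domain_alloc_init(), domain_attach_check()) are hypothetical simplifications, and returning -EINVAL on rejection is an assumption rather than the driver's actual error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the driver's real structures. */
struct iommu_caps {
	bool caching_mode; /* CAP_REG.CM */
	bool rwbf;         /* CAP_REG.RWBF */
};

struct paging_domain {
	bool iotlb_sync_map; /* decided at allocation, never changed afterwards */
};

static bool iommu_needs_sync_map(const struct iommu_caps *iommu)
{
	return iommu->caching_mode || iommu->rwbf;
}

/* Allocation fixes the property once, from the IOMMU the domain is allocated for. */
static void domain_alloc_init(struct paging_domain *dom, const struct iommu_caps *iommu)
{
	dom->iotlb_sync_map = iommu_needs_sync_map(iommu);
}

/* Attach only checks compatibility; it never modifies the domain (error code assumed). */
static int domain_attach_check(const struct paging_domain *dom,
			       const struct iommu_caps *iommu)
{
	if (iommu_needs_sync_map(iommu) && !dom->iotlb_sync_map)
		return -EINVAL;
	return 0;
}
```

Because attach never writes the flag, there is nothing for concurrent map and attach paths to race on; a domain that does sync on map can still be attached to an IOMMU that does not need it.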
2025-07-17 | iommu/amd: Fix geometry.aperture_end for V2 tables | Jason Gunthorpe
The AMD IOMMU documentation seems pretty clear that the V2 table follows the normal CPU expectation of sign extension. This is shown in Figure 25 (AMD64 Long Mode 4-Kbyte Page Address Translation), where bits Sign-Extend [63:57] == [56]. This is typical for x86, which would have three regions in the page table: lower, non-canonical, and upper.

The manual describes, in section 2.2.4 (Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA), that the V1 table does not sign extend.

Further, Vasant has checked this and indicates the HW has an additional behavior that the manual does not yet describe: the AMDv2 table does not have the sign-extended behavior when attached to PASID 0, which may explain why this has gone unnoticed.

The iommu domain geometry does not directly support sign-extended page tables; the driver should report only one of the lower/upper spaces. Solve this by removing the top VA bit from the geometry so that only the lower space is used. This will also make the iommu_domain work consistently on both PASID 0 and PASID != 0.

Adjust dma_max_address() to remove the top VA bit. It now returns:

  5 Level:
    Before 0x1ffffffffffffff
    After  0x0ffffffffffffff
  4 Level:
    Before 0xffffffffffff
    After  0x7fffffffffff

Fixes: 11c439a19466 ("iommu/amd/pgtbl_v2: Fix domain max address")
Link: https://lore.kernel.org/all/8858d4d6-d360-4ef0-935c-bfd13ea54f42@amd.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/0-v2-0615cc99b88a+1ce-amdv2_geo_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
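The before/after values listed above follow from simple bit arithmetic. The sketch below reproduces them; it assumes 4-level V2 tables translate 48 VA bits and 5-level tables 57, and the function names are hypothetical, not the driver's dma_max_address(). It also includes a canonical-address check to show the three regions (lower, non-canonical, upper).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4-level V2 tables translate 48 VA bits, 5-level tables 57. */
static unsigned int v2_va_bits(int levels)
{
	return levels == 5 ? 57 : 48;
}

/* Before the fix: the full VA width, including the sign bit. */
static uint64_t max_address_before(int levels)
{
	return (UINT64_C(1) << v2_va_bits(levels)) - 1;
}

/* After the fix: drop the top VA bit, so only the lower half is reported. */
static uint64_t max_address_after(int levels)
{
	return (UINT64_C(1) << (v2_va_bits(levels) - 1)) - 1;
}

/* Sign-extended (canonical) address: bits [63:va_bits-1] are all 0 or all 1. */
static bool is_canonical(uint64_t iova, unsigned int va_bits)
{
	uint64_t upper_mask = ~((UINT64_C(1) << (va_bits - 1)) - 1);
	uint64_t upper = iova & upper_mask;

	return upper == 0 || upper == upper_mask;
}
```

With these definitions the old 4-level aperture end (0xffffffffffff) is itself non-canonical: it has bit 47 set without the upper bits, so it lies in the hole between the lower and upper regions, while the new end (0x7fffffffffff) is the last canonical address of the lower region.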
2025-07-17 | iommu/amd: Wrap debugfs ABI testing symbols snippets in literal code blocks | Bagas Sanjaya
Commit 39215bb3b0d929 ("iommu/amd: Add documentation for AMD IOMMU debugfs support") documents debugfs ABI symbols for AMD IOMMU, but forgets to wrap the example snippets and their output in literal code blocks, hence Sphinx reports indentation warnings:

  Documentation/ABI/testing/debugfs-amd-iommu:31: ERROR: Unexpected indentation. [docutils]
  Documentation/ABI/testing/debugfs-amd-iommu:31: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
  Documentation/ABI/testing/debugfs-amd-iommu:31: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]

Wrap them to fix the warnings.

Fixes: 39215bb3b0d9 ("iommu/amd: Add documentation for AMD IOMMU debugfs support")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250716204207.73869849@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20250717010331.8941-1-bagasdotme@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-15 | iommu/amd: Add documentation for AMD IOMMU debugfs support | Dheeraj Kumar Srivastava
Add documentation describing how to use AMD IOMMU debugfs support to dump IOMMU data structures: IRT table, Device table, Registers (MMIO and Capability) and command buffer.

Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-9-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-15 | iommu/amd: Add debugfs support to dump IRT Table | Dheeraj Kumar Srivastava
In cases where there is an issue in the device interrupt path with IOMMU interrupt remapping enabled, dumping the valid IRT entries for the device provides useful input for debugging the issue.

eg.
-> To dump IRTE entries for a particular device
   #echo "c4:00.0" > /sys/kernel/debug/iommu/amd/devid
   #cat /sys/kernel/debug/iommu/amd/irqtbl | less

   or

   #echo "0000:c4:00.0" > /sys/kernel/debug/iommu/amd/devid
   #cat /sys/kernel/debug/iommu/amd/irqtbl | less

Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-8-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-15 | iommu/amd: Add debugfs support to dump device table | Dheeraj Kumar Srivastava
The IOMMU uses the device table data structure to get per-device information for DMA remapping, interrupt remapping, and other functionality. It is a valuable data structure to visualize when debugging IOMMU-related issues.

eg.
-> To dump the device table entry for a particular device
   #echo 0000:c4:00.0 > /sys/kernel/debug/iommu/amd/devid
   #cat /sys/kernel/debug/iommu/amd/devtbl

   or

   #echo c4:00.0 > /sys/kernel/debug/iommu/amd/devid
   #cat /sys/kernel/debug/iommu/amd/devtbl

Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-7-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-15 | iommu/amd: Add support for device id user input | Dheeraj Kumar Srivastava
Dumping IOMMU data structures like the device table, IRT, etc. for all devices on the system would put a lot of data into a single file. Also, the user may want to dump and analyze these data structures for just one or a few devices. So dumping them for all devices is not a good approach.

Add a "device id" user input that selects the device whose IOMMU data structures (device table, IRT, etc.) are dumped in AMD IOMMU debugfs.

eg.
1. # echo 0000:01:00.0 > /sys/kernel/debug/iommu/amd/devid
   # cat /sys/kernel/debug/iommu/amd/devid
   Output : 0000:01:00.0

2. # echo 01:00.0 > /sys/kernel/debug/iommu/amd/devid
   # cat /sys/kernel/debug/iommu/amd/devid
   Output : 0000:01:00.0

Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-6-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
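The two accepted input forms in the examples above (with and without the PCI segment) can be normalized as sketched below. This is a hypothetical userspace illustration, not the driver's parser: struct pci_devid, parse_devid() and format_devid() are invented names, and the range limits are the usual PCI segment/bus/device/function widths.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical PCI identifier; not the driver's parser. */
struct pci_devid {
	unsigned int seg, bus, dev, fn;
};

/*
 * Accept "SSSS:BB:DD.F" or the short "BB:DD.F" form (segment 0).
 * Trailing whitespace, such as the newline echo appends, is ignored.
 * Returns 0 on success and -1 on malformed or out-of-range input.
 */
static int parse_devid(const char *s, struct pci_devid *id)
{
	int used = -1;

	if (sscanf(s, "%x:%x:%x.%x%n", &id->seg, &id->bus, &id->dev, &id->fn, &used) != 4) {
		/* Not the full form: retry as bus:device.function in segment 0. */
		used = -1;
		id->seg = 0;
		if (sscanf(s, "%x:%x.%x%n", &id->bus, &id->dev, &id->fn, &used) != 3)
			return -1;
	}
	/* Reject anything after the id except whitespace. */
	if (used < 0 || s[used + strspn(s + used, " \t\r\n")] != '\0')
		return -1;
	if (id->seg > 0xffff || id->bus > 0xff || id->dev > 0x1f || id->fn > 0x7)
		return -1;
	return 0;
}

/* Normalized output form, matching the "Output :" lines in the commit's examples. */
static void format_devid(const struct pci_devid *id, char *buf, size_t len)
{
	snprintf(buf, len, "%04x:%02x:%02x.%x", id->seg, id->bus, id->dev, id->fn);
}
```

Parsing "01:00.0" and formatting the result yields "0000:01:00.0", which is the behaviour the second example shows.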
2025-07-15 | iommu/amd: Add debugfs support to dump IOMMU command buffer | Dheeraj Kumar Srivastava
The IOMMU driver sends commands to the IOMMU hardware via the command buffer. In cases where the IOMMU hardware fails to process commands in the command buffer, dumping it is valuable input for debugging the issue.

The IOMMU hardware processes the command buffer entry at the offset given by the head pointer. Dumping just the entry at the head pointer may not always be useful: the current head may not point to the entry that is causing the issue, because the hardware may already have processed that entry and advanced the head pointer. So dumping the entire command buffer gives a broader understanding of what the hardware was/is doing.

The command buffer dump contains all entries from the start to the end of the command buffer. Along with that, it includes a dump of the head and tail pointer registers, showing where the IOMMU driver and the hardware are in the command buffer for injecting and processing entries, respectively.

The command buffer is a per-IOMMU data structure, so it is dumped on a per-IOMMU basis.

eg.
-> To get the command buffer dump for iommu<x> (say, iommu00)
   #cat /sys/kernel/debug/iommu/amd/iommu00/cmdbuf

Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-5-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
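The head/tail relationship described above is a standard ring buffer. The sketch below shows how the dumped head and tail values can be read; it assumes each command is 16 bytes and that head and tail are byte offsets into a ring of buf_size bytes. Those layout details and the function names are assumptions for illustration, not taken from the driver.

```c
#include <stdint.h>

/*
 * Assumptions for this sketch: each command is 16 bytes, and the head and
 * tail values are byte offsets into a ring of buf_size bytes.
 */
#define CMD_ENTRY_SIZE 16u

/* Entry the hardware will fetch next. */
static uint32_t cmdbuf_head_index(uint32_t head)
{
	return head / CMD_ENTRY_SIZE;
}

/* Entries queued by the driver (up to tail) but not yet consumed by the hardware (from head). */
static uint32_t cmdbuf_pending(uint32_t head, uint32_t tail, uint32_t buf_size)
{
	return ((tail + buf_size - head) % buf_size) / CMD_ENTRY_SIZE;
}
```

When head equals tail the ring looks idle, yet the command that triggered a problem may sit just before head, already consumed. That is why dumping the whole buffer, rather than only the entry at head, is more informative.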
2025-07-15 | iommu/amd: Add debugfs support to dump IOMMU Capability registers | Dheeraj Kumar Srivastava
The IOMMU Capability registers define the capabilities of the IOMMU and the information needed for initialising the MMIO registers and device table. Dumping these registers is useful for debugging IOMMU-related issues.

e.g.
-> To get the capability register value at offset 0x10 for iommu<x> (say, iommu00)
   # echo "0x10" > /sys/kernel/debug/iommu/amd/iommu00/capability
   # cat /sys/kernel/debug/iommu/amd/iommu00/capability

Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-4-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-15 | iommu/amd: Add debugfs support to dump IOMMU MMIO registers | Dheeraj Kumar Srivastava
Analyzing the IOMMU MMIO registers shows how the IOMMU is configured on the system and is helpful for debugging IOMMU issues.

eg.
-> To get the MMIO register value at offset 0x18 for iommu<x> (say, iommu00)
   # echo "0x18" > /sys/kernel/debug/iommu/amd/iommu00/mmio
   # cat /sys/kernel/debug/iommu/amd/iommu00/mmio

Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-3-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-15 | iommu/amd: Refactor AMD IOMMU debugfs initial setup | Dheeraj Kumar Srivastava
Rearrange the initial setup of AMD IOMMU debugfs to separate per-IOMMU setup from setup that is common to all IOMMUs. This ensures that common debugfs paths (introduced in subsequent patches) are created only once instead of once for each IOMMU.

With this change, there is no need for a lock, as amd_iommu_debugfs_setup() will be called only once during AMD IOMMU initialization. So remove the lock acquisition in amd_iommu_debugfs_setup().

Signed-off-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20250702093804.849-2-dheerajkumar.srivastava@amd.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14 | iommu/exynos: add support for reserved regions | Kaustabh Chakraborty
The bootloader configures a reserved memory region for the framebuffer, which is protected by the IOMMU. The kernel-side driver is unaware of which memory region was set up by the bootloader. In such a case, the IOMMU tries to reference the reserved region (which is no longer reserved in the kernel), resulting in an unrecoverable page fault. More information is provided in [1].

Add support for reserved regions using iommu_dma_get_resv_regions(). For OF-supported boards, this requires defining the region in the iommu-addresses property of the IOMMU owner's node.

Link: https://lore.kernel.org/r/544ad69cba52a9b87447e3ac1c7fa8c3@disroot.org [1]
Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Link: https://lore.kernel.org/r/20250712-exynos-sysmmu-resv-regions-v1-1-e79681fcab1a@disroot.org
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14 | iommu/qcom: Fix pgsize_bitmap | Jason Gunthorpe
qcom uses the ARM_32_LPAE_S1 format, which uses the ARM long-descriptor page table. Eventually arm_32_lpae_alloc_pgtable_s1() will adjust the pgsize_bitmap with:

  cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);

So the current declaration is nonsensical. Fix it to be just SZ_4K, which is what it has actually been using so far.

Most likely the qcom driver copied and pasted the pgsize_bitmap from something using the ARM_V7S format.

Fixes: db64591de4b2 ("iommu/qcom: Remove iommu_ops pgsize_bitmap")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYvif6kDDFar5ZK4Dff3XThSrhaZaJundjQYujaJW978yg@mail.gmail.com/
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/0-v1-65a7964d2545+195-qcom_pgsize_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
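The effect of the quoted mask can be checked directly. In this sketch the SZ_* constants use the same values as the kernel's size macros, while ASSUMED_V7S_PGSIZES (4K, 64K, 1M and 16M, the short-descriptor page sizes) is an assumption about what the copied bitmap looked like; the commit only says it probably came from an ARM_V7S user. lpae_s1_filter() is a hypothetical name.

```c
/* Size constants with the same values as the kernel's SZ_* macros. */
#define SZ_4K  0x00001000UL
#define SZ_64K 0x00010000UL
#define SZ_1M  0x00100000UL
#define SZ_2M  0x00200000UL
#define SZ_16M 0x01000000UL
#define SZ_1G  0x40000000UL

/* Assumed V7S-style bitmap, of the kind the commit suspects was copied. */
#define ASSUMED_V7S_PGSIZES (SZ_4K | SZ_64K | SZ_1M | SZ_16M)

/* Mirrors the mask the commit quotes from arm_32_lpae_alloc_pgtable_s1(). */
static unsigned long lpae_s1_filter(unsigned long pgsize_bitmap)
{
	return pgsize_bitmap & (SZ_4K | SZ_2M | SZ_1G);
}
```

Under that assumption only SZ_4K survives the mask, which matches the commit's point that the driver has effectively been using 4K pages all along.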
2025-07-14 | iommu/vt-d: Deduplicate cache_tag_flush_all by reusing flush_range | Ethan Milon
The logic in cache_tag_flush_all() to iterate over cache tags and issue TLB invalidations is largely duplicated in cache_tag_flush_range(), with the only difference being the range parameters.

Extend cache_tag_flush_range() to handle a full address space flush when called with start = 0 and end = ULONG_MAX. This allows cache_tag_flush_all() to simply delegate to cache_tag_flush_range().

Signed-off-by: Ethan Milon <ethan.milon@eviden.com>
Link: https://lore.kernel.org/r/20250708214821.30967-2-ethan.milon@eviden.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-12-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
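The delegation pattern described above can be sketched as follows. struct cache_tag here is a hypothetical stand-in that only records what each flush would do; the real helpers issue IOTLB and device-TLB invalidations per tag. The functions have a _sketch suffix to make clear they are not the driver's cache_tag_flush_range() and cache_tag_flush_all().

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical cache tag: records what the last invalidation covered. */
struct cache_tag {
	unsigned long last_start, last_end;
	int full_flushes, range_flushes;
};

/* One loop over the tags; [0, ULONG_MAX] is treated as "flush everything". */
static void cache_tag_flush_range_sketch(struct cache_tag *tags, size_t ntags,
					 unsigned long start, unsigned long end)
{
	int full = (start == 0 && end == ULONG_MAX);

	for (size_t i = 0; i < ntags; i++) {
		tags[i].last_start = start;
		tags[i].last_end = end;
		if (full)
			tags[i].full_flushes++;
		else
			tags[i].range_flushes++;
	}
}

/* No separate loop any more: the full flush is just the widest range. */
static void cache_tag_flush_all_sketch(struct cache_tag *tags, size_t ntags)
{
	cache_tag_flush_range_sketch(tags, ntags, 0, ULONG_MAX);
}
```

Keeping a single iteration means any fix to per-tag invalidation logic, such as PASID handling, only has to be made in one place.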
2025-07-14 | iommu/vt-d: Fix missing PASID in dev TLB flush with cache_tag_flush_all | Ethan Milon
The function cache_tag_flush_all() was originally implemented with incorrect device TLB invalidation logic that does not handle PASID, in commit c4d27ffaa8eb ("iommu/vt-d: Add cache tag invalidation helpers").

This causes regressions where full address space TLB invalidations occur with a PASID attached, such as during transparent hugepage unmapping in SVA configurations or when calling iommu_flush_iotlb_all(). In these cases, the device receives a TLB invalidation that lacks the PASID.

This incorrect logic was later extracted into cache_tag_flush_devtlb_all(), in commit 3297d047cd7f ("iommu/vt-d: Refactor IOTLB and Dev-IOTLB flush for batching").

The fix replaces the call to cache_tag_flush_devtlb_all() with cache_tag_flush_devtlb_psi(), which properly handles PASID.

Fixes: 4f609dbff51b ("iommu/vt-d: Use cache helpers in arch_invalidate_secondary_tlbs")
Fixes: 4e589a53685c ("iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_all")
Signed-off-by: Ethan Milon <ethan.milon@eviden.com>
Link: https://lore.kernel.org/r/20250708214821.30967-1-ethan.milon@eviden.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-11-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14 | iommu/vt-d: Split paging_domain_compatible() | Jason Gunthorpe
Make first/second stage specific functions that follow the same pattern as intel_iommu_domain_alloc_first/second_stage() for computing EOPNOTSUPP. This makes the code easier to understand: if we couldn't create a domain with these parameters for this IOMMU instance, then we are certainly not compatible with it.

Check superpage support directly against the per-stage cap bits and the pgsize_bitmap.

Add a note that force_snooping is read without locking. The locking needs to cover the compatible check and the addition of the device to the list.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-10-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14 | iommu/vt-d: Split intel_iommu_enforce_cache_coherency() | Jason Gunthorpe
First Stage and Second Stage have very different ways to deny no-snoop. The first stage uses the PGSNP bit, which is global per PASID, so enabling it requires loading new PASID entries for all the attached devices. The second stage uses a bit per PTE, so enabling it just requires telling future maps to set the bit.

Since we now have two domain ops, we can have two functions that directly code their required actions instead of a bunch of logic dancing around use_first_level.

Combine domain_set_force_snooping() into the new functions, since they are its only callers.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-9-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14 | iommu/vt-d: Create unique domain ops for each stage | Jason Gunthorpe
Use the domain ops pointer to tell what kind of domain it is, instead of the internal use_first_level indication. This also protects against wrongly using an SVA/nested/IDENTITY/BLOCKED domain type in places where it should not be used.

The only remaining uses of use_first_level outside the paging domain are in paging_domain_compatible() and intel_iommu_enforce_cache_coherency().

Thus, remove the useless sets of use_first_level in intel_svm_domain_alloc() and intel_iommu_domain_alloc_nested(). None of the unique ops for these domain types ever reference it on their call chains.

Add a WARN_ON() check in domain_context_mapping_one(), as it only works with the second stage.

This is preparation for iommupt, which will have different ops for each of the stages.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-8-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
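The idea of identifying a domain by its ops pointer can be shown in a few lines. This is a hypothetical sketch: struct domain_ops_sketch, struct domain_sketch and the three ops instances are invented names, pared down to the single point that pointer identity replaces a separate flag.

```c
#include <stdbool.h>

/* Hypothetical, pared-down ops table; real ops tables hold callbacks. */
struct domain_ops_sketch {
	const char *name;
};

static const struct domain_ops_sketch first_stage_ops = { .name = "first-stage" };
static const struct domain_ops_sketch second_stage_ops = { .name = "second-stage" };
static const struct domain_ops_sketch sva_ops = { .name = "sva" };

struct domain_sketch {
	const struct domain_ops_sketch *ops;
};

/* The ops pointer itself identifies the domain type: no flag to keep in sync. */
static bool is_first_stage(const struct domain_sketch *d)
{
	return d->ops == &first_stage_ops;
}

static bool is_second_stage(const struct domain_sketch *d)
{
	return d->ops == &second_stage_ops;
}
```

An SVA (or any other non-paging) domain fails both checks, which is the protection against "wrongly using a SVA/nested/IDENTITY/BLOCKED domain type" that the commit mentions.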
2025-07-14 | iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags() | Jason Gunthorpe
Create stage-specific functions that check the stage-specific conditions to determine whether each stage can be supported.

Have intel_iommu_domain_alloc_paging_flags() call both stages in sequence until one does not return EOPNOTSUPP, preferring the first stage if it is available and suitable for the requested flags.

Move second-stage-only operations like nested_parent and dirty_tracking into the second stage function for clarity.

Move initialization of the iommu_domain members into paging_domain_alloc().

Drop initialization of domain->owner, as the callers all do it.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-7-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
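The "try each stage until one does not return EOPNOTSUPP" flow can be sketched as below. Everything here is hypothetical: struct stage_caps, the ALLOC_* flag values and the alloc_* function names are invented for illustration, and the sketch assumes (as the commit implies) that nested-parent and dirty-tracking requests are second-stage only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical capability summary and request flags. */
struct stage_caps {
	bool first_stage;  /* first-stage translation available */
	bool second_stage; /* second-stage translation available */
};

#define ALLOC_NEST_PARENT 0x1u /* second stage only */
#define ALLOC_DIRTY_TRACK 0x2u /* second stage only */

enum stage { STAGE_FIRST = 1, STAGE_SECOND = 2 };

static int alloc_first_stage(const struct stage_caps *caps, unsigned int flags)
{
	if (!caps->first_stage || (flags & (ALLOC_NEST_PARENT | ALLOC_DIRTY_TRACK)))
		return -EOPNOTSUPP;
	return STAGE_FIRST;
}

static int alloc_second_stage(const struct stage_caps *caps, unsigned int flags)
{
	(void)flags; /* this sketch assumes the second stage accepts every flag above */
	if (!caps->second_stage)
		return -EOPNOTSUPP;
	return STAGE_SECOND;
}

/* Prefer the first stage; fall through only on -EOPNOTSUPP. */
static int alloc_paging_flags(const struct stage_caps *caps, unsigned int flags)
{
	int ret = alloc_first_stage(caps, flags);

	if (ret != -EOPNOTSUPP)
		return ret;
	return alloc_second_stage(caps, flags);
}
```

Any error other than -EOPNOTSUPP from the first stage is returned as-is, so a genuine failure is not masked by silently trying the other stage.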
2025-07-14 | iommu/vt-d: Do not wipe out the page table NID when devices detach | Jason Gunthorpe
The NID is used to control which NUMA node the page table memory is allocated from. It should be a permanent property of the page table, fixed when the page table is allocated, and should not change during attach/detach of devices.

Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Fixes: 7c204426b818 ("iommu/vt-d: Add domain_alloc_paging support")
Link: https://lore.kernel.org/r/20250714045028.958850-6-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14 | iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free() | Jason Gunthorpe
It has only one caller, so there is no need for two functions.

Correct the WARN_ON() error handling to leak the entire page table if the HW is still referencing it, so we don't UAF during WARN_ON recovery.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-5-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14 | iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid() | Jason Gunthorpe
Pass the phys_addr_t down through the call chain from the top instead of passing a pgd_t * KVA. This moves the __pa() into domain_setup_first_level(), which is the first function to obtain the pgd from the IOMMU page table in this call chain.

The SVA flow is also adjusted to get the PA of the mm->pgd.

iommupt will move the __pa() into the iommupt code; it never shares the KVA of the page table with the driver.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-dbbe6f7e7ae3+124ffe-vtd_prep_jgg@nvidia.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-4-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-14 | iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes | Lu Baolu
The iotlb_sync_map iommu op allows drivers to perform necessary cache flushes when new mappings are established. For the Intel iommu driver, this callback specifically serves two purposes:

- To flush caches when a second-stage page table is attached to a device whose iommu is operating in caching mode (CAP_REG.CM==1).
- To explicitly flush internal write buffers to ensure updates to memory-resident remapping structures are visible to hardware (CAP_REG.RWBF==1).

However, in scenarios where neither caching mode nor the RWBF flag is active, the cache_tag_flush_range_np() helper, which is called in the iotlb_sync_map path, effectively becomes a no-op.

Despite being a no-op, cache_tag_flush_range_np() involves iterating through all cache tags of the IOMMUs attached to the domain, protected by a spinlock. This unnecessary execution path introduces overhead, leading to a measurable I/O performance regression. On systems with NVMes under the same bridge, performance was observed to drop from ~6150 MiB/s to ~4985 MiB/s.

Introduce a flag in the dmar_domain structure. This flag will only be set when iotlb_sync_map is required (i.e., when CM or RWBF is set). cache_tag_flush_range_np() is called only for domains where this flag is set. This flag, once set, is immutable, given that there won't be mixed configurations in real-world scenarios where some IOMMUs in a system operate in caching mode while others do not. Theoretically, the immutability of this flag does not impact functionality.

Reported-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2115738
Link: https://lore.kernel.org/r/20250701171154.52435-1-ioanna-maria.alifieraki@canonical.com
Fixes: 129dab6e1286 ("iommu/vt-d: Use cache_tag_flush_range_np() in iotlb_sync_map")
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20250703031545.3378602-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20250714045028.958850-3-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
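A minimal sketch of the flag described above: set only when CM or RWBF requires it, never cleared, and used to skip the cache-tag walk. The names (struct iommu_caps, struct dmar_domain_sketch, domain_setup_sync_flag(), sync_map()) are hypothetical, and the tag_walks counter merely stands in for the spinlock-protected iteration in cache_tag_flush_range_np().

```c
#include <stdbool.h>

/* Hypothetical capability bits, mirroring CAP_REG.CM and CAP_REG.RWBF. */
struct iommu_caps {
	bool cm;
	bool rwbf;
};

struct dmar_domain_sketch {
	bool iotlb_sync_map;    /* set when CM or RWBF requires it, never cleared */
	unsigned int tag_walks; /* counts how often the tag list would be walked */
};

static void domain_setup_sync_flag(struct dmar_domain_sketch *d, const struct iommu_caps *caps)
{
	if (caps->cm || caps->rwbf)
		d->iotlb_sync_map = true;
}

/* iotlb_sync_map path: skip the locked cache-tag walk entirely when not needed. */
static void sync_map(struct dmar_domain_sketch *d)
{
	if (!d->iotlb_sync_map)
		return;
	d->tag_walks++; /* stands in for cache_tag_flush_range_np() */
}
```

On the common path (neither CM nor RWBF) sync_map() returns immediately, which is where the reported throughput regression came from.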
2025-07-14 | iommu/vt-d: Remove the CONFIG_X86 wrapping from iommu init hook | Vineeth Pillai (Google)
The iommu init hook is wrapped in CONFIG_X86, a remnant from when dmar.c was common code in "drivers/pci/dmar.c". This was added in commit 9d5ce73a64be2 ("x86: intel-iommu: Convert detect_intel_iommu to use iommu_init hook").

Now that this code is built only for x86, the config wrapping can be removed.

Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250616131740.3499289-1-vineeth@bitbyteword.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20250714045028.958850-2-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-11 | iommu/amd: Enable PASID and ATS capabilities in the correct order | Easwar Hariharan
Per the PCIe spec, the behavior of the PASID capability is undefined if the value of the PASID Enable bit changes while the Enable bit of the function's ATS control register is Set. Unfortunately, pdev_enable_caps() does exactly that by enabling ATS for the device before enabling PASID.

Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Fixes: eda8c2860ab679 ("iommu/amd: Enable device ATS/PASID/PRI capabilities independently")
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250703155433.6221-1-eahariha@linux.microsoft.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
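A toy model of the ordering rule quoted above. All names here (struct pci_fn_sketch, set_pasid_enable(), set_ats_enable(), enable_caps(), disable_caps()) are hypothetical and do not touch real PCI config space. The model simply flags the undefined case the spec describes, a PASID Enable change while ATS Enable is set. The teardown order in disable_caps() is not part of the commit; it is an assumption that follows from the same rule.

```c
#include <stdbool.h>

/* Hypothetical model of a PCIe function's ATS and PASID enable bits. */
struct pci_fn_sketch {
	bool ats_enabled;
	bool pasid_enabled;
	bool undefined_behaviour; /* PASID Enable changed while ATS Enable was set */
};

static void set_pasid_enable(struct pci_fn_sketch *fn, bool on)
{
	if (fn->pasid_enabled != on && fn->ats_enabled)
		fn->undefined_behaviour = true;
	fn->pasid_enabled = on;
}

static void set_ats_enable(struct pci_fn_sketch *fn, bool on)
{
	fn->ats_enabled = on;
}

/* Corrected order: PASID first, then ATS. */
static void enable_caps(struct pci_fn_sketch *fn)
{
	set_pasid_enable(fn, true);
	set_ats_enable(fn, true);
}

/* Assumed mirror image for teardown: ATS off before PASID changes. */
static void disable_caps(struct pci_fn_sketch *fn)
{
	set_ats_enable(fn, false);
	set_pasid_enable(fn, false);
}
```

Enabling ATS first and then PASID, which is what pdev_enable_caps() did before the fix, is exactly the sequence that sets undefined_behaviour in this model.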
2025-07-04 | iommu/mediatek-v1: Tidy up probe_finalize | Robin Murphy
Krzysztof points out that although the driver now supports COMPILE_TEST for other architectures, it does not build cleanly with W=1, where the stubbed-out ARM API can lead to an unused variable warning. Since this is effectively the correct intent of the code in such cases, mark the variable as __maybe_unused, tidying up some cruft in the process.

Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Yong Wu <yong.wu@mediatek.com>
Link: https://lore.kernel.org/r/7c78149504900bc6c98a9c48f4418934b72d89ac.1751036478.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-07-04 | iommu/intel: Convert to msi_create_parent_irq_domain() helper | Marc Zyngier
Now that we have a concise helper to create an MSI parent domain, switch the Intel IOMMU remapping over to that.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241204124549.607054-10-maz@kernel.org
Link: https://lore.kernel.org/r/169c793c50be8493cfd9d11affb00e9ed6341c36.1750858125.git.namcao@linutronix.de
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-07-04 | iommu/amd: Convert to msi_create_parent_irq_domain() helper | Marc Zyngier
Now that we have a concise helper to create an MSI parent domain, switch the AMD IOMMU remapping over to that.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241204124549.607054-9-maz@kernel.org
Link: https://lore.kernel.org/r/92e5ae97a03e4ffc272349d0863cd2cc8f904c44.1750858125.git.namcao@linutronix.de
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu: Remove ops->pgsize_bitmap | Jason Gunthorpe
No driver uses it now, so remove the core code.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/7-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu/msm: Remove ops->pgsize_bitmap | Jason Gunthorpe
This driver just uses a constant; put it in domain_alloc_paging and use the domain's value instead of ops during finalise.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/0-v1-662aad101e51+45-iommu_rm_ops_pgsize_msm_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu/omap: Use syscon_regmap_lookup_by_phandle_args | Krzysztof Kozlowski
Use syscon_regmap_lookup_by_phandle_args(), which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Apart from simpler code, this annotates within one line that the given phandle has arguments, so grepping the code is easier.
There is also no real benefit in printing errors on a missing syscon argument, because this is done just too late: it is a runtime check on static/build-time data. Dtschema and the Devicetree bindings already offer the static/build-time check for this.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250624-syscon-phandle-args-iommu-v3-2-1a36487d69b8@linaro.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu/omap: Drop redundant check if ti,syscon-mmuconfig exists | Krzysztof Kozlowski
syscon_regmap_lookup_by_phandle() will fail if the property does not exist, so calling of_property_read_bool() earlier is redundant. Drop that check and move the error message to the syscon_regmap_lookup_by_phandle() error path while converting it to dev_err_probe().
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20250624-syscon-phandle-args-iommu-v3-1-1a36487d69b8@linaro.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu/qcom: Remove iommu_ops pgsize_bitmap | Jason Gunthorpe
This driver just uses a constant; put it in domain_alloc_paging and use the domain's value instead of ops during init_domain.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/6-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu/mtk: Remove iommu_ops pgsize_bitmap | Jason Gunthorpe
This driver just uses a constant; put it in domain_alloc_paging and use the domain's value instead of ops during finalise.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/5-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu: Remove iommu_ops pgsize_bitmap from simple drivers | Jason Gunthorpe
These drivers just have a constant value for their page size; move it into their domain_alloc_paging function before setting up the geometry.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> # for s390-iommu.c
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> # for exynos-iommu.c
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Chen-Yu Tsai <wens@csie.org> # sun50i-iommu.c
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/4-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
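To illustrate the end state of this series, where the page-size bitmap lives on the domain and is filled in by the driver's domain_alloc_paging() rather than read from iommu_ops, here is a reduced userspace model. `struct demo_domain`, `demo_domain_alloc_paging()`, `demo_pick_pgsize()` and the specific sizes are hypothetical; the selection loop follows the general idea of choosing the largest supported page size that both the IOVA/physical alignment and the remaining length allow, not the kernel's exact code.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_SZ_4K  (1ULL << 12)
#define DEMO_SZ_64K (1ULL << 16)
#define DEMO_SZ_1M  (1ULL << 20)

/* Hypothetical driver constant: the only page sizes this "hardware" supports. */
#define DEMO_PGSIZE_BITMAP (DEMO_SZ_4K | DEMO_SZ_64K | DEMO_SZ_1M)

/* Heavily reduced stand-in for struct iommu_domain. */
struct demo_domain {
	uint64_t pgsize_bitmap;   /* per-domain, no longer taken from ops */
	uint64_t aperture_start;
	uint64_t aperture_end;
};

/* Stand-in for a driver's domain_alloc_paging(): page sizes, then geometry. */
struct demo_domain *demo_domain_alloc_paging(void)
{
	struct demo_domain *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->pgsize_bitmap = DEMO_PGSIZE_BITMAP;
	d->aperture_start = 0;
	d->aperture_end = (1ULL << 32) - 1;  /* hypothetical 32-bit IOVA space */
	return d;
}

/*
 * Return the largest page size in the domain's bitmap that does not exceed
 * 'size' and to which both 'iova' and 'paddr' are aligned, or 0 if none fits.
 */
uint64_t demo_pick_pgsize(const struct demo_domain *d, uint64_t iova,
			  uint64_t paddr, uint64_t size)
{
	uint64_t addr_merge = iova | paddr;

	for (int bit = 63; bit >= 0; bit--) {
		uint64_t pgsize = 1ULL << bit;

		if (!(d->pgsize_bitmap & pgsize))
			continue;   /* size not supported by this domain */
		if (pgsize > size)
			continue;   /* larger than what is left to map */
		if (addr_merge & (pgsize - 1))
			continue;   /* addresses not aligned to this size */
		return pgsize;
	}
	return 0;
}
```

For example, mapping 1 MiB at IOVA 0x10000 from physical 0x20000 cannot use a 1 MiB page (the addresses are only 64 KiB aligned), so the sketch picks 64 KiB. Keeping the bitmap on the domain is what lets drivers with per-instance page sizes, such as arm-smmu, report the sizes a given instance really supports.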
2025-06-27 | iommu: Remove ops.pgsize_bitmap from drivers that don't use it | Jason Gunthorpe
These drivers all set the domain->pgsize_bitmap in their domain_alloc_paging() functions, so the ops value is never used. Delete it.
Reviewed-by: Sven Peter <sven@svenpeter.dev> # for Apple DART
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com> # for RISC-V
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/3-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu/arm-smmu: Remove iommu_ops pgsize_bitmap | Jason Gunthorpe
The driver never reads this value: arm_smmu_init_domain_context() always sets domain.pgsize_bitmap to smmu->pgsize_bitmap, the per-instance value. Remove the ops version entirely, along with the related dead code, and make arm_smmu_ops const.
Since this driver does not yet finalize the domain under arm_smmu_domain_alloc_paging(), add a page size initialization to alloc so the page size is still set up prior to attach.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/2-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu/arm-smmu-v3: Remove iommu_ops pgsize_bitmap | Jason Gunthorpe
The driver never reads this value: arm_smmu_domain_finalise() always sets domain.pgsize_bitmap to pgtbl_cfg.pgsize_bitmap, which comes from the per-smmu calculated value. Remove the ops version entirely, along with the related dead code, and make arm_smmu_ops const.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/1-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27 | iommu/amd: Add efr[HATS] max v1 page table level | Ankit Soni
The EFR[HATS] bits indicate the maximum host translation level supported by the IOMMU. Add support for setting the maximum host page table level as indicated by EFR[HATS]. If HATS=11b (reserved), the driver will attempt to use the guest page table for the DMA API.
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Link: https://lore.kernel.org/r/df0f8562c2a20895cc185c86f1a02c4d826fd597.1749016436.git.Ankit.Soni@amd.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
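A small sketch of the decode this commit describes. The field position (EFR bits 11:10) and the 00b/01b/10b encoding for 4-, 5- and 6-level host tables are assumptions based on the AMD IOMMU specification as understood here, not something this log states; verify them against the spec. `efr_max_v1_level()` and the macro names are hypothetical, not the driver's own.

```c
#include <stdint.h>

/* Assumed location of the HATS field in the Extended Feature Register. */
#define EFR_HATS_SHIFT 10
#define EFR_HATS_MASK  (0x3ULL << EFR_HATS_SHIFT)

/* Assumed encoding: 00b = 4 levels, 01b = 5, 10b = 6, 11b = reserved. */
#define HATS_RESERVED  0x3u

/*
 * Hypothetical helper: return the maximum host (v1) page-table level, or
 * -1 when HATS is reserved. Per the commit text, the driver then attempts
 * to use the guest page table for the DMA API instead.
 */
int efr_max_v1_level(uint64_t efr)
{
	unsigned int hats = (unsigned int)((efr & EFR_HATS_MASK) >> EFR_HATS_SHIFT);

	if (hats == HATS_RESERVED)
		return -1;
	return 4 + (int)hats;
}
```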
2025-06-27 | iommu/amd: Add HATDis feature support | Ankit Soni
The current AMD IOMMU driver assumes Host Address Translation (HAT) is always supported, and the Linux kernel enables this capability by default. However, in the case of an emulated or virtualized IOMMU, this might not be true. For example, the current QEMU-emulated AMD vIOMMU does not support host translation for VFIO pass-through devices, but interrupt remapping support is required for x2APIC (i.e. kvm-msi-ext-dest-id is also not supported by the guest OS). This would require the guest kernel to boot with the iommu=pt option to bypass the initialization of the host (v1) table.
The AMD I/O Virtualization Technology (IOMMU) Specification Rev 3.10 [1] introduces a new flag 'HATDis' in the IVHD 11h IOMMU attributes to indicate that HAT is not supported on a particular IOMMU instance.
Therefore, modify the AMD IOMMU driver to detect the new HATDis attribute, disable host translation, and switch to guest translation if it is available. Otherwise, the driver will disable DMA translation.
[1] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifications/48882_IOMMU.pdf
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Link: https://lore.kernel.org/r/8109b208f87b80e400c2abd24a2e44fcbc0763a5.1749016436.git.Ankit.Soni@amd.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
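The policy in the last paragraph of this commit reduces to a three-way decision, sketched below. The enum and function names are hypothetical, and the inputs are simplified to two booleans (whether the IVHD attributes report HATDis, and whether guest translation is available); the real driver derives these from firmware tables and IOMMU feature bits.

```c
#include <stdbool.h>

/* Hypothetical names for the outcomes described in the commit. */
enum demo_dma_mode {
	DEMO_DMA_DEFAULT,          /* HAT supported: existing behaviour */
	DEMO_DMA_GUEST_V2,         /* HATDis set: use guest translation */
	DEMO_DMA_TRANSLATION_OFF,  /* HATDis set and no guest translation */
};

enum demo_dma_mode demo_select_dma_mode(bool hatdis, bool guest_translation)
{
	if (!hatdis)
		return DEMO_DMA_DEFAULT;
	if (guest_translation)
		return DEMO_DMA_GUEST_V2;
	return DEMO_DMA_TRANSLATION_OFF;
}
```

Note that HATDis only affects DMA translation here; the motivating QEMU case still needs the IOMMU for interrupt remapping, which is why simply not probing the IOMMU is not an option.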
2025-06-22 | Linux 6.16-rc3 [v6.16-rc3] | Linus Torvalds
2025-06-22 | Merge tag 'i2c-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux | Linus Torvalds
Pull i2c fixes from Wolfram Sang:
 - subsystem: convert drivers to use recent callbacks of struct i2c_algorithm
   A typical after-rc1 cleanup, which I couldn't send in time for rc2
 - tegra: fix YAML conversion of device tree bindings
 - k1: re-add a check which got lost during upstreaming
* tag 'i2c-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: k1: check for transfer error
  i2c: use inclusive callbacks in struct i2c_algorithm
  dt-bindings: i2c: nvidia,tegra20-i2c: Specify the required properties
2025-06-22 | Merge tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull x86 fixes from Borislav Petkov:
 - Make sure the array tracking which kernel text positions need to be alternatives-patched doesn't get mishandled by out-of-order modifications, leading to it overflowing and causing page faults when patching
 - Avoid an infinite loop when early code does a ranged TLB invalidation before the broadcast TLB invalidation count of how many pages it can flush has been read from CPUID
 - Fix a CONFIG_MODULES typo
 - Disable broadcast TLB invalidation when PTI is enabled to avoid an overflow of the bitmap tracking dynamic ASIDs which need to be flushed when the kernel switches between the user and kernel address space
 - Handle the case of a CPU going offline and thus reporting zeroes when reading top-level events in the resctrl code
* tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternatives: Fix int3 handling failure from broken text_poke array
  x86/mm: Fix early boot use of INVPLGB
  x86/its: Fix an ifdef typo in its_alloc()
  x86/mm: Disable INVLPGB when PTI is enabled
  x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem