|
A vIOMMU now holds a shareable nesting parent HWPT, so it can act in place
of that nesting parent HWPT when allocating a nested HWPT.
Support that in the IOMMU_HWPT_ALLOC ioctl handler, and update its kdoc.
Also, add an iommufd_viommu_alloc_hwpt_nested helper to allocate a nested
HWPT for a vIOMMU object. Since a vIOMMU object already holds the parent
HWPT's refcount, increase the refcount of the vIOMMU only.
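A minimal sketch of that refcounting, assuming the embedded-object layout
used by this series (the helper body and the driver_alloc_nested() call are
illustrative, not the verbatim implementation):
// Sketch only: the vIOMMU already pins its nesting parent HWPT for its
// whole lifetime, so a nested allocation pins just the vIOMMU object.
static struct iommufd_hwpt_nested *
viommu_hwpt_nested_alloc(struct iommufd_viommu *viommu,
                         const struct iommu_user_data *user_data)
{
        struct iommufd_hwpt_nested *hwpt_nested;

        hwpt_nested = driver_alloc_nested(viommu, user_data); /* hypothetical */
        if (IS_ERR(hwpt_nested))
                return hwpt_nested;
        refcount_inc(&viommu->obj.users); /* parent HWPT refcount untouched */
        return hwpt_nested;
}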
Link: https://patch.msgid.link/r/a0f24f32bfada8b448d17587adcaedeeb50a67ed.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Add a new ioctl for user space to do a vIOMMU allocation. It must be based
on a nesting parent HWPT, so take its refcount.
An IOMMU driver that wants to support vIOMMUs must define its
IOMMU_VIOMMU_TYPE_ in the uAPI header and implement a viommu_alloc op in
its iommu_ops.
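For illustration, the driver-side hookup could look like the following
sketch; the my_* and IOMMU_VIOMMU_TYPE_MY_HW names are placeholders:
// A driver defines its IOMMU_VIOMMU_TYPE_* in the uAPI header, then:
static struct iommufd_viommu *
my_viommu_alloc(struct device *dev, struct iommu_domain *parent,
                struct iommufd_ctx *ictx, unsigned int viommu_type)
{
        struct my_viommu *my_viommu;    /* embeds struct iommufd_viommu core */

        if (viommu_type != IOMMU_VIOMMU_TYPE_MY_HW)     /* placeholder */
                return ERR_PTR(-EOPNOTSUPP);
        my_viommu = iommufd_viommu_alloc(ictx, struct my_viommu, core,
                                         &my_viommu_ops);
        if (IS_ERR(my_viommu))
                return ERR_CAST(my_viommu);
        return &my_viommu->core;
}

static const struct iommu_ops my_iommu_ops = {
        ....
        .viommu_alloc = my_viommu_alloc,
};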
Link: https://patch.msgid.link/r/dc2b8ba9ac935007beff07c1761c31cd097ed780.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
To support driver-allocated vIOMMU objects, IOMMU drivers are required to
call the provided iommufd_viommu_alloc helper to embed the core struct.
However, there is no guarantee that every driver will call it and allocate
objects properly.
Make the iommufd_object_finalize/abort functions more robust by verifying
that the xarray slot indexed by the input obj->id holds an XA_ZERO_ENTRY,
which is the reserved value stored by xa_alloc via iommufd_object_alloc.
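The check could take roughly this shape (a sketch; the key point is that
xa_alloc() left XA_ZERO_ENTRY in the reserved slot, so anything else means
the object bypassed the core allocator):
void iommufd_object_finalize(struct iommufd_ctx *ictx,
                             struct iommufd_object *obj)
{
        XA_STATE(xas, &ictx->objects, obj->id);
        void *old;

        xa_lock(&ictx->objects);
        old = xas_store(&xas, obj);
        xa_unlock(&ictx->objects);
        /* obj->id was reserved by xa_alloc(), so the slot must be zeroed */
        WARN_ON(old != XA_ZERO_ENTRY);
}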
Link: https://patch.msgid.link/r/334bd4dde8e0a88eb30fa67eeef61827cdb546f9.1730836219.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The following patch will add a new vIOMMU allocator that will require
_iommufd_object_alloc to be shareable with IOMMU drivers (and iommufd too).
Add a new driver.c file that will be built with CONFIG_IOMMUFD_DRIVER_CORE,
selected by CONFIG_IOMMUFD, and keep CONFIG_IOMMUFD_DRIVER selectable by
drivers to build the existing iova_bitmap.c file.
Link: https://patch.msgid.link/r/2f4f6e116dc49ffb67ff6c5e8a7a8e789ab9e98e.1730836219.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
When configuring a kernel with PAGE_SIZE=4KB, depending on its setting of
CONFIG_CMA_ALIGNMENT, VCMDQ_LOG2SIZE_MAX=19 could fail the alignment test
and trigger a WARN_ON:
WARNING: at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3646
Call trace:
arm_smmu_init_one_queue+0x15c/0x210
tegra241_cmdqv_init_structures+0x114/0x338
arm_smmu_device_probe+0xb48/0x1d90
Fix it by capping max_n_shift to CMDQ_MAX_SZ_SHIFT as SMMUv3 CMDQ does.
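The fix amounts to a clamp along these lines (a sketch using the driver's
names):
/* Never exceed what arm_smmu_init_one_queue() can align and allocate */
q->llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT, VCMDQ_LOG2SIZE_MAX);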
Fixes: 918eb5c856f6 ("iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV")
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/20241111030226.1940737-1-nicolinc@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The introduction of iova_depot_pop() in 911aa1245da8 ("iommu/iova: Make
the rcache depot scale better") confused kmemleak by moving a struct
iova_magazine object from a singly linked list to rcache->depot and
resetting the 'next' pointer referencing it. Unlike with doubly linked
lists, the content of an object removed from a singly linked list is never
changed, so the kmemleak checksum heuristics do not detect such a
scenario. This leads to false positives like:
unreferenced object 0xffff8881a5301000 (size 1024):
comm "softirq", pid 0, jiffies 4306297099 (age 462.991s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 e7 7d 05 00 00 00 00 00 .........}......
0f b4 05 00 00 00 00 00 b4 96 05 00 00 00 00 00 ................
backtrace:
[<ffffffff819f5f08>] __kmem_cache_alloc_node+0x1e8/0x320
[<ffffffff818a239a>] kmalloc_trace+0x2a/0x60
[<ffffffff8231d31e>] free_iova_fast+0x28e/0x4e0
[<ffffffff82310860>] fq_ring_free_locked+0x1b0/0x310
[<ffffffff8231225d>] fq_flush_timeout+0x19d/0x2e0
[<ffffffff813e95ba>] call_timer_fn+0x19a/0x5c0
[<ffffffff813ea16b>] __run_timers+0x78b/0xb80
[<ffffffff813ea5bd>] run_timer_softirq+0x5d/0xd0
[<ffffffff82f1d915>] __do_softirq+0x205/0x8b5
Introduce kmemleak_transient_leak(), which resets the object checksum,
requiring another scan pass before it is reported (if still unreferenced).
Call this new API in iova_depot_pop().
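The call site looks roughly like this (a sketch assuming the size/next
union that 911aa1245da8 added to struct iova_magazine):
static struct iova_magazine *iova_depot_pop(struct iova_rcache *rcache)
{
        struct iova_magazine *mag = rcache->depot;

        /*
         * The mag->size assignment below overwrites mag->next (they share
         * a union), so ask kmemleak to re-checksum the object instead of
         * reporting it as unreferenced right away.
         */
        kmemleak_transient_leak(mag);
        rcache->depot = mag->next;
        mag->size = IOVA_MAG_SIZE;
        return mag;
}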
Link: https://lkml.kernel.org/r/20241104111944.2207155-1-catalin.marinas@arm.com
Link: https://lore.kernel.org/r/ZY1osaGLyT-sdKE8@shredder/
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ido Schimmel <idosch@idosch.org>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Prepare for an embedded structure design for driver-level iommufd_viommu
objects:
// include/linux/iommufd.h
struct iommufd_viommu {
struct iommufd_object obj;
....
};
// Some IOMMU driver
struct iommu_driver_viommu {
struct iommufd_viommu core;
....
};
This requires exposing struct iommufd_object and enum iommufd_object_type
from the core-level private header in the public iommufd header.
Link: https://patch.msgid.link/r/54a43b0768089d690104530754f499ca05ce0074.1730836219.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The iommu core is going to support domain replacement for PASIDs, so the
set_dev_pasid op needs to support replacing the domain while keeping the
old domain config in the failure case.
The AMD iommu driver does not support domain replacement for PASIDs yet,
so it fails the set_dev_pasid op, keeping the old config, if the input
@old is non-NULL. With this, all the set_dev_pasid callbacks can handle
the old parameter and keep the old config on failure, so update the kdoc
of the set_dev_pasid op.
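On the AMD side, the rejection described above is essentially the
following (a sketch; the function name and error code are illustrative):
static int amd_sva_set_dev_pasid(struct iommu_domain *domain,
                                 struct device *dev, ioasid_t pasid,
                                 struct iommu_domain *old)
{
        /* Domain replacement for a PASID is not supported yet */
        if (old)
                return -EOPNOTSUPP;
        ....
}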
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-14-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The set_dev_pasid() op is going to be enhanced to support domain
replacement for a PASID. This prepares for the new op definition.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-13-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add intel_nested_set_dev_pasid() to set a nested type domain to a PASID
of a device.
Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-12-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Let identity_domain_set_dev_pasid() call the pasid replace helpers so
that it can do domain replacement.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-11-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Make intel_svm_set_dev_pasid() support replacement.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-10-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
intel_iommu_set_dev_pasid() is only supposed to be used by paging
domains, so limit it accordingly.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-9-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Let intel_iommu_set_dev_pasid() call the pasid replace helpers so that
it can do domain replacement.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-8-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
domain_id_iommu() does not support SVA type and identity type domains.
Add iommu_domain_did() to support all domain types.
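A sketch of what the new helper might look like, given the description
(FLPT_DEFAULT_DID is the DID the driver already uses for first-level
pass-through setups):
static inline u16 iommu_domain_did(struct iommu_domain *domain,
                                   struct intel_iommu *iommu)
{
        /* SVA and identity domains carry no dmar_domain ID of their own */
        if (domain->type == IOMMU_DOMAIN_SVA ||
            domain->type == IOMMU_DOMAIN_IDENTITY)
                return FLPT_DEFAULT_DID;
        return domain_id_iommu(to_dmar_domain(domain), iommu);
}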
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-7-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
domain_add_dev_pasid() and domain_remove_dev_pasid() are added to
consolidate the adding/removing of struct dev_pasid_info. They also cover
the cache tag assignment/unassignment.
This prepares for adding domain replacement for PASIDs: the set_dev_pasid
callbacks need to deal with the dev_pasid_info for both the old and the
new domain, and these two helpers make that easier.
intel_iommu_set_dev_pasid() and intel_svm_set_dev_pasid() are updated to
use the helpers.
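The add-side helper, roughly (a sketch with error handling condensed):
static struct dev_pasid_info *
domain_add_dev_pasid(struct iommu_domain *domain,
                     struct device *dev, ioasid_t pasid)
{
        struct dmar_domain *dmar_domain = to_dmar_domain(domain);
        struct dev_pasid_info *dev_pasid;
        unsigned long flags;
        int ret;

        dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
        if (!dev_pasid)
                return ERR_PTR(-ENOMEM);

        ret = cache_tag_assign_domain(dmar_domain, dev, pasid);
        if (ret) {
                kfree(dev_pasid);
                return ERR_PTR(ret);
        }

        dev_pasid->dev = dev;
        dev_pasid->pasid = pasid;
        spin_lock_irqsave(&dmar_domain->lock, flags);
        list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
        spin_unlock_irqrestore(&dmar_domain->lock, flags);

        return dev_pasid;
}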
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-6-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
PASID replacement allows converting a present PASID entry to FS, SS, PT
or nested, so add helpers for these operations.
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-5-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
It is clearer to have a new set of PASID replacement helpers rather than
extending the existing ones to cover both initial setup and replacement.
As preparation, abstract out the common code for manipulating the PASID
entry.
No functional change is intended.
Suggested-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-4-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Generalize the logic for flushing the PASID-related caches upon changes
to bits other than SSADE and P, which requires a different flow according
to the VT-d spec.
No functional change is intended.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-3-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
To support domain replacement for a PASID, the underlying iommu driver
needs to know the old domain so that it can clean up the existing
attachment. It is much more convenient for the iommu layer to pass down
the old domain; otherwise, iommu drivers would need to track the domain
for each PASID themselves, which would duplicate code among the drivers,
or rely on group->pasid_array to get the domain, which may not always be
the correct one.
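The resulting op shape is roughly (the new @old argument carries the
currently attached domain, or NULL on initial setup):
struct iommu_domain_ops {
        ....
        int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
                             ioasid_t pasid, struct iommu_domain *old);
        ....
};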
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241107122234.7424-2-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Fix typo 'adderss' to 'address'.
Signed-off-by: Gan Jie <ganjie182@gmail.com>
Link: https://lore.kernel.org/r/20241101072709.702-1-ganjie182@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
iommufd_fault_iopf_enable() has a limitation on PRI for PCI/SR-IOV VFs
because PRI might be a shared resource and the current iommu subsystem is
not ready to support enabling/disabling PRI on a VF without any impact on
others.
However, we have devices that appear as PCI but are actually on the AMBA
bus. These fake PCI devices have the PASID capability and support stall
as well as SR-IOV, so remove the limitation for these devices.
Link: https://patch.msgid.link/r/20241107043711.116-1-zhangfei.gao@linaro.org
Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Describe the most conservative version of the driver implementations.
All drivers should support this.
Many drivers support extending the range if a large page is hit, but
let's not make that an officially approved API. The main point is to
document explicitly that split is not supported.
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-b3a5b5937f56+7bb-arm_no_split_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
A minority of page table implementations (arm_lpae, armv7) are unique in
how they handle partial unmap of large IOPTEs.
Other implementations will unmap the large IOPTE and return it's
length. For example if a 2M IOPTE is present and the first 4K is requested
to be unmapped then unmap will remove the whole 2M and report 2M as the
result.
armv7 instead will break up contiguous entries and replace an entry with
a whole table so it can unmap the requested 4K.
This seems copied from the arm_lpae implementation, which was analyzed
here:
https://lore.kernel.org/all/20241024134411.GA6956@nvidia.com/
Bring consistency to the implementations and remove this unused
functionality.
There are no uses outside iommu; this affects the ARM_V7S drivers
msm_iommu, mtk_iommu, and arm-smmu.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-b3a5b5937f56+7bb-arm_no_split_jgg@nvidia.com
[will: Remove unused 'loopnr' variable]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
A minority of page table implementations (arm_lpae, armv7) are unique in
how they handle partial unmap of large IOPTEs.
Other implementations will unmap the large IOPTE and return its length.
For example, if a 2M IOPTE is present and the first 4K is requested to be
unmapped, then unmap will remove the whole 2M and report 2M as the result.
arm_lpae instead replaces the IOPTE with a table of smaller IOPTEs, unmaps
the 4K and returns 4K. This is actually an illegal/non-hitless operation
on at least SMMUv3 because of the BBM level 0 rules.
Will says this was done to support VFIO, but upon deeper analysis this was
never strictly necessary:
https://lore.kernel.org/all/20241024134411.GA6956@nvidia.com/
In summary, historical VFIO supported the AMD behavior of unmapping the
whole large IOPTE and returning the size, even if asked to unmap a
portion. The driver would see this as a request to split a large IOPTE.
Modern VFIO always unmaps entire large IOPTEs (except on AMD) and drivers
don't see an IOPTE split.
Given it doesn't work fully correctly on SMMUv3 and relying on ARM unique
behavior would create portability problems across IOMMU drivers, retire
this functionality.
Outside the iommu users, this will potentially affect io_pgtable users of
the ARM_32_LPAE_S1, ARM_32_LPAE_S2, ARM_64_LPAE_S1, ARM_64_LPAE_S2, and
ARM_MALI_LPAE formats.
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-b3a5b5937f56+7bb-arm_no_split_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As this iommu driver now supports page faults for requests without
PASID, page requests should be drained when a domain is removed from
the RID2PASID entry.
This results in the intel_iommu_drain_pasid_prq() call being moved to
intel_pasid_tear_down_entry(). This means that when a translation is
removed from any PASID entry and PRI has been enabled on the device, page
requests are drained in the domain detachment path.
The intel_iommu_drain_pasid_prq() helper has been modified to support
sending device TLB invalidation requests for both PASID and non-PASID
cases.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20241101045543.70086-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
PASID support within the IOMMU is not required to enable the Page
Request Queue, only the PRS capability.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-5-b696ca89ba29@kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add IOMMU_HWPT_FAULT_ID_VALID as part of the valid flags when doing an
iommufd_hwpt_alloc, allowing the use of an iommu fault allocation
(iommu_fault_alloc) with the IOMMU_HWPT_ALLOC ioctl.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-4-b696ca89ba29@kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Move IOMMU_IOPF from under INTEL_IOMMU_SVM to under INTEL_IOMMU. This
ensures that the core Intel IOMMU can use the IOPF library functions
independent of the INTEL_IOMMU_SVM config.
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-3-b696ca89ba29@kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
PASID is not strictly needed when handling a PRQ event; remove the check
for the pasid present bit in the request. This change was not included in
the creation of prq.c to emphasize the change in capability checks when
handling PRQ events.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-2-b696ca89ba29@kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
IO page faults are no longer dependent on CONFIG_INTEL_IOMMU_SVM. Move
all Page Request Queue (PRQ) functions that handle prq events to a new
file in drivers/iommu/intel/prq.c. The page_req_des struct is now
declared in drivers/iommu/intel/prq.c.
No functional changes are intended. This is a preparation patch to
enable the use of IO page faults outside the SVM/PASID use cases.
Signed-off-by: Joel Granados <joel.granados@kernel.org>
Link: https://lore.kernel.org/r/20241015-jag-iopfv8-v4-1-b696ca89ba29@kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
There are some issues in pgtable_walk():
1. Super page is dumped as non-present page
2. dma_pte_superpage() should not check against leaf page table entries
3. Pointer pte is never NULL so checking it is meaningless
4. When an entry is not present, it still makes sense to dump the entry
content.
Fix 1 and 2 by checking dma_pte_superpage()'s return value after the
level check.
Fix 3 by removing the pte check.
Fix 4 by checking the present bit after printing.
While at it, print "page table not present" instead of "PTE not present"
to be clearer.
Fixes: 914ff7719e8a ("iommu/vt-d: Dump DMAR translation structure when DMA fault occurs")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/r/20241024092146.715063-3-zhenzhong.duan@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
There are some issues in dmar_fault_dump_ptes():
1. The return value of phys_to_virt() is used for checking if an entry is
present.
2. The dump is confusing: e.g., "pasid table entry is not present" is
ambiguous between a non-present pasid table and a non-present pasid table
entry. The current code means the former.
3. pgtable_walk() is called without checking if the page table is present.
Fix 1 by checking the present bit of an entry before dumping a lower level
entry.
Fix 2 by removing the "entry" string, e.g., "pasid table is not present".
Fix 3 by checking page table presence before the walk.
Take issue 3 for example, before fix:
[ 442.240357] DMAR: pasid dir entry: 0x000000012c83e001
[ 442.246661] DMAR: pasid table entry[0]: 0x0000000000000000
[ 442.253429] DMAR: pasid table entry[1]: 0x0000000000000000
[ 442.260203] DMAR: pasid table entry[2]: 0x0000000000000000
[ 442.266969] DMAR: pasid table entry[3]: 0x0000000000000000
[ 442.273733] DMAR: pasid table entry[4]: 0x0000000000000000
[ 442.280479] DMAR: pasid table entry[5]: 0x0000000000000000
[ 442.287234] DMAR: pasid table entry[6]: 0x0000000000000000
[ 442.293989] DMAR: pasid table entry[7]: 0x0000000000000000
[ 442.300742] DMAR: PTE not present at level 2
After fix:
...
[ 357.241214] DMAR: pasid table entry[6]: 0x0000000000000000
[ 357.248022] DMAR: pasid table entry[7]: 0x0000000000000000
[ 357.254824] DMAR: scalable mode page table is not present
Fixes: 914ff7719e8a ("iommu/vt-d: Dump DMAR translation structure when DMA fault occurs")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Link: https://lore.kernel.org/r/20241024092146.715063-2-zhenzhong.duan@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
dmar_domain already stores the s1_cfg, which includes the s1_pgtbl info,
so there is no need to store s1_pgtbl separately; drop it.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241025143339.2328991-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
dmar_msi_read() has been unused since 2022's
commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture").
Remove it. (dmar_msi_write() still exists and is used once.)
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20241022002702.302728-1-linux@treblig.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
GCC is not happy with the current code, e.g.:
.../iommu/intel/dmar.c:1063:9: note: ‘sprintf’ output between 6 and 15 bytes into a destination of size 13
1063 | sprintf(iommu->name, "dmar%d", iommu->seq_id);
When `make W=1` is supplied, this prevents the kernel from building. Fix
it by increasing the buffer size for the device name and using sizeof()
instead of hard-coded constants.
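The shape of the fix, as a sketch (the exact new buffer size is whatever
covers the worst-case "dmar%d" expansion reported above, i.e. 15 bytes
plus NUL):
/* was: char name[13]; */
char name[16];

snprintf(iommu->name, sizeof(iommu->name), "dmar%d", iommu->seq_id);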
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20241014104529.4025937-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The macro PCI_DEVID() can be used instead of composing the device ID
manually.
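For reference, the macro in <linux/pci.h> composes the bus/devfn pair just
like the open-coded version it replaces:
#define PCI_DEVID(bus, devfn)   ((((u16)(bus)) << 8) | (devfn))

/* e.g. */
u16 sid = PCI_DEVID(bus, devfn);        /* instead of (bus << 8) | devfn */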
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240829021011.4135618-1-ruanjinjie@huawei.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The domain_alloc_user ops should always allocate a guest-compatible page
table unless specific allocation flags are specified.
Currently, IOMMU_HWPT_ALLOC_NEST_PARENT and IOMMU_HWPT_ALLOC_DIRTY_TRACKING
require special handling, as both require hardware support for scalable
mode and second-stage translation. In such cases, the driver should select
a second-stage page table for the paging domain.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20241021085125.192333-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The first stage page table is compatible across host and guest kernels.
Therefore, this driver uses the first stage page table as the default for
paging domains.
The helper first_level_by_default() determines the feasibility of using
the first stage page table based on a global policy. This policy requires
consistency in scalable mode and first stage translation capability among
all iommu units. However, this is unnecessary as domain allocation,
attachment, and removal operations are performed on a per-device basis.
The domain type (IOMMU_DOMAIN_DMA vs. IOMMU_DOMAIN_UNMANAGED) should not
be a factor in determining the first stage page table usage. Both types
are for paging domains, and there's no fundamental difference between them.
The driver should not be aware of this distinction unless the core
specifies allocation flags that require special handling.
Convert first_level_by_default() from global to per-iommu and remove the
'type' input.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20241021085125.192333-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The requirement for consistent super page support across all the IOMMU
hardware in the system has been removed. In the past, if a new IOMMU
was hot-added and lacked consistent super page capability, the hot-add
process would be aborted. However, with the updated attachment semantics,
it is now permissible for the super page capability to vary among
different IOMMU hardware units.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20241021085125.192333-6-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The attributes of a paging domain are initialized during the allocation
process, and any attempt to attach a domain that is not compatible will
result in a failure. Therefore, there is no need to update the domain
attributes at the time of domain attachment.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20241021085125.192333-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The driver now supports domain_alloc_paging, ensuring that a valid device
pointer is provided whenever a paging domain is allocated. Additionally,
the dmar_domain attributes are set up at the time of allocation.
Consistent with the established semantics in the IOMMU core, if a domain is
attached to a device and found to be incompatible with the IOMMU hardware
capabilities, the operation will return an -EINVAL error. This implicitly
advises the caller to allocate a new domain for the device and attempt the
domain attachment again.
Rename prepare_domain_attach_device() to a more meaningful name.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20241021085125.192333-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
With domain_alloc_paging callback supported, the legacy domain_alloc
callback will never be used anymore. Remove it to avoid dead code.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20241021085125.192333-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add the domain_alloc_paging callback for domain allocation using the
iommu_paging_domain_alloc() interface.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20241021085125.192333-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The arm-smmuv3-iommufd.c file will need to call these functions too.
Remove statics and put them in the header file. Remove the kunit
visibility protections from arm_smmu_make_abort_ste() and
arm_smmu_make_s2_domain_ste().
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For SMMUv3 the parent must be an S2 domain, which can be composed into an
IOMMU_DOMAIN_NESTED.
In the future the S2 parent will also need a VMID linked to the VIOMMU and
even to KVM.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For virtualization cases the IDR/IIDR/AIDR values of the actual SMMU
instance need to be available to the VMM so it can construct an
appropriate vSMMUv3 that reflects the correct HW capabilities.
For userspace page tables these values are required to constrain the valid
values within the CD table and the IOPTEs.
The kernel does not sanitize these values. If building a VMM then
userspace is required to only forward bits into a VM that it knows it can
implement. Some bits will also require a VMM to detect if appropriate
kernel support is available such as for ATS and BTM.
Start a new file and Kconfig for the advanced iommufd support. This lets
it be compiled out for kernels that are not intended to support
virtualization, and allows distros to leave it disabled until they are
shipping a matching qemu too.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
HW with CANWBS is always cache coherent and ignores PCI No Snoop requests
as well. This meets the requirement for IOMMU_CAP_ENFORCE_CACHE_COHERENCY,
so let's return it.
Implement the enforce_cache_coherency() op to reject attaching devices
that don't have CANWBS.
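A sketch of the op (illustrative: all_masters_canwbs() is a hypothetical
stand-in for walking the attached masters, and the flag name is assumed):
static int arm_smmu_enforce_cache_coherency(struct iommu_domain *domain)
{
        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);

        /* Reject enforcement unless every attached master has CANWBS */
        if (!all_masters_canwbs(smmu_domain))   /* hypothetical helper */
                return -EPERM;
        /* assumed flag, consulted when later attaches are validated */
        smmu_domain->enforce_cache_coherency = true;
        return 0;
}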
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This control causes the ARM SMMU drivers to choose a stage 2
implementation for the IO pagetable (vs the stage 1 usual default),
however this choice has no significant visible impact on the VFIO user.
Further, qemu never implemented this and no other userspace user is
known.
The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
SMMU translation services to the guest operating system" however the rest
of the API to set the guest table pointer for the stage 1 and manage
invalidation was never completed, or at least never upstreamed, rendering
this part useless dead code.
Upstream has now settled on iommufd as the uAPI for controlling nested
translation. Choosing the stage 2 implementation should be done through
the IOMMU_HWPT_ALLOC_NEST_PARENT flag during domain allocation.
Remove VFIO_TYPE1_NESTING_IOMMU and everything under it, including the
enable_nesting iommu_domain_op.
Just in case there is some userspace using this, continue to treat
requesting it as a NOP, but do not advertise support any more.
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The previous patch added ops->release_domain support. The core will
attach devices to the release domain (release_domain->attach_dev())
before calling this function, so devices are already detached from their
current domain and attached to the blocked domain.
This is mostly a dummy function now; just warn if a device is still
attached to a domain.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20241030063556.6104-13-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|