2025-07-11  iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf()  [Nicolin Chen]
The current flow of tegra241_cmdqv_remove_vintf() is:
 1. For each LVCMDQ, tegra241_vintf_remove_lvcmdq():
    a. Disable the LVCMDQ HW
    b. Release the LVCMDQ SW resource
 2. For the current VINTF, tegra241_vintf_hw_deinit():
    c. Disable all LVCMDQ HWs
    d. Disable the VINTF HW
Obviously, steps 1.a and 2.c are redundant. Since tegra241_vintf_hw_deinit() disables all of its LVCMDQ HWs, the flow in tegra241_cmdqv_remove_vintf() can be simplified by calling that first:
 1. For the current VINTF, tegra241_vintf_hw_deinit():
    a. Disable all LVCMDQ HWs
    b. Disable the VINTF HW
 2. Release all LVCMDQ SW resources
Drop tegra241_vintf_remove_lvcmdq(), and move tegra241_vintf_free_lvcmdq() in as the new step 2. Link: https://patch.msgid.link/r/86c97c8c4ee9ca192e7e7fa3007c10399d792ce6.1752126748.git.nicolinc@nvidia.com Acked-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
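A minimal sketch of the simplified removal path; only tegra241_vintf_hw_deinit() and tegra241_vintf_free_lvcmdq() are named by the patch, the structure layout and iteration bound below are assumptions:

	/* Hedged sketch, not the exact driver code */
	static void tegra241_cmdqv_remove_vintf(struct tegra241_cmdqv *cmdqv, u16 idx)
	{
		struct tegra241_vintf *vintf = cmdqv->vintfs[idx];	/* hypothetical layout */
		u16 lidx;

		/* Step 1: disable all LVCMDQ HWs, then the VINTF HW */
		tegra241_vintf_hw_deinit(vintf);

		/* Step 2: release the LVCMDQ SW resources only */
		for (lidx = 0; lidx < cmdqv->num_lvcmdqs_per_vintf; lidx++)
			if (vintf->lvcmdqs[lidx])
				tegra241_vintf_free_lvcmdq(vintf, lidx);
	}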
2025-07-11  iommu/tegra241-cmdqv: Use request_threaded_irq  [Nicolin Chen]
A vEVENT can be reported only from a threaded IRQ context. Change to using request_threaded_irq to support that. Link: https://patch.msgid.link/r/f160193980e3b273afbd1d9cfc3e360084c05ba6.1752126748.git.nicolinc@nvidia.com Acked-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
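A minimal before/after sketch of such a conversion (the handler name and device cookie are illustrative, not the exact tegra241-cmdqv code):

	/* Before: a hard-IRQ-only handler */
	ret = request_irq(irq, isr, 0, "tegra241-cmdqv", cmdqv);

	/* After: run the handler in a kernel thread so it may sleep, e.g. to
	 * report a vEVENT. With no primary handler, IRQF_ONESHOT is required
	 * so the line stays masked until the thread function finishes.
	 */
	ret = request_threaded_irq(irq, NULL, isr, IRQF_ONESHOT,
				   "tegra241-cmdqv", cmdqv);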
2025-07-11  iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops  [Nicolin Chen]
This will be used by Tegra241 CMDQV implementation to report a non-default HW info data. Link: https://patch.msgid.link/r/8a3bf5709358eb21aed2e8434534c30ecf83917c.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  iommu/arm-smmu-v3-iommufd: Add vsmmu_size/type and vsmmu_init impl ops  [Nicolin Chen]
An impl driver might want to allocate its own type of vIOMMU object or the standard IOMMU_VIOMMU_TYPE_ARM_SMMUV3 by setting up its own SW/HW bits, as the tegra241-cmdqv driver will add IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV. Add vsmmu_size/type and vsmmu_init to struct arm_smmu_impl_ops. Prioritize them in arm_smmu_get_viommu_size() and arm_vsmmu_init(). Link: https://patch.msgid.link/r/375ac2b056764534bb7c10ecc4f34a0bae82b108.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  iommufd/selftest: Update hw_info coverage for an input data_type  [Nicolin Chen]
Test both IOMMU_HW_INFO_TYPE_DEFAULT and IOMMU_HW_INFO_TYPE_SELFTEST, and add a negative test for an unsupported type. Also drop the unused mask in test_cmd_get_hw_capabilities() as checkpatch is complaining. Link: https://patch.msgid.link/r/f01a1e50cd7366f217cbf192ad0b2b79e0eb89f0.1752126748.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  iommufd: Allow an input data_type via iommu_hw_info  [Nicolin Chen]
The iommu_hw_info can output, via the out_data_type field, the vendor data type from a driver, but this only allows a driver to report one data type. Now, with SMMUv3 having a Tegra241 CMDQV implementation, there are two sets of types and data structs to report. One way to support that is to use the same type field bidirectionally. Reuse the same field by adding an "in_data_type", allowing user space to request a specific type and to get the corresponding data. For backward compatibility, since the ioctl handler has never checked an input value, add an IOMMU_HW_INFO_FLAG_INPUT_TYPE to switch between the old output-only field and the new bidirectional field. Link: https://patch.msgid.link/r/887378a7167e1786d9d13cde0c36263ed61823d7.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
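A hypothetical user space sketch of the bidirectional usage, assuming the field and flag names land exactly as described above (in_data_type aliasing out_data_type, IOMMU_HW_INFO_FLAG_INPUT_TYPE) and a made-up requested type; dev_id, data, and iommufd come from earlier setup:

	#include <err.h>
	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/iommufd.h>

	struct iommu_hw_info info = {
		.size = sizeof(info),
		.flags = IOMMU_HW_INFO_FLAG_INPUT_TYPE,	/* opt in to the input field */
		.dev_id = dev_id,
		.in_data_type = IOMMU_HW_INFO_TYPE_TEGRA241_CMDQV, /* assumed type name */
		.data_len = sizeof(data),
		.data_uptr = (uintptr_t)&data,
	};

	if (ioctl(iommufd, IOMMU_GET_HW_INFO, &info))
		err(1, "IOMMU_GET_HW_INFO");
	/* info.out_data_type echoes back the type that "data" was filled as */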
2025-07-11  iommu: Allow an input type in hw_info op  [Nicolin Chen]
The hw_info uAPI will support a bidirectional data_type field that can be used as an input field for user space to request a specific info data type. To prepare for the uAPI update, change the iommu layer first:
 - Add a new IOMMU_HW_INFO_TYPE_DEFAULT as an input, for which a driver can output its only (or first) supported type
 - Update the kdoc accordingly
 - Roll out the type validation in the existing drivers
Link: https://patch.msgid.link/r/00f4a2d3d930721f61367014717b3ba2d1e82a81.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  Documentation: userspace-api: iommufd: Update HW QUEUE  [Nicolin Chen]
With the introduction of the new object and its infrastructure, update the doc to reflect that. Link: https://patch.msgid.link/r/caa3ddc0d9bacf05c5b3e02c5f306ff3172cc54d.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  iommufd/selftest: Add coverage for the new mmap interface  [Nicolin Chen]
Extend the loopback test to a new mmap page. Link: https://patch.msgid.link/r/b02b1220c955c3cf9ea5dd9fe9349ab1b4f8e20b.1752126748.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  iommufd: Add mmap interface  [Nicolin Chen]
For vIOMMUs passing through HW resources to user space (VMs), allowing a VM to control the passed-through HW directly by accessing hardware registers, add an mmap infrastructure to map the physical MMIO pages to user space. Maintain a maple tree per ictx as a translation table managing mmappable regions, from an mmap offset allocated for user space to an iommufd_mmap struct, which stores the real physical address range for io_remap_pfn_range(). Keep track of the lifecycle of the mmappable region by taking a refcount on its owner, so that user space must unmap the region before it can destroy the owner object. To allow an IOMMU driver to add and delete mmappable regions onto/from the maple tree, add iommufd_viommu_alloc/destroy_mmap helpers. Link: https://patch.msgid.link/r/9a888a326b12aa5fe940083eae1156304e210fe0.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
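A hedged user space sketch of consuming such a region; the mmap offset and length names are placeholders for values assumed to be reported back in the driver-specific vIOMMU allocation data:

	#include <err.h>
	#include <sys/mman.h>

	/* out_mmap_offset/out_mmap_length: assumed outputs from the driver's
	 * vIOMMU allocation data; iommufd is the iommufd file descriptor.
	 */
	void *mmio = mmap(NULL, out_mmap_length, PROT_READ | PROT_WRITE,
			  MAP_SHARED, iommufd, out_mmap_offset);
	if (mmio == MAP_FAILED)
		err(1, "mmap of vIOMMU MMIO page");
	/* ... expose "mmio" to the VM for direct HW control ... */
	munmap(mmio, out_mmap_length);	/* must be unmapped before destroying the owner */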
2025-07-11  iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC  [Nicolin Chen]
Some simple tests for IOMMUFD_CMD_HW_QUEUE_ALLOC infrastructure covering the new iommufd_hw_queue_depend/undepend() helpers. Link: https://patch.msgid.link/r/e8a194d187d7ef445f43e4a3c04fb39472050afd.1752126748.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers  [Nicolin Chen]
NVIDIA Virtual Command Queue is one of the iommufd users exposing vIOMMU features to user space VMs. Its hardware has a strict rule when mapping and unmapping multiple global CMDQVs to/from a VM-owned VINTF, requiring mappings in ascending order and unmappings in descending order. The tegra241-cmdqv driver can apply the rule for a mapping in the LVCMDQ allocation handler. However, it can't do the same for an unmapping, since user space could issue destroy calls in a random order that breaks the rule, while the destroy op at the driver level can't reject a destroy call as it returns void. Add iommufd_hw_queue_depend/undepend for-driver helpers, allowing the LVCMDQ allocator to refcount_inc() a sibling LVCMDQ object and the LVCMDQ destroyer to refcount_dec(), so that the iommufd core will help block a random destroy call that breaks the rule. This is a bit of a compromise, because a driver might end up abusing the API in a way that deadlocks the objects. So restrict the API to a dependency between two driver-allocated objects of the same type, as iommufd is unlikely to build any core-level dependency in this case. And encourage use of the macro version, which currently supports the HW QUEUE objects only. Link: https://patch.msgid.link/r/2735c32e759c82f2e6c87cb32134eaf09b7589b5.1752126748.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
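An illustrative driver-side sketch of the pairing, assuming a driver LVCMDQ structure that embeds its struct iommufd_hw_queue in a member named "core" and remembers its previously mapped sibling (the names and exact macro arguments are assumptions):

	/* At LVCMDQ allocation: pin the already-mapped sibling */
	if (prev) {
		ret = iommufd_hw_queue_depend(vcmdq, prev, core);
		if (ret)
			return ret;
		vcmdq->prev = prev;
	}

	/* At LVCMDQ destroy: drop the dependency so the sibling can go away */
	if (vcmdq->prev)
		iommufd_hw_queue_undepend(vcmdq, vcmdq->prev, core);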
2025-07-11  iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl  [Nicolin Chen]
Introduce a new IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl for user space to allocate a HW QUEUE object for a vIOMMU-specific HW-accelerated queue, e.g.:
 - NVIDIA's Virtual Command Queue
 - AMD vIOMMU's Command Buffer, Event Log Buffers, and PPR Log Buffers
Since this is introduced with NVIDIA's VCMDQs that access the guest memory in the physical address space, add an iommufd_hw_queue_alloc_phys() helper that will create an access object to the queue memory in the IOAS, to keep the mappings of the guest memory from being unmapped during the life cycle of the HW queue object. AMD's HW will need a hw_queue_init op that is mutually exclusive with the hw_queue_init_phys op, and their case will bypass the access part, i.e. no iommufd_hw_queue_alloc_phys() call. Link: https://patch.msgid.link/r/dab4ace747deb46c1fe70a5c663307f46990ae56.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct  [Nicolin Chen]
Add IOMMUFD_OBJ_HW_QUEUE with an iommufd_hw_queue structure, representing a HW-accelerated queue type of an IOMMU's physical queue that can be passed through to a user space VM for direct hardware control, such as:
 - NVIDIA's Virtual Command Queue
 - AMD vIOMMU's Command Buffer, Event Log Buffers, and PPR Log Buffers
Add new viommu ops for iommufd to communicate with IOMMU drivers to fetch the supported HW queue structure size and to forward user space ioctls to the IOMMU drivers for initialization/destroy. Among the existing HWs, NVIDIA's VCMDQs access the guest memory via physical addresses, while AMD's Buffers access the guest memory via guest physical addresses (i.e. the iova of the nesting parent HWPT). Separate two mutually exclusive hw_queue_init and hw_queue_init_phys ops to indicate whether a vIOMMU HW accesses the guest queue in the guest physical space (via iova) or the host physical space (via pa). In the latter case, the iommufd core will validate the physical pages of a given guest queue, to ensure the underlying physical pages are contiguous and pinned. Since this is introduced with NVIDIA's VCMDQs, add hw_queue_init_phys for now, and leave some notes for hw_queue_init in the near future (for AMD). Both NVIDIA's and AMD's HW follow a multi-queue model: NVIDIA's will be only one type in enum iommu_hw_queue_type, while AMD's will be three different types (two of which will have multiple queues). Compared to letting the core manage multiple queues with three types per vIOMMU object, it'd be easier for the driver to manage that by having three different driver-structure arrays per vIOMMU object. Thus, pass in the index to the init op. Link: https://patch.msgid.link/r/6939b73699e278e60ce167e911b3d9be68882bad.1752126748.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  iommufd/viommu: Add driver-defined vDEVICE support  [Nicolin Chen]
The NVIDIA VCMDQ driver will have a driver-defined vDEVICE structure and do some HW configurations with it. To allow IOMMU drivers to define their own vDEVICE structures, move the struct iommufd_vdevice to the public header and provide a pair of viommu ops, similar to get_viommu_size and viommu_init. Doing this, however, creates a new window between the vDEVICE allocation and its driver-level initialization, during which an abort could happen, but it can't invoke a driver destroy function from the struct viommu_ops since the driver structure isn't initialized yet. A vIOMMU object doesn't have this problem, since its destroy op is set via the viommu_ops by the driver's viommu_init function. Thus, vDEVICE should do something similar: add a destroy function pointer inside the struct iommufd_vdevice instead of the struct iommufd_viommu_ops. Note that there is unlikely to be a use case for a type-dependent vDEVICE, so a static vdevice_size is probably enough for the near term, instead of a get_vdevice_size function op. Link: https://patch.msgid.link/r/1e751c01da7863c669314d8e27fdb89eabcf5605.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11  iommufd/access: Bypass access->ops->unmap for internal use  [Nicolin Chen]
The access object has been used externally by VFIO mdev devices, allowing them to pin/unpin physical pages (via needs_pin_pages). Meanwhile, a racy unmap can occur in this case, so these devices usually implement an unmap handler, invoked by iommufd_access_notify_unmap(). The new HW queue object will need the same pin/unpin feature, although (unlike the mdev case) it wants to reject any unmap attempt during its life cycle, so it will not implement an unmap handler at all. Thus, bypass any access->ops->unmap call when the access is marked as internal. Also, an area being pinned by an internal access should reject any unmap request. This cannot be done inside iommufd_access_notify_unmap() as it's a per-iopt action. Add a "num_locks" counter in the struct iopt_area, and set it in iopt_area_add_access() when the caller is an internal access. Link: https://patch.msgid.link/r/6df9a43febf79c0379091ec59747276ce9d2493b.1752126748.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-10  iommufd/access: Add internal APIs for HW queue to use  [Nicolin Chen]
The new HW queue object, as an internal iommufd object, wants to reuse the struct iommufd_access to pin some iova range in the iopt. However, an access generally takes the refcount of an ictx. So, in such an internal case, a deadlock could happen when the release of the ictx has to wait for the release of the access first when releasing a hw_queue object, which could wait for the release of the ictx that is refcounted:
  ictx --releases--> hw_queue --releases--> access
   ^                                           |
   |_________________releases________________v
To address this, add a set of lightweight internal APIs to unlink the ictx and the access, i.e. no ictx refcounting by the access:
  ictx --releases--> hw_queue --releases--> access
Then, there's no point in setting access->ictx. So simply define !ictx as a flag for internal use and add an inline helper. Link: https://patch.msgid.link/r/d8d84bf99cbebec56034b57b966a3d431385b90d.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-10  iommufd/selftest: Add coverage for viommu data  [Nicolin Chen]
Extend the existing test_cmd/err_viommu_alloc helpers to accept optional user data. And add a TEST_F for a loopback test. Link: https://patch.msgid.link/r/8ceb64d30e9953f29270a7d341032ca439317271.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-10  iommufd/selftest: Support user_data in mock_viommu_alloc  [Nicolin Chen]
Add a simple user_data for an input-to-output loopback test. Link: https://patch.msgid.link/r/cae4632bb3d98a1efb3b77488fbf81814f2041c6.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-10  iommufd/viommu: Allow driver-specific user data for a vIOMMU object  [Nicolin Chen]
The new type of vIOMMU for the tegra241-cmdqv driver needs driver-specific user data. So, add data_len/uptr to the iommu_viommu_alloc uAPI and pass it in via the viommu_init iommu op. Link: https://patch.msgid.link/r/2315b0e164b355746387e960745ac9154caec124.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Pranjal Shrivastava <praan@google.com> Acked-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
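A hypothetical user space sketch of passing such driver data at vIOMMU allocation time; only data_len/data_uptr are named by the patch, the Tegra241-specific struct and type names below are assumptions, and dev_id/hwpt_id/iommufd come from earlier setup:

	#include <err.h>
	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/iommufd.h>

	struct iommu_viommu_tegra241_cmdqv drv = {};	/* assumed driver data struct */
	struct iommu_viommu_alloc cmd = {
		.size = sizeof(cmd),
		.type = IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV,	/* assumed type name */
		.dev_id = dev_id,
		.hwpt_id = nesting_parent_hwpt_id,
		.data_len = sizeof(drv),
		.data_uptr = (uintptr_t)&drv,
	};

	if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &cmd))
		err(1, "IOMMU_VIOMMU_ALLOC");
	/* drv now carries whatever the driver reported back, e.g. mmap info */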
2025-07-10  iommu: Pass in a driver-level user data structure to viommu_init op  [Nicolin Chen]
The new type of vIOMMU for tegra241-cmdqv allows a user space VM to use one of its virtual command queue HW resources exclusively. This requires user space to mmap the corresponding MMIO page from kernel space for direct HW control. To forward the mmap info (offset and length), iommufd should add a driver-specific data structure to the IOMMUFD_CMD_VIOMMU_ALLOC ioctl, for the driver to output the info back to user space during the vIOMMU initialization. Similar to the existing ioctls and their IOMMU handlers, add a user_data to the viommu_init op to bridge between iommufd and drivers. Link: https://patch.msgid.link/r/90bd5637dab7f5507c7a64d2c4826e70431e45a4.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-10  iommu: Add iommu_copy_struct_to_user helper  [Nicolin Chen]
Similar to the iommu_copy_struct_from_user helper receiving data from the user space, add an iommu_copy_struct_to_user helper to report output data back to the user space data pointer. Link: https://patch.msgid.link/r/fa292c2a730aadd77085ec3a8272360c96eabb9c.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
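A hedged driver-side sketch; the argument order is assumed to mirror the existing iommu_copy_struct_from_user(kdst, user_data, type, last_field) helper, and the driver info struct/type/field names below are placeholders:

	struct iommu_viommu_my_driver data = {};	/* hypothetical uAPI output struct */

	data.out_mmap_offset = my_mmap_offset;		/* fill the fields to report */
	data.out_mmap_length = my_mmap_length;
	return iommu_copy_struct_to_user(user_data, &data,
					 IOMMU_VIOMMU_TYPE_MY_DRIVER,
					 out_mmap_length);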
2025-07-10  iommu: Use enum iommu_hw_info_type for type in hw_info op  [Nicolin Chen]
Replace u32 to make it clear. No functional changes. Also simplify the kdoc since the type itself is clear enough. Link: https://patch.msgid.link/r/651c50dee8ab900f691202ef0204cd5a43fdd6a2.1752126748.git.nicolinc@nvidia.com Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-10  iommufd/viommu: Explicitly define vdev->virt_id  [Nicolin Chen]
The "id" is too general to get its meaning easily. Rename it explicitly to "virt_id" and update the kdocs for readability. No functional changes. Link: https://patch.msgid.link/r/1fac22d645e6ee555675726faf3798a68315b044.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-10  iommufd: Correct virt_id kdoc at struct iommu_vdevice_alloc  [Nicolin Chen]
The userspace-api iommufd.rst has described it correctly, but the uAPI doc remained uncorrected. Thus, fix it. Link: https://patch.msgid.link/r/2cdcecaf2babee16fda7545ccad4e5bed7a5032d.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-10  iommufd: Report unmapped bytes in the error path of iopt_unmap_iova_range  [Nicolin Chen]
There are callers that read the unmapped bytes even when rc != 0. Thus, do not forget to report it in the error path too. Fixes: 8d40205f6093 ("iommufd: Add kAPI toward external drivers for kernel access") Link: https://patch.msgid.link/r/e2b61303bbc008ba1a4e2d7c2a2894749b59fdac.1752126748.git.nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommufd: Apply the new iommufd_object_alloc_ucmd helper  [Nicolin Chen]
Now that the new ucmd-based object allocator eases the finalize/abort routine, apply it to all existing allocators that aren't protected by any lock. Upgrade the for-driver vIOMMU allocator too, and pass down to all existing viommu_alloc ops accordingly. Note that __iommufd_object_alloc_ucmd() builds in some static tests that cover both static_asserts in iommufd_viommu_alloc(). Thus, drop them. Link: https://patch.msgid.link/r/107b24a3b791091bb09c92ffb0081c56c413b26d.1749882255.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommufd: Introduce iommufd_object_alloc_ucmd helper  [Nicolin Chen]
An object allocator needs to call either iommufd_object_finalize() upon a success or iommufd_object_abort_and_destroy() upon an error code. To reduce duplication, store a new_obj in the struct iommufd_ucmd and call iommufd_object_finalize/iommufd_object_abort_and_destroy() accordingly in the main function. Similar to iommufd_object_alloc() and __iommufd_object_alloc(), add a pair of helpers: __iommufd_object_alloc_ucmd() and iommufd_object_alloc_ucmd(). Link: https://patch.msgid.link/r/e7206d4227844887cc8dbf0cc7b0242580fafd9d.1749882255.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
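A hedged sketch of what an allocator looks like with the new helper; the object type and struct names are placeholders, and the macro arguments are assumed to mirror the existing iommufd_object_alloc(ictx, ptr, type):

	struct my_obj *obj;

	obj = iommufd_object_alloc_ucmd(ucmd, obj, IOMMUFD_OBJ_MY_TYPE);
	if (IS_ERR(obj))
		return PTR_ERR(obj);
	/* ... set up "obj" and copy results back to user space ... */

	/* No explicit iommufd_object_finalize() or abort_and_destroy() here:
	 * the main ioctl handler finalizes or aborts ucmd->new_obj based on rc.
	 */
	return 0;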
2025-06-19  iommufd: Move _iommufd_object_alloc out of driver.c  [Nicolin Chen]
Now, all driver structures will be allocated by the core, i.e. there is no longer a need for drivers to call _iommufd_object_alloc. Thus, move it back.
Before:
   text    data     bss     dec     hex filename
   3024     180       0    3204     c84 drivers/iommu/iommufd/driver.o
   9074     610      64    9748    2614 drivers/iommu/iommufd/main.o
After:
   text    data     bss     dec     hex filename
   2665     164       0    2829     b0d drivers/iommu/iommufd/driver.o
   9410     618      64   10092    276c drivers/iommu/iommufd/main.o
Link: https://patch.msgid.link/r/79e630c7b911930cf36e3c8a775a04e66c528d65.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommu: Deprecate viommu_alloc op  [Nicolin Chen]
To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops are introduced. Now, the existing vIOMMU-capable drivers have implemented these two ops, replacing the viommu_alloc one. So, there is no user of it. Remove it from the headers and the viommu core. Link: https://patch.msgid.link/r/5b32d4499d7ed02a63e57a293c11b642d226ef8d.1749882255.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommu/arm-smmu-v3: Replace arm_vsmmu_alloc with arm_vsmmu_init  [Nicolin Chen]
To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops are introduced. Sanitize the inputs and report the size of struct arm_vsmmu on success, in arm_smmu_get_viommu_size(). Place the type check last, because there will soon be an impl-level get_viommu_size op, which will require the same sanity tests beforehand; it can then simply insert a piece of code in front of the IOMMU_VIOMMU_TYPE_ARM_SMMUV3 check. The core will ensure the viommu_type is set on the core vIOMMU object and pass in the same dev pointer, so arm_vsmmu_init() won't need to repeat the same sanity tests but simply inits the arm_vsmmu struct. Remove arm_vsmmu_alloc, completing the replacement. Link: https://patch.msgid.link/r/64e4b4c33acd26e1bd676e077be80e00fb63f17c.1749882255.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommufd/selftest: Replace mock_viommu_alloc with mock_viommu_init  [Nicolin Chen]
To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops are introduced. Sanitize the inputs and report the size of struct mock_viommu on success, in mock_get_viommu_size(). The core will ensure the viommu_type is set to the core vIOMMU object, so simply init the driver part in mock_viommu_init(). Remove the mock_viommu_alloc, completing the replacement. Link: https://patch.msgid.link/r/993beabbb0bc9705d979a92801ea5ed5996a34eb.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommufd/selftest: Drop parent domain from mock_iommu_domain_nested  [Nicolin Chen]
There is no use of this parent domain. Delete the dead code. Note that the s2_parent in struct mock_viommu will be dead code too. Yet, keep it because it will soon be used by HW queue objects, i.e. there is no point in removing and re-adding it in such a short window. Besides, keeping it covers the majority of vIOMMU use cases, where a driver-level structure will be larger in size than the core structure. Link: https://patch.msgid.link/r/0f155a7cd71034a498448fe4828fb4aaacdabf95.1749882255.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommufd/viommu: Support get_viommu_size and viommu_init ops  [Nicolin Chen]
To ease the for-driver iommufd APIs, get_viommu_size and viommu_init ops are introduced to replace the viommu_alloc op. Let the new viommu_init pathway coexist with the old viommu_alloc one. Since the viommu_alloc op and its pathway will soon be deprecated, try to minimize the code difference between them by adding a tentative jump tag. Note that the !viommu->ops case now fails with a WARN_ON_ONCE, since a vIOMMU is expected to support an alloc_domain_nested op for now, or some sort of viommu op in the foreseeable future. This WARN_ON_ONCE can be lifted if, some day, there is a use case wanting !viommu->ops. Link: https://patch.msgid.link/r/35c5fa5926be45bda82f5fc87545cd3180ad4c9c.1749882255.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommu: Introduce get_viommu_size and viommu_init ops  [Nicolin Chen]
So far, a vIOMMU object has been allocated by the IOMMU driver and initialized with the driver-level structure, before it returns to the iommufd core for core-level structure initialization. This has required the iommufd core to expose some core structures/helpers in its driver.c file, which results in a size increase of this driver module. Meanwhile, IOMMU drivers are now requiring more vIOMMU-based structures for some advanced features, such as the existing vDEVICE and a future HW_QUEUE. Initializing a core structure later than the driver structure gives for-driver helpers some trouble, when they are used by an IOMMU driver assuming that the new structure (including core) is fully initialized, for example:
  core:   viommu = ops->viommu_alloc();
  driver:  // my_viommu is successfully allocated
  driver:  my_viommu = iommufd_viommu_alloc(...);
  driver:  // This may crash if it reads viommu->ictx
  driver:  new = iommufd_new_viommu_helper(my_viommu->core ...);
  core:   viommu->ictx = ucmd->ictx;
  core:   ...
To ease such a condition, allow the IOMMU driver to report the size of its vIOMMU structure, let the core allocate a vIOMMU object and initialize the core-level structure first, and then hand it over to the driver to initialize its driver-level structure. Thus, this requires two new iommu ops, get_viommu_size and viommu_init, so the iommufd core can communicate with drivers to replace the viommu_alloc op:
  core:   viommu = ops->get_viommu_size();
  driver:  return VIOMMU_STRUCT_SIZE();
  core:   viommu->ictx = ucmd->ictx; // and others
  core:   rc = ops->viommu_init();
  driver:  // This is safe now as viommu->ictx is inited
  driver:  new = iommufd_new_viommu_helper(my_viommu->core ...);
  core:   ...
This also adds a VIOMMU_STRUCT_SIZE macro for drivers to use, which statically sanitizes the driver structure. Link: https://patch.msgid.link/r/3ab52c5b622dad476c43b1b1f1636c8b902f1692.1749882255.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
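A hedged driver-side sketch of implementing the two new ops under the flow above; the "my_*" names are placeholders and the op argument lists are abbreviated assumptions, with VIOMMU_STRUCT_SIZE() used as described:

	struct my_viommu {
		struct iommufd_viommu core;	/* core struct embedded first */
		/* driver-level members ... */
	};

	static size_t my_get_viommu_size(struct device *dev,
					 enum iommu_viommu_type viommu_type)
	{
		if (viommu_type != IOMMU_VIOMMU_TYPE_DEFAULT)
			return 0;	/* unsupported type */
		return VIOMMU_STRUCT_SIZE(struct my_viommu, core);
	}

	static int my_viommu_init(struct iommufd_viommu *viommu,
				  struct iommu_domain *parent_domain)
	{
		struct my_viommu *my = container_of(viommu, struct my_viommu, core);

		/* viommu->ictx and other core fields are already set by the core */
		my->s2_parent = parent_domain;		/* driver-level init */
		viommu->ops = &my_viommu_ops;		/* assumed driver viommu ops table */
		return 0;
	}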
2025-06-19  iommufd: Return EOPNOTSUPP for failures due to driver bugs  [Nicolin Chen]
It's more accurate to report EOPNOTSUPP when an ioctl failed due to driver bug, since there is nothing wrong with the user space side. Link: https://patch.msgid.link/r/623bb6f0e8fdd7b9c5745a2f99f280163f9f1f5a.1749882255.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommufd: Use enum iommu_veventq_type for type in struct iommufd_veventq  [Nicolin Chen]
Replace unsigned int, to make it clear. No functional changes. Link: https://patch.msgid.link/r/208a260c100a00667d3799feaad1260745f96c6b.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommufd: Use enum iommu_viommu_type for type in struct iommufd_viommu  [Nicolin Chen]
Replace unsigned int, to make it clear. No functional changes. The viommu_alloc iommu op will be deprecated, so don't change that. Link: https://patch.msgid.link/r/6c6ba5c0cd381594f17ae74355872d78d7a022c0.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommufd: Drop unused ictx in struct iommufd_vdevice  [Nicolin Chen]
The core code can always get the ictx pointer via vdev->viommu->ictx, thus drop this unused one. Link: https://patch.msgid.link/r/6cbb65e8df433de45b6c3a4bb2c5df09faca8a7c.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-19  iommufd: Apply obvious cosmetic fixes  [Nicolin Chen]
Run clang-format but exclude those not-so-obvious ones, which leaves us:
 - Align indentations
 - Add missing spaces
 - Remove unnecessary spaces
 - Remove unnecessary line wrappings
Link: https://patch.msgid.link/r/9132e1ab45690ab1959c66bbb51ac5536a635388.1749882255.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-06-08  Linux 6.16-rc1 (tag: v6.16-rc1)  [Linus Torvalds]
2025-06-08  Merge tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux  [Linus Torvalds]
Pull turbostat updates from Len Brown:
 - Add initial DMR support, which required smarter RAPL probe
 - Fix AMD MSR RAPL energy reporting
 - Add RAPL power limit configuration output
 - Minor fixes
* tag 'turbostat-2025.06.08' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2025.06.08
  tools/power turbostat: Add initial support for BartlettLake
  tools/power turbostat: Add initial support for DMR
  tools/power turbostat: Dump RAPL sysfs info
  tools/power turbostat: Avoid probing the same perf counters
  tools/power turbostat: Allow probing RAPL with platform_features->rapl_msrs cleared
  tools/power turbostat: Clean up add perf/msr counter logic
  tools/power turbostat: Introduce add_msr_counter()
  tools/power turbostat: Remove add_msr_perf_counter_()
  tools/power turbostat: Remove add_cstate_perf_counter_()
  tools/power turbostat: Remove add_rapl_perf_counter_()
  tools/power turbostat: Quit early for unsupported RAPL counters
  tools/power turbostat: Always check rapl_joules flag
  tools/power turbostat: Fix AMD package-energy reporting
  tools/power turbostat: Fix RAPL_GFX_ALL typo
  tools/power turbostat: Add Android support for MSR device handling
  tools/power turbostat.8: pm_domain wording fix
  tools/power turbostat.8: fix typo: idle_pct should be pct_idle
2025-06-08  Merge tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
Pull timer cleanup from Thomas Gleixner:
 "The delayed from_timer() API cleanup: the renaming to the timer_*() namespace was delayed due to massive conflicts against Linux-next. Now that everything is upstream, finish the conversion."
* tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  treewide, timers: Rename from_timer() to timer_container_of()
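The rename is mechanical; a minimal before/after sketch of a timer callback (the struct and field names are illustrative):

	struct foo {
		struct timer_list timer;
		int pending;
	};

	static void foo_timer_fn(struct timer_list *t)
	{
		/* Before: struct foo *foo = from_timer(foo, t, timer); */
		struct foo *foo = timer_container_of(foo, t, timer);

		foo->pending = 0;
	}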
2025-06-08  Merge tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:
  - Cure IO bitmap inconsistencies: a failed fork cleans up all resources of the newly created thread via exit_thread(). exit_thread() invokes io_bitmap_exit() which does the IO bitmap cleanups, which unfortunately assume that the cleanup is related to the current task, which is obviously bogus. Make it work correctly.
  - A lockdep fix in the resctrl code removed the clearing of the command buffer in two places, which keeps stale error messages around. Bring them back.
  - Remove unused trace events"
* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
  x86/iopl: Cure TIF_IO_BITMAP inconsistencies
  x86/fpu: Remove unused trace events
2025-06-08  Merge tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
Pull timer fix from Thomas Gleixner:
 "Add the missing seq_file forward declaration in the timer namespace header"
* tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timens: Add struct seq_file forward declaration
2025-06-08  tools/power turbostat: version 2025.06.08  [Len Brown]
 - Add initial DMR support, which required smarter RAPL probe
 - Fix AMD MSR RAPL energy reporting
 - Add RAPL power limit configuration output
 - Minor fixes
Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08  tools/power turbostat: Add initial support for BartlettLake  [Zhang Rui]
Add initial support for BartlettLake. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08  tools/power turbostat: Add initial support for DMR  [Zhang Rui]
Add initial support for DMR. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08  tools/power turbostat: Dump RAPL sysfs info  [Zhang Rui]
For example:
  intel-rapl:1: psys 28.0s:100W 976.0us:100W
  intel-rapl:0: package-0 28.0s:57W,max:15W 2.4ms:57W
  intel-rapl:0/intel-rapl:0:0: core disabled
  intel-rapl:0/intel-rapl:0:1: uncore disabled
  intel-rapl-mmio:0: package-0 28.0s:28W,max:15W 2.4ms:57W
[lenb: simplified format]
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> squish me Signed-off-by: Len Brown <len.brown@intel.com>
2025-06-08  tools/power turbostat: Avoid probing the same perf counters  [Zhang Rui]
For the RAPL package energy status counter, Intel and AMD share the same perf_subsys and perf_name, but with different MSR addresses. Both rapl_counter_arch_infos[0] and rapl_counter_arch_infos[1] are introduced to describe this counter for the different vendors. As a result, the perf counter is probed twice, and this causes a failure in get_rapl_counters() because expected_read_size and actual_read_size don't match. Fix the problem by skipping the already-probed counter. Note, this is not a perfect fix. For example, if different vendors/platforms use the same MSR value for different purposes, the code can be fooled when it probes a rapl_counter_arch_infos[] entry that does not belong to the running vendor/platform. In the long run, it is better to put rapl_counter_arch_infos[] into platform_features so that this becomes vendor/platform specific. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>