linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-02-11	iommufd: Make attach_handle generic than fault specific	Nicolin Chen
	"attach_handle" was added exclusively for the iommufd_fault_iopf_handler() used by IOPF/PRI use cases. Now, both the MSI and PASID series require to reuse the attach_handle for non-fault cases. Add a set of new attach/detach/replace helpers that does the attach_handle allocation/releasing/replacement in the common path and also handles those fault specific routines such as iopf enabling/disabling and auto response. This covers both non-fault and fault cases in a clean way, replacing those inline helpers in the header. The following patch will clean up those old helpers in the fault.c file. Link: https://patch.msgid.link/r/32687df01c02291d89986a9fca897bbbe2b10987.1738645017.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-10	iommu: Fix potential memory leak in iopf_queue_remove_device()	Lu Baolu
	The iopf_queue_remove_device() helper removes a device from the per-iommu iopf queue when PRI is disabled on the device. It responds to all outstanding iopf's with an IOMMU_PAGE_RESP_INVALID code and detaches the device from the queue. However, it fails to release the group structure that represents a group of iopf's awaiting for a response after responding to the hardware. This can cause a memory leak if iopf_queue_remove_device() is called with pending iopf's. Fix it by calling iopf_free_group() after the iopf group is responded. Fixes: 199112327135 ("iommu: Track iopf group instead of last fault") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250117055800.782462-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-25	Merge tag 'hyperv-next-signed-20250123' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Introduce a new set of Hyper-V headers in include/hyperv and replace the old hyperv-tlfs.h with the new headers (Nuno Das Neves) - Fixes for the Hyper-V VTL mode (Roman Kisel) - Fixes for cpu mask usage in Hyper-V code (Michael Kelley) - Document the guest VM hibernation behaviour (Michael Kelley) - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain) * tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Documentation: hyperv: Add overview of guest VM hibernation hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() hyperv: Do not overlap the hvcall IO areas in get_vtl() hyperv: Enable the hypercall output page for the VTL mode hv_balloon: Fallback to generic_online_page() for non-HV hot added mem Drivers: hv: vmbus: Log on missing offers if any Drivers: hv: vmbus: Wait for boot-time offers during boot and resume uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup iommu/hyper-v: Don't assume cpu_possible_mask is dense Drivers: hv: Don't assume cpu_possible_mask is dense x86/hyperv: Don't assume cpu_possible_mask is dense hyperv: Remove the now unused hyperv-tlfs.h files hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h hyperv: Add new Hyper-V headers in include/hyperv hyperv: Clean up unnecessary #includes hyperv: Move hv_connection_id to hyperv-tlfs.h
2025-01-24	Merge tag 'for-linus-iommufd' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "No major functionality this cycle: - iommufd part of the domain_alloc_paging_flags() conversion - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers - Increase a timeout waiting for other threads to drop transient refcounts that syzkaller was hitting - Fix a UBSAN hit in iova_bitmap due to shift out of bounds - Add missing cleanup of fault events during FD shutdown, fixing a memory leak - Improve the fault delivery flow to have a smaller locking critical region that does not include copy_to_user() - Fix 32 bit ABI breakage due to missed implicit padding, and fix the stack memory leakage" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix struct iommu_hwpt_pgfault init and padding iommufd/fault: Use a separate spinlock to protect fault->deliver list iommufd/fault: Destroy response and mutex in iommufd_fault_destroy() iommufd: Keep OBJ/IOCTL lists in an alphabetical order iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index() iommu: iommufd: fix WARNING in iommufd_device_unbind iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core iommufd/selftest: Remove domain_alloc_paging()
2025-01-24	Merge tag 'iommu-updates-v6.14' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core changes: - PASID support for the blocked_domain ARM-SMMU Updates: - SMMUv2: - Implement per-client prefetcher configuration on Qualcomm SoCs - Support for the Adreno SMMU on Qualcomm's SDM670 SOC - SMMUv3: - Pretty-printing of event records - Drop the ->domain_alloc_paging implementation in favour of domain_alloc_paging_flags(flags==0) - IO-PGTable: - Generalisation of the page-table walker to enable external walkers (e.g. for debugging unexpected page-faults from the GPU) - Minor fix for handling concatenated PGDs at stage-2 with 16KiB pages - Misc: - Clean-up device probing and replace the crufty probe-deferral hack with a more robust implementation of arm_smmu_get_by_fwnode() - Device-tree binding updates for a bunch of Qualcomm platforms Intel VT-d Updates: - Remove domain_alloc_paging() - Remove capability audit code - Draining PRQ in sva unbind path when FPD bit set - Link cache tags of same iommu unit together AMD-Vi Updates: - Use CMPXCHG128 to update DTE - Cleanups of the domain_alloc_paging() path RiscV IOMMU: - Platform MSI support - Shutdown support Rockchip IOMMU: - Add DT bindings for Rockchip RK3576 More smaller fixes and cleanups" * tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (66 commits) iommu: Use str_enable_disable-like helpers iommu/amd: Fully decode all combinations of alloc_paging_flags iommu/amd: Move the nid to pdom_setup_pgtable() iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_mode iommu/amd: Remove type argument from do_iommu_domain_alloc() and related iommu/amd: Remove dev == NULL checks iommu/amd: Remove domain_alloc() iommu/amd: Remove unused amd_iommu_domain_update() iommu/riscv: Fixup compile warning iommu/arm-smmu-v3: Add missing #include of linux/string_choices.h iommu/arm-smmu-v3: Use str_read_write helper w/ logs iommu/io-pgtable-arm: Add way to debug pgtable walk iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys iommu/io-pgtable-arm: Make pgtable walker more generic iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500 iommu/arm-smmu: Introduce ACTLR custom prefetcher settings iommu/arm-smmu: Add support for PRR bit setup iommu/arm-smmu: Refactor qcom_smmu structure to include single pointer iommu/arm-smmu: Re-enable context caching in smmu reset operation iommu/vt-d: Link cache tags of same iommu unit together ...
2025-01-21	Merge tag 'irq-core-2025-01-21' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: - Consolidate the machine_kexec_mask_interrupts() by providing a generic implementation and replacing the copy & pasta orgy in the relevant architectures. - Prevent unconditional operations on interrupt chips during kexec shutdown, which can trigger warnings in certain cases when the underlying interrupt has been shut down before. - Make the enforcement of interrupt handling in interrupt context unconditionally available, so that it actually works for non x86 related interrupt chips. The earlier enablement for ARM GIC chips set the required chip flag, but did not notice that the check was hidden behind a config switch which is not selected by ARM[64]. - Decrapify the handling of deferred interrupt affinity setting. Some interrupt chips require that affinity changes are made from the context of handling an interrupt to avoid certain race conditions. For x86 this was the default, but with interrupt remapping this requirement was lifted and a flag was introduced which tells the core code that affinity changes can be done in any context. Unrestricted affinity changes are the default for the majority of interrupt chips. RISCV has the requirement to add the deferred mode to one of it's interrupt controllers, but with the original implementation this would require to add the any context flag to all other RISC-V interrupt chips. That's backwards, so reverse the logic and require that chips, which need the deferred mode have to be marked accordingly. That avoids chasing the 'sane' chips and marking them. - Add multi-node support to the Loongarch AVEC interrupt controller driver. - The usual tiny cleanups, fixes and improvements all over the place. * tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set() genirq/timings: Add kernel-doc for a function parameter genirq: Remove IRQ_MOVE_PCNTXT and related code x86/apic: Convert to IRQCHIP_MOVE_DEFERRED genirq: Provide IRQCHIP_MOVE_DEFERRED hexagon: Remove GENERIC_PENDING_IRQ leftover ARC: Remove GENERIC_PENDING_IRQ genirq: Remove handle_enforce_irqctx() wrapper genirq: Make handle_enforce_irqctx() unconditionally available irqchip/loongarch-avec: Add multi-nodes topology support irqchip/ts4800: Replace seq_printf() by seq_puts() irqchip/ti-sci-inta : Add module build support irqchip/ti-sci-intr: Add module build support irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown kexec: Consolidate machine_kexec_mask_interrupts() implementation genirq: Reuse irq_thread_fn() for forced thread case genirq: Move irq_thread_fn() further up in the code
2025-01-21	iommufd: Fix struct iommu_hwpt_pgfault init and padding	Nicolin Chen
	The iommu_hwpt_pgfault is used to report IO page fault data to userspace, but iommufd_fault_fops_read was never zeroing its padding. This leaks the content of the kernel stack memory to userspace. Also, the iommufd uAPI requires explicit padding and use of __aligned_u64 to ensure ABI compatibility's with 32 bit. pahole result, before: struct iommu_hwpt_pgfault { __u32 flags; /* 0 4 / __u32 dev_id; / 4 4 / __u32 pasid; / 8 4 / __u32 grpid; / 12 4 / __u32 perm; / 16 4 / / XXX 4 bytes hole, try to pack / __u64 addr; / 24 8 / __u32 length; / 32 4 / __u32 cookie; / 36 4 / / size: 40, cachelines: 1, members: 8 / / sum members: 36, holes: 1, sum holes: 4 / / last cacheline: 40 bytes / }; pahole result, after: struct iommu_hwpt_pgfault { __u32 flags; / 0 4 / __u32 dev_id; / 4 4 / __u32 pasid; / 8 4 / __u32 grpid; / 12 4 / __u32 perm; / 16 4 / __u32 __reserved; / 20 4 / __u64 addr __attribute__((__aligned__(8))); / 24 8 / __u32 length; / 32 4 / __u32 cookie; / 36 4 / / size: 40, cachelines: 1, members: 9 / / forced alignments: 1 / / last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); Fixes: c714f15860fc ("iommufd: Add fault and response message definitions") Link: https://patch.msgid.link/r/20250120195051.2450-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-20	iommufd/fault: Use a separate spinlock to protect fault->deliver list	Nicolin Chen
	The fault->mutex serializes the fault read()/write() fops and the iommufd_fault_auto_response_faults(), mainly for fault->response. Also, it was conveniently used to fence the fault->deliver in poll() fop and iommufd_fault_iopf_handler(). However, copy_from/to_user() may sleep if pagefaults are enabled. Thus, they could take a long time to wait for user pages to swap in, blocking iommufd_fault_iopf_handler() and its caller that is typically a shared IRQ handler of an IOMMU driver, resulting in a potential global DOS. Instead of reusing the mutex to protect the fault->deliver list, add a separate spinlock, nested under the mutex, to do the job. iommufd_fault_iopf_handler() would no longer be blocked by copy_from/to_user(). Add a free_list in iommufd_auto_response_faults(), so the spinlock can simply fence a fast list_for_each_entry_safe routine. Provide two deliver list helpers for iommufd_fault_fops_read() to use: - Fetch the first iopf_group out of the fault->deliver list - Restore an iopf_group back to the head of the fault->deliver list Lastly, move the mutex closer to the response in the fault structure, and update its kdoc accordingly. Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-17	Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'qualcomm/msm', ↵	Joerg Roedel
	'rockchip', 'riscv', 'core', 'intel/vt-d' and 'amd/amd-vi' into next
2025-01-17	iommu: Use str_enable_disable-like helpers	Krzysztof Kozlowski
	Replace ternary (condition ? "enable" : "disable") syntax with helpers from string_choices.h because: 1. Simple function call with one argument is easier to read. Ternary operator has three arguments and with wrapping might lead to quite long code. 2. Is slightly shorter thus also easier to read. 3. It brings uniformity in the text - same string. 4. Allows deduping by the linker, which results in a smaller binary file. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250114192642.912331-1-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17	iommu/amd: Fully decode all combinations of alloc_paging_flags	Jason Gunthorpe
	Currently AMD does not support IOMMU_HWPT_ALLOC_PASID \| IOMMU_HWPT_ALLOC_DIRTY_TRACKING It should be rejected. Instead it creates a V1 domain without dirty tracking support. Use a switch to fully decode the flags. Fixes: ce2cd175469f ("iommu/amd: Enhance amd_iommu_domain_alloc_user()") Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17	iommu/amd: Move the nid to pdom_setup_pgtable()	Jason Gunthorpe
	The only thing that uses the nid is the io_pgtable code, and it should be set before calling alloc_io_pgtable_ops() to ensure that the top levels are allocated on the correct nid. Since dev is never NULL now we can just do this trivially and remove the other uses of nid. SVA and identity code paths never use it since they don't use io_pgtable. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17	iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_mode	Jason Gunthorpe
	Currently it uses enum io_pgtable_fmt which is from the io pagetable code and most of the enum values are invalid. protection_domain_mode is internal the driver and has the only two valid values. Fix some signatures and variables to use the right type as well. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17	iommu/amd: Remove type argument from do_iommu_domain_alloc() and related	Jason Gunthorpe
	do_iommu_domain_alloc() is only called from amd_iommu_domain_alloc_paging_flags() so type is always IOMMU_DOMAIN_UNMANAGED. Remove type and all the dead conditionals checking it. IOMMU_DOMAIN_IDENTITY checks are similarly obsolete as the conversion to the global static identity domain removed those call paths. The caller of protection_domain_alloc() should set the type, fix the miss in the SVA code. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17	iommu/amd: Remove dev == NULL checks	Jason Gunthorpe
	This is no longer possible, amd_iommu_domain_alloc_paging_flags() is never called with dev = NULL from the core code. Similarly get_amd_iommu_from_dev() can never be NULL either. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17	iommu/amd: Remove domain_alloc()	Jason Gunthorpe
	IOMMU drivers should not be sensitive to the domain type, a paging domain should be created based only on the flags passed in, the same for all callers. AMD was using the domain_alloc() path to force VFIO into a v1 domain type, because v1 gives higher performance. However now that IOMMU_HWPT_ALLOC_PASID is present, and a NULL device is not possible, domain_alloc_paging_flags() will do the right thing for VFIO. When invoked from VFIO flags will be 0 and the amd_iommu_pgtable type of domain will be selected. This is v1 by default unless the kernel command line has overridden it to v2. If the admin is forcing v2 assume they know what they are doing so force it everywhere, including for VFIO. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17	iommu/amd: Remove unused amd_iommu_domain_update()	Alejandro Jimenez
	All the callers have been removed by the below commit, remove the implementation and prototypes. Fixes: 322d889ae7d3 ("iommu/amd: Remove amd_iommu_domain_update() from page table freeing") Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17	iommu/riscv: Fixup compile warning	Guo Ren
	When __BITS_PER_LONG == 32, size_t is defined as unsigned int rather than unsigned long. Therefore, we should use size_t to avoid type-checking errors. Fixes: 488ffbf18171 ("iommu/riscv: Paging domain support") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Tomasz Jeznach <tjeznach@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com> Link: https://lore.kernel.org/r/20250103024616.3359159-1-guoren@kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-16	iommufd/fault: Destroy response and mutex in iommufd_fault_destroy()	Nicolin Chen
	Both were missing in the initial patch. Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/bc8bb13e215af27e62ee51bdba3648dd4ed2dce3.1736923732.git.nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-15	x86/apic: Convert to IRQCHIP_MOVE_DEFERRED	Thomas Gleixner
	Instead of marking individual interrupts as safe to be migrated in arbitrary contexts, mark the interrupt chips, which require the interrupt to be moved in actual interrupt context, with the new IRQCHIP_MOVE_DEFERRED flag. This makes more sense because this is a per interrupt chip property and not restricted to individual interrupts. That flips the logic from the historical opt-out to a opt-in model. This is simpler to handle for other architectures, which default to unrestricted affinity setting. It also allows to cleanup the redundant core logic significantly. All interrupt chips, which belong to a top-level domain sitting directly on top of the x86 vector domain are marked accordingly, unless the related setup code marks the interrupts with IRQ_MOVE_PCNTXT, i.e. XEN. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/all/20241210103335.563277044@linutronix.de
2025-01-14	iommufd: Keep OBJ/IOCTL lists in an alphabetical order	Nicolin Chen
	Reorder the existing OBJ/IOCTL lists. Also run clang-format for the same coding style at line wrappings. No functional change. Link: https://patch.msgid.link/r/c5e6d6e0a0bb7abc92ad26937fde19c9426bee96.1736237481.git.nicolinc@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-14	iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index()	Qasim Ijaz
	Resolve a UBSAN shift-out-of-bounds issue in iova_bitmap_offset_to_index() where shifting the constant "1" (of type int) by bitmap->mapped.pgshift (an unsigned long value) could result in undefined behavior. The constant "1" defaults to a 32-bit "int", and when "pgshift" exceeds 31 (e.g., pgshift = 63) the shift operation overflows, as the result cannot be represented in a 32-bit type. To resolve this, the constant is updated to "1UL", promoting it to an unsigned long type to match the operand's type. Fixes: 58ccf0190d19 ("vfio: Add an IOVA bitmap support") Link: https://patch.msgid.link/r/20250113223820.10713-1-qasdev00@gmail.com Reported-by: syzbot <syzbot+85992ace37d5b7b51635@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=85992ace37d5b7b51635 Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-14	iommu: iommufd: fix WARNING in iommufd_device_unbind	Suraj Sonawane
	Fix an issue detected by syzbot: WARNING in iommufd_device_unbind iommufd: Time out waiting for iommufd object to become free Resolve a warning in iommufd_device_unbind caused by a timeout while waiting for the shortterm_users reference count to reach zero. The existing 10-second timeout is insufficient in some scenarios, resulting in failures the above warning. Increase the timeout in iommufd_object_dec_wait_shortterm from 10 seconds to 60 seconds to allow sufficient time for the reference count to drop to zero. This change prevents premature timeouts and reduces the likelihood of warnings during iommufd_device_unbind. Fixes: 6f9c4d8c468c ("iommufd: Do not UAF during iommufd_put_object()") Link: https://patch.msgid.link/r/20241123195900.3176-1-surajsonawane0215@gmail.com Reported-by: syzbot+c92878e123785b1fa2db@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c92878e123785b1fa2db Tested-by: syzbot+c92878e123785b1fa2db@syzkaller.appspotmail.com Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-10	iommu/arm-smmu-v3: Add missing #include of linux/string_choices.h	Will Deacon
	Commit f2c77f6e41e6 ("iommu/arm-smmu-v3: Use str_read_write helper w/ logs") introduced a call to str_read_write() in the SMMUv3 driver but without an explicit #include of <linux/string_choices.h>. This breaks the build for custom configurations where CONFIG_ACPI=n: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:1909:4: error: call to undeclared function 'str_read_write'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 1909 \| str_read_write(evt->read), \| ^ Add the missing #include. Link: https://lore.kernel.org/r/d07e82a4-2880-4ae3-961b-471bfa7ac6c4@samsung.com Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: f2c77f6e41e6 ("iommu/arm-smmu-v3: Use str_read_write helper w/ logs") Signed-off-by: Will Deacon <will@kernel.org>
2025-01-10	iommu/hyper-v: Don't assume cpu_possible_mask is dense	Michael Kelley
	Current code gets the APIC IDs for CPUs numbered 255 and lower. This code assumes cpu_possible_mask is dense, which is not true in the general case per [1]. If cpu_possible_mask contains holes, num_possible_cpus() is less than nr_cpu_ids, so some CPUs might get skipped. Furthermore, getting the APIC ID of a CPU that isn't in cpu_possible_mask is invalid. However, the configurations that Hyper-V provides to guest VMs on x86 hardware, in combination with how x86 code assigns Linux CPU numbers, does always produce a dense cpu_possible_mask. So the dense assumption is not currently causing failures. But for robustness against future changes in how cpu_possible_mask is populated, update the code to no longer assume dense. The correct approach is to determine the range to scan based on nr_cpu_ids, and skip any CPUs that are not in the cpu_possible_mask. [1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241003035333.49261-4-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20241003035333.49261-4-mhklinux@outlook.com>
2025-01-08	iommu/arm-smmu-v3: Use str_read_write helper w/ logs	Pranjal Shrivastava
	Adopt the `str_read_write` helper in event logging as suggested by the coccinelle tool. Signed-off-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/all/20250107130053.GC6991@willie-the-truck/ Link: https://lore.kernel.org/r/20250107165100.1093357-1-praan@google.com Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07	iommu/io-pgtable-arm: Add way to debug pgtable walk	Rob Clark
	Add an io-pgtable method to walk the pgtable returning the raw PTEs that would be traversed for a given iova access. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241210165127.600817-4-robdclark@gmail.com [will: Removed 'arm_lpae_io_pgtable_walk_data::level' per Mostafa] Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07	iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys	Rob Clark
	Re-use the generic pgtable walk path. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241210165127.600817-3-robdclark@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07	iommu/io-pgtable-arm: Make pgtable walker more generic	Rob Clark
	We can re-use this basic pgtable walk logic in a few places. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241210165127.600817-2-robdclark@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07	iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500	Bibek Kumar Patro
	Add ACTLR data table for qcom_smmu_500 including corresponding data entry and set prefetch value by way of a list of compatible strings. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-6-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07	iommu/arm-smmu: Introduce ACTLR custom prefetcher settings	Bibek Kumar Patro
	Currently in Qualcomm SoCs the default prefetch is set to 1 which allows the TLB to fetch just the next page table. MMU-500 features ACTLR register which is implementation defined and is used for Qualcomm SoCs to have a custom prefetch setting enabling TLB to prefetch the next set of page tables accordingly allowing for faster translations. ACTLR value is unique for each SMR (Stream matching register) and stored in a pre-populated table. This value is set to the register during context bank initialisation. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-5-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07	iommu/arm-smmu: Add support for PRR bit setup	Bibek Kumar Patro
	Add an adreno-smmu-priv interface for drm/msm to call into arm-smmu-qcom and initiate the "Partially Resident Region" (PRR) bit setup or reset sequence as per request. This will be used by GPU to setup the PRR bit and related configuration registers through adreno-smmu private interface instead of directly poking the smmu hardware. Suggested-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-4-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07	iommu/arm-smmu: Refactor qcom_smmu structure to include single pointer	Bibek Kumar Patro
	qcom_smmu_match_data is static and constant so refactor qcom_smmu to store single pointer to qcom_smmu_match_data instead of replicating multiple child members of the same and handle the further dereferences in the places that want them. Suggested-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-3-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07	iommu/arm-smmu: Re-enable context caching in smmu reset operation	Bibek Kumar Patro
	Default MMU-500 reset operation disables context caching in prefetch buffer. It is however expected for context banks using the ACTLR register to retain their prefetch value during reset and runtime suspend. Add config 'ARM_SMMU_MMU_500_CPRE_ERRATA' to gate this errata workaround in default MMU-500 reset operation which defaults to 'Y' and provide option to disable workaround for context caching in prefetch buffer as and when needed. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-2-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2025-01-07	iommu/vt-d: Link cache tags of same iommu unit together	Zhenzhong Duan
	Cache tag invalidation requests for a domain are accumulated until a different iommu unit is found when traversing the cache_tags linked list. But cache tags of same iommu unit can be distributed in the linked list, this make batched flush less efficient. E.g., one device backed by iommu0 is attached to a domain in between two devices attaching backed by iommu1. Group cache tags together for same iommu unit in cache_tag_assign() to maximize the performance of batched flush. Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/r/20241219054358.8654-1-zhenzhong.duan@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07	iommu/vt-d: Draining PRQ in sva unbind path when FPD bit set	Lu Baolu
	When a device uses a PASID for SVA (Shared Virtual Address), it's possible that the PASID entry is marked as non-present and FPD bit set before the device flushes all ongoing DMA requests and removes the SVA domain. This can occur when an exception happens and the process terminates before the device driver stops DMA and calls the iommu driver to unbind the PASID. There's no need to drain the PRQ in the mm release path. Instead, the PRQ will be drained in the SVA unbind path. But in such case, intel_pasid_tear_down_entry() only checks the presence of the pasid entry and returns directly. Add the code to clear the FPD bit and drain the PRQ. Fixes: c43e1ccdebf2 ("iommu/vt-d: Drain PRQs when domain removed from RID") Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241217024240.139615-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07	iommu/vt-d: Remove iommu cap audit	Lu Baolu
	The capability audit code was introduced by commit <ad3d19029979> "iommu/vt-d: Audit IOMMU Capabilities and add helper functions", aiming to verify the consistency of capabilities across all IOMMUs for supported features. Nowadays, all the kAPIs of the iommu subsystem have evolved to be device oriented, in preparation for supporting heterogeneous IOMMU architectures. There is no longer a need to require capability consistence among IOMMUs for any feature. Remove the iommu cap audit code to make the driver align with the design in the iommu core. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20241216071828.22962-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07	iommu/vt-d: Remove domain_alloc_paging()	Jason Gunthorpe
	This is duplicated by intel_iommu_domain_alloc_paging_flags(), just remove it. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/0-v1-b101d00c5ee5+17645-vtd_paging_flags_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-07	iommu/vt-d: Avoid use of NULL after WARN_ON_ONCE	Kees Bakker
	There is a WARN_ON_ONCE to catch an unlikely situation when domain_remove_dev_pasid can't find the `pasid`. In case it nevertheless happens we must avoid using a NULL pointer. Signed-off-by: Kees Bakker <kees@ijzerbout.nl> Link: https://lore.kernel.org/r/20241218201048.E544818E57E@bout3.ijzerbout.nl Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-06	iommu/amd: remove return value of amd_iommu_detect	Gao Shiyuan
	The return value of amd_iommu_detect is not used, so remove it and is consistent with other iommu detect functions. Signed-off-by: Gao Shiyuan <gaoshiyuan@baidu.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250103165808.80939-1-gaoshiyuan@baidu.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-06	iommu/msm: Use helper function devm_clk_get_prepared()	Zhang Heng
	Since commit 7ef9651e9792 ("clk: Provide new devm_clk helpers for prepared and enabled clocks"), devm_clk_get() and clk_prepare() can now be replaced by devm_clk_get_prepared() when driver prepares the clocks for the whole lifetime of the device. Moreover, it is no longer necessary to unprepare the clocks explicitly. Signed-off-by: Zhang Heng <zhangheng@kylinos.cn> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20250103113059.463033-1-zhangheng@kylinos.cn Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-06	iommu/riscv: Add shutdown function for iommu driver	Xu Lu
	This commit supplies shutdown callback for iommu driver. The shutdown callback resets necessary registers so that newly booted kernel can pass riscv_iommu_init_check() after kexec. Also, the shutdown callback resets iommu mode to bare instead of off so that new kernel can still use PCIE devices even when CONFIG_RISCV_IOMMU is not enabled. Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Link: https://lore.kernel.org/r/20250103093220.38106-3-luxu.kernel@bytedance.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-06	iommu/riscv: Empty iommu queue before enabling it	Xu Lu
	Changing cqen/fqen/pqen from 0 to 1 sets the cqh/fqt/pqt registers to 0. But the cqt/fqh/pqh registers are left unmodified. This commit resets cqt/fqh/pqh registers to ensure corresponding queues are empty before being enabled during initialization. Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Link: https://lore.kernel.org/r/20250103093220.38106-2-luxu.kernel@bytedance.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-19	iommu/tegra241-cmdqv: Read SMMU IDR1.CMDQS instead of hardcoding	Nicolin Chen
	The hardware limitation "max=19" actually comes from SMMU Command Queue. So, it'd be more natural for tegra241-cmdqv driver to read it out rather than hardcoding it itself. This is not an issue yet for a kernel on a baremetal system, but a guest kernel setting the queue base/size in form of IPA/gPA might result in a noncontiguous queue in the physical address space, if underlying physical pages backing up the guest RAM aren't contiguous entirely: e.g. 2MB-page backed guest RAM cannot guarantee a contiguous queue if it is 8MB (capped to VCMDQ_LOG2SIZE_MAX=19). This might lead to command errors when HW does linear-read from a noncontiguous queue memory. Adding this extra IDR1.CMDQS cap (in the guest kernel) allows VMM to set SMMU's IDR1.CMDQS=17 for the case mentioned above, so a guest-level queue will be capped to maximum 2MB, ensuring a contiguous queue memory. Fixes: a3799717b881 ("iommu/tegra241-cmdqv: Fix alignment failure at max_n_shift") Reported-by: Ian Kalinowski <ikalinowski@nvidia.com> Cc: stable@vger.kernel.org Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20241219051421.1850267-1-nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-12-19	iommu/io-pgtable-arm: Fix cfg reading in arm_lpae_concat_mandatory()	Mostafa Saleh
	The newly introduced arm_lpae_concat_mandatory() function reads the ias/oas fields from the 'io_pgtable_cfg' copy embedded inside the 'arm_lpae_io_pgtable' structure. However, this copy is not set until later in alloc_io_pgtable_ops() after the alloc() function has been called. Use the address sizes passed in the 'io_pgtable_cfg' structure when deciding whether or not to concatenate the PGD. Fixes: 4dcac8407fe1 ("iommu/io-pgtable-arm: Fix stage-2 concatenation with 16K") Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241215200412.561400-1-smostafa@google.com Signed-off-by: Will Deacon <will@kernel.org>
2024-12-18	iommu: Remove the remove_dev_pasid op	Yi Liu
	The iommu drivers that supports PASID have supported attaching pasid to the blocked_domain, hence remove the remove_dev_pasid op from the iommu_ops. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-8-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18	iommu/amd: Make the blocked domain support PASID	Yi Liu
	The blocked domain can be extended to park PASID of a device to be the DMA blocking state. By this the remove_dev_pasid() op is dropped. Remove PASID from old domain and device GCR3 table. No need to attach PASID to the blocked domain as clearing PASID from GCR3 table will make sure all DMAs for that PASID are blocked. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-7-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18	iommu/vt-d: Make the blocked domain support PASID	Yi Liu
	The blocked domain can be extended to park PASID of a device to be the DMA blocking state. By this the remove_dev_pasid() op is dropped. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-6-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18	iommu/arm-smmu-v3: Make the blocked domain support PASID	Jason Gunthorpe
	The blocked domain is used to park RID to be blocking DMA state. This can be extended to PASID as well. By this, the remove_dev_pasid() op of ARM SMMUv3 can be dropped. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-5-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-12-18	iommu: Detaching pasid by attaching to the blocked_domain	Yi Liu
	The iommu drivers are on the way to detach pasid by attaching to the blocked domain. However, this cannot be done in one shot. During the transition, iommu core would select between the remove_dev_pasid op and the blocked domain. Suggested-by: Kevin Tian <kevin.tian@intel.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20241204122928.11987-4-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>