path: root/drivers/iommu/arm
Age         Commit message         Author
2022-05-06  iommu/arm-smmu-v3-sva: Fix mm use-after-free  (Jean-Philippe Brucker)

We currently call arm64_mm_context_put() without holding a reference to
the mm, which can result in a use-after-free. Call mmgrab()/mmdrop() to
ensure the mm only gets freed after we have unpinned the ASID.

Fixes: 32784a9562fb ("iommu/arm-smmu-v3: Implement iommu_sva_bind/unbind()")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Link: https://lore.kernel.org/r/20220426130444.300556-1-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
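A minimal sketch of the grab/pin ordering the fix describes; bind_mm()
and unbind_mm() are hypothetical stand-ins for the driver's notifier
setup and teardown, and arm64_mm_context_get() is the arm64-specific
ASID pinning helper:

	/* Hypothetical sketch: keep the mm alive for as long as the ASID is pinned. */
	#include <linux/sched/mm.h>

	static int bind_mm(struct mm_struct *mm)
	{
		mmgrab(mm);				/* mm_struct may not be freed now */
		if (!arm64_mm_context_get(mm)) {	/* pin the ASID; 0 means failure */
			mmdrop(mm);
			return -ENOSPC;
		}
		return 0;
	}

	static void unbind_mm(struct mm_struct *mm)
	{
		arm64_mm_context_put(mm);		/* unpin the ASID first... */
		mmdrop(mm);				/* ...only then drop the mm reference */
	}
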
2022-05-06  iommu/arm-smmu-v3: check return value after calling platform_get_resource()  (Yang Yingliang)

platform_get_resource() may return NULL, which would lead to a
null-ptr-deref, so we need to check its return value.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220425114525.2651143-1-yangyingliang@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-05-06  iommu/arm-smmu: fix possible null-ptr-deref in arm_smmu_device_probe()  (Yang Yingliang)

If platform_get_resource() returns NULL, any use of 'res' is a
null-ptr-deref, so move the use of 'res' after the call to
devm_ioremap_resource(), which checks it. Also use
devm_platform_get_and_ioremap_resource() to simplify the code.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220425114136.2649310-1-yangyingliang@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
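Both of these fixes converge on the same probe-time pattern. A hedged
sketch follows; the driver name my_probe() is hypothetical:

	/* Sketch of the safe pattern (driver name hypothetical). */
	#include <linux/platform_device.h>

	static int my_probe(struct platform_device *pdev)
	{
		struct resource *res;
		void __iomem *base;

		/* Fetches resource 0 and maps it in one call; on failure it
		 * returns an ERR_PTR, so 'res' is never used unchecked. */
		base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
		if (IS_ERR(base))
			return PTR_ERR(base);

		dev_info(&pdev->dev, "mapped %pR\n", res);	/* res is valid here */
		return 0;
	}
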
2022-04-22  iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu  (Ashish Mhetre)

Tegra194 and Tegra234 SoCs have an erratum that causes walk cache
entries to not be invalidated correctly: the walk cache index generated
for an IOVA is not the same across translation and invalidation
requests. This leads to page faults when a PMD entry is released during
unmap and repopulated with a new PTE table on a subsequent map request.
Disabling large page mappings avoids the release of the PMD entry, so
translations never see a stale PMD entry in the walk cache. Fix this by
limiting page mappings to PAGE_SIZE for Tegra194 and Tegra234 devices.
This is the fix recommended by the Tegra hardware design team.

Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Krishna Reddy <vdumpa@nvidia.com>
Co-developed-by: Pritesh Raithatha <praithatha@nvidia.com>
Signed-off-by: Pritesh Raithatha <praithatha@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Link: https://lore.kernel.org/r/20220421081504.24678-1-amhetre@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-04-20  iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range()  (Nicolin Chen)

The arm_smmu_mm_invalidate_range() function is designed to be called by
the mm core for Shared Virtual Addressing purposes between the IOMMU
and the CPU MMU. However, the two subsystems define their "end"
addresses slightly differently: the IOMMU uses the last address of an
address range, while the mm core uses the address following it:

    include/linux/mm_types.h:
        unsigned long vm_end; /* The first byte after our end address ... */

This mismatch resulted in an incorrect size calculation, so the size
failed to be page-size aligned. Further, it caused an infinite loop at
the "while (iova < end)" check in __arm_smmu_tlb_inv_range(). This
patch fixes the issue by doing the calculation correctly.

Fixes: 2f7e8c553e98 ("iommu/arm-smmu-v3: Hook up ATC invalidation to mm ops")
Cc: stable@vger.kernel.org
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20220419210158.21320-1-nicolinc@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
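A short illustration of the two "end" conventions (variable names are
hypothetical, not the driver's actual code):

	/* mm-core style: 'end' is exclusive, the first byte after the range. */
	size_t size_mm = end - start;		/* page-aligned, no +1 needed */

	/* IOMMU style: 'last' is inclusive, the last byte of the range. */
	size_t size_iommu = last - start + 1;

	/* Mixing the two conventions yields a size that is off by one byte
	 * and no longer page-aligned, which can wedge a loop such as
	 *	while (iova < end)
	 * in __arm_smmu_tlb_inv_range(). */
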
2022-03-24  Merge tag 'iommu-updates-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu  (Linus Torvalds)

Pull iommu updates from Joerg Roedel:

 - IOMMU Core changes:
     - Removal of aux domain related code as it is basically dead and
       will be replaced by the iommu-fd framework
     - Split of iommu_ops to carry domain-specific callbacks separately
     - Cleanup to remove useless ops->capable implementations
     - Improve 32-bit free space estimate in iova allocator

 - Intel VT-d updates:
     - Various cleanups of the driver
     - Support for ATS of SoC-integrated devices listed in ACPI/SATC
       table

 - ARM SMMU updates:
     - Fix SMMUv3 soft lockup during continuous stream of events
     - Fix error path for Qualcomm SMMU probe()
     - Rework SMMU IRQ setup to prepare the ground for PMU support
     - Minor cleanups and refactoring

 - AMD IOMMU driver:
     - Some minor cleanups and error-handling fixes

 - Rockchip IOMMU driver:
     - Use standard driver registration

 - MSM IOMMU driver:
     - Minor cleanup and change to standard driver registration

 - Mediatek IOMMU driver:
     - Fixes for IOTLB flushing logic

* tag 'iommu-updates-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (47 commits)
  iommu/amd: Improve amd_iommu_v2_exit()
  iommu/amd: Remove unused struct fault.devid
  iommu/amd: Clean up function declarations
  iommu/amd: Call memunmap in error path
  iommu/arm-smmu: Account for PMU interrupts
  iommu/vt-d: Enable ATS for the devices in SATC table
  iommu/vt-d: Remove unused function intel_svm_capable()
  iommu/vt-d: Add missing "__init" for rmrr_sanity_check()
  iommu/vt-d: Move intel_iommu_ops to header file
  iommu/vt-d: Fix indentation of goto labels
  iommu/vt-d: Remove unnecessary prototypes
  iommu/vt-d: Remove unnecessary includes
  iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO
  iommu/vt-d: Remove domain and devinfo mempool
  iommu/vt-d: Remove iova_cache_get/put()
  iommu/vt-d: Remove finding domain in dmar_insert_one_dev_info()
  iommu/vt-d: Remove intel_iommu::domains
  iommu/mediatek: Always tlb_flush_all when each PM resume
  iommu/mediatek: Add tlb_lock in tlb_flush_all
  iommu/mediatek: Remove the power status checking in tlb flush all
  ...
2022-03-08  Merge branches 'arm/mediatek', 'arm/msm', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'x86/vt-d' and 'x86/amd' into next  (Joerg Roedel)
2022-03-07  iommu/arm-smmu: Account for PMU interrupts  (Robin Murphy)

In preparation for SMMUv2 PMU support, rejig our IRQ setup code to
account for PMU interrupts as additional resources. We can simplify the
whole flow by only storing the context IRQs, since the global IRQs are
devres-managed and we never refer to them beyond the initial request.

CC: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b2a40caaf1622eb35c555074a0d72f4f0513cff9.1645106346.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-02-28  iommu: Split struct iommu_ops  (Lu Baolu)

Move the domain specific operations out of struct iommu_ops into a new
structure that only has domain specific operations. This solves the
problem of needing to know if the method vector for a given operation
needs to be retrieved from the device or the domain. Logically the
domain ops are the ones that make sense for external subsystems and
endpoint drivers to use, while device ops, with the sole exception of
domain_alloc, are IOMMU API internals.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220216025249.3459465-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
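In outline, the split looks roughly like the following; this is a
simplified sketch, and the upstream structs carry many more methods:

	/* Sketch of the split: per-device vs. per-domain method vectors. */
	struct iommu_domain_ops {
		int  (*attach_dev)(struct iommu_domain *domain, struct device *dev);
		phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
		void (*free)(struct iommu_domain *domain);
		/* ...map/unmap, TLB flushing, etc... */
	};

	struct iommu_ops {
		struct iommu_domain *(*domain_alloc)(unsigned int type);  /* the sole exception */
		struct iommu_device *(*probe_device)(struct device *dev);
		const struct iommu_domain_ops *default_domain_ops;
		/* ...other IOMMU API internals... */
	};
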
2022-02-15  iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit  (Fenghua Yu)

PASIDs are process-wide. It was attempted to use refcounted PASIDs to
free them when the last thread drops the refcount. This turned out to
be complex and error prone. Given the fact that the PASID space is 20
bits, which allows up to 1M processes to have a PASID associated
concurrently, PASID resource exhaustion is not a realistic concern.

Therefore, it was decided to simplify the approach and stick with lazy
on demand PASID allocation, but drop the eager free approach and make
an allocated PASID's lifetime bound to the lifetime of the process.

Get rid of the refcounting mechanisms and replace/rename the
interfaces to reflect this new approach.

[ bp: Massage commit message. ]

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20220207230254.3342514-6-fenghua.yu@intel.com
2022-02-08  iommu/arm-smmu-v3: fix event handling soft lockup  (Zhou Guanghui)

During event processing, events are read from the event queue one by
one until the queue is empty. If the master device continuously issues
address accesses at the same time and the SMMU keeps generating events,
this processing loop can run for a long time and softlockup warnings
may be reported:

  arm-smmu-v3 arm-smmu-v3.34.auto: event 0x0a received:
  arm-smmu-v3 arm-smmu-v3.34.auto:  0x00007f220000280a
  arm-smmu-v3 arm-smmu-v3.34.auto:  0x000010000000007e
  arm-smmu-v3 arm-smmu-v3.34.auto:  0x00000000034e8670
  watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [irq/268-arm-smm:247]
  Call trace:
    _dev_info+0x7c/0xa0
    arm_smmu_evtq_thread+0x1c0/0x230
    irq_thread_fn+0x30/0x80
    irq_thread+0x128/0x210
    kthread+0x134/0x138
    ret_from_fork+0x10/0x1c
  Kernel panic - not syncing: softlockup: hung tasks

Fix this by calling cond_resched() after the event information is
printed.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Link: https://lore.kernel.org/r/20220119070754.26528-1-zhouguanghui1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
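A simplified sketch of the fixed loop shape; the helper names are
hypothetical, and the real handler is arm_smmu_evtq_thread():

	/* Hypothetical sketch: yield between events so a continuous stream
	 * cannot monopolise the CPU. */
	#include <linux/interrupt.h>
	#include <linux/sched.h>

	static irqreturn_t evtq_thread(int irq, void *cookie)
	{
		struct my_evtq *q = cookie;
		u64 evt[4];

		while (dequeue_event(q, evt) == 0) {
			report_event(q, evt);	/* the dev_info() lines above */
			cond_resched();		/* break up long bursts */
		}
		return IRQ_HANDLED;
	}
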
2022-02-08  iommu/arm-smmu: Add missing pm_runtime_disable() in qcom_iommu_device_probe  (Miaoqian Lin)

If the probe fails, we should use pm_runtime_disable() to balance
pm_runtime_enable(). Add the missing pm_runtime_disable() to the error
handling path.

Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220105101619.29108-1-linmq006@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
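The balancing rule in sketch form; the probe skeleton and my_setup()
are hypothetical:

	/* Every successful pm_runtime_enable() needs a pm_runtime_disable()
	 * on failure paths. */
	#include <linux/pm_runtime.h>

	static int my_probe(struct platform_device *pdev)
	{
		int ret;

		pm_runtime_enable(&pdev->dev);

		ret = my_setup(pdev);			/* hypothetical setup step */
		if (ret) {
			pm_runtime_disable(&pdev->dev);	/* the previously missing cleanup */
			return ret;
		}
		return 0;
	}
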
2022-02-08  iommu/arm-smmu-v3: Simplify memory allocation  (Christophe JAILLET)

Use devm_bitmap_zalloc() instead of hand writing it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/598bd905103dcbe5653a54bb0dfb5a8597728214.1644274051.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Will Deacon <will@kernel.org>
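For reference, the helper collapses the hand-written size math into one
devres call; this is a generic sketch, not the driver's exact hunk:

	/* Before: size computed by hand. */
	map = devm_kzalloc(dev, BITS_TO_LONGS(nbits) * sizeof(long), GFP_KERNEL);
	/* After: one self-describing call. */
	map = devm_bitmap_zalloc(dev, nbits, GFP_KERNEL);
	if (!map)
		return -ENOMEM;
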
2022-02-08  iommu/arm-smmu-v3: Avoid open coded arithmetic in memory allocation  (Christophe JAILLET)

kmalloc_array()/kcalloc() should be used to avoid potential overflow
when a multiplication is needed to compute the size of the requested
memory. So turn a devm_kzalloc()+explicit size computation into an
equivalent devm_kcalloc().

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/3f7b9b202c6b6f5edc234ab7af5f208fbf8bc944.1644274051.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Will Deacon <will@kernel.org>
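The shape of the change, sketched as a diff (variable names
hypothetical):

	- ptr = devm_kzalloc(dev, n * sizeof(*ptr), GFP_KERNEL);   /* n * size may overflow */
	+ ptr = devm_kcalloc(dev, n, sizeof(*ptr), GFP_KERNEL);    /* overflow-checked */
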
2022-01-13  Merge tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

Pull MSI irq updates from Thomas Gleixner:
 "Rework of the MSI interrupt infrastructure.

  This is a treewide cleanup and consolidation of MSI interrupt
  handling in preparation for further changes in this area which are
  necessary to:

   - address existing shortcomings in the VFIO area

   - support the upcoming Interrupt Message Store functionality which
     decouples the message store from the PCI config/MMIO space"

* tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
  genirq/msi: Populate sysfs entry only once
  PCI/MSI: Unbreak pci_irq_get_affinity()
  genirq/msi: Convert storage to xarray
  genirq/msi: Simplify sysfs handling
  genirq/msi: Add abuse prevention comment to msi header
  genirq/msi: Mop up old interfaces
  genirq/msi: Convert to new functions
  genirq/msi: Make interrupt allocation less convoluted
  platform-msi: Simplify platform device MSI code
  platform-msi: Let core code handle MSI descriptors
  bus: fsl-mc-msi: Simplify MSI descriptor handling
  soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs()
  soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation
  NTB/msi: Convert to msi_on_each_desc()
  PCI: hv: Rework MSI handling
  powerpc/mpic_u3msi: Use msi_for_each-desc()
  powerpc/fsl_msi: Use msi_for_each_desc()
  powerpc/pasemi/msi: Convert to msi_on_each_dec()
  powerpc/cell/axon_msi: Convert to msi_on_each_desc()
  powerpc/4xx/hsta: Rework MSI handling
  ...
2021-12-16  iommu/arm-smmu-v3: Use msi_get_virq()  (Thomas Gleixner)

Let the core code fiddle with the MSI descriptor retrieval.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221815.089008198@linutronix.de
2021-12-16  platform-msi: Use msi_desc::msi_index  (Thomas Gleixner)

Use the common msi_index member and get rid of the pointless wrapper
struct.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211210221814.413638645@linutronix.de
2021-12-16  device: Move MSI related data into a struct  (Thomas Gleixner)

The only unconditional part of MSI data in struct device is the
irqdomain pointer. Everything else can be allocated on demand. Create a
data structure and move the irqdomain pointer into it. The other MSI
specific parts are going to be removed from struct device in later
steps.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211210221813.617178827@linutronix.de
2021-12-15  Revert "iommu/arm-smmu-v3: Decrease the queue size of evtq and priq"  (Zhou Wang)

Commit f115f3c0d5d8 ("iommu/arm-smmu-v3: Decrease the queue size of
evtq and priq") shrank the evtq and priq, which may cause them to fill
up with fault events: e.g. HiSilicon ZIP/SEC/HPRE have a maximum of
1024 queues per device, and every queue can be bound to one process and
trigger a fault event. So let's revert f115f3c0d5d8.

In fact, if an SMMU implementation really does not need such long evtq
and priq, the values of IDR1_EVTQS and IDR1_PRIQS can be set to proper
ones.

Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Acked-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/1638858768-9971-1-git-send-email-wangzhou1@hisilicon.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14  iommu/arm-smmu-v3: Constify arm_smmu_mmu_notifier_ops  (Rikard Falkeborn)

The only usage of arm_smmu_mmu_notifier_ops is to assign its address to
the ops field in the mmu_notifier struct, which is a pointer to const
struct mmu_notifier_ops. Make it const to allow the compiler to put it
in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20211204223301.100649-1-rikard.falkeborn@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
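The pattern, for reference; a generic sketch rather than the driver's
exact definition:

	/* With no writers, the ops table can live in read-only memory. */
	static const struct mmu_notifier_ops arm_smmu_mmu_notifier_ops = {
		/* ...callbacks... */
	};

	/* mmu_notifier::ops is 'const struct mmu_notifier_ops *', so the
	 * assignment needs no cast: */
	mn->ops = &arm_smmu_mmu_notifier_ops;
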
2021-12-14  iommu: arm-smmu-impl: Add SM8450 qcom iommu implementation  (Vinod Koul)

Add the SM8450 qcom iommu implementation to the qcom_smmu_impl_of_match
table, which brings in iommu support for the SM8450 SoC.

Signed-off-by: Vinod Koul <vkoul@kernel.org>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20211201073943.3969549-3-vkoul@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14  iommu/arm-smmu-qcom: Fix TTBR0 read  (Rob Clark)

It is a 64-bit register; let's not lose the upper bits.

Fixes: ab5df7b953d8 ("iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault info")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211108171724.470973-1-robdclark@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
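The gist in one line; the register offset name REG_TTBR0 is
hypothetical:

	u64 ttbr0 = readq_relaxed(base + REG_TTBR0);	/* one 64-bit read, not readl_relaxed() */
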
2021-10-31  Merge branches 'apple/dart', 'arm/mediatek', 'arm/renesas', 'arm/smmu', 'arm/tegra', 'iommu/fixes', 'x86/amd', 'x86/vt-d' and 'core' into next  (Joerg Roedel)
2021-10-08  iommu/arm-smmu-qcom: Request direct mapping for modem device  (Sibi Sankar)

The SID configuration requirement for Modem on SC7280 is similar to the
ones found on SC7180/SDM845 SoCs. So, add the SC7280 modem compatible
to the list to defer the programming of the modem SIDs to the kernel.

Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
Reviewed-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1631886935-14691-5-git-send-email-sibis@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-07  qcom_scm: hide Kconfig symbol  (Arnd Bergmann)

Now that SCM can be a loadable module, we have to add another
dependency to avoid link failures when ipa or adreno-gpu are built-in:

  aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe':
  ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available'

  ld.lld: error: undefined symbol: qcom_scm_is_available
  >>> referenced by adreno_gpu.c
  >>> gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in archive drivers/built-in.a

This can happen when CONFIG_ARCH_QCOM is disabled and we don't select
QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd
use a similar dependency here to what we have for QCOM_RPROC_COMMON,
but that causes dependency loops from other things selecting QCOM_SCM.

This appears to be an endless problem, so try something different this
time:

 - CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on'
   but that is simply selected by all of its users

 - All the stubs in include/linux/qcom_scm.h can go away

 - arm-smccc.h needs to provide a stub for __arm_smccc_smc() to allow
   compile-testing QCOM_SCM on all architectures.

 - To avoid a circular dependency chain involving RESET_CONTROLLER and
   PINCTRL_SUNXI, drop the 'select RESET_CONTROLLER' statement.
   According to my testing this still builds fine, and the QCOM
   platform selects this symbol already.

Acked-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Alex Elder <elder@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-04  iommu: arm-smmu-qcom: Add compatible for QCM2290  (Loic Poulain)

Add a compatible for the QCM2290 SMMU to use the Qualcomm Technologies,
Inc. specific implementation.

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/1633096832-7762-1-git-send-email-loic.poulain@linaro.org
[will: Sort by alphabetical order, add commit message]
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-04  iommu/arm-smmu-qcom: Add SM6350 SMMU compatible  (Konrad Dybcio)

Add a compatible for the SM6350 SMMU to use the Qualcomm Technologies,
Inc. specific implementation.

Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20210820202906.229292-2-konrad.dybcio@somainline.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-10-04  iommu/arm-smmu-v3: Properly handle the return value of arm_smmu_cmdq_build_cmd()  (Zhen Lei)

1. Building the CMD_SYNC command cannot fail, so its return value can
   be ignored.
2. arm_smmu_cmdq_build_cmd() almost never fails, so adding unlikely()
   can optimize the instruction pipeline.
3. Check the return value in arm_smmu_cmdq_batch_add().

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210818080452.2079-2-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
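Point 2 in sketch form; a generic illustration with hypothetical
build_cmd()/issue_cmd() helpers, not the exact upstream hunk:

	/* Annotate the near-impossible failure so the compiler lays out the
	 * fast path first. */
	if (unlikely(build_cmd(cmd, ent))) {
		pr_warn("unknown command opcode\n");
		return;
	}
	issue_cmd(q, cmd);
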
2021-10-04  iommu/arm-smmu-v3: Stop pre-zeroing batch commands in arm_smmu_atc_inv_master()  (Zhen Lei)

Pre-zeroing the batched commands structure is inefficient, as
individual commands are zeroed later in arm_smmu_cmdq_build_cmd().
Therefore, only the member 'num' needs to be initialized to 0.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20210817113411.1962-1-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
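The change amounts to this; a sketch in which the struct name is real
but the surrounding code is elided:

	struct arm_smmu_cmdq_batch cmds;

	cmds.num = 0;	/* only the count needs resetting; each command is
			 * zeroed later in arm_smmu_cmdq_build_cmd() */
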
2021-08-20  Merge branches 'apple/dart', 'arm/smmu', 'iommu/fixes', 'x86/amd', 'x86/vt-d' and 'core' into next  (Joerg Roedel)
2021-08-20  iommu/arm-smmu: Fix missing unlock on error in arm_smmu_device_group()  (Yang Yingliang)

Add the missing unlock before return from function
arm_smmu_device_group() in the error handling case.

Fixes: b1a1347912a7 ("iommu/arm-smmu: Fix race condition during iommu_group creation")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210820074949.1946576-1-yangyingliang@huawei.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18  iommu/arm-smmu: Prepare for multiple DMA domain types  (Robin Murphy)

In preparation for the strict vs. non-strict decision for DMA domains
to be expressed in the domain type, make sure we expose our flush queue
awareness by accepting the new domain type.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/8f217ef285bd0bb9456c27ef622d2efdbbca1ad8.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18  iommu/io-pgtable: Remove non-strict quirk  (Robin Murphy)

IO_PGTABLE_QUIRK_NON_STRICT was never a very comfortable fit, since
it's not a quirk of the pagetable format itself. Now that we have a
more appropriate way to convey non-strict unmaps, though, this last of
the non-quirk quirks can also go, and with the flush queue code also
now enforcing its own ordering we can have a lovely cleanup all round.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/155b5c621cd8936472e273a8b07a182f62c6c20d.1628682049.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18  iommu/arm-smmu: Drop IOVA cookie management  (Robin Murphy)

The core code bakes its own cookies now.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/7ae3680dad9735cc69c3618866666896bd11e031.1628682048.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-13  iommu/arm-smmu-v3: Stop pre-zeroing batch commands  (John Garry)

Pre-zeroing the batched commands structure is inefficient, as
individual commands are zeroed later in arm_smmu_cmdq_build_cmd(). The
size is quite large and commonly most commands won't even be used:

	struct arm_smmu_cmdq_batch cmds = {};
    345c:	52800001	mov	w1, #0x0	// #0
    3460:	d2808102	mov	x2, #0x408	// #1032
    3464:	910143a0	add	x0, x29, #0x50
    3468:	94000000	bl	0 <memset>

Stop pre-zeroing the complete structure and only zero the num member.

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1628696966-88386-1-git-send-email-john.garry@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-08-13  iommu/arm-smmu-v3: Extract reusable function __arm_smmu_cmdq_skip_err()  (Zhen Lei)

When SMMU_GERROR.CMDQP_ERR differs from SMMU_GERRORN.CMDQP_ERR, one or
more errors have been encountered on a command queue control page
interface. We need to traverse all ECMDQs in that control page to find
all errors. The error handling for each ECMDQ is much the same as the
CMDQ error handling, so this common processing part is extracted as a
new function __arm_smmu_cmdq_skip_err().

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210811114852.2429-5-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-08-13  iommu/arm-smmu-v3: Add and use static helper function arm_smmu_get_cmdq()  (Zhen Lei)

One SMMU has only one normal CMDQ, which is used regardless of the core
on which a command is inserted, and which can be referenced directly
through "smmu->cmdq". However, one SMMU has multiple ECMDQs, and the
ECMDQ used may depend on the core on which the command insertion is
executed. So add the helper function arm_smmu_get_cmdq(), which returns
the CMDQ/ECMDQ that the current core should use. The code that supports
ECMDQ is not added yet, so it simply returns "&smmu->cmdq".

Many subfunctions of arm_smmu_cmdq_issue_cmdlist() use "&smmu->cmdq" or
"&smmu->cmdq.q" directly. To support ECMDQ, they need to call the newly
added arm_smmu_get_cmdq() instead. Note that the normal CMDQ is still
required until ECMDQ is available.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210811114852.2429-4-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
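As described, the helper is currently trivial; a sketch matching the
commit text:

	static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
	{
		/* With ECMDQ support this will pick the queue for the current
		 * core; for now there is only the one normal CMDQ. */
		return &smmu->cmdq;
	}
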
2021-08-13  iommu/arm-smmu-v3: Add and use static helper function arm_smmu_cmdq_issue_cmd_with_sync()  (Zhen Lei)

The key to the performance optimization of commit 587e6c10a7ce
("iommu/arm-smmu-v3: Reduce contention during command-queue insertion")
is to allow multiple cores to insert commands in parallel after a brief
mutex contention. Inserting as many commands at a time as possible
reduces the number of times cores contend for the mutex, thereby
improving overall performance; at the very least, it reduces the number
of calls to arm_smmu_cmdq_issue_cmdlist().

Therefore, add arm_smmu_cmdq_issue_cmd_with_sync() to insert the
'cmd+sync' commands in one go.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210811114852.2429-3-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-08-13  iommu/arm-smmu-v3: Use command queue batching helpers to improve performance  (Zhen Lei)

The key to the performance optimization of commit 587e6c10a7ce
("iommu/arm-smmu-v3: Reduce contention during command-queue insertion")
is to allow multiple cores to insert commands in parallel after a brief
mutex contention. Inserting as many commands at a time as possible
reduces the number of times cores contend for the mutex, thereby
improving overall performance; at the very least, it reduces the number
of calls to arm_smmu_cmdq_issue_cmdlist().

Therefore, use the command queue batching helpers to insert multiple
commands at a time.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210811114852.2429-2-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-08-13  iommu/arm-smmu: Optimize ->tlb_flush_walk() for qcom implementation  (Sai Prakash Ranjan)

Currently, for iommu_unmap() of a large scatter-gather list with
page-size elements, the majority of time is spent flushing partial
walks in __arm_lpae_unmap(), which is a VA-based TLB invalidation that
invalidates page-by-page on iommus like arm-smmu-v2 (TLBIVA).

For example: to unmap a 32MB scatter-gather list with page-size
elements (8192 entries), there are 16 2MB buffer unmaps based on the
pgsize (2MB for 4K granule), and each 2MB buffer unmap further results
in 512 TLBIVAs (2MB/4K), for a total of 8192 TLBIVAs (512*16), causing
a huge overhead.

The qcom implementation has several hardware performance improvements
for TLB cache invalidations, such as wait-for-safe (for realtime
clients like camera and display) and a few others that allow cache
lookups/updates while a TLBI is in progress for the same context bank.
So the cost of over-invalidation is low compared to the unmap latency
in several use cases, such as camera, which deals with large buffers.
ASID-based TLB invalidation (TLBIASID) can therefore be used to
invalidate the entire context for a partial walk flush, improving unmap
latency. For the 32MB scatter-gather list unmap above, this change
results in just 16 ASID-based TLB invalidations (TLBIASIDs) as opposed
to 8192 TLBIVAs, drastically increasing unmap performance.

Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap
(average over 10 iterations):

Before this optimization:

    size        iommu_map_sg      iommu_unmap
      4K            2.067 us         1.854 us
     64K            9.598 us         8.802 us
      1M          148.890 us       130.718 us
      2M          305.864 us        67.291 us
     12M         1793.604 us       390.838 us
     16M         2386.848 us       518.187 us
     24M         3563.296 us       775.989 us
     32M         4747.171 us      1033.364 us

After this optimization:

    size        iommu_map_sg      iommu_unmap
      4K            1.723 us         1.765 us
     64K            9.880 us         8.869 us
      1M          155.364 us       135.223 us
      2M          303.906 us         5.385 us
     12M         1786.557 us        21.250 us
     16M         2391.890 us        27.437 us
     24M         3570.895 us        39.937 us
     32M         4755.234 us        51.797 us

Real-world data also shows a big difference in unmap performance: there
were reports of camera frame drops because of the high overhead of
iommu unmap without this optimization, with frequent unmaps issued by
the camera of about 100MB/s taking more than 100ms and thereby causing
frame drops.

Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Link: https://lore.kernel.org/r/20210811160426.10312-1-saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-08-10  iommu/arm-smmu: Fix race condition during iommu_group creation  (Krishna Reddy)

When two devices with the same SID are probed concurrently through
iommu_probe_device(), the iommu_group is sometimes allocated more than
once, because the call to arm_smmu_device_group() is not protected
against concurrency. This leaves each device holding a different
iommu_group and domain pointer with separate IOVA space, and only one
of the devices' domains is used for translations by the IOMMU, causing
accesses from the other device to fault or see incorrect translations.
Fix this by protecting the iommu_group allocation from concurrency in
arm_smmu_device_group().

Signed-off-by: Krishna Reddy <vdumpa@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Link: https://lore.kernel.org/r/1628570641-9127-3-git-send-email-amhetre@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
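In sketch form (simplified; find_or_alloc_group() is a hypothetical
stand-in for the lookup-then-allocate logic, serialised on a mutex in
the SMMU device structure):

	static struct iommu_group *my_device_group(struct arm_smmu_device *smmu,
						   struct device *dev)
	{
		struct iommu_group *group;

		mutex_lock(&smmu->stream_map_mutex);
		group = find_or_alloc_group(smmu, dev);	/* one group per SID, race-free */
		mutex_unlock(&smmu->stream_map_mutex);

		return group;
	}
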
2021-08-10  iommu/arm-smmu: Add clk_bulk_{prepare/unprepare} to system pm callbacks  (Sai Prakash Ranjan)

Some SMMU clocks can have XO as their parent, such as
gpu_cc_hub_cx_int_clk of the GPU SMMU in the QTI SC7280 SoC. To enter
deep sleep states in such cases, we need to drop the XO clock vote in
the unprepare call; this unprepare callback for XO lives in the RPMh
(Resource Power Manager-Hardened) clock driver, which controls RPMh
managed clock resources for new QTI SoCs.

We cannot have sleeping calls such as clk_bulk_prepare() and
clk_bulk_unprepare() in the arm-smmu runtime PM callbacks, since iommu
operations like map and unmap can run in atomic context and are on the
fast path. So add the prepare and unprepare calls, which drop the XO
vote, only to the system PM callbacks: system PM is not a fast path,
and we expect the system to enter deep sleep states with system PM as
opposed to runtime PM. This mirrors the sequence of clock requests
(prepare,enable and disable,unprepare) in arm-smmu probe and remove.

Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Co-developed-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
Link: https://lore.kernel.org/r/20210810064808.32486-1-saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
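A sketch of the resulting split between runtime PM and system PM
(simplified from the pattern the commit describes; the struct and field
names are assumptions):

	/* System PM: allowed to sleep, so prepare/unprepare here and drop
	 * the XO vote for deep sleep. */
	static int __maybe_unused my_pm_suspend(struct device *dev)
	{
		struct my_smmu *smmu = dev_get_drvdata(dev);
		int ret;

		ret = pm_runtime_force_suspend(dev);	/* disables the clocks */
		if (ret)
			return ret;

		clk_bulk_unprepare(smmu->num_clks, smmu->clks);	/* drops the XO vote */
		return 0;
	}

	static int __maybe_unused my_pm_resume(struct device *dev)
	{
		struct my_smmu *smmu = dev_get_drvdata(dev);
		int ret;

		ret = clk_bulk_prepare(smmu->num_clks, smmu->clks);
		if (ret)
			return ret;

		return pm_runtime_force_resume(dev);
	}
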
2021-08-02  iommu/arm-smmu-v3: Implement the map_pages() IOMMU driver callback  (Xiang Chen)

Implement the map_pages() callback for the ARM SMMUv3 driver to allow
calls from iommu_map to map multiple pages of the same size in one
call. Also remove the map() callback for the ARM SMMUv3 driver, as it
will no longer be used.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1627697831-158822-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-02  iommu/arm-smmu-v3: Implement the unmap_pages() IOMMU driver callback  (Xiang Chen)

Implement the unmap_pages() callback for the ARM SMMUv3 driver to allow
calls from iommu_unmap to unmap multiple pages of the same size in one
call. Also remove the unmap() callback for the ARM SMMUv3 driver, as it
will no longer be used.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1627697831-158822-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
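For orientation, the callback shapes these two patches implement look
roughly like this; treat the exact signatures as indicative of the
iommu_ops of that era rather than authoritative:

	int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
			 int prot, gfp_t gfp, size_t *mapped);

	size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
			      size_t pgsize, size_t pgcount,
			      struct iommu_iotlb_gather *gather);

	/* One call covers pgcount pages of size pgsize, instead of pgcount
	 * separate map()/unmap() calls. */
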
2021-08-02  iommu/arm-smmu-v3: Remove some unneeded init in arm_smmu_cmdq_issue_cmdlist()  (John Garry)

Members of struct "llq" are zero-initialised, apart from member
max_n_shift. But we write llq.val straight after the init, so it is
pointless to zero-init those other members; as such, separately init
member max_n_shift only.

In addition, struct "head" is initialised to "llq" only so that member
max_n_shift is set. But that member is never referenced for "head", so
remove any init there.

Removing these initialisations is a small performance optimisation, as
this code is a (very) hot path.

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1624293394-202509-1-git-send-email-john.garry@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-07-26  iommu: Streamline iommu_iova_to_phys()  (Robin Murphy)

If people are going to insist on calling iommu_iova_to_phys()
pointlessly and expecting it to work, we can at least do ourselves a
favour by handling those cases in the core code, rather than repeatedly
across an inconsistent handful of drivers. Since all the existing
drivers implement the internal callback, and any future ones are likely
to want to work with iommu-dma which relies on iova_to_phys a fair bit,
we may as well remove that currently-redundant check as well and
consider it mandatory.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/f564f3f6ff731b898ff7a898919bf871c2c7745a.1626354264.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26  iommu/arm-smmu: Implement the map_pages() IOMMU driver callback  (Isaac J. Manjarres)

Implement the map_pages() callback for the ARM SMMU driver to allow
calls from iommu_map to map multiple pages of the same size in one
call. Also, remove the map() callback for the ARM SMMU driver, as it
will no longer be used.

Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-16-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-26  iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback  (Isaac J. Manjarres)

Implement the unmap_pages() callback for the ARM SMMU driver to allow
calls from iommu_unmap to unmap multiple pages of the same size in one
call. Also, remove the unmap() callback for the SMMU driver, as it will
no longer be used.

Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>
Link: https://lore.kernel.org/r/1623850736-389584-15-git-send-email-quic_c_gdjako@quicinc.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-07-15  Merge tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux  (Linus Torvalds)

Pull fallthrough fixes from Gustavo Silva:
 "This fixes many fall-through warnings when building with Clang and
  -Wimplicit-fallthrough, and also enables -Wimplicit-fallthrough for
  Clang, globally.

  It's also important to notice that since we have adopted the use of
  the pseudo-keyword macro fallthrough, we also want to avoid having
  more /* fall through */ comments being introduced. Contrary to GCC,
  Clang doesn't recognize any comments as implicit fall-through
  markings when the -Wimplicit-fallthrough option is enabled. So, in
  order to avoid having more comments being introduced, we use the
  option -Wimplicit-fallthrough=5 for GCC, which similar to Clang, will
  cause a warning in case a code comment is intended to be used as a
  fall-through marking. The patch for Makefile also enforces this.

  We had almost 4,000 of these issues for Clang in the beginning, and
  there might be a couple more out there when building some
  architectures with certain configurations. However, with the recent
  fixes I think we are in good shape and it is now possible to enable
  the warning for Clang"

* tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
  Makefile: Enable -Wimplicit-fallthrough for Clang
  powerpc/smp: Fix fall-through warning for Clang
  dmaengine: mpc512x: Fix fall-through warning for Clang
  usb: gadget: fsl_qe_udc: Fix fall-through warning for Clang
  powerpc/powernv: Fix fall-through warning for Clang
  MIPS: Fix unreachable code issue
  MIPS: Fix fall-through warnings for Clang
  ASoC: Mediatek: MT8183: Fix fall-through warning for Clang
  power: supply: Fix fall-through warnings for Clang
  dmaengine: ti: k3-udma: Fix fall-through warning for Clang
  s390: Fix fall-through warnings for Clang
  dmaengine: ipu: Fix fall-through warning for Clang
  iommu/arm-smmu-v3: Fix fall-through warning for Clang
  mmc: jz4740: Fix fall-through warning for Clang
  PCI: Fix fall-through warning for Clang
  scsi: libsas: Fix fall-through warning for Clang
  video: fbdev: Fix fall-through warning for Clang
  math-emu: Fix fall-through warning
  cpufreq: Fix fall-through warning for Clang
  drm/msm: Fix fall-through warning in msm_gem_new_impl()
  ...
2021-07-14  iommu/qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"  (Marek Szyprowski)

The QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device
controller, which fails for the second and later IOMMU devices. This is
intended and must not be fatal to the driver registration process. Also,
the cleanup path should take care of the runtime PM state, which is
missing in the current patch. Revert the relevant changes to the QCOM
IOMMU driver until a proper fix is prepared.

This partially reverts commit 249c9dc6aa0db74a0f7908efd04acf774e19b155.

Fixes: 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error path")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210705065657.30356-1-m.szyprowski@samsung.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>