Age | Commit message (Collapse) | Author |
|
Requiring each IOMMU driver to initialise the 'owner' field of their
'struct iommu_ops' is error-prone and easily forgotten. Follow the
example set by PCI and USB by assigning THIS_MODULE automatically when
registering the ops structure with IOMMU core.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The commit c18647900ec8 ("iommu/dma: Relax locking in
iommu_dma_prepare_msi()") introduced a compliation warning,
drivers/iommu/dma-iommu.c: In function 'iommu_dma_prepare_msi':
drivers/iommu/dma-iommu.c:1206:27: warning: variable 'cookie' set but
not used [-Wunused-but-set-variable]
struct iommu_dma_cookie *cookie;
^~~~~~
Fixes: c18647900ec8 ("iommu/dma: Relax locking in iommu_dma_prepare_msi()")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
If the device fails to be added to the group, make sure to unlink the
reference before returning.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Fixes: 39ab9555c2411 ("iommu: Add sysfs bindings for struct iommu_device")
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This adds the missing teardown step that removes the device link from
the group when the device addition fails.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Fixes: 797a8b4d768c5 ("iommu: Handle default domain attach failure")
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Starting with commit fa212a97f3a3 ("iommu/vt-d: Probe DMA-capable ACPI
name space devices"), we now probe DMA-capable ACPI name
space devices. On Dell XPS 13 9343, which has an Intel LPSS platform
device INTL9C60 enumerated via ACPI, this change leads to the following
warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at pci_device_group+0x11a/0x130
CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.5.0-rc3+ #22
Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A20 06/06/2019
RIP: 0010:pci_device_group+0x11a/0x130
Code: f0 ff ff 48 85 c0 49 89 c4 75 c4 48 8d 74 24 10 48 89 ef e8 48 ef ff ff 48 85 c0 49 89 c4 75 af e8 db f7 ff ff 49 89 c4 eb a5 <0f> 0b 49 c7 c4 ea ff ff ff eb 9a e8 96 1e c7 ff 66 0f 1f 44 00 00
RSP: 0000:ffffc0d6c0043cb0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffa3d1d43dd810 RCX: 0000000000000000
RDX: ffffa3d1d4fecf80 RSI: ffffa3d12943dcc0 RDI: ffffa3d1d43dd810
RBP: ffffa3d1d43dd810 R08: 0000000000000000 R09: ffffa3d1d4c04a80
R10: ffffa3d1d4c00880 R11: ffffa3d1d44ba000 R12: 0000000000000000
R13: ffffa3d1d4383b80 R14: ffffa3d1d4c090d0 R15: ffffa3d1d4324530
FS: 0000000000000000(0000) GS:ffffa3d1d6700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000460a001 CR4: 00000000003606e0
Call Trace:
? iommu_group_get_for_dev+0x81/0x1f0
? intel_iommu_add_device+0x61/0x170
? iommu_probe_device+0x43/0xd0
? intel_iommu_init+0x1fa2/0x2235
? pci_iommu_init+0x52/0xe7
? e820__memblock_setup+0x15c/0x15c
? do_one_initcall+0xcc/0x27e
? kernel_init_freeable+0x169/0x259
? rest_init+0x95/0x95
? kernel_init+0x5/0xeb
? ret_from_fork+0x35/0x40
---[ end trace 28473e7abc25b92c ]---
DMAR: ACPI name space devices didn't probe correctly
The bug results from the fact that while we now enumerate ACPI devices,
we aren't able to handle any non-PCI device when generating the device
group. Fix the issue by implementing an Intel-specific callback that
returns `pci_device_group` only if the device is a PCI device.
Otherwise, it will return a generic device group.
Fixes: fa212a97f3a3 ("iommu/vt-d: Probe DMA-capable ACPI name space devices")
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Cc: stable@vger.kernel.org # v5.3+
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The bit 13 and bit 14 of the IOMMU control register are
PPRLogEn and PPRIntEn. They are related to PPR (Peripheral Page
Request) instead of 'PPF'. Fix them accrodingly.
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The usage of the local variables 'range' and 'misc' was removed
from commit 226e889b20a9 ("iommu/amd: Remove first/last_device handling")
and commit 23c742db2171 ("iommu/amd: Split out PCI related parts of
IOMMU initialization"). So, remove them accrodingly.
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Export page table internals of the domain attached to each device.
Example of such dump on a Skylake machine:
$ sudo cat /sys/kernel/debug/iommu/intel/domain_translation_struct
[ ... ]
Device 0000:00:14.0 with pasid 0 @0x15f3d9000
IOVA_PFN PML5E PML4E
0x000000008ced0 | 0x0000000000000000 0x000000015f3da003
0x000000008ced1 | 0x0000000000000000 0x000000015f3da003
0x000000008ced2 | 0x0000000000000000 0x000000015f3da003
0x000000008ced3 | 0x0000000000000000 0x000000015f3da003
0x000000008ced4 | 0x0000000000000000 0x000000015f3da003
0x000000008ced5 | 0x0000000000000000 0x000000015f3da003
0x000000008ced6 | 0x0000000000000000 0x000000015f3da003
0x000000008ced7 | 0x0000000000000000 0x000000015f3da003
0x000000008ced8 | 0x0000000000000000 0x000000015f3da003
0x000000008ced9 | 0x0000000000000000 0x000000015f3da003
PDPE PDE PTE
0x000000015f3db003 0x000000015f3dc003 0x000000008ced0003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced1003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced2003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced3003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced4003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced5003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced6003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced7003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced8003
0x000000015f3db003 0x000000015f3dc003 0x000000008ced9003
[ ... ]
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
After we make all map/unmap paths support first level page table.
Let's turn it on if hardware supports scalable mode.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
First-level translation may map input addresses to 4-KByte pages,
2-MByte pages, or 1-GByte pages. Support for 4-KByte pages and
2-Mbyte pages are mandatory for first-level translation. Hardware
support for 1-GByte page is reported through the FL1GP field in
the Capability Register.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
First-level translation restricts the input-address to a canonical
address (i.e., address bits 63:N have the same value as address
bit [N-1], where N is 48-bits with 4-level paging and 57-bits with
5-level paging). (section 3.6 in the spec)
This makes first level IOVA canonical by using IOVA with bit [N-1]
always cleared.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When software has changed first-level tables, it should invalidate
the affected IOTLB and the paging-structure-caches using the PASID-
based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Intel VT-d in scalable mode supports two types of page tables for
IOVA translation: first level and second level. The IOMMU driver
can choose one from both for IOVA translation according to the use
case. This sets up the pasid entry if a domain is selected to use
the first-level page table for iova translation.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Current intel_pasid_setup_first_level() use 5-level paging for
first level translation if CPUs use 5-level paging mode too.
This makes sense for SVA usages since the page table is shared
between CPUs and IOMMUs. But it makes no sense if we only want
to use first level for IOVA translation. Add PASID_FLAG_FL5LP
bit in the flags which indicates whether the 5-level paging
mode should be used.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This adds the Intel VT-d specific callback of setting
DOMAIN_ATTR_NESTING domain attribution. It is necessary
to let the VT-d driver know that the domain represents
a virtual machine which requires the IOMMU hardware to
support nested translation mode. Return success if the
IOMMU hardware suports nested mode, otherwise failure.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This checks whether a domain should use the first level page
table for map/unmap and marks it in the domain structure.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Currently if flush queue initialization fails, we return error
or enforce the system-wide strict mode. These are unnecessary
because we always check the existence of a flush queue before
queuing any iova's for lazy flushing. Printing a informational
message is enough.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
If Intel IOMMU strict mode is enabled by users, it's unnecessary
to create the iova flush queue.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Current map_sg stores trace message in a coarse manner. This
extends it so that more detailed messages could be traced.
The map_sg trace message looks like:
map_sg: dev=0000:00:17.0 [1/9] dev_addr=0xf8f90000 phys_addr=0x158051000 size=4096
map_sg: dev=0000:00:17.0 [2/9] dev_addr=0xf8f91000 phys_addr=0x15a858000 size=4096
map_sg: dev=0000:00:17.0 [3/9] dev_addr=0xf8f92000 phys_addr=0x15aa13000 size=4096
map_sg: dev=0000:00:17.0 [4/9] dev_addr=0xf8f93000 phys_addr=0x1570f1000 size=8192
map_sg: dev=0000:00:17.0 [5/9] dev_addr=0xf8f95000 phys_addr=0x15c6d0000 size=4096
map_sg: dev=0000:00:17.0 [6/9] dev_addr=0xf8f96000 phys_addr=0x157194000 size=4096
map_sg: dev=0000:00:17.0 [7/9] dev_addr=0xf8f97000 phys_addr=0x169552000 size=4096
map_sg: dev=0000:00:17.0 [8/9] dev_addr=0xf8f98000 phys_addr=0x169dde000 size=4096
map_sg: dev=0000:00:17.0 [9/9] dev_addr=0xf8f99000 phys_addr=0x148351000 size=4096
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Use combined macros for_each_svm_dev() to simplify SVM device iteration
and error checking.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Page responses should only be sent when last page in group (LPIG) or
private data is present in the page request. This patch avoids sending
invalid descriptors.
Fixes: 5d308fc1ecf53 ("iommu/vt-d: Add 256-bit invalidation descriptor support")
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Make use of generic IOASID code to manage PASID allocation,
free, and lookup. Replace Intel specific code.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
PASID allocator uses IDR which is exclusive for the end of the
allocation range. There is no need to decrement pasid_max.
Fixes: af39507305fb ("iommu/vt-d: Apply global PASID in SVA")
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
After each setup for PASID entry, related translation caches must be
flushed. We can combine duplicated code into one function which is less
error prone.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add a check during SVM bind to ensure CPU and IOMMU hardware capabilities
are met.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When setting up first level page tables for sharing with CPU, we need
to ensure IOMMU can support no less than the levels supported by the
CPU.
It is not adequate, as in the current code, to set up 5-level paging
in PASID entry First Level Paging Mode(FLPM) solely based on CPU.
Currently, intel_pasid_setup_first_level() is only used by native SVM
code which already checks paging mode matches. However, future use of
this helper function may not be limited to native SVM.
https://lkml.org/lkml/2019/11/18/1037
Fixes: 437f35e1cd4c8 ("iommu/vt-d: Add first level page table interface")
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Shared Virtual Memory(SVM) is based on a collective set of hardware
features detected at runtime. There are requirements for matching CPU
and IOMMU capabilities.
The current code checks CPU and IOMMU feature set for SVM support but
the result is never stored nor used. Therefore, SVM can still be used
even when these checks failed. The consequences can be:
1. CPU uses 5-level paging mode for virtual address of 57 bits, but
IOMMU can only support 4-level paging mode with 48 bits address for DMA.
2. 1GB page size is used by CPU but IOMMU does not support it. VT-d
unrecoverable faults may be generated.
The best solution to fix these problems is to prevent them in the first
place.
This patch consolidates code for checking PASID, CPU vs. IOMMU paging
mode compatibility, as well as provides specific error messages for
each failed checks. On sane hardware configurations, these error message
shall never appear in kernel log.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This adds Kconfig option INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
to make it easier for distributions to enable or disable the
Intel IOMMU scalable mode by default during kernel build.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
ioremap has provided non-cached semantics by default since the Linux 2.6
days, so remove the additional ioremap_nocache interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
|
|
The iommu variable in set_device_exclusion_range() us unused
now and causes a compiler warning. Remove it.
Fixes: 387caf0b759a ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Use the new standard function instead of open-coding it.
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: virtualization@lists.linux-foundation.org
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Use the new standard function instead of open-coding it.
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Use the new standard function instead of open-coding it.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Use the new standard function instead of open-coding it.
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Implement a generic function for removing reserved regions. This can be
used by drivers that don't do anything fancy with these regions other
than allocating memory for them.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When running heavy memory pressure workloads, this 5+ old system is
throwing endless warnings below because disk IO is too slow to recover
from swapping. Since the volume from alloc_iova_fast() could be large,
once it calls printk(), it will trigger disk IO (writing to the log
files) and pending softirqs which could cause an infinite loop and make
no progress for days by the ongoimng memory reclaim. This is the counter
part for Intel where the AMD part has already been merged. See the
commit 3d708895325b ("iommu/amd: Silence warnings under memory
pressure"). Since the allocation failure will be reported in
intel_alloc_iova(), so just call dev_err_once() there because even the
"ratelimited" is too much, and silence the one in alloc_iova_mem() to
avoid the expensive warn_alloc().
hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
slab_out_of_memory: 66 callbacks suppressed
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
node 0: slabs: 1822, objs: 16398, free: 0
node 1: slabs: 2051, objs: 18459, free: 31
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
node 0: slabs: 1822, objs: 16398, free: 0
node 1: slabs: 2051, objs: 18459, free: 31
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
node 0: slabs: 697, objs: 4182, free: 0
node 0: slabs: 697, objs: 4182, free: 0
node 0: slabs: 697, objs: 4182, free: 0
node 0: slabs: 697, objs: 4182, free: 0
node 1: slabs: 381, objs: 2286, free: 27
node 1: slabs: 381, objs: 2286, free: 27
node 1: slabs: 381, objs: 2286, free: 27
node 1: slabs: 381, objs: 2286, free: 27
node 0: slabs: 1822, objs: 16398, free: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
node 1: slabs: 2051, objs: 18459, free: 31
node 0: slabs: 697, objs: 4182, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
node 1: slabs: 381, objs: 2286, free: 27
cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
node 0: slabs: 697, objs: 4182, free: 0
node 1: slabs: 381, objs: 2286, free: 27
hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
warn_alloc: 96 callbacks suppressed
kworker/11:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: G B
Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19
12/27/2015
Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
dump_stack+0xa0/0xea
warn_alloc.cold.94+0x8a/0x12d
__alloc_pages_slowpath+0x1750/0x1870
__alloc_pages_nodemask+0x58a/0x710
alloc_pages_current+0x9c/0x110
alloc_slab_page+0xc9/0x760
allocate_slab+0x48f/0x5d0
new_slab+0x46/0x70
___slab_alloc+0x4ab/0x7b0
__slab_alloc+0x43/0x70
kmem_cache_alloc+0x2dd/0x450
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
alloc_iova+0x33/0x210
cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
node 0: slabs: 697, objs: 4182, free: 0
alloc_iova_fast+0x62/0x3d1
node 1: slabs: 381, objs: 2286, free: 27
intel_alloc_iova+0xce/0xe0
intel_map_sg+0xed/0x410
scsi_dma_map+0xd7/0x160
scsi_queue_rq+0xbf7/0x1310
blk_mq_dispatch_rq_list+0x4d9/0xbc0
blk_mq_sched_dispatch_requests+0x24a/0x300
__blk_mq_run_hw_queue+0x156/0x230
blk_mq_run_work_fn+0x3b/0x40
process_one_work+0x579/0xb90
worker_thread+0x63/0x5b0
kthread+0x1e6/0x210
ret_from_fork+0x3a/0x50
Mem-Info:
active_anon:2422723 inactive_anon:361971 isolated_anon:34403
active_file:2285 inactive_file:1838 isolated_file:0
unevictable:0 dirty:1 writeback:5 unstable:0
slab_reclaimable:13972 slab_unreclaimable:453879
mapped:2380 shmem:154 pagetables:6948 bounce:0
free:19133 free_pcp:7363 free_cma:0
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Current implementation for IOMMU x2APIC support makes use of
the MMIO access to MSI capability block registers, which requires
checking EFR[MsiCapMmioSup]. However, only IVHD type 11h/40h contain
the information, and not in the IVHD type 10h IOMMU feature reporting
field. Since the BIOS in newer systems, which supports x2APIC, would
normally contain IVHD type 11h/40h, remove the IOMMU_FEAT_XTSUP_SHIFT
check for IVHD type 10h, and only support x2APIC with IVHD type 11h/40h.
Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts')
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The IOMMU MMIO access to MSI capability registers is available only if
the EFR[MsiCapMmioSup] is set. Current implementation assumes this bit
is set if the EFR[XtSup] is set, which might not be the case.
Fix by checking the EFR[MsiCapMmioSup] before accessing the MSI address
low/high and MSI data registers via the MMIO.
Fixes: 66929812955b ('iommu/amd: Add support for X2APIC IOMMU interrupts')
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Some buggy BIOSes might define multiple exclusion ranges of the
IVMD entries which are associated with the same IOMMU hardware.
This leads to the overwritten exclusion range (exclusion_start
and exclusion_length members) in set_device_exclusion_range().
Here is a real case:
When attaching two Broadcom RAID controllers to a server, the first
one reports the failure during booting (the disks connecting to the
RAID controller cannot be detected).
This patch prevents the issue by treating per-device exclusion
ranges as r/w unity-mapped regions.
Discussion:
* https://lists.linuxfoundation.org/pipermail/iommu/2019-November/040140.html
Suggested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
I no longer work for Arm, so update the stale reference to my old email
address.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
By conditionally dropping support for the legacy binding and exporting
the newly introduced 'arm_smmu_impl_init()' function we can allow the
ARM SMMU driver to be built as a module.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When removing the SMMU driver, we need to clear any state that we
registered during probe. This includes our bus ops, sysfs entries and
the IOMMU device registered for early firmware probing of masters.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
By removing the redundant call to 'pci_request_acs()' we can allow the
ARM SMMUv3 driver to be built as a module.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add support for SMMU drivers built as modules to the ACPI/IORT device
probing path, by deferring the probe of the master if the SMMU driver is
known to exist but has not been loaded yet. Given that the IORT code
registers a platform device for each SMMU that it discovers, we can
easily trigger the udev based autoloading of the SMMU drivers by making
the platform device identifier part of the module alias.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: John Garry <john.garry@huawei.com> # only manual smmu ko loading
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When removing the SMMUv3 driver, we need to clear any state that we
registered during probe. This includes our bus ops, sysfs entries and
the IOMMU device registered for early firmware probing of masters.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Forcefully unbinding the Arm SMMU drivers is a pretty dangerous operation,
since it will likely lead to catastrophic failure for any DMA devices
mastering through the SMMU being unbound. When the driver then attempts
to "handle" the fatal faults, it's very easy to trip over dead data
structures, leading to use-after-free.
On John's machine, he reports that the machine was "unusable" due to
loss of the storage controller following a forced unbind of the SMMUv3
driver:
| # cd ./bus/platform/drivers/arm-smmu-v3
| # echo arm-smmu-v3.0.auto > unbind
| hisi_sas_v2_hw HISI0162:01: CQE_AXI_W_ERR (0x800) found!
| platform arm-smmu-v3.0.auto: CMD_SYNC timeout at 0x00000146
| [hwprod 0x00000146, hwcons 0x00000000]
Prevent this forced unbinding of the drivers by setting "suppress_bind_attrs"
to true.
Link: https://lore.kernel.org/lkml/06dfd385-1af0-3106-4cc5-6a5b8e864759@huawei.com
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This reverts commit addb672f200f4e99368270da205320b83efe01a0.
Let's get the SMMU driver building as a module, which means putting
back some dead code that we used to carry.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This reverts commit c07b6426df922d21a13a959cf785d46e9c531941.
Let's get the SMMUv3 driver building as a module, which means putting
back some dead code that we used to carry.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
'bus_set_iommu()' allows IOMMU drivers to register their ops for a given
bus type. Unfortunately, it then doesn't allow them to be removed, which
is necessary for modular drivers to shutdown cleanly so that they can be
reloaded later on.
Allow 'bus_set_iommu()' to take a NULL 'ops' argument, which clear the
ops pointer for the selected bus_type.
Signed-off-by: Will Deacon <will@kernel.org>
Tested-by: John Garry <john.garry@huawei.com> # smmu v3
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|