summaryrefslogtreecommitdiff
path: root/arch/powerpc/platforms/pseries/iommu.c
AgeCommit message (Collapse)Author
2024-02-23powerpc/pseries/iommu: IOMMU table is not initialized for kdump over SR-IOVGaurav Batra
When kdump kernel tries to copy dump data over SR-IOV, LPAR panics due to NULL pointer exception: Kernel attempted to read user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc000000020847ad4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: mlx5_core(+) vmx_crypto pseries_wdt papr_scm libnvdimm mlxfw tls psample sunrpc fuse overlay squashfs loop CPU: 12 PID: 315 Comm: systemd-udevd Not tainted 6.4.0-Test102+ #12 Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries NIP: c000000020847ad4 LR: c00000002083b2dc CTR: 00000000006cd18c REGS: c000000029162ca0 TRAP: 0300 Not tainted (6.4.0-Test102+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48288244 XER: 00000008 CFAR: c00000002083b2d8 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 ... NIP _find_next_zero_bit+0x24/0x110 LR bitmap_find_next_zero_area_off+0x5c/0xe0 Call Trace: dev_printk_emit+0x38/0x48 (unreliable) iommu_area_alloc+0xc4/0x180 iommu_range_alloc+0x1e8/0x580 iommu_alloc+0x60/0x130 iommu_alloc_coherent+0x158/0x2b0 dma_iommu_alloc_coherent+0x3c/0x50 dma_alloc_attrs+0x170/0x1f0 mlx5_cmd_init+0xc0/0x760 [mlx5_core] mlx5_function_setup+0xf0/0x510 [mlx5_core] mlx5_init_one+0x84/0x210 [mlx5_core] probe_one+0x118/0x2c0 [mlx5_core] local_pci_probe+0x68/0x110 pci_call_probe+0x68/0x200 pci_device_probe+0xbc/0x1a0 really_probe+0x104/0x540 __driver_probe_device+0xb4/0x230 driver_probe_device+0x54/0x130 __driver_attach+0x158/0x2b0 bus_for_each_dev+0xa8/0x130 driver_attach+0x34/0x50 bus_add_driver+0x16c/0x300 driver_register+0xa4/0x1b0 __pci_register_driver+0x68/0x80 mlx5_init+0xb8/0x100 [mlx5_core] do_one_initcall+0x60/0x300 do_init_module+0x7c/0x2b0 At the time of LPAR dump, before kexec hands over control to kdump kernel, DDWs (Dynamic DMA Windows) are scanned and added to the FDT. For the SR-IOV case, default DMA window "ibm,dma-window" is removed from the FDT and DDW added, for the device. Now, kexec hands over control to the kdump kernel. When the kdump kernel initializes, PCI busses are scanned and IOMMU group/tables created, in pci_dma_bus_setup_pSeriesLP(). For the SR-IOV case, there is no "ibm,dma-window". The original commit: b1fc44eaa9ba, fixes the path where memory is pre-mapped (direct mapped) to the DDW. When TCEs are direct mapped, there is no need to initialize IOMMU tables. iommu_table_setparms_lpar() only considers "ibm,dma-window" property when initiallizing IOMMU table. In the scenario where TCEs are dynamically allocated for SR-IOV, newly created IOMMU table is not initialized. Later, when the device driver tries to enter TCEs for the SR-IOV device, NULL pointer execption is thrown from iommu_area_alloc(). The fix is to initialize the IOMMU table with DDW property stored in the FDT. There are 2 points to remember: 1. For the dedicated adapter, kdump kernel would encounter both default and DDW in FDT. In this case, DDW property is used to initialize the IOMMU table. 2. A DDW could be direct or dynamic mapped. kdump kernel would initialize IOMMU table and mark the existing DDW as "dynamic". This works fine since, at the time of table initialization, iommu_table_clear() makes some space in the DDW, for some predefined number of TCEs which are needed for kdump to succeed. Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240125203017.61014-1-gbatra@linux.ibm.com
2023-10-19powerpc/pseries/iommu: enable_ddw incorrectly returns direct mapping for ↵Gaurav Batra
SR-IOV device When a device is initialized, the driver invokes dma_supported() twice - first for streaming mappings followed by coherent mappings. For an SR-IOV device, default window is deleted and DDW created. With vPMEM enabled, TCE mappings are dynamically created for both vPMEM and SR-IOV device. There are no direct mappings. First time when dma_supported() is called with 64 bit mask, DDW is created and marked as dynamic window. The second time dma_supported() is called, enable_ddw() finds existing window for the device and incorrectly returns it as "direct mapping". This only happens when size of DDW is big enough to map max LPAR memory. This results in streaming TCEs to not get dynamically mapped, since code incorrently assumes these are already pre-mapped. The adapter initially comes up but goes down due to EEH. Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231003030802.47914-1-gbatra@linux.vnet.ibm.com
2023-08-18powerpc: Move DMA64_PROPNAME define to a headerMichal Suchanek
Avoid redefining the same value in multiple source. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230817162411.429-1-msuchanek@suse.de
2023-06-26powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memoryGaurav Batra
When memory is dynamically added/removed, iommu_mem_notifier() is invoked. This routine traverses through all the DMA windows (DDW only, not default windows) to add/remove "direct" TCE mappings. The routines for this purpose are tce_clearrange_multi_pSeriesLP() and tce_clearrange_multi_pSeriesLP(). Both these routines are designed for Direct mapped DMA windows only. The issue is that there could be some DMA windows in the list which are not "direct" mapped. Calling these routines will either, 1) remove some dynamically mapped TCEs, Or 2) try to add TCEs which are out of bounds and HCALL returns H_PARAMETER Here are the side affects when these routines are incorrectly invoked for "dynamically" mapped DMA windows. tce_setrange_multi_pSeriesLP() This adds direct mapped TCEs. Now, this could invoke HCALL to add TCEs with out-of-bound range. In this scenario, HCALL will return H_PARAMETER and DLAR ADD of memory will fail. tce_clearrange_multi_pSeriesLP() This will remove range of TCEs. The TCE range that is calculated, depending on the memory range being added, could infact be mapping some other memory address (for dynamic DMA window scenario). This will wipe out those TCEs. The solution is for iommu_mem_notifier() to only invoke these routines for "direct" mapped DMA windows. Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> [mpe: Initialise direct at allocation time in ddw_list_new_entry()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230613171641.15641-1-gbatra@linux.vnet.ibm.com
2023-05-30powerpc/iommu: Limit number of TCEs to 512 for H_STUFF_TCE hcallGaurav Batra
Currently in tce_freemulti_pSeriesLP() there is no limit on how many TCEs are passed to the H_STUFF_TCE hcall. This has not caused an issue until now, but newer firmware releases have started enforcing a limit of 512 TCEs per call. The limit is correct per the specification (PAPR v2.12 § 14.5.4.2.3). The code has been in it's current form since it was initially merged. Cc: stable@vger.kernel.org Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> [mpe: Tweak change log wording & add PAPR reference] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230525143454.56878-1-gbatra@linux.vnet.ibm.com
2023-05-17powerpc/iommu: Incorrect DDW Table is referenced for SR-IOV deviceGaurav Batra
For an SR-IOV device, while enabling DDW, a new table is created and added at index 1 in the group. In the below 2 scenarios, the table is incorrectly referenced at index 0 (which is where the table is for default DMA window). 1. When adding DDW This issue is exposed with "slub_debug". Error thrown out from dma_iommu_dma_supported() Warning: IOMMU offset too big for device mask mask: 0xffffffff, table offset: 0x800000000000000 2. During Dynamic removal of the PCI device. Error is from iommu_tce_table_put() since a NULL table pointer is passed in. Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com> Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230505184701.91613-1-gbatra@linux.vnet.ibm.com
2023-05-17powerpc/iommu: Remove iommu_del_device()Jason Gunthorpe
Now that power calls iommu_device_register() and populates its groups using iommu_ops->device_group it should not be calling iommu_group_remove_device(). The core code owns the groups and all the other related iommu data, it will clean it up automatically. Remove the bus notifiers and explicit calls to iommu_group_remove_device(). Fixes: a940904443e4 ("powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/0-v1-1421774b874b+167-ppc_device_group_jgg@nvidia.com
2023-04-04powerpc: Use of_address_to_resource()Rob Herring
Replace open coded reading of "reg" or of_get_address()/ of_translate_address() calls with a single call to of_address_to_resource(). Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230329220337.141295-1-robh@kernel.org
2023-03-30powerpc/pseries: Add spaces around / operatorPetr Vaněk
This is follow up change after 14b5d59a261b ("powerpc/pseries: Fix formatting to make code look more beautiful") to conform to kernel coding style. Signed-off-by: Petr Vaněk <arkamar@atlas.cz> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230324220041.11378-1-arkamar@atlas.cz
2023-03-15powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domainsAlexey Kardashevskiy
Up until now PPC64 managed to avoid using iommu_ops. The VFIO driver uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in the Type1 VFIO driver. Recent development added 2 uses of iommu_ops to the generic VFIO which broke POWER: - a coherency capability check; - blocking IOMMU domain - iommu_group_dma_owner_claimed()/... This adds a simple iommu_ops which reports support for cache coherency and provides a basic support for blocking domains. No other domain types are implemented so the default domain is NULL. Since now iommu_ops controls the group ownership, this takes it out of VFIO. This adds an IOMMU device into a pci_controller (=PHB) and registers it in the IOMMU subsystem, iommu_ops is registered at this point. This setup is done in postcore_initcall_sync. This replaces iommu_group_add_device() with iommu_probe_device() as the former misses necessary steps in connecting PCI devices to IOMMU devices. This adds a comment about why explicit iommu_probe_device() is still needed. The previous discussion is here: https://lore.kernel.org/r/20220707135552.3688927-1-aik@ozlabs.ru/ https://lore.kernel.org/r/20220701061751.1955857-1-aik@ozlabs.ru/ Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence") Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> [mpe: Fix CONFIG_IOMMU_API=n build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/2000135730.16998523.1678123860135.JavaMail.zimbra@raptorengineeringinc.com
2023-03-14powerpc/iommu: Add "borrowing" iommu_table_group_opsAlexey Kardashevskiy
PPC64 IOMMU API defines iommu_table_group_ops which handles DMA windows for PEs: control the ownership, create/set/unset a table the hardware for dynamic DMA windows (DDW). VFIO uses the API to implement support on POWER. So far only PowerNV IODA2 (POWER8 and newer machines) implemented this and other cases (POWER7 or nested KVM) did not and instead reused existing iommu_table structs. This means 1) no DDW 2) ownership transfer is done directly in the VFIO SPAPR TCE driver. Soon POWER is going to get its own iommu_ops and ownership control is going to move there. This implements spapr_tce_table_group_ops which borrows iommu_table tables. The upside is that VFIO needs to know less about POWER. The new ops returns the existing table from create_table() and only checks if the same window is already set. This is only going to work if the default DMA window starts table_group.tce32_start and as big as pe->table_group.tce32_size (not the case for IODA2+ PowerNV). This changes iommu_table_group_ops::take_ownership() to return an error if borrowing a table failed. This should not cause any visible change in behavior for PowerNV. pSeries was not that well tested/supported anyway. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> [mpe: Fix CONFIG_IOMMU_API=n build (skiroot_defconfig), & formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/525438831.16998517.1678123820075.JavaMail.zimbra@raptorengineeringinc.com
2022-11-24powerpc/pseries: Fix formatting to make code look more beautifulDeming Wang
Operators should be separated by spaces in tce_buildmulti_pSeriesLP() Signed-off-by: Deming Wang <wangdeming@inspur.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220701094553.1722-1-wangdeming@inspur.com
2022-07-28pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-windowAlexey Kardashevskiy
The pseries platform uses 32bit default DMA window (always 4K pages) and optional 64bit DMA window available via DDW ("Dynamic DMA Windows"), 64K or 2M pages. For ages the default one was not removed and a huge window was created in addition. Things changed with SRIOV-enabled PowerVM which creates a default-and-bigger DMA window in 64bit space (still using 4K pages) for IOV VFs so certain OSes do not need to use the DDW API in order to utilize all available TCE budget. Linux on the other hand removes the default window and creates a bigger one (with more TCEs or/and a bigger page size - 64K/2M) in a bid to map the entire RAM, and if the new window size is smaller than that - it still uses this new bigger window. The result is that the default window is removed but the "ibm,dma-window" property is not. When kdump is invoked, the existing code tries reusing the existing 64bit DMA window which location and parameters are stored in the device tree but this fails as the new property does not make it to the kdump device tree blob. So the code falls back to the default window which does not exist anymore although the device tree says that it does. The result of that is that PCI devices become unusable and cannot be used for kdumping. This preserves the DMA64 and DIRECT64 properties in the device tree blob for the crash kernel. Since the crash kernel setup is done after device drivers are loaded and probed, the proper DMA config is stored at least for boot time devices. Because DDW window is optional and the code configures the default window first, the existing code creates an IOMMU table descriptor for the non-existing default DMA window. It is harmless for kdump as it does not touch the actual window (only reads what is mapped and marks those IO pages as used) but it is bad for kexec which clears it thinking it is a smaller default window rather than a bigger DDW window. This removes the "ibm,dma-window" property from the device tree after a bigger window is created and the crash kernel setup picks it up. Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220629060614.1680476-1-aik@ozlabs.ru
2022-06-29powerpc/pseries/iommu: Print ibm,query-pe-dma-windows parametersAlexey Kardashevskiy
PowerVM has a stricter policy about allocating TCEs for LPARs and often there is not enough TCEs for 1:1 mapping, this adds the supported numbers into dev_info() to help analyzing bugreports. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220601040117.1467710-1-aik@ozlabs.ru
2022-05-19Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge our KVM topic branch.
2022-05-19KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlersAlexey Kardashevskiy
LoPAPR defines guest visible IOMMU with hypercalls to use it - H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap in the KVM in the real mode (with MMU off). The problem with the real mode is some memory is not available and some API usage crashed the host but enabling MMU was an expensive operation. The problems with the real mode handlers are: 1. Occasionally these cannot complete the request so the code is copied+modified to work in the virtual mode, very little is shared; 2. The real mode handlers have to be linked into vmlinux to work; 3. An exception in real mode immediately reboots the machine. If the small DMA window is used, the real mode handlers bring better performance. However since POWER8, there has always been a bigger DMA window which VMs use to map the entire VM memory to avoid calling H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT (a bulk version of H_PUT_TCE) which virtual mode handler is even closer to its real mode version. On POWER9 hypercalls trap straight to the virtual mode so the real mode handlers never execute on POWER9 and later CPUs. So with the current use of the DMA windows and MMU improvements in POWER9 and later, there is no point in duplicating the code. The 32bit passed through devices may slow down but we do not have many of these in practice. For example, with this applied, a 1Gbit ethernet adapter still demostrates above 800Mbit/s of actual throughput. This removes the real mode handlers from KVM and related code from the powernv platform. This updates the list of implemented hcalls in KVM-HV as the realmode handlers are removed. This changes ABI - kvmppc_h_get_tce() moves to the KVM module and kvmppc_find_table() is static now. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
2022-05-05powerpc: fix typos in commentsJulia Lawall
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
2021-12-23powerpc/pseries: Add __init attribute to eligible functionsNick Child
Some functions defined in 'arch/powerpc/platforms/pseries' are deserving of an `__init` macro attribute. These functions are only called by other initialization functions and therefore should inherit the attribute. Also, change function declarations in header files to include `__init`. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216220035.605465-13-nick.child@ibm.com
2021-11-15powerpc/pseries/ddw: Do not try direct mapping with persistent memory and ↵Alexey Kardashevskiy
one window There is a possibility of having just one DMA window available with a limited capacity which the existing code does not handle that well. If the window is big enough for the system RAM but less than MAX_PHYSMEM_BITS (which we want when persistent memory is present), we create 1:1 window and leave persistent memory without DMA. This disables 1:1 mapping entirely if there is persistent memory and either: - the huge DMA window does not cover the entire address space; - the default DMA window is removed. This relies on reverted 54fc3c681ded ("powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory") to return the actual amount RAM in ddw_memory_hotplug_max() (posted separately). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211108040320.3857636-4-aik@ozlabs.ru
2021-11-15powerpc/pseries/ddw: simplify enable_ddw()Alexey Kardashevskiy
This drops rather useless ddw_enabled flag as direct_mapping implies it anyway. While at this, fix indents in enable_ddw(). This should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211108040320.3857636-3-aik@ozlabs.ru
2021-11-15powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for ↵Alexey Kardashevskiy
persistent memory" This reverts commit 54fc3c681ded9437e4548e2501dc1136b23cfa9a which does not allow 1:1 mapping even for the system RAM which is usually possible. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211108040320.3857636-2-aik@ozlabs.ru
2021-11-05Merge tag 'powerpc-5.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Enable STRICT_KERNEL_RWX for Freescale 85xx platforms. - Activate CONFIG_STRICT_KERNEL_RWX by default, while still allowing it to be disabled. - Add support for out-of-line static calls on 32-bit. - Fix oopses doing bpf-to-bpf calls when STRICT_KERNEL_RWX is enabled. - Fix boot hangs on e5500 due to stale value in ESR passed to do_page_fault(). - Fix several bugs on pseries in handling of device tree cache information for hotplugged CPUs, and/or during partition migration. - Various other small features and fixes. Thanks to Alexey Kardashevskiy, Alistair Popple, Anatolij Gustschin, Andrew Donnellan, Athira Rajeev, Bixuan Cui, Bjorn Helgaas, Cédric Le Goater, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Denis Kirjanov, Fabiano Rosas, Frederic Barrat, Gustavo A. R. Silva, Hari Bathini, Jacques de Laval, Joel Stanley, Kai Song, Kajol Jain, Laurent Vivier, Leonardo Bras, Madhavan Srinivasan, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Niklas Schnelle, Oliver O'Halloran, Rob Herring, Russell Currey, Srikar Dronamraju, Stan Johnson, Tyrel Datwyler, Uwe Kleine-König, Vasant Hegde, Wan Jiabing, and Xiaoming Ni, * tag 'powerpc-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (73 commits) powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST powerpc/32e: Ignore ESR in instruction storage interrupt handler powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC MAINTAINERS: Update powerpc KVM entry powerpc/xmon: fix task state output powerpc/44x/fsp2: add missing of_node_put powerpc/dcr: Use cmplwi instead of 3-argument cmpli KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling powerpc/security: Use a mutex for interrupt exit code patching powerpc/83xx/mpc8349emitx: Make mcu_gpiochip_remove() return void powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs powerpc/book3e: Fix set_memory_x() and set_memory_nx() powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect() powerpc/bpf: Fix write protecting JIT code selftests/powerpc: Use date instead of EPOCHSECONDS in mitigation-patching.sh powerpc/64s/interrupt: Fix check_return_regs_valid() false positive powerpc/boot: Set LC_ALL=C in wrapper script powerpc/64s: Default to 64K pages for 64 bit book3s Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC" ...
2021-10-25powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is presentAlexey Kardashevskiy
The iommu_init_table() helper takes an address range to reserve in the IOMMU table being initialized to exclude MMIO addresses, this is useful if the window stretches far beyond 4GB (although wastes some TCEs). At the moment the code searches for such MMIO32 range and fails if none found which is considered a problem while it really is not: it is actually better as this says there is no MMIO32 to reserve and we can use usually wasted TCEs. Furthermore PHYP never actually allows creating windows starting at busaddress=0 so this MMIO32 range is never useful. This removes error exit and initializes the table with zero range if no MMIO32 is detected. Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211020132315.2287178-5-aik@ozlabs.ru
2021-10-25powerpc/pseries/iommu: Check if the default window in use before removing itAlexey Kardashevskiy
At the moment this check is performed after we remove the default window which is late and disallows to revert whatever changes enable_ddw() has made to DMA windows. This moves the check and error exit before removing the window. This raised the message severity from "debug" to "warning" as this should not happen in practice and cannot be triggered by the userspace. Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211020132315.2287178-4-aik@ozlabs.ru
2021-10-25powerpc/pseries/iommu: Use correct vfree for it_mapAlexey Kardashevskiy
The it_map array is vzalloc'ed so use vfree() for it when creating a huge DMA window failed for whatever reason. While at this, write zero to it_map. Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211020132315.2287178-3-aik@ozlabs.ru
2021-10-22powerpc/pseries/iommu: Add of_node_put() before breakWan Jiabing
Fix following coccicheck warning: ./arch/powerpc/platforms/pseries/iommu.c:924:1-28: WARNING: Function for_each_node_with_property should have of_node_put() before break Early exits from for_each_node_with_property should decrement the node reference counter. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Leonardo Bras <leobras.c@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211014075624.16344-1-wanjiabing@vivo.com
2021-10-09powerps/pseries/dma: Add support for 2M IOMMU page sizeAlexey Kardashevskiy
The upcoming PAPR spec adds a 2M page size, bit 23 right after 16G page size in the "ibm,query-pe-dma-window" call. This adds support for the new page size. Since the new page size is out of sorted order, this changes the loop to not assume that shift[] is sorted. This has now been tested and is known to work on a pre-release version of phyp. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211006044735.1114669-1-aik@ozlabs.ru
2021-08-27powerpc/pseries/iommu: Rename "direct window" to "dma window"Leonardo Bras
A previous change introduced the usage of DDW as a bigger indirect DMA mapping when the DDW available size does not map the whole partition. As most of the code that manipulates direct mappings was reused for indirect mappings, it's necessary to rename all names and debug/info messages to reflect that it can be used for both kinds of mapping. This should cause no behavioural change, just adjust naming. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-12-leobras.c@gmail.com
2021-08-27powerpc/pseries/iommu: Make use of DDW for indirect mappingLeonardo Bras
So far it's assumed possible to map the guest RAM 1:1 to the bus, which works with a small number of devices. SRIOV changes it as the user can configure hundreds VFs and since phyp preallocates TCEs and does not allow IOMMU pages bigger than 64K, it has to limit the number of TCEs per a PE to limit waste of physical pages. As of today, if the assumed direct mapping is not possible, DDW creation is skipped and the default DMA window "ibm,dma-window" is used instead. By using DDW, indirect mapping can get more TCEs than available for the default DMA window, and also get access to using much larger pagesizes (16MB as implemented in qemu vs 4k from default DMA window), causing a significant increase on the maximum amount of memory that can be IOMMU mapped at the same time. Indirect mapping will only be used if direct mapping is not a possibility. For indirect mapping, it's necessary to re-create the iommu_table with the new DMA window parameters, so iommu_alloc() can use it. Removing the default DMA window for using DDW with indirect mapping is only allowed if there is no current IOMMU memory allocated in the iommu_table. enable_ddw() is aborted otherwise. Even though there won't be both direct and indirect mappings at the same time, we can't reuse the DIRECT64_PROPNAME property name, or else an older kexec()ed kernel can assume direct mapping, and skip iommu_alloc(), causing undesirable behavior. So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info" was created to represent a DDW that does not allow direct mapping. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-11-leobras.c@gmail.com
2021-08-27powerpc/pseries/iommu: Find existing DDW with given property nameLeonardo Bras
At the moment pseries stores information about created directly mapped DDW window in DIRECT64_PROPNAME. With the objective of implementing indirect DMA mapping with DDW, it's necessary to have another propriety name to make sure kexec'ing into older kernels does not break, as it would if we reuse DIRECT64_PROPNAME. In order to have this, find_existing_ddw_windows() needs to be able to look for different property names. Extract find_existing_ddw_windows() into find_existing_ddw_windows_named() and calls it with current property name. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-10-leobras.c@gmail.com
2021-08-27powerpc/pseries/iommu: Update remove_dma_window() to accept property nameLeonardo Bras
Update remove_dma_window() so it can be used to remove DDW with a given property name. This enables the creation of new property names for DDW, so we can have different usage for it, like indirect mapping. Also, add return values to it so we can check if the property was found while removing the active DDW. This allows skipping the remaining property names while reducing the impact of multiple property names. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-9-leobras.c@gmail.com
2021-08-27powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helperLeonardo Bras
Add a new helper _iommu_table_setparms(), and use it in iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated code. Also, setting tbl->it_ops was happening outsite iommu_table_setparms*(), so move it to the new helper. Since we need the iommu_table_ops to be declared before used, declare iommu_table_lpar_multi_ops and iommu_table_pseries_ops to before their respective iommu_table_setparms*(). Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-8-leobras.c@gmail.com
2021-08-27powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()Leonardo Bras
Code used to create a ddw property that was previously scattered in enable_ddw() is now gathered in ddw_property_create(), which deals with allocation and filling the property, letting it ready for of_property_add(), which now occurs in sequence. This created an opportunity to reorganize the second part of enable_ddw(): Without this patch enable_ddw() does, in order: kzalloc() property & members, create_ddw(), fill ddwprop inside property, ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory, of_add_property(), and list_add(). With this patch enable_ddw() does, in order: create_ddw(), ddw_property_create(), of_add_property(), ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory, and list_add(). This change requires of_remove_property() in case anything fails after of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk in all memory, which looks the most expensive operation, only if everything else succeeds. Also, the error path got remove_ddw() replaced by a new helper __remove_dma_window(), which only removes the new DDW with an rtas-call. For this, a new helper clean_dma_window() was needed to clean anything that could left if walk_system_ram_range() fails. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-7-leobras.c@gmail.com
2021-08-27powerpc/pseries/iommu: Allow DDW windows starting at 0x00Leonardo Bras
enable_ddw() currently returns the address of the DMA window, which is considered invalid if has the value 0x00. Also, it only considers valid an address returned from find_existing_ddw if it's not 0x00. Changing this behavior makes sense, given the users of enable_ddw() only need to know if direct mapping is possible. It can also allow a DMA window starting at 0x00 to be used. This will be helpful for using a DDW with indirect mapping, as the window address will be different than 0x00, but it will not map the whole partition. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-6-leobras.c@gmail.com
2021-08-27powerpc/pseries/iommu: Add ddw_list_new_entry() helperLeonardo Bras
There are two functions creating direct_window_list entries in a similar way, so create a ddw_list_new_entry() to avoid duplicity and simplify those functions. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-5-leobras.c@gmail.com
2021-08-27powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helperLeonardo Bras
Creates a helper to allow allocating a new iommu_table without the need to reallocate the iommu_group. This will be helpful for replacing the iommu_table for the new DMA window, after we remove the old one with iommu_tce_table_put(). Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-4-leobras.c@gmail.com
2021-08-27powerpc/pseries/iommu: Replace hard-coded page shiftLeonardo Bras
Some functions assume IOMMU page size can only be 4K (pageshift == 12). Update them to accept any page size passed, so we can use 64K pages. In the process, some defines like TCE_SHIFT were made obsolete, and then removed. IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show a RPN of 52-bit, and considers a 12-bit pageshift, so there should be no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn. It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(), and tce_buildmulti_pSeriesLP(). Most places had a tbl struct, so using tbl->it_page_shift was simple. tce_free_pSeriesLP() was a special case, since callers not always have a tbl struct, so adding a tceshift parameter seems the right thing to do. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817063929.38701-2-leobras.c@gmail.com
2021-04-23powerpc/iommu: Do not immediately panic when failed IOMMU table allocationAlexey Kardashevskiy
Most platforms allocate IOMMU table structures (specifically it_map) at the boot time and when this fails - it is a valid reason for panic(). However the powernv platform allocates it_map after a device is returned to the host OS after being passed through and this happens long after the host OS booted. It is quite possible to trigger the it_map allocation panic() and kill the host even though it is not necessary - the host OS can still use the DMA bypass mode (requires a tiny fraction of it_map's memory) and even if that fails, the host OS is runnnable as it was without the device for which allocating it_map causes the panic. Instead of immediately crashing in a powernv/ioda2 system, this prints an error and continues. All other platforms still call panic(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Leonardo Bras <leobras.c@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210216033307.69863-3-aik@ozlabs.ru
2021-04-21powerpc/pseries/iommu: Fix window size for direct mapping with pmemLeonardo Bras
As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's possible to use direct DMA mapping even with pmem region. But, if that happens, the window size (len) is set to (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a pagesize times smaller DDW to be created, being insufficient for correct usage. Fix this so the correct window size is used in this case. Fixes: bf6e2d562bbc4 ("powerpc/dma: Fallback to dma_ops when persistent memory present") Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210420045404.438735-1-leobras.c@gmail.com
2021-04-14powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPARLeonardo Bras
According to LoPAR, ibm,query-pe-dma-window output named "IO Page Sizes" will let the OS know all possible pagesizes that can be used for creating a new DDW. Currently Linux will only try using 3 of the 8 available options: 4K, 64K and 16M. According to LoPAR, Hypervisor may also offer 32M, 64M, 128M, 256M and 16G. Enabling bigger pages would be interesting for direct mapping systems with a lot of RAM, while using less TCE entries. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210408201915.174217-1-leobras.c@gmail.com
2020-11-27powerpc/dma: Fallback to dma_ops when persistent memory presentAlexey Kardashevskiy
So far we have been using huge DMA windows to map all the RAM available. The RAM is normally mapped to the VM address space contiguously, and there is always a reasonable upper limit for possible future hot plugged RAM which makes it easy to map all RAM via IOMMU. Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike normal RAM) can map anywhere in the VM space beyond the maximum RAM size and since it can be used for DMA, it requires extending the huge window up to MAX_PHYSMEM_BITS which requires hypervisor support for: 1. huge TCE tables; 2. multilevel TCE tables; 3. huge IOMMU pages. Certain hypervisors cannot do either so the only option left is restricting the huge DMA window to include only RAM and fallback to the default DMA window for persistent memory. This defines arch_dma_map_direct/etc to allow generic DMA code perform additional checks on whether direct DMA is still possible. This checks if the system has persistent memory. If it does not, the DMA bypass mode is selected, i.e. * dev->bus_dma_limit = 0 * dev->dma_ops_bypass = true <- this avoid calling dma_ops for mapping. If there is such memory, this creates identity mapping only for RAM and sets the dev->bus_dma_limit to let the generic code decide whether to call into the direct DMA or the indirect DMA ops. This should not change the existing behaviour when no persistent memory as dev->dma_ops_bypass is expected to be set. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-09-08powerpc/pseries/iommu: Allow bigger 64bit window by removing default DMA windowLeonardo Bras
On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the default DMA window for the device, before attempting to configure a DDW, in order to make the maximum resources available for the next DDW to be created. This is a requirement for using DDW on devices in which hypervisor allows only one DMA window. If setting up a new DDW fails anywhere after the removal of this default DMA window, it's needed to restore the default DMA window. For this, an implementation of ibm,reset-pe-dma-windows rtas call is needed: Platforms supporting the DDW option starting with LoPAR level 2.7 implement ibm,ddw-extensions. The first extension available (index 2) carries the token for ibm,reset-pe-dma-windows rtas call, which is used to restore the default DMA window for a device, if it has been deleted. It does so by resetting the TCE table allocation for the PE to it's boot time value, available in "ibm,dma-window" device tree node. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Tested-by: David Dai <zdai@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200805030455.123024-5-leobras.c@gmail.com
2020-09-08powerpc/pseries/iommu: Move window-removing part of remove_ddw into ↵Leonardo Bras
remove_dma_window Move the window-removing part of remove_ddw into a new function (remove_dma_window), so it can be used to remove other DMA windows. It's useful for removing DMA windows that don't create DIRECT64_PROPNAME property, like the default DMA window from the device, which uses "ibm,dma-window". Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Tested-by: David Dai <zdai@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200805030455.123024-4-leobras.c@gmail.com
2020-09-08powerpc/pseries/iommu: Update call to ibm, query-pe-dma-windowsLeonardo Bras
>From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of outputs from "ibm,query-pe-dma-windows" go from 5 to 6. This change of output size is meant to expand the address size of largest_available_block PE TCE from 32-bit to 64-bit, which ends up shifting page_size and migration_capable. This ends up requiring the update of ddw_query_response->largest_available_block from u32 to u64, and manually assigning the values from the buffer into this struct, according to output size. Also, a routine was created for helping reading the ddw extensions as suggested by LoPAR: First reading the size of the extension array from index 0, checking if the property exists, and then returning it's value. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Tested-by: David Dai <zdai@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200805030455.123024-3-leobras.c@gmail.com
2020-09-08powerpc/pseries/iommu: Create defines for operations in ibm, ddw-applicableLeonardo Bras
Create defines to help handling ibm,ddw-applicable values, avoiding confusion about the index of given operations. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Tested-by: David Dai <zdai@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200805030455.123024-2-leobras.c@gmail.com
2020-04-03powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent ↵Alexey Kardashevskiy
memory Unlike normal memory ("memory" compatible type in the FDT), the persistent memory ("ibm,pmemory" in the FDT) can be mapped anywhere in the guest physical space and it can be used for DMA. In order to maintain 1:1 mapping via the huge DMA window, we need to know the maximum physical address at the time of the window setup. So far we've been looking at "memory" nodes but "ibm,pmemory" does not have fixed addresses and the persistent memory may be mapped afterwards. Since the persistent memory is still backed with page structs, use MAX_PHYSMEM_BITS as the upper limit. This effectively disables huge DMA window in LPAR under pHyp if persistent memory is present but this is the best we can do for the moment. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200331012338.23773-1-aik@ozlabs.ru
2020-01-06powerpc/pseries/iommu: Separate FW_FEATURE_MULTITCE to put/stuff featuresAlexey Kardashevskiy
H_PUT_TCE_INDIRECT allows packing up to 512 TCE updates into a single hypercall; H_STUFF_TCE can clear lots in a single hypercall too. However, unlike H_STUFF_TCE (which writes the same TCE to all entries), H_PUT_TCE_INDIRECT uses a 4K page with new TCEs. In a secure VM environment this means sharing a secure VM page with a hypervisor which we would rather avoid. This splits the FW_FEATURE_MULTITCE feature into FW_FEATURE_PUT_TCE_IND and FW_FEATURE_STUFF_TCE. "hcall-multi-tce" in the "/rtas/ibm,hypertas-functions" device tree property sets both; the "multitce=off" kernel command line parameter disables both. This should not cause behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Tested-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191216041924.42318-4-aik@ozlabs.ru
2020-01-06powerpc/pseries: Allow not having ibm, hypertas-functions::hcall-multi-tce ↵Alexey Kardashevskiy
for DDW By default a pseries guest supports a H_PUT_TCE hypercall which maps a single IOMMU page in a DMA window. Additionally the hypervisor may support H_PUT_TCE_INDIRECT/H_STUFF_TCE which update multiple TCEs at once; this is advertised via the device tree /rtas/ibm,hypertas-functions property which Linux converts to FW_FEATURE_MULTITCE. FW_FEATURE_MULTITCE is checked when dma_iommu_ops is used; however the code managing the huge DMA window (DDW) ignores it and calls H_PUT_TCE_INDIRECT even if it is explicitly disabled via the "multitce=off" kernel command line parameter. This adds FW_FEATURE_MULTITCE checking to the DDW code path. This changes tce_build_pSeriesLP to take liobn and page size as the huge window does not have iommu_table descriptor which usually the place to store these numbers. Fixes: 4e8b0cf46b25 ("powerpc/pseries: Add support for dynamic dma windows") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Tested-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191216041924.42318-3-aik@ozlabs.ru
2020-01-06Revert "powerpc/pseries/iommu: Don't use dma_iommu_ops on secure guests"Ram Pai
This reverts commit edea902c1c1efb855f77e041f9daf1abe7a9768a. At the time the change allowed direct DMA ops for secure VMs; however since then we switched on using SWIOTLB backed with IOMMU (direct mapping) and to make this work, we need dma_iommu_ops which handles all cases including TCE mapping I/O pages in the presence of an IOMMU. Fixes: edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on secure guests") Signed-off-by: Ram Pai <linuxram@us.ibm.com> [aik: added "revert" and "fixes:"] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Tested-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191216041924.42318-2-aik@ozlabs.ru
2019-08-30powerpc/pseries/iommu: Don't use dma_iommu_ops on secure guestsThiago Jung Bauermann
Secure guest memory is inacessible to devices so regular DMA isn't possible. In that case set devices' dma_map_ops to NULL so that the generic DMA code path will use SWIOTLB to bounce buffers for DMA. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-14-bauerman@linux.ibm.com