summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
AgeCommit message (Collapse)Author
2023-11-17drm/amdkfd: Move TLB flushing logic into amdgpuFelix Kuehling
This will make it possible for amdgpu GEM ioctls to flush TLBs on compute VMs. This removes VMID-based TLB flushing and always uses PASID-based flushing. This still works because it scans the VMID-PASID mapping registers to find the right VMID. It's only slightly less efficient. This is not a production use case. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-03drm/amdkfd: drop struct kfd_cu_infoAlex Deucher
I think this was an abstraction back from when kfd supported both radeon and amdgpu. Since we just support amdgpu now, there is no more need for this and we can use the amdgpu structures directly. This also avoids having the kfd_cu_info structures on the stack when inlining which can blow up the stack. Cc: Arnd Bergmann <arnd@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26drm/amdkfd: Move dma unmapping after TLB flushPhilip Yang
Otherwise GPU may access the stale mapping and generate IOMMU IO_PAGE_FAULT. Move this to inside p->mutex to prevent multiple threads mapping and unmapping concurrently race condition. After kfd_mem_dmaunmap_attachment is removed from unmap_bo_from_gpuvm, kfd_mem_dmaunmap_attachment is called if failed to map to GPUs, and before free the mem attachment in case failed to unmap from GPUs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-11drm/amdkfd: drop IOMMUv2 supportAlex Deucher
Now that we use the dGPU path for all APUs, drop the IOMMUv2 support. v2: drop the now unused queue manager functions for gfx7/8 APUs Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-04Merge tag 'amd-drm-next-6.6-2023-07-28' of ↵Daniel Vetter
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.6-2023-07-28: amdgpu: - Lots of checkpatch cleanups - GFX 9.4.3 updates - Add USB PD and IFWI flashing documentation - GPUVM updates - RAS fixes - DRR fixes - FAMS fixes - Virtual display fixes - Soft IH fixes - SMU13 fixes - Rework PSP firmware loading for other IPs - Kernel doc fixes - DCN 3.0.1 fixes - LTTPR fixes - DP MST fixes - DCN 3.1.6 fixes - SubVP fixes - Display bandwidth calculation fixes - VCN4 secure submission fixes - Allow building DC on RISC-V - Add visible FB info to bo_print_info - HBR3 fixes - Add PSP 14.0 support - GFX9 MCBP fix - GMC10 vmhub index fix - GMC11 vmhub index fix - Create a new doorbell manager - SR-IOV fixes amdkfd: - Cleanup CRIU dma-buf handling - Use KIQ to unmap HIQ - GFX 9.4.3 debugger updates - GFX 9.4.2 debugger fixes - Enable cooperative groups fof gfx11 - SVM fixes radeon: - Lots of checkpatch cleanups Merge conflicts: - drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c The switch to drm eu helpers in 8a206685d36f ("drm/amdgpu: use drm_exec for GEM and CSA handling v2") clashed with the cosmetic cleanups from 30953c4d000b ("drm/amdgpu: Fix style issues in amdgpu_gem.c"). I kept the former since the cleanup up code is gone. - drivers/gpu/drm/amd/amdgpu/atom.c. adf64e214280 ("drm/amd: Avoid reading the VBIOS part number twice") removed code that 992b8fe106ab ("drm/radeon: Replace all non-returning strlcpy with strscpy") polished. From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230728214228.8102-1-alexander.deucher@amd.com [sima: some merge conflict wrangling as noted] Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2023-07-12drm/amdkfd: switch over to using drm_exec v3Christian König
Avoids quite a bit of logic and kmalloc overhead. v2: fix multiple problems pointed out by Felix v3: two more nit picks from Felix fixed Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230711133122.3710-4-christian.koenig@amd.com
2023-07-07drm/amdkfd: Use KIQ to unmap HIQMukul Joshi
Currently, we unmap HIQ by directly writing to HQD registers. This doesn't work for GFX9.4.3. Instead, use KIQ to unmap HIQ, similar to how we use KIQ to map HIQ. Using KIQ to unmap HIQ works for all GFX series post GFXv9. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: add debug suspend and resume process queues operationJonathan Kim
In order to inspect waves from the saved context at any point during a debug session, the debugger must be able to preempt queues to trigger context save by suspending them. On queue suspend, the KFD will copy the context save header information so that the debugger can correctly crawl the appropriate size of the saved context. The debugger must then also be allowed to resume suspended queues. A queue that is newly created cannot be suspended because queue ids are recycled after destruction so the debugger needs to know that this has occurred. Query functions will be later added that will clear a given queue of its new queue status. A queue cannot be destroyed while it is suspended to preserve its saved context during debugger inspection. Have queue destruction block while a queue is suspended and unblocked when it is resumed. Likewise, if a queue is about to be destroyed, it cannot be suspended. Return the number of queues successfully suspended or resumed along with a per queue status array where the upper bits per queue status show that the request was invalid (new/destroyed queue suspend request, missing queue) or an error occurred (HWS in a fatal state so it can't suspend or resume queues). Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: update process interrupt handling for debug eventsJonathan Kim
The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts. If a debugger session exits, any exceptions it subscribed to may still have interrupts in the interrupt ring buffer or KGD/KFD pipeline. To prevent a new session from inheriting stale interrupts, when a new queue is created, open an interrupt drain and allow the IH ring to drain from a timestamped checkpoint. Then inject a custom IV so that once the custom IV is picked up by the KFD, it's safe to close the drain and proceed with queue creation. The drain must also be on debug disable as SW interrupts may still be processed. Drain at this time and clear all the exception status. The debugger may also not be attached nor subscibed to certain exceptions so forward them directly to the runtime. GFX10 also requires its own IV processing, hence the creation of kfd_int_process_v10.c. This is because the IV from SQ interrupts are packed into a new continguous format unlike GFX9. To make this clear, a separate interrupting handling code file was created. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/gmc9: fix 64 bit division in partition codeAlex Deucher
Rework logic or use do_div() to avoid problems on 32 bit. v2: add a missing case for XCP macro v3: fix out of bounds array access v4: fix xcp handling harder Acked-by: Guchun Chen <guchun.chen@amd.com> (v1) Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> (v3) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Do not access members of xcp w/o check (v2)Hawking Zhang
Not all the asic needs xcp. ensure check xcp availabity before accessing its member. v2: add missing change in kfd_topology.c Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Refactor migrate init to support partition switchPhilip Yang
Rename smv_migrate_init to a better name kgd2kfd_init_zone_device because it setup zone devive pgmap for page migration and keep it in kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it only once in amdgpu_device_ip_init after adev ip blocks are initialized, but before amdgpu_amdkfd_device_init initialize kfd nodes which enable SVM support based on pgmap. svm_range_set_max_pages is called by kgd2kfd_device_init everytime after switching compute partition mode. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Fix memory reporting on GFX 9.4.3Mukul Joshi
This patch fixes memory reporting on the GFX 9.4.3 APU and dGPU by reporting available memory on a per partition basis. If its an APU, available and used memory calculations take into account system and TTM memory. v2: squash in fix ("drm/amdkfd: Fix array out of bound warning") squash in fix ("drm/amdgpu: Update memory reporting for GFX9.4.3") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Move local_mem_info to kfd_nodeMukul Joshi
We need to track memory usage on a per partition basis. To do that, store the local memory information in KFD node instead of kfd device. v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: KFD graphics interop support compute partitionPhilip Yang
kfd_ioctl_get_dmabuf use the amdgpu bo xcp_id to get the gpu_id of the KFD node from the exported dmabuf_adev, and then create kfd bo on the correct adev and KFD node when importing the amdgpu bo to KFD. Remove function kfd_device_by_adev, it is not needed as it is the same result as dmabuf_adev->kfd.dev->nodes[0]->id. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Store xcp partition id to amdgpu boPhilip Yang
For memory accounting per compute partition and export drm amdgpu bo and then import to KFD, we need the xcp id to account the memory usage or find the KFD node of the original amdgpu bo to create the KFD bo on the correct adev KFD node. Set xcp_id_plus1 of amdgpu_bo_param to create bo and store xcp_id to amddgpu bo. Add helper macro to get the mem_id from adev and xcp_id. v2: squash in fix ("drm/amdgpu: Fix BO creation failure on GFX 9.4.3 dGPU") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Show KFD node memory partition infoPhilip Yang
Show KFD node memory partition id and size, add helper function KFD_XCP_MEMORY_SIZE to get kfd node memory size, will be used later to support memory accounting per partition. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Move pgmap to amdgpu_kfd_dev structurePhilip Yang
VRAM pgmap resource is allocated every time when switching compute partitions because kfd_dev is re-initialized by post_partition_switch, As a result, it causes memory region resource leaking and system memory usage accounting unbalanced. pgmap resource should be allocated and registered only once when loading driver and freed when unloading driver, move it from kfd_dev to amdgpu_kfd_dev. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Add device repartition supportMukul Joshi
GFX9.4.3 will support dynamic repartitioning of the GPU through sysfs. Add device repartitioning support in KFD to repartition GPU from one mode to other. v2: squash in fix ("drm/amdkfd: Fix warning kgd2kfd_unlock_kfd defined but not used") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add XCC inst to PASID TLB flushingMukul Joshi
Add XCC instance to select the correct KIQ ring when flushing TLBs on a multi-XCC setup. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-28drm/amdkfd: Implement DMA buf fd export from KFDFelix Kuehling
Exports a DMA buf fd of a given KFD buffer handle. This is intended for being able to import KFD BOs into GEM contexts to leverage the amdgpu_bo_va API for more flexible virtual address mappings. It will also be used for the new upstreamable RDMA solution coming to UCX and RCCL. The corresponding user mode change (Thunk API and kfdtest) is here: https://github.com/fxkamd/ROCT-Thunk-Interface/commits/fxkamd/dmabuf Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-11drm/amdkfd: Page aligned memory reserve sizePhilip Yang
Use page aligned size to reserve memory usage because page aligned TTM BO size is used to unreserve memory usage, otherwise no page aligned size causes memory usage accounting unbalanced. Change vram_used definition type to int64_t to be able to trigger WARN_ONCE(adev && adev->kfd.vram_used < 0, "..."), to help debug the accounting issue with warning and backtrace. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-11drm/amdkfd: Cleanup vm process info if init vm failedPhilip Yang
If acquire_vm failed when initializing KFD vm, set vm->process_info to NULL and free process info, otherwise, the future acquire_vm will always fail as vm->process_info is not NULL. Pass avm as parameter to remove the duplicate code. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdkfd: Fix double release compute pasidPhilip Yang
If kfd_process_device_init_vm returns failure after vm is converted to compute vm and vm->pasid set to compute pasid, KFD will not take pdd->drm_file reference. As a result, drm close file handler maybe called to release the compute pasid before KFD process destroy worker to release the same pasid and set vm->pasid to zero, this generates below WARNING backtrace and NULL pointer access. Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step of kfd_process_device_init_vm, to ensure vm pasid is the original pasid if acquiring vm failed or is the compute pasid with pdd->drm_file reference taken to avoid double release same pasid. amdgpu: Failed to create process VM object ida_free called for id=32770 which is not allocated. WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140 RIP: 0010:ida_free+0x96/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:ida_free+0x76/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: Add notifier lock for KFD userptrsFelix Kuehling
Add a per-process MMU notifier lock for processing notifiers from userptrs. Use that lock to properly synchronize page table updates with MMU notifiers. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Xiaogang Chen<Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27drm/amdkfd: Cleanup kfd_dev structMukul Joshi
Cleanup kfd_dev struct by removing ddev and pdev as both drm_device and pci_dev can be fetched from amdgpu_device. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-10drm/amdgpu: Pessimistic availability based on rounded up allocationsDaniel Phillips
Separately accumulate a statistic of rounded up allocations to use to report availability, with a view to increasing the likelihood a buffer object can be successfully allocated at exactly the size reported by the availability API. Signed-off-by: Daniel Phillips <daniel.phillips@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28drm/amdgpu: add debugfs for kfd system and ttm mem usedAlex Sierra
This keeps track of kfd system mem used and kfd ttm mem used. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28drm/amdkfd: track unified memory reservation with xnack offAlex Sierra
[WHY] Unified memory with xnack off should be tracked, as userptr mappings and legacy allocations do. To avoid oversuscribe system memory when xnack off. [How] Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM API and call them on every prange creation and free. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30drm/amdkfd: Add user queue eviction restore SMI eventPhilip Yang
Output user queue eviction and restore event. User queue eviction may be triggered by svm or userptr MMU notifier, TTM eviction, device suspend and CRIU checkpoint and restore. User queue restore may be rescheduled if eviction happens again while restore. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-23drm/amdkfd: Enable GFX11 usermode queue oversubscriptionGraham Sider
Starting with GFX11, MES requires wptr BOs to be GTT allocated/mapped to GART for usermode queues in order to support oversubscription. In the case that work is submitted to an unmapped queue, MES must have a GART wptr address to determine whether the queue should be mapped. This change is accompanied with changes in MES and is applicable for MES_API_VERSION >= 2. v3: - Use amdgpu_vm_bo_lookup_mapping for wptr_bo mapping lookup - Move wptr_bo refcount increment to amdgpu_amdkfd_map_gtt_bo_to_gart - Remove list_del_init from amdgpu_amdkfd_map_gtt_bo_to_gart - Cleanup/fix create_queue wptr_bo error handling v4: - Add MES version shift/mask defines to amdgpu_mes.h - Change version check from MES_VERSION to MES_API_VERSION - Add check in kfd_ioctl_create_queue before wptr bo pin/GART map to ensure bo is a single page. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-14drm/amdkfd: Add available memory ioctlDaniel Phillips
Add a new KFD ioctl to return the largest possible memory size that can be allocated as a buffer object using kfd_ioctl_alloc_memory_of_gpu. It attempts to use exactly the same accept/reject criteria as that function so that allocating a new buffer object of the size returned by this new ioctl is guaranteed to succeed, barring races with other allocating tasks. This IOCTL will be used by libhsakmt: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html Signed-off-by: Daniel Phillips <Daniel.Phillips@amd.com> Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10drm/amdgpu: Add work_struct for GPU reset from kfd.Andrey Grodzovsky
We need to have a work_struct to cancel this reset if another already in progress. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08drm/amdkfd: Document and fix GTT BO kmap APIFelix Kuehling
Removed an unused parameter from two functions and added kernel-doc comments. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUsRamesh Errabolu
Add support for peer-to-peer communication among AMD GPUs over PCIe bus. Support REQUIRES enablement of config HSA_AMD_P2P. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-25drm/amdkfd: use tlb_seq from the VM subsystem for SVM as well v2Christian König
Instead of hand rolling the table_freed parameter. v2: add some changes suggested by Philip Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philip Yang<Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-25drm/amdgpu: add UTCL2 RAS poison query for Aldebaran (v2)Tao Zhou
Add help functions to query and reset RAS UTCL2 poison status. v2: implement it on amdgpu side and kfd only calls it. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdkfd: CRIU Implement KFD resume ioctlRajneesh Bhardwaj
This adds support to create userptr BOs on restore and introduces a new ioctl op to restart memory notifiers for the restored userptr BOs. When doing CRIU restore MMU notifications can happen anytime after we call amdgpu_mn_register. Prevent MMU notifications until we reach stage-4 of the restore process i.e. criu_resume ioctl op is received, and the process is ready to be resumed. This ioctl is different from other KFD CRIU ioctls since its called by CRIU master restore process for all the target processes being resumed by CRIU. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdkfd: CRIU Implement KFD checkpoint ioctlRajneesh Bhardwaj
This adds support to discover the buffer objects that belong to a process being checkpointed. The data corresponding to these buffer objects is returned to user space plugin running under criu master context which then stores this info to recreate these buffer objects during a restore operation. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-11drm/amdkfd: remove unused functionNirmoy Das
Remove unused amdgpu_amdkfd_get_vram_usage() CC: Felix.Kuehling@amd.com Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Fixes: dfcbe6d5f4a340 ("drm/amdgpu: Remove unused function pointers")
2021-12-28drm/amdkfd: reset queue which consumes RAS poison (v2)Tao Zhou
CP supports unmap queue with reset mode which only destroys specific queue without affecting others. Replacing whole gpu reset with reset queue mode for RAS poison consumption saves much time, and we can also fallback to gpu reset solution if reset queue fails. v2: Return directly if process is NULL; Reset queue solution is not applicable to SDMA, fallback to legacy way; Call kfd_unref_process after lookup process. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22drm/amdgpu: Update BO memory accounting to rely on allocation flagRamesh Errabolu
Accounting system to track amount of available memory (system, TTM and VRAM of a device) relies on BO's domain. The change is to rely instead on allocation flag indicating BO type - VRAM, GTT, USERPTR, MMIO or DOORBELL Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdkfd: replace trivial funcs with direct accessGraham Sider
These get funcs simply return an adev field. Replace funcs/calls with direct field accesses instead. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdkfd: remove kgd_dev declaration and initializationGraham Sider
Completes removal of kgd_dev. Direct references to amdgpu_device objects should now be used instead. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdkfd: replace kgd_dev in gpuvm amdgpu_amdkfd funcsGraham Sider
Modified definitions: - amdgpu_amdkfd_gpuvm_acquire_process_vm - amdgpu_amdkfd_gpuvm_release_process_vm - amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu - amdgpu_amdkfd_gpuvm_free_memory_of_gpu - amdgpu_amdkfd_gpuvm_map_memory_to_gpu - amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu - amdgpu_amdkfd_gpuvm_sync_memory - amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel - amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel - amdgpu_amdkfd_gpuvm_get_vm_fault_info - amdgpu_amdkfd_gpuvm_import_dmabuf - amdgpu_amdkfd_get_tile_config Removed: - get_amdgpu_device Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdkfd: replace kgd_dev in get amdgpu_amdkfd funcsGraham Sider
Modified definitions: - amdgpu_amdkfd_get_fw_version - amdgpu_amdkfd_get_local_mem_info - amdgpu_amdkfd_get_gpu_clock_counter - amdgpu_amdkfd_get_max_engine_clock_in_mhz - amdgpu_amdkfd_get_cu_info - amdgpu_amdkfd_get_dmabuf_info - amdgpu_amdkfd_get_vram_usage - amdgpu_amdkfd_get_hive_id - amdgpu_amdkfd_get_unique_id - amdgpu_amdkfd_get_mmio_remap_phys_addr - amdgpu_amdkfd_get_num_gws - amdgpu_amdkfd_get_asic_rev_id - amdgpu_amdkfd_get_noretry - amdgpu_amdkfd_get_xgmi_hops_count - amdgpu_amdkfd_get_xgmi_bandwidth_mbytes - amdgpu_amdkfd_get_pcie_bandwidth_mbytes Also replaces kfd_device_by_kgd with kfd_device_by_adev, now searching via adev rather than kgd. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdkfd: replace kgd_dev in various amgpu_amdkfd funcsGraham Sider
Modified definitions: - amdgpu_amdkfd_submit_ib - amdgpu_amdkfd_set_compute_idle - amdgpu_amdkfd_have_atomics_support - amdgpu_amdkfd_flush_gpu_tlb_pasid - amdgpu_amdkfd_flush_gpu_tlb_pasid - amdgpu_amdkfd_gpu_reset - amdgpu_amdkfd_alloc_gtt_mem - amdgpu_amdkfd_free_gtt_mem - amdgpu_amdkfd_alloc_gws - amdgpu_amdkfd_free_gws - amdgpu_amdkfd_ras_poison_consumption_handler Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amdgpu: Fix dangling kfd_bo pointer for shared BOsFelix Kuehling
If a kfd_bo was shared (e.g. a dmabuf export), the original kfd_bo may be freed when the amdgpu_bo still lives on. Free the kfd_bo struct in the release_notify callback then the amdgpu_bo is freed. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-By: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28drm/amdkfd: Separate pinned BOs destruction from general routineLang Yu
Currently, all kfd BOs use same destruction routine. But pinned BOs are not unpinned properly. Separate them from general routine. v2 (Felix): Add safeguard to prevent user space from freeing signal BO. Kunmap signal BO in the event of setting event page error. Just kunmap signal BO to avoid duplicating the code. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-04drm/amdkfd: clean up parameters in kgd2kfd_probeAlex Deucher
We can get the pdev and asic type from the adev. No need to pass them explicitly. v2: squash in build fix for !CONFIG_HSA_AMD from Anson Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>