summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
AgeCommit message (Collapse)Author
2022-01-11drm/amdkfd: remove unused functionNirmoy Das
Remove unused amdgpu_amdkfd_get_vram_usage() CC: Felix.Kuehling@amd.com Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Fixes: dfcbe6d5f4a340 ("drm/amdgpu: Remove unused function pointers")
2021-12-30drm/amdgpu: save error count in RAS poison handlerTao Zhou
Otherwise the RAS error count couldn't be queried from sysfs. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28drm/amdkfd: reset queue which consumes RAS poison (v2)Tao Zhou
CP supports unmap queue with reset mode which only destroys specific queue without affecting others. Replacing whole gpu reset with reset queue mode for RAS poison consumption saves much time, and we can also fallback to gpu reset solution if reset queue fails. v2: Return directly if process is NULL; Reset queue solution is not applicable to SDMA, fallback to legacy way; Call kfd_unref_process after lookup process. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdkfd: replace trivial funcs with direct accessGraham Sider
These get funcs simply return an adev field. Replace funcs/calls with direct field accesses instead. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdkfd: remove kgd_dev declaration and initializationGraham Sider
Completes removal of kgd_dev. Direct references to amdgpu_device objects should now be used instead. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdkfd: replace kgd_dev in get amdgpu_amdkfd funcsGraham Sider
Modified definitions: - amdgpu_amdkfd_get_fw_version - amdgpu_amdkfd_get_local_mem_info - amdgpu_amdkfd_get_gpu_clock_counter - amdgpu_amdkfd_get_max_engine_clock_in_mhz - amdgpu_amdkfd_get_cu_info - amdgpu_amdkfd_get_dmabuf_info - amdgpu_amdkfd_get_vram_usage - amdgpu_amdkfd_get_hive_id - amdgpu_amdkfd_get_unique_id - amdgpu_amdkfd_get_mmio_remap_phys_addr - amdgpu_amdkfd_get_num_gws - amdgpu_amdkfd_get_asic_rev_id - amdgpu_amdkfd_get_noretry - amdgpu_amdkfd_get_xgmi_hops_count - amdgpu_amdkfd_get_xgmi_bandwidth_mbytes - amdgpu_amdkfd_get_pcie_bandwidth_mbytes Also replaces kfd_device_by_kgd with kfd_device_by_adev, now searching via adev rather than kgd. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdkfd: replace kgd_dev in various amgpu_amdkfd funcsGraham Sider
Modified definitions: - amdgpu_amdkfd_submit_ib - amdgpu_amdkfd_set_compute_idle - amdgpu_amdkfd_have_atomics_support - amdgpu_amdkfd_flush_gpu_tlb_pasid - amdgpu_amdkfd_flush_gpu_tlb_pasid - amdgpu_amdkfd_gpu_reset - amdgpu_amdkfd_alloc_gtt_mem - amdgpu_amdkfd_free_gtt_mem - amdgpu_amdkfd_alloc_gws - amdgpu_amdkfd_free_gws - amdgpu_amdkfd_ras_poison_consumption_handler Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-04drm/amdkfd: clean up parameters in kgd2kfd_probeAlex Deucher
We can get the pdev and asic type from the adev. No need to pass them explicitly. v2: squash in build fix for !CONFIG_HSA_AMD from Anson Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-04amd/amdkfd: add ras page retirement handling for sq/sdma (v3)Tao Zhou
In ras poison mode, page retirement will be handled by the irq handler of the module which consumes corrupted data. v2: rename ras_process_cb to ras_poison_consumption_handler. move the handler's implementation from ASIC specific file to common file. v3: call gpu reset for xGMI connected mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-16drm/amdgpu: add amdgpu_amdkfd_resume_iommuJames Zhu
Add amdgpu_amdkfd_resume_iommu for amdgpu. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-08-16drm/amd/amdgpu embed hw_fence into amdgpu_jobJack Zhang
Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-23drm/amdkfd: report pcie bandwidth to the kfdJonathan Kim
Similar to xGMI reporting the min/max bandwidth between direct peers, PCIe will report the min/max bandwidth to the KFD. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-23drm/amdkfd: report xgmi bandwidth between direct peers to the kfdJonathan Kim
Report the min/max bandwidth in megabytes to the kfd for direct xgmi connections only. Indirect peers will report 0 since indirect route is unknown. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-22Merge drm/drm-next into drm-misc-nextThomas Zimmermann
Backmerging from drm/drm-next to the patches for AMD devices for v5.14. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2021-05-19drm/amdgpu: Add early fini callbackAndrey Grodzovsky
Use it to call disply code dependent on device->drv_data before it's set to NULL on device unplug v5: Move HW finilization into this callback to prevent MMIO accesses post cpi remove. v7: Split kfd suspend from device exit to expdite HW related stuff to amdgpu_pci_remove v8: Squash previous KFD commit into this commit to avoid compile break. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com
2021-05-19drm/amdkfd: heavy-weight flush TLB after unmapPhilip Yang
Need do a heavy-weight TLB flush to make sure we have no more dirty data in the cache for the unmapped pages. Define enum TLB_FLUSH_TYPE, add flush_type parameter to amdgpu_amdkfd_flush_gpu_tlb_pasid. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: use amdgpu_bo_user bo for metadata and tiling flagNirmoy Das
Tiling flag and metadata are only needed for BOs created by amdgpu_gem_object_create(), so we can remove those from the base class. v2: * squash tiling_flags and metadata relared patches into one * use BUG_ON for non ttm_bo_type_device type when accessing tiling_flags and metadata._ v3: *include to_amdgpu_bo_user Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: use amdgpu_bo_create_user() for when possibleNirmoy Das
Use amdgpu_bo_create_user() for all the BO allocations for ttm_bo_type_device type. v2: include amdgpu_amdkfd_alloc_gws() as well it calls amdgpu_bo_create() for ttm_bo_type_device Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: allow variable BO struct creationNirmoy Das
Allow allocating BO structures with different structure size than struct amdgpu_bo. v2: Check bo_ptr_size in all amdgpu_bo_create() caller. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23Revert "drm/amdgpu: During compute disable GFXOFF for Sienna_Cichlid"Harish Kasiviswanathan
This reverts commit 73bf5cad2696fe3a21f70101821405db839ea18e. Fixed in newer firmware Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23drm/amdgpu: Add kfd init_complete flag to check from amdgpu sideshaoyunl
amdgpu driver may be in reset state during init which will not initialize the kfd, driver need to initialize the KFD after reset by check the flag Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23drm/amdgpu: Use free system memory size for kfd memory accountingOak Zeng
With the current kfd memory accounting scheme, kfd applications can use up to 15/16 of total system memory. For system which has small total system memory size it leaves small system memory for OS. For example, if the system has totally 16GB of system memory, this scheme leave OS and non-kfd applications only 1GB of system memory. In many cases, this leads to OOM killer. This patch changed the KFD system memory accounting scheme. 15/16 of free system memory when kfd driver load. This deduct the system memory that OS already use. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Suggested-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-01-28drm/amd/amdkfd: adjust dummy functions' placementLang Yu
Move all the dummy functions in amdgpu_amdkfd.c to amdgpu_amdkfd.h as inline functions. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-11-04drm/amdgpu: Change the way to determine framebuffer typeGang Ba
Determine FRAMEBUFFER_PUBLIC/PRIVATE only based host-accessibility, not peer-accesssibility Signed-off-by: Gang Ba <gaba@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-23drm/amdgpu: During compute disable GFXOFF for Sienna_CichlidHarish Kasiviswanathan
Workaround to fix the soft hang observed in certain compute applications. Acked-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-09drm/amdgpu: kfd_initialized can be statickernel test robot
Fixes: c7651b73586600 ("drm/amdgpu: Fix handling of KFD initialization failures") Signed-off-by: kernel test robot <lkp@intel.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-25drm/amdgpu: store noretry parameter per driver instanceAlex Deucher
This will allow us to have different defaults per asic in a future patch. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-22drm/amdgpu: Fix handling of KFD initialization failuresFelix Kuehling
Remember KFD module initializaton status in a global variable. Skip KFD device probing when the module was not initialized. Other amdgpu_amdkfd calls are then protected by the adev->kfd.dev check. Also print a clear error message when KFD disables itself. Amdgpu continues its initialization even when KFD failed. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-09-08Merge tag 'amd-drm-next-5.10-2020-09-03' of ↵Dave Airlie
git://people.freedesktop.org/~agd5f/linux into drm-next amd-drm-next-5.10-2020-09-03: amdgpu: - RAS fixes - Sienna Cichlid updates - Navy Flounder updates - DCE6 (SI) support in DC - Enable plane rotation - Rework pre-OS vram reservation handling during driver init - Add standard interface to dump GPU metrics table from SMU - Rework tiling and tmz state handling in atomic commits - Pstate fixes - Add voltage and power hwmon interfaces for renoir - SW CTF fixes - S/G display fix for Raven - Print client strings for vmfaults for vega and newer - Manual fan control fixes - Display updates - Reorg power management directory structure - Misc bug fixes - Misc code cleanups amdkfd: - Topology fixes - Add SMI events for thermal throttling and GPU resets radeon: - switch from pci_* to dma_* for dma allocations - PLL fix Scheduler: - Clean up priority levels UAPI: - amdgpu INFO IOCTL query update for TMZ state https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049 - amdkfd SMI event interface updates https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/therm_thrott From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200903222921.4152-1-alexander.deucher@amd.com
2020-08-24drm/amdgpu: Get DRM dev from adev by inline-fLuben Tuikov
Add a static inline adev_to_drm() to obtain the DRM device pointer from an amdgpu_device pointer. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24drm/amdgpu: drm_device to amdgpu_device by inline-f (v2)Luben Tuikov
Get the amdgpu_device from the DRM device by use of an inline function, drm_to_adev(). The inline function resolves a pointer to struct drm_device to a pointer to struct amdgpu_device. v2: Use a typed visible static inline function instead of an invisible macro. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14drm/amdgpu: revert "fix system hang issue during GPU reset"Christian König
The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-12Merge drm/drm-next into drm-misc-nextThomas Zimmermann
Backmerging drm-next into drm-misc-next for nouveau and panel updates. Resolves a conflict between ttm and nouveau, where struct ttm_mem_res got renamed to struct ttm_resource. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2020-08-07drm/amdgpu: unlock mutex on errorDennis Li
Make sure to unlock the mutex when error happen v2: 1. correct syntax error in the commit comments 2. remove change-Id Acked-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-06drm/ttm: rename ttm_mem_type_manager -> ttm_resource_manager.Dave Airlie
This name makes a lot more sense, since these are about managing driver resources rather than just memory ranges. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200804025632.3868079-59-airlied@gmail.com
2020-08-06drm/amdgfx/ttm: use wrapper to get ttm memory managersDave Airlie
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200804025632.3868079-38-airlied@gmail.com
2020-07-27drm/amdkfd: Add thermal throttling SMI eventMukul Joshi
Add support for reporting thermal throttling events through SMI. Also, add a counter to count the number of throttling interrupts observed and report the count in the SMI event message. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-27drm/amdgpu: fix system hang issue during GPU resetDennis Li
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Suggested-by: Luben Tukov <luben.tuikov@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-02drm/amdgpu: Clean up KFD VMID assignmentFelix Kuehling
The KFD VMID assignment was hard-coded in a few places. Consolidate that in a single variable adev->vm_manager.first_kfd_vmid. The value is still assigned in gmc-ip-version-specific code. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-28drm/amdkfd: Put ASIC revision into HSA capabilityJoseph Greathouse
In order to surface the ASIC revision to user level, we want to put it into the HSA topology. This can be because different ASIC revisions may require user-level software to do different things (e.g. patch code for things that are changed in later hardware revisions). The ASIC revision from the hardware is maximum of 4 bits at this time, so put it into 4 of the open bits in the HSA capability. Then user-level software can use this capability information to know -- for each ASIC -- what revision-based things must be done. Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10drm/amdkfd: Consolidate duplicated bo alloc flagsYong Zhao
ALLOC_MEM_FLAGS_* used are the same as the KFD_IOC_ALLOC_MEM_FLAGS_*, but they are interweavedly used in kernel driver, resulting in bad readability. For example, KFD_IOC_ALLOC_MEM_FLAGS_COHERENT is not referenced in kernel, and it functions implicitly in kernel through ALLOC_MEM_FLAGS_COHERENT, causing unnecessary confusion. Replace all occurrences of ALLOC_MEM_FLAGS_* with KFD_IOC_ALLOC_MEM_FLAGS_* to solve the problem. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-06drm/amdgpu: Use better names to reflect it is CP MQD bufferYong Zhao
Add "CP" to AMDGPU_GEM_CREATE_MQD_GFX9 to indicate it is only for CP MQD buffer. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amdkfd: Avoid ambiguity by indicating it's cp queueYong Zhao
The queues represented in queue_bitmap are only CP queues. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amd: Extend ROCt to surface UUID for devices that have themDivya Shikre
Devices from Arcturus onwards will have their UUID exposed to Thunk. Adding neccessary functions to the kernel to propagate the uuid. Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amdgpu: Fix check for DPM when returning max clockKent Russell
pp_funcs may not exist, while dpm may be enabled. This change ensures that KFD topology will report the same as pp_dpm_sclk, as the conditions for reporting them will be the same. Otherwise, we may see the issue where KFD reports "100MHz" in topology as the max speed, while DPM is working correctly. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-26drm/amdgpu: Remove kfd eviction fence before release bo (v2)xinhui pan
No need to trigger eviction as the memory mapping will not be used anymore. All pt/pd bos share same resv, hence the same shared eviction fence. Everytime page table is freed, the fence will be signled and that cuases kfd unexcepted evictions. v2: squash in 32 bit fix CC: Christian König <christian.koenig@amd.com> CC: Felix Kuehling <felix.kuehling@amd.com> CC: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-25drm/amdgpu: Improve Vega20 XGMI TLB flush workaroundFelix Kuehling
Using a heavy-weight TLB flush once is not sufficient. Concurrent memory accesses in the same TLB cache line can re-populate TLB entries from stale texture cache (TC) entries while the heavy-weight TLB flush is in progress. To fix this race condition, perform another TLB flush after the heavy-weight one, when TC is known to be clean. Move the workaround into the low-level TLB flushing functions. This way they apply to amdgpu as well, and KIQ-based TLB flush only needs to synchronize once. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: shaoyun liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12drm/amdkfd: refactor runtime pm for bacoRajneesh Bhardwaj
So far the kfd driver implemented same routines for runtime and system wide suspend and resume (s2idle or mem). During system wide suspend the kfd aquires an atomic lock that prevents any more user processes to create queues and interact with kfd driver and amd gpu. This mechanism created problem when amdgpu device is runtime suspended with BACO enabled. Any application that relies on kfd driver fails to load because the driver reports a locked kfd device since gpu is runtime suspended. However, in an ideal case, when gpu is runtime suspended the kfd driver should be able to: - auto resume amdgpu driver whenever a client requests compute service - prevent runtime suspend for amdgpu while kfd is in use This change refactors the amdgpu and amdkfd drivers to support BACO and runtime power management. Reviewed-by: Oak Zeng <oak.zeng@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16drm/amdgpu: GPU TLB flush API moved to amdgpu_amdkfdAlex Sierra
[Why] TLB flush method has been deprecated using kfd2kgd interface. This implementation is now on the amdgpu_amdkfd API. [How] TLB flush functions now implemented in amdgpu_amdkfd. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-14drm/amd/powerplay: cover the powerplay implementation details V3Evan Quan
This can save users much troubles. As they do not actually need to care whether swSMU or traditional powerplay routine should be used. V2: apply the fixes to vi.c and cik.c also V3: squash in oops fix Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>