summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2025-07-01drm/i915/gsc: mei interrupt top half should be in irq disabled contextJunxiao Chang
MEI GSC interrupt comes from i915. It has top half and bottom half. Top half is called from i915 interrupt handler. It should be in irq disabled context. With RT kernel, by default i915 IRQ handler is in threaded IRQ. MEI GSC top half might be in threaded IRQ context. generic_handle_irq_safe API could be called from either IRQ or process context, it disables local IRQ then calls MEI GSC interrupt top half. This change fixes A380/A770 GPU boot hang issue with RT kernel. Fixes: 1e3dc1d8622b ("drm/i915/gsc: add gsc as a mei auxiliary device") Tested-by: Furong Zhou <furong.zhou@intel.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com> Link: https://lore.kernel.org/r/20250425151108.643649-1-junxiao.chang@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit dccf655f69002d496a527ba441b4f008aa5bebbf) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-07-01drm/i915/gt: Fix timeline left held on VMA alloc errorJanusz Krzysztofik
The following error has been reported sporadically by CI when a test unbinds the i915 driver on a ring submission platform: <4> [239.330153] ------------[ cut here ]------------ <4> [239.330166] i915 0000:00:02.0: [drm] drm_WARN_ON(dev_priv->mm.shrink_count) <4> [239.330196] WARNING: CPU: 1 PID: 18570 at drivers/gpu/drm/i915/i915_gem.c:1309 i915_gem_cleanup_early+0x13e/0x150 [i915] ... <4> [239.330640] RIP: 0010:i915_gem_cleanup_early+0x13e/0x150 [i915] ... <4> [239.330942] Call Trace: <4> [239.330944] <TASK> <4> [239.330949] i915_driver_late_release+0x2b/0xa0 [i915] <4> [239.331202] i915_driver_release+0x86/0xa0 [i915] <4> [239.331482] devm_drm_dev_init_release+0x61/0x90 <4> [239.331494] devm_action_release+0x15/0x30 <4> [239.331504] release_nodes+0x3d/0x120 <4> [239.331517] devres_release_all+0x96/0xd0 <4> [239.331533] device_unbind_cleanup+0x12/0x80 <4> [239.331543] device_release_driver_internal+0x23a/0x280 <4> [239.331550] ? bus_find_device+0xa5/0xe0 <4> [239.331563] device_driver_detach+0x14/0x20 ... <4> [357.719679] ---[ end trace 0000000000000000 ]--- If the test also unloads the i915 module then that's followed with: <3> [357.787478] ============================================================================= <3> [357.788006] BUG i915_vma (Tainted: G U W N ): Objects remaining on __kmem_cache_shutdown() <3> [357.788031] ----------------------------------------------------------------------------- <3> [357.788204] Object 0xffff888109e7f480 @offset=29824 <3> [357.788670] Allocated in i915_vma_instance+0xee/0xc10 [i915] age=292729 cpu=4 pid=2244 <4> [357.788994] i915_vma_instance+0xee/0xc10 [i915] <4> [357.789290] init_status_page+0x7b/0x420 [i915] <4> [357.789532] intel_engines_init+0x1d8/0x980 [i915] <4> [357.789772] intel_gt_init+0x175/0x450 [i915] <4> [357.790014] i915_gem_init+0x113/0x340 [i915] <4> [357.790281] i915_driver_probe+0x847/0xed0 [i915] <4> [357.790504] i915_pci_probe+0xe6/0x220 [i915] ... Closer analysis of CI results history has revealed a dependency of the error on a few IGT tests, namely: - igt@api_intel_allocator@fork-simple-stress-signal, - igt@api_intel_allocator@two-level-inception-interruptible, - igt@gem_linear_blits@interruptible, - igt@prime_mmap_coherency@ioctl-errors, which invisibly trigger the issue, then exhibited with first driver unbind attempt. All of the above tests perform actions which are actively interrupted with signals. Further debugging has allowed to narrow that scope down to DRM_IOCTL_I915_GEM_EXECBUFFER2, and ring_context_alloc(), specific to ring submission, in particular. If successful then that function, or its execlists or GuC submission equivalent, is supposed to be called only once per GEM context engine, followed by raise of a flag that prevents the function from being called again. The function is expected to unwind its internal errors itself, so it may be safely called once more after it returns an error. In case of ring submission, the function first gets a reference to the engine's legacy timeline and then allocates a VMA. If the VMA allocation fails, e.g. when i915_vma_instance() called from inside is interrupted with a signal, then ring_context_alloc() fails, leaving the timeline held referenced. On next I915_GEM_EXECBUFFER2 IOCTL, another reference to the timeline is got, and only that last one is put on successful completion. As a consequence, the legacy timeline, with its underlying engine status page's VMA object, is still held and not released on driver unbind. Get the legacy timeline only after successful allocation of the context engine's VMA. v2: Add a note on other submission methods (Krzysztof Karas): Both execlists and GuC submission use lrc_alloc() which seems free from a similar issue. Fixes: 75d0a7f31eec ("drm/i915: Lift timeline into intel_context") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12061 Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Reviewed-by: Krzysztof Niemiec <krzysztof.niemiec@intel.com> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250611104352.1014011-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit cc43422b3cc79eacff4c5a8ba0d224688ca9dd4f) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-06-30drm/vmwgfx: Fix guests running with TDX/SEVMarko Kiiskila
Commit 81256a50aa0f ("x86/mm: Make memremap(MEMREMAP_WB) map memory as encrypted by default") changed the default behavior of memremap(MEMREMAP_WB) and started mapping memory as encrypted. The driver requires the fifo memory to be decrypted to communicate with the host but was relaying on the old default behavior of memremap(MEMREMAP_WB) and thus broke. Fix it by explicitly specifying the desired behavior and passing MEMREMAP_DEC to memremap. Fixes: 81256a50aa0f ("x86/mm: Make memremap(MEMREMAP_WB) map memory as encrypted by default") Signed-off-by: Marko Kiiskila <marko.kiiskila@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20250618192926.1092450-1-zack.rusin@broadcom.com
2025-06-30drm/i915/gsc: mei interrupt top half should be in irq disabled contextJunxiao Chang
MEI GSC interrupt comes from i915. It has top half and bottom half. Top half is called from i915 interrupt handler. It should be in irq disabled context. With RT kernel, by default i915 IRQ handler is in threaded IRQ. MEI GSC top half might be in threaded IRQ context. generic_handle_irq_safe API could be called from either IRQ or process context, it disables local IRQ then calls MEI GSC interrupt top half. This change fixes A380/A770 GPU boot hang issue with RT kernel. Fixes: 1e3dc1d8622b ("drm/i915/gsc: add gsc as a mei auxiliary device") Tested-by: Furong Zhou <furong.zhou@intel.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com> Link: https://lore.kernel.org/r/20250425151108.643649-1-junxiao.chang@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-30drm/amd/display: Don't allow OLED to go down to fully offMario Limonciello
[Why] OLED panels can be fully off, but this behavior is unexpected. [How] Ensure that minimum luminance is at least 1. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4338 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 51496c7737d06a74b599d0aa7974c3d5a4b1162e)
2025-06-30drm/amd/display: Added case for when RR equals panel's max RR using freesyncHarold Sun
[WHY] Rounding error sometimes occurs when the refresh rate is equal to a panel's max refresh rate, causing HDMI compliance failures. [HOW] Added a case so that we round up to avoid v_total_min to be below a panel's minimum bound. Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Harold Sun <Harold.Sun@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fe7645d22bc0f7c1558296538ec49987bf268ef6)
2025-06-30drm/amdkfd: add hqd_sdma_get_doorbell callbacks for gfx7/8Alex Deucher
These were missed when support was added for other generations. The callbacks are called unconditionally so we need to make sure all generations have them. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4304 Link: https://github.com/ROCm/ROCm/issues/4965 Fixes: bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+") Cc: Jonathan Kim <jonathan.kim@amd.com> Reported-by: Johl Brown <johlbrown@gmail.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1e9d17a5dcf1242e9518e461d8e63ad35240e49e) Cc: stable@vger.kernel.org
2025-06-30drm/amdgpu: Fix memory leak in amdgpu_ctx_mgr_entity_finiLin.Cao
patch dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") removed ctx put, which will cause amdgpu_ctx_fini() cannot be called and then cause some finished fence that added by amdgpu_ctx_add_fence() cannot be released and cause memleak. Fixes: dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8cf66089e28108dedd47e6156a48489303cf525c) Cc: stable@vger.kernel.org
2025-06-30drm/amdkfd: Don't call mmput from MMU notifier callbackPhilip Yang
If the process is exiting, the mmput inside mmu notifier callback from compactd or fork or numa balancing could release the last reference of mm struct to call exit_mmap and free_pgtable, this triggers deadlock with below backtrace. The deadlock will leak kfd process as mmu notifier release is not called and cause VRAM leaking. The fix is to take mm reference mmget_non_zero when adding prange to the deferred list to pair with mmput in deferred list work. If prange split and add into pchild list, the pchild work_item.mm is not used, so remove the mm parameter from svm_range_unmap_split and svm_range_add_child. The backtrace of hung task: INFO: task python:348105 blocked for more than 64512 seconds. Call Trace: __schedule+0x1c3/0x550 schedule+0x46/0xb0 rwsem_down_write_slowpath+0x24b/0x4c0 unlink_anon_vmas+0xb1/0x1c0 free_pgtables+0xa9/0x130 exit_mmap+0xbc/0x1a0 mmput+0x5a/0x140 svm_range_cpu_invalidate_pagetables+0x2b/0x40 [amdgpu] mn_itree_invalidate+0x72/0xc0 __mmu_notifier_invalidate_range_start+0x48/0x60 try_to_unmap_one+0x10fa/0x1400 rmap_walk_anon+0x196/0x460 try_to_unmap+0xbb/0x210 migrate_page_unmap+0x54d/0x7e0 migrate_pages_batch+0x1c3/0xae0 migrate_pages_sync+0x98/0x240 migrate_pages+0x25c/0x520 compact_zone+0x29d/0x590 compact_zone_order+0xb6/0xf0 try_to_compact_pages+0xbe/0x220 __alloc_pages_direct_compact+0x96/0x1a0 __alloc_pages_slowpath+0x410/0x930 __alloc_pages_nodemask+0x3a9/0x3e0 do_huge_pmd_anonymous_page+0xd7/0x3e0 __handle_mm_fault+0x5e3/0x5f0 handle_mm_fault+0xf7/0x2e0 hmm_vma_fault.isra.0+0x4d/0xa0 walk_pmd_range.isra.0+0xa8/0x310 walk_pud_range+0x167/0x240 walk_pgd_range+0x55/0x100 __walk_page_range+0x87/0x90 walk_page_range+0xf6/0x160 hmm_range_fault+0x4f/0x90 amdgpu_hmm_range_get_pages+0x123/0x230 [amdgpu] amdgpu_ttm_tt_get_user_pages+0xb1/0x150 [amdgpu] init_user_pages+0xb1/0x2a0 [amdgpu] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x543/0x7d0 [amdgpu] kfd_ioctl_alloc_memory_of_gpu+0x24c/0x4e0 [amdgpu] kfd_ioctl+0x29d/0x500 [amdgpu] Fixes: fa582c6f3684 ("drm/amdkfd: Use mmget_not_zero in MMU notifier") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a29e067bd38946f752b0ef855f3dfff87e77bec7) Cc: stable@vger.kernel.org
2025-06-30drm/amdgpu: Include sdma_4_4_4.binKent Russell
This got missed during SDMA 4.4.4 support. Fixes: 968e3811c3e8 ("drm/amdgpu: add initial support for sdma444") Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 51526efe02714339ed6139f7bc348377d363200a) Cc: stable@vger.kernel.org
2025-06-30amdkfd: MTYPE_UC for ext-coherent system memoryDavid Yat Sin
Set memory mtype to UC host memory when ext-coherent flag is set and memory is registered as a SVM allocation. Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: David Yat Sin <David.YatSin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5d14fdab4778c29cfd39e62c3ce84d232b4a7d8c)
2025-06-30drm/amdgpu/sdma5.x: suspend KFD queues in ring resetAlex Deucher
SDMA 5.x only supports engine soft reset which resets all queues on the engine. As such, we need to suspend KFD queues around resets like we do for SDMA 4.x. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 61feed0baa1a0d094af0e07e968b1e6e875f07d0)
2025-06-30drm/amdgpu/sdma6: add more ucode version checks for userq supportAlex Deucher
Fill in the SDMA ucode version checks for more SDMA 6.x parts. v2: squash in fixes (Alex) Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/radeon: bump version to 2.51.0Patrick Lerda
The version 2.51.0 adds OpenGL 4.6 compatibility to evergreen and cayman. Signed-off-by: Patrick Lerda <patrick9876@free.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Remove useless timeout error messageYiPeng Chai
The timeout is only used to interrupt polling and not need to print a error message. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Fix code style issueCe Sun
cocci warnings: (new ones prefixed by >>) >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6088:8-9: Unneeded variable: "r". Return "0" on line 6141 Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506281925.HHIpXiO7-lkp@intel.com/ Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: refine ras error injection when eeprom initialization failedganglxie
when eeprom initialization failed, we still support ras error injection, and reserve bad pages, but do not save bad pages to eeprom Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Fix error with dev_info_once usageLijo Lazar
Fixes error with dev_info_once usage in amdgpu_device_asic_has_dc_support. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506281140.mXfWT3EN-lkp@intel.com/ Fixes: a3e510fd69c3 ("drm/amdgpu: Convert from DRM_* to dev_*") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Use correct severity for BP threshold exceed eventXiang Liu
The severity of CPER for BP threshold exceed event should be set as CPER_SEV_FATAL to match the OOB implementation. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Change kv-dpm DRM_*() macros to drm_*()Mario Limonciello
drm_*() macros can show the device a message came from. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Cc: Alexandre Demers <alexandre.f.demers@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250627143432.3222843-3-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Change legacy-dpm DRM_*() macros to drm_*()Mario Limonciello
drm_*() macros can show the device a message came from. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Cc: Alexandre Demers <alexandre.f.demers@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250627143432.3222843-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Decrease message level for legacy-pm, kv-dpm and si-dpmMario Limonciello
legacy-pm, kv-dpm and si-dpm have prints while changing power states that don't have a level and thus are printed by default. These are not useful at runtime for most people, so decrease them to debug. Reported-by: Alexandre Demers <alexandre.f.demers@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4322 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250627143432.3222843-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Promote DAL to 3.2.340Taimur Hassan
Summary: * Remove unused tunnel BW validation * Refactor DML21 initialization and configuration * Fix link override sequencing when switching between DIO/HPO * Ensure OLED minimum luminance Acked-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: [FW Promotion] Release 0.1.17.0Taimur Hassan
Summary for changes in firmware: * Add AMD brightness adjustment feature for edp * Fix BL enable * Revise low power init sequence * Fix brightness delta after IPS1 entry * Adjusted DP blanking sequence Acked-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Add DPP & HUBP reset if power gate enabled on DCN314Ivan Lipski
[WHY] On DCN314, using full screen application with enabled scaling like 150%, 175%, with overlay cursor, causes a second cursor to appear when changing planes. Dpp cache is used to track the HW cursor enable. Since power gate is disabled for hubp & dpp in DCN314, dpp_reset() zero'ed the dpp struct, while the dpp hardware was not power gated. So, when plane is changed in a full screen app, and the overlay cursor is enabled, the cache is cleared, so the cache does not represent the actual cursor state. [HOW] Added conditionals for dpp & hubp reset and their pg_control functions only if according power_gate flags are enabled. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Ivan Lipski <ivlipski@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Fix Link Override Sequencing When Switching Between DIO/HPOMichael Strauss
[WHY] When performing certain link maintenance compliance tests or forcing link settings, changing between 128b/132b and 8b/10b rates no longer works on some ASICs. Some rate divider updates only occur when we set timings or validate state, which is not performed currently when toggling DPMS to change rates. [HOW] Re-calculate dividers and reprogram audio when switching between DIO and HPO through DP compliance/escape code path. Add OTG disable/re-enable so we don't touch the clock while OTG is active. Acquire dcLock before forcing link settings to avoid thread synchronization errors due to added programming in escape code path and potential HPD interrupts. Reviewed-by: George Shen <george.shen@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Mike Katsnelson <mike.katsnelson@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Don't allow OLED to go down to fully offMario Limonciello
[Why] OLED panels can be fully off, but this behavior is unexpected. [How] Ensure that minimum luminance is at least 1. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4338 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Added case for when RR equals panel's max RR using freesyncHarold Sun
[WHY] Rounding error sometimes occurs when the refresh rate is equal to a panel's max refresh rate, causing HDMI compliance failures. [HOW] Added a case so that we round up to avoid v_total_min to be below a panel's minimum bound. Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Harold Sun <Harold.Sun@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Separate set_gsl from set_gsl_source_selectIlya Bakoulin
[Why/How] Separate the checks for set_gsl and set_gsl_source_select, since source_select may not be implemented/necessary. Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Refactor DML21 Initialization and ConfigurationWenjing Liu
[Why & How] - Consolidated the initialization of DML21 parameters into a single function `dml21_populate_dml_init_params` to streamline the process and improve code readability. - Updated the function signatures in the header files to reflect changes in parameter passing for DML context. - Removed redundant debug option handling and integrated it into the new configuration population function. - Adjusted the DML21 initialization logic in the wrapper to accommodate the new structure, ensuring compatibility with different DCN versions. - Enhanced the handling of clock parameters and bounding box configurations from various sources, including hardware defaults and software policies. - Improved the clarity of the code by renaming functions and variables for better understanding of their purposes. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: prepare for new platformKarthi Kandasamy
[Why & How] Expose some function for new platform use Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com> Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Remove unused tunnel BW validationCruise Hung
[Why & How] The tunnel BW validation code has changed to the new one. Remove the unused code. The DP tunneling overhead is not updated in SST. Move updating DP tunneling overhead for both SST and MST. Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: add null checkPeichen Huang
[WHY] Prevents null pointer dereferences to enhance function robustness [HOW] Adds early null check and return false if invalid. Reviewed-by: Cruise Hung <cruise.hung@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: move scheduler wqueue handling into callbacksAlex Deucher
Move the scheduler wqueue stopping and starting into the ring reset callbacks. On some IPs we have to reset an engine which may have multiple queues. Move the wqueue handling into the backend so we can handle them as needed based on the type of reset available. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: move guilty handling into ring resetsAlex Deucher
Move guilty logic into the ring reset callbacks. This allows each ring reset callback to better handle fence errors and force completions in line with the reset behavior for each IP. It also allows us to remove the ring guilty callback since that logic now lives in the reset callback. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: move force completion into ring resetsAlex Deucher
Move the force completion handling into each ring reset function so that each engine can determine whether or not it needs to force completion on the jobs in the ring. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/panthor: Wait for _READY register when powering onSteven Price
panthor_gpu_block_power_on() takes a register offset (rdy_reg) for the purpose of waiting for the power transition to complete. However, a copy/paste error converting to use the new 64 register functions switched it to using the pwrtrans_reg register instead. Fix the function to use the correct register. Fixes: 4d230aa209ed ("drm/panthor: Add 64-bit and poll register accessors") Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/r/20250630140704.432409-1-steven.price@arm.com
2025-06-30drm/amdgpu: rework queue reset scheduler interactionChristian König
Stopping the scheduler for queue reset is generally a good idea because it prevents any worker from touching the ring buffer. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: update ring reset function signatureAlex Deucher
Going forward, we'll need more than just the vmid. Add the guilty amdgpu_fence. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: remove job parameter from amdgpu_fence_emit()Alex Deucher
What we actually care about is the amdgpu_fence object so pass that in explicitly to avoid possible mistakes in the future. The job_run_counter handling can be safely removed at this point as we no longer support job resubmission. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdkfd: add hqd_sdma_get_doorbell callbacks for gfx7/8Alex Deucher
These were missed when support was added for other generations. The callbacks are called unconditionally so we need to make sure all generations have them. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4304 Link: https://github.com/ROCm/ROCm/issues/4965 Fixes: bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+") Cc: Jonathan Kim <jonathan.kim@amd.com> Reported-by: Johl Brown <johlbrown@gmail.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Fix memory leak in amdgpu_ctx_mgr_entity_finiLin.Cao
patch dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") removed ctx put, which will cause amdgpu_ctx_fini() cannot be called and then cause some finished fence that added by amdgpu_ctx_add_fence() cannot be released and cause memleak. Fixes: dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: indent an if statementDan Carpenter
The "return true;" line wasn't indented. Also checkpatch likes when we align the && conditions. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Use dma_buf from GEM object instanceThomas Zimmermann
Avoid dereferencing struct drm_gem_object.import_attach for the imported dma-buf. The dma_buf field in the GEM object instance refers to the same buffer. Prepares to make import_attach an implementation detail of PRIME. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Test for imported buffers with drm_gem_is_imported()Thomas Zimmermann
Instead of testing import_attach for imported GEM buffers, invoke drm_gem_is_imported() to do the test. v2: - keep amdgpu_bo_print_info() as-is (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Convert from DRM_* to dev_*Lijo Lazar
Convert from generic DRM_* to dev_* calls to have device context info. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/pm: Fetch SMUv13.0.12 xgmi max speed/widthLijo Lazar
On SMU v13.0.12 SOCs, fetch the max values of xgmi speed/width from firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdkfd: Don't call mmput from MMU notifier callbackPhilip Yang
If the process is exiting, the mmput inside mmu notifier callback from compactd or fork or numa balancing could release the last reference of mm struct to call exit_mmap and free_pgtable, this triggers deadlock with below backtrace. The deadlock will leak kfd process as mmu notifier release is not called and cause VRAM leaking. The fix is to take mm reference mmget_non_zero when adding prange to the deferred list to pair with mmput in deferred list work. If prange split and add into pchild list, the pchild work_item.mm is not used, so remove the mm parameter from svm_range_unmap_split and svm_range_add_child. The backtrace of hung task: INFO: task python:348105 blocked for more than 64512 seconds. Call Trace: __schedule+0x1c3/0x550 schedule+0x46/0xb0 rwsem_down_write_slowpath+0x24b/0x4c0 unlink_anon_vmas+0xb1/0x1c0 free_pgtables+0xa9/0x130 exit_mmap+0xbc/0x1a0 mmput+0x5a/0x140 svm_range_cpu_invalidate_pagetables+0x2b/0x40 [amdgpu] mn_itree_invalidate+0x72/0xc0 __mmu_notifier_invalidate_range_start+0x48/0x60 try_to_unmap_one+0x10fa/0x1400 rmap_walk_anon+0x196/0x460 try_to_unmap+0xbb/0x210 migrate_page_unmap+0x54d/0x7e0 migrate_pages_batch+0x1c3/0xae0 migrate_pages_sync+0x98/0x240 migrate_pages+0x25c/0x520 compact_zone+0x29d/0x590 compact_zone_order+0xb6/0xf0 try_to_compact_pages+0xbe/0x220 __alloc_pages_direct_compact+0x96/0x1a0 __alloc_pages_slowpath+0x410/0x930 __alloc_pages_nodemask+0x3a9/0x3e0 do_huge_pmd_anonymous_page+0xd7/0x3e0 __handle_mm_fault+0x5e3/0x5f0 handle_mm_fault+0xf7/0x2e0 hmm_vma_fault.isra.0+0x4d/0xa0 walk_pmd_range.isra.0+0xa8/0x310 walk_pud_range+0x167/0x240 walk_pgd_range+0x55/0x100 __walk_page_range+0x87/0x90 walk_page_range+0xf6/0x160 hmm_range_fault+0x4f/0x90 amdgpu_hmm_range_get_pages+0x123/0x230 [amdgpu] amdgpu_ttm_tt_get_user_pages+0xb1/0x150 [amdgpu] init_user_pages+0xb1/0x2a0 [amdgpu] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x543/0x7d0 [amdgpu] kfd_ioctl_alloc_memory_of_gpu+0x24c/0x4e0 [amdgpu] kfd_ioctl+0x29d/0x500 [amdgpu] Fixes: fa582c6f3684 ("drm/amdkfd: Use mmget_not_zero in MMU notifier") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Include sdma_4_4_4.binKent Russell
This got missed during SDMA 4.4.4 support. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Include <linux/export.h> when neededAndré Almeida
Fix the following compile time warning when building with W=1: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>