summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-04Merge tag 'amd-drm-next-6.17-2025-07-01' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.17-2025-07-01: amdgpu: - FAMS2 fixes - OLED fixes - Misc cleanups - AUX fixes - DMCUB updates - SR-IOV hibernation support - RAS updates - DP tunneling fixes - DML2 fixes - Backlight improvements - Suspend improvements - Use scaling for non-native modes on eDP - SDMA 4.4.x fixes - PCIe DPM fixes - SDMA 5.x fixes - Cleaner shader updates for GC 9.x - Remove fence slab - ISP genpd support - Parition handling rework - SDMA FW checks for userq support - Add missing firmware declaration - Fix leak in amdgpu_ctx_mgr_entity_fini() - Freesync fix - Ring reset refactoring - Legacy dpm verbosity changes amdkfd: - GWS fix - mtype fix for ext coherent system memory - MMU notifier fix - gfx7/8 fix radeon: - CS validation support for additional GL extensions - Bump driver version for new CS validation checks From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250701194707.32905-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-07-03drm: Simplify drmm_alloc_ordered_workqueue returnMatthew Brost
Rather than returning ERR_PTR or NULL on failure, replace the NULL return with ERR_PTR(-ENOMEM). This simplifies error handling at the caller. While here, add kernel documentation for drmm_alloc_ordered_workqueue. Cc: Louis Chauvet <louis.chauvet@bootlin.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com> Link: https://lore.kernel.org/r/20250702232831.3271328-2-matthew.brost@intel.com
2025-07-02drm/ttm: Remove unneeded blank line in commentJocelyn Falempe
There is an unneeded blank line in the documentation of the function ttm_bo_kmap_try_from_panic(). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506290453.NeTXAb7S-lkp@intel.com/ Fixes: 718370ff28328 ("drm/ttm: Add ttm_bo_kmap_try_from_panic()") Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250630144154.44661-1-jfalempe@redhat.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-02drm/dp: Add documentation for luminance_setSuraj Kandpal
Documentation for luminance_set for struct drm_edp_backlight_info was missed which causes warnings. Fixes: 2af612ad4290 ("drm/dp: Introduce new member in drm_backlight_info") Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@gmail.com> Link: https://lore.kernel.org/r/20250701085054.746408-1-suraj.kandpal@intel.com
2025-07-02drm/i915/power: use intel_de_wait_for_clear() instead of wait_for()Jani Nikula
Prefer the register read specific wait function over i915 wait_for_us(). The existing condition is quite complicated. Simplify by checking for requesters first, and determine timeout based on that. Refresh requesters in case of timeouts, should one have popped up during the wait. The downside is that this does not cut the wait short if requesters show up *during* the wait, but we're talking about 1 ms so shouldn't be an issue. v2: Refresh requesters only if there were none before (Imre) Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://lore.kernel.org/r/20250626192632.2330349-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-07-01drm/i915/display: drop a number of dependencies on i915_drv.hJani Nikula
With the switch to an unordered workqueue dedicated to display, we've stopped using struct drm_i915_private in a number of places, and can drop the dependencies on i915_drv.h. Cc: Luca Coelho <luciano.coelho@intel.com> Reviewed-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20250626101636.1896365-1-jani.nikula@intel.com
2025-07-01drm/i915/fb: use struct intel_display for DISPLAY_VER()Jani Nikula
Convert a leftover struct drm_i915_private use to struct intel_display. Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/20250626101712.1898434-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-07-01drm/mipi-dsi: Drop MIPI_DSI_MODE_VSYNC_FLUSH flagPhilipp Zabel
Drop the unused MIPI_DSI_MODE_VSYNC_FLUSH flag. Whether or not a display FIFO flush on vsync is required to avoid sending garbage to the panel is not a property of the DSI link, but of the integration between display controller and DSI host bridge. Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250627-dsi-vsync-flush-v2-4-4066899a5608@pengutronix.de Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-07-01drm/panel: samsung-s6e8aa0: Drop MIPI_DSI_MODE_VSYNC_FLUSH flagPhilipp Zabel
Drop the MIPI_DSI_MODE_VSYNC_FLUSH flag from DSI mode_flags. It has no effect anymore. Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250627-dsi-vsync-flush-v2-3-4066899a5608@pengutronix.de Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-07-01drm/panel: samsung-s6d7aa0: Drop MIPI_DSI_MODE_VSYNC_FLUSH flagPhilipp Zabel
Drop the MIPI_DSI_MODE_VSYNC_FLUSH flag from DSI mode_flags. It has no effect anymore. Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250627-dsi-vsync-flush-v2-2-4066899a5608@pengutronix.de Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-07-01drm/bridge: samsung-dsim: Always flush display FIFO on vsync pulsePhilipp Zabel
Always flush the display FIFO on vsync pulse, even if not explicitly requested by the panel via MIPI_DSI_MODE_VSYNC_FLUSH mode_flag. The display FIFO should be empty at vsync. Flushing it at vsync pulses helps to remove garbage that may have entered the FIFO during startup (if synchronisation between upstream display controller and Samsung DSIM is lacking) and that may persist in form of last frame's leftovers on subsequent frames. Flushing the display FIFO if it is already empty should have no effect. This will allow to remove the MIPI_DSI_MODE_VSYNC_FLUSH flag, which is only used by the Samsung DSIM bridge driver. Arguably this flag doesn't belong in the panel configuration at all: flushing the display FIFO on vsync is a workaround for issues with the integration between display controller and DSI bridge, not a property of the DSI link between bridge and panel. No panel actually has a requirement to receive garbage or old frame content after vsync. I wonder if host controller FIFO resets are mentioned by the MIPI DSI specification at all. This patch is based on the assumption that the MIPI_DSI_MODE_VSYNC_FLUSH flag only exists because the DSIM_MFLUSH_VS bit happens to be located in the same register as the bits controlling the DSI mode. Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250627-dsi-vsync-flush-v2-1-4066899a5608@pengutronix.de Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-06-30drm/i915/gsc: mei interrupt top half should be in irq disabled contextJunxiao Chang
MEI GSC interrupt comes from i915. It has top half and bottom half. Top half is called from i915 interrupt handler. It should be in irq disabled context. With RT kernel, by default i915 IRQ handler is in threaded IRQ. MEI GSC top half might be in threaded IRQ context. generic_handle_irq_safe API could be called from either IRQ or process context, it disables local IRQ then calls MEI GSC interrupt top half. This change fixes A380/A770 GPU boot hang issue with RT kernel. Fixes: 1e3dc1d8622b ("drm/i915/gsc: add gsc as a mei auxiliary device") Tested-by: Furong Zhou <furong.zhou@intel.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com> Link: https://lore.kernel.org/r/20250425151108.643649-1-junxiao.chang@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-30drm/amdgpu/sdma6: add more ucode version checks for userq supportAlex Deucher
Fill in the SDMA ucode version checks for more SDMA 6.x parts. v2: squash in fixes (Alex) Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/radeon: bump version to 2.51.0Patrick Lerda
The version 2.51.0 adds OpenGL 4.6 compatibility to evergreen and cayman. Signed-off-by: Patrick Lerda <patrick9876@free.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Remove useless timeout error messageYiPeng Chai
The timeout is only used to interrupt polling and not need to print a error message. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Fix code style issueCe Sun
cocci warnings: (new ones prefixed by >>) >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6088:8-9: Unneeded variable: "r". Return "0" on line 6141 Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506281925.HHIpXiO7-lkp@intel.com/ Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: refine ras error injection when eeprom initialization failedganglxie
when eeprom initialization failed, we still support ras error injection, and reserve bad pages, but do not save bad pages to eeprom Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Fix error with dev_info_once usageLijo Lazar
Fixes error with dev_info_once usage in amdgpu_device_asic_has_dc_support. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506281140.mXfWT3EN-lkp@intel.com/ Fixes: a3e510fd69c3 ("drm/amdgpu: Convert from DRM_* to dev_*") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Use correct severity for BP threshold exceed eventXiang Liu
The severity of CPER for BP threshold exceed event should be set as CPER_SEV_FATAL to match the OOB implementation. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Change kv-dpm DRM_*() macros to drm_*()Mario Limonciello
drm_*() macros can show the device a message came from. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Cc: Alexandre Demers <alexandre.f.demers@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250627143432.3222843-3-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Change legacy-dpm DRM_*() macros to drm_*()Mario Limonciello
drm_*() macros can show the device a message came from. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Cc: Alexandre Demers <alexandre.f.demers@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250627143432.3222843-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Decrease message level for legacy-pm, kv-dpm and si-dpmMario Limonciello
legacy-pm, kv-dpm and si-dpm have prints while changing power states that don't have a level and thus are printed by default. These are not useful at runtime for most people, so decrease them to debug. Reported-by: Alexandre Demers <alexandre.f.demers@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4322 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250627143432.3222843-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Promote DAL to 3.2.340Taimur Hassan
Summary: * Remove unused tunnel BW validation * Refactor DML21 initialization and configuration * Fix link override sequencing when switching between DIO/HPO * Ensure OLED minimum luminance Acked-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: [FW Promotion] Release 0.1.17.0Taimur Hassan
Summary for changes in firmware: * Add AMD brightness adjustment feature for edp * Fix BL enable * Revise low power init sequence * Fix brightness delta after IPS1 entry * Adjusted DP blanking sequence Acked-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Add DPP & HUBP reset if power gate enabled on DCN314Ivan Lipski
[WHY] On DCN314, using full screen application with enabled scaling like 150%, 175%, with overlay cursor, causes a second cursor to appear when changing planes. Dpp cache is used to track the HW cursor enable. Since power gate is disabled for hubp & dpp in DCN314, dpp_reset() zero'ed the dpp struct, while the dpp hardware was not power gated. So, when plane is changed in a full screen app, and the overlay cursor is enabled, the cache is cleared, so the cache does not represent the actual cursor state. [HOW] Added conditionals for dpp & hubp reset and their pg_control functions only if according power_gate flags are enabled. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Ivan Lipski <ivlipski@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Fix Link Override Sequencing When Switching Between DIO/HPOMichael Strauss
[WHY] When performing certain link maintenance compliance tests or forcing link settings, changing between 128b/132b and 8b/10b rates no longer works on some ASICs. Some rate divider updates only occur when we set timings or validate state, which is not performed currently when toggling DPMS to change rates. [HOW] Re-calculate dividers and reprogram audio when switching between DIO and HPO through DP compliance/escape code path. Add OTG disable/re-enable so we don't touch the clock while OTG is active. Acquire dcLock before forcing link settings to avoid thread synchronization errors due to added programming in escape code path and potential HPD interrupts. Reviewed-by: George Shen <george.shen@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Mike Katsnelson <mike.katsnelson@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Don't allow OLED to go down to fully offMario Limonciello
[Why] OLED panels can be fully off, but this behavior is unexpected. [How] Ensure that minimum luminance is at least 1. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4338 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Added case for when RR equals panel's max RR using freesyncHarold Sun
[WHY] Rounding error sometimes occurs when the refresh rate is equal to a panel's max refresh rate, causing HDMI compliance failures. [HOW] Added a case so that we round up to avoid v_total_min to be below a panel's minimum bound. Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Harold Sun <Harold.Sun@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Separate set_gsl from set_gsl_source_selectIlya Bakoulin
[Why/How] Separate the checks for set_gsl and set_gsl_source_select, since source_select may not be implemented/necessary. Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Refactor DML21 Initialization and ConfigurationWenjing Liu
[Why & How] - Consolidated the initialization of DML21 parameters into a single function `dml21_populate_dml_init_params` to streamline the process and improve code readability. - Updated the function signatures in the header files to reflect changes in parameter passing for DML context. - Removed redundant debug option handling and integrated it into the new configuration population function. - Adjusted the DML21 initialization logic in the wrapper to accommodate the new structure, ensuring compatibility with different DCN versions. - Enhanced the handling of clock parameters and bounding box configurations from various sources, including hardware defaults and software policies. - Improved the clarity of the code by renaming functions and variables for better understanding of their purposes. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: prepare for new platformKarthi Kandasamy
[Why & How] Expose some function for new platform use Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com> Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: Remove unused tunnel BW validationCruise Hung
[Why & How] The tunnel BW validation code has changed to the new one. Remove the unused code. The DP tunneling overhead is not updated in SST. Move updating DP tunneling overhead for both SST and MST. Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Cruise Hung <Cruise.Hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/display: add null checkPeichen Huang
[WHY] Prevents null pointer dereferences to enhance function robustness [HOW] Adds early null check and return false if invalid. Reviewed-by: Cruise Hung <cruise.hung@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: move scheduler wqueue handling into callbacksAlex Deucher
Move the scheduler wqueue stopping and starting into the ring reset callbacks. On some IPs we have to reset an engine which may have multiple queues. Move the wqueue handling into the backend so we can handle them as needed based on the type of reset available. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: move guilty handling into ring resetsAlex Deucher
Move guilty logic into the ring reset callbacks. This allows each ring reset callback to better handle fence errors and force completions in line with the reset behavior for each IP. It also allows us to remove the ring guilty callback since that logic now lives in the reset callback. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: move force completion into ring resetsAlex Deucher
Move the force completion handling into each ring reset function so that each engine can determine whether or not it needs to force completion on the jobs in the ring. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/panthor: Wait for _READY register when powering onSteven Price
panthor_gpu_block_power_on() takes a register offset (rdy_reg) for the purpose of waiting for the power transition to complete. However, a copy/paste error converting to use the new 64 register functions switched it to using the pwrtrans_reg register instead. Fix the function to use the correct register. Fixes: 4d230aa209ed ("drm/panthor: Add 64-bit and poll register accessors") Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/r/20250630140704.432409-1-steven.price@arm.com
2025-06-30drm/amdgpu: rework queue reset scheduler interactionChristian König
Stopping the scheduler for queue reset is generally a good idea because it prevents any worker from touching the ring buffer. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: update ring reset function signatureAlex Deucher
Going forward, we'll need more than just the vmid. Add the guilty amdgpu_fence. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: remove job parameter from amdgpu_fence_emit()Alex Deucher
What we actually care about is the amdgpu_fence object so pass that in explicitly to avoid possible mistakes in the future. The job_run_counter handling can be safely removed at this point as we no longer support job resubmission. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdkfd: add hqd_sdma_get_doorbell callbacks for gfx7/8Alex Deucher
These were missed when support was added for other generations. The callbacks are called unconditionally so we need to make sure all generations have them. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4304 Link: https://github.com/ROCm/ROCm/issues/4965 Fixes: bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+") Cc: Jonathan Kim <jonathan.kim@amd.com> Reported-by: Johl Brown <johlbrown@gmail.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Fix memory leak in amdgpu_ctx_mgr_entity_finiLin.Cao
patch dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") removed ctx put, which will cause amdgpu_ctx_fini() cannot be called and then cause some finished fence that added by amdgpu_ctx_add_fence() cannot be released and cause memleak. Fixes: dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: indent an if statementDan Carpenter
The "return true;" line wasn't indented. Also checkpatch likes when we align the && conditions. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Use dma_buf from GEM object instanceThomas Zimmermann
Avoid dereferencing struct drm_gem_object.import_attach for the imported dma-buf. The dma_buf field in the GEM object instance refers to the same buffer. Prepares to make import_attach an implementation detail of PRIME. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Test for imported buffers with drm_gem_is_imported()Thomas Zimmermann
Instead of testing import_attach for imported GEM buffers, invoke drm_gem_is_imported() to do the test. v2: - keep amdgpu_bo_print_info() as-is (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Convert from DRM_* to dev_*Lijo Lazar
Convert from generic DRM_* to dev_* calls to have device context info. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/pm: Fetch SMUv13.0.12 xgmi max speed/widthLijo Lazar
On SMU v13.0.12 SOCs, fetch the max values of xgmi speed/width from firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdkfd: Don't call mmput from MMU notifier callbackPhilip Yang
If the process is exiting, the mmput inside mmu notifier callback from compactd or fork or numa balancing could release the last reference of mm struct to call exit_mmap and free_pgtable, this triggers deadlock with below backtrace. The deadlock will leak kfd process as mmu notifier release is not called and cause VRAM leaking. The fix is to take mm reference mmget_non_zero when adding prange to the deferred list to pair with mmput in deferred list work. If prange split and add into pchild list, the pchild work_item.mm is not used, so remove the mm parameter from svm_range_unmap_split and svm_range_add_child. The backtrace of hung task: INFO: task python:348105 blocked for more than 64512 seconds. Call Trace: __schedule+0x1c3/0x550 schedule+0x46/0xb0 rwsem_down_write_slowpath+0x24b/0x4c0 unlink_anon_vmas+0xb1/0x1c0 free_pgtables+0xa9/0x130 exit_mmap+0xbc/0x1a0 mmput+0x5a/0x140 svm_range_cpu_invalidate_pagetables+0x2b/0x40 [amdgpu] mn_itree_invalidate+0x72/0xc0 __mmu_notifier_invalidate_range_start+0x48/0x60 try_to_unmap_one+0x10fa/0x1400 rmap_walk_anon+0x196/0x460 try_to_unmap+0xbb/0x210 migrate_page_unmap+0x54d/0x7e0 migrate_pages_batch+0x1c3/0xae0 migrate_pages_sync+0x98/0x240 migrate_pages+0x25c/0x520 compact_zone+0x29d/0x590 compact_zone_order+0xb6/0xf0 try_to_compact_pages+0xbe/0x220 __alloc_pages_direct_compact+0x96/0x1a0 __alloc_pages_slowpath+0x410/0x930 __alloc_pages_nodemask+0x3a9/0x3e0 do_huge_pmd_anonymous_page+0xd7/0x3e0 __handle_mm_fault+0x5e3/0x5f0 handle_mm_fault+0xf7/0x2e0 hmm_vma_fault.isra.0+0x4d/0xa0 walk_pmd_range.isra.0+0xa8/0x310 walk_pud_range+0x167/0x240 walk_pgd_range+0x55/0x100 __walk_page_range+0x87/0x90 walk_page_range+0xf6/0x160 hmm_range_fault+0x4f/0x90 amdgpu_hmm_range_get_pages+0x123/0x230 [amdgpu] amdgpu_ttm_tt_get_user_pages+0xb1/0x150 [amdgpu] init_user_pages+0xb1/0x2a0 [amdgpu] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x543/0x7d0 [amdgpu] kfd_ioctl_alloc_memory_of_gpu+0x24c/0x4e0 [amdgpu] kfd_ioctl+0x29d/0x500 [amdgpu] Fixes: fa582c6f3684 ("drm/amdkfd: Use mmget_not_zero in MMU notifier") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Include sdma_4_4_4.binKent Russell
This got missed during SDMA 4.4.4 support. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Include <linux/export.h> when neededAndré Almeida
Fix the following compile time warning when building with W=1: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>