summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2025-02-28drm/xe/xelp: Move Wa_16011163337 from tunings to workaroundsTvrtko Ursulin
Workaround database specifies 16011163337 as a workaround so lets move it there. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227101304.46660-3-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-28drm/xe: Fix GT "for each engine" workaroundsTvrtko Ursulin
Any rules using engine matching are currently broken due RTP processing happening too in early init, before the list of hardware engines has been initialised. Fix this by moving workaround processing to later in the driver probe sequence, to just before the processed list is used for the first time. Looking at the debugfs gt0/workarounds on ADL-P we notice 14011060649 should be present while we see, before: GT Workarounds 14011059788 14015795083 And with the patch: GT Workarounds 14011060649 14011059788 14015795083 Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: stable@vger.kernel.org # v6.11+ Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227101304.46660-2-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-28gpu: host1x: Remove unused host1x_debug_dump_syncptsDr. David Alan Gilbert
host1x_debug_dump_syncpts() has been unused since commit f0fb260a0cdb ("gpu: host1x: Implement syncpoint wait using DMA fences") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241215214750.448209-1-linux@treblig.org
2025-02-28drm/i915/display: Make POWER_DOMAIN_*() always result in enum ↵Gustavo Sousa
intel_display_power_domain In the hope of contributing to type safety in our code, let's ensure that the type returned by the POWER_DOMAIN_*() macros is always of type enum intel_display_power_domain. v2: - Remove accidental +1 in definition of POWER_DOMAIN_PIPE(). (Jani) Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227-improve-type-safey-power-domain-macros-v3-2-b6eaa00f9c33@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-02-28drm/i915/display: Use explicit base values in POWER_DOMAIN_*() macrosGustavo Sousa
Although we have comments in intel_display_limits.h saying that the code expects PIPE_A and TRANSCODER_A to be zero, it doesn't hurt to add them as explicit base values for calculating the power domain offset in POWER_DOMAIN_*() macros. On the plus side, we have that this: * Fixes a warning reported by kernel test robot <lkp@intel.com> about doing arithmetic with two different enum types. * Makes the code arguably more robust (in the unlikely event of those bases becoming non-zero). v2: - Prefer using explicit base values instead of simply casting the macro argument to int. (Ville) - Update commit message to match the new approach (for reference, the old message subject was "drm/i915/display: Use explicit cast in POWER_DOMAIN_*() macros"). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502120809.XfmcqkBD-lkp@intel.com/ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227-improve-type-safey-power-domain-macros-v3-1-b6eaa00f9c33@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-02-28Merge drm/drm-next into drm-xe-nextLucas De Marchi
Sync to fix conlicts between drm-xe-next and drm-intel-next. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-28drm/i915/audio: Extend Wa_14020863754 to Xe3_LPDGustavo Sousa
Workaround Wa_14020863754 also applies to Xe3_LPD. Update needs_wa_14020863754() accordingly. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227-xe3lpd-wa-14020863754-v2-2-92b35de1c563@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-02-28drm/i915/display: Use IP version check for Wa_14020863754Gustavo Sousa
Wa_14020863754 applies to the display IP, so we should be checking on display IP version instead of platform. So, let's replace display->platform.battlemage with the proper IP version check (14.01 for Xe2_HPD). Furthermore, for workarounds, we should be checking on full IP versions to avoid applying the workaround to some variant of the IP that could theoretically appear in the future (which is likely to have a different minor release number), since the issue addressed by the workaround could be fixed in such new release. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227-xe3lpd-wa-14020863754-v2-1-92b35de1c563@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-02-28Merge drm/drm-next into drm-intel-nextJani Nikula
Sync to fix conlicts between drm-xe-next and drm-intel-next. Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-02-28drm/xe/vf: Retry sending MMIO request to GUC on timeout errorSatyanarayana K V P
Add support to allow retrying the sending of MMIO requests from the VF to the GUC in the event of an error. During the suspend/resume process, VFs begin resuming only after the PF has resumed. Although the PF resumes, the GUC reset and provisioning occur later in a separate worker process. When there are a large number of VFs, some may attempt to resume before the PF has completed its provisioning. Therefore, if a MMIO request from a VF fails during this period, we will retry sending the request up to GUC_RESET_VF_STATE_RETRY_MAX times, which is set to a maximum of 10 attempts. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michał Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piorkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224102807.11065-3-satyanarayana.k.v.p@intel.com
2025-02-28drm/xe/pf: Create a link between PF and VF devicesSatyanarayana K V P
When both PF and VF devices are enabled on the host, they resume simultaneously during system resume. However, the PF must finish provisioning the VF before any VFs can successfully resume. Establish a parent-child device link between the PF and VF devices to ensure the correct order of resumption. V4 -> V5: - Added missing break in the error condition. V3 -> V4: - Made xe_pci_pf_get_vf_dev() as a static function and updated input parameter types. - Updated xe_sriov_warn() to xe_sriov_abort() when VF device cannot be found. V2 -> V3: - Added function documentation for xe_pci_pf_get_vf_dev(). - Added assertion if not called from PF. V1 -> V2: - Added a helper function to get VF pci_dev. - Updated xe_sriov_notice() to xe_sriov_warn() if vf pci_dev is not found. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michał Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piorkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224102807.11065-2-satyanarayana.k.v.p@intel.com
2025-02-28drm/vboxvideo: Remove unused hgsmi_cursor_positionDr. David Alan Gilbert
hgsmi_cursor_position() has been unused since 2018's commit 35f3288c453e ("staging: vboxvideo: Atomic phase 1: convert cursor to universal plane") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20241215220014.452537-1-linux@treblig.org
2025-02-28drm/xe/xe3lpg: Add Wa_13012615864Tejas Upadhyay
Wa_13012615864 applies to xe3lpg Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221112200.388612-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2025-02-28Merge tag 'drm-misc-next-2025-02-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.15: Cross-subsystem Changes: bus: - mhi: Avoid access to uninitialized field Core Changes: - Fix docmentation dp: - Add helpers for LTTPR transparent mode sched: - Improve job peek/pop operations - Optimize layout of struct drm_sched_job Driver Changes: arc: - Convert to devm_platform_ioremap_resource() aspeed: - Convert to devm_platform_ioremap_resource() bridge: - ti-sn65dsi86: Support CONFIG_PWM tristate i915: - dp: Use helpers for LTTPR transparent mode mediatek: - Convert to devm_platform_ioremap_resource() msm: - dp: Use helpers for LTTPR transparent mode nouveau: - dp: Use helpers for LTTPR transparent mode panel: - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 repaper: - Fix integer overflows stm: - Convert to devm_platform_ioremap_resource() vc4: - Convert to devm_platform_ioremap_resource() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250227094041.GA114623@linux.fritz.box
2025-02-28Merge tag 'drm-xe-fixes-2025-02-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes uAPI: - OA uapi fix (Umesh) Driver: - Userptr related fixes (Auld) - Remove a duplicated register entry (Mingong) - Scheduler related fix to prevent exec races when freeing it (Tejas) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Z8CSqJre1VCjPXt2@intel.com
2025-02-28Merge tag 'drm-intel-fixes-2025-02-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix encoder HW state readout for DP UHBR MST (Imre) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Z8CRM7XzlerbWSJy@intel.com
2025-02-28Merge tag 'drm-misc-fixes-2025-02-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Fix a rounding error in vkms, a header fix for img, a connector status fix for nouveau, and a NULL pointer dereference fix for deferred IO drivers. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250227-antique-robust-earthworm-09dfd1@houat
2025-02-27drm/amdgpu: Fix parameter annotation in vcn_v5_0_0_is_idleSrinivasan Shanmugam
Update parameter description in the vcn_v5_0_0_is_idle function Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: debugfs hang_hws skip GPU with MESPhilip Yang
debugfs hang_hws is used by GPU reset test with HWS, for MES this crash the kernel with NULL pointer access because dqm->packet_mgr is not setup for MES path. Skip GPU with MES for now, MES hang_hws debugfs interface will be supported later. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: Fix pqm_destroy_queue race with GPU resetPhilip Yang
If GPU in reset, destroy_queue return -EIO, pqm_destroy_queue should delete the queue from process_queue_list and free the resource. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Fix parameter annotations for VCN clock gating functionsSrinivasan Shanmugam
The previous references to a non-existent `adev` parameter have been removed & corrected to reflect the use of the `vinst` pointer, which points to the VCN instance structure, in the below files: - vcn_v1_0.c - vcn_v2_0.c - vcn_v3_0.c Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Function parameter or struct member 'vinst' not described in 'vcn_v1_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Excess function parameter 'adev' description in 'vcn_v1_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Function parameter or struct member 'vinst' not described in 'vcn_v2_0_mc_resume' drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Excess function parameter 'adev' description in 'vcn_v2_0_mc_resume' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'adev' description in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'inst' description in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'adev' description in 'vcn_v3_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'inst' description in 'vcn_v3_0_enable_clock_gating' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: Fix mode1 reset crash issuePhilip Yang
If HW scheduler hangs and mode1 reset is used to recover GPU, KFD signal user space to abort the processes. After process abort exit, user queues still use the GPU to access system memory before h/w is reset while KFD cleanup worker free system memory and free VRAM. There is use-after-free race bug that KFD allocate and reuse the freed system memory, and user queue write to the same system memory to corrupt the data structure and cause driver crash. To fix this race, KFD cleanup worker terminate user queues, then flush reset_domain wq to wait for any GPU ongoing reset complete, and then free outstanding BOs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: KFD release_work possible circular lockingPhilip Yang
If waiting for gpu reset done in KFD release_work, thers is WARNING: possible circular locking dependency detected #2 kfd_create_process kfd_process_mutex flush kfd release work #1 kfd release work wait for amdgpu reset work #0 amdgpu_device_gpu_reset kgd2kfd_pre_reset kfd_process_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&p->release_work)); lock((wq_completion)kfd_process_wq); lock((work_completion)(&p->release_work)); lock((wq_completion)amdgpu-reset-dev); To fix this, KFD create process move flush release work outside kfd_process_mutex. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: Remove kfd_process_hw_exception workerPhilip Yang
With GPU reset-domain worker implemented, KFD hw_exception worker is not needed any more, just call amdgpu_amdkfd_gpu_reset directly from kfd_hws_hang. Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amd/amdgpu: Add support for xgmi_v6_4_1Asad Kamal
Add support for xgmi_v6_4_1 and use it appropriate places Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Add xgmi speed/width related infoLijo Lazar
Add APIs to initialize XGMI speed, width details and get to max bandwidth supported. It is assumed that a device only supports same generation of XGMI links with uniform width. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Move xgmi definitions to xgmi headerLijo Lazar
Move definitions related to xgmi to amdgpu_xgmi header Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amd/pm: add fan abnormal detectionKenneth Feng
add fan abnormal detection on smu v14.0.2&smu v14.0.3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: remove kfd_pasid.c from amdgpu driver buildXiaogang Chen
Since kfd uses pasid values from graphic driver now do not need use kfd pasid fucntions. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: clamp queue size to minimumDavid Yat Sin
If queue size is less than minimum, clamp it to minimum to prevent underflow when writing queue mqd. Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Create a debug option to disable ring resetAndré Almeida
Prior to the addition of ring reset, the debug option `debug_disable_soft_recovery` could be used to force a full device reset. Now that we have ring reset, create a debug option to disable them in amdgpu, forcing the driver to go with the full device reset path again when both options are combined. This option is useful for testing and debugging purposes when one wants to test the full reset from userspace. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amd/display: Fix null check for pipe_ctx->plane_state in ↵Ma Ke
resource_build_scaling_params Null pointer dereference issue could occur when pipe_ctx->plane_state is null. The fix adds a check to ensure 'pipe_ctx->plane_state' is not null before accessing. This prevents a null pointer dereference. Found by code review. Fixes: 3be5262e353b ("drm/amd/display: Rename more dc_surface stuff to plane_state") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu/mes11: drop amdgpu_mes_suspend()/amdgpu_mes_resume() callsAlex Deucher
They are noops on GFX11 for most firmware versions. KFD already handles its own queues and they should already be unmapped at this point so even if this runs, it's not doing anything. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammarColin Ian King
There is a spelling mistake and a grammatical error in a dev_err message. Fix it. Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Decode deferred error type in aca bank parserXiang Liu
In the case of poison inband log, the error type need to be specified by checking the deferred or poison bit of status register. v2: check both deferred and poison bit Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: add sdma page queue irq processing for sdma442Le Ma
Add the trap irq processing for page queue of sdma442 Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by and Tested-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amd/pm: disable gfxoff on the specific skuKenneth Feng
disable gfxoff on the specific sku based on the requirement Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Report generic instead of unknown boot time errorsXiang Liu
Change the DMESG reporting of unknown errors to "Boot Controller Generic Error" to align with the RAS SPEC and provide more clarity to customers. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Fix logic to fetch supported NPS modesLijo Lazar
Correct the logic to find supported NPS modes from firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Ava Zhang <niandong.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Fixes: 30eb41f5d1a7 ("drm/amdgpu: Use firmware supported NPS modes") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Disable fru_id field in CPER sectionXiang Liu
The fru_id field is disabled cause of mis-matching defination between CPER spec and driver. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdkfd: Fix Circular Locking Dependency in ↵Srinivasan Shanmugam
'svm_range_cpu_invalidate_pagetables' This commit addresses a circular locking dependency in the svm_range_cpu_invalidate_pagetables function. The function previously held a lock while determining whether to perform an unmap or eviction operation, which could lead to deadlocks. Fixes the below: [ 223.418794] ====================================================== [ 223.418820] WARNING: possible circular locking dependency detected [ 223.418845] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE [ 223.418869] ------------------------------------------------------ [ 223.418889] kfdtest/3939 is trying to acquire lock: [ 223.418906] ffff8957552eae38 (&dqm->lock_hidden){+.+.}-{3:3}, at: evict_process_queues_cpsch+0x43/0x210 [amdgpu] [ 223.419302] but task is already holding lock: [ 223.419303] ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu] [ 223.419447] Console: switching to colour dummy device 80x25 [ 223.419477] [IGT] amd_basic: executing [ 223.419599] which lock already depends on the new lock. [ 223.419611] the existing dependency chain (in reverse order) is: [ 223.419621] -> #2 (&prange->lock){+.+.}-{3:3}: [ 223.419636] __mutex_lock+0x85/0xe20 [ 223.419647] mutex_lock_nested+0x1b/0x30 [ 223.419656] svm_range_validate_and_map+0x2f1/0x15b0 [amdgpu] [ 223.419954] svm_range_set_attr+0xe8c/0x1710 [amdgpu] [ 223.420236] svm_ioctl+0x46/0x50 [amdgpu] [ 223.420503] kfd_ioctl_svm+0x50/0x90 [amdgpu] [ 223.420763] kfd_ioctl+0x409/0x6d0 [amdgpu] [ 223.421024] __x64_sys_ioctl+0x95/0xd0 [ 223.421036] x64_sys_call+0x1205/0x20d0 [ 223.421047] do_syscall_64+0x87/0x140 [ 223.421056] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 223.421068] -> #1 (reservation_ww_class_mutex){+.+.}-{3:3}: [ 223.421084] __ww_mutex_lock.constprop.0+0xab/0x1560 [ 223.421095] ww_mutex_lock+0x2b/0x90 [ 223.421103] amdgpu_amdkfd_alloc_gtt_mem+0xcc/0x2b0 [amdgpu] [ 223.421361] add_queue_mes+0x3bc/0x440 [amdgpu] [ 223.421623] unhalt_cpsch+0x1ae/0x240 [amdgpu] [ 223.421888] kgd2kfd_start_sched+0x5e/0xd0 [amdgpu] [ 223.422148] amdgpu_amdkfd_start_sched+0x3d/0x50 [amdgpu] [ 223.422414] amdgpu_gfx_enforce_isolation_handler+0x132/0x270 [amdgpu] [ 223.422662] process_one_work+0x21e/0x680 [ 223.422673] worker_thread+0x190/0x330 [ 223.422682] kthread+0xe7/0x120 [ 223.422690] ret_from_fork+0x3c/0x60 [ 223.422699] ret_from_fork_asm+0x1a/0x30 [ 223.422708] -> #0 (&dqm->lock_hidden){+.+.}-{3:3}: [ 223.422723] __lock_acquire+0x16f4/0x2810 [ 223.422734] lock_acquire+0xd1/0x300 [ 223.422742] __mutex_lock+0x85/0xe20 [ 223.422751] mutex_lock_nested+0x1b/0x30 [ 223.422760] evict_process_queues_cpsch+0x43/0x210 [amdgpu] [ 223.423025] kfd_process_evict_queues+0x8a/0x1d0 [amdgpu] [ 223.423285] kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu] [ 223.423540] svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu] [ 223.423807] __mmu_notifier_invalidate_range_start+0x1f5/0x250 [ 223.423819] copy_page_range+0x1e94/0x1ea0 [ 223.423829] copy_process+0x172f/0x2ad0 [ 223.423839] kernel_clone+0x9c/0x3f0 [ 223.423847] __do_sys_clone+0x66/0x90 [ 223.423856] __x64_sys_clone+0x25/0x30 [ 223.423864] x64_sys_call+0x1d7c/0x20d0 [ 223.423872] do_syscall_64+0x87/0x140 [ 223.423880] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 223.423891] other info that might help us debug this: [ 223.423903] Chain exists of: &dqm->lock_hidden --> reservation_ww_class_mutex --> &prange->lock [ 223.423926] Possible unsafe locking scenario: [ 223.423935] CPU0 CPU1 [ 223.423942] ---- ---- [ 223.423949] lock(&prange->lock); [ 223.423958] lock(reservation_ww_class_mutex); [ 223.423970] lock(&prange->lock); [ 223.423981] lock(&dqm->lock_hidden); [ 223.423990] *** DEADLOCK *** [ 223.423999] 5 locks held by kfdtest/3939: [ 223.424006] #0: ffffffffb82b4fc0 (dup_mmap_sem){.+.+}-{0:0}, at: copy_process+0x1387/0x2ad0 [ 223.424026] #1: ffff89575eda81b0 (&mm->mmap_lock){++++}-{3:3}, at: copy_process+0x13a8/0x2ad0 [ 223.424046] #2: ffff89575edaf3b0 (&mm->mmap_lock/1){+.+.}-{3:3}, at: copy_process+0x13e4/0x2ad0 [ 223.424066] #3: ffffffffb82e76e0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: copy_page_range+0x1cea/0x1ea0 [ 223.424088] #4: ffff8957556b83b0 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x9d/0x850 [amdgpu] [ 223.424365] stack backtrace: [ 223.424374] CPU: 0 UID: 0 PID: 3939 Comm: kfdtest Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14 [ 223.424392] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 223.424401] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO WIFI/X570 AORUS PRO WIFI, BIOS F36a 02/16/2022 [ 223.424416] Call Trace: [ 223.424423] <TASK> [ 223.424430] dump_stack_lvl+0x9b/0xf0 [ 223.424441] dump_stack+0x10/0x20 [ 223.424449] print_circular_bug+0x275/0x350 [ 223.424460] check_noncircular+0x157/0x170 [ 223.424469] ? __bfs+0xfd/0x2c0 [ 223.424481] __lock_acquire+0x16f4/0x2810 [ 223.424490] ? srso_return_thunk+0x5/0x5f [ 223.424505] lock_acquire+0xd1/0x300 [ 223.424514] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu] [ 223.424783] __mutex_lock+0x85/0xe20 [ 223.424792] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu] [ 223.425058] ? srso_return_thunk+0x5/0x5f [ 223.425067] ? mark_held_locks+0x54/0x90 [ 223.425076] ? evict_process_queues_cpsch+0x43/0x210 [amdgpu] [ 223.425339] ? srso_return_thunk+0x5/0x5f [ 223.425350] mutex_lock_nested+0x1b/0x30 [ 223.425358] ? mutex_lock_nested+0x1b/0x30 [ 223.425367] evict_process_queues_cpsch+0x43/0x210 [amdgpu] [ 223.425631] kfd_process_evict_queues+0x8a/0x1d0 [amdgpu] [ 223.425893] kgd2kfd_quiesce_mm+0x43/0x90 [amdgpu] [ 223.426156] svm_range_cpu_invalidate_pagetables+0x4a7/0x850 [amdgpu] [ 223.426423] ? srso_return_thunk+0x5/0x5f [ 223.426436] __mmu_notifier_invalidate_range_start+0x1f5/0x250 [ 223.426450] copy_page_range+0x1e94/0x1ea0 [ 223.426461] ? srso_return_thunk+0x5/0x5f [ 223.426474] ? srso_return_thunk+0x5/0x5f [ 223.426484] ? lock_acquire+0xd1/0x300 [ 223.426494] ? copy_process+0x1718/0x2ad0 [ 223.426502] ? srso_return_thunk+0x5/0x5f [ 223.426510] ? sched_clock_noinstr+0x9/0x10 [ 223.426519] ? local_clock_noinstr+0xe/0xc0 [ 223.426528] ? copy_process+0x1718/0x2ad0 [ 223.426537] ? srso_return_thunk+0x5/0x5f [ 223.426550] copy_process+0x172f/0x2ad0 [ 223.426569] kernel_clone+0x9c/0x3f0 [ 223.426577] ? __schedule+0x4c9/0x1b00 [ 223.426586] ? srso_return_thunk+0x5/0x5f [ 223.426594] ? sched_clock_noinstr+0x9/0x10 [ 223.426602] ? srso_return_thunk+0x5/0x5f [ 223.426610] ? local_clock_noinstr+0xe/0xc0 [ 223.426619] ? schedule+0x107/0x1a0 [ 223.426629] __do_sys_clone+0x66/0x90 [ 223.426643] __x64_sys_clone+0x25/0x30 [ 223.426652] x64_sys_call+0x1d7c/0x20d0 [ 223.426661] do_syscall_64+0x87/0x140 [ 223.426671] ? srso_return_thunk+0x5/0x5f [ 223.426679] ? common_nsleep+0x44/0x50 [ 223.426690] ? srso_return_thunk+0x5/0x5f [ 223.426698] ? trace_hardirqs_off+0x52/0xd0 [ 223.426709] ? srso_return_thunk+0x5/0x5f [ 223.426717] ? syscall_exit_to_user_mode+0xcc/0x200 [ 223.426727] ? srso_return_thunk+0x5/0x5f [ 223.426736] ? do_syscall_64+0x93/0x140 [ 223.426748] ? srso_return_thunk+0x5/0x5f [ 223.426756] ? up_write+0x1c/0x1e0 [ 223.426765] ? srso_return_thunk+0x5/0x5f [ 223.426775] ? srso_return_thunk+0x5/0x5f [ 223.426783] ? trace_hardirqs_off+0x52/0xd0 [ 223.426792] ? srso_return_thunk+0x5/0x5f [ 223.426800] ? syscall_exit_to_user_mode+0xcc/0x200 [ 223.426810] ? srso_return_thunk+0x5/0x5f [ 223.426818] ? do_syscall_64+0x93/0x140 [ 223.426826] ? syscall_exit_to_user_mode+0xcc/0x200 [ 223.426836] ? srso_return_thunk+0x5/0x5f [ 223.426844] ? do_syscall_64+0x93/0x140 [ 223.426853] ? srso_return_thunk+0x5/0x5f [ 223.426861] ? irqentry_exit+0x6b/0x90 [ 223.426869] ? srso_return_thunk+0x5/0x5f [ 223.426877] ? exc_page_fault+0xa7/0x2c0 [ 223.426888] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 223.426898] RIP: 0033:0x7f46758eab57 [ 223.426906] Code: ba 04 00 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 41 89 c0 85 c0 75 2c 64 48 8b 04 25 10 00 [ 223.426930] RSP: 002b:00007fff5c3e5188 EFLAGS: 00000246 ORIG_RAX: 0000000000000038 [ 223.426943] RAX: ffffffffffffffda RBX: 00007f4675f8c040 RCX: 00007f46758eab57 [ 223.426954] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011 [ 223.426965] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 223.426975] R10: 00007f4675e81a50 R11: 0000000000000246 R12: 0000000000000001 [ 223.426986] R13: 00007fff5c3e5470 R14: 00007fff5c3e53e0 R15: 00007fff5c3e5410 [ 223.427004] </TASK> v2: To resolve this issue, the allocation of the process context buffer (`proc_ctx_bo`) has been moved from the `add_queue_mes` function to the `pqm_create_queue` function. This change ensures that the buffer is allocated only when the first queue for a process is created and only if the Micro Engine Scheduler (MES) is enabled. (Felix) v3: Fix typo s/Memory Execution Scheduler (MES)/Micro Engine Scheduler in commit message. (Lijo) Fixes: 438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd") Cc: Jesse Zhang <jesse.zhang@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Philip Yang <Philip.Yang@amd.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/msm/a6xx: Add support for Adreno 623Jie Zhang
Add support for Adreno 623 GPU found in QCS8300 chipsets. Signed-off-by: Jie Zhang <quic_jiezh@quicinc.com> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/640056/ Signed-off-by: Rob Clark <robdclark@chromium.org>
2025-02-27drm/msm/a6xx: Fix gpucc register block for A621Jie Zhang
Adreno 621 has a different memory map for GPUCC block. So update a6xx_gpu_state code to dump the correct set of gpucc registers. Signed-off-by: Jie Zhang <quic_jiezh@quicinc.com> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/640055/ Signed-off-by: Rob Clark <robdclark@chromium.org>
2025-02-27drm/msm/a6xx: Split out gpucc register blockJie Zhang
Some GPUs have different memory map for GPUCC block. So split out the gpucc range from a6xx_gmu_cx_registers to a separate block to accommodate those GPUs. Signed-off-by: Jie Zhang <quic_jiezh@quicinc.com> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/640052/ Signed-off-by: Rob Clark <robdclark@chromium.org>
2025-02-27drm/msm/gem: Fix error code msm_parse_deps()Dan Carpenter
The SUBMIT_ERROR() macro turns the error code negative. This extra '-' operation turns it back to positive EINVAL again. The error code is passed to ERR_PTR() and since positive values are not an IS_ERR() it eventually will lead to an oops. Delete the '-'. Fixes: 866e43b945bf ("drm/msm: UAPI error reporting") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Patchwork: https://patchwork.freedesktop.org/patch/637625/ Signed-off-by: Rob Clark <robdclark@chromium.org>
2025-02-27drm/amdgpu: Add amdisp pinctrl MFD resourceBenjamin Chan
AMDISP GPIO control uses a dedicated pinctrl driver, and requires MFD hotadd GPIO resources. Co-developed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Benjamin Chan <benjamin.chan@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() callsAlex Deucher
They are noops on GFX12. There is no suspend/resume all support in firmware so the function doesn't do anything. KFD already handles its own queues and they should already be unmapped at this point so even if this runs, it's not doing anything. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amd/display: Remove unused optc3_fpu_set_vrr_m_constDr. David Alan Gilbert
The last use of optc3_fpu_set_vrr_m_const() was removed in 2022's commit 64f991590ff4 ("drm/amd/display: Fix a compilation failure on PowerPC caused by FPU code") which removed the only caller (with a similar) name. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amdgpu: Replace DRM_ERROR() with drm_err()Pratap Nirujogi
DRM_ERROR() is no longer preferred. Replace DRM_ERROR() usage with drm_err() in isp driver. Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27drm/amd/display/dc: Refactor remove duplicationsLuan Arcanjo
All dce command_table_helper's shares a copy-pasted collection of copy-pasted functions, which are: phy_id_to_atom, clock_source_id_to_atom_phy_clk_src_id, and engine_bp_to_atom. This patch removes the multiple copy-pasted by moving them to the command_table_helper.c and make the command_table_helper's calls the functions implemented by the command_table_helper.c instead. The changes were not tested on actual hardware. I am only able to verify that the changes keep the code compileable and do my best to to look repeatedly if I am not actually changing any code. This is the version 4 of the PATCH, fixed comments about licence in the new files and the matches From email to Signed-off-by email. Fixed comments about using command_table_helper instead of creating a dce_common Signed-off-by: Luan Icaro Pinto Arcanjo <luanicaro@usp.br> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>