summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2024-04-16drm/amd/swsmu: support smu block discovery for smu v14Kenneth Feng
Support for smu ip block add for SMU v14. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu: rename DBG_DRV to HAD_DRV for psp v14Hawking Zhang
Add a psp bl command enum for HAD_DRV. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu: refactoring the runtime pm mode detection codeMa Jun
refactor the code of runtime pm mode detection to support amdgpu_runtime_pm =2 and 1 two cases Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu: Load ipkeymgr drv for psp v14Hawking Zhang
while DBG_DRV is renamed to HAD_DRV for psp v14, part of its APIs/functionality is moved to a new component named Ipkeymgr_Drv. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu: Add missing space to DRM_WARN() messageThorsten Blum
s/,please/, please/ Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu: Fix discovery initialization failure during pci rescanMa Jun
Waiting for system ready to fix the discovery initialization failure issue. This failure usually occurs when dGPU is removed and then rescanned via command line. It's caused by following two errors: [1] vram size is 0 [2] wrong binary signature Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu: fix visible VRAM handling during faultsChristian König
When we removed the hacky start code check we actually didn't took into account that *all* VRAM pages needs to be CPU accessible. Clean up the code and unify the handling into a single helper which checks if the whole resource is CPU accessible. The only place where a partial check would make sense is during eviction, but that is neglitible. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org
2024-04-16drm/amdgpu: validate the parameters of bo mapping operations more clearlyxinhui pan
Verify the parameters of amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place. Fixes: dc54d3d1744d ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2") Cc: stable@vger.kernel.org Reported-by: Vlad Stolyarov <hexed@google.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu: add new aca smu callback func parse_error_code()Yang Wang
add new aca smu callback parse_error_code{} to avoid specific asic check in amdgpu_aca.c file Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu: increase mes submission timeoutJonathan Kim
MES internally has a timeout allowance of 2 seconds. Increase driver timeout to 3 seconds to be safe. Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu: add missing vbios version from devcoredumpSunil Khatri
Add vbios version in the devcoredump along with formatting the information with proper alignment. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16drm/amdgpu/gfx11: properly handle regGRBM_GFX_CNTL in soft resetAlex Deucher
Need to take the srbm_mutex and while we are here, use the helper function soc21_grbm_select(); Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-12drm/amdgpu: remove invalid resource->start check v2Christian König
The majority of those where removed in the commit aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") But this one was missed because it's working on the resource and not the BO. Since we also no longer use a fake start address for visible BOs this will now trigger invalid mapping errors. v2: also remove the unused variable Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") CC: stable@vger.kernel.org Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-12drm/amdgpu/sdma6: set sdma hang watchdogJack Xiao
Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers to improve SDMA reliability. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-12drm/amd/amdgpu: Update PF2VF HeaderLuqmaan Irshad
Adding a new field for GPU Capacity to align the header with the host. Signed-off-by: Luqmaan Irshad <Luqmaan.Irshad@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-10drm/amdgpu: differentiate external rev id for gfx 11.5.0Yifan Zhang
This patch to differentiate external rev id for gfx 11.5.0. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-04-09drm/amdgpu: fix incorrect number of active RBs for gfx11Tim Huang
The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-04-09drm/amdgpu: clear set_q_mode_offs when VM changedZhenGuo Yin
[Why] set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE packet to init shadow memory will be skiped, hence there has a page fault. [How] VM flush is needed after GPU reset, clear set_q_mode_offs when emitting VM flush. Fixes: 8bc75586ea01 ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-04-09drm/amdgpu: Fix VCN allocation in CPX partitionLijo Lazar
VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In certain configs, VCN instance can be exclusively allocated to a partition even under CPX mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amd/pm: fix the high voltage issue after unloadKenneth Feng
fix the high voltage issue after unload on smu 13.0.10 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2Tao Zhou
SDMA_CNTL is not set in some cases, driver configures it by itself. v2: simplify code Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: add smu 14.0.1 discovery supportYifan Zhang
This patch to add smu 14.0.1 support Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu : Increase the mes log buffer size as per new MES FW versionshaoyunl
From MES version 0x54, the log entry increased and require the log buffer size to be increased. The 16k is maximum size agreed Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu : Add mes_log_enable to control mes log featureshaoyunl
The MES log might slow down the performance for extra step of log the data, disable it by default and introduce a parameter can enable it when necessary Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: Reset dGPU if suspend got abortedLijo Lazar
For SOC21 ASICs, there is an issue in re-enabling PM features if a suspend got aborted. In such cases, reset the device during resume phase. This is a workaround till a proper solution is finalized. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-04-09drm/amdgpu/umsch: reinitialize write pointer in hw initLang Yu
Otherwise the old one will be used during GPU reset. That's not expected. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-04-09drm/amdgpu: Refine IB schedule error loggingLijo Lazar
Downgrade to debug information when IBs are skipped. Also, use dev_* to identify the device. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: always force full reset for SOC21Alex Deucher
There are cases where soft reset seems to succeed, but does not, so always use mode1/2 for now. Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-04-09drm/amdgpu: differentiate external rev id for gfx 11.5.0Yifan Zhang
This patch to differentiate external rev id for gfx 11.5.0. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: fix incorrect number of active RBs for gfx11Tim Huang
The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09amd/amdgpu: improve VF recover timeZhigang Luo
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. 2. set fatel error detected flag. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: Set fatal errror detected flag earlierLijo Lazar
In case of fatal errors, set FED status when interrupt is received. Set the flag on other devices in the hive before RAS recovery work. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: clear set_q_mode_offs when VM changedZhenGuo Yin
[Why] set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE packet to init shadow memory will be skiped, hence there has a page fault. [How] VM flush is needed after GPU reset, clear set_q_mode_offs when emitting VM flush. Fixes: 8bc75586ea01 ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: retire UMC v12 mca_addr_to_paTao Zhou
RAS TA will handle it, the function is useless. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriovchongli2
support MES command SET_HW_RESOURCE1 in sriov Signed-off-by: chongli2 <chongli2@amd.com> Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com> Acked-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: update check condition for XGMI ACA UETao Zhou
Check more possible ext error codes. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: Fix VCN allocation in CPX partitionLijo Lazar
VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In certain configs, VCN instance can be exclusively allocated to a partition even under CPX mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: Add support for BAMACO mode checkingMa Jun
Optimize the code to add support for BAMACO mode checking Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: Bypass asd if display hw is not availableHawking Zhang
ASD is not needed by headless GPU. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu/pm: Add support for MACO flag checkingMa Jun
Add support for MACO flag checking. MACO mode only works if BACO is supported. Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amd/pm: fix the high voltage issue after unloadKenneth Feng
fix the high voltage issue after unload on smu 13.0.10 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: use vm_update_mode=0 as default in sriov for gfx10.3 onwardsDanijel Slivka
Apply this rule to all newer asics in sriov case. For asic with VF MMIO access protection avoid using CPU for VM table updates. CPU pagetable updates have issues with HDP flush as VF MMIO access protection blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register during sriov runtime. Moved the check to amdgpu_device_init() to ensure it is done after amdgpu_device_ip_early_init() where the IP versions are discovered. Signed-off-by: Danijel Slivka <danijel.slivka@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amd/amdgpu: add pipe1 hardware supportArunpravin Paneer Selvam
Enable pipe1 support starting from SIENNA CICHLID asic Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: select HDP ref/mask according to gfx ring pipeZhenGuo Yin
Use correct ref/mask for differnent gfx ring pipe. This should fix the gfx hang issue after enabling gfx pipe1. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2Tao Zhou
SDMA_CNTL is not set in some cases, driver configures it by itself. v2: simplify code Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: add smu 14.0.1 discovery supportYifan Zhang
This patch to add smu 14.0.1 support Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu : Increase the mes log buffer size as per new MES FW versionshaoyunl
From MES version 0x54, the log entry increased and require the log buffer size to be increased. The 16k is maximum size agreed Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu : Add mes_log_enable to control mes log featureshaoyunl
The MES log might slow down the performance for extra step of log the data, disable it by default and introduce a parameter can enable it when necessary Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: refine function signature of amdgpu_aca_get_error_data()Yang Wang
refine function signature of amdgpu_aca_get_error_data(); Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09drm/amdgpu: Reset dGPU if suspend got abortedLijo Lazar
For SOC21 ASICs, there is an issue in re-enabling PM features if a suspend got aborted. In such cases, reset the device during resume phase. This is a workaround till a proper solution is finalized. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org