summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2025-02-25drm/amdgpu: update the handle ptr in is_idleSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of is_idle. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-21drm/amdgpu: remove all KFD fences from the BO on releaseChristian König
Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evict unecessarily when an exported BO is released. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: update the handle ptr in get_clockgating_stateSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of get_clockgating_state. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Add clear DCC and Tiling callback for DCERodrigo Siqueira
Introduce the DCC and Tiling reset callback to all DCE versions that can call it. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdkfd: Fix error handling for missing PASID in 'kfd_process_device_init_vm'Srinivasan Shanmugam
In the kfd_process_device_init_vm function, a valid error code is now returned when the associated Process Address Space ID (PASID) is not present. If the address space virtual memory (avm) does not have an associated PASID, the function sets the ret variable to -EINVAL before proceeding to the error handling section. This ensures that the calling function, such as kfd_ioctl_acquire_vm, can appropriately handle the error, thereby preventing any issues during virtual memory initialization. Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1694 kfd_process_device_init_vm() warn: missing error code 'ret' drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c 1647 int kfd_process_device_init_vm(struct kfd_process_device *pdd, 1648 struct file *drm_file) 1649 { ... 1690 1691 if (unlikely(!avm->pasid)) { 1692 dev_warn(pdd->dev->adev->dev, "WARN: vm %p has no pasid associated", 1693 avm); --> 1694 goto err_get_pasid; ret = -EINVAL? 1695 } Fixes: 8544374c0f82 ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver") Reported by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Xiaogang Chen <xiaogang.chen@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Remove redundant check of adevXiang Liu
There is no need to check adev for sure. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Check aca enabled inside cper init/fini funcXiang Liu
Move code about checking aca enabled to the cper init/fini function to make code clean. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Use firmware supported NPS modesLijo Lazar
If firmware supported NPS modes are available through CAP register, use those values for supported NPS modes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/pm: Fetch current power limit from PMFWLijo Lazar
On SMU v13.0.12, always query the firmware to get the current power limit as it could be updated through other means also. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Add ring reset callback for JPEG4_0_3Sathishkumar S
Add ring reset function callback for JPEG4_0_3 to recover from job timeouts without a full gpu reset. V2: - sched->ready flag shouldn't be modified by HW backend (Christian) V3: - Dont modifying sched/job-submission state from HW backend (Christian) - Implement per-core reset sequence V4: - Dont create reset_mask sysfs and return -EOPNOTSUPP on VFs (Lijo) Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Add JPEG4_0_3 core reset control regSathishkumar S
Add core reset control registers for JPEG4_0_3 Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid ↵Srinivasan Shanmugam
Priority Inversion in SRIOV RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtualized environment., including TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, it can lead to race conditions. By using the RLCG interface, the driver can serialize access to the registers. This means that only one thread can access the registers at a time, preventing conflicts and ensuring that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, ie., If a thread holding a spinlock tries to acquire a mutex, it can lead to priority inversion. register access in amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical. The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being called, which attempts to acquire the mutex. This function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb. The [ BUG: Invalid wait context ] indicates that a thread is trying to acquire a mutex while it is in a context that does not allow it to sleep (like holding a spinlock). Fixes the below: [ 253.013423] ============================= [ 253.013434] [ BUG: Invalid wait context ] [ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE [ 253.013464] ----------------------------- [ 253.013475] kworker/0:1/10 is trying to lock: [ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.013815] other info that might help us debug this: [ 253.013827] context-{4:4} [ 253.013835] 3 locks held by kworker/0:1/10: [ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu] [ 253.014154] stack backtrace: [ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14 [ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024 [ 253.014224] Workqueue: events work_for_cpu_fn [ 253.014241] Call Trace: [ 253.014250] <TASK> [ 253.014260] dump_stack_lvl+0x9b/0xf0 [ 253.014275] dump_stack+0x10/0x20 [ 253.014287] __lock_acquire+0xa47/0x2810 [ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014321] lock_acquire+0xd1/0x300 [ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014562] ? __lock_acquire+0xa6b/0x2810 [ 253.014578] __mutex_lock+0x85/0xe20 [ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014782] ? sched_clock_noinstr+0x9/0x10 [ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014808] ? local_clock_noinstr+0xe/0xc0 [ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.015029] mutex_lock_nested+0x1b/0x30 [ 253.015044] ? mutex_lock_nested+0x1b/0x30 [ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu] [ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu] [ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu] [ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu] [ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu] [ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu] [ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu] [ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu] [ 253.017493] local_pci_probe+0x4b/0xb0 [ 253.017746] work_for_cpu_fn+0x1a/0x30 [ 253.017995] process_one_work+0x21e/0x680 [ 253.018248] worker_thread+0x190/0x330 [ 253.018500] ? __pfx_worker_thread+0x10/0x10 [ 253.018746] kthread+0xe7/0x120 [ 253.018988] ? __pfx_kthread+0x10/0x10 [ 253.019231] ret_from_fork+0x3c/0x60 [ 253.019468] ? __pfx_kthread+0x10/0x10 [ 253.019701] ret_from_fork_asm+0x1a/0x30 [ 253.019939] </TASK> v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian). Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <lin.cao@amd.com> Cc: Jingwen Chen <Jingwen.Chen2@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Zhigang Luo <zhigang.luo@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: 3.2.321Taimur Hassan
Summary: * Add support for disconnected eDP streams * Add log for MALL entry on DCN32x * Add DCC/Tiling reset helper for DCN and DCE * Guard against setting dispclk low when active * Other minor fixes Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Add support for disconnected eDP streamsHarry VanZyllDeJong
[Why] eDP may not be connected to the GPU on driver start causing fail enumeration. [How] Move the virtual signal type check before the eDP connector signal check. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Harry VanZyllDeJong <hvanzyll@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: dpia should avoid encoder used by dp2Peichen Huang
[WHY] In current HPO DP2 implementation, driver would enable/disable DIG encoder when configuring HPO DP2. Therefore, usb4 dp tunnelling should not use the DIG encoder if the corresponded phy is used by a HPO DP2 stream. [HOW] A DP2 stream is treated as a dig stream. Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Guard against setting dispclk low when activeNicholas Kazlauskas
[Why] We should never apply a minimum dispclk value while in prepare_bandwidth or while displays are active. This is always an optimization for when all displays are disabled. [How] Defer dispclk optimization until safe_to_lower = true and display_count reaches 0. Since 0 has a special value in this logic (ie. no dispclk required) we also need adjust the logic that clamps it for the actual request to PMFW. Reviewed-by: Gabe Teeger <gabe.teeger@amd.com> Reviewed-by: Leo Chen <leo.chen@amd.com> Reviewed-by: Syed Hassan <syed.hassan@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Fix BT2020 YCbCr limited/full range inputIlya Bakoulin
[Why] BT2020 YCbCr input is not handled properly when full range quantization is used and limited range is not supported at all. [How] - Add enums for BT2020 YCbCr limited/full range - Add limited range CSC matrix Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Robert Mader <robert.mader@collabora.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Add log for MALL entry on DCN32xAurabindo Pillai
[Why&How] Add a dyndbg log entry to check whether the driver requested scanout from MALL cache to PMFW via DMCUB Reviewed-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Add total_num_dpps_required field to informative structureOleh Kuzhylnyi
[Why] The informative structure needs to be extended by the total number of DPPs required per each active plane. The new informative field is going to be used as a statistical indicator. [How] The dml2_core_calcs_get_informative() routine must count a total number of DPPs. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Oleh Kuzhylnyi <okuzhyln@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Read LTTPR ALPM caps during link cap retrievalGeorge Shen
[Why] The latest DP spec requires the DP TX to read DPCD F0000h through F0009h when detecting LTTPR capabilities for the first time. [How] Update LTTPR cap retrieval to read up to F0009h (two more bytes than the previous F0007h), and store the LTTPR ALPM capabilities. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Print seamless boot message in mark_seamless_boot_streamAlex Hung
[WHAT & HOW] Add a message so users know the stream will be used for seamless boot. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Add clear DCC and Tiling callback for DCNRodrigo Siqueira
Introduce the DCC and Tiling reset callback to all DCN versions that can call it. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Rename panic functionRodrigo Siqueira
Rename dc_plane_force_update_for_panic to dc_plane_force_dcc_and_tiling_disable to describe the function operation in the name. Also, this function might be used in other contexts, and a more generic name can be helpful for this purpose. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Add DCC/Tiling reset helper for DCN and DCERodrigo Siqueira
This commit introduces a function helper for resetting DCN/DCE DCC and tiling. Those functions are generic for their respective DCN/DCE, so they were added to the oldest version of each architecture. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19Revert "drm/amd/display: Request HW cursor on DCN3.2 with SubVP"Leo Zeng
This reverts commit 13437c91606c9232c747475e202fe3827cd53264. Reason to revert: idle power regression found in testing. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Leo Zeng <Leo.Zeng@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Don't treat wb connector as physical in ↵Harry Wentland
create_validate_stream_for_sink Don't try to operate on a drm_wb_connector as an amdgpu_dm_connector. While dereferencing aconnector->base will "work" it's wrong and might lead to unknown bad things. Just... don't. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-19drm/amd/display: Exit idle optimizations before accessing PHYOvidiu Bunea
[why & how] By default, DCN HW is in idle optimized state which does not allow access to PHY registers. If BIOS powers up the DCN, it is fine because they will power up everything. Only exit idle optimized state when not taking control from VBIOS. Fixes: be704e5ef4bd ("Revert "drm/amd/display: Exit idle optimizations before attempt to access PHY"") Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: Generate bad page threshold cper recordsXiang Liu
Generate CPER record when bad page threshold exceed and commit to CPER ring. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: Commit CPER entryXiang Liu
Commit the CPER entry to the ring buffer. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: add mutex lock for cper ringTao Zhou
Avoid the confliction between read and write of ring buffer. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amd/pm: Limit jpeg rings as per max for jpeg_v_4_0_3Asad Kamal
Since pmfw supports for smuv13_0_6 is limited to 8 jpeg rings per instance, which is the max for jpeg_v_4_0_3. Limit it to same to avoid out of bound access. Fixes: 568199a5c7a9 ("drm/amd/pm: Limit to 8 jpeg rings per instance") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: add data write function for CPER ringTao Zhou
Old CPER data will be overwritten if ring buffer is full, and read pointer always points to CPER header. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: read CPER ring via debugfsTao Zhou
We read CPER data from read pointer to write pointer without changing the pointers. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: add RAS CPER ring bufferTao Zhou
And initialize it, this is a pure software ring to store RAS CPER data. v2: change ring size to 0x100000 v2: update the initialization of count_dw of cper ring, it's dword variable v3: skip VM inv eng for cper v3: init/fini when aca enabled Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: Get timestamp from system timeXiang Liu
Get system local time and encode it to timestamp for CPER. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu/mes12: allocate hw_resource_1 buffer onceAlex Deucher
Allocate the buffer at sw init time so we don't alloc and free it for every suspend/resume or reset cycle. Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu/mes11: allocate hw_resource_1 buffer onceAlex Deucher
Allocate the buffer at sw init time so we don't alloc and free it for every suspend/resume or reset cycle. Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amd/display: Reapply 2fde4fdddc1fNathan Chancellor
Commit 2563391e57b5 ("drm/amd/display: DML2.1 resynchronization") blew away the compiler warning fix from commit 2fde4fdddc1f ("drm/amd/display: Avoid -Wenum-float-conversion in add_margin_and_round_to_dfs_grainularity()"), causing the warning to reappear. drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_dpmm/dml2_dpmm_dcn4.c:183:58: error: arithmetic between enumeration type 'enum dentist_divider_range' and floating-point type 'double' [-Werror,-Wenum-float-conversion] 183 | divider = (unsigned int)(DFS_DIVIDER_RANGE_SCALE_FACTOR * (vco_freq_khz / clock_khz)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Apply the fix again to resolve the warning. Fixes: 1b30456150e5 ("drm/amd/display: DML21 Reintegration") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: Generate cper recordsHawking Zhang
Encode the error information in CPER format and commit to the cper ring Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdkfd: Fix user queue validation on Gfx7/8Philip Yang
To workaround queue full h/w issue on Gfx7/8, when application create AQL queue, the ring buffer bo allocate size is queue_size/2 and map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each attachment map size is queue_size/2, with same ring_bo backing memory. For Gfx7/8, user queue buffer validation should use queue_size/2 to verify ring_bo allocation and mapping size. Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers") Suggested-by: Tomáš Trnka <trnka@scm.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: Introduce funcs for generating cper recordHawking Zhang
Introduce new functions that are used to generate cper ue or ce records. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: Include ACA error type in aca bankHawking Zhang
ACA error types managed by driver a direct 1:1 correspondence with those managed by firmware. To address this, for each ACA bank, include both the ACA error type and the ACA SMU type. This addition is useful for creating CPER records. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: Optimize the enablement of GECCCandice Li
Enable GECC only when the default memory ECC mode or the module parameter amdgpu_ras_enable is activated. v2: Add kernel message to remind users explicitly set amdgpu_ras_enable=1 before driver loading to enable GECC and set amdgpu_ras_enable=0 to disable GECC when GECC is currently enabled if needed. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: Introduce funcs for populating CPERHawking Zhang
Introduce utility functions designed to assist in populating CPER records. v2: call cper_init/fini in device_ip_init/fini. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amd/include: Add amd cper headerHawking Zhang
AMD is using Common Platform Error Record (CPER) format to report all gpu hardware errors. v2: add program attribute Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu: Rename VCN clock gating function for consistencySrinivasan Shanmugam
Change the function name from vcn_v2_5_enable_clock_gating_inst to vcn_v2_5_enable_clock_gating to ensure consistency in naming. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:781: warning: expecting prototype for vcn_v2_5_enable_clock_gating_inst(). Prototype was for vcn_v2_5_enable_clock_gating() instead Cc: Leo Liu <leo.liu@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu/vcn4.0.3: drop dpm power helpersAlex Deucher
VCN 4.0.3 doesn't support powergating so there is no need to call these. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu/vcn5.0.1: drop dpm power helpersAlex Deucher
VCN 5.0.1 doesn't support powergating so there is no need to call these. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17drm/amdgpu/vcn5.0.1: use correct dpm helperAlex Deucher
The VCN and UVD helpers were split in commit ff69bba05f08 ("drm/amd/pm: add inst to dpm_set_powergating_by_smu") However, this happened in parallel to the vcn 5.0.1 development so it was missed there. Fixes: 346492f30ce3 ("drm/amdgpu: Add VCN_5_0_1 support") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sonny Jiang <sonjiang@amd.com> Cc: Boyuan Zhang <boyuan.zhang@amd.com>
2025-02-17drm/amdgpu/umsch: tidy up the ucode name string handlingAlex Deucher
Make the constant parts of the name part of the string we pass to amdgpu_ucode_request(). Only the version number varies from IP to IP. Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lang Yu <Lang.Yu@amd.com>