summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-21drm/amdgpu: rename enforce isolation variablesAlex Deucher
Since they will be used for both KFD and KGD user queues, rename them from kfd to userq. No intended functional change. Acked-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: add helpers to start/stop schedulingAlex Deucher
This will be used to stop/start user queue scheduling for example when switching between kernel and user queues when enforce isolation is enabled. v2: use idx v3: only stop compute/gfx queues Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: track the xcp_id associated with the queueAlex Deucher
Track this to align with KFD for enforce isolation handling. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: Clear overflow for SRIOVEmily Deng
For VF, it doesn't have the permission to clear overflow, clear the bit by reset. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: rework driver parameterAlex Deucher
Replace disable_kq parameter with user_queue parameter. The parameter has the following logic: -1 = auto (ASIC specific default) 0 = user queues disabled 1 = user queues enabled and kernel queues enabled (if supported) 2 = user queues enabled and kernel queues disabled The default behavior (-1) is currently the same as 0 for current ASICs. To enable user queues (in addition to kernel queues) set user_queue=1. To enable user queues and disable kernel queues (to make all resources available to user queues), set user_queue=2. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amd/pm: Enable host limit metrics supportAsad Kamal
Enable host limit metrics support for smuv_13_0_12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/sdma7: properly reference trap interrupts for userqsAlex Deucher
We need to take a reference to the interrupts to make sure they stay enabled even if the kernel queues have disabled them. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/sdma6: properly reference trap interrupts for userqsAlex Deucher
We need to take a reference to the interrupts to make sure they stay enabled even if the kernel queues have disabled them. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amd/pm: Enable host limit metrics supportAsad Kamal
Enable host limit metrics support for smuv_13_0_6 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: Enable doorbell for JPEG5_0_1Sathishkumar S
Enable doorbell for JPEG5_0_1 and adjust index for VCN5_0_1. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: Update vcn doorbell range in NBIO 7.9Shiwu Zhang
Increase vcn doorbell range for gfx950 to 11. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/gfx12: properly reference EOP interrupts for userqsAlex Deucher
Regardless of whether we disable kernel queues, we need to take an extra reference to the pipe interrupts for user queues to make sure they stay enabled in case we disable them for kernel queues. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/gfx11: properly reference EOP interrupts for userqsAlex Deucher
Regardless of whether we disable kernel queues, we need to take an extra reference to the pipe interrupts for user queues to make sure they stay enabled in case we disable them for kernel queues. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdkfd: fix a bug of smi event for superuserEric Huang
rocm-smi with superuser permission doesn't show some of smi events, i.e. page fault/migration, because the condition of "(events & all)" is false. Superuser should be able to detect all events, the condiiton of "(events & all)" seems redundant, so removing it will fix the issue. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: add missing DCE6 to dce_version_to_string()Alexandre Demers
Missing DCE 6.0 6.1 and 6.4 are identified as UNKNOWN. Fix this. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: fix typo in bios_parser.cAlexandre Demers
Probably a cut and paste error from using get_integrated_info_v8's comment. This has to be get_integrated_info_v9 Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: fix duplicated value setting in dce100_resource_construct()Alexandre Demers
i2c_speed_in_khz was set twice with the same values. Looking at other DCE versions, we probably wanted to set the value for i2c_speed_in_khz_hdcp. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/radeon: fix typo in atombios.hAlexandre Demers
"aligned" not "aligend" Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: fix typo in atombios.hAlexandre Demers
"aligned" not "aligend" Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: add missing parameter name in dce110_clk_src_construct() declarationAlexandre Demers
While not needed per speaking, all the other parameters have names but this one. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: rename function to follow naming convention in dce110Alexandre Demers
The prefix dce110 is used on all functions, but init_pipes() and init_hw(). Under DCN, these sames functions are prefixed. Let's keep thing coherent. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: Clean up error handling in amdgpu_userq_fence_driver_alloc()Dan Carpenter
1) Checkpatch complains if we print an error message for kzalloc() failure. The kzalloc() failure already has it's own error messages built in. Also this allocation is small enough that it is guaranteed to succeed. 2) Return directly instead of doing a goto free_fence_drv. The "fence_drv" is already NULL so no cleanup is necessary. Reviewed-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: Fix double free in amdgpu_userq_fence_driver_alloc()Dan Carpenter
The goto frees "fence_drv" so this is a double free bug. There is no need to call amdgpu_seq64_free(adev, fence_drv->va) since the seq64 allocation failed so change the goto to goto free_fence_drv. Also propagate the error code from amdgpu_seq64_alloc() instead of hard coding it to -ENOMEM. Fixes: e7cf21fbb277 ("drm/amdgpu: Few optimization and fixes for userq fence driver") Reviewed-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: move runpm handling into core userq codeAlex Deucher
Pull it out of the MES code and into the generic code. It's not MES specific and needs to be applied to all user queues regardless of the backend. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdkfd: fix NULL check mistake for process smi eventEric Huang
The mistake will lead to NULL kernel oops, so fix it. Fixes: 4172b556fd5b ("drm/amdkfd: add smi events for process start and end") Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/sdma_v4: Register the new sdma function pointersJesse.zhang@amd.com
Register stop/start/soft_reset queue functions for sdma v4_4_2. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: Add the new sdma function pointers for amdgpu_sdma.hJesse.zhang@amd.com
This patch introduces new function pointers in the amdgpu_sdma structure to handle queue stop, start and soft reset operations. These will replace the older callback mechanism. The new functions are: - stop_kernel_queue: Stops a specific SDMA queue - start_kernel_queue: Starts/Restores a specific SDMA queue - soft_reset_kernel_queue: Performs soft reset on a specific SDMA queue v2: Update stop_queue/start_queue function paramters to use ring pointer instead of device/instance(Chritian) v3: move stop_queue/start_queue to struct amdgpu_sdma_instance and rename them. (Alex) v4: rework the ordering a bit (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu: don't swallow errors in amdgpu_userqueue_resume_all()Alex Deucher
since we loop through the queues |= the errors. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: handle system suspend and resumeAlex Deucher
Unmap user queues on suspend and map them on resume. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: add suspend and resume helpersAlex Deucher
Add helpers to unmap and map user queues on suspend and resume. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: properly clean up userq fence driver on failureAlex Deucher
If userq creation fails, we need to properly unwind and free the user queue fence driver. v2: free idr as well (Sunil) Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: move some code aroundAlex Deucher
Move some userq fence handling code into amdgpu_userq_fence.c. This matches the other code in that file. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: rework front end call sequenceAlex Deucher
Split out the queue map from the mqd create call and split out the queue unmap from the mqd destroy call. This splits the queue setup and teardown with the actual enablement in the firmware. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: rename suspend/resume callbacksAlex Deucher
Rename to map and umap to better align with what is happening at the firmware level and remove the extra level of indirection in the MES userq code. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq/mes: remove unused headerAlex Deucher
This is unused so remove it. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-13drm/amdkfd: Add rec SDMA engines support with limited XGMIShane Xiao
This patch adds recommended SDMA engines with limited XGMI SDMA engines. It will help improve overall performance for device to device copies with this optimization. v2: Update the formatting issues and data type Signed-off-by: Shane Xiao <shane.xiao@amd.com> Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu: Enhance Cleaner Shader Handling in GFX v9.0 Architecture v2Srinivasan Shanmugam
This commit modifies the gfx_v9_0_ring_emit_cleaner_shader function to use a switch statement for cleaner shader emission based on the specific GFX IP version. The function now distinguishes between different IP versions, using PACKET3_RUN_CLEANER_SHADER_9_0 for the versions 9.0.1, 9.1.0, 9.2.1, 9.2.2, 9.3.0, and 9.4.0, while retaining PACKET3_RUN_CLEANER_SHADER for version 9.4.2. v2: Simplify logic (Alex). Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER_9_0 for Cleaner Shader executionSrinivasan Shanmugam
This commit introduces the PACKET3_RUN_CLEANER_SHADER_9_0 definition, which is a command packet utilized to instruct the GPU to execute the cleaner shader for the GFX9.0 graphics architecture. The cleaner shader is a piece of GPU code that is responsible for clearing or initializing essential GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Properly clearing these resources is vital for ensuring data isolation and security between different workloads executed on the GPU. When the GPU receives this packet, it fetches and runs the cleaner shader instructions from the specified location in the packet. Thus by preventing data leaks and ensuring that previous job states do not interfere with subsequent workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amd/amdgpu: Fix out of bounds warning in amdgpu_hw_ip_infoJesse Zhang
Fix an array index out of bounds warning in the DMA IP case of amdgpu_hw_ip_info() where it was incorrectly checking adev->gfx.gfx_ring[i].no_user_submission instead of adev->sdma.instance[i].ring.no_user_submission. The mismatch caused UBSAN to report an array bounds violation since it was accessing the GFX ring array with SDMA instance indices. Fixes: 4310acd4464b ("drm/amdgpu: add ring flag for no user submissions") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdkfd: add smi events for process start and endEric Huang
rocm-smi will be able to show the events for KFD process start/end, it is the implementation of this feature. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu: Use the right function for hdp flushLijo Lazar
There are a few prechecks made before HDP flush like a flush is not required on APU bare metal. Using hdp callback directly bypasses those checks. Use amdgpu_device_flush_hdp which takes care of prechecks. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu: Direct ret in ras_reset_err_cnt on VFEllen Pan
With adding sriov_vf check, we directly return EOPNOTSUPP in ras_reset_error_count as we should not do anything on VF to reset RAS error count. This also fixes the issue that loading guest driver causes register violations. Reviewed-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu: Use generic hdp flush functionLijo Lazar
Except HDP v5.2 all use a common logic for HDP flush. Use a generic function. HDP v5.2 forces NO_KIQ logic, revisit it later. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu: Set RAS EEPROM table version to v3 for umc v12_5Candice Li
Set RAS EEPROM table version to v3 for umc v12_5. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu: Enable per-queue reset for SDMA v4.4.2 on IP v9.5.0Jesse.zhang@amd.com
Add support for per-queue reset on SDMA v4.4.2 when running with: 1. MEC firmware version 17 or later 2. DPM indicates SDMA reset is supported v2: Fixed supported firmware versions (Lijo) Signed-off-by: Jesse.Zhang <Jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5.2/11.5.3 GPUsSrinivasan Shanmugam
Enable the cleaner shader for additional GFX11.5.2/11.5.3 series GPUs to ensure data isolation among GPU tasks. The cleaner shader is tasked with clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps avoid data leakage and guarantees the accuracy of computational results. This update extends cleaner shader support to GFX11.5.2/11.5.3 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Mario Sopena-Novales <mario.novales@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amd/display/dml2: use vzalloc rather than kzallocAlex Deucher
The structures are large and they do not require contiguous memory so use vzalloc. Fixes: 70839da63605 ("drm/amd/display: Add new DCN401 sources") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4126 Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11Documentation/amdgpu: Add Ryzen AI 350 series processorsMario Limonciello
These have been announced so add them to the table. Link: https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-7-350.html Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amd/display: Add htmldocs description for fused_io interfaceRoman Li
[Why] htmldocs build warning: "Function parameter or struct member 'fused_io' not described in 'amdgpu_display_manager'". [How] Add missing description. Fixes: ce801e5d6c1b ("drm/amd/display: HDCP Locality check using DMUB Fused IO") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Roman Li <Roman.Li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu: adjust enforce_isolation handlingAlex Deucher
Switch from a bool to an enum and allow more options for enforce isolation. There are now 3 modes of operation: - Disabled (0) - Enabled (serialization and cleaner shader) (1) - Enabled in legacy mode (no serialization or cleaner shader) (2) This provides better flexibility for more use cases. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>