summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2025-07-16drm/amdgpu: Check SQ_CONFIG register support on SRIOVTony Yi
On SRIOV environments, check if RLCG supports SQ_CONFIG register programming. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu: track ring state associated with a fenceAlex Deucher
We need to know the wptr and sequence number associated with a fence so that we can re-emit the unprocessed state after a ring reset. Pre-allocate storage space for the ring buffer contents and add helpers to save off and re-emit the unprocessed state so that it can be re-emitted after the queue is reset. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu: clean up GC reset functionsAlex Deucher
Make them consistent and use the reset flags. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu: clean up jpeg reset functionsAlex Deucher
Make them consistent and use the reset flags. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu/vcn: don't enable per queue resets on SR-IOVAlex Deucher
Power control is only available in bare metal. SR-IOV will need a different method. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu/jpeg4: add additional ring reset error checkingAlex Deucher
Start and stop can fail, so add checks. Fixes: 74894ffc7d0c ("drm/amdgpu: Add ring reset callback for JPEG4_0_0") Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sathishkumar S <sathishkumar.sundararaju@amd.com>
2025-07-16drm/amdgpu/jpeg3: add additional ring reset error checkingAlex Deucher
Start and stop can fail, so add checks. Fixes: 03399d0bff25 ("drm/amdgpu: Add ring reset callback for JPEG3_0_0") Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sathishkumar S <sathishkumar.sundararaju@amd.com>
2025-07-16drm/amdgpu/jpeg2: add additional ring reset error checkingAlex Deucher
Start and stop can fail, so add checks. Fixes: 500c04d2a708 ("drm/amdgpu: Add ring reset callback for JPEG2_0_0") Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sathishkumar S <sathishkumar.sundararaju@amd.com>
2025-07-16drm/amdgpu: clean up sdma reset functionsAlex Deucher
Make them consistent and drop unneeded extra variables. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu: Fix missing unlocking in an error path in amdgpu_userq_create()Christophe JAILLET
If kasprintf() fails, some mutex still need to be released to avoid locking issue, as already done in all other error handling path. Fixes: c03ea34cbf88 ("drm/amdgpu: add support of debugfs for mqd information") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/all/366557fa7ca8173fd78c58336986ca56953369b9.1752087753.git.christophe.jaillet@wanadoo.fr/ Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu: Pass along the format info from .fb_create() to ↵Ville Syrjälä
drm_helper_mode_fill_fb_struct() Plumb the format info from .fb_create() all the way to drm_helper_mode_fill_fb_struct() to avoid the redundant lookup. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: amd-gfx@lists.freedesktop.org Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-10-ville.syrjala@linux.intel.com
2025-07-16drm: Allow the caller to pass in the format info to ↵Ville Syrjälä
drm_helper_mode_fill_fb_struct() Soon all drivers should have the format info already available in the places where they call drm_helper_mode_fill_fb_struct(). Allow it to be passed along into drm_helper_mode_fill_fb_struct() instead of doing yet another redundant lookup. Start by always passing in NULL and still doing the extra lookup. The actual changes to avoid the lookup will follow. Done with cocci (with some manual fixups): @@ identifier dev, fb, mode_cmd; expression get_format_info; @@ void drm_helper_mode_fill_fb_struct(struct drm_device *dev, struct drm_framebuffer *fb, + const struct drm_format_info *info, const struct drm_mode_fb_cmd2 *mode_cmd) { ... - fb->format = get_format_info; + fb->format = info ?: get_format_info; ... } @@ identifier dev, fb, mode_cmd; @@ void drm_helper_mode_fill_fb_struct(struct drm_device *dev, struct drm_framebuffer *fb, + const struct drm_format_info *info, const struct drm_mode_fb_cmd2 *mode_cmd); @@ expression dev, fb, mode_cmd; @@ drm_helper_mode_fill_fb_struct(dev, fb + ,NULL ,mode_cmd); Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Inki Dae <inki.dae@samsung.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <lumag@kernel.org> Cc: Sean Paul <sean@poorly.run> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Mikko Perttunen <mperttunen@nvidia.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com> Cc: Gurchetan Singh <gurchetansingh@chromium.org> Cc: Chia-I Wu <olvaffe@gmail.com> Cc: Zack Rusin <zack.rusin@broadcom.com> Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> Cc: amd-gfx@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: linux-tegra@vger.kernel.org Cc: virtualization@lists.linux.dev Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-6-ville.syrjala@linux.intel.com
2025-07-16drm: Pass the format info to .fb_create()Ville Syrjälä
Pass along the format information from the top to .fb_create() so that we can avoid redundant (and somewhat expensive) lookups in the drivers. Done with cocci (with some manual fixups): @@ identifier func =~ ".*create.*"; identifier dev, file, mode_cmd; @@ struct drm_framebuffer *func( struct drm_device *dev, struct drm_file *file, + const struct drm_format_info *info, const struct drm_mode_fb_cmd2 *mode_cmd) { ... ( - const struct drm_format_info *info = drm_get_format_info(...); | - const struct drm_format_info *info; ... - info = drm_get_format_info(...); ) <... - if (!info) - return ...; ...> } @@ identifier func =~ ".*create.*"; identifier dev, file, mode_cmd; @@ struct drm_framebuffer *func( struct drm_device *dev, struct drm_file *file, + const struct drm_format_info *info, const struct drm_mode_fb_cmd2 *mode_cmd) { ... } @find@ identifier fb_create_func =~ ".*create.*"; identifier dev, file, mode_cmd; @@ struct drm_framebuffer *fb_create_func( struct drm_device *dev, struct drm_file *file, + const struct drm_format_info *info, const struct drm_mode_fb_cmd2 *mode_cmd); @@ identifier find.fb_create_func; expression dev, file, mode_cmd; @@ fb_create_func(dev, file + ,info ,mode_cmd) @@ expression dev, file, mode_cmd; @@ drm_gem_fb_create(dev, file + ,info ,mode_cmd) @@ expression dev, file, mode_cmd; @@ drm_gem_fb_create_with_dirty(dev, file + ,info ,mode_cmd) @@ expression dev, file_priv, mode_cmd; identifier info, fb; @@ info = drm_get_format_info(...); ... fb = dev->mode_config.funcs->fb_create(dev, file_priv + ,info ,mode_cmd); @@ identifier dev, file_priv, mode_cmd; @@ struct drm_mode_config_funcs { ... struct drm_framebuffer *(*fb_create)(struct drm_device *dev, struct drm_file *file_priv, + const struct drm_format_info *info, const struct drm_mode_fb_cmd2 *mode_cmd); ... }; v2: Fix kernel docs (Laurent) Fix commit msg (Geert) Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Inki Dae <inki.dae@samsung.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Cc: Chun-Kuang Hu <chunkuang.hu@kernel.org> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <lumag@kernel.org> Cc: Sean Paul <sean@poorly.run> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: Marek Vasut <marex@denx.de> Cc: Stefan Agner <stefan@agner.ch> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Sandy Huang <hjc@rock-chips.com> Cc: "Heiko Stübner" <heiko@sntech.de> Cc: Andy Yan <andy.yan@rock-chips.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Mikko Perttunen <mperttunen@nvidia.com> Cc: Dave Stevenson <dave.stevenson@raspberrypi.com> Cc: "Maíra Canal" <mcanal@igalia.com> Cc: Raspberry Pi Kernel Maintenance <kernel-list@raspberrypi.com> Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com> Cc: Gurchetan Singh <gurchetansingh@chromium.org> Cc: Chia-I Wu <olvaffe@gmail.com> Cc: Zack Rusin <zack.rusin@broadcom.com> Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Cc: amd-gfx@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: virtualization@lists.linux.dev Cc: spice-devel@lists.freedesktop.org Cc: linux-renesas-soc@vger.kernel.org Cc: linux-tegra@vger.kernel.org Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Acked-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-5-ville.syrjala@linux.intel.com
2025-07-16drm/amdgpu: Reset the clear flag in buddy during resumeArunpravin Paneer Selvam
- Added a handler in DRM buddy manager to reset the cleared flag for the blocks in the freelist. - This is necessary because, upon resuming, the VRAM becomes cluttered with BIOS data, yet the VRAM backend manager believes that everything has been cleared. v2: - Add lock before accessing drm_buddy_clear_reset_blocks()(Matthew Auld) - Force merge the two dirty blocks.(Matthew Auld) - Add a new unit test case for this issue.(Matthew Auld) - Having this function being able to flip the state either way would be good. (Matthew Brost) v3(Matthew Auld): - Do merge step first to avoid the use of extra reset flag. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250716075125.240637-2-Arunpravin.PaneerSelvam@amd.com
2025-07-15drm/amdgpu: refine bad page loading when in the same nps modeganglxie
when loading bad page in the same nps mode, need to set the other fields fields in eeprom records manually besides retired_page Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-15drm/amdgpu: refine eeprom data checkganglxie
add eeprom data checksum check before driver unload. reset eeprom and save correct data to eeprom when check failed Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-15drm/amdgpu: The interrupt source was not releasedCe Sun
When the driver is unloaded, the interrupt source of the rma device is not released, resulting in the failure of hw_init when loading again using bad_page_threshold. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-15drm/amdgpu/vcn5: add additional ring reset error checkingAlex Deucher
Start and stop can fail, so add checks. Fixes: b54695dae995 ("drm/amd: Add per-ring reset for vcn v5.0.0 use") Reviewed-by: Mario Limonciello <mari.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com>
2025-07-15drm/amdgpu/vcn4.0.5: add additional ring reset error checkingAlex Deucher
Start and stop can fail, so add checks. Fixes: d1a46cdd0053 ("drm/amd: Add per-ring reset for vcn v4.0.5 use") Reviewed-by: Mario Limonciello <mari.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com>
2025-07-15drm/amdgpu/vcn4: add additional ring reset error checkingAlex Deucher
Start and stop can fail, so add checks. Fixes: b8b6e6f1654d ("drm/amd: Add per-ring reset for vcn v4.0.0 use") Reviewed-by: Mario Limonciello <mari.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com>
2025-07-15drm/amdgpu/gfx10: fix kiq locking in KCQ resetAlex Deucher
The ring test needs to be inside the lock. Fixes: 097af47d3cfb ("drm/amdgpu/gfx10: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>
2025-07-15drm/amdgpu/gfx9.4.3: fix kiq locking in KCQ resetAlex Deucher
The ring test needs to be inside the lock. Fixes: 4c953e53cc34 ("drm/amdgpu/gfx_9.4.3: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>
2025-07-15drm/amdgpu/gfx9: fix kiq locking in KCQ resetAlex Deucher
The ring test needs to be inside the lock. Fixes: fdbd69486b46 ("drm/amdgpu/gfx9: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>
2025-07-15drm/amdgpu: Use cached partition mode, if validLijo Lazar
For current partition mode queries, return the mode cached in partition manager whenever it's valid. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-15drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESETMaíra Canal
Among the scheduler's statuses, the only one that indicates an error is DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV signifies that the operation succeeded and the GPU is in a nominal state. However, to provide more information about the GPU's status, it is needed to convey more information than just "OK". Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has hung, but it has been successfully reset and is now in a nominal state again. Reviewed-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-1-5c5ba4f55039@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-11Merge tag 'amd-drm-next-6.17-2025-07-11' of ↵Simona Vetter
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.17-2025-07-11: amdgpu: - Clean up function signatures - GC 10 KGQ reset fix - SDMA reset cleanups - Misc fixes - LVDS fixes - UserQ fix amdkfd: - Reset fix Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250711205548.21052-1-alexander.deucher@amd.com
2025-07-10drm/amdgpu: Fix lifetime of struct amdgpu_task_info after ring resetAndré Almeida
When a ring reset happens, amdgpu calls drm_dev_wedged_event() using struct amdgpu_task_info *ti as one of the arguments. After using *ti, a call to amdgpu_vm_put_task_info(ti) is required to correctly track its lifetime. However, it's called from a place that the ring reset path never reaches due to a goto after drm_dev_wedged_event() is called. Move amdgpu_vm_put_task_info() bellow the exit label to make sure that it's called regardless of the code path. amdgpu_vm_put_task_info() can only accept a valid address or NULL as argument, so initialise *ti to make sure we can call this function if *ti isn't used. Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info") Reported-by: Dave Airlie <airlied@gmail.com> Closes: https://lore.kernel.org/dri-devel/CAPM=9tz0rQP8VZWKWyuF8kUMqRScxqoa6aVdwWw9=5yYxyYQ2Q@mail.gmail.com/ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250704030629.1064397-1-andrealmeid@igalia.com Signed-off-by: André Almeida <andrealmeid@igalia.com>
2025-07-10drm/amdgpu: do not resume device in thaw for normal hibernationSamuel Zhang
For normal hibernation, GPU do not need to be resumed in thaw since it is not involved in writing the hibernation image. Skip resume in this case can reduce the hibernation time. On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory, this can save 50 minutes. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20250710062313.3226149-6-guoqing.zhang@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-07-10drm/amdgpu: move GTT to shmem after eviction for hibernationSamuel Zhang
When hibernate with data center dGPUs, huge number of VRAM BOs evicted to GTT and takes too much system memory. This will cause hibernation fail due to insufficient memory for creating the hibernation image. Move GTT BOs to shmem in KMD, then shmem to swap disk in kernel hibernation code to make room for hibernation image. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250710062313.3226149-3-guoqing.zhang@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-07-09drm/amdgpu: fix the logic to validate fpriv and root boSunil Khatri
Fix the smatch warning, smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:2146 amdgpu_pt_info_read() error: we previously assumed 'fpriv' could be null (see line 2146) "if (!fpriv && !fpriv->vm.root.bo)", It has to be an OR condition rather than an AND which makes an NULL dereference in case fpriv is NULL. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507090525.9rDWGhz3-lkp@intel.com/ Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Link: https://lore.kernel.org/r/20250709071618.591866-1-sunil.khatri@amd.com Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>
2025-07-09drm/amdgpu: fix MQD debugfs undefined symbol when DEBUG_FS=nSunil Khatri
Fix undefined reference to amdgpu_mqd_info_fops during debugfs_create_file if DEBUG_FS=n Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Link: https://lore.kernel.org/r/20250708101551.68033-1-sunil.khatri@amd.com Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>
2025-07-08Merge remote-tracking branch 'drm/drm-next' into drm-misc-nextMaarten Lankhorst
Pull in drm-intel-next for the updates to drm panic handling. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-07-07drm/amdgpu: fix use-after-free in amdgpu_userq_suspend+0x51a/0x5a0Vitaly Prosyak
[ +0.000020] BUG: KASAN: slab-use-after-free in amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000817] Read of size 8 at addr ffff88812eec8c58 by task amd_pci_unplug/1733 [ +0.000027] CPU: 10 UID: 0 PID: 1733 Comm: amd_pci_unplug Tainted: G W 6.14.0+ #2 [ +0.000009] Tainted: [W]=WARN [ +0.000003] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000003] dump_stack_lvl+0x76/0xa0 [ +0.000011] print_report+0xce/0x600 [ +0.000009] ? srso_return_thunk+0x5/0x5f [ +0.000006] ? kasan_complete_mode_report_info+0x76/0x200 [ +0.000007] ? kasan_addr_to_slab+0xd/0xb0 [ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000707] kasan_report+0xbe/0x110 [ +0.000006] ? amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000541] __asan_report_load8_noabort+0x14/0x30 [ +0.000005] amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu] [ +0.000535] ? stop_cpsch+0x396/0x600 [amdgpu] [ +0.000556] ? stop_cpsch+0x429/0x600 [amdgpu] [ +0.000536] ? __pfx_amdgpu_userq_suspend+0x10/0x10 [amdgpu] [ +0.000536] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? kgd2kfd_suspend+0x132/0x1d0 [amdgpu] [ +0.000542] amdgpu_device_fini_hw+0x581/0xe90 [amdgpu] [ +0.000485] ? down_write+0xbb/0x140 [ +0.000007] ? __mutex_unlock_slowpath.constprop.0+0x317/0x360 [ +0.000005] ? __pfx_amdgpu_device_fini_hw+0x10/0x10 [amdgpu] [ +0.000482] ? __kasan_check_write+0x14/0x30 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? up_write+0x55/0xb0 [ +0.000007] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? blocking_notifier_chain_unregister+0x6c/0xc0 [ +0.000008] amdgpu_driver_unload_kms+0x69/0x90 [amdgpu] [ +0.000484] amdgpu_pci_remove+0x93/0x130 [amdgpu] [ +0.000482] pci_device_remove+0xae/0x1e0 [ +0.000008] device_remove+0xc7/0x180 [ +0.000008] device_release_driver_internal+0x3d4/0x5a0 [ +0.000007] device_release_driver+0x12/0x20 [ +0.000004] pci_stop_bus_device+0x104/0x150 [ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40 [ +0.000005] remove_store+0xd7/0xf0 [ +0.000005] ? __pfx_remove_store+0x10/0x10 [ +0.000006] ? __pfx__copy_from_iter+0x10/0x10 [ +0.000006] ? __pfx_dev_attr_store+0x10/0x10 [ +0.000006] dev_attr_store+0x3f/0x80 [ +0.000006] sysfs_kf_write+0x125/0x1d0 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000005] kernfs_fop_write_iter+0x2ea/0x490 [ +0.000005] ? rw_verify_area+0x70/0x420 [ +0.000005] ? __pfx_kernfs_fop_write_iter+0x10/0x10 [ +0.000006] vfs_write+0x90d/0xe70 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000005] ? __pfx_vfs_write+0x10/0x10 [ +0.000004] ? local_clock+0x15/0x30 [ +0.000008] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_slab_free+0x5f/0x80 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? fdget_pos+0x1d3/0x500 [ +0.000007] ksys_write+0x119/0x220 [ +0.000005] ? putname+0x1c/0x30 [ +0.000006] ? __pfx_ksys_write+0x10/0x10 [ +0.000007] __x64_sys_write+0x72/0xc0 [ +0.000006] x64_sys_call+0x18ab/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __pfx___x64_sys_openat+0x10/0x10 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? fpregs_assert_state_consistent+0x21/0xb0 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? syscall_exit_to_user_mode+0x4e/0x240 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? do_syscall_64+0x88/0x170 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? irqentry_exit+0x43/0x50 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? exc_page_fault+0x7c/0x110 [ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000006] RIP: 0033:0x7480c0b14887 [ +0.000005] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ +0.000005] RSP: 002b:00007fff142b0058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ +0.000006] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007480c0b14887 [ +0.000003] RDX: 0000000000000001 RSI: 00007480c0e7365a RDI: 0000000000000004 [ +0.000003] RBP: 00007fff142b0080 R08: 0000563b2e73c170 R09: 0000000000000000 [ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff142b02f8 [ +0.000003] R13: 0000563b159a72a9 R14: 0000563b159a9d48 R15: 00007480c0f19040 [ +0.000008] </TASK> [ +0.000445] Allocated by task 427 on cpu 5 at 29.342331s: [ +0.000011] kasan_save_stack+0x28/0x60 [ +0.000006] kasan_save_track+0x18/0x70 [ +0.000006] kasan_save_alloc_info+0x38/0x60 [ +0.000005] __kasan_kmalloc+0xc1/0xd0 [ +0.000006] __kmalloc_cache_noprof+0x1bd/0x430 [ +0.000007] amdgpu_driver_open_kms+0x172/0x760 [amdgpu] [ +0.000493] drm_file_alloc+0x569/0x9a0 [ +0.000007] drm_client_init+0x1b7/0x410 [ +0.000007] drm_fbdev_client_setup+0x174/0x470 [ +0.000006] drm_client_setup+0x8a/0xf0 [ +0.000006] amdgpu_pci_probe+0x510/0x10c0 [amdgpu] [ +0.000483] local_pci_probe+0xe7/0x1b0 [ +0.000006] pci_device_probe+0x5bf/0x890 [ +0.000006] really_probe+0x1fd/0x950 [ +0.000005] __driver_probe_device+0x307/0x410 [ +0.000006] driver_probe_device+0x4e/0x150 [ +0.000005] __driver_attach+0x223/0x510 [ +0.000006] bus_for_each_dev+0x102/0x1a0 [ +0.000005] driver_attach+0x3d/0x60 [ +0.000006] bus_add_driver+0x309/0x650 [ +0.000005] driver_register+0x13d/0x490 [ +0.000006] __pci_register_driver+0x1ee/0x2b0 [ +0.000006] rfcomm_dlc_clear_state+0x69/0x220 [rfcomm] [ +0.000011] do_one_initcall+0x9c/0x3e0 [ +0.000007] do_init_module+0x29e/0x7f0 [ +0.000006] load_module+0x5c75/0x7c80 [ +0.000006] init_module_from_file+0x106/0x180 [ +0.000006] idempotent_init_module+0x377/0x740 [ +0.000006] __x64_sys_finit_module+0xd7/0x180 [ +0.000006] x64_sys_call+0x1f0b/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000013] Freed by task 1733 on cpu 5 at 59.907086s: [ +0.000011] kasan_save_stack+0x28/0x60 [ +0.000006] kasan_save_track+0x18/0x70 [ +0.000005] kasan_save_free_info+0x3b/0x60 [ +0.000005] __kasan_slab_free+0x54/0x80 [ +0.000006] kfree+0x127/0x470 [ +0.000006] amdgpu_driver_postclose_kms+0x455/0x760 [amdgpu] [ +0.000493] drm_file_free.part.0+0x5b1/0xba0 [ +0.000006] drm_file_free+0x13/0x30 [ +0.000006] drm_client_release+0x1c4/0x2b0 [ +0.000006] drm_fbdev_ttm_fb_destroy+0xd2/0x120 [drm_ttm_helper] [ +0.000007] put_fb_info+0x97/0xe0 [ +0.000007] unregister_framebuffer+0x197/0x380 [ +0.000005] drm_fb_helper_unregister_info+0x94/0x100 [ +0.000005] drm_fbdev_client_unregister+0x3c/0x80 [ +0.000007] drm_client_dev_unregister+0x144/0x330 [ +0.000006] drm_dev_unregister+0x49/0x1b0 [ +0.000006] drm_dev_unplug+0x4c/0xd0 [ +0.000006] amdgpu_pci_remove+0x58/0x130 [amdgpu] [ +0.000484] pci_device_remove+0xae/0x1e0 [ +0.000008] device_remove+0xc7/0x180 [ +0.000007] device_release_driver_internal+0x3d4/0x5a0 [ +0.000006] device_release_driver+0x12/0x20 [ +0.000007] pci_stop_bus_device+0x104/0x150 [ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40 [ +0.000006] remove_store+0xd7/0xf0 [ +0.000006] dev_attr_store+0x3f/0x80 [ +0.000005] sysfs_kf_write+0x125/0x1d0 [ +0.000006] kernfs_fop_write_iter+0x2ea/0x490 [ +0.000006] vfs_write+0x90d/0xe70 [ +0.000006] ksys_write+0x119/0x220 [ +0.000006] __x64_sys_write+0x72/0xc0 [ +0.000006] x64_sys_call+0x18ab/0x26f0 [ +0.000005] do_syscall_64+0x7c/0x170 [ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000012] The buggy address belongs to the object at ffff88812eec8000 which belongs to the cache kmalloc-rnd-07-4k of size 4096 [ +0.000016] The buggy address is located 3160 bytes inside of freed 4096-byte region [ffff88812eec8000, ffff88812eec9000) [ +0.000023] The buggy address belongs to the physical page: [ +0.000009] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12eec8 [ +0.000007] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ +0.000005] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) [ +0.000007] page_type: f5(slab) [ +0.000008] raw: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000 [ +0.000005] raw: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000040 ffff888100054500 dead000000000122 0000000000000000 [ +0.000005] head: 0000000000000000 0000000080040004 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000003 ffffea0004bbb201 ffffffffffffffff 0000000000000000 [ +0.000005] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 [ +0.000005] page dumped because: kasan: bad access detected [ +0.000010] Memory state around the buggy address: [ +0.000009] ffff88812eec8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff88812eec8b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] >ffff88812eec8c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ^ [ +0.000010] ffff88812eec8c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ffff88812eec8d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ================================================================== The use-after-free occurs because a delayed work item (`suspend_work`) may still be pending or running when resources it accesses are freed during device removal or file close. The previous code used `flush_work(&fpriv->evf_mgr.suspend_work.work)`, which does not wait for delayed work that has not yet started. As a result, the delayed work could run after its memory was freed, causing a use-after-free. By switching to `flush_delayed_work(&fpriv->evf_mgr.suspend_work)`, we ensure that the kernel waits for both queued and delayed work to finish before freeing memory, closing this race. Fixes: adba0929736a ("drm/amdgpu: Fix Illegal opcode in command stream Error") Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-07Revert "drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini"Vitaly Prosyak
This reverts commit 5fb90421fa0fbe0a968274912101fe917bf1c47b. The original patch moved `amdgpu_userq_mgr_fini()` to the driver's `postclose` callback, which is called after `drm_gem_release()` in the DRM file cleanup sequence.If a user application crashes or aborts without cleaning up its user queues, 'drm_gem_release()` may free GEM objects that are still referenced by active user queues, leading to use-after-free. By reverting, we ensure that user queues are disabled and cleaned up before any GEM objects are released, preventing this class of bug. However, this reintroduces a race during PCI hot-unplug, where device removal can race with per-file cleanup, leading to use-after-free in suspend/unplug paths. This will be fixed in the next patch. Fixes: 5fb90421fa0f ("drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c") Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-07drm/amdgpu/sdma: allow caller to handle kernel rings in engine resetAlex Deucher
Add a parameter to amdgpu_sdma_reset_engine() to let the caller handle the kernel rings. This allows the kernel rings to back up their unprocessed state if the reset comes in via the drm scheduler rather than KFD. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-07drm/amdgpu/sdma: consolidate engine reset handlingAlex Deucher
Move the force completion handling into the common engine reset function. No need to duplicate it for every IP version. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-07drm/amdgpu: Add a noverbose flag to psp_wait_forLijo Lazar
For extended wait with retries on a PSP register value, add a noverbose flag to avoid excessive error messages on each timeout. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-07drm/amdgpu/gfx10: fix KGQ reset sequenceAlex Deucher
Need to reinit the ring before remapping it and all of the KIQ handling needs to be within the kiq lock. Fixes: 1741281a157f ("drm/amdgpu/gfx10: add ring reset callbacks") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-07drm/amdgpu: Pass adev pointer to functionsLijo Lazar
Pass amdgpu device context instead of drm device context to some amdgpu_device_* functions. DRM device context is not required in those functions. No functional change. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-04drm/amdgpu: add support of debugfs for mqd informationSunil Khatri
Add debugfs support for mqd for each queue of the client. The address exposed to debugfs could be used to dump the mqd. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250704075548.1549849-5-sunil.khatri@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>
2025-07-04drm/amdgpu: add debugfs support for VM pagetable per clientSunil Khatri
Add a debugfs file under the client directory which shares the root page table base address of the VM. This address could be used to dump the pagetable for debug memory issues. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250704075548.1549849-4-sunil.khatri@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>
2025-07-04Merge tag 'amd-drm-next-6.17-2025-07-01' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.17-2025-07-01: amdgpu: - FAMS2 fixes - OLED fixes - Misc cleanups - AUX fixes - DMCUB updates - SR-IOV hibernation support - RAS updates - DP tunneling fixes - DML2 fixes - Backlight improvements - Suspend improvements - Use scaling for non-native modes on eDP - SDMA 4.4.x fixes - PCIe DPM fixes - SDMA 5.x fixes - Cleaner shader updates for GC 9.x - Remove fence slab - ISP genpd support - Parition handling rework - SDMA FW checks for userq support - Add missing firmware declaration - Fix leak in amdgpu_ctx_mgr_entity_fini() - Freesync fix - Ring reset refactoring - Legacy dpm verbosity changes amdkfd: - GWS fix - mtype fix for ext coherent system memory - MMU notifier fix - gfx7/8 fix radeon: - CS validation support for additional GL extensions - Bump driver version for new CS validation checks From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250701194707.32905-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-06-30drm/amdkfd: add hqd_sdma_get_doorbell callbacks for gfx7/8Alex Deucher
These were missed when support was added for other generations. The callbacks are called unconditionally so we need to make sure all generations have them. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4304 Link: https://github.com/ROCm/ROCm/issues/4965 Fixes: bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+") Cc: Jonathan Kim <jonathan.kim@amd.com> Reported-by: Johl Brown <johlbrown@gmail.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1e9d17a5dcf1242e9518e461d8e63ad35240e49e) Cc: stable@vger.kernel.org
2025-06-30drm/amdgpu: Fix memory leak in amdgpu_ctx_mgr_entity_finiLin.Cao
patch dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") removed ctx put, which will cause amdgpu_ctx_fini() cannot be called and then cause some finished fence that added by amdgpu_ctx_add_fence() cannot be released and cause memleak. Fixes: dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8cf66089e28108dedd47e6156a48489303cf525c) Cc: stable@vger.kernel.org
2025-06-30drm/amdgpu: Include sdma_4_4_4.binKent Russell
This got missed during SDMA 4.4.4 support. Fixes: 968e3811c3e8 ("drm/amdgpu: add initial support for sdma444") Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 51526efe02714339ed6139f7bc348377d363200a) Cc: stable@vger.kernel.org
2025-06-30drm/amdgpu/sdma5.x: suspend KFD queues in ring resetAlex Deucher
SDMA 5.x only supports engine soft reset which resets all queues on the engine. As such, we need to suspend KFD queues around resets like we do for SDMA 4.x. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 61feed0baa1a0d094af0e07e968b1e6e875f07d0)
2025-06-30drm/amdgpu/sdma6: add more ucode version checks for userq supportAlex Deucher
Fill in the SDMA ucode version checks for more SDMA 6.x parts. v2: squash in fixes (Alex) Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Remove useless timeout error messageYiPeng Chai
The timeout is only used to interrupt polling and not need to print a error message. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Fix code style issueCe Sun
cocci warnings: (new ones prefixed by >>) >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6088:8-9: Unneeded variable: "r". Return "0" on line 6141 Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506281925.HHIpXiO7-lkp@intel.com/ Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: refine ras error injection when eeprom initialization failedganglxie
when eeprom initialization failed, we still support ras error injection, and reserve bad pages, but do not save bad pages to eeprom Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>