linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-05-16	drm/amdkfd: Support chain runlists of XNACK+/XNACK-	Amber Lin
	If the MEC firmware supports chaining runlists of XNACK+/XNACK- processes, set SQ_CONFIG1 chicken bit and SET_RESOURCES bit 28. When the MEC/HWS supports it, KFD checks the XNACK+/XNACK- processes mix happens or not. If it does, enter over-subscription. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16	drm/radeon/cik: Clean up doorbells	Dr. David Alan Gilbert
	Free doorbells in the error paths of cik_init and in cik_fini. Build tested only. Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16	Merge tag 'drm-intel-next-fixes-2025-05-15' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/i915/kernel into drm-next - Stop writing ALPM registers when PSR is enabled - Use the correct connector while computing the link BPP limit on MST Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://lore.kernel.org/r/aCWlWk5rTE7TH1pN@jlahtine-mobl
2025-05-16	Merge tag 'mediatek-drm-next-20250515' of ↵	Dave Airlie
	https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-next Mediatek DRM Next - 20250515 1. Prepare for support MT8195/88 HDMIv2 and DDCv2 2. DPI: Cleanups and add support for more formats 3. Cleanups and sanitization 4. Replace custom compare_dev with component_compare_of Signed-off-by: Dave Airlie <airlied@redhat.com> From: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://lore.kernel.org/r/20250514233647.15907-1-chunkuang.hu@kernel.org
2025-05-15	gpu: drm: nova: select AUXILIARY_BUS instead of depending on it	Alexandre Courbot
	CONFIG_AUXILIARY_BUS cannot be enabled explicitly, and unless we select it we have no way to include it (and thus to enable NOVA_DRM) unless another driver happens to do it for us. Fixes: cdeaeb9dd762 ("drm: nova-drm: add initial driver skeleton") Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Link: https://lore.kernel.org/r/20250515-aux_bus-v2-3-47c70f96ae9b@nvidia.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-05-15	gpu: nova-core: select AUXILIARY_BUS instead of depending on it	Alexandre Courbot
	CONFIG_AUXILIARY_BUS cannot be enabled explicitly, and unless we select it we have no way to include it (and thus to enable NOVA_CORE) unless another driver happens to do it for us. Fixes: e041d81a0377 ("gpu: nova-core: register auxiliary device for nova-drm") Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Link: https://lore.kernel.org/r/20250515-aux_bus-v2-2-47c70f96ae9b@nvidia.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-05-15	drm/i915/pci: Remove force_probe requirement for DG1	Ville Syrjälä
	Dunno why we still have .require_force_probe=1 on DG1 after all this time. I'm not aware of any real problems with DG1, so get rid of the force_probe requirement. Generally the difficulty with DG1 is that it requires a 4GiB BAR for the local memory, and that's not something that works on every system. Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20250411144313.11660-3-ville.syrjala@linux.intel.com
2025-05-15	drm/i915/gem: Allow EXEC_CAPTURE on recoverable contexts on DG1	Ville Syrjälä
	The intel-media-driver is currently broken on DG1 because it uses EXEC_CAPTURE with recovarable contexts. Relax the check to allow that. I've also submitted a fix for the intel-media-driver: https://github.com/intel/media-driver/pull/1920 Cc: stable@vger.kernel.org # v6.0+ Cc: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Testcase: igt/gem_exec_capture/capture-invisible Fixes: 71b1669ea9bd ("drm/i915/uapi: tweak error capture on recoverable contexts") Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20250411144313.11660-2-ville.syrjala@linux.intel.com
2025-05-15	Merge tag 'drm-misc-next-2025-05-12' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.16-rc1: Once more, with async flips. UAPI Changes: - Add IN_FORMATS_ASYNC property, use in i915. Cross-subsystem Changes: - Remove some unused debug code in dma-buf. Core Changes: Driver Changes: - Add Novatek NT37801 panel. - Allow submitting empty commands in amdxdna. - Convert cirrus to use managed request_all_regions. - Move Sitronix from tiny to their own place. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/23ded62c-6a62-4195-9c08-4dfb81eafd72@linux.intel.com
2025-05-14	drm/mediatek: Replace custom compare_dev with component_compare_of	Tang Dongxing
	Remove the custom device comparison function compare_dev and replace it with the existing kernel helper component_compare_of Signed-off-by: Tang Dongxing <tang.dongxing@zte.com.cn> Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250403155419406T5YhIJKId1FWor70EWWHG@zte.com.cn/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-05-14	drm/mediatek: mtk_drm_drv: Unbind secondary mmsys components on err	AngeloGioacchino Del Regno
	When calling component_bind_all(), if a component that is included in the list fails, all of those that have been successfully bound will be unbound, but this driver has two components lists for two actual devices, as in, each mmsys instance has its own components list. In case mmsys0 (or actually vdosys0) is able to bind all of its components, but the secondary one fails, all of the components of the first are kept bound, while the ones of mmsys1/vdosys1 are correctly cleaned up. This is not right because, in case of a failure, the components are re-bound for all of the mmsys/vdosys instances without caring about the ones that were previously left in a bound state. Fix that by calling component_unbind_all() on all of the previous component masters that succeeded binding all subdevices when any of the other masters errors out. Fixes: 1ef7ed48356c ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support") Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250403104741.71045-4-angelogioacchino.delregno@collabora.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-05-14	drm/mediatek: Fix kobject put for component sub-drivers	AngeloGioacchino Del Regno
	In function mtk_drm_get_all_drm_priv(), this driver is incrementing the refcount for the sub-drivers of mediatek-drm with a call to device_find_child() when taking a reference to all of those child devices. When the component bind fails multiple times this results in a refcount_t overflow, as the reference count is never decremented: fix that by adding a call to put_device() for all of the mmsys devices in a loop, in error cases of mtk_drm_bind() and in the mtk_drm_unbind() callback. Fixes: 1ef7ed48356c ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support") Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250403104741.71045-3-angelogioacchino.delregno@collabora.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-05-14	drm/mediatek: mtk_drm_drv: Fix kobject put for mtk_mutex device ptr	AngeloGioacchino Del Regno
	This driver is taking a kobject for mtk_mutex only once per mmsys device for each drm-mediatek driver instance, differently from the behavior with other components, but it is decrementing the kobj's refcount in a loop and once per mmsys: this is not right and will result in a refcount_t underflow warning when mediatek-drm returns multiple probe deferrals in one boot (or when manually bound and unbound). Besides that, the refcount for mutex_dev was not decremented for error cases in mtk_drm_bind(), causing another refcount_t warning but this time for overflow, when the failure happens not during driver bind but during component bind. In order to fix one of the reasons why this is happening, remove the put_device(xx->mutex_dev) loop from the mtk_drm_kms_init()'s put_mutex_dev label (and drop the label) and add a single call to correctly free the single incremented refcount of mutex_dev to the mtk_drm_unbind() function to fix the refcount_t underflow. Moreover, add the same call to the error cases in mtk_drm_bind() to fix the refcount_t overflow. Fixes: 1ef7ed48356c ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support") Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250403104741.71045-2-angelogioacchino.delregno@collabora.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
2025-05-14	drm/amdgpu: add debugfs for spirom IFWI dump	Shiwu Zhang
	Expose the debugfs file node for user space to dump the IFWI image on spirom. For one transaction between PSP and host, it will read out the images on both active and inactive partitions so a buffer with two times the size of maximum IFWI image (currently 16MByte) is needed. v2: move the vbios gfl macros to the common header and rename the bo triplet struct to spirom_bo for this specific usage (Hawking) v3: return directly the result of last command execution (Lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14	drm/amdgpu: fix userq resource double freed	Prike Liang
	As the userq resource was already freed at the drm_release early phase, it should avoid freeing userq resource again at the later kms postclose callback. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14	drm/amdgpu: Fix circular locking in userq creation	Jesse.Zhang
	A circular locking dependency was detected between the global `adev->userq_mutex` and per-file `userq_mgr->userq_mutex` when creating user queues. The issue occurs because: 1. `amdgpu_userq_suspend()` and `amdgpu_userq_resume` take `adev->userq_mutex` first, then `userq_mgr->userq_mutex` 2. While `amdgpu_userq_create()` takes them in reverse order This patch resolves the issue by: 1. Moving the `adev->userq_mutex` lock earlier in `amdgpu_userq_create()` to cover the `amdgpu_userq_ensure_ev_fence()` call 2. Releasing it after we're done with both queue creation and the scheduling halt check v2: remove unused adev->userq_mutex lock (Prike) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14	drm/amdgpu: read back register after written for VCN v4.0.5	David (Ming Qiang) Wu
	On VCN v4.0.5 there is a race condition where the WPTR is not updated after starting from idle when doorbell is used. Adding register read-back after written at function end is to ensure all register writes are done before they can be used. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14	Revert "drm/amd/display: Hardware cursor changes color when switched to ↵	Melissa Wen
	software cursor" This reverts commit 272e6aab14bbf98d7a06b2b1cd6308a02d4a10a1. Applying degamma curve to the cursor by default breaks Linux userspace expectation. On Linux, AMD display manager enables cursor degamma ROM just for implict sRGB on HW versions where degamma is split into two blocks: degamma ROM for pre-defined TFs and `gamma correction` for user/custom curves, and degamma ROM settings doesn't apply to cursor plane. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1513 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803 Reported-by: Michel Dänzer <michel.daenzer@mailbox.org> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4144 Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14	drm/i915/alpm: Stop writing ALPM registers when PSR is enabled	Jouni Högander
	Currently we are seeing these on PTL: xe 0000:00:02.0: [drm] ERROR Timeout waiting for DDI BUF A to get active These seem to be caused by writing ALPM registers while Panel Replay is enabled. Fix this by writing ALPM registers only when Panel Replay is about to be enabled. v4: improve comment on intel_psr_panel_replay_enable_sink call v3: enable/disable ALPM from PSR code Fixes: 172757acd6f6 ("drm/i915/lobf: Add lobf enablement in post plane update") Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://lore.kernel.org/r/20250513054814.3702977-3-jouni.hogander@intel.com (cherry picked from commit a8eb102ce0944a9de2a62aa9d195861b7f26668a) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-05-14	drm/i915/alpm: Make intel_alpm_enable_sink available for PSR	Jouni Högander
	We want to enable sink ALPM from PSR code. Make intel_alpm_enable_sink available for PSR. v2: do not add kerneldoc comments Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Link: https://lore.kernel.org/r/20250513054814.3702977-2-jouni.hogander@intel.com (cherry picked from commit 2d278488761f0b5be651a3db41e615a964123d6c) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-05-13	drm/amdgpu/userq: Fix DEBUG_LOCKS_WARN_ON(lock->magic != lock)	Arunpravin Paneer Selvam
	Fix DEBUG_LOCKS_WARN_ON(lock->magic != lock) warning logs. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: Fix userq ttm_bo_pin and ttm_bo_unpin lockdep warnings	Arunpravin Paneer Selvam
	The ttm_bo_pin and ttm_bo_unpin warnings are resolved by moving the doorbell bo reserve up before pin/unpin. WARNING: CPU: 11 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:592 ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000277] CPU: 11 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15 [ +0.000006] Tainted: [W]=WARN [ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024 [ +0.000004] RIP: 0010:ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000005] RSP: 0018:ffff88846ca879d0 EFLAGS: 00010246 [ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000 [ +0.000004] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000005] RBP: ffff88846ca879e8 R08: 0000000000000000 R09: 0000000000000000 [ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810b7ca848 [ +0.000004] R13: ffff88846c666250 R14: 1ffff1108d950f44 R15: ffff88846ca87aa0 [ +0.000005] FS: 00007c45ff436d00(0000) GS:ffff888409580000(0000) knlGS:0000000000000000 [ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000005] CR2: 00005b0c142a60e0 CR3: 000000012ce5a000 CR4: 0000000000f50ef0 [ +0.000004] PKRU: 55555554 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000005] ? show_regs+0x6c/0x80 [ +0.000007] ? __warn+0xd2/0x2d0 [ +0.000007] ? ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000031] ? report_bug+0x282/0x2f0 [ +0.000012] ? handle_bug+0x6e/0xc0 [ +0.000007] ? exc_invalid_op+0x18/0x50 [ +0.000007] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000014] amdgpu_bo_pin+0x365/0x9d0 [amdgpu] [ +0.000191] ? __pfx_amdgpu_bo_pin+0x10/0x10 [amdgpu] [ +0.000185] ? drm_gem_object_lookup+0x81/0xc0 [ +0.000008] ? kasan_save_alloc_info+0x37/0x60 [ +0.000007] ? __kasan_kmalloc+0xc3/0xd0 [ +0.000013] amdgpu_userqueue_get_doorbell_index+0xee/0x5f0 [amdgpu] [ +0.000209] amdgpu_userq_ioctl+0x6b4/0xd40 [amdgpu] [ +0.000193] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000211] ? lock_acquire+0x7c/0xc0 [ +0.000006] ? drm_dev_enter+0x51/0x190 [ +0.000015] drm_ioctl_kernel+0x18b/0x330 [ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000190] ? __pfx_drm_ioctl_kernel+0x10/0x10 [ +0.000005] ? lock_acquire+0x7c/0xc0 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000011] drm_ioctl+0x589/0xd00 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000194] ? __pfx_drm_ioctl+0x10/0x10 [ +0.000006] ? __pm_runtime_resume+0x80/0x110 [ +0.000021] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? trace_hardirqs_on+0x53/0x60 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu] [ +0.000185] __x64_sys_ioctl+0x13a/0x1c0 [ +0.000010] x64_sys_call+0x11ad/0x25f0 [ +0.000007] do_syscall_64+0x91/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? irqentry_exit+0x77/0xb0 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? exc_page_fault+0x93/0x150 [ +0.000009] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7c45ff924ded [ +0.000005] RSP: 002b:00007ffff7167810 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded [ +0.000004] RDX: 00007ffff7167870 RSI: 00000000c0486456 RDI: 000000000000000b [ +0.000004] RBP: 00007ffff7167860 R08: ffff800100000000 R09: 0000000000010000 [ +0.000005] R10: 00007ffff7167950 R11: 0000000000000246 R12: 00005b0c2a51bc48 [ +0.000004] R13: 000000000000000b R14: 0000000000000000 R15: 00007ffff7167950 [ +0.000022] </TASK> [ +0.000004] irq event stamp: 80693 [ +0.000004] hardirqs last enabled at (80699): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0 [ +0.000005] hardirqs last disabled at (80704): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0 [ +0.000005] softirqs last enabled at (80390): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] softirqs last disabled at (80385): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000006] ---[ end trace 0000000000000000 ]--- ------------------------------------------------------------------------------------------------------ [ +0.000006] WARNING: CPU: 10 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:611 ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000280] CPU: 10 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15 [ +0.000006] Tainted: [W]=WARN [ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024 [ +0.000004] RIP: 0010:ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000005] RSP: 0018:ffff88846ca87888 EFLAGS: 00010246 [ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000 [ +0.000005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000004] RBP: ffff88846ca878a0 R08: 0000000000000000 R09: 0000000000000000 [ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164e90050 [ +0.000005] R13: ffff88846c666200 R14: 0000000000000001 R15: ffff888168402d28 [ +0.000004] FS: 00007c45ff436d00(0000) GS:ffff888409500000(0000) knlGS:0000000000000000 [ +0.000005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000004] CR2: 00007c45f7373b20 CR3: 000000012ce5a000 CR4: 0000000000f50ef0 [ +0.000005] PKRU: 55555554 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000005] ? show_regs+0x6c/0x80 [ +0.000008] ? __warn+0xd2/0x2d0 [ +0.000007] ? ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000012] ? report_bug+0x282/0x2f0 [ +0.000013] ? handle_bug+0x6e/0xc0 [ +0.000006] ? exc_invalid_op+0x18/0x50 [ +0.000008] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000011] ? ttm_bo_unpin+0x217/0x2c0 [ttm] [ +0.000011] amdgpu_bo_unpin+0x45/0x250 [amdgpu] [ +0.000216] amdgpu_userq_ioctl+0x2c3/0xd40 [amdgpu] [ +0.000226] ? drm_dev_exit+0x2d/0x60 [ +0.000010] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000201] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? lock_acquire+0x7c/0xc0 [ +0.000006] ? drm_dev_enter+0x51/0x190 [ +0.000015] drm_ioctl_kernel+0x18b/0x330 [ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000188] ? __pfx_drm_ioctl_kernel+0x10/0x10 [ +0.000006] ? lock_acquire+0x7c/0xc0 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000006] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] drm_ioctl+0x589/0xd00 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000211] ? __pfx_drm_ioctl+0x10/0x10 [ +0.000006] ? __pm_runtime_resume+0x80/0x110 [ +0.000020] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? trace_hardirqs_on+0x53/0x60 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu] [ +0.000186] __x64_sys_ioctl+0x13a/0x1c0 [ +0.000010] x64_sys_call+0x11ad/0x25f0 [ +0.000007] do_syscall_64+0x91/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] ? __pfx___rseq_handle_notify_resume+0x10/0x10 [ +0.000005] ? __pfx_blkcg_maybe_throttle_current+0x10/0x10 [ +0.000013] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? syscall_exit_to_user_mode+0x95/0x260 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000011] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? irqentry_exit_to_user_mode+0x8b/0x260 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? irqentry_exit+0x77/0xb0 [ +0.000004] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? exc_page_fault+0x93/0x150 [ +0.000010] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7c45ff924ded [ +0.000005] RSP: 002b:00007ffff7168790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded [ +0.000005] RDX: 00007ffff71687f0 RSI: 00000000c0486456 RDI: 000000000000000b [ +0.000004] RBP: 00007ffff71687e0 R08: 00005b0c2a49b010 R09: 0000000000000007 [ +0.000004] R10: 00005b0c2a4d7140 R11: 0000000000000246 R12: 000000000000000b [ +0.000004] R13: 00007c45ff19e5cc R14: 00005b0c2a51c538 R15: 00005b0c2a51bbd8 [ +0.000022] </TASK> [ +0.000005] irq event stamp: 87419 [ +0.000004] hardirqs last enabled at (87425): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0 [ +0.000005] hardirqs last disabled at (87430): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0 [ +0.000005] softirqs last enabled at (87058): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000006] softirqs last disabled at (87053): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] ---[ end trace 0000000000000000 ]--- Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu/userq: Fix lock contention in userq fence	Arunpravin Paneer Selvam
	Fix lockdep warnings. [ +0.000637] ================================ [ +0.000004] WARNING: inconsistent lock state [ +0.000004] 6.12.0+ #18 Tainted: G W OE [ +0.000004] -------------------------------- [ +0.000004] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [ +0.000004] Xwayland/1952 [HC0[0]:SC0[0]:HE1:SE1] takes: [ +0.000005] ffff8884636f4740 (&fence_drv->fence_list_lock){?...}-{2:2}, at: amdgpu_userq_fence_driver_destroy+0xb8/0x540 [amdgpu] [ +0.000208] {IN-HARDIRQ-W} state was registered at: [ +0.000004] lock_acquire.part.0+0x116/0x360 [ +0.000005] lock_acquire+0x7c/0xc0 [ +0.000005] _raw_spin_lock+0x2f/0x60 [ +0.000005] amdgpu_userq_fence_driver_process+0x75/0x400 [amdgpu] [ +0.000185] gfx_v12_0_eop_irq+0x29f/0x420 [amdgpu] [ +0.000210] amdgpu_irq_dispatch+0x2a4/0x7b0 [amdgpu] [ +0.000191] amdgpu_ih_process+0x1e1/0x3d0 [amdgpu] [ +0.000185] amdgpu_irq_handler+0x28/0xc0 [amdgpu] [ +0.000183] __handle_irq_event_percpu+0x1bb/0x590 [ +0.000005] handle_irq_event+0xab/0x1d0 [ +0.000005] handle_edge_irq+0x1fd/0xc10 [ +0.000005] __common_interrupt+0x83/0x190 [ +0.000004] common_interrupt+0xb1/0xe0 [ +0.000005] asm_common_interrupt+0x27/0x40 [ +0.000004] cpuidle_enter_state+0x2ba/0x530 [ +0.000005] cpuidle_enter+0x4f/0xb0 [ +0.000006] call_cpuidle+0x46/0xd0 [ +0.000005] do_idle+0x367/0x430 [ +0.000004] cpu_startup_entry+0x58/0x70 [ +0.000005] start_secondary+0x224/0x2b0 [ +0.000005] common_startup_64+0x13e/0x141 [ +0.000005] irq event stamp: 88271 [ +0.000004] hardirqs last enabled at (88271): [<ffffffffad9ca7a1>] _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000005] hardirqs last disabled at (88270): [<ffffffffad9ca424>] _raw_spin_lock_irqsave+0x74/0x80 [ +0.000005] softirqs last enabled at (87858): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] softirqs last disabled at (87849): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] other info that might help us debug this: [ +0.000004] Possible unsafe locking scenario: [ +0.000003] CPU0 [ +0.000004] ---- [ +0.000003] lock(&fence_drv->fence_list_lock); v2: Drop fence_list_flags and use xa_lock_irqsave() flags parameter (Christian) Fix merge conflicts. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: Avoid flooding unnecessary info messages	Wayne Lin
	It's expected that we'll encounter temporary exceptions during aux transactions. Adjust logging from drm_info to drm_dbg_dp to prevent flooding with unnecessary log messages. Fixes: 3637e457eb00 ("drm/amd/display: Fix wrong handling for AUX_DEFER case") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250513032026.838036-1-Wayne.Lin@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/i915/dp_mst: Use the correct connector while computing the link BPP ↵	Imre Deak
	limit on MST Atm, on an MST link in DSC mode intel_dp_compute_config_link_bpp_limits() calculates the maximum link bpp limit using the MST root connector's DSC capabilities. That's not correct in general: the decompression could be performed by a branch device downstream of the root branch device or the sink itself. Fix the above by passing to intel_dp_compute_config_link_bpp_limits() the actual connector being modeset, containing the correct DSC capabilities. Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Fixes: 1c5b72daff46 ("drm/i915/dp: Set the DSC link limits in intel_dp_compute_config_link_bpp_limits") Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Reviewed-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://lore.kernel.org/r/20250509180340.554867-2-imre.deak@intel.com (cherry picked from commit 266e2fcfe2ea0d062ea392cd22f6250ae0d11c04) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-05-13	drm/amdgpu: Add GFX 9.5.0 support for per-queue/pipe reset	Jesse.Zhang
	This patch enables per-queue and per-pipe reset functionality for GFX IP v9.5.0 when using MEC firmware version 21 (0x15) or later. This change: 1. Refactors the pipe reset support check in gfx_v9_4_3_pipe_reset_support() to use the compute_supported_reset flags instead of hardcoding version checks. 2. Adds support for GFX9.5.0 (IP 9.5.0) with MEC firmware version >= 21 to enable per-queue and per-pipe reset capabilities. v2: Replaced mec version check with !!(adev->gfx.compute_supported_reset & AMDGPU_RESET_TYPE_PER_PIPE) (Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: set vram type for GC 9.5.0	Tao Zhou
	Set vram type so we can take different actions according to the type. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: set flip bits for RAS bad pages	Tao Zhou
	Make the code more general, user doesn't need to pay attention to the detail of flip bits setting. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display/dc/irq: Remove duplications of hpd_ack function from IRQ	Sebastian Aguilera Novoa
	The major of dcn and dce irqs share a copy-pasted collection of copy-pasted function, which is: hpd_ack. This patch removes the multiple copy-pasted by moving them to the irq_service.c and make the irq_service's calls the functions implemented by the irq_service.c instead. The hpd_ack function is replaced by hpd0_ack and hpd1_ack, the required constants are also added. The changes were not tested on actual hardware. I am only able to verify that the changes keep the code compileable and do my best to look repeatedly if I am not actually changing any code. Signed-off-by: Sebastian Aguilera Novoa <saguileran@ime.usp.br> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: Fix null check of pipe_ctx->plane_state for update_dchubp_dpp	Melissa Wen
	Similar to commit 6a057072ddd1 ("drm/amd/display: Fix null check for pipe_ctx->plane_state in dcn20_program_pipe") that addresses a null pointer dereference on dcn20_update_dchubp_dpp. This is the same function hooked for update_dchubp_dpp in dcn401, with the same issue. Fix possible null pointer deference on dcn401_program_pipe too. Fixes: 63ab80d9ac0a ("drm/amd/display: DML2.1 Post-Si Cleanup") Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: Modify the count method of defer error	Ce Sun
	The number of newly added de counts and the number of newly added error addresses remain consistent Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdkfd: drop warning in event_interrupt_isr_v1*()	Alex Deucher
	Commit ded8b3c36f17 ("drm/amdgpu: properly handle GC vs MM in amdgpu_vmid_mgr_init()") enables all 16 vmids for MMHUB on GC 10 and newer for KGD since there are no KFD resources using MMHUB. With this change, KFD starts seeing MMHUB vmids in it's range with no pasid set. As such there is no need to warn, we can just ignore those interrupts. Fixes: aded8b3c36f1 ("drm/amdgpu: properly handle GC vs MM in amdgpu_vmid_mgr_init()") Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: Log RAS errors during load	Lijo Lazar
	During driver load, RAS event manager may not be initialized. This will cause any ATHUB event during driver load to be skipped in dmesg log. Log the error in dmesg log for easier diagnosis. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: Fix user queue deadlock by reordering mutex locking	Jesse.Zhang
	This resolves a deadlock between user queue management and GPU reset paths by enforcing consistent lock ordering. The deadlock occurred when: 1. Process exit path (amdgpu_userq_mgr_fini) would: - Take uqm->userq_mutex - Then try to take adev->userq_mutex for list operations 2. GPU reset path (amdgpu_userq_pre_reset) would: - Take adev->userq_mutex first (for list traversal) - Then take uqm->userq_mutex The solution establishes a strict top-down locking order: 1. Always take adev->userq_mutex before any uqm->userq_mutex 2. Maintain this order consistently across all code paths Changes made: - Reordered locking in amdgpu_userq_mgr_fini() to take device lock first - Kept existing proper order in amdgpu_userq_pre_reset() - Simplified the fini flow by removing redundant operations This prevents circular dependencies while maintaining thread safety during both normal operation and GPU reset scenarios. Fixes: 4ce60dbada96 ("drm/amdgpu: store userq_managers in a list in adev") Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: Fix the kernel panic caused by RAS records exceed threshold	Ce Sun
	kernel panic caused by RAS records exceeding the threshold when load driver specifying RMA(bad_page_threshold=128) 1.Fix the warnings caused by disabling the interrupt source before it was enabled 2.Fix kernel panic when xcp sysfs is not initialized,null pointer appears during fini 3.Fix the memory leak caused by the device's early exit due to rma The first reason: [ 2744.246650] ------------[ cut here ]------------ [ 2744.246651] WARNING: CPU: 0 PID: 289 at /tmp/amd.BkfTLqYV/amd/amdgpu/amdgpu_irq.c:635 amdgpu_irq_put.cold+0x42/0x6e [amdgpu] [ 2744.247108] Modules linked in: amdgpu(OE+) amddrm_ttm_helper(OE) amdttm(OE) amdxcp(OE) amddrm_buddy(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay binfmt_misc intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel nls_iso8859_1 kvm rapl isst_if_mbox_pci isst_if_mmio pmt_telemetry pmt_crashlog isst_if_common pmt_class mei_me mei acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath [ 2744.247167] linear mlx5_ib ib_uverbs ib_core ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul ghash_clmulni_intel mlx5_core sysfillrect sysimgblt aesni_intel mlxfw fb_sys_fops psample cec crypto_simd cryptd rc_core i2c_i801 nvme xhci_pci tls intel_pmt drm pci_hyperv_intf nvme_core i2c_smbus i2c_ismt xhci_pci_renesas wmi pinctrl_emmitsburg [ 2744.247194] CPU: 0 PID: 289 Comm: kworker/0:1 Tainted: G OE 5.15.0-70-generic #77-Ubuntu [ 2744.247197] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C23.AG.2 11/21/2024 [ 2744.247198] Workqueue: events work_for_cpu_fn [ 2744.247206] RIP: 0010:amdgpu_irq_put.cold+0x42/0x6e [amdgpu] [ 2744.247634] Code: 79 7f ff 44 89 ee 48 c7 c7 4d 5a 42 c2 89 55 d4 e8 90 09 bc bf 8b 55 d4 4c 89 e6 4c 89 ff e8 3c 76 7f ff 8b 55 d4 84 c0 75 07 <0f> 0b e9 95 79 7f ff 49 03 5c 24 08 f0 ff 0b 75 13 4c 89 e6 4c 89 [ 2744.247636] RSP: 0018:ffa0000019e27cb0 EFLAGS: 00010246 [ 2744.247639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ff11000150fa87c0 [ 2744.247641] RDX: 0000000000000000 RSI: ffffffffc2222430 RDI: ff1100019f200000 [ 2744.247642] RBP: ffa0000019e27ce0 R08: 0000000000000003 R09: ffffffffffe41a08 [ 2744.247643] R10: 0000000000ffff0a R11: 0000000000000001 R12: ff1100019f22ce60 [ 2744.247644] R13: 0000000000000000 R14: 00000000ffffffea R15: ff1100019f200000 [ 2744.247645] FS: 0000000000000000(0000) GS:ff11007e7e400000(0000) knlGS:0000000000000000 [ 2744.247647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2744.247649] CR2: 00007f3d2002819c CR3: 0000000006810003 CR4: 0000000000771ef0 [ 2744.247650] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2744.247651] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 2744.247652] PKRU: 55555554 [ 2744.247653] Call Trace: [ 2744.247654] <TASK> [ 2744.247656] sdma_v4_4_2_hw_fini+0x7a/0xc0 [amdgpu] [ 2744.247997] ? vcn_v4_0_3_hw_fini+0x5f/0xa0 [amdgpu] [ 2744.248336] amdgpu_ip_block_hw_fini+0x31/0x61 [amdgpu] [ 2744.248776] amdgpu_device_fini_hw+0x3bb/0x47b [amdgpu] [ 2744.249197] ? blocking_notifier_chain_unregister+0x56/0xb0 [ 2744.249202] amdgpu_driver_unload_kms+0x51/0x60 [amdgpu] [ 2744.249482] amdgpu_driver_load_kms.cold+0x18/0x2e [amdgpu] [ 2744.249913] amdgpu_pci_probe+0x23e/0x590 [amdgpu] [ 2744.250187] local_pci_probe+0x48/0x90 [ 2744.250191] work_for_cpu_fn+0x17/0x30 [ 2744.250196] process_one_work+0x228/0x3d0 [ 2744.250198] worker_thread+0x223/0x420 [ 2744.250200] ? process_one_work+0x3d0/0x3d0 [ 2744.250201] kthread+0x127/0x150 [ 2744.250204] ? set_kthread_struct+0x50/0x50 [ 2744.250207] ret_from_fork+0x1f/0x30 [ 2744.250212] </TASK> [ 2744.250213] ---[ end trace 488c997a88508bc3 ]--- The second reason: [ 5139.303446] Memory manager not clean during takedown. [ 5139.303509] WARNING: CPU: 145 PID: 117699 at drivers/gpu/drm/drm_mm.c:998 drm_mm_takedown+0x27/0x30 [drm] [ 5139.303542] Modules linked in: amdgpu(OE+) amddrm_ttm_helper(OE) amdttm(OE) amdxcp(OE) amddrm_buddy(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel binfmt_misc kvm nls_iso8859_1 rapl isst_if_mbox_pci pmt_telemetry pmt_crashlog isst_if_mmio pmt_class isst_if_common mei_me mei acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath [ 5139.303572] linear mlx5_ib ib_uverbs ib_core crct10dif_pclmul ast crc32_pclmul i2c_algo_bit ghash_clmulni_intel aesni_intel crypto_simd drm_vram_helper cryptd drm_ttm_helper mlx5_core ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core mlxfw psample intel_pmt nvme xhci_pci drm tls i2c_i801 pci_hyperv_intf nvme_core i2c_smbus i2c_ismt xhci_pci_renesas wmi pinctrl_emmitsburg [last unloaded: amdkcl] [ 5139.303588] CPU: 145 PID: 117699 Comm: modprobe Tainted: G U OE 5.15.0-70-generic #77-Ubuntu [ 5139.303590] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C23.AG.2 11/21/2024 [ 5139.303591] RIP: 0010:drm_mm_takedown+0x27/0x30 [drm] [ 5139.303605] Code: cc 66 90 0f 1f 44 00 00 48 8b 47 38 48 83 c7 38 48 39 f8 75 05 c3 cc cc cc cc 55 48 c7 c7 18 d0 10 c0 48 89 e5 e8 5a bc c3 c1 <0f> 0b 5d c3 cc cc cc cc 90 0f 1f 44 00 00 55 b9 15 00 00 00 48 89 [ 5139.303607] RSP: 0018:ffa00000325c3940 EFLAGS: 00010286 [ 5139.303608] RAX: 0000000000000000 RBX: ff1100012f5cecb0 RCX: 0000000000000027 [ 5139.303609] RDX: ff11007e7fa60588 RSI: 0000000000000001 RDI: ff11007e7fa60580 [ 5139.303610] RBP: ffa00000325c3940 R08: 0000000000000003 R09: fffffffff00c2b78 [ 5139.303610] R10: 000000000000002b R11: 0000000000000001 R12: ff1100012f5cec00 [ 5139.303611] R13: ff1100012138f068 R14: 0000000000000000 R15: ff1100012f5cec90 [ 5139.303611] FS: 00007f42ffca0000(0000) GS:ff11007e7fa40000(0000) knlGS:0000000000000000 [ 5139.303612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5139.303613] CR2: 00007f23d945ab68 CR3: 00000001212ce005 CR4: 0000000000771ee0 [ 5139.303614] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5139.303615] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 5139.303615] PKRU: 55555554 [ 5139.303616] Call Trace: [ 5139.303617] <TASK> [ 5139.303619] amdttm_range_man_fini_nocheck+0xfe/0x1c0 [amdttm] [ 5139.303625] amdgpu_ttm_fini+0x2ed/0x390 [amdgpu] [ 5139.303800] amdgpu_bo_fini+0x27/0xc0 [amdgpu] [ 5139.303959] gmc_v9_0_sw_fini+0x63/0x90 [amdgpu] [ 5139.304144] amdgpu_device_fini_sw+0x125/0x6a0 [amdgpu] [ 5139.304302] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] [ 5139.304455] devm_drm_dev_init_release+0x4a/0x80 [drm] [ 5139.304472] devm_action_release+0x12/0x20 [ 5139.304476] release_nodes+0x3d/0xb0 [ 5139.304478] devres_release_all+0x9b/0xd0 [ 5139.304480] really_probe+0x11d/0x420 [ 5139.304483] __driver_probe_device+0x119/0x190 [ 5139.304485] driver_probe_device+0x23/0xc0 [ 5139.304487] __driver_attach+0xf7/0x1f0 [ 5139.304489] ? __device_attach_driver+0x140/0x140 [ 5139.304491] bus_for_each_dev+0x7c/0xd0 [ 5139.304493] driver_attach+0x1e/0x30 [ 5139.304494] bus_add_driver+0x148/0x220 [ 5139.304496] driver_register+0x95/0x100 [ 5139.304498] __pci_register_driver+0x68/0x70 [ 5139.304500] amdgpu_init+0xbc/0x1000 [amdgpu] [ 5139.304655] ? 0xffffffffc0b8f000 [ 5139.304657] do_one_initcall+0x46/0x1e0 [ 5139.304659] ? kmem_cache_alloc_trace+0x19e/0x2e0 [ 5139.304663] do_init_module+0x52/0x260 [ 5139.304665] load_module+0xb2b/0xbc0 [ 5139.304667] __do_sys_finit_module+0xbf/0x120 [ 5139.304669] __x64_sys_finit_module+0x18/0x20 [ 5139.304670] do_syscall_64+0x59/0xc0 [ 5139.304673] ? exit_to_user_mode_prepare+0x37/0xb0 [ 5139.304676] ? syscall_exit_to_user_mode+0x27/0x50 [ 5139.304678] ? __x64_sys_mmap+0x33/0x50 [ 5139.304680] ? do_syscall_64+0x69/0xc0 [ 5139.304681] entry_SYSCALL_64_after_hwframe+0x61/0xcb [ 5139.304684] RIP: 0033:0x7f42ffdbf88d [ 5139.304686] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48 [ 5139.304687] RSP: 002b:00007ffcb7427158 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 5139.304688] RAX: ffffffffffffffda RBX: 000055ce8b8f3150 RCX: 00007f42ffdbf88d [ 5139.304689] RDX: 0000000000000000 RSI: 000055ce8b8f9a70 RDI: 000000000000000a [ 5139.304690] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000011 [ 5139.304690] R10: 000000000000000a R11: 0000000000000246 R12: 000055ce8b8f9a70 [ 5139.304691] R13: 000055ce8b8f2ec0 R14: 000055ce8b8f2ab0 R15: 000055ce8b8f9aa0 [ 5139.304692] </TASK> [ 5139.304693] ---[ end trace 8536b052f7883003 ]--- Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgu: get RAS retire flip bits for new type of HBM	Tao Zhou
	Get RAS retire flip bits for HBM with different types in various NPS modes. Also set flip row bit and MCA R13 bit in PA in different NPS modes. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: implement get_retire_flip_bits for UMC v12	Tao Zhou
	The RAS bad page retire flip bits can be set per vram type, vram vendor and NPS mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: add get_retire_flip_bits for UMC	Tao Zhou
	Add the general interface to get flip bits for RAS bad page retirement. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: add vcn v5_0_0 ip headers	fanhuang
	Add vcn v5_0_0 register offset and shift masks header files Only include the registers required for MMSCH initialization Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amdgpu: adjust high bits for RAS retired page	Tao Zhou
	Per UMC address conversion algorithm, the high row bits of UMC MCA address are changed when they're converted into normalized address on specific ASICs. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd: add definition for new memory type	Tao Zhou
	Support new version of HBM. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	Refine RAS bad page records counting and parsing in eeprom V3	ganglxie
	there is only MCA records in V3, no need to care about PA records. recalculate the value of ras_num_bad_pages when parsing failed and go on with the left records instead of quit. Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: Promote DC to 3.2.333	Taimur Hassan
	Summary * Refactor DMI quirks * Fix link-off issue triggered by quick unplug/replug * Fix race condition when submitting DMUB commands * Correct reply value when AUX Write incomplete * Backup / restore plane config only on update Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: Add early 8b/10b channel equalization test pattern sequence	Michael Strauss
	[WHY] Early EQ pattern sequence is required for some LTTPR + old dongle combinations. [HOW] If DP_EARLY_8B10B_TPS2 chip cap is set, this new sequence programs phy to output TPS2 before initiating link training and writes TPS1 to LTTPR training pattern register as instructed by vendor. Add function to get embedded LTTPR target address offset. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: TungYu Lu <tungyu.lu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: Program triplebuffer on all pipes	Sung Lee
	[WHY] Triplebuffer should be programmed on all pipes. Some code assumed it only needed to be called on top pipe, but as the HWSS function does not account for that, it must be called on every pipe. [HOW] Remove condition to not program triplebuffer on non-top/next pipe. Call the function unconditionally on all pipes. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Sung Lee <Sung.Lee@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: [FW Promotion] Release 0.1.10.0	Taimur Hassan
	Refactoring some IPS and panel replay structs Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: disable EASF narrow filter sharpening	Samson Tam
	[Why & How] Default should be 1 to disable EASF narrow filter sharpening. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: Return the exact value for debugging	Wayne Lin
	[Why] It's unnecessary to set operation_result as invalid reply when p_notify->result != AUX_RET_SUCCESS. [How] Set operation_result as p_notify->result to better understand the reason for the error Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: Restructure DMI quirks	Mario Limonciello
	[Why] DMI quirks are relatively big code that makes amdgpu_dm 200 lines larger. [How] Move DMI quirks into a dedicated source file and make all quirks variables for `struct amdgpu_display_manager`. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13	drm/amd/display: check stream id dml21 wrapper to get plane_id	Aurabindo Pillai
	[Why & How] Fix a false positive warning which occurs due to lack of correct checks when querying plane_id in DML21. This fixes the warning when performing a mode1 reset (cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover): [ 35.751250] WARNING: CPU: 11 PID: 326 at /tmp/amd.PHpyAl7v/amd/amdgpu/../display/dc/dml2/dml2_dc_resource_mgmt.c:91 dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu] [ 35.751434] Modules linked in: amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) amddrm_buddy(OE) amdxcp(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core i2c_algo_bit rfcomm qrtr cmac algif_hash algif_skcipher af_alg bnep amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm_amd snd_hda_core snd_hwdep snd_pcm kvm snd_seq_midi snd_seq_midi_event snd_rawmidi crct10dif_pclmul polyval_clmulni polyval_generic btusb ghash_clmulni_intel sha256_ssse3 btrtl sha1_ssse3 snd_seq btintel aesni_intel btbcm btmtk snd_seq_device crypto_simd sunrpc cryptd bluetooth snd_timer ccp binfmt_misc rapl snd i2c_piix4 wmi_bmof gigabyte_wmi k10temp i2c_smbus soundcore gpio_amdpt mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid crc32_pclmul igc ahci xhci_pci libahci xhci_pci_renesas video wmi [ 35.751501] CPU: 11 UID: 0 PID: 326 Comm: kworker/u64:9 Tainted: G OE 6.11.0-21-generic #21~24.04.1-Ubuntu [ 35.751504] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 35.751505] Hardware name: Gigabyte Technology Co., Ltd. X670E AORUS PRO X/X670E AORUS PRO X, BIOS F30 05/22/2024 [ 35.751506] Workqueue: amdgpu-reset-dev amdgpu_debugfs_reset_work [amdgpu] [ 35.751638] RIP: 0010:dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu] [ 35.751794] Code: 6d 0c 00 00 8b 84 24 88 00 00 00 41 3b 44 9c 20 0f 84 fc 07 00 00 48 83 c3 01 48 83 fb 06 75 b3 4c 8b 64 24 68 4c 8b 6c 24 40 <0f> 0b b8 06 00 00 00 49 8b 94 24 a0 49 00 00 89 c3 83 f8 07 0f 87 [ 35.751796] RSP: 0018:ffffbfa3805d7680 EFLAGS: 00010246 [ 35.751798] RAX: 0000000000010000 RBX: 0000000000000006 RCX: 0000000000000000 [ 35.751799] RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000000 [ 35.751800] RBP: ffffbfa3805d78f0 R08: 0000000000000000 R09: 0000000000000000 [ 35.751801] R10: 0000000000000000 R11: 0000000000000000 R12: ffffbfa383249000 [ 35.751802] R13: ffffa0e68f280000 R14: ffffbfa383249658 R15: 0000000000000000 [ 35.751803] FS: 0000000000000000(0000) GS:ffffa0edbe580000(0000) knlGS:0000000000000000 [ 35.751804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 35.751805] CR2: 00005d847ef96c58 CR3: 000000041de3e000 CR4: 0000000000f50ef0 [ 35.751806] PKRU: 55555554 [ 35.751807] Call Trace: [ 35.751810] <TASK> [ 35.751816] ? show_regs+0x6c/0x80 [ 35.751820] ? __warn+0x88/0x140 [ 35.751822] ? dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu] [ 35.751964] ? report_bug+0x182/0x1b0 [ 35.751969] ? handle_bug+0x6e/0xb0 [ 35.751972] ? exc_invalid_op+0x18/0x80 [ 35.751974] ? asm_exc_invalid_op+0x1b/0x20 [ 35.751978] ? dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu] [ 35.752117] ? math_pow+0x48/0xa0 [amdgpu] [ 35.752256] ? srso_alias_return_thunk+0x5/0xfbef5 [ 35.752260] ? math_pow+0x48/0xa0 [amdgpu] [ 35.752400] ? srso_alias_return_thunk+0x5/0xfbef5 [ 35.752403] ? math_pow+0x11/0xa0 [amdgpu] [ 35.752524] ? srso_alias_return_thunk+0x5/0xfbef5 [ 35.752526] ? core_dcn4_mode_programming+0xe4d/0x20d0 [amdgpu] [ 35.752663] ? srso_alias_return_thunk+0x5/0xfbef5 [ 35.752669] dml21_validate+0x3d4/0x980 [amdgpu] Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>