summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-06-30drm/amdgpu: move guilty handling into ring resetsAlex Deucher
Move guilty logic into the ring reset callbacks. This allows each ring reset callback to better handle fence errors and force completions in line with the reset behavior for each IP. It also allows us to remove the ring guilty callback since that logic now lives in the reset callback. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: move force completion into ring resetsAlex Deucher
Move the force completion handling into each ring reset function so that each engine can determine whether or not it needs to force completion on the jobs in the ring. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/panthor: Wait for _READY register when powering onSteven Price
panthor_gpu_block_power_on() takes a register offset (rdy_reg) for the purpose of waiting for the power transition to complete. However, a copy/paste error converting to use the new 64 register functions switched it to using the pwrtrans_reg register instead. Fix the function to use the correct register. Fixes: 4d230aa209ed ("drm/panthor: Add 64-bit and poll register accessors") Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/r/20250630140704.432409-1-steven.price@arm.com
2025-06-30drm/amdgpu: rework queue reset scheduler interactionChristian König
Stopping the scheduler for queue reset is generally a good idea because it prevents any worker from touching the ring buffer. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: update ring reset function signatureAlex Deucher
Going forward, we'll need more than just the vmid. Add the guilty amdgpu_fence. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: remove job parameter from amdgpu_fence_emit()Alex Deucher
What we actually care about is the amdgpu_fence object so pass that in explicitly to avoid possible mistakes in the future. The job_run_counter handling can be safely removed at this point as we no longer support job resubmission. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdkfd: add hqd_sdma_get_doorbell callbacks for gfx7/8Alex Deucher
These were missed when support was added for other generations. The callbacks are called unconditionally so we need to make sure all generations have them. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4304 Link: https://github.com/ROCm/ROCm/issues/4965 Fixes: bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+") Cc: Jonathan Kim <jonathan.kim@amd.com> Reported-by: Johl Brown <johlbrown@gmail.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Fix memory leak in amdgpu_ctx_mgr_entity_finiLin.Cao
patch dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") removed ctx put, which will cause amdgpu_ctx_fini() cannot be called and then cause some finished fence that added by amdgpu_ctx_add_fence() cannot be released and cause memleak. Fixes: dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check") Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: indent an if statementDan Carpenter
The "return true;" line wasn't indented. Also checkpatch likes when we align the && conditions. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Use dma_buf from GEM object instanceThomas Zimmermann
Avoid dereferencing struct drm_gem_object.import_attach for the imported dma-buf. The dma_buf field in the GEM object instance refers to the same buffer. Prepares to make import_attach an implementation detail of PRIME. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Test for imported buffers with drm_gem_is_imported()Thomas Zimmermann
Instead of testing import_attach for imported GEM buffers, invoke drm_gem_is_imported() to do the test. v2: - keep amdgpu_bo_print_info() as-is (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Convert from DRM_* to dev_*Lijo Lazar
Convert from generic DRM_* to dev_* calls to have device context info. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd/pm: Fetch SMUv13.0.12 xgmi max speed/widthLijo Lazar
On SMU v13.0.12 SOCs, fetch the max values of xgmi speed/width from firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdkfd: Don't call mmput from MMU notifier callbackPhilip Yang
If the process is exiting, the mmput inside mmu notifier callback from compactd or fork or numa balancing could release the last reference of mm struct to call exit_mmap and free_pgtable, this triggers deadlock with below backtrace. The deadlock will leak kfd process as mmu notifier release is not called and cause VRAM leaking. The fix is to take mm reference mmget_non_zero when adding prange to the deferred list to pair with mmput in deferred list work. If prange split and add into pchild list, the pchild work_item.mm is not used, so remove the mm parameter from svm_range_unmap_split and svm_range_add_child. The backtrace of hung task: INFO: task python:348105 blocked for more than 64512 seconds. Call Trace: __schedule+0x1c3/0x550 schedule+0x46/0xb0 rwsem_down_write_slowpath+0x24b/0x4c0 unlink_anon_vmas+0xb1/0x1c0 free_pgtables+0xa9/0x130 exit_mmap+0xbc/0x1a0 mmput+0x5a/0x140 svm_range_cpu_invalidate_pagetables+0x2b/0x40 [amdgpu] mn_itree_invalidate+0x72/0xc0 __mmu_notifier_invalidate_range_start+0x48/0x60 try_to_unmap_one+0x10fa/0x1400 rmap_walk_anon+0x196/0x460 try_to_unmap+0xbb/0x210 migrate_page_unmap+0x54d/0x7e0 migrate_pages_batch+0x1c3/0xae0 migrate_pages_sync+0x98/0x240 migrate_pages+0x25c/0x520 compact_zone+0x29d/0x590 compact_zone_order+0xb6/0xf0 try_to_compact_pages+0xbe/0x220 __alloc_pages_direct_compact+0x96/0x1a0 __alloc_pages_slowpath+0x410/0x930 __alloc_pages_nodemask+0x3a9/0x3e0 do_huge_pmd_anonymous_page+0xd7/0x3e0 __handle_mm_fault+0x5e3/0x5f0 handle_mm_fault+0xf7/0x2e0 hmm_vma_fault.isra.0+0x4d/0xa0 walk_pmd_range.isra.0+0xa8/0x310 walk_pud_range+0x167/0x240 walk_pgd_range+0x55/0x100 __walk_page_range+0x87/0x90 walk_page_range+0xf6/0x160 hmm_range_fault+0x4f/0x90 amdgpu_hmm_range_get_pages+0x123/0x230 [amdgpu] amdgpu_ttm_tt_get_user_pages+0xb1/0x150 [amdgpu] init_user_pages+0xb1/0x2a0 [amdgpu] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x543/0x7d0 [amdgpu] kfd_ioctl_alloc_memory_of_gpu+0x24c/0x4e0 [amdgpu] kfd_ioctl+0x29d/0x500 [amdgpu] Fixes: fa582c6f3684 ("drm/amdkfd: Use mmget_not_zero in MMU notifier") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Include sdma_4_4_4.binKent Russell
This got missed during SDMA 4.4.4 support. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Include <linux/export.h> when neededAndré Almeida
Fix the following compile time warning when building with W=1: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amd: Do not include <linux/export.h> when unusedAndré Almeida
Fix the following compile time warning when building with W=1: warning: EXPORT_SYMBOL() is not used, but #include <linux/export.h> is present Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30amdkfd: MTYPE_UC for ext-coherent system memoryDavid Yat Sin
Set memory mtype to UC host memory when ext-coherent flag is set and memory is registered as a SVM allocation. Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: David Yat Sin <David.YatSin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu/sdma5.x: suspend KFD queues in ring resetAlex Deucher
SDMA 5.x only supports engine soft reset which resets all queues on the engine. As such, we need to suspend KFD queues around resets like we do for SDMA 4.x. Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/panel: raydium-rm67200: Add missing drm_display_mode flagsAndy Yan
Add missing drm_display_mode DRM_MODE_FLAG_NVSYNC | DRM_MODE_FLAG_NHSYNC flags. Those are used by various bridges(e.g. dw-mipi-dsi) in the pipeline to correctly configure its sync signals polarity. Tested on rk3568/rk3576/rk3588 EVB. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250618080955.691048-1-andyshrk@163.com
2025-06-30drm/panel: raydium-rm67200: Move initialization from enable() to prepare stageAndy Yan
The DSI host has different modes in prepare() and enable() functions, prepare() is in LP command mode and enable() is in HS video mode. >From our experience, generally the initialization sequence needs to be sent in the LP command mode. Move the setup init function from enable() to prepare() to fix a display shift on rk3568 evb. Tested on rk3568/rk3576/rk3588 EVB. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250618091520.691590-1-andyshrk@163.com
2025-06-30drivers/panel: raydium-rm67200: Make reset-gpio optionalAndy Yan
Although the datasheet of the panel module describes that it has a reset pin, in the actual hardware design, we often use an RC circuit to control the reset, and rarely use GPIO to control the reset. This is the way it is done on our numerous development boards (such as RK3568/RK3576 EVB). So make the reset-gpio optional. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250616070536.670519-2-andyshrk@163.com
2025-06-30dt-bindings: display: panel: Make reset-gpio as optional for Raydium RM67200Andy Yan
Although the datasheet of the panel module describes that it has a reset pin, in the actual hardware design, we often use an RC circuit to control the reset, and rarely use GPIO to control the reset. This is the way it is done on our numerous development boards (such as RK3568, RK3576 EVB). So make the reset-gpio optional. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250616070536.670519-1-andyshrk@163.com
2025-06-30drm/panel: Add driver for DJN HX83112B LCD panelLuca Weiss
Add support for the 2160x1080 LCD panel from DJN (98-03057-6598B-I) bundled with a HX83112B driver IC, as found on the Fairphone 3 smartphone. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Luca Weiss <luca@lucaweiss.eu> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250611-fp3-display-v4-3-ef67701e7687@lucaweiss.eu
2025-06-30dt-bindings: display: panel: Add Himax HX83112BLuca Weiss
Himax HX83112B is a display driver IC used to drive LCD DSI panels. Describe it and the Fairphone 3 panel (98-03057-6598B-I) from DJN using it. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luca Weiss <luca@lucaweiss.eu> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250611-fp3-display-v4-2-ef67701e7687@lucaweiss.eu
2025-06-30dt-bindings: vendor-prefixes: document Shenzhen DJN Optronics TechnologyLuca Weiss
Add the vendor prefix for DJN (http://en.djnlcd.com/). Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luca Weiss <luca@lucaweiss.eu> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250611-fp3-display-v4-1-ef67701e7687@lucaweiss.eu
2025-06-30drm/sched/tests: Make timedout_job callback a better role modelPhilipp Stanner
Since the drm_mock_scheduler does not have real users in userspace, nor does it have real hardware or firmware rings, it's not necessary to signal timedout fences nor free jobs - from a functional standpoint. Still, the dma_fence framework establishes the hard rule that all fences must always get signaled. The unit tests, moreover, should as much as possible represent the intended usage of the scheduler API. Furthermore, this later enables simplifying the mock scheduler's teardown code path. Make sure timed out hardware fences get signaled with the appropriate error code. Signed-off-by: Philipp Stanner <phasta@kernel.org> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250605134154.191764-2-phasta@kernel.org
2025-06-30drm/i915/backlight: Use drm_edp_backlight_enableSuraj Kandpal
Use drm dp helper to enable backlight now that it has been modified to set PANEL_LUMINANCE_CONTROL_ENABLE bit based on if capability supports it and the driver wants it. Remove the dead code. Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-14-suraj.kandpal@intel.com
2025-06-30drm/i915/backlight: Use drm helper to set edp backlightSuraj Kandpal
Now that the drm helper sets the backlight using luminance too we can use that. Remove the obselete function. Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-13-suraj.kandpal@intel.com
2025-06-30drm/i915/backlight: Use drm helper to initialize edp backlightSuraj Kandpal
Now that drm_edp_backlight init has been modified to take into account the setup of lumininace based brightness manipulation we can just use that. --v2 -Fix commit message [Arun] Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-12-suraj.kandpal@intel.com
2025-06-30drm/dp: Enable backlight control using luminanceSuraj Kandpal
Add flag to enable brightness control via luminance value when enabling edp backlight. Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-11-suraj.kandpal@intel.com
2025-06-30drm/dp: Change argument type of drm_edp_backlight_enableSuraj Kandpal
Change the argument type to u32 for the default level being sent since it has to now account for luminance value which has to be set for DP_EDP_PANEL_LUMINANCE_TARGET_VALUE. --v2 -No need to typecast [Jani] Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-10-suraj.kandpal@intel.com
2025-06-30drm/dp: Modify drm_edp_backlight_set_levelSuraj Kandpal
Modify drm_edp_backlight_set_level to be able to set the value for register in DP_EDP_PANEL_TARGET_LUMINANCE_VALUE. We multiply the level with 1000 since we get the value in Nits and the register accepts it in milliNits. --v2 -Add comment regarding the unit [Arun] Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-9-suraj.kandpal@intel.com
2025-06-30drm/dp: Change argument type for drm_edp_backlight_set_levelSuraj Kandpal
Use u32 for level variable as one may need to pass value for DP_EDP_PANEL_TARGET_LUMINANCE_VALUE. --v2 -Typecase is not needed [Jani] Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-8-suraj.kandpal@intel.com
2025-06-30drm/dp: Modify drm_edp_probe_stateSuraj Kandpal
Modify drm_edp_probe_state to read current level from DP_EDP_PANEL_TARGET_LUMINANCE_VALUE. We divide it by 1000 since the value in this register is in millinits. --v2 -Add comment on the unit sent back [Arun] Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-7-suraj.kandpal@intel.com
2025-06-30drm/dp: Change current_level argument type to u32Suraj Kandpal
Change the current_level argument type to u32 from u16 since it can now carry the value which it gets from DP_EDP_PANEL_TARGET_LUMINANCE_VALUE. Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-6-suraj.kandpal@intel.com
2025-06-30drm/dp: Move from u16 to u32 for max in drm_edp_backlight_infoSuraj Kandpal
Use u32 instead of u16 for max variable in drm_edp_backlight_info since it can now hold max luminance range value which is u32. We will set this max with max_luminance value when luminance_set is true. Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-5-suraj.kandpal@intel.com
2025-06-30drm/dp: Add argument for max luminance in drm_edp_backlight_initSuraj Kandpal
Add new argument to drm_edp_backlight_init which gives the max_luminance which will be needed to set the max values for backlight. --v2 -Use pass only max luminance instead of luminance_range_info struct [Arun] Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-4-suraj.kandpal@intel.com
2025-06-30drm/dp: Add argument in drm_edp_backlight_initSuraj Kandpal
Add bool argument in drm_edp_backlight init to provide the drivers option to choose if they want to use luminance values to manipulate brightness. Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-3-suraj.kandpal@intel.com
2025-06-30drm/dp: Introduce new member in drm_backlight_infoSuraj Kandpal
Introduce luminance_set flag which indicates if we can manipulate backlight using luminance value or not which is only possible after eDP v1.5. Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://lore.kernel.org/r/20250620063445.3603086-2-suraj.kandpal@intel.com
2025-06-30drm/vmwgfx: drop printing the TTM refcount for debuggingChristian König
That is something TTM internal which is about to get dropped. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Ian Forbes <ian.forbes@broadcom.com> Link: https://lore.kernel.org/r/20250616130726.22863-1-christian.koenig@amd.com
2025-06-30drm/i915/gt: Fix timeline left held on VMA alloc errorJanusz Krzysztofik
The following error has been reported sporadically by CI when a test unbinds the i915 driver on a ring submission platform: <4> [239.330153] ------------[ cut here ]------------ <4> [239.330166] i915 0000:00:02.0: [drm] drm_WARN_ON(dev_priv->mm.shrink_count) <4> [239.330196] WARNING: CPU: 1 PID: 18570 at drivers/gpu/drm/i915/i915_gem.c:1309 i915_gem_cleanup_early+0x13e/0x150 [i915] ... <4> [239.330640] RIP: 0010:i915_gem_cleanup_early+0x13e/0x150 [i915] ... <4> [239.330942] Call Trace: <4> [239.330944] <TASK> <4> [239.330949] i915_driver_late_release+0x2b/0xa0 [i915] <4> [239.331202] i915_driver_release+0x86/0xa0 [i915] <4> [239.331482] devm_drm_dev_init_release+0x61/0x90 <4> [239.331494] devm_action_release+0x15/0x30 <4> [239.331504] release_nodes+0x3d/0x120 <4> [239.331517] devres_release_all+0x96/0xd0 <4> [239.331533] device_unbind_cleanup+0x12/0x80 <4> [239.331543] device_release_driver_internal+0x23a/0x280 <4> [239.331550] ? bus_find_device+0xa5/0xe0 <4> [239.331563] device_driver_detach+0x14/0x20 ... <4> [357.719679] ---[ end trace 0000000000000000 ]--- If the test also unloads the i915 module then that's followed with: <3> [357.787478] ============================================================================= <3> [357.788006] BUG i915_vma (Tainted: G U W N ): Objects remaining on __kmem_cache_shutdown() <3> [357.788031] ----------------------------------------------------------------------------- <3> [357.788204] Object 0xffff888109e7f480 @offset=29824 <3> [357.788670] Allocated in i915_vma_instance+0xee/0xc10 [i915] age=292729 cpu=4 pid=2244 <4> [357.788994] i915_vma_instance+0xee/0xc10 [i915] <4> [357.789290] init_status_page+0x7b/0x420 [i915] <4> [357.789532] intel_engines_init+0x1d8/0x980 [i915] <4> [357.789772] intel_gt_init+0x175/0x450 [i915] <4> [357.790014] i915_gem_init+0x113/0x340 [i915] <4> [357.790281] i915_driver_probe+0x847/0xed0 [i915] <4> [357.790504] i915_pci_probe+0xe6/0x220 [i915] ... Closer analysis of CI results history has revealed a dependency of the error on a few IGT tests, namely: - igt@api_intel_allocator@fork-simple-stress-signal, - igt@api_intel_allocator@two-level-inception-interruptible, - igt@gem_linear_blits@interruptible, - igt@prime_mmap_coherency@ioctl-errors, which invisibly trigger the issue, then exhibited with first driver unbind attempt. All of the above tests perform actions which are actively interrupted with signals. Further debugging has allowed to narrow that scope down to DRM_IOCTL_I915_GEM_EXECBUFFER2, and ring_context_alloc(), specific to ring submission, in particular. If successful then that function, or its execlists or GuC submission equivalent, is supposed to be called only once per GEM context engine, followed by raise of a flag that prevents the function from being called again. The function is expected to unwind its internal errors itself, so it may be safely called once more after it returns an error. In case of ring submission, the function first gets a reference to the engine's legacy timeline and then allocates a VMA. If the VMA allocation fails, e.g. when i915_vma_instance() called from inside is interrupted with a signal, then ring_context_alloc() fails, leaving the timeline held referenced. On next I915_GEM_EXECBUFFER2 IOCTL, another reference to the timeline is got, and only that last one is put on successful completion. As a consequence, the legacy timeline, with its underlying engine status page's VMA object, is still held and not released on driver unbind. Get the legacy timeline only after successful allocation of the context engine's VMA. v2: Add a note on other submission methods (Krzysztof Karas): Both execlists and GuC submission use lrc_alloc() which seems free from a similar issue. Fixes: 75d0a7f31eec ("drm/i915: Lift timeline into intel_context") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12061 Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Reviewed-by: Krzysztof Niemiec <krzysztof.niemiec@intel.com> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250611104352.1014011-2-janusz.krzysztofik@linux.intel.com
2025-06-30dt-bindings: display: vop2: Add optional PLL clock property for rk3576Cristian Ciocaltea
As with the RK3588 SoC, RK3576 also allows the use of HDMI PHY PLL as an alternative and more accurate pixel clock source for VOP2. Document the optional PLL clock property. Moreover, given that this is part of a series intended to address some recent display problems, provide the appropriate tags to facilitate backporting. Fixes: c3b7c5a4d7c1 ("dt-bindings: display: vop2: Add rk3576 support") Cc: stable@vger.kernel.org Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Reviewed-by: "Rob Herring (Arm)" <robh@kernel.org> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250612-rk3576-hdmitx-fix-v1-1-4b11007d8675@collabora.com
2025-06-30drm/i915/display: Fix macro HAS_ULTRAJOINERAnkit Nautiyal
Currently, Ultrajoiner is supported only on Xe2_HPD. Update the HAS_ULTRAJOINER macro to reflect the same. v2: Clarify the commit message to specify platform. (Jani) Bspec: 69556 Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Reviewed-by: Karthik B S <karthik.b.s@intel.com> Link: https://lore.kernel.org/r/20250611053039.377695-1-ankit.k.nautiyal@intel.com
2025-06-29drm/ci: i915: cml: Fix the runner tagVignesh Raman
The GitLab runner tags are case sensitive, and Flip-hatch's tag was incorrectly lowercase. This prevented jobs from being picked up by the runner. Fix the runner tag for Flip-hatch. Based on https://gitlab.freedesktop.org/mesa/mesa/-/commit/03b480d3 Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-29drm/ci: Remove sdm845/cheza jobsRob Clark
These runners are no more. So remove the jobs. Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-06-28drm/ci: uprev mesa and ci-templatesVignesh Raman
The current s3cp stopped working after the migration. Update to the latest mesa and ci-templates to get s3cp working again and adapt to recent changes in mesa-ci. Acked-by: Daniel Stone <daniels@collabora.com> Acked-by: Helen Koike <helen.fornazier@gmail.com> Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-28drm/ci: python-artifacts: use shallow cloneVignesh Raman
The python-artifacts job has a timeout of 10 minutes, which causes build failures as it was unable to clone the repository within the specified limits. Set GIT_DEPTH to 10 to speed up cloning and avoid build failures due to timeouts when fetching the full repository. Acked-by: Daniel Stone <daniels@collabora.com> Acked-by: Helen Koike <helen.fornazier@gmail.com> Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-28Merge remote-tracking branch 'drm/drm-next' into msm-nextRob Clark
Back-merge drm-next to (indirectly) get arm-smmu updates for making stall-on-fault more reliable. Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
2025-06-28drm/xe: Fix conflicting intel_pcode_* symbolsLucas De Marchi
If CONFIG_DRM_XE_DISPLAY is set, the xe module can only be built as module to avoid duplicate symbols from i915. The interface for pcode was added without considering that, so the build breaks if both xe and i915 are built-in. Since the intel_pcode_* functions should only be called from the display side (xe side should call the xe interface directly) and there's already a protection in Kconfig to avoid the problematic configuration, ifdef it out in case CONFIG_DRM_XE_DISPLAY is disabled. Closes: https://lore.kernel.org/r/3667a992-a24b-4e49-aab2-5ca73f2c0a56@infradead.org Fixes: d9465cc8ac2d ("drm/xe/pcode: add struct drm_device based interface") Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250627-xe-kunit-v2-1-756fe5cd56cf@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>