summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2024-01-31drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'Srinivasan Shanmugam
Add a NULL check for the kzalloc call that allocates memory for dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously, if kzalloc failed to allocate memory and returned NULL, the code would attempt to use the NULL pointer. The fix is to check if kzalloc returns NULL, and if so, log an error message and skip the rest of the current loop iteration with the continue statement. This prevents the code from attempting to use the NULL pointer. Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202401300629.ICnCt983-lkp@intel.com/ Fixes: 135fd1b35690 ("drm/amd/display: Reduce stack size") Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd: Don't init MEC2 firmware when it fails to loadDavid McFarland
The same calls are made directly above, but conditional on the firmware loading and validating successfully. Cc: stable@vger.kernel.org Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init") Signed-off-by: David McFarland <corngood@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: Fix the warning info in mode1 resetMa Jun
Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: Fix dcn35 8k30 Underflow/Corruption IssueFangzhi Zuo
[why] odm calculation is missing for pipe split policy determination and cause Underflow/Corruption issue. [how] Add the odm calculation. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: Underflow workaround by increasing SR exit latencyNicholas Susanto
[Why] On 14us for exit latency time causes underflow for 8K monitor with HDR on. Increasing the latency to 28us fixes the underflow. [How] Increase the latency to 28us. This workaround should be sufficient before we figure out why SR exit so long. Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Nicholas Susanto <nicholas.susanto@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: fix incorrect mpc_combine array sizeWenjing Liu
[why] MAX_SURFACES is per stream, while MAX_PLANES is per asic. The mpc_combine is an array that records all the planes per asic. Therefore MAX_PLANES should be used as the array size. Using MAX_SURFACES causes array overflow when there are more than 3 planes. [how] Use the MAX_PLANES for the mpc_combine array size. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com> Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: Fix DPSTREAM CLK on and off sequenceDmytro Laktyushkin
[Why] Secondary DP2 display fails to light up in some instances [How] Clock needs to be on when DPSTREAMCLK*_EN =1. This change moves dtbclk_p enable/disable point to make sure this is the case Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Daniel Miess <daniel.miess@amd.com> Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: fix USB-C flag update after enc10 feature initCharlene Liu
[why] BIOS's integration info table not following the original order which is phy instance is ext_displaypath's array index. [how] Move them to follow the original order. Reviewed-by: Muhammad Ahmed <ahmed.ahmed@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Charlene Liu <charlene.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: increased min_dcfclk_mhz and min_fclk_mhzSohaib Nadeem
[why] Originally, PMFW said min FCLK is 300Mhz, but min DCFCLK can be increased to 400Mhz because min FCLK is now 600Mhz so FCLK >= 1.5 * DCFCLK hardware requirement will still be satisfied. Increasing min DCFCLK addresses underflow issues (underflow occurs when phantom pipe is turned on for some Sub-Viewport configs). [how] Increasing DCFCLK by raising the min_dcfclk_mhz Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Sohaib Nadeem <sohaib.nadeem@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31Revert "drm/amd/display: initialize all the dpm level's stutter latency"Charlene Liu
Revert commit 885c71ad791c ("drm/amd/display: initialize all the dpm level's stutter latency") Because it causes some regression Reviewed-by: Muhammad Ahmed <ahmed.ahmed@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Charlene Liu <charlene.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdkfd: Use correct drm device for cgroup permission checkMukul Joshi
On GFX 9.4.3, for a given KFD node, fetch the correct drm device from XCP manager when checking for cgroup permissions. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdkfd: Use S_ENDPGM_SAVED in trap handlerJay Cornwall
This instruction has no functional difference to S_ENDPGM but allows performance counters to track save events correctly. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Laurent Morichetti <laurent.morichetti@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdkfd: Correct partial migration virtual addrPhilip Yang
Partial migration to system memory should use migrate.addr, not prange->start as virtual address to allocate system memory page. Fixes: a546a2768440 ("drm/amdkfd: Use partial migrations/mapping for GPU/CPU page faults in SVM") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: move the drm client creation behind drm device registrationLe Ma
This patch is to eliminate interrupt warning below: "[drm] Fence fallback timer expired on ring sdma0.0". An early vm pt clearing job is sent to SDMA ahead of interrupt enabled. And re-locating the drm client creation following after drm_dev_register looks like a more proper flow. v2: wrap the drm client creation Fixes: 1819200166ce ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/nouveau/svm: remove unused but set variablesJani Nikula
Fix the W=1 warning -Wunused-but-set-variable. Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: nouveau@lists.freedesktop.org Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/8b133e7ec0e9aef728be301ac019c5ddcb3bbf51.1704908087.git.jani.nikula@intel.com
2024-01-31drm/nouveau/acr/ga102: remove unused but set variableJani Nikula
Fix the W=1 warning -Wunused-but-set-variable. Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: nouveau@lists.freedesktop.org Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/4d9f62fa6963acfd8b7d8f623799ba3a516e347d.1704908087.git.jani.nikula@intel.com
2024-01-31drm/amdgpu: Reset IH OVERFLOW_CLEAR bitFriedrich Vock
Allows us to detect subsequent IH ring buffer overflows as well. Cc: Joshua Ashton <joshua@froggi.es> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspendYifan Zhang
There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status. Beside, current set function callbacks are empty with no real effect. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: use PSP address query commandTao Zhou
Get UMC physical address from PSP in RAS error address coversion. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: add PSP RAS address query commandTao Zhou
Convert mca address to physical address or vice versa via RAS TA. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0Yifan Zhang
No need to set GC golden settings in driver from gfx 11.5.0 onwards. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdkfd: reserve the BO before validating itLang Yu
Fix a warning. v2: Avoid unmapping attachment repeatedly when ERESTARTSYS. v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix) [ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.708989] Call Trace: [ 41.708992] <TASK> [ 41.708996] ? show_regs+0x6c/0x80 [ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709008] ? __warn+0x93/0x190 [ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709024] ? report_bug+0x1f9/0x210 [ 41.709035] ? handle_bug+0x46/0x80 [ 41.709041] ? exc_invalid_op+0x1d/0x80 [ 41.709048] ? asm_exc_invalid_op+0x1f/0x30 [ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm] [ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu] [ 41.709337] ? srso_alias_return_thunk+0x5/0x7f [ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu] [ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu] [ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu] [ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu] [ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu] [ 41.709945] ? srso_alias_return_thunk+0x5/0x7f [ 41.709949] ? tomoyo_file_ioctl+0x20/0x30 [ 41.709959] __x64_sys_ioctl+0x9c/0xd0 [ 41.709967] do_syscall_64+0x3f/0x90 [ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush") Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'Srinivasan Shanmugam
The error message buffer overflow 'dc->links' 12 <= 12 suggests that the code is trying to access an element of the dc->links array that is beyond its bounds. In C, arrays are zero-indexed, so an array with 12 elements has valid indices from 0 to 11. Trying to access dc->links[12] would be an attempt to access the 13th element of a 12-element array, which is a buffer overflow. To fix this, ensure that the loop does not go beyond the last valid index when accessing dc->links[i + 1] by subtracting 1 from the loop condition. This would ensure that i + 1 is always a valid index in the array. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12 Fixes: 59f1622a5f05 ("drm/amd/display: Add dpia display mode validation logic") Cc: PeiChen Huang <peichen.huang@amd.com> Cc: Aric Cyr <aric.cyr@amd.com> Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'Srinivasan Shanmugam
Add a NULL check for the kzalloc call that allocates memory for dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously, if kzalloc failed to allocate memory and returned NULL, the code would attempt to use the NULL pointer. The fix is to check if kzalloc returns NULL, and if so, log an error message and skip the rest of the current loop iteration with the continue statement. This prevents the code from attempting to use the NULL pointer. Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202401300629.ICnCt983-lkp@intel.com/ Fixes: 135fd1b35690 ("drm/amd/display: Reduce stack size") Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'Srinivasan Shanmugam
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()' Fixes the below: drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r' Fixes: fac4ebd79fed ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: Need to resume ras during gpu reset for gfx v9_4_3 sriovYiPeng Chai
Need to resume ras during gpu reset for gfx v9_4_3 sriov Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: disable RAS feature when finiTao Zhou
Send RAS disable feature command in fini. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amdgpu: Update boot time errors polling sequenceHawking Zhang
Update boot time errors polling sequence to align with the latest firmware change. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Frank Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/amd: Don't init MEC2 firmware when it fails to loadDavid McFarland
The same calls are made directly above, but conditional on the firmware loading and validating successfully. Cc: stable@vger.kernel.org Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init") Signed-off-by: David McFarland <corngood@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/xe: Make all GuC ABI shift values unsignedMatthew Brost
All GuC ABI definitions are unsigned and not defining as unsigned is causing build errors [1]. [1] https://lore.kernel.org/all/20240123111235.3097079-1-geert@linux-m68k.org/ Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240131025424.2087936-1-matthew.brost@intel.com
2024-01-31drm/xe: Convert job timeouts from assert to warningMatt Roper
xe_assert() is intended to be used only for "impossible" situations that should never be hit (and if they are hit it means there's a driver bug somewhere); assertions are only compiled into debug builds. Although we expect jobs submitted by the kernel to be well-behaved and run without error, timeouts are a legitimate possibility for reasons beyond our control (bad firmware, flaky hardware, etc.). We should use a real WARN if we encounter these, even for non-debug builds, to ensure the issue is being properly highlighted in bug reports and such. Also give the WARNs more human-readable messages and move them below the general notice-level message that gets printed for any kind of timeout to make the errors a bit more understandable. v2: - Convert the VM / exec_queue_killed assertion as well. (MattB) Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240130200308.1429134-2-matthew.d.roper@intel.com
2024-01-31drm/amdgpu: Fix the warning info in mode1 resetMa Jun
Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31drm/xe: Don't use __user error pointersThomas Hellström
The error pointer macros are not aware of __user pointers and as a consequence sparse warns. Have the copy_mask() function return an integer instead of a __user pointer. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-5-thomas.hellstrom@linux.intel.com
2024-01-31drm/xe: Annotate mcr_[un]lock()Thomas Hellström
These functions acquire and release the gt::mcr_lock. Annotate accordingly. Fix the corresponding sparse warning. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: fb1d55efdfcb ("drm/xe: Cleanup OPEN_BRACE style issues") Cc: Francois Dugast <francois.dugast@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-4-thomas.hellstrom@linux.intel.com
2024-01-31drm/xe: drop display/ subdir from include directoriesJani Nikula
There are very few places that need to include anything from under display/. Require the display/ prefix in #include directives, and drop the subdirectory from the header search path. Sort the include lists while at it. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240122101428.2683468-2-jani.nikula@intel.com
2024-01-31drm/xe: move xe_display.[ch] under display/Jani Nikula
All the other display related files are under display/ subdirectory, also move xe_display.[ch] there. Sort the build list while at it. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240122101428.2683468-1-jani.nikula@intel.com
2024-01-31drm/imx: prefer snprintf over sprintfJani Nikula
This will trade the W=1 warning -Wformat-overflow to -Wformat-truncation. This lets us enable -Wformat-overflow subsystem wide. Cc: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/14c0108a54007a8360d84162a1d63cba9613b945.1704908087.git.jani.nikula@intel.com
2024-01-31drm/amdgpu: prefer snprintf over sprintfJani Nikula
This will trade the W=1 warning -Wformat-overflow to -Wformat-truncation. This lets us enable -Wformat-overflow subsystem wide. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pan, Xinhui <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/fea7a52924f98b1ac24f4a7e6ba21d7754422430.1704908087.git.jani.nikula@intel.com
2024-01-30drm/xe: Only allow 1 ufence per exec / bind IOCTLMatthew Brost
The way exec ufences are coded only 1 ufence per IOCTL will be signaled. It is possible to fix this but for current use cases 1 ufence per IOCTL is sufficient. Enforce a limit of 1 ufence per IOCTL (both exec and bind to be uniform). v2: - Add fixes tag (Thomas) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Mika Kahola <mika.kahola@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124234413.1640825-1-matthew.brost@intel.com
2024-01-30drm/vram-helper: Fix 'multi-line' kernel-doc commentsAnna-Maria Behnsen
Reformat lines in kernel-doc comments, which make use of the backslash at the end to suggest it is a multi-line comment. kernel-doc is able to process e.g. the short description of a function properly, even if it is across two lines. No functional change. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240122093152.22536-2-anna-maria@linutronix.de
2024-01-30drm/xe: Add batch buffer addresses to devcoredumpJosé Roberto de Souza
Those addresses are necessary to Mesa tools knows where in VM are the batch buffers to parse and print instructions that are human readable. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240130135648.30211-2-jose.souza@intel.com
2024-01-30drm/xe: Add functions to convert regular address to canonical address and backJosé Roberto de Souza
Some instructions requires canonical address like MI_BATCH_BUFFER_START(UMDs must call xe_exec with a canonical address for Xe2+). So here adding functions to convert regular address to canonical address and back, the first user of this functions will be added in the next patch. v3: - inline removed - rename highest_address_bit_get() to ppgtt_msb_get() v4: - use xe->info.va_bits instead of xe->info.dma_mask_size BSpec: 47626 Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240130135648.30211-1-jose.souza@intel.com
2024-01-30drm/xe: Use function to emit PIPE_CONTROLJosé Roberto de Souza
This reduces code duplication in xe_ring_ops. v2: - fix flags of emit_pipe_imm_ggtt() - reduce to only one function v3: - fix emit_pipe_imm_ggtt() stall_only check Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240130132249.8615-1-jose.souza@intel.com
2024-01-30drm/xe/guc: Reduce a print from warn to debugKarthik Poosa
Reduce debug print from warn to debug to avoid unnecessary warning message in dmesg: the firmware loading logic already has the right printk priority level when checking the firmware version. Fixes: c5a06c9169f3 ("drm/xe/guc: Enable WA 14018913170") Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240125165652.3764711-1-karthik.poosa@intel.com [ slightly reword debug and commit messages ] Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-01-30drm/vmwgfx: Fix the lifetime of the bo cursor memoryZack Rusin
The cleanup can be dispatched while the atomic update is still active, which means that the memory acquired in the atomic update needs to not be invalidated by the cleanup. The buffer objects in vmw_plane_state instead of using the builtin map_and_cache were trying to handle the lifetime of the mapped memory themselves, leading to crashes. Use the map_and_cache instead of trying to manage the lifetime of the buffer objects held by the vmw_plane_state. Fixes kernel oops'es in IGT's kms_cursor_legacy forked-bo. Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Fixes: bb6780aa5a1d ("drm/vmwgfx: Diff cursors when using cmds") Cc: <stable@vger.kernel.org> # v6.2+ Reviewed-by: Martin Krastev <martin.krastev@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240126200804.732454-6-zack.rusin@broadcom.com
2024-01-30drm/vmwgfx: Fix vmw_du_get_cursor_mob fencing of newly-created MOBsMartin Krastev
The fencing of MOB creation used in vmw_du_get_cursor_mob was incompatible with register-based device communication employed by this routine. As a result cursor MOB creation was racy, leading to potentially broken/missing mouse cursor on desktops using CursorMob device feature. Fixes: 53bc3f6fb6b3 ("drm/vmwgfx: Clean up cursor mobs") Signed-off-by: Martin Krastev <martin.krastev@broadcom.com> Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240126200804.732454-5-zack.rusin@broadcom.com
2024-01-30drm/vmwgfx: Make all surfaces shareableMaaz Mombasawala
There is no real need to have a separate pool for shareable and non-shareable surfaces. Make all surfaces shareable, regardless of whether the drm_vmw_surface_flag_shareable has been specified. Signed-off-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Reviewed-by: Martin Krastev <martin.krastev@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240126200804.732454-3-zack.rusin@broadcom.com
2024-01-30drm/vmwgfx: Refactor drm connector probing for display modesMartin Krastev
Implement drm_connector_helper_funcs.mode_valid and .get_modes, replacing custom drm_connector_funcs.fill_modes code with drm_helper_probe_single_connector_modes; for STDU, LDU & SOU display units. Signed-off-by: Martin Krastev <martin.krastev@broadcom.com> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240126200804.732454-2-zack.rusin@broadcom.com
2024-01-30drm/i915/xe2lpd: Move registers to PICALucas De Marchi
Some registers for DDI A/B moved to PICA and now follow the same format as the ones for the PORT_TC ports. The wrapper here deals with 2 issues: - Share the implementation between xe2lpd and previous platforms: there are minor layout changes, it's mostly the register location that changed - Handle offsets after TC ports v2: - Explain better the trick to use just the second range (Matt Roper) - Add missing conversions after rebase (Matt Roper) - Use macro instead of inline function, avoiding includes in the header (Jani) - Prefix old macros with underscore so they don't get used by mistake, and name the new ones using the previous names v3: Use the same logic for the recently-introduced XELPDP_PORT_MSGBUS_TIMER (Gustavo) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240126224638.4132016-3-lucas.demarchi@intel.com
2024-01-30drm/i915/xe2lpd: Move D2D enable/disableLucas De Marchi
Bits to enable/disable and check state for D2D moved from XELPDP_PORT_BUF_CTL1 to DDI_BUF_CTL (now named DDI_CTL_DE in the spec). Make the functions mtl_ddi_disable_d2d() and mtl_ddi_enable_d2d generic to work with multiple reg location and bitfield layout. v2: Set/Clear XE2LPD_DDI_BUF_D2D_LINK_ENABLE in saved_port_bits when enabling/disabling D2D so DDI_BUF_CTL is correctly programmed in other places without overriding these bits (Clint) v3: Leave saved_port_bits alone as those bits are not meant to be modified outside of the port initialization. Rather propagate the additional bit in DDI_BUF_CTL to be set when that register is written again after D2D is enabled. Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240126224638.4132016-2-lucas.demarchi@intel.com