summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2023-06-17drm/ingenic: Kconfig: select REGMAP and REGMAP_MMIOSui Jingfeng
Otherwise its failed to pass basic compile test on platform without REGMAP_MMIO selected by defconfig make -j$(nproc) ARCH=mips CROSS_COMPILE=mips64el-linux-gnuabi64- SYNC include/config/auto.conf.cmd Checking missing-syscalls for N32 CALL scripts/checksyscalls.sh Checking missing-syscalls for O32 CALL scripts/checksyscalls.sh CALL scripts/checksyscalls.sh MODPOST Module.symvers ERROR: modpost: "__devm_regmap_init_mmio_clk" [drivers/gpu/drm/ingenic/ingenic-drm.ko] undefined! make[1]: *** [scripts/Makefile.modpost:136: Module.symvers] Error 1 make: *** [Makefile:1978: modpost] Error 2 V2: Order alphabetically Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Paul Cercueil <paul@crapouillou.net> Link: https://patchwork.freedesktop.org/patch/msgid/20230607110650.569522-1-suijingfeng@loongson.cn
2023-06-17Merge tag 'drm-misc-fixes-2023-06-16' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes maybe in time for v6.4-rc7: - qaic leak and null deref fix. - Fix runtime pm in nouveau. - Fix array overflow in ti-sn65dsi86 pwm chip handling. - Assorted null check fixes in nouveau. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <dev@lankhorst.se> Link: https://patchwork.freedesktop.org/patch/msgid/641eb8a8-fbd7-90ad-0805-310b7fec9344@lankhorst.se
2023-06-16drm/i915/psr: Re-enable PSR1 on hsw/bdwVille Syrjälä
All known issues fixed now, so re-enable PSR1 on hsw/bdw. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-14-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Allow PSR with sprite enabled on hsw/bdwVille Syrjälä
Can't see why we'd want the sprite blocking PSR entry. Mask it out. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-13-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Don't skip both TP1 and TP2/3 on hsw/bdwVille Syrjälä
WA 0479 says: "Do not skip both TP1 and TP2/TP3". Let's just stick the minimum 100us TP2/3 time in there to avoid that. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-12-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Do no mask display register writes on hsw/bdwVille Syrjälä
hsw/bdw lack the pipe register vs. display register distinction in their PSR masking capabilities. So to keep our CURSURFLIVE tricks working we need to just unmask all display register writes on these platforms. The downside being that any display regitster (eg. even SWF regs) will cause a PSR exit. Note that WaMaskMMIOWriteForPSR asks us to mask this on bdw, but that won't work since we need those CURSURFLIVE tricks. Observations on actual hardware show that this causes one extra PSR exit ~every 10 seconds, which is pretty much irrelevant. I suspect this is due to the pcode poking at IPS_CTL. Disabling IPS does not stop it however, so either I'm wrong or pcode pokes at the register regardless of whether it's actually trying to enable/disable IPS. Also when the machine is busy (eg. just running 'find /') these extra PSR exits cease, which again points at pcode or some other PM entity as being the culprit. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-11-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Implement WaPsrDPRSUnmaskVBlankInSRD:hswVille Syrjälä
Bspec asks us to unmask "vblank to registers" in the DPRS unit. Note that I was unable to observe any change in hardware behviour due to this bit on HSW. But let's do this anyway in case it matters in some cases, and the corresponding bit on BDW is abolutely critical as without it the hardware won't generate any vblanks whatsoever after PSR exit. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-10-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Implement WaPsrDPAMaskVBlankInSRD:hswVille Syrjälä
Implement WaPsrDPAMaskVBlankInSRD:hsw, which makes the hardware generate the extra vblank between link training and first frame being transmitted. This is the same thing that's controlled by TRANS_CHICKEN[21] on skl+ (but due to the funky double buffering it's effectively always at the rest value after DC5 exit). So for consistent behaviour we want every platform to generate said vblank. BDW is already setting this up correctly. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-9-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Restore PSR interrupt handler for HSWVille Syrjälä
Add the PSR interrupt handling code back for HSW. Looks like the removal was never completed anyway since the irq setup code was lest untouched. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-8-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: HSW/BDW have no PSR2Ville Syrjälä
Deal with HSW/BDW in transcoder_has_psr2(). Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-7-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Bring back HSW/BDW PSR AUX CH registers/setupVille Syrjälä
Reintroduce the special PSR AUX CH setup for hsw/bdw. Not all of it was even removed (BDW AUX data registers were left behind). Update the code to use REG_BIT() & co. while at it. v2: Define the SRD_AUX_CTL bits in terms of DP_AUX_CTL bits (Jouni) Add a comment explaining the hand rolled DPCD write (Jouni) Cc: Jouni Högander <jouni.hogander@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-6-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Reintroduce HSW PSR1 registersVille Syrjälä
Add back hsw'w special SRD/PSR1 registers. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-5-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Wrap PSR1 register with functionsVille Syrjälä
In preparation for re-introducing HSW's different PSR1 register offeets let's just wrap all the registers into functions. Avoids having to make the register macros more complex. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-4-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915/psr: Fix BDW PSR AUX CH data register offsetsVille Syrjälä
The multiplication got replaced by an addition in some cleanup. This means we never write the correct data to some of the BDW PSR data registers and thus we fail to actually wake up the panel from PSR. Fixes: 4ab4fa103217 ("drm/i915/psr: Make PSR registers relative to transcoders") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-3-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/i915: Re-init clock gating on coming out of PC8+Ville Syrjälä
PC8+ clobbers a bunch of displays registers which need to be restored by hand or else we lost a bunch of workarounds. The important ones for us are at least CHICKEN_PAR2* and CHICKEN_PIPESL*. Curiously at least some CHICKEN_PAR1* registers are preserved by the hardware/firmware. Unfortunately Bspec doens't really specify what gets clobbered vs. preserved so further reverse engieering might be warranted to figure out the specifics. Note that PCH_LP_PARTITION_LEVEL_DISABLE is also set by lpt_init_clock_gating() so the rmw in hsw_disable_pc8() is now redundant. Remove it. TODO: I suspect most gt stuff doesn't need this and we should finish moving all of them from init_clock_gating() to a more appropriate place... Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230609141404.12729-2-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
2023-06-16drm/bridge: tc358764: Fix debug print parameter orderMarek Vasut
The debug print parameters were swapped in the output and they were printed as decimal values, both the hardware address and the value. Update the debug print to print the parameters in correct order, and use hexadecimal print for both address and value. Fixes: f38b7cca6d0e ("drm/bridge: tc358764: Add DSI to LVDS bridge driver") Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Robert Foss <rfoss@kernel.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230615152817.359420-1-marex@denx.de
2023-06-16drm/msm/dsi: split dsi_ctrl_config() functionDmitry Baryshkov
It makes no sense to pass NULL parameters to dsi_ctrl_config() in the disable case. Split dsi_ctrl_config() into enable and disable parts and drop unused params. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org> Patchwork: https://patchwork.freedesktop.org/patch/542559/ Link: https://lore.kernel.org/r/20230614224402.296825-2-dmitry.baryshkov@linaro.org
2023-06-16drm/msm/dsi: dsi_host: drop unused clocksDmitry Baryshkov
Several source clocks are not used anymore, so stop handling them. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org> Patchwork: https://patchwork.freedesktop.org/patch/542558/ Link: https://lore.kernel.org/r/20230614224402.296825-1-dmitry.baryshkov@linaro.org
2023-06-16drm/msm/dpu: remove unused INTF_NONE interfacesDmitry Baryshkov
sm6115, sm6375 and qcm2290 do not have INTF_0. Drop corresponding interface definitions. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org> Patchwork: https://patchwork.freedesktop.org/patch/542180/ Link: https://lore.kernel.org/r/20230613001004.3426676-4-dmitry.baryshkov@linaro.org
2023-06-16drm/msm/dpu: correct MERGE_3D lengthDmitry Baryshkov
Each MERGE_3D block has just two registers. Correct the block length accordingly. Fixes: 4369c93cf36b ("drm/msm/dpu: initial support for merge3D hardware block") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Patchwork: https://patchwork.freedesktop.org/patch/542177/ Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org> Link: https://lore.kernel.org/r/20230613001004.3426676-3-dmitry.baryshkov@linaro.org
2023-06-16drm/msm/dpu: fix sc7280 and sc7180 PINGPONG done interruptsDmitry Baryshkov
During IRQ conversion we have lost the PP_DONE interrupts for sc7280 platform. This was left unnoticed, because this interrupt is only used for CMD outputs and probably no sc7[12]80 systems use DSI CMD panels. Fixes: 667e9985ee24 ("drm/msm/dpu: replace IRQ lookup with the data in hw catalog") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org> Patchwork: https://patchwork.freedesktop.org/patch/542175/ Link: https://lore.kernel.org/r/20230613001004.3426676-2-dmitry.baryshkov@linaro.org
2023-06-16nouveau: fix client work fence deletion raceDave Airlie
This seems to have existed for ever but is now more apparant after commit 9bff18d13473 ("drm/ttm: use per BO cleanup workers") My analysis: two threads are running, one in the irq signalling the fence, in dma_fence_signal_timestamp_locked, it has done the DMA_FENCE_FLAG_SIGNALLED_BIT setting, but hasn't yet reached the callbacks. The second thread in nouveau_cli_work_ready, where it sees the fence is signalled, so then puts the fence, cleanups the object and frees the work item, which contains the callback. Thread one goes again and tries to call the callback and causes the use-after-free. Proposed fix: lock the fence signalled check in nouveau_cli_work_ready, so either the callbacks are done or the memory is freed. Reviewed-by: Karol Herbst <kherbst@redhat.com> Fixes: 11e451e74050 ("drm/nouveau: remove fence wait code from deferred client work handler") Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://lore.kernel.org/dri-devel/20230615024008.1600281-1-airlied@gmail.com/
2023-06-16Merge tag 'drm-intel-next-2023-06-10' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next drm/i915 feature pull #2 for v6.5: Features and functionality: - Meteorlake PM demand (Vinod, Mika) - Switch to dedicated workqueues to stop using flush_scheduled_work() (Luca) Refactoring and cleanups: - Move display runtime init under display/ (Matt) - Async flip error message clarifications (Arun) Fixes: - Remove 10bit gamma on desktop gen3 parts, they don't support it (Ville) - Fix driver probe error handling if driver creation fails (Matt) - Fix all -Wunused-but-set-variable warnings, and enable it for i915 (Jani) - Stop using edid_blob_ptr (Jani) - Fix log level for "CDS interlane align done" (Khaled) - Fix an unnecessary include prefix (Matt) Merges: - Backmerge drm-next to sync with drm-intel-gt-next (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87o7lnpxz2.fsf@intel.com
2023-06-15drm/dp_mst: Clear MSG_RDY flag before sending new messageWayne Lin
[Why] The sequence for collecting down_reply from source perspective should be: Request_n->repeat (get partial reply of Request_n->clear message ready flag to ack DPRX that the message is received) till all partial replies for Request_n are received->new Request_n+1. Now there is chance that drm_dp_mst_hpd_irq() will fire new down request in the tx queue when the down reply is incomplete. Source is restricted to generate interveleaved message transactions so we should avoid it. Also, while assembling partial reply packets, reading out DPCD DOWN_REP Sideband MSG buffer + clearing DOWN_REP_MSG_RDY flag should be wrapped up as a complete operation for reading out a reply packet. Kicking off a new request before clearing DOWN_REP_MSG_RDY flag might be risky. e.g. If the reply of the new request has overwritten the DPRX DOWN_REP Sideband MSG buffer before source writing one to clear DOWN_REP_MSG_RDY flag, source then unintentionally flushes the reply for the new request. Should handle the up request in the same way. [How] Separete drm_dp_mst_hpd_irq() into 2 steps. After acking the MST IRQ event, driver calls drm_dp_mst_hpd_irq_send_new_request() and might trigger drm_dp_mst_kick_tx() only when there is no on going message transaction. Changes since v1: * Reworked on review comments received -> Adjust the fix to let driver explicitly kick off new down request when mst irq event is handled and acked -> Adjust the commit message Changes since v2: * Adjust the commit message * Adjust the naming of the divided 2 functions and add a new input parameter "ack". * Adjust code flow as per review comments. Changes since v3: * Update the function description of drm_dp_mst_hpd_irq_handle_event Changes since v4: * Change ack of drm_dp_mst_hpd_irq_handle_event() to be an array align the size of esi[] Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: Increase hmm range get pages timeoutPhilip Yang
If hmm_range_fault returns -EBUSY, we should call hmm_range_fault again to validate the remaining pages. On one system with NUMA auto balancing enabled, hmm_range_fault takes 6 seconds for 1GB range because CPU migrate the range one page at a time. To be safe, increase timeout value to 1 second for 128MB range. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: Enable translate further for GC v9.4.3Philip Yang
To extend UTCL2 reach. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/ssd130x: Remove hardcoded bits-per-pixel in ssd130x_buf_alloc()Javier Martinez Canillas
The driver only supports OLED controllers that have a native DRM_FORMAT_C1 pixel format and that is why it has harcoded a division of the width by 8. But the driver might be extended to support devices that have a different pixel format. So it's better to use the struct drm_format_info helpers to compute the size of the buffer, used to store the pixels in native format. Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230609170941.1150941-6-javierm@redhat.com
2023-06-15drm/ssd130x: Don't allocate buffers on each plane updateJavier Martinez Canillas
The resolutions for these panels are fixed and defined in the Device Tree, so there's no point to allocate the buffers on each plane update and that can just be done once. Let's do the allocation and free on the encoder enable and disable helpers since that's where others initialization and teardown operations are done. Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230609170941.1150941-5-javierm@redhat.com
2023-06-15drm/ssd130x: Set the page height value in the device info dataJavier Martinez Canillas
The driver only supports OLED controllers that have a page height of 8 but there are devices that have different page heights. So it is better to not hardcode this value and instead have it as a per controller data value. Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230609170941.1150941-4-javierm@redhat.com
2023-06-15drm/ssd130x: Make default width and height to be controller dependentJavier Martinez Canillas
Currently the driver hardcodes the default values to 96x16 pixels but this default resolution depends on the controller. The datasheets for the chips describes the following display controller resolutions: - SH1106: 132 x 64 Dot Matrix OLED/PLED - SSD1305: 132 x 64 Dot Matrix OLED/PLED - SSD1306: 128 x 64 Dot Matrix OLED/PLED - SSD1307: 128 x 39 Dot Matrix OLED/PLED - SSD1309: 128 x 64 Dot Matrix OLED/PLED Add this information to the devices' info structures, and use it set as a default if not defined in DT rather than hardcoding to an arbitrary value. Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230609170941.1150941-2-javierm@redhat.com
2023-06-15drm/i915/huc: Fix missing error code in intel_huc_init()Harshit Mogalapalli
Smatch warns: drivers/gpu/drm/i915/gt/uc/intel_huc.c:388 intel_huc_init() warn: missing error code 'err' When the allocation of VMAs fail: The value of err is zero at this point and it is passed to PTR_ERR and also finally returning zero which is success instead of failure. Fix this by adding the missing error code when VMA allocation fails. Fixes: 08872cb13a71 ("drm/i915/mtl/huc: auth HuC via GSC") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230614223646.2583633-1-daniele.ceraolospurio@intel.com
2023-06-15drm/amdgpu: Remove unused NBIO interfaceLijo Lazar
Set compute partition mode interface in NBIO is no longer used. Remove the only implementation from NBIO v7.9 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdkfd: update user space last_event_ageJames Zhu
Update user space last_event_age when event age is enabled. It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdkfd: set activated flag true when event age unmatchsJames Zhu
Set waiter's activated flag true when event age unmatchs with last_event_age. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdkfd: add event_age tracking when receiving interruptJames Zhu
Add event_age tracking when receiving interrupt. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/scheduler: avoid infinite loop if entity's dependency is a scheduled ↵ZhenGuo Yin
error fence [Why] drm_sched_entity_add_dependency_cb ignores the scheduled fence and return false. If entity's dependency is a scheduler error fence and drm_sched_stop is called due to TDR, drm_sched_entity_pop_job will wait for the dependency infinitely. [How] Do not wait or ignore the scheduled error fence, add drm_sched_entity_wakeup callback for the dependency with scheduled error fence. Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: add entity error check in amdgpu_ctx_get_entityZhenGuo Yin
[Why] UMD is not aware of entity error, and will keep submitting jobs into the error entity. [How] Add entity error check when getting entity from ctx. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: add VM generation tokenChristian König
Instead of using the VRAM lost counter add a 64bit token which indicates if a context or job is still valid to use. Should the VRAM be lost or the page tables need re-creation the token will change indicating that userspace needs to act and re-create the contexts and re-submit the work. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: reset VM when an error is detectedChristian König
When some problem with the updates of page tables is detected reset the state machine of the VM and re-create all page tables from scratch. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: abort submissions during prepare on errorChristian König
Forward errors from previous submissions to this one. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: mark soft recovered fences with -ENODATAChristian König
Set the fence error code before trying to soft-recover it. It gets overwritten when a hard recovery is required. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: mark force completed fences with -ECANCELEDChristian König
When we force complete fences we should mark them as canceled. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: add amdgpu_error_* debugfs fileChristian König
This allows us to insert some error codes into the bottom of the pipeline on an engine. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: mark GC 9.4.3 experimental for nowAlex Deucher
Mark as experimental for now until we get closer to production to avoid possible undesireable behavior when mixing newer boards with older kernels. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: Use PSP FW API for partition switchLijo Lazar
Use PSP firmware interface for switching compute partitions. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: Change nbio v7.9 xcp status definitionLijo Lazar
PARTITION_MODE field in PARTITION_COMPUTE_STATUS register is defined as below by firmware. SPX = 0, DPX = 1, TPX = 2, QPX = 3, CPX = 4 Change driver definition accordingly. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: make sure that BOs have a backing storeChristian König
It's perfectly possible that the BO is about to be destroyed and doesn't have a backing store associated with it. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: make sure BOs are locked in amdgpu_vm_get_memoryChristian König
We need to grab the lock of the BO or otherwise can run into a crash when we try to inspect the current location. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: Add checking mc_vram_sizeStanley.Yang
Do not compare injection address with mc_vram_size if mc_vram_size is zero. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: Optimize checking ras supportedStanley.Yang
Using "is_app_apu" to identify device in the native APU mode or carveout mode. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>