summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2024-05-15Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "This is the main pull request for the drm subsystems for 6.10. In drivers the main thing is a new driver for ARM Mali firmware based GPUs, otherwise there are a lot of changes to amdgpu/xe/i915/msm and scattered changes to everything else. In the core a bunch of headers and Kconfig was refactored, along with the addition of a new panic handler which is meant to provide a user friendly message when a panic happens and graphical display is enabled. New drivers: - panthor: ARM Mali/Immortalis CSF-based GPU driver Core: - add a CONFIG_DRM_WERROR option - make more headers self-contained - grab resv lock in pin/unpin - fix vmap resv locking - EDID/eDP panel matching - Kconfig cleanups - DT sound bindings - Add SIZE_HINTS property for cursor planes - Add struct drm_edid_product_id and helpers. - Use drm device based logging in more drm functions. - drop seq_file.h from a bunch of places - use drm_edid driver conversions dp: - DP Tunnel documentation - MST read sideband cap - Adaptive sync SDP prep work ttm: - improve placement for TTM BOs in idle/busy handling panic: - Fixes for drm-panic, and option to test it. - Add drm panic to simpledrm, mgag200, imx, ast bridge: - improve init ordering - adv7511: allow GPIO pin sharing - tc358775: add tc358675 support panel: - AUO B120XAN01.0 - Samsung s6e3fa7 - BOE NT116WHM-N44 - CMN N116BCA-EA1, - CrystalClear CMT430B19N00 - Startek KD050HDFIA020-C020A - powertip PH128800T006-ZHC01 - Innolux G121X1-L03 - LG sw43408 - Khadas TS050 V2 - EDO RM69380 OLED - CSOT MNB601LS1-1 amdgpu: - HDCP/ODM/RAS fixes - Devcoredump improvements - Expose VCN activity via sysfs - SMY 13.0.x updates - Enable fast updates on DCN 3.1.4 - Add dclk and vclk reporting on additional devices - Add ACA RAS infrastructure - Implement TLB flush fence - EEPROM handling fixes - SMUIO 14.0.2 support - SMU 14.0.1 Updates - SMU 14.0.2 support - Sync page table freeing with TLB flushes - DML2 refactor - DC debug improvements - DCN 3.5.x Updates - GPU reset fixes - HDP fix for second GFX pipe on GC 10.x - Enable secondary GFX pipe on GC 10.3 - Refactor and clean up BACO/BOCO/BAMACO handling - Remove invalid TTM resource start check - UAF fix in VA IOCTL - GPUVM page fault redirection to secondary IH rings for IH 6.x - Initial support for mapping kernel queues via MES - Fix VRAM memory accounting amdkfd: - MQD handling cleanup - Preemption handling fixes for XCDs - TLB flush fix for GC 9.4.2 - Properly clean up workqueue during module unload - Fix memory leak process create failure - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace - Fix eviction fence handling - Fix leak in GPU memory allocation failure case - DMABuf import handling fix - Enable SQ watchpoint for gfx10 i915: - Adding new DG2 PCI ID - add context hints for GT frequency - enable only one CCS for compute workloads - new workarounds - Fix UAF on destroy against retire race and remove two earlier partial fixes - Limit the reserved VM space to only the platforms that need it - Fix gt reset with GuC submission is disable - Add and use gt_to_guc() wrapper i915/xe display: - Lunar Lake display enabling, including cdclk and other refactors - BIOS/VBT/opregion related refactor - Digital port related refactor/clean-up - Fix 2s boot time regression on DP panel replay init - Remove duplication on audio enable/disable on SDVO and g4x+ DP - Disable AuxCCS framebuffers if built for Xe - Make crtc disable more atomic - Increase DP idle pattern wait timeout to 2ms - Start using container_of_const() for some extra const safety - Fix Jasper Lake boot freeze - Enable MST mode for 128b/132b single-stream sideband - Enable Adaptive Sync SDP Support for DP - Fix MTL supported DP rates - removal of UHBR13.5 - PLL refactoring - Limit eDP MSO pipe only for display version 20 - More display refactor towards independence from i915 dev_priv - Convert i915/xe fbdev to DRM client - More initial work to make display code more independent from i915 xe: - improved error capture - clean up some uAPI leftovers - devcoredump update - Add BMG mocs table - Handle GSCCS ER interrupt - Implement xe2- and GuC workarounds - struct xe_device cleanup - Hwmon updates - Add LRC parsing for more GPU instruction - Increase VM_BIND number of per-ioctl Ops - drm/xe: Add XE_BO_GGTT_INVALIDATE flag - Initial development for SR-IOV support - Add new PCI IDs to DG2 platform - Move userptr over to start using hmm_range_fault msm: - Switched to generating register header files during build process instead of shipping pre-generated headers - Merged DPU and MDP4 format databases. - DP: - Stop using compat string to distinguish DP and eDP cases - Added support for X Elite platform (X1E80100) - Reworked DP aux/audio support - Added SM6350 DP to the bindings - GPU: - a7xx perfcntr reg fixes - MAINTAINERS updates - a750 devcoredump support radeon: - Silence UBSAN warnings related to flexible arrays nouveau: - move some uAPI objects to uapi headers omapdrm: - console fix ast: - add i2c polling qaic: - add debugfs entries exynos: - fix platform_driver .owner - drop cleanup code mediatek: - Use devm_platform_get_and_ioremap_resource() in mtk_hdmi_ddc_probe() - Add GAMMA 12-bit LUT support for MT8188 - Rename mtk_drm_* to mtk_* - Drop driver owner initialization - Correct calculation formula of PHY Timing" * tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel: (1477 commits) drm/xe/ads: Use flexible-array drm/xe: Use ordered WQ for G2H handler drm/msm/gen_header: allow skipping the validation drm/msm/a6xx: Cleanup indexed regs const'ness drm/msm: Add devcoredump support for a750 drm/msm: Adjust a7xx GBIF debugbus dumping drm/msm: Update a6xx registers XML drm/msm: Fix imported a750 snapshot header for upstream drm/msm: Import a750 snapshot registers from kgsl MAINTAINERS: Add Konrad Dybcio as a reviewer for the Adreno driver MAINTAINERS: Add a separate entry for Qualcomm Adreno GPU drivers drm/msm/a6xx: Avoid a nullptr dereference when speedbin setting fails drm/msm/adreno: fix CP cycles stat retrieval on a7xx drm/msm/a7xx: allow writing to CP_BV counter selection registers drm: zynqmp_dpsub: Always register bridge Revert "drm/bridge: ti-sn65dsi83: Fix enable error path" drm/fb_dma: Add checks in drm_fb_dma_get_scanout_buffer() drm/fbdev-generic: Do not set physical framebuffer address drm/panthor: Fix the FW reset logic drm/panthor: Make sure we handle 'unknown group state' case properly ...
2024-05-10drm/amdgpu: Fix comparison in amdgpu_res_cpu_visibleMichel Dänzer
It incorrectly claimed a resource isn't CPU visible if it's located at the very end of CPU visible VRAM. Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3343 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reported-and-Tested-by: Jeremy Day <jsday@noreason.ca> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org
2024-05-02Revert "drm: Switch DRM_DISPLAY_HELPER to depends on"Geert Uytterhoeven
This reverts commit e075e496f516bf92bc0cbaf94d64e8d4a6b58321, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/1ba76cc4d96a8afefff5d1bc42fb1e1329c5da68.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-02Revert "drm: Switch DRM_DISPLAY_DP_HELPER to depends on"Geert Uytterhoeven
This reverts commit 0323287de87d7e6e9c22c57d7440aa353a2298d0, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/89ac456805746b6d0c888f10c5120b11aacd3319.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-02Revert "drm: Switch DRM_DISPLAY_HDCP_HELPER to depends on"Geert Uytterhoeven
This reverts commit 3166e7e6d935caaef07605a5c90773fbf9ffeaf4, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/a40e70a0abd3d841c23c107d452a43fdd70ef37a.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-02Revert "drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on"Geert Uytterhoeven
This reverts commit f6d2dc03fa8546b284dd8c1af027d9fac5725921, as helper code should always be selected by the driver that needs it, for the convenience of the final user configuring a kernel. The user who configures a kernel should not need to know which helpers are needed for the driver he is interested in. Making a driver depend on helper code means that the user needs to know which helpers to enable first, which is very user-unfriendly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patchwork.freedesktop.org/patch/msgid/bd288a5943dab8609f2d1f2bf413595a61df727a.1713780345.git.geert+renesas@glider.be Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-04-30drm/amdgpu: fix doorbell regressionShashank Sharma
This patch adds a missed handling of PL domain doorbell while handling VRAM faults. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults") Reviewed-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2Christian König
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. v2: add missing NULL check Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org
2024-04-30drm/amdgpu: Fix VRAM memory accountingMukul Joshi
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30Merge tag 'amd-drm-next-6.10-2024-04-26' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.10-2024-04-26: amdgpu: - Misc code cleanups and refactors - Support setting reset method at runtime - Report OD status - SMU 14.0.1 fixes - SDMA 4.4.2 fixes - VPE fixes - MES fixes - Update BO eviction priorities - UMSCH fixes - Reset fixes - Freesync fixes - GFXIP 9.4.3 fixes - SDMA 5.2 fixes - MES UAF fix - RAS updates - Devcoredump updates for dumping IP state - DSC fixes - JPEG fix - Fix VRAM memory accounting - VCN 5.0 fixes - MES fixes - UMC 12.0 updates - Modify contiguous flags handling - Initial support for mapping kernel queues via MES amdkfd: - Fix rescheduling of restore worker - VRAM accounting for SVM migrations - mGPU fix - Enable SQ watchpoint for gfx10 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240426221245.1613332-1-alexander.deucher@amd.com
2024-04-29Merge v6.9-rc6 into drm-nextDaniel Vetter
Thomas needs the defio fixes, Maíra needs the vkms fixes and Joonas has some fun with i915-gem conflicts. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-04-26drm/amdgpu: fix the warning about the expression (int)size - lenJesse Zhang
Converting size from size_t to int may overflow. v2: keep reverse xmas tree order (Christian) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu/mes: add mes mapping legacy queue supportJack Xiao
Add mes mapping legacy queue framework support. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: fix uninitialized scalar variable warningTim Huang
Clear warning that uses uninitialized value fw_size. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: skip ip dump if devcoredump flag is setSunil Khatri
Do not dump the ip registers during driver reload in passthrough environment. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: Modify the contiguous flags behaviourArunpravin Paneer Selvam
Now we have two flags for contiguous VRAM buffer allocation. If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS, it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the buffer's placement function. This patch will change the default behaviour of the two flags. When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS - This means contiguous is not mandatory. - we will try to allocate the contiguous buffer. Say if the allocation fails, we fallback to allocate the individual pages. When we setTTM_PL_FLAG_CONTIGUOUS - This means contiguous allocation is mandatory. - we are setting this in amdgpu_bo_pin_restricted() before bo validation and check this flag in the vram manager file. - if this is set, we should allocate the buffer pages contiguously. the allocation fails, we return -ENOSPC. v2: - keep the mem_flags and bo->flags check as is(Christian) - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the amdgpu_bo_pin_restricted function placement range iteration loop(Christian) - rename find_pages with amdgpu_vram_mgr_calculate_pages_per_block (Christian) - Keep the kernel BO allocation as is(Christain) - If BO pin vram allocation failed, we need to return -ENOSPC as RDMA cannot work with scattered VRAM pages(Philip) v3(Christian): - keep contiguous flag handling outside of pages_per_block calculation - remove the hacky implementation in contiguous flag error handling code v4(Christian): - use any variable and return value for non-contiguous fallback v5: rebase to amd-staging-drm-next branch Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdkfd: Enable SQ watchpoint for gfx10Lancelot SIX
There are new control registers introduced in gfx10 used to configure hardware watchpoints triggered by SMEM instructions: SQ_WATCH{0,1,2,3}_{CNTL_ADDR_HI,ADDR_L}. Those registers work in a similar way as the TCP_WATCH* registers currently used for gfx9 and above. This patch adds support to program the SQ_WATCH registers for gfx10. The SQ_WATCH?_CNTL.MASK field has one bit more than TCP_WATCH?_CNTL.MASK, so SQ watchpoints can have a finer granularity than TCP_WATCH watchpoints. In this patch, we keep the capabilities advertised to the debugger unchanged (HSA_DBG_WATCH_ADDR_MASK_*_BIT_GFX10) as this reflects what both TCP and SQ watchpoints can do and both watchpoints are configured together. Signed-off-by: Lancelot SIX <lancelot.six@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and ↵Srinivasan Shanmugam
rlc_microcode() The function gfx_v9_4_3_init_microcode in gfx_v9_4_3.c was generating about potential truncation of output when using the snprintf function. The issue was due to the size of the buffer 'ucode_prefix' being too small to accommodate the maximum possible length of the string being written into it. The string being written is "amdgpu/%s_mec.bin" or "amdgpu/%s_rlc.bin", where %s is replaced by the value of 'chip_name'. The length of this string without the %s is 16 characters. The warning message indicated that 'chip_name' could be up to 29 characters long, resulting in a total of 45 characters, which exceeds the buffer size of 30 characters. To resolve this issue, the size of the 'ucode_prefix' buffer has been reduced from 30 to 15. This ensures that the maximum possible length of the string being written into the buffer will not exceed its size, thus preventing potential buffer overflow and truncation issues. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c: In function ‘gfx_v9_4_3_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 379 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); | ^~ ...... 439 | r = gfx_v9_4_3_init_rlc_microcode(adev, ucode_prefix); | ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 379 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=] 413 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); | ^~ ...... 443 | r = gfx_v9_4_3_init_cp_compute_microcode(adev, ucode_prefix); | ~~~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30 413 | snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 86301129698b ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0") Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: Fix ras mode2 reset failure in ras aca modeYiPeng Chai
Fix ras mode2 reset failure in ras aca mode. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: fix double free err_addr pointer warningsBob Zhou
In amdgpu_umc_bad_page_polling_timeout, the amdgpu_umc_handle_bad_pages will be run many times so that double free err_addr in some special case. So set the err_addr to NULL to avoid the warnings. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: initialize the last_jump_jiffies in atom_exec_contextJesse Zhang
The parameter "last_jump_jiffies" should be initialized before being used in the function atom_op_jump. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add check before free wb entryJesse Zhang
Check if ring is not a mes queue before freeing the wb entry, because we only allocate a wb entry when it's not a mes queue. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byteBob Zhou
After amdgpu_i2c_get_byte fail, amdgpu_i2c_put_byte shouldn't be conducted to put wrong value. So return and check the i2c transfer result. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add error handle to avoid out-of-boundsBob Zhou
if the sdma_v4_0_irq_id_to_seq return -EINVAL, the process should be stop to avoid out-of-bounds read, so directly return -EINVAL. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: Initialize timestamp for some legacy SOCsMa Jun
Initialize the interrupt timestamp for some legacy SOCs to fix the coverity issue "Uninitialized scalar variable" Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: Use new interface to reserve bad pageYiPeng Chai
Use new interface to reserve bad page. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: Fix address translation defectYiPeng Chai
retired_page is page frame and should be expanded to the full address when querying status. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: support ACA logging ecc errorsYiPeng Chai
support ACA logging ecc errors. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add poison consumption handlerYiPeng Chai
Add poison consumption handler. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: prepare to handle pasid poison consumptionYiPeng Chai
Prepare to handle pasid poison consumption. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: retire bad pages for umc v12_0YiPeng Chai
Retire bad pages for umc v12_0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add condition check for amdgpu_umc_fill_error_recordYiPeng Chai
Add condition check for amdgpu_umc_fill_error_record. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: Add delay work to retire bad pagesYiPeng Chai
Add delay work to retire bad pages. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: umc v12_0 logs ecc errorsYiPeng Chai
1. umc v12_0 logs ecc errors. 2. Reserve newly detected ecc error pages. 3. Add tag for bad pages, so that they can be retired later. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: umc v12_0 converts error addressYiPeng Chai
Umc v12_0 converts error address. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add interface to update umc v12_0 ecc statusYiPeng Chai
Add interface to update umc v12_0 ecc status. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add poison creation handlerYiPeng Chai
Add poison creation handler. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: prepare for logging ecc errorsYiPeng Chai
Prepare for logging ecc errors. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add message fifo to handle RAS poison eventsYiPeng Chai
Add message fifo to handle RAS poison events. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_relocJesse Zhang
Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001. V2: To really improve the handling we would actually need to have a separate value of 0xffffffff.(Christian) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu/mes11: Use a separate fence per transactionAlex Deucher
We can't use a shared fence location because each transaction should be considered independently. Reviewed-by: Shaoyun.liu <shaoyunl@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add a spinlock to wb allocationAlex Deucher
As we use wb slots more dynamically, we need to lock access to avoid racing on allocation or free. Reviewed-by: Shaoyun.liu <shaoyunl@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: update fw_share for VCN5Sonny Jiang
kmd_fw_shared changed in VCN5 Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: Fix VRAM memory accountingMukul Joshi
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: update jpeg max decode resolutionSathishkumar S
jpeg ip version v2.1 and higher supports 16kx16k resolution decode Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add ip dump for each ip in devcoredumpSunil Khatri
Add ip dump for each ip of the asic in the devcoredump for all the ips where a callback is registered for register dump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: dump ip state before reset for each ipSunil Khatri
Invoke the dump_ip_state function for each ip before the asic resets and save the register values for debugging via devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add support for gfx v10 printSunil Khatri
Add support to print ip information to be used to print registers in devcoredump buffer. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add protype for print ip stateSunil Khatri
Add the protoype for print ip state to be used to print the registers in devcoredump during a gpu reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26drm/amdgpu: add support of gfx10 register dumpSunil Khatri
Adding gfx10 gc registers to be used for register dump via devcoredump during a gpu reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>