Age | Commit message (Collapse) | Author |
|
Users might switch to ROCM build does not include MEC SJT FW and driver
needs to consider this case.w
Signed-off-by: yfeng1 <yfeng1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.14-2024-12-18:
amdgpu:
- RAS updates
- ISP updates
- SDMA queue reset support
- Rework DPM powergating interfaces
- Documentation updates and cleanups
- Panel replay fixes
- DCN 3.5 updates
- DP tunneling fixes
- Use a pm notifier to more gracefully handle VRAM eviction on suspend or hibernate
- Add debugfs interfaces for forcing scheduling to specific engine instances
- GG 9.5 updates
- IH 4.4 updates
- Make missing optional firmware less noisy
- PSP 13.x updates
- SMU 13.x updates
- VCN 5.x updates
- JPEG 5.x updates
- Misc cleanups
- GC 12.x updates
- DRM panic support
- DC FAMS updates
- DSC fixes
- job handling fixes
amdkfd:
- GG 9.5 updates
- Logging improvements
- Misc cleanups
- Various Optimizations
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241218201758.2580723-1-alexander.deucher@amd.com
|
|
Before, every time fdinfo is queried we try to lock all the BOs in the
VM and calculate memory usage from scratch. This works okay if the
fdinfo is rarely read and the VMs don't have a ton of BOs. If either of
these conditions is not true, we get a massive performance hit.
In this new revision, we track the BOs as they change states. This way
when the fdinfo is queried we only need to take the status lock and copy
out the usage stats with minimal impact to the runtime performance. With
this new approach however, we would no longer be able to track active
buffers.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241219151411.1150-6-Yunxiang.Li@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
amdgpu_vm_bo_invalidate doesn't use the adev parameter and not all
callers have a reference to adev handy, so remove it for cleanliness.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241219151411.1150-5-Yunxiang.Li@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
When memory stats is generated fresh everytime by going though all the
BOs, their active information is quite easy to get. But if the stats are
tracked with BO's state this becomes harder since the job scheduling
part doesn't really deal with individual buffers.
Make drm-active- optional to enable amdgpu to switch to the second
method.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241219151411.1150-3-Yunxiang.Li@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
Backmerge linux 6.13-rc3 as amd next has some dependencies on fixes in it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Third time's the charm, I hope?
Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The static analyser tool gave the following advice:
./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup
→ 1266 tmp = kmalloc(used_size, GFP_KERNEL);
1267 if (!tmp)
1268 return -ENOMEM;
1269
→ 1270 memcpy(tmp, &host_telemetry->body.error_count, used_size);
Replacing kmalloc() + memcpy() with kmemdump() doesn't change semantics.
Original code works without fault, so this is not a bug fix but proposed improvement.
Link: https://lwn.net/Articles/198928/
Fixes: 84a2947ecc85 ("drm/amdgpu: Implement virt req_ras_err_count")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Zhigang Luo <Zhigang.Luo@amd.com>
Cc: Victor Skvortsov <victor.skvortsov@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Yunxiang Li <Yunxiang.Li@amd.com>
Cc: Jack Xiao <Jack.Xiao@amd.com>
Cc: Vignesh Chander <Vignesh.Chander@amd.com>
Cc: Danijel Slivka <danijel.slivka@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If IH primary ring and KFD ih fifo overflows, we may miss CP, SDMA
interrupts and cause application soft hang. Show warning message with
ring name if overflow happens.
Add function to get ih ring name to avoid duplicating it. To keep
warning message consistent between GPU generations, change all
*_ih.c except ASICs older than Vega which has only one ih ring.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
After GPU page fault, there are lots of page fault interrupts generated
at short period even with CAM filter enabled because the fault address
is different. Each page fault copy to KFD ih fifo to send event to user
space by KFD interrupt worker, this could cause KFD ih fifo overflow
while other processes generate events at same time.
KFD process is aborted after GPU page fault, we only need one GPU page
fault interrupt sent to KFD ih fifo to send memory exception event to
user space.
Incease KFD ih fifo size to 2 times of IH primary ring size, to handle
the burst events case.
This patch handle the gfx v9 path, cover retry on/off and CAM filter
on/off cases.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This partially reverts commit 194eb174cbe4fe2b3376ac30acca2dc8c8beca00.
This commit introduced a new state variable into adev without even
remotely worrying about CPU barriers.
Since we already have the amdgpu_in_reset() function exactly for this
use case partially revert that.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
As soon as the prepare phase is completed the VM might be released,
better set it to NULL.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The VM pointer might already be outdated when that function is called.
Use the PASID instead to gather the information instead.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable psp v14_0_3 RAS support for non-SRIOV configurations.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The sdma context empty interrupt is dropped in amdgpu_irq_dispatch
as unregistered interrupt src_id 243, this interrupt accounts to 1/3 of
total interrupts and causes IH primary ring overflow when running
stressful benchmark application. Disable this interrupt has no side
effect found.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The use of 1 << i in scheduler mask calculations can result in an
unintentional integer overflow due to the expression being
evaluated as a 32-bit signed integer.
This patch replaces 1 << i with 1ULL << i to ensure the operation
is performed as a 64-bit unsigned integer, preventing overflow
Discovered in coverity scan, CID 1636393, 1636175, 1636007, 1635853
Fixes: c5c63d9cb5d3 ("drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs")
Signed-off-by: Karol Przybylski <karprzy7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the helper function rather than reading it directly.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Clean up code to quiet the compiler on us failing to check the return
code.
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Driver has different ways to fetch VBIOS. If one of the methods doesn't
find an authentic one, it will show misleading info messages eventhough
a subsequent method finds a valid VBIOS. Keep the message level at debug
and add device context.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This init is useless because base.sched will be cleared to 0 in drm_sched_job_init
because of commit 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()").
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It's unused.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Since 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
accessing job->base.sched can produce unexpected results as the initialisation
of (*job)->base.sched done in amdgpu_job_alloc is overwritten by the
memset.
This commit fixes an issue when a CS would fail validation and would
be rejected after job->num_ibs is incremented. In this case,
amdgpu_ib_free(ring->adev, ...) will be called, which would crash the
machine because the ring value is bogus.
To fix this, pass a NULL pointer to amdgpu_ib_free(): we can do this
because the device is actually not used in this function.
The next commit will remove the ring argument completely.
Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove the logically dead code in the last return statement of
amdgpu_ras_eeprom_init. The condition res < 0 is redundant since
res is already checked for a negative value earlier. Replace
return res < 0 ? res : 0; with return 0 to improve clarity.
Fixes: 63d4c081a556 ("drm/amdgpu: Optimize EEPROM RAS table I/O")
Closes: https://scan7.scan.coverity.com/#/project-view/52337/11354?selectedIssue=1602413
Signed-off-by: Dheeraj Reddy Jonnalagadda <dheeraj.linuxdev@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add support for the drm_panic module, which displays a pretty user
friendly message on the screen when a Linux kernel panic occurs.
Adapt Lu Yao's code to use common helpers derived from
Jocelyn's patch. This extends the non-DC code to enable
access to non-CPU accessible VRAM and adds support for
other DCE versions.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Lu Yao <yaolu@kylinos.cn>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Harry Wentland <harry.wentland@amd.com>
|
|
If the kernel hasn't been compiled with PCIe hotplug support this
can lead to problems with dGPUs that use BOCO because they effectively
drop off the bus.
To prevent issues, disable BOCO support when compiled without PCIe hotplug.
Reported-by: Gabriel Marcano <gabemarcano@yahoo.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241211155601.3585256-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
- The previous patch only considered the case for baremetal
and is not applicable for SRIOV code path. We also need to
init fw_share for SRIOV VF
Fixes: 928cd772e18f ("drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Pull this out of Jocelyn's patch and make it generic.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Lu Yao <yaolu@kylinos.cn>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Harry Wentland <harry.wentland@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.13-2024-12-11:
amdgpu:
- ISP hw init fix
- SR-IOV fixes
- Fix contiguous VRAM mapping for UVD on older GPUs
- Fix some regressions due to drm scheduler changes
- Workload profile fixes
- Cleaner shader fix
amdkfd:
- Fix DMA map direction for migration
- Fix a potential null pointer dereference
- Cacheline size fixes
- Runtime PM fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241211215449.741848-1-alexander.deucher@amd.com
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
[airlied: handle module ns conflict]
drm-misc-next for 6.14:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- Remove driver date from drm_driver
Driver Changes:
- amdxdna: New driver!
- ivpu: Fix qemu crash when using passthrough
- nouveau: expose GSP-RM logging buffers via debugfs
- panfrost: Add MT8188 Mali-G57 MC3 support
- panthor: misc improvements,
- rockchip: Gamma LUT support
- tidss: Misc improvements
- virtio: convert to helpers, add prime support for scanout buffers
- v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
- vc4: Add support for BCM2712
- vkms: Improvements all across the board
- panels:
- Introduce backlight quirks infrastructure
- New panels: KDB KD116N2130B12
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241205-agile-straight-pegasus-aca7f4@houat
|
|
They should be the same, but use the proper variable.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
They should be the same, but use the proper variable.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add the calls to the vcn 5.0.1 code.
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Shared with vcn 5.0.0.
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This helps to avoid a spurious PME event on hotplug to Azalia.
Cc: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Reported-and-tested-by: ionut_n2001@yahoo.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215884
Tested-by: Gabriel Marcano <gabemarcano@yahoo.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241211024414.7840-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Replace kcq queue reset with existing function amdgpu_mes_reset_legacy_queue.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Replace kcq queue reset with existing function amdgpu_mes_reset_legacy_queue.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
sdmv7 queue reset already supports by mmio, add its sys file.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reset gfx/compute queue through mmio based on me_id and queue_id.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reset sdma queue through mmio based on me_id and queue_id.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
VF device sets the RAS flag when mailbox data can't be read properly.
There is no conclusive way to tell if the real source is RAS error.
Therefore VF schedules a KFD based reset which doesn't set RAS source.
SKip checking RAS source for any VF scheduled recovery.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reported-by: Vojislav Tomasevic <vojislav.tomasevic@amd.com>
Reviewed-by: Yiqing Yao <yiqing.yao@amd.com>
Tested-by: Yiqing Yao <yiqing.yao@amd.com>
Fixes: e1ee2111ca48 ("drm/amdgpu: Prefer RAS recovery for scheduler hang")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Implement sdma queue reset callback by mes_reset_queue_mmio.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Extracts the resume sequence for per sdma instance from sdma_v7_0_gfx_resume.
This function can be used in start or restart scenarios of specific instances.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add nbif v6_3_1 fatal error handling support.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add VCN_5_0_1 IP block to kernel boot
Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add vcn support for VCN_5_0_1
v2: rebase, squash in fixes (Alex)
Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
enable JPEG5_0_1 ip block
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|