summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2025-05-29drm/amdgpu: handle old RAS eeprom data in non-nps1 modeganglxie
Get MCA address from PA in nps1, then convert MCA address to PA in specific nps mode. Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu: Add userq fence support to SDMAv6.0Arunpravin Paneer Selvam
Add userq fence support to SDMAv6.0 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu: amdgpu_vram_mgr_new(): Clamp lpfn to total vramJohn Olender
The drm_mm allocator tolerated being passed end > mm->size, but the drm_buddy allocator does not. Restore the pre-buddy-allocator behavior of allowing such placements. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3448 Signed-off-by: John Olender <john.olender@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-05-29drm/amdgpu/vcn5.0.1: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v5.0.1 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn5: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v5.0.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn4.0.5: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v4.0.5 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn4.0.3: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v4.0.3 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn4: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v4.0.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn3: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v3.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn2.5: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v2.5 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn2: read back register after writtenDavid (Ming Qiang) Wu
The addition of register read-back in VCN v2.0 is intended to prevent potential race conditions. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29drm/amdgpu/vcn1: read back register after writtenDavid (Ming Qiang) Wu
V3: drop changes where readbacks have implemented. This patch set is to add readbacks only. V2: use common register UVD_STATUS for readback (standard PCI MMIO behavior, i.e. readback post all writes to let the writes hit the hardware) add readback in ..._stop() for more coverage. Similar to the changes made for VCN v4.0.5 where readback to post the writes to avoid race with the doorbell, the addition of register readback support in other VCN versions is intended to prevent potential race conditions, even though such issues have not been observed yet. This change ensures consistency across different VCN variants and helps avoid similar issues. The overhead introduced is negligible. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-28drm/amd/amdgpu: Add GPIO resources required for amdispPratap Nirujogi
ISP is a child device to GFX, and its device specific information is not available in ACPI. Adding the 2 GPIO resources required for ISP_v4_1_1 in amdgpu_isp driver. - GPIO 0 to allow sensor driver to enable and disable sensor module. - GPIO 85 to allow ISP driver to enable and disable ISP RGB streaming mode. Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-28Merge tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "As part of building up nova-core/nova-drm pieces we've brought in some rust abstractions through this tree, aux bus being the main one, with devres changes also in the driver-core tree. Along with the drm core abstractions and enough nova-core/nova-drm to use them. This is still all stub work under construction, to build the nova driver upstream. The other big NVIDIA related one is nouveau adds support for Hopper/Blackwell GPUs, this required a new GSP firmware update to 570.144, and a bunch of rework in order to support multiple fw interfaces. There is also the introduction of an asahi uapi header file as a precursor to getting the real driver in later, but to unblock userspace mesa packages while the driver is trapped behind rust enablement. Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe, and msm being the main ones, and some changes to vsprintf. new drivers: - bring in the asahi uapi header standalone - nova-drm: stub driver rust dependencies (for nova-core): - auxiliary - bus abstractions - driver registration - sample driver - devres changes from driver-core - revocable changes core: - add Apple fourcc modifiers - add virtio capset definitions - extend EXPORT_SYNC_FILE for timeline syncobjs - convert to devm_platform_ioremap_resource - refactor shmem helper page pinning - DP powerup/down link helpers - extended %p4cc in vsprintf.c to support fourcc prints - change vsprintf %p4cn to %p4chR, remove %p4cn - Add drm_file_err function - IN_FORMATS_ASYNC property - move sitronix from tiny to their own subdir rust: - add drm core infrastructure rust abstractions (device/driver, ioctl, file, gem) dma-buf: - adjust sg handling to not cache map on attach - allow setting dma-device for import - Add a helper to sort and deduplicate dma_fence arrays docs: - updated drm scheduler docs - fbdev todo update - fb rendering - actual brightness ttm: - fix delayed destroy resv object bridge: - add kunit tests - convert tc358775 to atomic - convert drivers to devm_drm_bridge_alloc - convert rk3066_hdmi to bridge driver scheduler: - add kunit tests panel: - refcount panels to improve lifetime handling - Powertip PH128800T004-ZZA01 - NLT NL13676BC25-03F, Tianma TM070JDHG34-00 - Himax HX8279/HX8279-D DDIC - Visionox G2647FB105 - Sitronix ST7571 - ZOTAC rotation quirk vkms: - allow attaching more displays i915: - xe3lpd display updates - vrr refactor - intel_display struct conversions - xe2hpd memory type identification - add link rate/count to i915_display_info - cleanup VGA plane handling - refactor HDCP GSC - fix SLPC wait boosting reference counting - add 20ms delay to engine reset - fix fence release on early probe errors xe: - SRIOV updates - BMG PCI ID update - support separate firmware for each GT - SVM fix, prelim SVM multi-device work - export fan speed - temp disable d3cold on BMG - backup VRAM in PM notifier instead of suspend/freeze - update xe_ttm_access_memory to use GPU for non-visible access - fix guc_info debugfs for VFs - use copy_from_user instead of __copy_from_user - append PCIe gen5 limitations to xe_firmware document amdgpu: - DSC cleanup - DC Scaling updates - Fused I2C-over-AUX updates - DMUB updates - Use drm_file_err in amdgpu - Enforce isolation updates - Use new dma_fence helpers - USERQ fixes - Documentation updates - SR-IOV updates - RAS updates - PSP 12 cleanups - GC 9.5 updates - SMU 13.x updates - VCN / JPEG SR-IOV updates amdkfd: - Update error messages for SDMA - Userptr updates - XNACK fixes radeon: - CIK doorbell cleanup nouveau: - add support for NVIDIA r570 GSP firmware - enable Hopper/Blackwell support nova-core: - fix task list - register definition infrastructure - move firmware into own rust module - register auxiliary device for nova-drm nova-drm: - initial driver skeleton msm: - GPU: - ACD (adaptive clock distribution) for X1-85 - drop fictional address_space_size - improve GMU HFI response time out robustness - fix crash when throttling during boot - DPU: - use single CTL path for flushing on DPU 5.x+ - improve SSPP allocation code for better sharing - Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550 - Added SAR2130P support - Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660 - DP: - switch to new audio helpers - better LTTPR handling - DSI: - Added support for SA8775P - Added SAR2130P support - HDMI: - Switched to use new helpers for ACR data - Fixed old standing issue of HPD not working in some cases amdxdna: - add dma-buf support - allow empty command submits renesas: - add dma-buf support - add zpos, alpha, blend support panthor: - fail properly for NO_MMAP bos - add SET_LABEL ioctl - debugfs BO dumping support imagination: - update DT bindings - support TI AM68 GPU hibmc: - improve interrupt handling and HPD support virtio: - add panic handler support rockchip: - add RK3588 support - add DP AUX bus panel support ivpu: - add heartbeat based hangcheck mediatek: - prepares support for MT8195/99 HDMIv2/DDCv2 anx7625: - improve HPD tegra: - speed up firmware loading * tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits) drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr() drm/xe: Default auto_link_downgrade status to false drm/xe/guc: Make creation of SLPC debugfs files conditional drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue() drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read drm/i915/ptl: Use everywhere the correct DDI port clock select mask drm/nouveau/kms: add support for GB20x drm/dp: add option to disable zero sized address only transactions. drm/nouveau: add support for GB20x drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle drm/nouveau: add support for GB10x drm/nouveau/gf100-: track chan progress with non-WFI semaphore release drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos drm/nouveau: add support for GH100 drm/nouveau: improve handling of 64-bit BARs drm/nouveau/gv100-: switch to volta semaphore methods drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM ...
2025-05-28drm/amdgpu: update trace format to match gpu_scheduler_tracePierre-Eric Pelloux-Prayer
Log fences using the same format for coherency. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250526125505.2360-11-pierre-eric.pelloux-prayer@amd.com
2025-05-28drm: Get rid of drm_sched_job.idPierre-Eric Pelloux-Prayer
Its only purpose was for trace events, but jobs can already be uniquely identified using their fence. The downside of using the fence is that it's only available after 'drm_sched_job_arm' was called which is true for all trace events that used job.id so they can safely switch to using it. Suggested-by: Tvrtko Ursulin <tursulin@igalia.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250526125505.2360-9-pierre-eric.pelloux-prayer@amd.com
2025-05-28drm/sched: Store the drm client_id in drm_sched_fencePierre-Eric Pelloux-Prayer
This will be used in a later commit to trace the drm client_id in some of the gpu_scheduler trace events. This requires changing all the users of drm_sched_job_init to add an extra parameter. The newly added drm_client_id field in the drm_sched_fence is a bit of a duplicate of the owner one. One suggestion I received was to merge those 2 fields - this can't be done right now as amdgpu uses some special values (AMDGPU_FENCE_OWNER_*) that can't really be translated into a client id. Christian is working on getting rid of those; when it's done we should be able to squash owner/drm_client_id together. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250526125505.2360-3-pierre-eric.pelloux-prayer@amd.com
2025-05-27Merge tag 'irq-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A set of cleanups for the generic interrupt subsystem: - Consolidate on one set of functions for the interrupt domain code to get rid of pointlessly duplicated code with only marginal different semantics. - Update the documentation accordingly and consolidate the coding style of the irqdomain header" * tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) irqdomain: Consolidate coding style irqdomain: Fix kernel-doc and add it to Documentation Documentation: irqdomain: Update it Documentation: irq-domain.rst: Simple improvements Documentation: irq/concepts: Minor improvements Documentation: irq/concepts: Add commas and reflow irqdomain: Improve kernel-docs of functions irqdomain: Make struct irq_domain_info variables const irqdomain: Use irq_domain_instantiate()'s return value as initializers irqdomain: Drop irq_linear_revmap() pinctrl: keembay: Switch to irq_find_mapping() irqchip/armada-370-xp: Switch to irq_find_mapping() gpu: ipu-v3: Switch to irq_find_mapping() gpio: idt3243x: Switch to irq_find_mapping() sh: Switch to irq_find_mapping() powerpc: Switch to irq_find_mapping() irqdomain: Drop irq_domain_add_*() functions powerpc: Switch irq_domain_add_nomap() to use fwnode thermal: Switch to irq_domain_create_linear() soc: Switch to irq_domain_create_*() ...
2025-05-22drm/amdgpu: seq64 memory unmap uses uninterruptible lockPhilip Yang
To unmap and free seq64 memory when drm node close to free vm, if there is signal accepted, then taking vm lock failed and leaking seq64 va mapping, and then dmesg has error log "still active bo inside vm". Change to use uninterruptible lock fix the mapping leaking and no dmesg error log. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: update ras support checkMangesh Gadre
update ras support check for vcn 5.0.1 Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Enable RAS for jpeg 5.0.1Mangesh Gadre
Enable jpeg ras posion processing and aca error logging Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Enable RAS for vcn 5.0.1Mangesh Gadre
Enable vcn ras posion processing and aca error logging Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Remove duplicated "context still alive" checkTvrtko Ursulin
When amdgpu_ctx_mgr_fini() calls amdgpu_ctx_mgr_entity_fini() it contains the exact same "context still alive" check as it will do next. Remove the duplicated copy. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Make amdgpu_ctx_mgr_entity_fini staticTvrtko Ursulin
Function amdgpu_ctx_mgr_entity_fini() only has a single local caller so lets make it local. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Update runtime pm checksAlex Deucher
Don't enable BACO when in passthrough. PCI resets don't work correctly when in BACO. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amd/pm: Use external link order for xgmi dataLijo Lazar
xgmi_port_num interface reports external link number for port number. To be consistent, use the external link number for reporting other XGMI link data also. v2: For invalid link number return -EINVAL (Kevin) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Add sysfs nodes for partitionLijo Lazar
Add sysfs nodes to provide compute paritition specific data. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Register aqua vanjaram jpeg poison irqStanley.Yang
Register aqua vanjaram jpeg poison irq, add jpeg poison handle. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Register aqua vanjaram vcn poison irqStanley.Yang
Register aqua vanjaram vcn poison irq, add vcn poison handle. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Fix eviction fence worker race during fd closeJesse.Zhang
The current cleanup order during file descriptor close can lead to a race condition where the eviction fence worker attempts to access a destroyed mutex from the user queue manager: [ 517.294055] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 517.294060] WARNING: CPU: 8 PID: 2030 at kernel/locking/mutex.c:564 [ 517.294094] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu] The issue occurs because: 1. We destroy the user queue manager (including its mutex) first 2. Then try to destroy eviction fences which may have pending work 3. The eviction fence worker may try to access the already-destroyed mutex Fix this by reordering the cleanup to: 1. First mark the fd as closing and destroy eviction fences, which flushes any pending work 2. Then safely destroy the user queue manager after we're certain no more fence work will be executed The copy in amdgpu_driver_postclose_kms() needs to be removed (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: lock the eviction fence for wq signals itPrike Liang
Lock and refer to the eviction fence before the eviction fence schedules work queue tries to signal it. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16gpu: Switch to irq_domain_create_linear()Jiri Slaby (SUSE)
irq_domain_add_linear() is going away as being obsolete now. Switch to the preferred irq_domain_create_linear(). That differs in the first parameter: It takes more generic struct fwnode_handle instead of struct device_node. Therefore, of_fwnode_handle() is added around the parameter. Note some of the users can likely use dev->fwnode directly instead of indirect of_fwnode_handle(dev->of_node). But dev->fwnode is not guaranteed to be set for all, so this has to be investigated on case to case basis (by people who can actually test with the HW). [ tglx: Fix up subject prefix ] Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-19-jirislaby@kernel.org
2025-05-16drm/amdgpu/jpeg: sriov support for jpeg_v5_0_1fanhuang
initialization table handshake with mmsch Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu/vcn: sriov support for vcn_v5_0_1fanhuang
initialization table handshake with mmsch Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu: fix use-after-unlock in eviction fence destroyArvind Yadav
The eviction fence destroy path incorrectly calls dma_fence_put() on evf_mgr->ev_fence after releasing the ev_fence_lock. This introduces a potential use-after-unlock or race because another thread concurrently modifies evf_mgr->ev_fence. Fix this by grabbing a local reference to evf_mgr->ev_fence under the lock and using that for dma_fence_put() after waiting. Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu: Allow NPS2-CPX combination for VFsLijo Lazar
CPX partition mode is compatible with NPS2 on aquavanjaram VFs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu/mmsch: Add MMSCH v5_0 support for sriovfanhuang
These structures are basically ported from MMSCH v4_0 The structures are the same as v4_0 except for the init header Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu: Use compatible NPS mode infoLijo Lazar
Compatible NPS modes for a partition mode are exposed through xcp_config interface. To determine if a compute partition mode is valid, check if the current NPS mode is part of compatible NPS modes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdgpu: Add pldm version reportingAsad Kamal
Add pldm version reporting through sysfs node Signed-off-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-16drm/amdkfd: Support chain runlists of XNACK+/XNACK-Amber Lin
If the MEC firmware supports chaining runlists of XNACK+/XNACK- processes, set SQ_CONFIG1 chicken bit and SET_RESOURCES bit 28. When the MEC/HWS supports it, KFD checks the XNACK+/XNACK- processes mix happens or not. If it does, enter over-subscription. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14drm/amdgpu: read back register after written for VCN v4.0.5David (Ming Qiang) Wu
On VCN v4.0.5 there is a race condition where the WPTR is not updated after starting from idle when doorbell is used. Adding register read-back after written at function end is to ensure all register writes are done before they can be used. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 07c9db090b86e5211188e1b351303fbc673378cf) Cc: stable@vger.kernel.org
2025-05-14drm/amdgpu: add debugfs for spirom IFWI dumpShiwu Zhang
Expose the debugfs file node for user space to dump the IFWI image on spirom. For one transaction between PSP and host, it will read out the images on both active and inactive partitions so a buffer with two times the size of maximum IFWI image (currently 16MByte) is needed. v2: move the vbios gfl macros to the common header and rename the bo triplet struct to spirom_bo for this specific usage (Hawking) v3: return directly the result of last command execution (Lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14drm/amdgpu: fix userq resource double freedPrike Liang
As the userq resource was already freed at the drm_release early phase, it should avoid freeing userq resource again at the later kms postclose callback. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14drm/amdgpu: Fix circular locking in userq creationJesse.Zhang
A circular locking dependency was detected between the global `adev->userq_mutex` and per-file `userq_mgr->userq_mutex` when creating user queues. The issue occurs because: 1. `amdgpu_userq_suspend()` and `amdgpu_userq_resume` take `adev->userq_mutex` first, then `userq_mgr->userq_mutex` 2. While `amdgpu_userq_create()` takes them in reverse order This patch resolves the issue by: 1. Moving the `adev->userq_mutex` lock earlier in `amdgpu_userq_create()` to cover the `amdgpu_userq_ensure_ev_fence()` call 2. Releasing it after we're done with both queue creation and the scheduling halt check v2: remove unused adev->userq_mutex lock (Prike) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-14drm/amdgpu: read back register after written for VCN v4.0.5David (Ming Qiang) Wu
On VCN v4.0.5 there is a race condition where the WPTR is not updated after starting from idle when doorbell is used. Adding register read-back after written at function end is to ensure all register writes are done before they can be used. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13drm/amdgpu: fix incorrect MALL size for GFX1151Tim Huang
On GFX1151, the reported MALL cache size reflects only half of its actual size; this adjustment corrects the discrepancy. Signed-off-by: Tim Huang <tim.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0a5c060b593ad152318f89e5564bfdfcff8a6ac0) Cc: stable@vger.kernel.org
2025-05-13drm/amdgpu: csa unmap use uninterruptible lockPhilip Yang
After process exit to unmap csa and free GPU vm, if signal is accepted and then waiting to take vm lock is interrupted and return, it causes memory leaking and below warning backtrace. Change to use uninterruptible wait lock fix the issue. WARNING: CPU: 69 PID: 167800 at amd/amdgpu/amdgpu_kms.c:1525 amdgpu_driver_postclose_kms+0x294/0x2a0 [amdgpu] Call Trace: <TASK> drm_file_free.part.0+0x1da/0x230 [drm] drm_close_helper.isra.0+0x65/0x70 [drm] drm_release+0x6a/0x120 [drm] amdgpu_drm_release+0x51/0x60 [amdgpu] __fput+0x9f/0x280 ____fput+0xe/0x20 task_work_run+0x67/0xa0 do_exit+0x217/0x3c0 do_group_exit+0x3b/0xb0 get_signal+0x14a/0x8d0 arch_do_signal_or_restart+0xde/0x100 exit_to_user_mode_loop+0xc1/0x1a0 exit_to_user_mode_prepare+0xf4/0x100 syscall_exit_to_user_mode+0x17/0x40 do_syscall_64+0x69/0xc0 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7dbbfb3c171a6f63b01165958629c9c26abf38ab) Cc: stable@vger.kernel.org
2025-05-13drm/amdgpu/userq: Fix DEBUG_LOCKS_WARN_ON(lock->magic != lock)Arunpravin Paneer Selvam
Fix DEBUG_LOCKS_WARN_ON(lock->magic != lock) warning logs. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13drm/amdgpu: Fix userq ttm_bo_pin and ttm_bo_unpin lockdep warningsArunpravin Paneer Selvam
The ttm_bo_pin and ttm_bo_unpin warnings are resolved by moving the doorbell bo reserve up before pin/unpin. WARNING: CPU: 11 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:592 ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000277] CPU: 11 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15 [ +0.000006] Tainted: [W]=WARN [ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024 [ +0.000004] RIP: 0010:ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000005] RSP: 0018:ffff88846ca879d0 EFLAGS: 00010246 [ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000 [ +0.000004] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000005] RBP: ffff88846ca879e8 R08: 0000000000000000 R09: 0000000000000000 [ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810b7ca848 [ +0.000004] R13: ffff88846c666250 R14: 1ffff1108d950f44 R15: ffff88846ca87aa0 [ +0.000005] FS: 00007c45ff436d00(0000) GS:ffff888409580000(0000) knlGS:0000000000000000 [ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000005] CR2: 00005b0c142a60e0 CR3: 000000012ce5a000 CR4: 0000000000f50ef0 [ +0.000004] PKRU: 55555554 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000005] ? show_regs+0x6c/0x80 [ +0.000007] ? __warn+0xd2/0x2d0 [ +0.000007] ? ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000031] ? report_bug+0x282/0x2f0 [ +0.000012] ? handle_bug+0x6e/0xc0 [ +0.000007] ? exc_invalid_op+0x18/0x50 [ +0.000007] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000014] amdgpu_bo_pin+0x365/0x9d0 [amdgpu] [ +0.000191] ? __pfx_amdgpu_bo_pin+0x10/0x10 [amdgpu] [ +0.000185] ? drm_gem_object_lookup+0x81/0xc0 [ +0.000008] ? kasan_save_alloc_info+0x37/0x60 [ +0.000007] ? __kasan_kmalloc+0xc3/0xd0 [ +0.000013] amdgpu_userqueue_get_doorbell_index+0xee/0x5f0 [amdgpu] [ +0.000209] amdgpu_userq_ioctl+0x6b4/0xd40 [amdgpu] [ +0.000193] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000211] ? lock_acquire+0x7c/0xc0 [ +0.000006] ? drm_dev_enter+0x51/0x190 [ +0.000015] drm_ioctl_kernel+0x18b/0x330 [ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000190] ? __pfx_drm_ioctl_kernel+0x10/0x10 [ +0.000005] ? lock_acquire+0x7c/0xc0 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000011] drm_ioctl+0x589/0xd00 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000194] ? __pfx_drm_ioctl+0x10/0x10 [ +0.000006] ? __pm_runtime_resume+0x80/0x110 [ +0.000021] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? trace_hardirqs_on+0x53/0x60 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu] [ +0.000185] __x64_sys_ioctl+0x13a/0x1c0 [ +0.000010] x64_sys_call+0x11ad/0x25f0 [ +0.000007] do_syscall_64+0x91/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? irqentry_exit+0x77/0xb0 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? exc_page_fault+0x93/0x150 [ +0.000009] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7c45ff924ded [ +0.000005] RSP: 002b:00007ffff7167810 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded [ +0.000004] RDX: 00007ffff7167870 RSI: 00000000c0486456 RDI: 000000000000000b [ +0.000004] RBP: 00007ffff7167860 R08: ffff800100000000 R09: 0000000000010000 [ +0.000005] R10: 00007ffff7167950 R11: 0000000000000246 R12: 00005b0c2a51bc48 [ +0.000004] R13: 000000000000000b R14: 0000000000000000 R15: 00007ffff7167950 [ +0.000022] </TASK> [ +0.000004] irq event stamp: 80693 [ +0.000004] hardirqs last enabled at (80699): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0 [ +0.000005] hardirqs last disabled at (80704): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0 [ +0.000005] softirqs last enabled at (80390): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] softirqs last disabled at (80385): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000006] ---[ end trace 0000000000000000 ]--- ------------------------------------------------------------------------------------------------------ [ +0.000006] WARNING: CPU: 10 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:611 ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000280] CPU: 10 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15 [ +0.000006] Tainted: [W]=WARN [ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024 [ +0.000004] RIP: 0010:ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000005] RSP: 0018:ffff88846ca87888 EFLAGS: 00010246 [ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000 [ +0.000005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000004] RBP: ffff88846ca878a0 R08: 0000000000000000 R09: 0000000000000000 [ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164e90050 [ +0.000005] R13: ffff88846c666200 R14: 0000000000000001 R15: ffff888168402d28 [ +0.000004] FS: 00007c45ff436d00(0000) GS:ffff888409500000(0000) knlGS:0000000000000000 [ +0.000005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000004] CR2: 00007c45f7373b20 CR3: 000000012ce5a000 CR4: 0000000000f50ef0 [ +0.000005] PKRU: 55555554 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000005] ? show_regs+0x6c/0x80 [ +0.000008] ? __warn+0xd2/0x2d0 [ +0.000007] ? ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000012] ? report_bug+0x282/0x2f0 [ +0.000013] ? handle_bug+0x6e/0xc0 [ +0.000006] ? exc_invalid_op+0x18/0x50 [ +0.000008] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000011] ? ttm_bo_unpin+0x217/0x2c0 [ttm] [ +0.000011] amdgpu_bo_unpin+0x45/0x250 [amdgpu] [ +0.000216] amdgpu_userq_ioctl+0x2c3/0xd40 [amdgpu] [ +0.000226] ? drm_dev_exit+0x2d/0x60 [ +0.000010] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000201] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? lock_acquire+0x7c/0xc0 [ +0.000006] ? drm_dev_enter+0x51/0x190 [ +0.000015] drm_ioctl_kernel+0x18b/0x330 [ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000188] ? __pfx_drm_ioctl_kernel+0x10/0x10 [ +0.000006] ? lock_acquire+0x7c/0xc0 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000006] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] drm_ioctl+0x589/0xd00 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000211] ? __pfx_drm_ioctl+0x10/0x10 [ +0.000006] ? __pm_runtime_resume+0x80/0x110 [ +0.000020] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? trace_hardirqs_on+0x53/0x60 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu] [ +0.000186] __x64_sys_ioctl+0x13a/0x1c0 [ +0.000010] x64_sys_call+0x11ad/0x25f0 [ +0.000007] do_syscall_64+0x91/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] ? __pfx___rseq_handle_notify_resume+0x10/0x10 [ +0.000005] ? __pfx_blkcg_maybe_throttle_current+0x10/0x10 [ +0.000013] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? syscall_exit_to_user_mode+0x95/0x260 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000011] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? irqentry_exit_to_user_mode+0x8b/0x260 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? irqentry_exit+0x77/0xb0 [ +0.000004] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? exc_page_fault+0x93/0x150 [ +0.000010] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7c45ff924ded [ +0.000005] RSP: 002b:00007ffff7168790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded [ +0.000005] RDX: 00007ffff71687f0 RSI: 00000000c0486456 RDI: 000000000000000b [ +0.000004] RBP: 00007ffff71687e0 R08: 00005b0c2a49b010 R09: 0000000000000007 [ +0.000004] R10: 00005b0c2a4d7140 R11: 0000000000000246 R12: 000000000000000b [ +0.000004] R13: 00007c45ff19e5cc R14: 00005b0c2a51c538 R15: 00005b0c2a51bbd8 [ +0.000022] </TASK> [ +0.000005] irq event stamp: 87419 [ +0.000004] hardirqs last enabled at (87425): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0 [ +0.000005] hardirqs last disabled at (87430): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0 [ +0.000005] softirqs last enabled at (87058): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000006] softirqs last disabled at (87053): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] ---[ end trace 0000000000000000 ]--- Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-13drm/amdgpu/userq: Fix lock contention in userq fenceArunpravin Paneer Selvam
Fix lockdep warnings. [ +0.000637] ================================ [ +0.000004] WARNING: inconsistent lock state [ +0.000004] 6.12.0+ #18 Tainted: G W OE [ +0.000004] -------------------------------- [ +0.000004] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [ +0.000004] Xwayland/1952 [HC0[0]:SC0[0]:HE1:SE1] takes: [ +0.000005] ffff8884636f4740 (&fence_drv->fence_list_lock){?...}-{2:2}, at: amdgpu_userq_fence_driver_destroy+0xb8/0x540 [amdgpu] [ +0.000208] {IN-HARDIRQ-W} state was registered at: [ +0.000004] lock_acquire.part.0+0x116/0x360 [ +0.000005] lock_acquire+0x7c/0xc0 [ +0.000005] _raw_spin_lock+0x2f/0x60 [ +0.000005] amdgpu_userq_fence_driver_process+0x75/0x400 [amdgpu] [ +0.000185] gfx_v12_0_eop_irq+0x29f/0x420 [amdgpu] [ +0.000210] amdgpu_irq_dispatch+0x2a4/0x7b0 [amdgpu] [ +0.000191] amdgpu_ih_process+0x1e1/0x3d0 [amdgpu] [ +0.000185] amdgpu_irq_handler+0x28/0xc0 [amdgpu] [ +0.000183] __handle_irq_event_percpu+0x1bb/0x590 [ +0.000005] handle_irq_event+0xab/0x1d0 [ +0.000005] handle_edge_irq+0x1fd/0xc10 [ +0.000005] __common_interrupt+0x83/0x190 [ +0.000004] common_interrupt+0xb1/0xe0 [ +0.000005] asm_common_interrupt+0x27/0x40 [ +0.000004] cpuidle_enter_state+0x2ba/0x530 [ +0.000005] cpuidle_enter+0x4f/0xb0 [ +0.000006] call_cpuidle+0x46/0xd0 [ +0.000005] do_idle+0x367/0x430 [ +0.000004] cpu_startup_entry+0x58/0x70 [ +0.000005] start_secondary+0x224/0x2b0 [ +0.000005] common_startup_64+0x13e/0x141 [ +0.000005] irq event stamp: 88271 [ +0.000004] hardirqs last enabled at (88271): [<ffffffffad9ca7a1>] _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000005] hardirqs last disabled at (88270): [<ffffffffad9ca424>] _raw_spin_lock_irqsave+0x74/0x80 [ +0.000005] softirqs last enabled at (87858): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] softirqs last disabled at (87849): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] other info that might help us debug this: [ +0.000004] Possible unsafe locking scenario: [ +0.000003] CPU0 [ +0.000004] ---- [ +0.000003] lock(&fence_drv->fence_list_lock); v2: Drop fence_list_flags and use xa_lock_irqsave() flags parameter (Christian) Fix merge conflicts. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>