|
s/userqueue/userq/
1. Remove the mix of amdgpu_userqueue and amdgpu_userq.
2. Be consistent with the existing amdgpu_userq_fence.c.
3. It's shorter.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add queue id support to the user queue wait IOCTL
drm_amdgpu_userq_wait structure.
This is required to retrieve the waiting user queue and maintain
the fence driver references in it, so that user queues in the
same context release their references to the fence drivers at
some point before queue destruction.
Otherwise, we would accumulate those references until we run out
of space and crash.
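As a rough illustration of the UAPI change (the field names below are
illustrative placeholders, not the authoritative amdgpu_drm.h layout),
the wait argument structure gains a queue id:

  #include <linux/types.h>

  /* sketch only -- consult the installed amdgpu_drm.h for the real
   * drm_amdgpu_userq_wait definition */
  struct example_userq_wait {
          __u32 waitq_id;            /* id of the queue issuing the wait */
          __u32 pad;                 /* keep the struct 64-bit aligned */
          __u64 syncobj_handles;     /* user pointer to syncobj handles */
          __u32 num_syncobj_handles; /* entries in the array above */
          __u32 pad2;
  };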
v2: Modify the UAPI comment as per the Mesa and libdrm UAPI comments.
Libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/408
Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34493
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move some userq fence handling code into amdgpu_userq_fence.c.
This matches the other code in that file.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
A deadlock has arisen between the userq signal IOCTL and the
eviction fence. In this scenario, amdgpu_userq_signal_ioctl() has
acquired the reservation lock on the read/write buffer object (BO)
through drm_exec. It then calls amdgpu_userqueue_ensure_ev_fence(),
which waits for the userq resume work to complete.
Meanwhile, the userq suspend worker has initiated the userq resume
work (amdgpu_userqueue_resume_worker). This resume work attempts to
validate the vm->done BO, which leads amdgpu_userqueue_validate_bos
to try to take the reservation lock on the same write BO that is
already held by amdgpu_userq_signal_ioctl.
As a result, the resume work stalls, and
amdgpu_userqueue_ensure_ev_fence remains stuck waiting.
Call Trace:
[ 242.836469] INFO: task gnome-shel:cs0:1288 blocked for more than 120 seconds.
[ 242.836486] Tainted: G OE 6.12.0-rc2rebased-oct-24+ #4
[ 242.836491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.836494] task:gnome-shel:cs0 state:D stack:0 pid:1288 tgid:1282 ppid:1180 flags:0x00000002
[ 242.836503] Call Trace:
[ 242.836508] <TASK>
[ 242.836517] __schedule+0x3e0/0xb10
[ 242.836530] ? srso_return_thunk+0x5/0x5f
[ 242.836541] schedule+0x31/0x120
[ 242.836546] schedule_timeout+0x150/0x160
[ 242.836551] ? srso_return_thunk+0x5/0x5f
[ 242.836555] ? sysvec_call_function+0x69/0xd0
[ 242.836562] ? srso_return_thunk+0x5/0x5f
[ 242.836567] ? preempt_count_add+0x7f/0xd0
[ 242.836577] __wait_for_common+0x91/0x180
[ 242.836582] ? __pfx_schedule_timeout+0x10/0x10
[ 242.836590] wait_for_completion+0x28/0x30
[ 242.836595] __flush_work+0x16c/0x290
[ 242.836602] ? __pfx_wq_barrier_func+0x10/0x10
[ 242.836611] flush_delayed_work+0x3a/0x60
[ 242.836621] amdgpu_userqueue_ensure_ev_fence+0x2d/0xb0 [amdgpu]
[ 242.836966] amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu]
[ 242.837171] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.837365] drm_ioctl_kernel+0xae/0x100 [drm]
[ 242.837398] drm_ioctl+0x2a1/0x500 [drm]
[ 242.837420] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.837622] ? srso_return_thunk+0x5/0x5f
[ 242.837627] ? srso_return_thunk+0x5/0x5f
[ 242.837630] ? _raw_spin_unlock_irqrestore+0x2b/0x50
[ 242.837635] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 242.837811] __x64_sys_ioctl+0x99/0xd0
[ 242.837820] x64_sys_call+0x1209/0x20d0
[ 242.837825] do_syscall_64+0x51/0x120
[ 242.837830] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 242.837835] RIP: 0033:0x7f2f33f1a94f
[ 242.837838] RSP: 002b:00007f2f24ffea30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 242.837842] RAX: ffffffffffffffda RBX: 00007f2f24ffebd0 RCX: 00007f2f33f1a94f
[ 242.837845] RDX: 00007f2f24ffebd0 RSI: 00000000c0306457 RDI: 000000000000000d
[ 242.837847] RBP: 00007f2f24ffeab0 R08: 0000000000000000 R09: 0000000000000000
[ 242.837849] R10: 00007f2f24ffecd0 R11: 0000000000000246 R12: 00007f2f25000640
[ 242.837851] R13: 00000000c0306457 R14: 000000000000000d R15: 00007fff3b39c1e0
[ 242.837858] </TASK>
[ 242.837865] INFO: task Xwayland:cs0:1517 blocked for more than 120 seconds.
[ 242.837869] Tainted: G OE 6.12.0-rc2rebased-oct-24+ #4
[ 242.837872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.837874] task:Xwayland:cs0 state:D stack:0 pid:1517 tgid:1338 ppid:1282 flags:0x00004002
[ 242.837878] Call Trace:
[ 242.837880] <TASK>
[ 242.837883] __schedule+0x3e0/0xb10
[ 242.837890] schedule+0x31/0x120
[ 242.837894] schedule_preempt_disabled+0x1c/0x30
[ 242.837897] __mutex_lock.constprop.0+0x386/0x6e0
[ 242.837902] ? srso_return_thunk+0x5/0x5f
[ 242.837905] ? __timer_delete_sync+0x81/0xe0
[ 242.837911] __mutex_lock_slowpath+0x13/0x20
[ 242.837915] mutex_lock+0x3b/0x50
[ 242.837919] amdgpu_userqueue_ensure_ev_fence+0x35/0xb0 [amdgpu]
[ 242.838138] amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu]
[ 242.838340] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.838531] drm_ioctl_kernel+0xae/0x100 [drm]
[ 242.838559] drm_ioctl+0x2a1/0x500 [drm]
[ 242.838580] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.838778] ? srso_return_thunk+0x5/0x5f
[ 242.838783] ? srso_return_thunk+0x5/0x5f
[ 242.838786] ? _raw_spin_unlock_irqrestore+0x2b/0x50
[ 242.838791] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 242.838967] __x64_sys_ioctl+0x99/0xd0
[ 242.838972] x64_sys_call+0x1209/0x20d0
[ 242.838975] do_syscall_64+0x51/0x120
[ 242.838979] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 242.838982] RIP: 0033:0x7f9118b1a94f
[ 242.838985] RSP: 002b:00007f910cdff760 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 242.838989] RAX: ffffffffffffffda RBX: 00007f910cdff910 RCX: 00007f9118b1a94f
[ 242.838991] RDX: 00007f910cdff910 RSI: 00000000c0306457 RDI: 000000000000000c
[ 242.838993] RBP: 00007f910cdff7e0 R08: 0000000000000000 R09: 0000000000000001
[ 242.838995] R10: 00007f910cdff9d4 R11: 0000000000000246 R12: 00007f910ce00640
[ 242.838997] R13: 00000000c0306457 R14: 000000000000000c R15: 00007fff9dd11d10
[ 242.839004] </TASK>
v2: Addressed review comments from Christian.
v3/v4: Addressed review comments from Christian.
- Move the drm_exec loop after userq fence creation.
- Clean up the newly created userq fence in case of error.
v5: Addressed review comments from Christian (see the sketch below).
- Create a new amdgpu_userq_fence_alloc() function for the allocation.
- Call dma_fence_put() in the cleanup path.
- Make the amdgpu_userq_fence_create() function static.
- Call drm_exec_init() after mutex_unlock().
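A minimal standalone sketch of the reordering above, using stub
example_* names rather than the actual amdgpu functions: the fence is
allocated and the resume work flushed before the locking loop takes
any reservation lock, so the resume work can no longer block on a
lock the IOCTL already holds.

  #include <errno.h>

  struct example_bo;    /* stands in for the GEM buffer object */
  struct example_fence; /* stands in for the userq fence */

  /* stubs for illustration only */
  struct example_fence *example_fence_alloc(void);
  void example_fence_put(struct example_fence *f);
  int example_ensure_ev_fence(void);  /* flushes the resume work */
  int example_lock_bos(struct example_bo **bos, unsigned int n);
  void example_unlock_bos(struct example_bo **bos, unsigned int n);

  int example_signal_ioctl(struct example_bo **bos, unsigned int n)
  {
          /* v3/v4: create the userq fence first */
          struct example_fence *f = example_fence_alloc();
          int r;

          if (!f)
                  return -ENOMEM;

          /* wait for the resume work *before* taking any BO lock */
          r = example_ensure_ev_fence();
          if (r)
                  goto err_put;

          /* v5: the drm_exec-style locking loop starts only here */
          r = example_lock_bos(bos, n);
          if (r)
                  goto err_put;
          /* ... install the fence into the BOs ... */
          example_unlock_bos(bos, n);
          return 0;

  err_put:
          example_fence_put(f); /* clean up the new fence on error */
          return r;
  }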
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add GPU address support to the seq64 alloc function.
v1: (Christian)
- Add the user of this new interface change to the same patch
  (see the sketch below).
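Roughly, the allocator now returns the GPU virtual address as well
(a hypothetical sketch; this is not the exact amdgpu_seq64_alloc()
prototype):

  #include <linux/types.h>

  struct example_dev; /* stands in for amdgpu_device */

  /* Before (sketch): only a CPU pointer to the seq64 slot was
   * returned:
   *   int example_seq64_alloc(struct example_dev *dev, u64 **cpu);
   * After (sketch): the GPU address comes back too, so it can be
   * fed to the firmware/hardware directly: */
  int example_seq64_alloc(struct example_dev *dev, u64 *gpu_addr,
                          u64 **cpu_addr);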
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
A few optimizations and fixes for the userq fence driver.
v1: (Christian)
- Remove unnecessary comments.
- Pass num_bo_handles as the last parameter to drm_exec_init(); this
  makes the allocation of the array more efficient.
- Handle the return value of __xa_store() and improve the error
  handling of amdgpu_userq_fence_driver_alloc().
v2: (Christian)
- Revert the userq_xa xarray init to XA_FLAGS_LOCK_IRQ.
- Move the xa_unlock before the error check on xa_err(__xa_store())
  and split that change into a separate patch, since it adds missing
  error handling (see the sketch below).
- Remove the unnecessary comments.
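The __xa_store() error-handling pattern referred to above looks
roughly like this (a sketch assuming an IRQ-locked xarray; names are
illustrative):

  #include <linux/xarray.h>

  static DEFINE_XARRAY_FLAGS(example_xa, XA_FLAGS_LOCK_IRQ);

  /* store an entry under the irq-safe lock, then check whether
   * __xa_store() returned an error entry */
  static int example_store(unsigned long index, void *entry)
  {
          unsigned long flags;
          void *old;

          xa_lock_irqsave(&example_xa, flags);
          old = __xa_store(&example_xa, index, entry, GFP_ATOMIC);
          xa_unlock_irqrestore(&example_xa, flags); /* unlock first */

          return xa_err(old); /* ... then check the stored result */
  }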
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch introduces new IOCTLs for the userqueue secure semaphore.
For the signal IOCTL, the userspace application creates a drm syncobj
and an array of BO GEM handles and passes them in as parameters; the
driver installs the fence into them.
The wait IOCTL takes an array of drm syncobjs, finds the fences
attached to those syncobjs and obtains the array of
memory_address/fence_value combinations, which are returned to
userspace.
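As a rough userspace-side sketch (struct and IOCTL names here are
hypothetical placeholders, not the real amdgpu_drm.h definitions),
the two calls pair up like this:

  #include <stdint.h>
  #include <sys/ioctl.h>

  /* hypothetical argument layouts, for illustration only */
  struct example_userq_signal {
          uint64_t syncobj_handles;  /* syncobjs to install into */
          uint64_t num_syncobj_handles;
          uint64_t bo_handles;       /* BOs to attach the fence to */
          uint64_t num_bo_handles;
  };

  struct example_userq_wait {
          uint64_t syncobj_handles;  /* syncobjs to wait on */
          uint64_t num_syncobj_handles;
          uint64_t fence_info;       /* out: addr/value pairs */
          uint64_t num_fences;       /* in: capacity, out: count */
  };

  /* usage (sketch; EXAMPLE_IOCTL_* stand in for the real request
   * numbers):
   *   ioctl(fd, EXAMPLE_IOCTL_USERQ_SIGNAL, &signal_args);
   *   ioctl(fd, EXAMPLE_IOCTL_USERQ_WAIT,   &wait_args);
   */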
v2: (Christian)
- Install the fence into the GEM BO object.
- Lock all BOs using the dma resv subsystem.
- Reorder the sequence in the signal IOCTL function.
- Get the write pointer from the shadow wptr.
- Use userq_fence to fetch the va/value in the wait IOCTL.
v3: (Christian)
- Use the drm_exec helper for proper BO reservation and to avoid BO
  lock/unlock issues.
- Add fence/fence driver reference count logic for the signal/wait
  IOCTLs.
v4: (Christian)
- Fix the drm_exec calling sequence.
- Use dma_resv_for_each_fence_unlocked() if the BOs are not locked.
- Modify the fence_info array storing logic.
v5: (Christian)
- Keep the fence_drv until wait queue execution.
- Add dma_fence_wait for the other fences.
- Lock the BOs using drm_exec as the number of fences in them could
  change.
- Install signaled fences as well into the BO/syncobj.
- Move the syncobj fence installation code after
  drm_exec_prepare_array.
- Directly add dma_resv_usage_rw(args->bo_flags....
- Remove the unnecessary dma_fence_put.
v6: (Christian)
- Add an xarray to store the fence_drv.
- Implement a function to iterate over the xarray and drop the
  fence_drv references.
- Add a drm_exec_until_all_locked() wrapper.
- Add a check that we haven't exceeded the user-allocated num_fences
  before adding a dma_fence to the fences array.
v7: (Christian)
- Use memdup_user() instead of kmalloc_array + copy_from_user
  (see the sketch after this list).
- Move the fence_drv references from the xarray into the newly
  created fence and drop the fence_drv references when we signal
  this fence.
- Move the locking of BOs before the "if (!wait_info->num_fences)"
  check; this way the code block is needed only once.
- Merge the error handling code and the cleanup + return 0 code.
- Initializing the xa should probably be done in the userq code.
- Remove the userq back pointer stored in the fence_drv.
- Pass the xarray as a parameter to
  amdgpu_userq_walk_and_drop_fence_drv().
v8: (Christian)
- Taking the fence_drv references must come before adding the fence
  to the list.
- Use xa_lock_irqsave_nested for nested spinlock operations.
- userq_mgr should be per fpriv and not one per device.
- Restructure the interrupt processing code for an early exit from
  the loop.
- The reference acquired in the syncobj fence replace code needs to
  be kept around.
- Modify the dma_fence acquire placement in the wait IOCTL.
- Move the USERQ_BO_WRITE flag to the UAPI header file.
- Drop the fence_drv reference after telling the hw to stop
  accessing it.
- Add multi sync object support to the userq signal IOCTL.
v9: (Christian)
- Store all the fence_drv references to other drivers and not
  ourself.
- Remove the userq fence xa implementation and replace it with
  kvmalloc_array.
v10: (Christian)
- Add a comment for the userq_xa xarray.
- Drop the if check on userq_fence->fence_drv_array.
- Use the i variable to initialize userq_fence->fence_drv_array_count.
- Drop the fence references before freeing the array in the error
  handling path, otherwise some references could be leaked.
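The memdup_user() change mentioned in v7 collapses the open-coded
allocate-and-copy into one call, roughly (a sketch with illustrative
variable names):

  #include <linux/string.h>
  #include <linux/err.h>

  /* before (sketch):
   *   handles = kmalloc_array(num, sizeof(u32), GFP_KERNEL);
   *   if (!handles)
   *           return -ENOMEM;
   *   if (copy_from_user(handles, uptr, num * sizeof(u32))) {
   *           kfree(handles);
   *           return -EFAULT;
   *   }
   * after: allocate and copy in one step */
  handles = memdup_user(uptr, num * sizeof(u32));
  if (IS_ERR(handles))
          return PTR_ERR(handles);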
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Developed a userqueue fence driver for userqueue process shared BO
synchronization.
Create a dma fence with the write pointer as the seqno, allocate
seq64 memory for each user queue process and feed this memory address
into the firmware/hardware; the firmware then writes the read pointer
into the given address when the process completes its execution.
Compare wptr and rptr; if rptr >= wptr, signal the fences so the
waiting process can consume the buffers.
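A minimal sketch of the signaling rule described above (standalone C
with illustrative names, not the actual driver code): the fence seqno
is the queue's write pointer, and the hardware publishes the read
pointer into the shared seq64 slot, so a fence is done once rptr has
caught up with its wptr.

  #include <stdbool.h>
  #include <stdint.h>

  /* rptr is read from the seq64 slot the hardware writes back to */
  static bool example_userq_fence_signaled(uint64_t rptr,
                                           uint64_t wptr_seqno)
  {
          /* rptr >= wptr: the hw has processed past this fence */
          return rptr >= wptr_seqno;
  }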
v2: Worked on review comments from Christian for the following
modifications
- Add wptr as the sequence number into the fence.
- Add a reference count for the fence driver.
- Add dma_fence_put below the list_del as it might free the
  userq fence.
- Trim unnecessary code in the interrupt handler.
- Check the dma fence signaled state in the dma fence creation
  function, to handle the potential case of the hardware completing
  the job processing beforehand.
- Add the necessary locks.
- Create a list and process all the unsignaled fences.
- Clean up fences in the destroy function.
- Implement the .signaled callback function.
v3: Worked on review comments from Christian
- Modify the naming convention for reference counted objects.
- Fix a fence driver reference drop issue.
- Drop the amdgpu_userq_fence_driver_process() function return value.
v4: Worked on review comments from Christian
- Move fence driver allocation into amdgpu_userq_fence_driver_alloc().
- Add detailed documentation mentioning the differences between the
  two declared spinlocks.
v5: Worked on review comments from Christian
- Check before upcast and remove the local variable.
- Add error handling in the fence_drv alloc function.
- Move the rptr read function outside of the loop and remove the
  WARN_ON in the destroy function.
v6:
- Clear the seq64 memory in the user fence driver (Christian).
- Fix the wptr va bo mapping (Christian).
- Move the fence_drv xa entry erase code from the interrupt handler
  into the user fence destroy function.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|