author	Danilo Krummrich <dakr@redhat.com>	2023-08-11 03:06:25 +0200
committer	Danilo Krummrich <dakr@redhat.com>	2023-08-24 02:57:41 +0200
commit	6cdcc65fdb0bc59bfca75d0b6fdc54d6ca347ddf (patch)
tree	3bec10ac0bdf1fd7eb4e76b5ee548330d514a0da /drivers/gpu/drm/nouveau
parent	9c319a0f6d52a8ad1364884b3baff57676f26de8 (diff)
drm/nouveau: sched: avoid job races between entities
If a sched job depends on a dma-fence from a job from the same GPU scheduler instance, but a different scheduler entity, the GPU scheduler only waits for the particular job to be scheduled, rather than for the job to fully complete. This is because the GPU scheduler assumes that there is one scheduler instance per ring. However, the current implementation, in order to avoid arbitrary amounts of kthreads, has a single scheduler instance while scheduler entities represent rings.

As a workaround, set the DRM_SCHED_FENCE_DONT_PIPELINE flag on all out-fences in order to force the scheduler to wait for full job completion for dependent jobs from different entities and the same scheduler instance.

There is some work in progress [1] to address the issues of firmware schedulers; once it is in-tree, the scheduler topology in Nouveau should be reworked accordingly.

[1] https://lore.kernel.org/dri-devel/20230801205103.627779-1-matthew.brost@intel.com/

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230811010632.2473-1-dakr@redhat.com
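For context, the optimization that this flag opts out of lives on the scheduler side. Below is a minimal, paraphrased sketch of the decision the scheduler makes when resolving a job's dependencies, modeled on drm_sched_entity_add_dependency_cb() in drivers/gpu/drm/scheduler/sched_entity.c; the helper name and exact structure are illustrative, not the upstream code. If a dependency fence belongs to the same scheduler instance and DRM_SCHED_FENCE_DONT_PIPELINE is not set, the scheduler only waits for the dependency's "scheduled" fence rather than its "finished" fence.

#include <drm/gpu_scheduler.h>
#include <linux/dma-fence.h>

/* Illustrative helper (not upstream code): pick which fence of a dependency
 * the scheduler ends up waiting on.
 */
static struct dma_fence *
pick_dependency_fence(struct drm_gpu_scheduler *sched, struct dma_fence *fence)
{
	struct drm_sched_fence *s_fence = to_drm_sched_fence(fence);

	if (s_fence && !fence->error && s_fence->sched == sched &&
	    !test_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &fence->flags))
		/* Dependency comes from the same scheduler instance and
		 * pipelining is allowed: waiting for it to be *scheduled* is
		 * considered sufficient, on the assumption that both jobs run
		 * on the same ring. With nouveau's single scheduler instance
		 * and per-ring entities, this assumption does not hold.
		 */
		return &s_fence->scheduled;

	/* Otherwise wait for the dependency to fully complete. */
	return fence;
}

Setting the flag on nouveau's done_fence therefore makes cross-entity dependencies take the second path and wait for full completion.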
Diffstat (limited to 'drivers/gpu/drm/nouveau')
-rw-r--r--	drivers/gpu/drm/nouveau/nouveau_sched.c	22
1 file changed, 22 insertions, 0 deletions
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index 3424a1bf6af3..88217185e0f3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -292,6 +292,28 @@ nouveau_job_submit(struct nouveau_job *job)
 	if (job->sync)
 		done_fence = dma_fence_get(job->done_fence);
 
+	/* If a sched job depends on a dma-fence from a job from the same GPU
+	 * scheduler instance, but a different scheduler entity, the GPU
+	 * scheduler does only wait for the particular job to be scheduled,
+	 * rather than for the job to fully complete. This is due to the GPU
+	 * scheduler assuming that there is a scheduler instance per ring.
+	 * However, the current implementation, in order to avoid arbitrary
+	 * amounts of kthreads, has a single scheduler instance while scheduler
+	 * entities represent rings.
+	 *
+	 * As a workaround, set the DRM_SCHED_FENCE_DONT_PIPELINE for all
+	 * out-fences in order to force the scheduler to wait for full job
+	 * completion for dependent jobs from different entities and same
+	 * scheduler instance.
+	 *
+	 * There is some work in progress [1] to address the issues of firmware
+	 * schedulers; once it is in-tree the scheduler topology in Nouveau
+	 * should be re-worked accordingly.
+	 *
+	 * [1] https://lore.kernel.org/dri-devel/20230801205103.627779-1-matthew.brost@intel.com/
+	 */
+	set_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &job->done_fence->flags);
+
 	if (job->ops->armed_submit)
 		job->ops->armed_submit(job);