diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-21 14:56:17 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-21 14:56:17 -0800 |
commit | 28eb75e178d389d325f1666e422bc13bbbb9804c (patch) | |
tree | 20417b4e798f98fc5687e80c1e0126afcf437c70 /drivers/gpu/drm/scheduler/sched_main.c | |
parent | 071b34dcf71523a559b6c39f5d21a268a9531b50 (diff) | |
parent | a163b895077861598be48c1cf7f4a88413c28b22 (diff) |
Merge tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"There's a lot of rework, the panic helper support is being added to
more drivers, v3d gets support for HW superpages, scheduler
documentation, drm client and video aperture reworks, some new
MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel
has some Pantherlake enablement and xe is getting some SRIOV bits, but
just lots of stuff everywhere.
core:
- split DSC helpers from DP helpers
- clang build fixes for drm/mm test
- drop simple pipeline support for gem vram
- document submission error signaling
- move drm_rect to drm core module from kms helper
- add default client setup to most drivers
- move to video aperture helpers instead of drm ones
tests:
- new framebuffer tests
ttm:
- remove swapped and pinned BOs from TTM lru
panic:
- fix uninit spinlock
- add ABGR2101010 support
bridge:
- add TI TDP158 support
- use standard PM OPS
dma-fence:
- use read_trylock instead of read_lock to help lockdep
scheduler:
- add errno to sched start to report different errors
- add locking to drm_sched_entity_modify_sched
- improve documentation
xe:
- add drm_line_printer
- lots of refactoring
- Enable Xe2 + PES disaggregation
- add new ARL PCI ID
- SRIOV development work
- fix exec unnecessary implicit fence
- define and parse OA sync props
- forcewake refactoring
i915:
- Enable BMG/LNL ultra joiner
- Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+
- use DSB for plane/color mgmt
- Arrow lake PCI IDs
- lots of i915/xe display refactoring
- enable PXP GuC autoteardown
- Pantherlake (PTL) Xe3 LPD display enablement
- Allow fastset HDR infoframe changes
- write DP source OUI for non-eDP sinks
- share PCI IDs between i915 and xe
amdgpu:
- SDMA queue reset support
- SMU 13.0.6, JPEG 4.0.3 updates
- Initial runtime repartitioning support
- rework IP structs for multiple IP instances
- Fetch EDID from _DDC if available
- SMU13 zero rpm user control
- lots of fixes/cleanups
amdkfd:
- Increase event FIFO size
- add topology cap flag for per queue reset
msm:
- DPU:
- SA8775P support
- (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support
- Enable large framebuffer support
- Drop MSM8998 and SDM845
- DP:
- SA8775P support
- GPU:
- a7xx preemption support
- Adreno A663 support
ast:
- warn about unsupported TX chips
ivpu:
- add coredump
- add pantherlake support
rockchip:
- 4K@60Hz display enablement
- generate pll programming tables
panthor:
- add timestamp query API
- add realtime group priority
- add fdinfo support
etnaviv:
- improve handling of DMA address limits
- improve GPU hangcheck
exynos:
- Decon Exynos7870 support
mediatek:
- add OF graph support
omap:
- locking fixes
bochs:
- convert to gem/shmem from simpledrm
v3d:
- support big/super pages
- add gemfs
vc4:
- BCM2712 support refactoring
- add YUV444 format support
udmabuf:
- folio related fixes
nouveau:
- add panic support on nv50+"
* tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel: (1583 commits)
drm/xe/guc: Fix dereference before NULL check
drm/amd: Fix initialization mistake for NBIO 7.7.0
Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"
drm/amd/display: Fix failure to read vram info due to static BP_RESULT
drm/amdgpu: enable GTT fallback handling for dGPUs only
drm/amd/amdgpu: limit single process inside MES
drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X
drm/amdgpu/mes12: correct kiq unmap latency
drm/amdgpu: Support vcn and jpeg error info parsing
drm/amd : Update MES API header file for v11 & v12
drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling
drm/amdkfd: change kfd process kref count at creation
drm/amdgpu: Cleanup shift coding style
drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data
drm/amdgpu: Implement virt req_ras_err_count
drm/amdgpu: VF Query RAS Caps from Host if supported
drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry
drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support
drm/amd/display: 3.2.309
drm/amd/display: Adjust VSDB parser for replay feature
...
Diffstat (limited to 'drivers/gpu/drm/scheduler/sched_main.c')
-rw-r--r-- | drivers/gpu/drm/scheduler/sched_main.c | 86 |
1 files changed, 61 insertions, 25 deletions
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index e97c6c60bc96..7ce25281c74c 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -41,7 +41,7 @@ * 4. Entities themselves maintain a queue of jobs that will be scheduled on * the hardware. * - * The jobs in a entity are always scheduled in the order that they were pushed. + * The jobs in an entity are always scheduled in the order in which they were pushed. * * Note that once a job was taken from the entities queue and pushed to the * hardware, i.e. the pending queue, the entity must not be referenced anymore @@ -159,35 +159,33 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq, + ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change * for entity from within concurrent drm_sched_entity_select_rq and the * other to update the rb tree structure. */ - spin_lock(&entity->rq_lock); - spin_lock(&entity->rq->lock); + lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); - spin_unlock(&entity->rq_lock); } /** @@ -219,15 +217,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, void drm_sched_rq_add_entity(struct drm_sched_rq *rq, struct drm_sched_entity *entity) { + lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); + if (!list_empty(&entity->list)) return; - spin_lock(&rq->lock); - atomic_inc(rq->sched->score); list_add_tail(&entity->list, &rq->entities); - - spin_unlock(&rq->lock); } /** @@ -241,6 +238,8 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq, void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, struct drm_sched_entity *entity) { + lockdep_assert_held(&entity->lock); + if (list_empty(&entity->list)) return; @@ -253,7 +252,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, rq->current_entity = NULL; if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); spin_unlock(&rq->lock); } @@ -355,7 +354,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched, return ERR_PTR(-ENOSPC); } - rq->current_entity = entity; reinit_completion(&entity->entity_idle); break; } @@ -601,6 +599,9 @@ static void drm_sched_job_timedout(struct work_struct *work) * callers responsibility to release it manually if it's not part of the * pending list any more. * + * This function is typically used for reset recovery (see the docu of + * drm_sched_backend_ops.timedout_job() for details). Do not call it for + * scheduler teardown, i.e., before calling drm_sched_fini(). */ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) { @@ -673,16 +674,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) */ cancel_delayed_work(&sched->work_tdr); } - EXPORT_SYMBOL(drm_sched_stop); /** * drm_sched_start - recover jobs after a reset * * @sched: scheduler instance + * @errno: error to set on the pending fences * + * This function is typically used for reset recovery (see the docu of + * drm_sched_backend_ops.timedout_job() for details). Do not call it for + * scheduler startup. The scheduler itself is fully operational after + * drm_sched_init() succeeded. */ -void drm_sched_start(struct drm_gpu_scheduler *sched) +void drm_sched_start(struct drm_gpu_scheduler *sched, int errno) { struct drm_sched_job *s_job, *tmp; @@ -697,13 +702,13 @@ void drm_sched_start(struct drm_gpu_scheduler *sched) atomic_add(s_job->credits, &sched->credit_count); if (!fence) { - drm_sched_job_done(s_job, -ECANCELED); + drm_sched_job_done(s_job, errno ?: -ECANCELED); continue; } if (dma_fence_add_callback(fence, &s_job->cb, drm_sched_job_done_cb)) - drm_sched_job_done(s_job, fence->error); + drm_sched_job_done(s_job, fence->error ?: errno); } drm_sched_start_timeout_unlocked(sched); @@ -778,6 +783,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs); * Drivers must make sure drm_sched_job_cleanup() if this function returns * successfully, even when @job is aborted before drm_sched_job_arm() is called. * + * Note that this function does not assign a valid value to each struct member + * of struct drm_sched_job. Take a look at that struct's documentation to see + * who sets which struct member with what lifetime. + * * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware * has died, which can mean that there's no valid runqueue for a @entity. * This function returns -ENOENT in this case (which probably should be -EIO as @@ -803,6 +812,14 @@ int drm_sched_job_init(struct drm_sched_job *job, return -EINVAL; } + /* + * We don't know for sure how the user has allocated. Thus, zero the + * struct so that unallowed (i.e., too early) usage of pointers that + * this function does not set is guaranteed to lead to a NULL pointer + * exception instead of UB. + */ + memset(job, 0, sizeof(*job)); + job->entity = entity; job->credits = credits; job->s_fence = drm_sched_fence_alloc(entity, owner); @@ -1333,6 +1350,19 @@ EXPORT_SYMBOL(drm_sched_init); * @sched: scheduler instance * * Tears down and cleans up the scheduler. + * + * This stops submission of new jobs to the hardware through + * drm_sched_backend_ops.run_job(). Consequently, drm_sched_backend_ops.free_job() + * will not be called for all jobs still in drm_gpu_scheduler.pending_list. + * There is no solution for this currently. Thus, it is up to the driver to make + * sure that + * a) drm_sched_fini() is only called after for all submitted jobs + * drm_sched_backend_ops.free_job() has been called or that + * b) the jobs for which drm_sched_backend_ops.free_job() has not been called + * after drm_sched_fini() ran are freed manually. + * + * FIXME: Take care of the above problem and prevent this function from leaking + * the jobs in drm_gpu_scheduler.pending_list under any circumstances. */ void drm_sched_fini(struct drm_gpu_scheduler *sched) { @@ -1348,7 +1378,7 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched) list_for_each_entry(s_entity, &rq->entities, list) /* * Prevents reinsertion and marks job_queue as idle, - * it will removed from rq in drm_sched_entity_fini + * it will be removed from the rq in drm_sched_entity_fini() * eventually */ s_entity->stopped = true; @@ -1428,8 +1458,10 @@ EXPORT_SYMBOL(drm_sched_wqueue_ready); /** * drm_sched_wqueue_stop - stop scheduler submission - * * @sched: scheduler instance + * + * Stops the scheduler from pulling new jobs from entities. It also stops + * freeing jobs automatically through drm_sched_backend_ops.free_job(). */ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched) { @@ -1441,8 +1473,12 @@ EXPORT_SYMBOL(drm_sched_wqueue_stop); /** * drm_sched_wqueue_start - start scheduler submission - * * @sched: scheduler instance + * + * Restarts the scheduler after drm_sched_wqueue_stop() has stopped it. + * + * This function is not necessary for 'conventional' startup. The scheduler is + * fully operational after drm_sched_init() succeeded. */ void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched) { |