drm/i915/guc: Capture error state on context reset

We receive notification of an engine reset from GuC at its completion. Meaning GuC has potentially cleared any HW state we may have been interested in capturing. GuC resumes scheduling on the engine post-reset, as the resets are meant to be transparent, further muddling our error state.

There is ongoing work to define an API for a GuC debug state dump. The suggestion for now is to manually disable FW initiated resets in cases where debug state is needed.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-19-matthew.brost@intel.com
author: Matthew Brost <matthew.brost@intel.com> 2021-07-26 17:23:33 -0700
committer: John Harrison <John.C.Harrison@Intel.com> 2021-07-27 17:31:59 -0700
commit: 573ba126aef37c8315e5bb68d2dad515efa96994 (patch)
tree: 671ca22ec14a9b93c075ad9f47bc8566f68fb161 /drivers/gpu/drm/i915/gt/intel_engine_cs.c
parent: c17b637928f030caac2d1c737959b9627011ac49 (diff)
1 files changed, 9 insertions, 2 deletions
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 1eaa658507e1..0da7868c5a13 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1731,7 +1731,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	drm_printf(m, "\tRequests:\n");
 
 	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	rq = intel_engine_execlist_find_hung_request(engine);
 	if (rq) {
 		struct intel_timeline *tl = get_timeline(rq);
 
@@ -1842,11 +1842,18 @@ static bool match_ring(struct i915_request *rq)
 }
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine)
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
 	struct i915_request *request, *active = NULL;
 
 	/*
+	 * This search does not work in GuC submission mode. However, the GuC
+	 * will report the hanging context directly to the driver itself. So
+	 * the driver should never get here when in GuC mode.
+	 */
+	GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
+
+	/*
 	 * We are called by the error capture, reset and to dump engine
 	 * state at random points in time. In particular, note that neither is
 	 * crucially ordered with an interrupt. After a hang, the GPU is dead