Merge tag 'drm-intel-gt-next-2024-04-26' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-next

UAPI Changes: - drm/i915/guc: Use context hints for GT frequency Allow user to provide a low latency context hint. When set, KMD sends a hint to GuC which results in special handling for this context. SLPC will ramp the GT frequency aggressively every time it switches to this context. The down freq threshold will also be lower so GuC will ramp down the GT freq for this context more slowly. We also disable waitboost for this context as that will interfere with the strategy. We need to enable the use of SLPC Compute strategy during init, but it will apply only to contexts that set this bit during context creation. Userland can check whether this feature is supported using a new param- I915_PARAM_HAS_CONTEXT_FREQ_HINT. This flag is true for all guc submission enabled platforms as they use SLPC for frequency management. The Mesa usage model for this flag is here - https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint - drm/i915/gt: Enable only one CCS for compute workload Enable only one CCS engine by default with all the compute sices allocated to it. While generating the list of UABI engines to be exposed to the user, exclude any additional CCS engines beyond the first instance *** NOTE: This W/A will make all DG2 SKUs appear like single CCS SKUs by default to mitigate a hardware bug. All the EUs will still remain usable, and all the userspace drivers have been confirmed to be able to dynamically detect the change in number of CCS engines and adjust. For the smaller percent of applications that get perf benefit from letting the userspace driver dispatch across all 4 CCS engines we will be introducing a sysfs control as a later patch to choose 4 CCS each with 25% EUs (or 50% if 2 CCS). NOTE: A regression has been reported at https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10895 However Andi has been triaging the issue and we're closing in a fix to the gap in the W/A implementation: https://lists.freedesktop.org/archives/intel-gfx/2024-April/348747.html Driver Changes: - Add new and fix to existing workarounds: Wa_14018575942 (MTL), Wa_16019325821 (Gen12.70), Wa_14019159160 (MTL), Wa_16015675438, Wa_14020495402 (Gen12.70) (Tejas, John, Lucas) - Fix UAF on destroy against retire race and remove two earlier partial fixes (Janusz) - Limit the reserved VM space to only the platforms that need it (Andi) - Reset queue_priority_hint on parking for execlist platforms (Chris) - Fix gt reset with GuC submission is disabled (Nirmoy) - Correct capture of EIR register on hang (John) - Remove usage of the deprecated ida_simple_xx() API - Refactor confusing __intel_gt_reset() (Nirmoy) - Fix the fix for GuC reset lock confusion (John) - Simplify/extend platform check for Wa_14018913170 (John) - Replace dev_priv with i915 (Andi) - Add and use gt_to_guc() wrapper (Andi) - Remove bogus null check (Rodrigo, Dan) . Selftest improvements (Janusz, Nirmoy, Daniele) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZitVBTvZmityDi7D@jlahtine-mobl.ger.corp.intel.com
author: Dave Airlie <airlied@redhat.com> 2024-04-30 14:20:31 +1000
committer: Dave Airlie <airlied@redhat.com> 2024-04-30 14:40:43 +1000
commit: 68b89e23c2282877b0d411e07a3ef90490d6fe30 (patch)
tree: b9be4b583185edfdb72255819ae8c3680166fb67 /drivers/gpu/drm/i915/gt/intel_reset.c
parent: b84bc948528e6474ba48596144e9c17466a98448 (diff)
parent: 4d3421e04c5dc38baf15224c051256204f223c15 (diff)
1 files changed, 43 insertions, 8 deletions
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index c8e9aa41fdea..6161f7a3ff70 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -764,7 +764,7 @@ wa_14015076503_end(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 			 HECI_H_GS1_ER_PREP, 0);
 }
 
-int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask)
+static int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 {
 	const int retries = engine_mask == ALL_ENGINES ? RESET_MAX_RETRIES : 1;
 	reset_func reset;
@@ -879,8 +879,17 @@ static intel_engine_mask_t reset_prepare(struct intel_gt *gt)
 	intel_engine_mask_t awake = 0;
 	enum intel_engine_id id;
 
-	/* For GuC mode, ensure submission is disabled before stopping ring */
-	intel_uc_reset_prepare(&gt->uc);
+	/**
+	 * For GuC mode with submission enabled, ensure submission
+	 * is disabled before stopping ring.
+	 *
+	 * For GuC mode with submission disabled, ensure that GuC is not
+	 * sanitized, do that after engine reset. reset_prepare()
+	 * is followed by engine reset which in this mode requires GuC to
+	 * process any CSB FIFO entries generated by the resets.
+	 */
+	if (intel_uc_uses_guc_submission(&gt->uc))
+		intel_uc_reset_prepare(&gt->uc);
 
 	for_each_engine(engine, gt, id) {
 		if (intel_engine_pm_get_if_awake(engine))
@@ -978,7 +987,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 
 	/* Even if the GPU reset fails, it should still stop the engines */
 	if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
-		__intel_gt_reset(gt, ALL_ENGINES);
+		intel_gt_reset_all_engines(gt);
 
 	for_each_engine(engine, gt, id)
 		engine->submit_request = nop_submit_request;
@@ -1089,7 +1098,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 	/* We must reset pending GPU events before restoring our submission */
 	ok = !HAS_EXECLISTS(gt->i915); /* XXX better agnosticism desired */
 	if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
-		ok = __intel_gt_reset(gt, ALL_ENGINES) == 0;
+		ok = intel_gt_reset_all_engines(gt) == 0;
 	if (!ok) {
 		/*
 		 * Warn CI about the unrecoverable wedged condition.
@@ -1133,10 +1142,10 @@ static int do_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 {
 	int err, i;
 
-	err = __intel_gt_reset(gt, ALL_ENGINES);
+	err = intel_gt_reset_all_engines(gt);
 	for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
 		msleep(10 * (i + 1));
-		err = __intel_gt_reset(gt, ALL_ENGINES);
+		err = intel_gt_reset_all_engines(gt);
 	}
 	if (err)
 		return err;
@@ -1227,6 +1236,9 @@ void intel_gt_reset(struct intel_gt *gt,
 
 	intel_overlay_reset(gt->i915);
 
+	/* sanitize uC after engine reset */
+	if (!intel_uc_uses_guc_submission(&gt->uc))
+		intel_uc_reset_prepare(&gt->uc);
 	/*
 	 * Next we need to restore the context, but we don't use those
 	 * yet either...
@@ -1270,7 +1282,30 @@ error:
 	goto finish;
 }
 
-static int intel_gt_reset_engine(struct intel_engine_cs *engine)
+/**
+ * intel_gt_reset_all_engines() - Reset all engines in the given gt.
+ * @gt: the GT to reset all engines for.
+ *
+ * This function resets all engines within the given gt.
+ *
+ * Returns:
+ * Zero on success, negative error code on failure.
+ */
+int intel_gt_reset_all_engines(struct intel_gt *gt)
+{
+	return __intel_gt_reset(gt, ALL_ENGINES);
+}
+
+/**
+ * intel_gt_reset_engine() - Reset a specific engine within a gt.
+ * @engine: engine to be reset.
+ *
+ * This function resets the specified engine within a gt.
+ *
+ * Returns:
+ * Zero on success, negative error code on failure.
+ */
+int intel_gt_reset_engine(struct intel_engine_cs *engine)
 {
 	return __intel_gt_reset(engine->gt, engine->mask);
 }
author	Dave Airlie <airlied@redhat.com>	2024-04-30 14:20:31 +1000
committer	Dave Airlie <airlied@redhat.com>	2024-04-30 14:40:43 +1000
commit	68b89e23c2282877b0d411e07a3ef90490d6fe30 (patch)
tree	b9be4b583185edfdb72255819ae8c3680166fb67 /drivers/gpu/drm/i915/gt/intel_reset.c
parent	b84bc948528e6474ba48596144e9c17466a98448 (diff)
parent	4d3421e04c5dc38baf15224c051256204f223c15 (diff)