author | Zhanjun Dong <zhanjun.dong@intel.com> | 2024-10-04 12:34:27 -0700 |
---|---|---|
committer | Matt Roper <matthew.d.roper@intel.com> | 2024-10-08 09:39:58 -0700 |
commit | ecb6336463911d6eb684998754f8701d0f437f18 (patch) | |
tree | 80ef747b7088e9183142cd5c864fe6ff81857830 /drivers/gpu/drm/xe/xe_guc_capture.h | |
parent | 8bfc496327ce0f3bd02445048e3a70cc97accc6d (diff) |
drm/xe/guc: Plumb GuC-capture into dev coredump
When we decide to kill a job (from guc_exec_queue_timedout_job), we could
be in one of 4 possible scenarios at the starting point of this decision:
1. the guc-captured register-dump is already there.
2. the driver is running with wedged.mode > 1, so GuC-engine-reset /
GuC-err-capture will not happen.
3. the user has started the driver in execlist-submission mode.
4. the guc-captured register-dump is not ready yet, so we force GuC to kill
that context now, but:
A. we don't know yet if GuC will be successful on the engine-reset
and get the guc-err-capture, else kmd will do a manual reset later
OR B. guc will be successful and we will get a guc-err-capture
shortly.
So to accommodate scenarios 2 and 4A, we need to do a manual KMD capture
first (which is not reliable in guc-submission mode) and decide later
whether to use it for cases 2 or 4A. This flow is part of the
implementation in this patch.
Provide xe_guc_capture_get_reg_desc_list to get the register descriptor
list.
Add manual capture by reading from the hw engine if the GuC capture is not
ready. If it becomes ready at a later time, GuC-sourced data will be used.
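The decision flow described above can be modelled in a few lines of plain C. This is only an illustrative sketch, not the driver's code: the struct, the enum and both helper functions are hypothetical names invented here, and the real logic lives in the xe driver's timeout and capture paths.

```c
#include <stdbool.h>

/* Hypothetical model of what is known when the timeout handler starts. */
struct timeout_state {
	bool guc_capture_ready;	/* scenario 1: GuC register dump already present */
	int wedged_mode;	/* scenario 2 applies when this is > 1 */
	bool execlist_mode;	/* scenario 3: execlist submission, no GuC capture */
};

enum capture_source {
	CAPTURE_SOURCE_GUC,
	CAPTURE_SOURCE_MANUAL,
};

/* Take a manual KMD capture up front whenever GuC data is not there yet. */
static bool need_manual_capture_first(const struct timeout_state *s)
{
	return !s->guc_capture_ready;
}

/*
 * Decide later which data to report. guc_capture_arrived_later models
 * scenario 4B (true) versus 4A (false).
 */
static enum capture_source
pick_final_source(const struct timeout_state *s, bool guc_capture_arrived_later)
{
	if (s->guc_capture_ready)
		return CAPTURE_SOURCE_GUC;
	/* Scenarios 2 and 3: a GuC capture cannot arrive at all. */
	if (s->wedged_mode > 1 || s->execlist_mode)
		return CAPTURE_SOURCE_MANUAL;
	/* Scenario 4: prefer GuC-sourced data if it showed up in time. */
	return guc_capture_arrived_later ? CAPTURE_SOURCE_GUC : CAPTURE_SOURCE_MANUAL;
}
```

In this model the manual capture is taken eagerly and only discarded once GuC-sourced data is known to be available, which mirrors the "capture first, decide later" ordering of the commit message.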
Although there may only be a small delay between (1) the check for whether
guc-err-capture is available at the start of guc_exec_queue_timedout_job
and (2) the decision on using a valid guc-err-capture or manual-capture,
let's not take any chances and lock the matching node down so it doesn't
get re-claimed if the GuC-Err-Capture subsystem is running out of
pre-cached nodes.
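Why locking the matching node matters can be shown with a toy node pool. The sketch below is a simplified model under stated assumptions: the node struct, its fields and the three helpers are hypothetical and do not reflect the driver's actual data structures. It only assumes that a capture subsystem short on pre-cached nodes recycles the oldest node that is in use and not locked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-cached capture node; fields are illustrative only. */
struct capture_node {
	bool in_use;
	bool locked;		/* set while a timed-out job may still need it */
	unsigned int age;	/* smaller value means older node */
	int guc_id;		/* context the captured data belongs to */
};

/* Find the node matching a context and pin it so it cannot be recycled. */
static struct capture_node *
find_and_lock(struct capture_node *pool, size_t n, int guc_id)
{
	for (size_t i = 0; i < n; i++) {
		if (pool[i].in_use && pool[i].guc_id == guc_id) {
			pool[i].locked = true;
			return &pool[i];
		}
	}
	return NULL;
}

/* When the pool runs dry, recycle the oldest in-use node that is not locked. */
static struct capture_node *
reclaim_oldest(struct capture_node *pool, size_t n)
{
	struct capture_node *victim = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!pool[i].in_use || pool[i].locked)
			continue;
		if (!victim || pool[i].age < victim->age)
			victim = &pool[i];
	}
	if (victim)
		victim->in_use = false;
	return victim;
}

/* Release a pinned node once the dump has been produced. */
static void put_node(struct capture_node *node)
{
	node->locked = false;
	node->in_use = false;
}
```

Without the lock, the oldest node would be the first reclaim candidate, and that is exactly the node a long-running timeout decision is most likely to still be waiting on.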
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-6-zhanjun.dong@intel.com
Diffstat (limited to 'drivers/gpu/drm/xe/xe_guc_capture.h')
-rw-r--r-- | drivers/gpu/drm/xe/xe_guc_capture.h | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/drivers/gpu/drm/xe/xe_guc_capture.h b/drivers/gpu/drm/xe/xe_guc_capture.h
index 4acf44472a63..fe695ab08a74 100644
--- a/drivers/gpu/drm/xe/xe_guc_capture.h
+++ b/drivers/gpu/drm/xe/xe_guc_capture.h
@@ -12,6 +12,9 @@
 #include "xe_guc_fwif.h"
 
 struct xe_guc;
+struct xe_hw_engine;
+struct xe_hw_engine_snapshot;
+struct xe_sched_job;
 
 static inline enum guc_capture_list_class_type xe_guc_class_to_capture_class(u16 class)
 {
@@ -44,7 +47,14 @@ int xe_guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type,
 			       enum guc_capture_list_class_type capture_class, size_t *size);
 int xe_guc_capture_getnullheader(struct xe_guc *guc, void **outptr, size_t *size);
 size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc);
+const struct __guc_mmio_reg_descr_group *
+xe_guc_capture_get_reg_desc_list(struct xe_gt *gt, u32 owner, u32 type,
+				 enum guc_capture_list_class_type capture_class, bool is_ext);
+struct __guc_capture_parsed_output *xe_guc_capture_get_matching_and_lock(struct xe_sched_job *job);
+void xe_engine_snapshot_capture_for_job(struct xe_sched_job *job);
+void xe_engine_guc_capture_print(struct xe_hw_engine_snapshot *snapshot, struct drm_printer *p);
 void xe_guc_capture_steered_list_init(struct xe_guc *guc);
+void xe_guc_capture_put_matched_nodes(struct xe_guc *guc);
 int xe_guc_capture_init(struct xe_guc *guc);
 
 #endif