path: root/drivers/gpu/drm/i915/gt/uc
Age | Commit message | Author
2021-12-20 | drm/i915/guc: Only assign guc_id.id when stealing guc_id | Matthew Brost
Previously the whole guc_id structure (list, spin lock) was assigned when stealing a guc_id, which is incorrect; only assign guc_id.id. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-3-matthew.brost@intel.com (cherry picked from commit 939d8e9c87e704fd5437e2c8b80929591fe540eb) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
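The difference between copying a whole structure and copying one field can be sketched in plain C. This is only an illustration under assumptions: the struct layout, the lock stand-in and the steal_id() helper are hypothetical and are not the i915 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; names are illustrative only. */
struct list_node {
    struct list_node *prev, *next;
};

struct guc_id_state {
    int lock_word;          /* stand-in for a spin lock */
    struct list_node link;  /* list membership owned by this context */
    unsigned int id;        /* the only field that should change hands */
};

/*
 * Steal only the numeric id. A whole-struct assignment (*thief = *victim)
 * would also overwrite the thief's lock and list pointers with the victim's.
 */
static void steal_id(struct guc_id_state *thief, struct guc_id_state *victim,
                     unsigned int invalid_id)
{
    assert(thief != victim);
    thief->id = victim->id;
    victim->id = invalid_id;
}
```

After steal_id() the thief's lock_word and link are untouched, which is the property the fix restores.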
2021-12-20 | drm/i915/guc: Use correct context lock when callig clr_context_registered | Matthew Brost
s/ce/cn/ when grabbing guc_state.lock before calling clr_context_registered. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-2-matthew.brost@intel.com (cherry picked from commit b25db8c782ad7ae80d4cea2a09c222f4f8980bb9) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-11-17 | drm/i915/guc: fix NULL vs IS_ERR() checking | Dan Carpenter
The intel_engine_create_virtual() function does not return NULL. It returns error pointers. Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211116114916.GB11936@kili (cherry picked from commit fc12b70d12d07598cde27cc17dbfafc2a2a33ff8) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
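Why a NULL check misses an error pointer can be shown with a userspace re-creation of the kernel helpers. The ERR_PTR/PTR_ERR/IS_ERR definitions below are simplified copies written for this sketch, and create_engine() and setup() are hypothetical names, not i915 functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel helpers, for illustration only. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int engine_storage;

/* Hypothetical constructor that never returns NULL, only error pointers. */
static void *create_engine(int fail)
{
    return fail ? ERR_PTR(-ENOMEM) : (void *)&engine_storage;
}

/* Returns 0 on success or a negative errno. */
static int setup(int fail)
{
    void *engine = create_engine(fail);

    /* An "if (!engine)" test would never fire for an error pointer. */
    if (IS_ERR(engine))
        return (int)PTR_ERR(engine);
    assert(engine != NULL);
    return 0;
}
```

With a NULL check in place of IS_ERR(), the failing case would fall through and the error pointer would be dereferenced later.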
2021-10-27 | drm/i915/guc: Fix recursive lock in GuC submission | Matthew Brost
Use __release_guc_id (lock held) rather than release_guc_id (acquires lock), add lockdep annotations.

[ 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr
[ 213.283459] ============================================
[ 213.283462] WARNING: possible recursive locking detected
[ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W
[ 213.283470] --------------------------------------------
[ 213.283472] kworker/u24:0/8 is trying to acquire lock:
[ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915]
[ 213.283618]
but task is already holding lock:
[ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
[ 213.283720]
other info that might help us debug this:
[ 213.283724] Possible unsafe locking scenario:
[ 213.283727] CPU0
[ 213.283728] ----
[ 213.283730] lock(&guc->submission_state.lock);
[ 213.283734] lock(&guc->submission_state.lock);
[ 213.283737]
*** DEADLOCK ***
[ 213.283740] May be due to missing lock nesting notation
[ 213.283744] 3 locks held by kworker/u24:0/8:
[ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550
[ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550
[ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
[ 213.283860]
stack backtrace:
[ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18
[ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
[ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915]
[ 213.283957] Call Trace:
[ 213.283960] dump_stack_lvl+0x57/0x72
[ 213.283966] __lock_acquire.cold+0x191/0x2d3
[ 213.283972] lock_acquire+0xb5/0x2b0
[ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915]
[ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915]
[ 213.284139] ? lock_release+0xb9/0x280
[ 213.284143] _raw_spin_lock_irqsave+0x48/0x60
[ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915]
[ 213.284226] destroyed_worker_func+0x2df/0x350 [i915]
[ 213.284310] process_one_work+0x270/0x550
[ 213.284315] worker_thread+0x52/0x3b0
[ 213.284319] ? process_one_work+0x550/0x550
[ 213.284322] kthread+0x135/0x160
[ 213.284326] ? set_kthread_struct+0x40/0x40
[ 213.284331] ret_from_fork+0x1f/0x30

and a bit later in the trace:

[ 227.499864] do_raw_spin_lock+0x94/0xa0
[ 227.499868] _raw_spin_lock_irqsave+0x50/0x60
[ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
[ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
[ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915]
[ 227.500209] ? mark_held_locks+0x49/0x70
[ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915]
[ 227.500320] reset_prepare+0x78/0x90 [i915]
[ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915]
[ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915]
[ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915]
[ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915]
[ 227.500694] i915_driver_remove+0x4a/0xf0 [i915]
[ 227.500767] i915_pci_probe+0x84/0x170 [i915]
[ 227.500838] local_pci_probe+0x42/0x80
[ 227.500842] pci_device_probe+0xd9/0x190
[ 227.500844] really_probe+0x1f2/0x3f0
[ 227.500847] __driver_probe_device+0xfe/0x180
[ 227.500848] driver_probe_device+0x1e/0x90
[ 227.500850] __driver_attach+0xc4/0x1d0
[ 227.500851] ? __device_attach_driver+0xe0/0xe0
[ 227.500853] ? __device_attach_driver+0xe0/0xe0
[ 227.500854] bus_for_each_dev+0x64/0x90
[ 227.500856] bus_add_driver+0x12e/0x1f0
[ 227.500857] driver_register+0x8f/0xe0
[ 227.500859] i915_init+0x1d/0x8f [i915]
[ 227.500934] ? 0xffffffffc144a000
[ 227.500936] do_one_initcall+0x58/0x2d0
[ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80
[ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0
[ 227.500944] do_init_module+0x5c/0x270
[ 227.500946] __do_sys_finit_module+0x95/0xe0
[ 227.500949] do_syscall_64+0x38/0x90
[ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 227.500953] RIP: 0033:0x7ffa59d2ae0d
[ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
[ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d
[ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004
[ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070
[ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90
[ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0

v2: (CI build) - Fix build error

Fixes: 1a52faed31311 ("drm/i915/guc: Take GT PM ref when deregistering context") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com (cherry picked from commit 12a9917e9e84fef4efa73c09b32870df0b1ed795) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
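The "lock held" versus "acquires lock" split can be sketched with a toy lock in userspace C. Everything here is an assumption made for illustration: toy_lock, id_pool and the function names are hypothetical, and release_id_locked() plays the role that a "__"-prefixed helper plays in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: acquiring it twice would deadlock a real spin lock. */
struct toy_lock {
    bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);   /* models the recursive-locking report */
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

struct id_pool {
    struct toy_lock lock;
    int free_count;
};

/* Caller must already hold pool->lock. */
static void release_id_locked(struct id_pool *pool)
{
    assert(pool->lock.held);  /* rough analogue of lockdep_assert_held() */
    pool->free_count++;
}

/* Takes the lock itself; must not be called with pool->lock held. */
static void release_id(struct id_pool *pool)
{
    toy_lock_acquire(&pool->lock);
    release_id_locked(pool);
    toy_lock_release(&pool->lock);
}

/* A worker that drains under the lock has to use the locked variant. */
static void drain(struct id_pool *pool, int n)
{
    toy_lock_acquire(&pool->lock);
    while (n-- > 0)
        release_id_locked(pool);
    toy_lock_release(&pool->lock);
}
```

Calling release_id() from inside drain() would trip the assert in toy_lock_acquire(), which corresponds to the recursion that lockdep reports in the trace.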
2021-10-15 | drm/i915/guc: Handle errors in multi-lrc requests | Matthew Brost
If an error occurs in the front end when multi-lrc requests are getting generated we need to skip these in the backend but we still need to emit the breadcrumbs seqno. An issue arises because with multi-lrc breadcrumbs there is a handshake between the parent and children to make forward progress. If all the requests are not present this handshake doesn't work. To work around this, if a multi-lrc request has an error we skip the handshake but still emit the breadcrumbs seqno. v2: (John Harrison) - Add comment explaining the skipping of the handshake logic - Fix typos in the commit message v3: (John Harrison) - Fix up some comments about the math to NOP the ring Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-22-matthew.brost@intel.com
2021-10-15 | drm/i915: Multi-BB execbuf | Matthew Brost
Allow multiple batch buffers to be submitted in a single execbuf IOCTL after a context has been configured with the 'set_parallel' extension. The number of batches is implicit, based on the context's configuration. This is implemented with a series of loops. First a loop is used to find all the batches, a loop to pin all the HW contexts, a loop to create all the requests, a loop to submit (emit BB start, etc...) all the requests, a loop to tie the requests to the VMAs they touch, and finally a loop to commit the requests to the backend. A composite fence is also created for the generated requests to return to the user and to stick in dma resv slots. No behavior from the existing IOCTL should be changed aside from when throttling because the ring for a context is full. In this situation, i915 will now wait while holding the object locks. This change was made because the code is much simpler when waiting while holding the locks, and we believe there isn't a huge benefit in dropping these locks. If this proves false we can restructure the code to drop the locks during the wait. 
IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Matthew Brost) - Return proper error value if i915_request_create fails v3: (John Harrison) - Add comment explaining create / add order loops + locking - Update commit message explaining difference in IOCTL behavior - Line wrap some comments - eb_add_request returns void - Return -EINVAL rather than triggering BUG_ON if cmd parser used (Checkpatch) - Check eb->batch_len[*current_batch] v4: (CI) - Set batch len if passed via execbuf args - Call __i915_request_skip after __i915_request_commit (Kernel test robot) - Initialize rq to NULL in eb_pin_timeline v5: (John Harrison) - Fix typo in comments near bb order loops Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-21-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Implement no mid batch preemption for multi-lrc | Matthew Brost
For some users of multi-lrc, e.g. split frame, it isn't safe to preempt mid BB. To safely enable preemption at the BB boundary, a handshake between parent and child is needed, syncing the set of BBs at the beginning and end of each batch. This is implemented via custom emit_bb_start & emit_fini_breadcrumb functions and enabled by default if a context is configured by the set parallel extension. Lastly, this patch updates the process descriptor to the correct size as the memory used in the handshake is directly after the process descriptor. v2: (John Harrison) - Fix the wording of a few comments - Add structure for parent page layout v3: (John Harrison) - A structure for sync semaphore - Use offsetof to calc address - Update commit message v4: (John Harrison) - Fix typos in comment explaining memory map of scratch page Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-20-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Add basic GuC multi-lrc selftest | Matthew Brost
Add very basic (single submission) multi-lrc selftest. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-19-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Connect UAPI to GuC multi-lrc interface | Matthew Brost
Introduce 'set parallel submit' extension to connect UAPI to GuC multi-lrc interface. Kernel doc in new uAPI should explain it all. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Daniel Vetter) - Add IGT link and placeholder for media UMD link v3: (Kernel test robot) - Fix warning in unpin engines call (John Harrison) - Reword a bunch of the kernel doc v4: (John Harrison) - Add comment why perma-pin is done after setting gem context - Update some comments / docs for proto contexts v5: (John Harrison) - Rework perma-pin comment - Add BUG_IN if context is pinned when setting gem context Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-17-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Update debugfs for GuC multi-lrc | Matthew Brost
Display the workqueue status in debugfs for GuC contexts that are in a parent-child relationship. v2: (John Harrison) - Output number of children in debugfs Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-16-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Implement multi-lrc reset | Matthew Brost
Update context and full GPU reset to work with multi-lrc. The idea is that the parent context tracks all the active requests inflight for itself and its children. The parent context owns the reset, replaying / canceling requests as needed. v2: (John Harrison) - Simplify loop in find active request - Add comments to find active request / reset loop v3: (John Harrison) - s/its'/its/g - Fix comment when searching for active request - Reorder if state in __guc_reset_context v4: (Kernel test robot) - Delete unused is_multi_lrc function Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-15-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Insert submit fences between requests in parent-child relationship | Matthew Brost
The GuC must receive requests in the order submitted for contexts in a parent-child relationship to function correctly. To ensure this, insert a submit fence between the current request and last request submitted for requests / contexts in a parent child relationship. This is conceptually similar to a single timeline. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-14-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Implement multi-lrc submission | Matthew Brost
Implement multi-lrc submission via a single workqueue entry and single H2G. The workqueue entry contains an updated tail value for each request, of all the contexts in the multi-lrc submission, and updates these values simultaneously. As such, the tasklet and bypass path have been updated to coalesce requests into a single submission. v2: (John Harrison) - s/wqe/wqi - Use FIELD_PREP macros - Add GEM_BUG_ONs to ensure length fits within field - Add comment / white space to intel_guc_write_barrier (Kernel test robot) - Make need_tasklet a static function v3: (Docs) - A comment for submission_stall_reason v4: (Kernel test robot) - Initialize return value in bypass tasklet submit function (John Harrison) - Add comment near work queue defs - Add BUILD_BUG_ON to ensure WQ_SIZE is a power of 2 - Update write_barrier comment to talk about work queue v5: (John Harrison) - Fix typo in work queue comment Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-13-matthew.brost@intel.com
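One common reason to require a power-of-2 ring size is that offsets can then wrap with a mask. The sketch below shows that idea with a compile-time check; the WQ_SIZE value and the helper names are assumptions for illustration and are not taken from the driver, and static_assert stands in for the kernel's BUILD_BUG_ON.

```c
#include <assert.h>
#include <stdint.h>

#define WQ_SIZE 4096u   /* assumed size in bytes for this sketch */

/* Compile-time check, a userspace analogue of BUILD_BUG_ON(). */
static_assert((WQ_SIZE & (WQ_SIZE - 1)) == 0, "WQ_SIZE must be a power of 2");

/* Advance a ring offset by len bytes, wrapping with a mask instead of a modulo. */
static uint32_t wq_advance(uint32_t tail, uint32_t len)
{
    return (tail + len) & (WQ_SIZE - 1);
}

/* Free bytes between tail (producer) and head (consumer); one byte stays unused. */
static uint32_t wq_space(uint32_t head, uint32_t tail)
{
    return (head - tail - 1) & (WQ_SIZE - 1);
}
```

If WQ_SIZE were not a power of 2, the mask arithmetic would silently compute wrong offsets, which is why a build-time check is a cheap safeguard.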
2021-10-15 | drm/i915/guc: Implement parallel context pin / unpin functions | Matthew Brost
Parallel contexts are perma-pinned by the upper layers which makes the backend implementation rather simple. The parent pins the guc_id and children increment the parent's pin count on pin to ensure all the contexts are unpinned before we disable scheduling with the GuC / or deregister the context. v2: (Daniel Vetter) - Perma-pin parallel contexts Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-12-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids | Matthew Brost
Assign contexts in parent-child relationship consecutive guc_ids. This is accomplished by partitioning guc_id space between ones that need to be consecutive (1/16 available guc_ids) and ones that do not (15/16 of available guc_ids). The consecutive search is implemented via the bitmap API. This is a precursor to the full GuC multi-lrc implementation but aligns with how the GuC multi-lrc interface is defined - guc_ids must be consecutive when using the GuC multi-lrc interface. v2: (Daniel Vetter) - Explicitly state why we assign consecutive guc_ids v3: (John Harrison) - Bring back in spin lock Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-11-matthew.brost@intel.com
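The partitioning described above can be sketched as a search for a free run inside a reserved slice of the id space. This is a simplified illustration under assumptions: the pool size, the choice of placing the reserved 1/16 at the low end, and all names are hypothetical, and a plain bool array stands in for the kernel bitmap API.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_IDS        64                /* assumed pool size for this sketch */
#define NUM_MULTI_IDS  (NUM_IDS / 16)    /* 1/16 reserved for consecutive runs */

struct id_space {
    bool used[NUM_IDS];  /* one flag per id; a real bitmap would pack these */
};

/* Find count consecutive free ids in [0, NUM_MULTI_IDS); return first id or -1. */
static int alloc_consecutive(struct id_space *s, int count)
{
    assert(count > 0);
    for (int start = 0; start + count <= NUM_MULTI_IDS; start++) {
        int n = 0;

        while (n < count && !s->used[start + n])
            n++;
        if (n == count) {
            for (int i = 0; i < count; i++)
                s->used[start + i] = true;
            return start;
        }
    }
    return -1;
}

/* Single ids come from the remaining 15/16 of the space. */
static int alloc_single(struct id_space *s)
{
    for (int id = NUM_MULTI_IDS; id < NUM_IDS; id++) {
        if (!s->used[id]) {
            s->used[id] = true;
            return id;
        }
    }
    return -1;
}
```

Keeping single allocations out of the reserved slice means they cannot fragment it, so a parent and its children are more likely to find a contiguous run.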
2021-10-15 | drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts | Matthew Brost
In GuC parent-child contexts the parent context controls the scheduling, ensure only the parent does the scheduling operations. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-10-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Add multi-lrc context registration | Matthew Brost
Add multi-lrc context registration H2G. In addition a workqueue and process descriptor are setup during multi-lrc context registration as these data structures are needed for multi-lrc submission. v2: (John Harrison) - Move GuC specific fields into sub-struct - Clean up WQ defines - Add comment explaining math to derive WQ / PD address v3: (John Harrison) - Add PARENT_SCRATCH_SIZE define - Update comment explaining multi-lrc register v4: (John Harrison) - Move PARENT_SCRATCH_SIZE to common file Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-9-matthew.brost@intel.com
2021-10-15 | drm/i915: Add logical engine mapping | Matthew Brost
Add logical engine mapping. This is required for split-frame, as workloads need to be placed on engines in a logically contiguous manner. v2: (Daniel Vetter) - Add kernel doc for new fields v3: (Tvrtko) - Update comment for new logical_mask field v4: (John Harrison) - Update comment for new logical_mask field Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-6-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Take engine PM when a context is pinned with GuC submission | Matthew Brost
Taking a PM reference to prevent intel_gt_wait_for_idle from short circuiting while any user context has scheduling enabled. Returning GT idle when it is not can cause all sorts of issues throughout the stack. v2: (Daniel Vetter) - Add might_lock annotations to pin / unpin function v3: (CI) - Drop intel_engine_pm_might_put from unpin path as an async put is used v4: (John Harrison) - Make intel_engine_pm_might_get/put work with GuC virtual engines - Update commit message v5: - Update commit message again Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-4-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Take GT PM ref when deregistering context | Matthew Brost
Take a PM reference to prevent intel_gt_wait_for_idle from short circuiting while a deregister context H2G is in flight. To do this we must issue the deregister H2G from a worker, as a context can be destroyed from an atomic context and taking a GT PM ref there blows up. Previously we took a runtime PM from this atomic context which worked but will stop working once runtime pm autosuspend is enabled. So this patch is twofold: stop intel_gt_wait_for_idle from short circuiting and fix runtime pm autosuspend. v2: (John Harrison) - Split structure changes out in different patch (Tvrtko) - Don't drop lock in deregister_destroyed_contexts v3: (John Harrison) - Flush destroyed contexts before destroying context reg pool Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-3-matthew.brost@intel.com
2021-10-15 | drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct | Matthew Brost
Move guc_id allocation under submission state sub-struct as a future patch will reuse the spin lock as a global submission state lock. Moving this into sub-struct makes ownership of fields / lock clear. v2: (Docs) - Add comment for submission_state sub-structure v3: (John Harrison) - Fixup a few comments v4: (John Harrison) - Fix typo Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-2-matthew.brost@intel.com
2021-10-01 | drm/i915/guc: Move and improve error message for missed CTB reply | Michal Wajdeczko
If we time out waiting for a CT reply we print a very simple error message. Improve that, and by moving error reporting to the caller we can use CT_ERROR instead of DRM_ERROR and report just the fence, as the error code will be reported later anyway. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-5-michal.wajdeczko@intel.com
2021-10-01 | drm/i915/guc: Print error name on CTB send failure | Michal Wajdeczko
Instead of plain error value (%d) print more user friendly error name (%pe). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-4-michal.wajdeczko@intel.com
2021-10-01 | drm/i915/guc: Print error name on CTB (de)registration failure | Michal Wajdeczko
Instead of plain error value (%d) print more user friendly error name (%pe). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-3-michal.wajdeczko@intel.com
2021-10-01 | drm/i915/guc: Verify result from CTB (de)register action | Michal Wajdeczko
In commit b839a869dfc9 ("drm/i915/guc: Add support for data reporting in GuC responses") we missed the hypothetical case that GuC might return a positive non-zero value as success data. While that would luckily be treated as an error case, and in the end would result in reporting a valid -EIO, in the meantime this value would be passed to ERR_PTR, which could be misleading. v2: rebased Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210926184545.1407-2-michal.wajdeczko@intel.com
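The general pattern of collapsing an action result into "0 or negative errno" before it can reach ERR_PTR can be sketched as below. The helper name and the choice of -EPROTO are assumptions made for this illustration; the commit message itself does not say which error code is used.

```c
#include <assert.h>
#include <errno.h>

/*
 * Collapse an action result into "0 or negative errno".
 * A positive value is success data the caller does not expect here,
 * so map it to an error instead of letting it reach ERR_PTR().
 * -EPROTO is an assumed choice for this sketch.
 */
static int normalize_action_result(int ret)
{
    if (ret > 0)
        return -EPROTO;
    assert(ret <= 0);
    return ret;
}
```

With this in place a caller can pass the result straight to an error-pointer helper without ever encoding a positive number as a pointer.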
2021-09-24 | drm/i915: Reduce the number of objects subject to memcpy recover | Thomas Hellström
We really only need memcpy restore for objects that affect the operability of the migrate context. That is, primarily the page-table objects of the migrate VM. Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early restores using memcpy and a way to assign LMEM page-table object flags to be used by the vms. Restore objects without this flag with the gpu blitter and only objects carrying the flag using TTM memcpy. Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and defer for a later audit which vms actually need it. Most importantly, user- allocated vms with pinned page-table objects can be restored using the blitter. Performance-wise memcpy restore is probably as fast as gpu restore if not faster, but using gpu restore will help tackling future restrictions in mappable LMEM size. v4: - Don't mark the aliasing ppgtt page table flags for early resume, but rather the ggtt page table flags as intended. (Matthew Auld) - The check for user buffer objects during early resume is pointless, since they are never marked I915_BO_ALLOC_PM_EARLY. (Matthew Auld) v5: - Mark GuC LMEM objects with I915_BO_ALLOC_PM_EARLY to have them restored before we fire up the migrate context. Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-8-thomas.hellstrom@linux.intel.com
2021-09-24 | drm/i915/gt: Register the migrate contexts with their engines | Thomas Hellström
Pinned contexts, like the migrate contexts need reset after resume since their context image may have been lost. Also the GuC needs to register pinned contexts. Add a list to struct intel_engine_cs where we add all pinned contexts on creation, and traverse that list at resume time to reset the pinned contexts. This fixes the kms_pipe_crc_basic@suspend-read-crc-pipe-a selftest for now, but proper LMEM backup / restore is needed for full suspend functionality. However, note that even with full LMEM backup / restore it may be desirable to keep the reset since backing up the migrate context images must happen using memcpy() after the migrate context has become inactive, and for performance- and other reasons we want to avoid memcpy() from LMEM. Also traverse the list at guc_init_lrc_mapping() calling guc_kernel_context_pin() for the pinned contexts, like is already done for the kernel context. v2: - Don't reset the contexts on each __engine_unpark() but rather at resume time (Chris Wilson). v3: - Reset contexts in the engine sanitize callback. (Chris Wilson) Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Brost Matthew <matthew.brost@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-6-thomas.hellstrom@linux.intel.com
2021-09-23 | drm/i915/guc, docs: Fix pdfdocs build error by removing nested grid | Akira Yokosawa
Nested grids in grid-table cells are not specified as proper ReST constructs. Commit 572f2a5cd974 ("drm/i915/guc: Update firmware to v62.0.0") added a couple of kerneldoc tables of the form:

  +---+-------+------------------------------------------------------+
  | 1 |  31:0 |  +------------------------------------------------+  |
  +---+-------+  |                                                |  |
  |...|       |  |  Embedded `HXG Message`_                       |  |
  +---+-------+  |                                                |  |
  | n |  31:0 |  +------------------------------------------------+  |
  +---+-------+------------------------------------------------------+

For "make htmldocs", they happen to work as one might expect, but they are incompatible with "make latexdocs" and "make pdfdocs", and cause the generated gpu.tex file to become incomplete and unbuildable by xelatex. Restore the compatibility by removing those nested grids in the tables.

Size comparison of generated gpu.tex:

                   Sphinx 2.4.4  Sphinx 4.2.0
  v5.14:                3238686       3841631
  v5.15-rc1:             376270        432729
  with this fix:        3377846       3998095

Fixes: 572f2a5cd974 ("drm/i915/guc: Update firmware to v62.0.0") Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4a227569-074f-c501-58bb-d0d8f60a8ae9@gmail.com
2021-09-20 | drm/i915/guc: Enable GuC submission by default on DG1 | Matthew Brost
Enable GuC submission by default on DG1 Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210916162819.27848-5-matthew.brost@intel.com
2021-09-20 | drm/i915/guc: Add DG1 GuC / HuC firmware defs | Daniele Ceraolo Spurio
Add DG1 GuC / HuC firmware defs Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210916162819.27848-4-matthew.brost@intel.com
2021-09-20 | drm/i915/guc: put all guc objects in lmem when available | Daniele Ceraolo Spurio
The firmware binary has to be loaded from lmem and the recommendation is to put all other objects in there as well. Note that we don't fall back to system memory if the allocation in lmem fails because all objects are allocated during driver load and if we have issues with lmem at that point something is seriously wrong with the system, so no point in trying to handle it. Cc: Matthew Auld <matthew.auld@intel.com> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210916162819.27848-3-matthew.brost@intel.com
2021-09-20 | drm/i915: Do not define vma on stack | Venkata Sandeep Dhanalakota
Defining vma on stack can cause stack overflow, if vma gets populated with new fields. v2: (Daniel Vetter) - Add kerneldoc for new field Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210916162819.27848-2-matthew.brost@intel.com
2021-09-18drm/i915: rename debugfs_gt filesLucas De Marchi
We shouldn't be using debugfs_ namespace for this functionality. Rename debugfs_gt.[ch] to intel_gt_debugfs.[ch] and then make functions, defines and structs follow suit. While at it and since we are renaming the header, sort the includes alphabetically. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210918025754.1254705-1-lucas.demarchi@intel.com
2021-09-13drm/i915/guc: Add GuC kernel docMatthew Brost
Add GuC kernel doc for all structures added thus far for GuC submission and update the main GuC submission section with the new interface details. v2: - Drop guc_active.lock DOC v3: - Fixup a few kernel doc comments (Daniele) v4 (Daniele): - Implement doc suggestions from John - Add kerneldoc for all members of the GuC structure and pull the file in i915.rst v5 (Daniele): - Implement new doc suggestions from John Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-24-matthew.brost@intel.com
2021-09-13drm/i915/guc: Drop guc_active move everything into guc_stateMatthew Brost
Now that we have a locking hierarchy of sched_engine->lock -> ce->guc_state.lock, everything from guc_active can be moved into guc_state and protected by guc_state.lock. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-23-matthew.brost@intel.com
2021-09-13drm/i915/guc: Move fields protected by guc->contexts_lock into sub structureMatthew Brost
To make ownership of locking clear move fields (guc_id, guc_id_ref, guc_id_link) to sub structure guc_id in intel_context. Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-22-matthew.brost@intel.com
2021-09-13drm/i915/guc: Move GuC priority fields in context under guc_activeMatthew Brost
Move GuC management fields in context under guc_active struct as this is where the lock that protects these fields lives. Also only set guc_prio field once during context init. v2: (Daniele) - set CONTEXT_SET_INIT Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-21-matthew.brost@intel.com
2021-09-13drm/i915/guc: Drop pin count check trick between sched_disable and re-pinMatthew Brost
Drop pin count check trick between a sched_disable and re-pin, now rely on the lock and counter of the number of committed requests to determine if scheduling should be disabled on the context. Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-20-matthew.brost@intel.com
2021-09-13drm/i915/guc: Proper xarray usage for contexts_lookupMatthew Brost
Lock the xarray and take ref to the context if needed. v2: (Checkpatch) - Add new line after declaration (Daniel Vetter) - Correct put / get accounting in xa_for_loops v3: (Checkpatch) - Extra new line Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-19-matthew.brost@intel.com
2021-09-13drm/i915/guc: Rework and simplify lockingMatthew Brost
Rework and simplify the locking with GuC submission. Drop sched_state_no_lock and move all fields under the guc_state.sched_state and protect all these fields with guc_state.lock. This requires changing the locking hierarchy from guc_state.lock -> sched_engine.lock to sched_engine.lock -> guc_state.lock. v2: (Daniele) - Don't check fields outside of lock during sched disable, check fewer fields within lock as some of the outside are no longer needed Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-18-matthew.brost@intel.com
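The lock ordering described above (sched_engine.lock taken first, guc_state.lock nested inside it) can be illustrated with a minimal userspace sketch. This is not the i915 code: the struct names, the `SCHED_STATE_ENABLED_SKETCH` bit and both helpers are hypothetical stand-ins, and pthread mutexes replace the kernel spinlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the i915 structures. */
struct sched_engine_sketch {
	pthread_mutex_t lock;			/* outer lock */
};

struct context_sketch {
	struct {
		pthread_mutex_t lock;		/* inner lock */
		unsigned int sched_state;	/* only touched under lock */
	} guc_state;
};

#define SCHED_STATE_ENABLED_SKETCH (1u << 0)	/* assumed bit, for the demo */

static void sketch_init(struct sched_engine_sketch *se,
			struct context_sketch *ce)
{
	pthread_mutex_init(&se->lock, NULL);
	pthread_mutex_init(&ce->guc_state.lock, NULL);
	ce->guc_state.sched_state = 0;
}

/* Always take the outer lock first, then the inner one. */
static void context_set_enabled(struct sched_engine_sketch *se,
				struct context_sketch *ce)
{
	pthread_mutex_lock(&se->lock);
	pthread_mutex_lock(&ce->guc_state.lock);
	ce->guc_state.sched_state |= SCHED_STATE_ENABLED_SKETCH;
	pthread_mutex_unlock(&ce->guc_state.lock);
	pthread_mutex_unlock(&se->lock);
}

/* Readers also take the lock; there is no lockless fast path. */
static bool context_enabled(struct context_sketch *ce)
{
	bool enabled;

	pthread_mutex_lock(&ce->guc_state.lock);
	enabled = (ce->guc_state.sched_state & SCHED_STATE_ENABLED_SKETCH) != 0;
	pthread_mutex_unlock(&ce->guc_state.lock);
	return enabled;
}
```

As long as every path that needs both locks acquires them in this one order, two threads cannot deadlock by each holding one lock while waiting for the other.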
2021-09-13drm/i915/guc: Move guc_blocked fence to struct guc_stateMatthew Brost
Move guc_blocked fence to struct guc_state as the lock which protects the fence lives there. s/ce->guc_blocked/ce->guc_state.blocked/g v2: (Daniele) - s/blocked_fence/blocked/g Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-17-matthew.brost@intel.com
2021-09-13drm/i915/guc: Release submit fence from an irq_workMatthew Brost
A subsequent patch will flip the locking hierarchy from ce->guc_state.lock -> sched_engine->lock to sched_engine->lock -> ce->guc_state.lock. As such we need to release the submit fence for a request from an IRQ to break a lock inversion - i.e. the fence must be released when holding ce->guc_state.lock and the releasing of the fence can acquire sched_engine->lock. v2: (Daniele) - Delete request from list before calling irq_work_queue Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-16-matthew.brost@intel.com
2021-09-13drm/i915/guc: Reset LRC descriptor if register returns -ENODEVMatthew Brost
Reset LRC descriptor if a context register returns -ENODEV as this means we are mid-reset. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-15-matthew.brost@intel.com
2021-09-13drm/i915/guc: Don't touch guc_state.sched_state without a lockMatthew Brost
Before we did some clever tricks to not use the lock when touching guc_state.sched_state in certain cases. Don't do that, enforce the use of the lock. v2: (kernel test robot) - Add __maybe_unused to sched_state_is_init() v3: rebase after the unused code path removal has been moved to an earlier patch. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-14-matthew.brost@intel.com
2021-09-13drm/i915/guc: Take context ref when cancelling requestMatthew Brost
A context can get destroyed after cancelling a request, if a context or GT reset occurs, so take a reference to context when cancelling a request. Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-13-matthew.brost@intel.com
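The pattern of pinning an object with an extra reference while working on it can be sketched as follows. This is a single-threaded userspace illustration only: `ctx_sketch`, `ctx_get()`, `ctx_put()`, `reset_drops_ref()` and `cancel_request()` are hypothetical names, and a plain integer stands in for the kernel's atomic reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical context with a plain (non-atomic) reference count. */
struct ctx_sketch {
	int refcount;
	bool cancelled;
};

static struct ctx_sketch *ctx_create(void)
{
	struct ctx_sketch *c = calloc(1, sizeof(*c));

	if (c)
		c->refcount = 1;	/* creator's reference */
	return c;
}

static struct ctx_sketch *ctx_get(struct ctx_sketch *c)
{
	c->refcount++;
	return c;
}

/* Drop one reference; returns true if this freed the context. */
static bool ctx_put(struct ctx_sketch *c)
{
	if (--c->refcount == 0) {
		free(c);
		return true;
	}
	return false;
}

/* Stands in for a reset that drops the creator's reference mid-cancel. */
static void reset_drops_ref(struct ctx_sketch *c)
{
	ctx_put(c);
}

/* Returns true if the context was freed when the cancel path finished. */
static bool cancel_request(struct ctx_sketch *c)
{
	ctx_get(c);			/* pin the context across the cancel */
	c->cancelled = true;
	reset_drops_ref(c);		/* would free it without the pin */
	assert(c->refcount == 1);	/* still alive, safe to touch */
	return ctx_put(c);		/* unpin; the last reference frees it */
}
```

Without the `ctx_get()` at the top of `cancel_request()`, the simulated reset would free the context while the cancel path is still using it.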
2021-09-13drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2HMatthew Brost
While debugging an issue with full GT resets I went down a rabbit hole thinking the scrubbing of lost G2H wasn't working correctly. This proved to be incorrect as this was working just fine but this chase inspired me to write a selftest to prove that this works. This simple selftest injects errors dropping various G2H and then issues a full GT reset proving that the scrubbing of these G2H doesn't blow up. v2: (Daniel Vetter) - Use ifdef instead of macros for selftests v3: (Checkpatch) - A space after 'switch' statement v4: (Daniele) - A comment saying GT won't idle if G2H are lost Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-12-matthew.brost@intel.com
2021-09-13drm/i915/guc: Copy whole golden context, set engine state size of subsetMatthew Brost
When the GuC does a media reset, it copies a golden context state back into the corrupted context's state. The address of the golden context and the size of the engine state restore are passed in via the GuC ADS. The i915 had a bug where it passed in the whole size of the golden context, not the size of the engine state to restore resulting in a memory corruption. Also copy the entire golden context on init rather than just the engine state that is restored. v2 (Daniele): use defines to avoid duplicated const variables (John). Fixes: 481d458caede ("drm/i915/guc: Add golden context to GuC ADS") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-11-matthew.brost@intel.com
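The distinction between the two sizes can be shown with a small sketch: the whole golden context is copied, but only the size of the engine-state subset is reported for the restore. The function name, its parameters and the idea that the engine state is the context minus a fixed-size leading header are assumptions for illustration, not the driver's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the entire golden context image into dst, and return the size of
 * the engine state that a reset should restore. header_size is a
 * hypothetical leading region that is not part of the restored state.
 */
static size_t golden_context_init(unsigned char *dst,
				  const unsigned char *src,
				  size_t total_size, size_t header_size)
{
	assert(header_size <= total_size);

	memcpy(dst, src, total_size);		/* copy the whole context */
	return total_size - header_size;	/* report only the subset */
}
```

Reporting `total_size` instead of the returned value is the kind of mistake the commit describes: the restore would then write past the end of the engine state.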
2021-09-13drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registeredMatthew Brost
When unblocking a context, do not enable scheduling if the context is banned, guc_id invalid, or not registered. v2: (Daniele) - Add helper for unblock Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-10-matthew.brost@intel.com
2021-09-13drm/i915/guc: Kick tasklet after queuing a requestMatthew Brost
Kick tasklet after queuing a request so it is submitted in a timely manner. Fixes: 3a4cdf1982f0 ("drm/i915/guc: Implement GuC context operations for new inteface") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-9-matthew.brost@intel.com
2021-09-13drm/i915/guc: Workaround reset G2H is received after schedule done G2HMatthew Brost
If the context is reset as a result of the request cancellation, the context reset G2H is received after the schedule disable done G2H, which is the wrong order. The schedule disable done G2H releases the waiting request cancellation code, which resubmits the context. This races with the context reset G2H, which also wants to resubmit the context, but in this case it really should be a NOP as the request cancellation code owns the resubmit. Use some clever tricks of checking the context state to seal this race until the GuC firmware is fixed. v2: (Checkpatch) - Fix typos v3: (Daniele) - State that this is a bug in the GuC firmware Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-7-matthew.brost@intel.com