summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/intel_ringbuffer.c
AgeCommit message (Collapse)Author
2019-04-24drm/i915: Move GraphicsTechnology files under gt/Chris Wilson
Start partitioning off the code that talks to the hardware (GT) from the uapi layers and move the device facing code under gt/ One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to subdivide that header and body further (and split out the submission code from the ringbuffer and logical context handling). This patch aims to be simple motion so git can fixup inflight patches with little mess. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
2019-04-18drm/i915: Setup the RCS ring prior to executionChris Wilson
We need to set the various ring registers prior to restarting the engine, or else we may restart it after reset/resume in an ill-defined state. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190418132720.3716-2-chris@chris-wilson.co.uk
2019-04-18drm/i915: Stop overwriting RING_IMR in rcs resumeChris Wilson
We store the engine->imr mask and set up the RING_IMR register on restarting the engine. We do not then want to overwrite it with an incomplete mask later as we may then lose interrupts! Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190418132720.3716-1-chris@chris-wilson.co.uk
2019-04-16drm/i915: add GEN2_ prefix to the I{E, I, M, S}R registersPaulo Zanoni
This discussion started because we use token pasting in the GEN{2,3}_IRQ_INIT and GEN{2,3}_IRQ_RESET macros, so gen2-4 passes an empty argument to those macros, making the code a little weird. The original proposal was to just add a comment as the empty argument, but Ville suggested we just add a prefix to the registers, and that indeed sounds like a more elegant solution. Now doing this is kinda against our rules for register naming since we only add gens or platform names as register prefixes when the given gen/platform changes a register that already existed before. On the other hand, we have so many instances of IIR/IMR in comments that adding a prefix would make the users of these register more easily findable, in addition to make our token pasting macros actually readable. So IMHO opening an exception here is worth it. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190410235344.31199-4-paulo.r.zanoni@intel.com
2019-04-10drm/i915: Only reset the pinned kernel contexts on resumeChris Wilson
On resume, we know that the only pinned contexts in danger of seeing corruption are the kernel context, and so we do not need to walk the list of all GEM contexts as we tracked them on each engine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190410190120.830-1-chris@chris-wilson.co.uk
2019-03-29drm/i915: fix i9xx irq enable/disableDaniele Ceraolo Spurio
Those functions are used on gen4 as well and gen4 does have a non-RCS engine, so remove the BUG_ON and flip back the logic to what it was before the ENGINE_READ/WRITE update v2: update the posting read as well (Chris, Ville). Fixes: baba6e572b38 ("drm/i915: take a reference to uncore in the engine and use it") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190329165018.32953-1-daniele.ceraolospurio@intel.com
2019-03-26drm/i915: take a reference to uncore in the engine and use itDaniele Ceraolo Spurio
A few advantages: - Prepares us for the planned split of display uncore from GT uncore - Improves our engine-centric view of the world in the engine code and allows us to avoid jumping back to dev_priv. - Allows us to wrap accesses to engine register in nice macros that automatically pick the right mmio base. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-10-daniele.ceraolospurio@intel.com
2019-03-26drm/i915: switch intel_wait_for_register to uncoreDaniele Ceraolo Spurio
The intel_uncore structure is the owner of register access, so subclass the function to it. While at it, use a local uncore var and switch to the new read/write functions where it makes sense. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-9-daniele.ceraolospurio@intel.com
2019-03-26drm/i915: intel_wait_for_register_fw to uncoreDaniele Ceraolo Spurio
The intel_uncore structure is the owner of register access, so subclass the function to it. While at it, use a local uncore var and switch to the new read/write functions where it makes sense. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-8-daniele.ceraolospurio@intel.com
2019-03-21drm/i915: Flush pages on acquisitionChris Wilson
When we return pages to the system, we ensure that they are marked as being in the CPU domain since any external access is uncontrolled and we must assume the worst. This means that we need to always flush the pages on acquisition if we need to use them on the GPU, and from the beginning have used set-domain. Set-domain is overkill for the purpose as it is a general synchronisation barrier, but our intent is to only flush the pages being swapped in. If we move that flush into the pages acquisition phase, we know then that when we have obj->mm.pages, they are coherent with the GPU and need only maintain that status without resorting to heavy handed use of set-domain. The principle knock-on effect for userspace is through mmap-gtt pagefaulting. Our uAPI has always implied that the GTT mmap was async (especially as when any pagefault occurs is unpredicatable to userspace) and so userspace had to apply explicit domain control itself (set-domain). However, swapping is transparent to the kernel, and so on first fault we need to acquire the pages and make them coherent for access through the GTT. Our use of set-domain here leaks into the uABI that the first pagefault was synchronous. This is unintentional and baring a few igt should be unoticed, nevertheless we bump the uABI version for mmap-gtt to reflect the change in behaviour. Another implication of the change is that gem_create() is presumed to create an object that is coherent with the CPU and is in the CPU write domain, so a set-domain(CPU) following a gem_create() would be a minor operation that merely checked whether we could allocate all pages for the object. On applying this change, a set-domain(CPU) causes a clflush as we acquire the pages. This will have a small impact on mesa as we move the clflush here on !llc from execbuf time to create, but that should have minimal performance impact as the same clflush exists but is now done early and because of the clflush issue, userspace recycles bo and so should resist allocating fresh objects. Internally, the presumption that objects are created in the CPU write-domain and remain so through writes to obj->mm.mapping is more prevalent than I expected; but easy enough to catch and apply a manual flush. For the future, we should push the page flush from the central set_pages() into the callers so that we can more finely control when it is applied, but for now doing it one location is easier to validate, at the cost of sometimes flushing when there is no need. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Antonio Argenziano <antonio.argenziano@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
2019-03-21drm/i915: Stop storing the context name as the timeline nameChris Wilson
The timeline->name is only used for convenience in pretty printing the i915_request.fence->ops->get_timeline_name() and it is just as convenient to pull it from the gem_context directly. The few instances of its use inside GEM_TRACE() has proven more of a nuisance than helpful, so not worth saving imo. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190321140711.11190-4-chris@chris-wilson.co.uk
2019-03-20drm/i915: use intel_uncore for all forcewake get/putDaniele Ceraolo Spurio
Now that the internal code all works on intel_uncore, flip the external-facing interface. v2: fix GVT. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-4-daniele.ceraolospurio@intel.com
2019-03-19drm/i915: Hold a reference to the active HW contextChris Wilson
For virtual engines, we need to keep the HW context alive while it remains in use. For regular HW contexts, they are created and kept alive until the end of the GEM context. For simplicity, generalise the requirements and keep an active reference to each HW context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-2-chris@chris-wilson.co.uk
2019-03-19drm/i915: Lock the gem_context->active_list while dropping the linkChris Wilson
On unpinning the intel_context, we remove it from the active list inside the GEM context. This list is supposed to be guarded by the GEM context mutex, so remember to take it! Fixes: 7e3d9a59410d ("drm/i915: Track active engines within a context") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-1-chris@chris-wilson.co.uk
2019-03-18drm/i915: Hold a ref to the ring while retiringChris Wilson
As the final request on a ring may hold the reference to this ring (via retiring the last pinned context), we may find ourselves chasing a dangling pointer on completion of the list. A quick solution is to hold a reference to the ring itself as we retire along it so that we only free it after we stop dereferencing it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-4-chris@chris-wilson.co.uk
2019-03-18drm/i915: Switch to use HWS indices rather than addressesChris Wilson
If we use the STORE_DATA_INDEX function we can use a fixed offset and avoid having to lookup up the engine HWS address. A step closer to being able to emit the final breadcrumb during request_add rather than later in the submission interrupt handler. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-9-chris@chris-wilson.co.uk
2019-03-12drm/i915: Consolidate reset-request debug messageChris Wilson
Move the pair of messages to the common callsite where it makes sense to include a bit more information about which request is being reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190312111146.10662-1-chris@chris-wilson.co.uk
2019-03-08drm/i915: Introduce intel_context.pin_mutex for pin managementChris Wilson
Introduce a mutex to start locking the HW contexts independently of struct_mutex, with a view to reducing the coarse struct_mutex. The intel_context.pin_mutex is used to guard the transition to and from being pinned on the gpu, and so is required before starting to build any request. The intel_context will then remain pinned until the request completes, but the mutex can be released immediately unpin completion of pinning the context. A slight variant of the above is used by per-context sseu that wants to inspect the pinned status of the context, and requires that it remains stable (either !pinned or pinned) across its operation. By using the pin_mutex to serialise operations while pin_count==0, we can take that pin_mutex for stabilise the boolean pin status. v2: for Tvrtko! * Improved commit message. * Dropped _gpu suffix from gen8_modify_rpcs_gpu. v3: Repair the locking for sseu selftests Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-7-chris@chris-wilson.co.uk
2019-03-08drm/i915: Track the pinned kernel contexts on each engineChris Wilson
Each engine acquires a pin on the kernel contexts (normal and preempt) so that the logical state is always available on demand. Keep track of each engines pin by storing the returned pointer on the engine for quick access. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-6-chris@chris-wilson.co.uk
2019-03-08drm/i915: Make context pinning part of intel_context_opsChris Wilson
Push the intel_context pin callback down from intel_engine_cs onto the context itself by virtue of having a central caller for intel_context_pin() being able to lookup the intel_context itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-5-chris@chris-wilson.co.uk
2019-03-08drm/i915: Move over to intel_context_lookup()Chris Wilson
In preparation for an ever growing number of engines and so ever increasing static array of HW contexts within the GEM context, move the array over to an rbtree, allocated upon first use. Unfortunately, this imposes an rbtree lookup at a few frequent callsites, but we should be able to mitigate those by moving over to using the HW context as our primary type and so only incur the lookup on the boundary with the user GEM context and engines. v2: Check for no HW context in guc_stage_desc_init Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk
2019-03-08drm/i915: Store the intel_context_ops in the intel_engine_csChris Wilson
If we place a pointer to the engine specific intel_context_ops in the engine itself, we can assign the ops pointer on initialising the context, and then rely on it being set. This simplifies the code in later patches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-3-chris@chris-wilson.co.uk
2019-03-08drm/i915: Track active engines within a contextChris Wilson
For use in the next patch, if we track which engines have been used by the HW, we can reduce the work required to flush our state off the HW to those engines. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-1-chris@chris-wilson.co.uk
2019-03-05drm/i915: Store the BIT(engine->id) as the engine's maskChris Wilson
In the next patch, we are introducing a broad virtual engine to encompass multiple physical engines, losing the 1:1 nature of BIT(engine->id). To reflect the broader set of engines implied by the virtual instance, lets store the full bitmask. v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/) v3: Tvrtko voted for moah churn so teach everyone to not mention ring and use $class$instance throughout. v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and VCS[0-4] in later gen. We opt to keep the code consistent and use 0-index naming throughout. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk
2019-02-26drm/i915: Remove i915_request.global_seqnoChris Wilson
Having weaned the interrupt handling off using a single global execution queue, we no longer need to emit a global_seqno. Note that we still have a few assumptions about execution order along engine timelines, but this removes the most obvious artefact! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-3-chris@chris-wilson.co.uk
2019-02-26drm/i915: Remove access to global seqno in the HWSPChris Wilson
Stop accessing the HWSP to read the global seqno, and stop tracking the mirror in the engine's execution timeline -- it is unused. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-2-chris@chris-wilson.co.uk
2019-02-26drm/i915: Replace global_seqno with a hangcheck heartbeat seqnoChris Wilson
To determine whether an engine has 'stuck', we simply check whether or not is still on the same seqno for several seconds. To keep this simple mechanism intact over the loss of a global seqno, we can simply add a new global heartbeat seqno instead. As we cannot know the sequence in which requests will then be completed, we use a primitive random number generator instead (with a cycle long enough to not matter over an interval of a few thousand requests between hangcheck samples). The alternative to using a dedicated seqno on every request is to issue a heartbeat request and query its progress through the system. Sadly this requires us to reduce struct_mutex so that we can issue requests without requiring that bkl. v2: And without the extra CS_STALL for the hangcheck seqno -- we don't need strict serialisation with what comes later, we just need to be sure we don't write the hangcheck seqno before our batch is flushed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-1-chris@chris-wilson.co.uk
2019-02-07drm/i915: Hack and slash, throttle execbuffer hogsChris Wilson
Apply backpressure to hogs that emit requests faster than the GPU can process them by waiting for their ring to be less than half-full before proceeding with taking the struct_mutex. This is a gross hack to apply throttling backpressure, the long term goal is to remove the struct_mutex contention so that each client naturally waits, preferably in an asynchronous, nonblocking fashion (pipelined operations for the win), for their own resources and never blocks another client within the driver at least. (Realtime priority goals would extend to ensuring that resource contention favours high priority clients as well.) This patch only limits excessive request production and does not attempt to throttle clients that block waiting for eviction (either global GTT or system memory) or any other global resources, see above for the long term goal. No microbenchmarks are harmed (to the best of my knowledge). Testcase: igt/gem_exec_schedule/pi-ringfull-* Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190207071829.5574-1-chris@chris-wilson.co.uk
2019-01-29drm/i915: Replace global breadcrumbs with per-context interrupt trackingChris Wilson
A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd"), the issue of handling multiple clients waiting in parallel was brought to our attention. The requirement was that every client should be woken immediately upon its request being signaled, without incurring any cpu overhead. To handle certain fragility of our hw meant that we could not do a simple check inside the irq handler (some generations required almost unbounded delays before we could be sure of seqno coherency) and so request completion checking required delegation. Before commit 688e6c725816, the solution was simple. Every client waiting on a request would be woken on every interrupt and each would do a heavyweight check to see if their request was complete. Commit 688e6c725816 introduced an rbtree so that only the earliest waiter on the global timeline would woken, and would wake the next and so on. (Along with various complications to handle requests being reordered along the global timeline, and also a requirement for kthread to provide a delegate for fence signaling that had no process context.) The global rbtree depends on knowing the execution timeline (and global seqno). Without knowing that order, we must instead check all contexts queued to the HW to see which may have advanced. We trim that list by only checking queued contexts that are being waited on, but still we keep a list of all active contexts and their active signalers that we inspect from inside the irq handler. By moving the waiters onto the fence signal list, we can combine the client wakeup with the dma_fence signaling (a dramatic reduction in complexity, but does require the HW being coherent, the seqno must be visible from the cpu before the interrupt is raised - we keep a timer backup just in case). Having previously fixed all the issues with irq-seqno serialisation (by inserting delays onto the GPU after each request instead of random delays on the CPU after each interrupt), we can rely on the seqno state to perfom direct wakeups from the interrupt handler. This allows us to preserve our single context switch behaviour of the current routine, with the only downside that we lose the RT priority sorting of wakeups. In general, direct wakeup latency of multiple clients is about the same (about 10% better in most cases) with a reduction in total CPU time spent in the waiter (about 20-50% depending on gen). Average herd behaviour is improved, but at the cost of not delegating wakeups on task_prio. v2: Capture fence signaling state for error state and add comments to warm even the most cold of hearts. v3: Check if the request is still active before busywaiting v4: Reduce the amount of pointer misdirection with list_for_each_safe and using a local i915_request variable inside the loops v5: Add a missing pluralisation to a purely informative selftest message. References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
2019-01-29drm/i915: Identify active requestsChris Wilson
To allow requests to forgo a common execution timeline, one question we need to be able to answer is "is this request running?". To track whether a request has started on HW, we can emit a breadcrumb at the beginning of the request and check its timeline's HWSP to see if the breadcrumb has advanced past the start of this request. (This is in contrast to the global timeline where we need only ask if we are on the global timeline and if the timeline has advanced past the end of the previous request.) There is still confusion from a preempted request, which has already started but relinquished the HW to a high priority request. For the common case, this discrepancy should be negligible. However, for identification of hung requests, knowing which one was running at the time of the hang will be much more important. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
2019-01-28drm/i915: Track the context's seqno in its own timeline HWSPChris Wilson
Now that we have allocated ourselves a cacheline to store a breadcrumb, we can emit a write from the GPU into the timeline's HWSP of the per-context seqno as we complete each request. This drops the mirroring of the per-engine HWSP and allows each context to operate independently. We do not need to unwind the per-context timeline, and so requests are always consistent with the timeline breadcrumb, greatly simplifying the completion checks as we no longer need to be concerned about the global_seqno changing mid check. One complication though is that we have to be wary that the request may outlive the HWSP and so avoid touching the potentially danging pointer after we have retired the fence. We also have to guard our access of the HWSP with RCU, the release of the obj->mm.pages should already be RCU-safe. At this point, we are emitting both per-context and global seqno and still using the single per-engine execution timeline for resolving interrupts. v2: s/fake_complete/mark_complete/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-5-chris@chris-wilson.co.uk
2019-01-28drm/i915: Allocate a status page for each timelineChris Wilson
Allocate a page for use as a status page by a group of timelines, as we only need a dword of storage for each (rounded up to the cacheline for safety) we can pack multiple timelines into the same page. Each timeline will then be able to track its own HW seqno. v2: Reuse the common per-engine HWSP for the solitary ringbuffer timeline, so that we do not have to emit (using per-gen specialised vfuncs) the breadcrumb into the distinct timeline HWSP and instead can keep on using the common MI_STORE_DWORD_INDEX. However, to maintain the sleight-of-hand for the global/per-context seqno switchover, we will store both temporarily (and so use a custom offset for the shared timeline HWSP until the switch over). v3: Keep things simple and allocate a page for each timeline, page sharing comes next. v4: I was caught repeating the same MI_STORE_DWORD_IMM over and over again in selftests. v5: And caught red handed copying create timeline + check. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-3-chris@chris-wilson.co.uk
2019-01-28drm/i915: Always allocate an object/vma for the HWSPChris Wilson
Currently we only allocate an object and vma if we are using a GGTT virtual HWSP, and a plain struct page for a physical HWSP. For convenience later on with global timelines, it will be useful to always have the status page being tracked by a struct i915_vma. Make it so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-4-chris@chris-wilson.co.uk
2019-01-25drm/i915: Remove GPU reset dependence on struct_mutexChris Wilson
Now that the submission backends are controlled via their own spinlocks, with a wave of a magic wand we can lift the struct_mutex requirement around GPU reset. That is we allow the submission frontend (userspace) to keep on submitting while we process the GPU reset as we can suspend the backend independently. The major change is around the backoff/handoff strategy for performing the reset. With no mutex deadlock, we no longer have to coordinate with any waiter, and just perform the reset immediately. Testcase: igt/gem_mmap_gtt/hang # regresses Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125132230.22221-3-chris@chris-wilson.co.uk
2019-01-25drm/i915: Remove manual breadcumb countingChris Wilson
Now that we know we measure the size of the engine->emit_breadcrumb() correctly, we can remove the previous manual counting. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125120005.25191-1-chris@chris-wilson.co.uk
2019-01-25drm/i915: Measure the required reserved size for request emissionChris Wilson
Instead of tediously and fragilely counting up the number of dwords required to emit the breadcrumb to seal a request, fake a request and measure it automatically once during engine setup. The downside is that this requires a fair amount of mocking to create a proper breadcrumb. Still, should be less error prone in future as the breadcrumb size fluctuates! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125100520.20163-1-chris@chris-wilson.co.uk
2019-01-22drm/i915: Tidy common test_bit probing of i915_request->fence.flagsChris Wilson
A repeated pattern is to test the signaled bit of our request->fence.flags. Make this an inline to shorten a few lines and remove unnecessary line continuations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190121222117.23305-20-chris@chris-wilson.co.uk
2019-01-09drm/i915: drop all drmP.h includesJani Nikula
Needs just a few additional includes here and there. Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190108082709.3748-1-jani.nikula@intel.com
2019-01-07drm/i915/hsw: Flush RING_IMR changes before changing the global GT IMR (vecs)Chris Wilson
Haswell also requires the RING_IMR flush for its unique vebox setup to avoid losing interrupts, as per 476af9c26063 ("drm/i915/gen6: Flush RING_IMR changes before changing the global GT IMR"): On Baytail, notably, we can still detect missed interrupt syndrome (where we never spot a completed request). In this case, it can be alleviated by always keeping the interrupt unmasked, implying that the interrupt is being lost in the window after modifying the IMR. (This is the reason we still have the posting reads on enable_irq, if we remove them we miss interrupts!) Having narrowed the issue down to the IMR, rather than keeping it always enabled, applying the usual posting read/flush of the RING_IMR before unmasking the GT IMR also seems to prevent the missed interrupt. So be it. References: 476af9c26063 ("drm/i915/gen6: Flush RING_IMR changes before changing the global GT IMR") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190105115647.4970-1-chris@chris-wilson.co.uk
2019-01-03drm/i915/gen6: Flush RING_IMR changes before changing the global GT IMRChris Wilson
On Baytail, notably, we can still detect missed interrupt syndrome (where we never spot a completed request). In this case, it can be alleviated by always keeping the interrupt unmasked, implying that the interrupt is being lost in the window after modifying the IMR. (This is the reason we still have the posting reads on enable_irq, if we remove them we miss interrupts!) Having narrowed the issue down to the IMR, rather than keeping it always enabled, applying the usual posting read/flush of the RING_IMR before unmasking the GT IMR also seems to prevent the missed interrupt. So be it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190102163524.19353-1-chris@chris-wilson.co.uk
2019-01-02drm/i915: start moving runtime device info to a separate structJani Nikula
First move the low hanging fruit, the fields that are only initialized runtime. Use RUNTIME_INFO() exclusively to access the fields. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/c24fe7a4b0492a888690c46814c0ff21ce2f12b1.1546267488.git.jani.nikula@intel.com
2018-12-31drm/i915: Drop unused engine->irq_seqno_barrier w/aChris Wilson
Now that we have eliminated the CPU-side irq_seqno_barrier by moving the delays on the GPU before emitting the MI_USER_INTERRUPT, we can remove the engine->irq_seqno_barrier infrastructure. Though intentionally slowing down the GPU is nasty, so is the code we can now remove! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-6-chris@chris-wilson.co.uk
2018-12-31drm/i915/ringbuffer: Move irq seqno barrier to the GPU for gen5Chris Wilson
The irq_seqno_barrier is a tradeoff between doing work on every request (on the GPU) and doing work after every interrupt (on the CPU). We presume we have many more requests than interrupts! However, for Ironlake, the workaround is a pretty hideous usleep() and so even though it was found we need to repeat the MI_STORE_DWORD_IMM 8 times, or about 1us of GPU time, doing so is preferrable than requiring a sleep of 125-250us on the CPU where we desire to respond immediately (ideally from within the interrupt handler)! The additional MI_STORE_DWORD_IMM also have the side-effect of flushing MI operations from userspace which are not caught by MI_FLUSH! Testcase: igt/gem_sync Testcase: igt/gem_exec_whisper Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-5-chris@chris-wilson.co.uk
2018-12-31drm/i915/ringbuffer: Move irq seqno barrier to the GPU for gen7Chris Wilson
The irq_seqno_barrier is a tradeoff between doing work on every request (on the GPU) and doing work after every interrupt (on the CPU). We presume we have many more requests than interrupts! However, the current w/a for Ivybridge is an implicit delay that currently fails sporadically and consistently if we move the w/a into the irq handler itself. This makes the CPU barrier untenable for upcoming interrupt handler changes and so we need to replace it with a delay on the GPU before we send the MI_USER_INTERRUPT. As it turns out that delay is 32x MI_STORE_DWORD_IMM, or about 0.6us per request! Quite nasty, but the lesser of two evils looking to the future. Testcase: igt/gem_sync Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-4-chris@chris-wilson.co.uk
2018-12-31drm/i915/ringbuffer: Remove irq-seqno w/a for gen6 xcsChris Wilson
The MI_FLUSH_DW does appear coherent with the following MI_USER_INTERRUPT, but only on Sandybridge. Ivybridge requires a heavier hammer, but on Sandybridge we can stop requiring the irq_seqno barrier. Testcase: igt/gem_sync Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-3-chris@chris-wilson.co.uk
2018-12-31drm/i915/ringbuffer: Remove irq-seqno w/a for gen6/7 rcsChris Wilson
Having transitioned to using PIPECONTROL to combine the flush with the breadcrumb write using their post-sync functions, assume that this will resolve the serialisation with the subsequent MI_USER_INTERRUPT. That is when inspecting the breadcrumb after an interrupt we can rely on the write being posted (i.e. the HWSP will be coherent). Testing using gem_sync shows that the PIPECONTROL + CS stall does serialise the command streamer sufficient that the breadcrumb lands before the MI_USER_INTERRUPT. The same is not true for MI_FLUSH_DW. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-2-chris@chris-wilson.co.uk
2018-12-31drm/i915: Remove redundant trailing request flushChris Wilson
Now that we perform the request flushing inline with emitting the breadcrumb, we can remove the now redundant manual flush. And we can also remove the infrastructure that remained only for its purpose. v2: emit_breadcrumb_sz is in dwords, but rq->reserved_space is in bytes Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-1-chris@chris-wilson.co.uk
2018-12-28drm/i915/ringbuffer: Pull the render flush into breadcrumb emissionChris Wilson
In preparation for removing the manual EMIT_FLUSH prior to emitting the breadcrumb implement the flush inline with writing the breadcrumb for ringbuffer emission. With a combined flush+breadcrumb, we can use a single operation to both flush and after the flush is complete (post-sync) write the breadcrumb. This gives us a strongly ordered operation that should be sufficient to serialise the write before we emit the interrupt; and therefore we may take the opportunity to remove the irq_seqno_barrier w/a for gen6+. Although using the PIPECONTROL to write the breadcrumb is slower than MI_STORE_DWORD_IMM, by combining the operations into one and removing the extra flush (next patch) it is faster For gen2-5, we simply combine the MI_FLUSH into the breadcrumb emission, though maybe we could find a solution here to the seqno-vs-interrupt issue on Ironlake by mixing up the flush? The answer is no, adding an MI_FLUSH before the interrupt is insufficient. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228153114.4948-2-chris@chris-wilson.co.uk
2018-12-28drm/i915: Remove HW semaphores for gen7 inter-engine synchronisationChris Wilson
The writing is on the wall for the existence of a single execution queue along each engine, and as a consequence we will not be able to track dependencies along the HW queue itself, i.e. we will not be able to use HW semaphores on gen7 as they use a global set of registers (and unlike gen8+ we can not effectively target memory to keep per-context seqno and dependencies). On the positive side, when we implement request reordering for gen7 we also can not presume a simple execution queue and would also require removing the current semaphore generation code. So this bring us another step closer to request reordering for ringbuffer submission! The negative side is that using interrupts to drive inter-engine synchronisation is much slower (4us -> 15us to do a nop on each of the 3 engines on ivb). This is much better than it was at the time of introducing the HW semaphores and equally important userspace weaned itself off intermixing dependent BLT/RENDER operations (the prime culprit was glyph rendering in UXA). So while we regress the microbenchmarks, it should not impact the user. References: https://bugs.freedesktop.org/show_bug.cgi?id=108888 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk
2018-12-28drm/i915: Restrict PSMI context load w/a to Haswell GT1Chris Wilson
After we found a workaround for a hang on context load, Ben Widawsky found confirmation that it was for an issue with waking from rc6 and loading a context image. The workaround from on high suggests that we should I915_WRITE(RING_WAIT_FOR_RC6_EXIT(engine->mmio_base), _MASKED_FIELD(RING_RC6_SEL_WRITE_ADDR_MASK, RING_RC6_SEL_WRITE_ADDR_UPPER_LEFT)); in our rc6 setup for Haswell GT1, but on applying that we find instead that the machine encounters a GT forcewake error and locks up. As we are removing HW semaphore usage in the next patch, and the suggested workaround is no improvement, we need to decouple the PSMI workaround from HAS_SEMAPHORES to IS_HSW_GT1. References: 2c550183476d ("drm/i915: Disable PSMI sleep messages on all rings around context switches") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-1-chris@chris-wilson.co.uk