summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/i915_gem_evict.c
AgeCommit message (Collapse)Author
2019-12-05drm/i915: Ignore most failures during evict-vmChris Wilson
Removing all vma from the VM is best effort -- we only remove all those ready to be removed, so forgive and VMA that becomes pinned. While forgiving those that become pinned, also take a second look for any that became unpinned as we waited. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205113726.413351-1-chris@chris-wilson.co.uk
2019-10-04drm/i915: Move request runtime management onto gtChris Wilson
Requests are run from the gt and are tided into the gt runtime power management, so pull the runtime request management under gt/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-12-chris@chris-wilson.co.uk
2019-10-04drm/i915: Merge wait_for_timelines with retire_requestChris Wilson
wait_for_timelines is essentially the same loop as retiring requests (with an extra timeout), so merge the two into one routine. v2: i915_retire_requests_timeout and keep VT'd w/a as !interruptible Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-10-chris@chris-wilson.co.uk
2019-10-04drm/i915: Pull i915_vma_pin under the vm->mutexChris Wilson
Replace the struct_mutex requirement for pinning the i915_vma with the local vm->mutex instead. Note that the vm->mutex is tainted by the shrinker (we require unbinding from inside fs-reclaim) and so we cannot allocate while holding that mutex. Instead we have to preallocate workers to do allocate and apply the PTE updates after we have we reserved their slot in the drm_mm (using fences to order the PTE writes with the GPU work and with later unbind). In adding the asynchronous vma binding, one subtle requirement is to avoid coupling the binding fence into the backing object->resv. That is the asynchronous binding only applies to the vma timeline itself and not to the pages as that is a more global timeline (the binding of one vma does not need to be ordered with another vma, nor does the implicit GEM fencing depend on a vma, only on writes to the backing store). Keeping the vma binding distinct from the backing store timelines is verified by a number of async gem_exec_fence and gem_exec_schedule tests. The way we do this is quite simple, we keep the fence for the vma binding separate and only wait on it as required, and never add it to the obj->resv itself. Another consequence in reducing the locking around the vma is the destruction of the vma is no longer globally serialised by struct_mutex. A natural solution would be to add a kref to i915_vma, but that requires decoupling the reference cycles, possibly by introducing a new i915_mm_pages object that is own by both obj->mm and vma->pages. However, we have not taken that route due to the overshadowing lmem/ttm discussions, and instead play a series of complicated games with trylocks to (hopefully) ensure that only one destruction path is called! v2: Add some commentary, and some helpers to reduce patch churn. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
2019-10-04drm/i915: Use helpers for drm_mm_node booleansChris Wilson
A subset of 71724f708997 ("drm/mm: Use helpers for drm_mm_node booleans") in order to prepare drm-intel-next-queued for subsequent patches before we can backmerge 71724f708997 itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004142226.13711-1-chris@chris-wilson.co.uk
2019-09-09drm/i915: cleanup cache-coloringMatthew Auld
Try to tidy up the cache-coloring such that we rid the code of any mm.color_adjust assumptions, this should hopefully make it more obvious in the code when we need to actually use the cache-level as the color, and as a bonus should make adding a different color-scheme simpler. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190909124052.22900-3-matthew.auld@intel.com
2019-08-21drm/i915: Replace PIN_NONFAULT with calls to PIN_NOEVICTChris Wilson
When under severe stress for GTT mappable space, the LRU eviction model falls off a cliff. We spend all our time scanning the much larger non-mappable area searching for something within the mappable zone we can evict. Turn this on its head by only using the full vma for the object if it is already pinned in the mappable zone or there is sufficient *free* space to accommodate it (prioritizing speedy reuse). If there is not, immediately fall back to using small chunks (tilerow for GTT mmap, single pages for pwrite/relocation) and using random eviction before doing a full search. Testcase: igt/gem_concurrent_blt References: https://bugs.freedesktop.org/show_bug.cgi?id=110848 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190821123234.19194-1-chris@chris-wilson.co.uk
2019-08-07drm/i915: remove unnecessary includes of intel_display_types.h headerJani Nikula
With its original name intel_drv.h the intel_display_types.h header was superfluously cargo-cult included all over the place, while it's really mostly about display internals. Remove the unnecessary includes. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e3d737f0ab87c55969e62c1e077e15c04c238297.1565085692.git.jani.nikula@intel.com
2019-08-07drm/i915: rename intel_drv.h to display/intel_display_types.hJani Nikula
Everything about the file is about display, and mostly about types related to display. Move under display/ as intel_display_types.h to reflect the facts. There's still plenty to clean up, but start off with moving the file where it logically belongs and naming according to contents. v2: fix the include guard name in the renamed file Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190806113933.11799-1-jani.nikula@intel.com
2019-05-28drm/i915: Move more GEM objects under gem/Chris Wilson
Continuing the theme of separating out the GEM clutter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk
2019-04-24drm/i915: Invert the GEM wakeref hierarchyChris Wilson
In the current scheme, on submitting a request we take a single global GEM wakeref, which trickles down to wake up all GT power domains. This is undesirable as we would like to be able to localise our power management to the available power domains and to remove the global GEM operations from the heart of the driver. (The intent there is to push global GEM decisions to the boundary as used by the GEM user interface.) Now during request construction, each request is responsible via its logical context to acquire a wakeref on each power domain it intends to utilize. Currently, each request takes a wakeref on the engine(s) and the engines themselves take a chipset wakeref. This gives us a transition on each engine which we can extend if we want to insert more powermangement control (such as soft rc6). The global GEM operations that currently require a struct_mutex are reduced to listening to pm events from the chipset GT wakeref. As we reduce the struct_mutex requirement, these listeners should evaporate. Perhaps the biggest immediate change is that this removes the struct_mutex requirement around GT power management, allowing us greater flexibility in request construction. Another important knock-on effect, is that by tracking engine usage, we can insert a switch back to the kernel context on that engine immediately, avoiding any extra delay or inserting global synchronisation barriers. This makes tracking when an engine and its associated contexts are idle much easier -- important for when we forgo our assumed execution ordering and need idle barriers to unpin used contexts. In the process, it means we remove a large chunk of code whose only purpose was to switch back to the kernel context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
2019-03-08drm/i915: Remove has-kernel-contextChris Wilson
We can no longer assume execution ordering, and in particular we cannot assume which context will execute last. One side-effect of this is that we cannot determine if the kernel-context is resident on the GPU, so remove the routines that claimed to do so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-4-chris@chris-wilson.co.uk
2019-03-08drm/i915: Reduce presumption of request ordering for barriersChris Wilson
Currently we assume that we know the order in which requests run and so can determine if we need to reissue a switch-to-kernel-context prior to idling. That assumption does not hold for the future, so instead of tracking which barriers have been used, simply determine if we have ever switched away from the kernel context by using the engine and before idling ensure that all engines that have been used since the last idle are synchronously switched back to the kernel context for safety (and else of shrinking memory while idle). v2: Use intel_engine_mask_t and ALL_ENGINES Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-3-chris@chris-wilson.co.uk
2019-01-28drm/i915: Pull VM lists under the VM mutex.Chris Wilson
A starting point to counter the pervasive struct_mutex. For the goal of avoiding (or at least blocking under them!) global locks during user request submission, a simple but important step is being able to manage each clients GTT separately. For which, we want to replace using the struct_mutex as the guard for all things GTT/VM and switch instead to a specific mutex inside i915_address_space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-2-chris@chris-wilson.co.uk
2019-01-28drm/i915: Stop tracking MRU activity on VMAChris Wilson
Our goal is to remove struct_mutex and replace it with fine grained locking. One of the thorny issues is our eviction logic for reclaiming space for an execbuffer (or GTT mmaping, among a few other examples). While eviction itself is easy to move under a per-VM mutex, performing the activity tracking is less agreeable. One solution is not to do any MRU tracking and do a simple coarse evaluation during eviction of active/inactive, with a loose temporal ordering of last insertion/evaluation. That keeps all the locking constrained to when we are manipulating the VM itself, neatly avoiding the tricky handling of possible recursive locking during execbuf and elsewhere. Note that discarding the MRU (currently implemented as a pair of lists, to avoid scanning the active list for a NONBLOCKING search) is unlikely to impact upon our efficiency to reclaim VM space (where we think a LRU model is best) as our current strategy is to use random idle replacement first before doing a search, and over time the use of softpinned 48b per-ppGTT is growing (thereby eliminating any need to perform any eviction searches, in theory at least) with the remaining users being found on much older devices (gen2-gen6). v2: Changelog and commentary rewritten to elaborate on the duality of a single list being both an inactive and active list. v3: Consolidate bool parameters into a single set of flags; don't comment on the duality of a single variable being a multiplicity of bits. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-1-chris@chris-wilson.co.uk
2019-01-09drm/i915: drop all drmP.h includesJani Nikula
Needs just a few additional includes here and there. Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190108082709.3748-1-jani.nikula@intel.com
2018-07-09drm/i915: Provide a timeout to i915_gem_wait_for_idle()Chris Wilson
Usually we have no idea about the upper bound we need to wait to catch up with userspace when idling the device, but in a few situations we know the system was idle beforehand and can provide a short timeout in order to very quickly catch a failure, long before hangcheck kicks in. In the following patches, we will use the timeout to curtain two overly long waits, where we know we can expect the GPU to complete within a reasonable time or declare it broken. In particular, with a broken GPU we expect it to fail during the initial GPU setup where do a couple of context switches to record the defaults. This is a task that takes a few milliseconds even on the slowest of devices, but we may have to wait 60s for hangcheck to give in and declare the machine inoperable. In this a case where any gpu hang is unacceptable, both from a timeliness and practical standpoint. The other improvement is that in selftests, we do not need to arm an independent timer to inject a wedge, as we can just limit the timeout on the wait directly. v2: Include the timeout parameter in the trace. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
2018-02-21drm/i915: Rename drm_i915_gem_request to i915_requestChris Wilson
We want to de-emphasize the link between the request (dependency, execution and fence tracking) from GEM and so rename the struct from drm_i915_gem_request to i915_request. That is we may implement the GEM user interface on top of requests, but they are an abstraction for tracking execution rather than an implementation detail of GEM. (Since they are not tied to HW, we keep the i915 prefix as opposed to intel.) In short, the spatch: @@ @@ - struct drm_i915_gem_request + struct i915_request A corollary to contracting the type name, we also harmonise on using 'rq' shorthand for local variables where space if of the essence and repetition makes 'request' unwieldy. For globals and struct members, 'request' is still much preferred for its clarity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-25drm/i915: Use same test for eviction and submitting kernel contextChris Wilson
During evict, we wish to idle the GPU if we see that the GGTT is full. However, our test for idle in i915_gem_evict_something() and in i915_gem_switch_to_kernel_context() do not match leading to disappointment - we never believe that we are idle and keep trying to flush the GGTT ad infinitum. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103438 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171024220855.30155-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-25drm/i915: Call cond_resched() before repeating i915_gem_evict_something()Chris Wilson
Insert a breakpoint, a chance to escape back to the scheduler and run something else for a bit, if we find that the GGTT is full and needs to be idled in order to make some room. In practice, this should only be an issue in stress tests as the wait itself will normally give the chance for the scheduler to intervene and make progress. References: https://bugs.freedesktop.org/show_bug.cgi?id=103438 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171024205053.7845-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-12drm/i915/selftests: Exercise adding requests to a full GGTTChris Wilson
A bug recently encountered involved the issue where are we were submitting requests to different ppGTT, each would pin a segment of the GGTT for its logical context and ring. However, this is invisible to eviction as we do not tie the context/ring VMA to a request and so do not automatically wait upon it them (instead they are marked as pinned, preventing eviction entirely). Instead the eviction code must flush those contexts by switching to the kernel context. This selftest tries to fill the GGTT with contexts to exercise a path where the switch-to-kernel-context failed to make forward progress and we fail with ENOSPC. v2: Make the hole in the filled GGTT explicit. v3: Swap out the arbitrary timeout for a private notification from i915_gem_evict_something() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-3-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-10-12drm/i915: Fix eviction when the GGTT is idle but fullChris Wilson
In the full-ppgtt world, we can fill the GGTT full of context objects. These context objects are currently implicitly tracked by the requests that pin them i.e. they are only unpinned when the request is completed and retired, but we do not have the link from the vma to the request (anymore). In order to unpin those contexts, we have to issue another request and wait upon the switch to the kernel context. The bug during eviction was that we assumed that a full GGTT meant we would have requests on the GGTT timeline, and so we missed situations where those requests where merely in flight (and when even they have not yet been submitted to hw yet). The fix employed here is to change the already-is-idle test to no look at the execution timeline, but count the outstanding requests and then check that we have switched to the kernel context. Erring on the side of overkill here just means that we stall a little longer than may be strictly required, but we only expect to hit this path in extreme corner cases where returning an erroneous error is worse than the delay. v2: Logical inversion when swapping over branches. Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171012125726.14736-1-chris@chris-wilson.co.uk
2017-10-09drm/i915: Check PIN_NONFAULT overlaps in evict_for_nodeChris Wilson
If the caller says that he doesn't want to evict any other faulting vma, honour that flag. The logic was used in evict_something, but not the more specific evict_for_node, now being used as a preliminary probe since commit 606fec956c0e ("drm/i915: Prefer random replacement before eviction search"). Fixes: 606fec956c0e ("drm/i915: Prefer random replacement before eviction search") Fixes: 821188778b9b ("drm/i915: Choose not to evict faultable objects from the GGTT") References: https://bugs.freedesktop.org/show_bug.cgi?id=102490 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-4-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-10-09drm/i915: Track user GTT faulting per-vmaChris Wilson
We don't wish to refault the entire object (other vma) when unbinding one partial vma. To do this track which vma have been faulted into the user's address space. v2: Use a local vma_offset to tidy up a multiline unmap_mapping_range(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-18drm/i915: Convert execbuf to use struct-of-array packing for critical fieldsChris Wilson
When userspace is doing most of the work, avoiding relocs (using NO_RELOC) and opting out of implicit synchronisation (using ASYNC), we still spend a lot of time processing the arrays in execbuf, even though we now should have nothing to do most of the time. One issue that becomes readily apparent in profiling anv is that iterating over the large execobj[] is unfriendly to the loop prefetchers of the CPU and it much prefers iterating over a pair of arrays rather than one big array. v2: Clear vma[] on construction to handle errors during vma lookup Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-3-chris@chris-wilson.co.uk
2017-06-16drm/i915: Eliminate lots of iterations over the execobjects arrayChris Wilson
The major scaling bottleneck in execbuffer is the processing of the execobjects. Creating an auxiliary list is inefficient when compared to using the execobject array we already have allocated. Reservation is then split into phases. As we lookup up the VMA, we try and bind it back into active location. Only if that fails, do we add it to the unbound list for phase 2. In phase 2, we try and add all those objects that could not fit into their previous location, with fallback to retrying all objects and evicting the VM in case of severe fragmentation. (This is the same as before, except that phase 1 is now done inline with looking up the VMA to avoid an iteration over the execobject array. In the ideal case, we eliminate the separate reservation phase). During the reservation phase, we only evict from the VM between passes (rather than currently as we try to fit every new VMA). In testing with Unreal Engine's Atlantis demo which stresses the eviction logic on gen7 class hardware, this speed up the framerate by a factor of 2. The second loop amalgamation is between move_to_gpu and move_to_active. As we always submit the request, even if incomplete, we can use the current request to track active VMA as we perform the flushes and synchronisation required. The next big advancement is to avoid copying back to the user any execobjects and relocations that are not changed. v2: Add a Theory of Operation spiel. v3: Fall back to slow relocations in preparation for flushing userptrs. v4: Document struct members, factor out eb_validate_vma(), add a few more comments to explain some magic and hide other magic behind macros. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-15drm/i915: Split vma exec_link/evict_linkChris Wilson
Currently the vma has one link member that is used for both holding its place in the execbuf reservation list, and in any eviction list. This dual property is quite tricky and error prone. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-3-chris@chris-wilson.co.uk
2017-06-15drm/i915: Use vma->exec_entry as our double-entry placeholderChris Wilson
This has the benefit of not requiring us to manipulate the vma->exec_link list when tearing down the execbuffer, and is a marginally cheaper test to detect the user error. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-2-chris@chris-wilson.co.uk
2017-03-31drm/i915: Move retire-requests into i915_gem_wait_for_idle()Chris Wilson
As we now distinguish everywhere that can call i915_gem_retire_requests() following a successful wait_for_idle, we can remove the duplication by moving that call into i915_gem_wait_for_idle() itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-03-09drm/i915: use correct node for handling cache domain evictionMatthew Auld
It looks like we were incorrectly comparing vma->node against itself instead of the target node, when evicting for a node on systems where we need guard pages between regions with different cache domains. As a consequence we can end up trying to needlessly evict neighbouring nodes, even if they have the same cache domain, and if they were pinned we would fail the eviction. Fixes: 625d988acc28 ("drm/i915: Extract reserving space in the GTT to a helper") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170306235414.23407-3-matthew.auld@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-02-15drm/i915: Remove i915_address_space.startChris Wilson
Once upon a time, back in the UMS days, we supported userspace initialising the GTT and sharing portions of the GTT with other users. Now, we own the GTT (both global and per-process) and the tables always start at 0 - so we can remove i915_address_space.start and forget about this old complication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170215084357.19977-20-chris@chris-wilson.co.uk
2017-02-13drm/i915: Initial selftests for exercising evictionChris Wilson
Very simple tests to just ask eviction to find some free space in a full GTT and one with some available space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-41-chris@chris-wilson.co.uk
2017-02-10Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queuedDaniel Vetter
Chris Wilson needs the new drm_driver->release callback to make sure the shiny new dma-buf testcases don't oops the driver on unload. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-02-06drm/i915: Use page coloring to provide the guard page at the end of the GTTChris Wilson
As we now mark the reserved hole (drm_mm.head_node) with the special UNEVICTABLE color, we can use the page coloring to avoid prefetching of the CS beyond the end of the GTT. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170206084547.27921-3-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2017-02-03drm: Improve drm_mm search (and fix topdown allocation) with rbtreesChris Wilson
The drm_mm range manager claimed to support top-down insertion, but it was neither searching for the top-most hole that could fit the allocation request nor fitting the request to the hole correctly. In order to search the range efficiently, we create a secondary index for the holes using either their size or their address. This index allows us to find the smallest hole or the hole at the bottom or top of the range efficiently, whilst keeping the hole stack to rapidly service evictions. v2: Search for holes both high and low. Rename flags to mode. v3: Discover rb_entry_safe() and use it! v4: Kerneldoc for enum drm_mm_insert_mode. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Sean Paul <seanpaul@chromium.org> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Alexandre Courbot <gnurou@gmail.com> Cc: Eric Anholt <eric@anholt.net> Cc: Sinclair Yeh <syeh@vmware.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
2017-01-12drm/i915: Detect vma reserved for execbuf in evict-for-nodeChris Wilson
The vma->exec_list is still the only means we have for both reserving an object in execbuf, and for constructing the eviction list. So during the construction of the eviction list, we must treat anything already on the exec_list as being pinned. Yes, this sharing of two semantically different lists will be fixed! But in the meantime, we have the issue that this is tripping up CI since we started using i915_gem_gtt_reserve_node() + i915_gem_evict_for_node() from the regular execbuf reservation path in commit 606fec956c0e ("drm/i915: Prefer random replacement before eviction search"): [ 108.424063] kernel BUG at drivers/gpu/drm/i915/i915_vma.h:254! [ 108.424072] invalid opcode: 0000 [#1] PREEMPT SMP [ 108.424079] Modules linked in: snd_hda_intel i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core mei_me snd_pcm lpc_ich mei sdhci_pci sdhci mmc_core e1000e ptp pps_core [last unloaded: i915] [ 108.424132] CPU: 1 PID: 6865 Comm: gem_cs_tlb Tainted: G U 4.10.0-rc3-CI-CI_DRM_2049+ #1 [ 108.424143] Hardware name: Hewlett-Packard HP EliteBook 8440p/172A, BIOS 68CCU Ver. F.24 09/13/2013 [ 108.424154] task: ffff88012ae22600 task.stack: ffffc90000a14000 [ 108.424220] RIP: 0010:i915_gem_evict_for_node+0x237/0x410 [i915] [ 108.424229] RSP: 0018:ffffc90000a17a58 EFLAGS: 00010202 [ 108.424237] RAX: 0000000000005871 RBX: ffff88012d1ad778 RCX: 0000000000000000 [ 108.424246] RDX: 000000007ffff000 RSI: ffffc90000a17a68 RDI: ffff880127e694d8 [ 108.424255] RBP: ffffc90000a17aa0 R08: ffffc90000a17a68 R09: 0000000000000000 [ 108.424264] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000080000000 [ 108.424273] R13: ffffc90000a17a68 R14: ffff880127e694d8 R15: ffffffffa0387330 [ 108.424283] FS: 00007f8236e3d8c0(0000) GS:ffff880137c40000(0000) knlGS:0000000000000000 [ 108.424293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 108.424305] CR2: 00007f82347a2000 CR3: 000000012c866000 CR4: 00000000000006e0 [ 108.424317] Call Trace: [ 108.424368] i915_gem_gtt_reserve+0x67/0x80 [i915] [ 108.424424] __i915_vma_do_pin+0x248/0x620 [i915] [ 108.424487] ? __i915_vma_do_pin+0x162/0x620 [i915] [ 108.424540] i915_gem_execbuffer_reserve_vma.isra.8+0x153/0x1f0 [i915] [ 108.424591] i915_gem_execbuffer_reserve.isra.9+0x40e/0x440 [i915] [ 108.424643] i915_gem_do_execbuffer.isra.15+0x6d9/0x1b20 [i915] [ 108.424696] i915_gem_execbuffer2+0xc0/0x250 [i915] [ 108.424712] drm_ioctl+0x200/0x450 [ 108.424760] ? i915_gem_execbuffer+0x330/0x330 [i915] [ 108.424776] do_vfs_ioctl+0x90/0x6e0 [ 108.424789] ? up_read+0x1a/0x40 [ 108.424800] ? trace_hardirqs_on_caller+0x122/0x1b0 [ 108.424813] SyS_ioctl+0x3c/0x70 [ 108.424828] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 108.424839] RIP: 0033:0x7f8235867357 [ 108.424848] RSP: 002b:00007ffdc14504c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 108.424866] RAX: ffffffffffffffda RBX: 00007ffdc1450600 RCX: 00007f8235867357 [ 108.424878] RDX: 00007ffdc14505a0 RSI: 0000000040406469 RDI: 0000000000000003 [ 108.424890] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000022 [ 108.424903] R10: 0000000000000007 R11: 0000000000000246 R12: 0000000000000002 [ 108.424915] R13: 0000000000419101 R14: 00007ffdc1450600 R15: 00007ffdc14505f0 [ 108.424928] Code: 45 b8 8b 4d c0 4c 89 f2 48 89 de ff d0 49 8b 07 4c 8b 45 b8 48 85 c0 75 dd 65 ff 0d d4 a1 c8 5f 0f 84 47 01 00 00 e9 0d fe ff ff <0f> 0b 45 31 f6 4c 8b 65 c8 49 8b 04 24 4d 39 ec 49 8d 9c 24 28 [ 108.425055] RIP: i915_gem_evict_for_node+0x237/0x410 [i915] RSP: ffffc90000a17a58 Fixes: 172ae5b4c8c1 ("drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)") Fixes: 606fec956c0e ("drm/i915: Prefer random replacement before eviction search") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170111182132.19174-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-11drm/i915: Extract reserving space in the GTT to a helperChris Wilson
Extract drm_mm_reserve_node + calling i915_gem_evict_for_node into its own routine so that it can be shared rather than duplicated. v2: Kerneldoc Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: igvt-g-dev@lists.01.org Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170111112312.31493-2-chris@chris-wilson.co.uk
2017-01-10drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZEChris Wilson
Start converting over from the byte count to its semantic macro, either we want to allocate the size of a physical page in main memory or we want the size of a virtual page in the GTT. 4096 could mean either, but PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve code comprehension and future changes. In the future, we may want to use variable GTT page sizes and so have the challenge of knowing which hardcoded values were used to represent a physical page vs the virtual page. v2: Look for a few more 4096s to convert, discover IS_ALIGNED(). v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep bdw stolen w/a as 4096 until we know better. v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page sizes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-01-05drm/i915: Clear ret before unbinding in i915_gem_evict_something()Chris Wilson
Missed when rebasing patches, I failed to set ret to zero before starting the unbind loop (which depends upon ret being zero). Reported-by: Matthew Auld <matthew.william.auld@gmail.com> Fixes: 9332f3b1b99a ("drm/i915: Combine loops within i915_gem_evict_something") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170105155940.10033-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Cc: <stable@vger.kernel.org> # v4.9+
2017-01-04Merge tag 'drm-misc-next-2016-12-30' of ↵Daniel Vetter
git://anongit.freedesktop.org/git/drm-misc into drm-intel-next-queued Directly merge drm-misc into drm-intel since Dave is on vacation and we need the various drm-misc patches (fb format rework, drm mm fixes, selftest framework and others). Also pulled back -rc2 in first to resync with drm-intel-fixes and make sure I can reuse the exact rerere solutions from drm-tip for safety, and because I'm lazy. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-12-28drm: Apply tight eviction scanning to color_adjustChris Wilson
Using mm->color_adjust makes the eviction scanner much tricker since we don't know the actual neighbours of the target hole until after it is created (after scanning is complete). To work out whether we need to evict the neighbours because they impact upon the hole, we have to then check the hole afterwards - requiring an extra step in the user of the eviction scanner when they apply color_adjust. v2: Massage kerneldoc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-34-chris@chris-wilson.co.uk
2016-12-28drm: Compute tight evictions for drm_mm_scanChris Wilson
Compute the minimal required hole during scan and only evict those nodes that overlap. This enables us to reduce the number of nodes we need to evict to the bare minimum. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-31-chris@chris-wilson.co.uk
2016-12-28drm: Unconditionally do the range check in drm_mm_scan_add_block()Chris Wilson
Doing the check is trivial (low cost in comparison to overall eviction) and helps simplify the code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161222083641.2691-29-chris@chris-wilson.co.uk
2016-12-27drm: Extract struct drm_mm_scan from struct drm_mmChris Wilson
The scan state occupies a large proportion of the struct drm_mm and is rarely used and only contains temporary state. That makes it suitable to moving to its struct and onto the stack of the callers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> [danvet: Fix up etnaviv to compile, was missing a BUG_ON.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-12-12drm/i915: Retire before attempting to evict from the active listsChris Wilson
Some object retain an extra pin whilst they are active (e.g. contexts). This excludes them from being considered for eviction unless we idle the GPU. If before we look at the active list, we retire beforehand we can hopefully remove a few excess pins and reduce the amount of searching required. v2: Similar principle applies to evict_for_vma Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20161209150555.602-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2016-12-05drm/i915: Fix i915_gem_evict_for_vma (soft-pinning)Chris Wilson
Soft-pinning depends upon being able to check for availabilty of an interval and evict overlapping object from a drm_mm range manager very quickly. Currently it uses a linear list, and so performance is dire and not suitable as a general replacement. Worse, the current code will oops if it tries to evict an active buffer. It also helps if the routine reports the correct error codes as expected by its callers and emits a tracepoint upon use. For posterity since the wrong patch was pushed (i.e. that missed these key points and had known bugs), this is the changelog that should have been on commit 506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer"): Userspace can pass in an offset that it presumes the object is located at. The kernel will then do its utmost to fit the object into that location. The assumption is that userspace is handling its own object locations (for example along with full-ppgtt) and that the kernel will rarely have to make space for the user's requests. This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following: * if the user supplies a virtual address via the execobject->offset *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then that object is placed at that offset in the address space selected by the context specifier in execbuffer. * the location must be aligned to the GTT page size, 4096 bytes * as the object is placed exactly as specified, it may be used by this execbuffer call without relocations pointing to it It may fail to do so if: * EINVAL is returned if the object does not have a 4096 byte aligned address * the object conflicts with another pinned object (either pinned by hardware in that address space, e.g. scanouts in the aliasing ppgtt) or within the same batch. EBUSY is returned if the location is pinned by hardware EINVAL is returned if the location is already in use by the batch * EINVAL is returned if the object conflicts with its own alignment (as meets the hardware requirements) or if the placement of the object does not fit within the address space All other execbuffer errors apply. Presence of this execbuf extension may be queried by passing I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for a reported value of 1 (or greater). v2: Combine the hole/adjusted-hole ENOSPC checks v3: More color, more splitting, more blurb. Fixes: 506a8e87d8d2 ("drm/i915: Add soft-pinning API for execbuffer") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-2-chris@chris-wilson.co.uk
2016-11-29drm/i915: Convert vm->dev backpointer to vm->i915Chris Wilson
99% of the time we access i915_address_space->dev we want the i915 device and not the drm device, so let's store the drm_i915_private backpointer instead. The only real complication here are the inlines in i915_vma.h where drm_i915_private is not yet defined and so we have to choose an alternate path for our asserts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-10-28drm/i915: Enable multiple timelinesChris Wilson
With the infrastructure converted over to tracking multiple timelines in the GEM API whilst preserving the efficiency of using a single execution timeline internally, we can now assign a separate timeline to every context with full-ppgtt. v2: Add a comment to indicate the xfer between timelines upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
2016-10-28drm/i915: Markup GEM API with lockdep assertsChris Wilson
Add lockdep_assert_held(struct_mutex) to the API preamble of the internal GEM interfaces. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk
2016-10-24drm/i915: Move user fault tracking to a separate listChris Wilson
We want to decouple RPM and struct_mutex, but currently RPM has to walk the list of bound objects and remove userspace mmapping before we suspend (otherwise userspace may continue to access the GTT whilst it is powered down). This currently requires the struct_mutex to walk the bound_list, but if we move that to a separate list and lock we can take the first step towards removing the struct_mutex. v2: Split runtime suspend unmapping vs regular unmapping, to make the locking (and barriers) clearer. Add the object to the userfault_list prior to inserting the first PTE, the race between add/revoke depends upon struct_mutex for regular unmappings and rpm for runtime-suspend. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v1 Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-1-chris@chris-wilson.co.uk