linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-06-19	drm/i915: Differentiate between sw write location into ring and last hw read	Chris Wilson
	We need to keep track of the last location we ask the hw to read up to (RING_TAIL) separately from our last write location into the ring, so that in the event of a GPU reset we do not tell the HW to proceed into a partially written request (which can happen if that request is waiting for an external signal before being executed). v2: Refactor intel_ring_reset() (Mika) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144 Testcase: igt/gem_exec_fence/await-hang Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests") Fixes: d55ac5bf97c6 ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> (cherry picked from commit e6ba9992de6c63fe86c028b4876338e1cb7dac34) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615131129.3061-1-chris@chris-wilson.co.uk
2017-06-19	drm/i915: Update DRIVER_DATE to 20170619	Daniel Vetter
	Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-06-17	drm/msm: Separate locking of buffer resources from struct_mutex	Sushmita Susheelendra
	Buffer object specific resources like pages, domains, sg list need not be protected with struct_mutex. They can be protected with a buffer object level lock. This simplifies locking and makes it easier to avoid potential recursive locking scenarios for SVM involving mmap_sem and struct_mutex. This also removes unnecessary serialization when creating buffer objects, and also between buffer object creation and GPU command submission. Signed-off-by: Sushmita Susheelendra <ssusheel@codeaurora.org> [robclark: squash in handling new locking for shrinker] Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-17	drm/nouveau/disp/nv50-: avoid creating ORs that aren't present on HW	Ben Skeggs
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-06-16	drm/i915/cfl: Introduce Coffee Lake workarounds.	Rodrigo Vivi
	Coffee Lake inherit most of Kabylake production workarounds. v2: Fix typo on commit message and remove WaDisableKillLogic and GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC, since as Mika pointed out they shouldn't be here for cfl according to BSpec. Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1497653398-15722-1-git-send-email-rodrigo.vivi@intel.com
2017-06-16	amdgpu: use drm sync objects for shared semaphores (v6)	Dave Airlie
	This creates a new command submission chunk for amdgpu to add in and out sync objects around the submission. Sync objects are managed via the drm syncobj ioctls. The command submission interface is enhanced with two new chunks, one for syncobj pre submission dependencies, and one for post submission sync obj signalling, and just takes a list of handles for each. This is based on work originally done by David Zhou at AMD, with input from Christian Konig on what things should look like. In theory VkFences could be backed with sync objects and just get passed into the cs as syncobj handles as well. NOTE: this interface addition needs a version bump to expose it to userspace. TODO: update to dep_sync when rebasing onto amdgpu master. (with this - r-b from Christian) v1.1: keep file reference on import. v2: move to using syncobjs v2.1: change some APIs to just use p pointer. v3: make more robust against CS failures, we now add the wait sems but only remove them once the CS job has been submitted. v4: rewrite names of API and base on new syncobj code. v5: move post deps earlier, rename some apis v6: lookup post deps earlier, and just replace fences in post deps stage (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-06-16	amdgpu/cs: split out fence dependency checking (v2)	Dave Airlie
	This just splits out the fence depenency checking into it's own function to make it easier to add semaphore dependencies. v2: rebase onto other changes. v1-Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-06-16	drm/amdgpu: don't check the default value for vm size	Alex Deucher
	Avoids printing spurious messages like this: [ 3.102059] amdgpu 0000:01:00.0: VM size (-1) must be a power of 2 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-06-16	drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCH	Dhinakaran Pandiyan
	Although we use 9 bits of Device ID for identifying PCH, only 8 bits are stored in dev_priv->pch_id. This makes HAS_PCH_CNP_LP() and HAS_PCH_SPT_LP() incorrect. Fix this by storing all the 9 bits for the platforms with LP PCH. v2: Drop PCH_LPT_LP change (Imre) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Imre Deak <imre.deak@intel.com> Fixes: commit ec7e0bb35f8d ("drm/i915/cnp: Add PCI ID for Cannonpoint LP PCH") Reported-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1497641774-29104-1-git-send-email-dhinakaran.pandiyan@intel.com
2017-06-16	drm/i915: Stash a pointer to the obj's resv in the vma	Chris Wilson
	During execbuf, a mandatory step is that we add this request (this fence) to each object's reservation_object. Inside execbuf, we track the vma, and to add the fence to the reservation_object then means having to first chase the obj, incurring another cache miss. We can reduce the number of cache misses by stashing a pointer to the reservation_object in the vma itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170616140525.6394-1-chris@chris-wilson.co.uk
2017-06-16	drm/i915: Async GPU relocation processing	Chris Wilson
	If the user requires patching of their batch or auxiliary buffers, we currently make the alterations on the cpu. If they are active on the GPU at the time, we wait under the struct_mutex for them to finish executing before we rewrite the contents. This happens if shared relocation trees are used between different contexts with separate address space (and the buffers then have different addresses in each), the 3D state will need to be adjusted between execution on each context. However, we don't need to use the CPU to do the relocation patching, as we could queue commands to the GPU to perform it and use fences to serialise the operation with the current activity and future - so the operation on the GPU appears just as atomic as performing it immediately. Performing the relocation rewrites on the GPU is not free, in terms of pure throughput, the number of relocations/s is about halved - but more importantly so is the time under the struct_mutex. v2: Break out the request/batch allocation for clearer error flow. v3: A few asserts to ensure rq ordering is maintained Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16	drm/i915: Allow execbuffer to use the first object as the batch	Chris Wilson
	Currently, the last object in the execlist is the always the batch. However, when building the batch buffer we often know the batch object first and if we can use the first slot in the execlist we can emit relocation instructions relative to it immediately and avoid a separate pass to adjust the relocations to point to the last execlist slot. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16	drm/i915: Wait upon userptr get-user-pages within execbuffer	Chris Wilson
	This simply hides the EAGAIN caused by userptr when userspace causes resource contention. However, it is quite beneficial with highly contended userptr users as we avoid repeating the setup costs and kernel-user context switches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-06-16	drm/i915: First try the previous execbuffer location	Chris Wilson
	When choosing a slot for an execbuffer, we ideally want to use the same address as last time (so that we don't have to rebind it) and the same address as expected by the user (so that we don't have to fixup any relocations pointing to it). If we first try to bind the incoming execbuffer->offset from the user, or the currently bound offset that should hopefully achieve the goal of avoiding the rebind cost and the relocation penalty. However, if the object is not currently bound there we don't want to arbitrarily unbind an object in our chosen position and so choose to rebind/relocate the incoming object instead. After we report the new position back to the user, on the next pass the relocations should have settled down. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com>
2017-06-16	drm/i915: Store a persistent reference for an object in the execbuffer cache	Chris Wilson
	If we take a reference to the object/vma when it is first used in an execbuf, we can keep that reference until the object's file-local handle is closed. Thereby saving a frequent ref/unref pair. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16	drm/i915: Eliminate lots of iterations over the execobjects array	Chris Wilson
	The major scaling bottleneck in execbuffer is the processing of the execobjects. Creating an auxiliary list is inefficient when compared to using the execobject array we already have allocated. Reservation is then split into phases. As we lookup up the VMA, we try and bind it back into active location. Only if that fails, do we add it to the unbound list for phase 2. In phase 2, we try and add all those objects that could not fit into their previous location, with fallback to retrying all objects and evicting the VM in case of severe fragmentation. (This is the same as before, except that phase 1 is now done inline with looking up the VMA to avoid an iteration over the execobject array. In the ideal case, we eliminate the separate reservation phase). During the reservation phase, we only evict from the VM between passes (rather than currently as we try to fit every new VMA). In testing with Unreal Engine's Atlantis demo which stresses the eviction logic on gen7 class hardware, this speed up the framerate by a factor of 2. The second loop amalgamation is between move_to_gpu and move_to_active. As we always submit the request, even if incomplete, we can use the current request to track active VMA as we perform the flushes and synchronisation required. The next big advancement is to avoid copying back to the user any execobjects and relocations that are not changed. v2: Add a Theory of Operation spiel. v3: Fall back to slow relocations in preparation for flushing userptrs. v4: Document struct members, factor out eb_validate_vma(), add a few more comments to explain some magic and hide other magic behind macros. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16	drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations	Chris Wilson
	If we write a relocation into the buffer, we require our own implicit synchronisation added after the start of the execbuf, outside of the user's control. As we may end up clflushing, or doing the patch itself on the GPU, asynchronously we need to look at the implicit serialisation on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this object. If the user does trigger a stall for relocations, we make sure the stall is complete enough so that the batch is not submitted before we complete those relocations. Fixes: 77ae9957897d ("drm/i915: Enable userspace to opt-out of implicit fencing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16	drm/i915: Pass vma to relocate entry	Chris Wilson
	We can simplify our tracking of pending writes in an execbuf to the single bit in the vma->exec_entry->flags, but that requires the relocation function knowing the object's vma. Pass it along. Note we have only been using a single bit to track flushing since commit cc889e0f6ce6a63c62db17d702ecfed86d58083f Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jun 13 20:45:19 2012 +0200 drm/i915: disable flushing_list/gpu_write_list unconditionally flushed all render caches before the breadcrumb and commit 6ac42f4148bc27e5ffd18a9ab0eac57f58822af4 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Sat Jul 21 12:25:01 2012 +0200 drm/i915: Replace the complex flushing logic with simple invalidate/flush all did away with the explicit GPU domain tracking. This was then codified into the ABI with NO_RELOC in commit ed5982e6ce5f106abcbf071f80730db344a6da42 Author: Daniel Vetter <daniel.vetter@ffwll.ch> # Oi! Patch stealer! Date: Thu Jan 17 22:23:36 2013 +0100 drm/i915: Allow userspace to hint that the relocations were known Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16	drm/i915: Store a direct lookup from object handle to vma	Chris Wilson
	The advent of full-ppgtt lead to an extra indirection between the object and its binding. That extra indirection has a noticeable impact on how fast we can convert from the user handles to our internal vma for execbuffer. In order to bypass the extra indirection, we use a resizable hashtable to jump from the object to the per-ctx vma. rhashtable was considered but we don't need the online resizing feature and the extra complexity proved to undermine its usefulness. Instead, we simply reallocate the hastable on demand in a background task and serialize it before iterating. In non-full-ppgtt modes, multiple files and multiple contexts can share the same vma. This leads to having multiple possible handle->vma links, so we only use the first to establish the fast path. The majority of buffers are not shared and so we should still be able to realise speedups with multiple clients. v2: Prettier names, more magic. v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16	drm/i915: Fix retrieval of hangcheck stats	Chris Wilson
	The default context is always supported (as it contains the global hangcheck stats) and the contexts for hangcheck are not limited to any ring. This was dropped in 2013 because it was supposed to have been included with Ben's full-ppgtt patch set. It never landed and the bug remains. References: https://bugs.freedesktop.org/show_bug.cgi?id=65845 Link: http://patchwork.freedesktop.org/patch/msgid/1372175222-27622-1-git-send-email-mika.kuoppala@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170616132849.29597-1-chris@chris-wilson.co.uk
2017-06-16	drm/msm/hdmi: Fix HDMI pink strip issue seen on 8x96	Archit Taneja
	A 2 pixel wide pink strip was observed on the left end of some HDMI monitors configured in a HDMI mode. It turned out that we were missing out on configuring AVI infoframes, and unlike APQ8064, the 8x96 HDMI H/W seems to be sensitive to that. Add configuration of AVI infoframes. While at it, make sure that hdmi_audio_update is only called when we've detected that the monitor supports HDMI. Signed-off-by: Archit Taneja <architt@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm/hdmi: 8996 PLL: Populate unprepare	Archit Taneja
	Without doing anything in unprepare, the HDMI driver isn't able to switch modes successfully. Calling set_rate with a new rate results in an un-locked PLL. If we reset the PLL in unprepare, the PLL is able to lock with the new rate. Signed-off-by: Archit Taneja <architt@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm/hdmi: Use bitwise operators when building register values	Liviu Dudau
	Commit c0c0d9eeeb8d ("drm/msm: hdmi audio support") uses logical OR operators to build up a value to be written in the REG_HDMI_AUDIO_INFO0 and REG_HDMI_AUDIO_INFO1 registers when it should have used bitwise operators. Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Fixes: c0c0d9eeeb8d ("drm/msm: hdmi audio support") Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: update generated headers	Rob Clark
	Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: remove address-space id	Rob Clark
	Now that the msm_gem supports an arbitrary number of vma's, we no longer need to assign an id (index) to each address space. So rip out the associated code. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: support for an arbitrary number of address spaces	Rob Clark
	It means we have to do a list traversal where we once had an index into a table. But the list will normally have one or two entries. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: refactor how we handle vram carveout buffers	Rob Clark
	Pull some of the logic out into msm_gem_new() (since we don't need to care about the imported-bo case), and don't defer allocating pages. The latter is generally a good idea, since if we are using VRAM carveout to allocate contiguous buffers (ie. no IOMMU), the allocation is more likely to fail. So failing at allocation time is a more sane option. Plus this simplifies things in the next patch. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: pass address-space to _get_iova() and friends	Rob Clark
	No functional change, that will come later. But this will make it easier to deal with dynamically created address spaces (ie. per- process pagetables for gpu). Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm/mdp4+5: move aspace/id to base class	Rob Clark
	Before we can shift to passing the address-space object to _get_iova(), we need to fix a few places (dsi+fbdev) that were hard-coding the adress space id. That gets somewhat easier if we just move these to the kms base class. Prep work for next patch. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm/mdp5: kill pipe_lock	Rob Clark
	It serves no purpose, things should be sufficiently synchronized already by atomic framework. And it is somewhat awkward to be holding a spinlock when msm_gem_iova() is going to start needing to grab a mutex. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: fix locking inconsistency for gpu->hw_init()	Rob Clark
	Most, but not all, paths where calling the with struct_mutex held. The fast-path in msm_gem_get_iova() (plus some sub-code-paths that only run the first time) was masking this issue. So lets just always hold struct_mutex for hw_init(). And sprinkle some WARN_ON()'s and might_lock() to avoid this sort of problem in the future. Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: Remove memptrs->wptr	Jordan Crouse
	memptrs->wptr seems to be unused. Remove it to avoid confusing the upcoming preemption code. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: Add a struct to pass configuration to msm_gpu_init()	Jordan Crouse
	The amount of information that we need to pass into msm_gpu_init() is steadily increasing, so add a new struct to stabilize the function call and make it easier to add new configuration down the line. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA	Jordan Crouse
	Modify the 'pad' member of struct drm_msm_gem_info to 'flags'. If the user sets 'flags' to non-zero it means that they want a IOVA for the GEM object instead of a mmap() offset. Return the iova in the 'offset' member. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> [robclark: s/hint/flags in commit msg] Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: Remove idle function hook	Jordan Crouse
	There isn't any generic code that uses ->idle so remove it. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: Remove DRM_MSM_NUM_IOCTLS	Jordan Crouse
	The ioctl array is sparsely populated but the compiler will make sure that it is sufficiently sized for all the values that we have so we can safely use ARRAY_SIZE() instead of having a constantly changing #define in the uapi header. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/msm: gpu: Enable zap shader for A5XX	Jordan Crouse
	The A5XX GPU powers on in "secure" mode. In secure mode the GPU can only render to buffers that are marked as secure and inaccessible to the kernel and user through a series of hardware protections. In practice secure mode is used to draw things like a UI on a secure video frame. In order to switch out of secure mode the GPU executes a special shader that clears out the GMEM and other sensitve registers and then writes a register. Because the kernel can't be trusted the shader binary is signed and verified and programmed by the secure world. To do this we need to read the MDT header and the segments from the firmware location and put them in memory and present them for approval. For targets without secure support there is an out: if the secure world doesn't support secure then there are no hardware protections and we can freely write the SECVID_TRUST register from the CPU. We don't have 100% confidence that we can query the secure capabilities at run time but we have enough calls that need to go right to give us some confidence that we're at least doing something useful. Of course if we guess wrong you trigger a permissions violation which usually ends up in a system crash but thats a problem that shows up immediately. [v2: use child device per Bjorn] [v3: use generic MDT loader per Bjorn] [v4: use managed dma functions and ifdefs for the MDT loader] [v5: Add depends for QCOM_MDT_LOADER] Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> [robclark: fix Kconfig to use select instead of depends + #if IS_ENABLED()] Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16	drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty	Chris Wilson
	For ease of use (i.e. avoiding a few checks and function calls), store the object's cache coherency next to the cache is dirty bit. Specifically this patch aims to reduce the frequency of no-op calls to i915_gem_object_clflush() to counter-act the increase of such calls for GPU only objects in the previous patch. v2: Replace cache_dirty & ~cache_coherent with cache_dirty && !cache_coherent as gcc generates much better code for the latter (Tvrtko) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Tested-by: Dongwon Kim <dongwon.kim@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-16	drm/i915: Mark CPU cache as dirty on every transition for CPU writes	Chris Wilson
	Currently, we only mark the CPU cache as dirty if we skip a clflush. This leads to some confusion where we have to ask if the object is in the write domain or missed a clflush. If we always mark the cache as dirty, this becomes a much simply question to answer. The goal remains to do as few clflushes as required and to do them as late as possible, in the hope of deferring the work to a kthread and not block the caller (e.g. execbuf, flips). v2: Always call clflush before GPU execution when the cache_dirty flag is set. This may cause some extra work on llc systems that migrate dirty buffers back and forth - but we do try to limit that by only setting cache_dirty at the end of the gpu sequence. v3: Always mark the cache as dirty upon a level change, as we need to invalidate any stale cachelines due to external writes. Reported-by: Dongwon Kim <dongwon.kim@intel.com> Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning the scanout") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Tested-by: Dongwon Kim <dongwon.kim@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-16	drm/i915: Make i915_vma_destroy() static	Chris Wilson
	i915_vma_destroy() is now not used outside of i915_vma.c so we can remove the export and make the function static. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170616123508.12673-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
2017-06-16	drm/i915: Actually attach the tv_format property to the SDVO connector	Ville Syrjälä
	Attach the tv_format property to the SDVO connector instead of passing a '0' in place of the pointer to the property. This got broken when the SDVO connector properties were converted to atomic. We can thank sparse for catching this: drivers/gpu/drm/i915/intel_sdvo.c:2742:75: warning: Using plain integer as NULL pointer Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Fixes: 630d30a4ee27 ("drm/i915: Convert intel_sdvo connector properties to atomic.") Link: http://patchwork.freedesktop.org/patch/msgid/20170615172308.10121-1-ville.syrjala@linux.intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-06-16	drm/arm: mali-dp: Use CMA helper for plane buffer address calculation	Liviu Dudau
	CMA has gained a recent helper function for calculating the start of a plane buffer's physical address. Use that instead of the hand rolled version. Cc: Brian Starkey <brian.starkey@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
2017-06-16	drm/mali-dp: Check PM status when sharing interrupt lines	Liviu Dudau
	If an instance of Mali DP hardware shares the interrupt line with another hardware (usually another instance of the Mali DP) its interrupt handler can get called when the device is suspended. Check the PM status before making access to the hardware registers to avoid deadlocks. Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
2017-06-16	drm/arm: malidp: Use crtc->mode_valid() callback	Jose Abreu
	Now that we have a callback to check if crtc supports a given mode we can use it in malidp so that we restrict the number of probbed modes to the ones we can actually display. Also, remove the mode_fixup() callback as this is no longer needed because mode_valid() will be called before. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Carlos Palminha <palminha@synopsys.com> Cc: Alexey Brodkin <abrodkin@synopsys.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@linux.ie> Cc: Andrzej Hajda <a.hajda@samsung.com> Cc: Archit Taneja <architt@codeaurora.org> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
2017-06-16	Merge tag 'gvt-next-2017-06-08' of https://github.com/01org/gvt-linux into ↵	Jani Nikula
	drm-intel-next-queued gvt-next-2017-06-08 First gvt-next pull for 4.13: - optimization for per-VM mmio save/restore (Changbin) - optimization for mmio hash table (Changbin) - scheduler optimization with event (Ping) - vGPU reset refinement (Fred) - other misc refactor and cleanups, etc. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170608093547.bjgs436e3iokrzdm@zhen-hp.sh.intel.com
2017-06-16	drm/nouveau: use proper prototype in nouveau_pmops_runtime() definition	Arnd Bergmann
	There is a prototype for this function in the header, but the function itself lacks a 'void' in the argument list, causing a harmless warning when building with 'make W=1': drivers/gpu/drm/nouveau/nouveau_drm.c: In function 'nouveau_pmops_runtime': drivers/gpu/drm/nouveau/nouveau_drm.c:730:1: error: old-style function definition [-Werror=old-style-definition] Fixes: 321f5c5f2c49 ("drm/nouveau: replace multiple open-coded runpm support checks with function") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-06-16	drm/nouveau: Skip vga_fini on non-PCI device	Mikko Perttunen
	As with vga_init, this function doesn't make sense on non-PCI devices, and the Thunderbolt check in it dereferences a NULL pointer in that case. Add some code to skip this function when the device is not a PCI device. Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-06-16	drm/nouveau/tegra: Don't leave GPU in reset	Mikko Perttunen
	On Tegra186 systems with certain firmware revisions, leaving the GPU in reset can cause a hang. To prevent this, don't leave the GPU in reset. Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-06-16	drm/nouveau/tegra: Skip manual unpowergating when not necessary	Mikko Perttunen
	On Tegra186, powergating is handled by the BPMP power domain provider and the "legacy" powergating API is not available. Therefore skip these calls if we are attached to a power domain. Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-06-16	drm/nouveau/hwmon: Change permissions to numeric	Oscar Salvador
	This patch replaces the symbolic permissions with the numeric ones. Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com> Reviewed-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>