linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-09-07	drm/vc4: crtc: Move the HVS gamma LUT setup to our init function	Maxime Ripard
	Since most of the HVS channel is setup in the init function, let's move the gamma setup there too. As this makes the HVS mode_set function empty, let's remove it in the process. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Link: https://patchwork.freedesktop.org/patch/msgid/d439da8f1592a450a6ad35ab1f9e77def17c7965.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Move HVS init and close to a function	Maxime Ripard
	In order to make further refactoring easier, let's move the HVS channel setup / teardown to their own function. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Link: https://patchwork.freedesktop.org/patch/msgid/fb1b5299d1636ddce8340b51a80d51641839f83b.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Move PV dump to config_pv	Maxime Ripard
	Now that we only configure the PixelValve in vc4_crtc_config_pv, it doesn't really make much sense to dump its register content in its caller. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Link: https://patchwork.freedesktop.org/patch/msgid/c195af7d9e140a2a6db32992ee7e54071c6f94ba.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Turn pixelvalve reset into a function	Maxime Ripard
	The driver resets the pixelvalve FIFO in a number of occurences without always using the same sequence. Since this will be critical for BCM2711, let's move that sequence to a function so that we are consistent. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/fb31003a9eee02c4b949556299ff41f0a113499a.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Disable color management for HVS5	Maxime Ripard
	The HVS5 uses different color matrices. Disable color management support for now. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/e528e2edf0a1be3930196d437e548114dd9fcf59.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Add HDMI1 encoder type	Maxime Ripard
	The BCM2711 sports a second HDMI controller, so let's add that second HDMI encoder type. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/6ba56d2421a4ad59ce72178e8f37eacfbd72cb33.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Rename HDMI encoder type to HDMI0	Maxime Ripard
	The previous generations were only supporting a single HDMI controller, but that's about to change, so put an index as well to differentiate between the two controllers. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/84e11e4793aaa30d6e5c56e305d22404ac5a932d.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Add function to compute FIFO level bits	Maxime Ripard
	The longer FIFOs in vc5 pixelvalves means that the FIFO full level doesn't fit in the original register field and that we also have a secondary field. In order to prepare for this, let's move the registers fill part to a helper function. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/e46a3823128af50c1c833de8fa9b95e9b86c2f66.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Add FIFO depth to vc4_crtc_data	Maxime Ripard
	Not all pixelvalve FIFOs in vc5 have the same depth, so we need to add that to our vc4_crtc_data structure to be able to compute the fill level properly later on. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/7df3549c1bea9b0a27c784dc416bb9a831e4e18f.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Assign output to channel automatically	Maxime Ripard
	The HVS found in the BCM2711 has 6 outputs and 3 FIFOs, with each output being connected to a pixelvalve, and some muxing between the FIFOs and outputs. Any output cannot feed from any FIFO though, and they all have a bunch of constraints. In order to support this, let's store the possible FIFOs each output can be assigned to in the vc4_crtc_data, and use that information at atomic_check time to iterate over all the CRTCs enabled and assign them FIFOs. The channel assigned is then set in the vc4_crtc_state so that the rest of the driver can use it. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Link: https://patchwork.freedesktop.org/patch/msgid/f9aba3814ef37156ff36f310118cdd3954dd3dc5.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: kms: Convert to for_each_new_crtc_state	Maxime Ripard
	The vc4 atomic commit loop has an handrolled loop that is basically identical to for_each_new_crtc_state, let's convert it to that helper. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Link: https://patchwork.freedesktop.org/patch/msgid/a712d2b70aaee20379cfc52c2141aa2f6e2a9d5b.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Enable and disable the PV in atomic_enable / disable	Maxime Ripard
	The VIDEN bit in the pixelvalve currently being used to enable or disable the pixelvalve seems to not be enough in some situations, which whill end up with the pixelvalve stalling. In such a case, even re-enabling VIDEN doesn't bring it back and we need to clear the FIFO. This can only be done if the pixelvalve is disabled though. In order to overcome this, we can configure the pixelvalve during mode_set_no_fb by calling vc4_crtc_config_pv, but only enable it in atomic_enable and flush the FIFO there, and in atomic_disable disable the pixelvalve again. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Link: https://patchwork.freedesktop.org/patch/msgid/e97596f62f4df83424d994a23465463ac60f986e.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Use local chan variable	Maxime Ripard
	The vc4_crtc_handle_page_flip already has a local variable holding the value of vc4_crtc->channel, so let's use it instead. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Link: https://patchwork.freedesktop.org/patch/msgid/439c589baec72ddb89159857a2d078fdd77b02a2.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Rename HVS channel to output	Maxime Ripard
	In vc5, the HVS has 6 outputs and 3 FIFOs (or channels), with pixelvalves each being assigned to a given output, but each output can then be muxed to feed from multiple FIFOs. Since vc4 had that entirely static, both were probably equivalent, but since that changes, let's rename hvs_channel to hvs_output in the vc4_crtc_data, since a pixelvalve is really connected to an output, and not to a FIFO. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Link: https://patchwork.freedesktop.org/patch/msgid/b7618bb17b1c435c5d6ce50bcde2fe9243281d02.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Move the cob allocation outside of bind	Maxime Ripard
	The COB allocation depends on the HVS channel used for a given pixelvalve. While the channel allocation was entirely static in vc4, vc5 changes that and at bind time, a pixelvalve can be assigned to multiple HVS channels. Let's prepare that rework by allocating the COB when it's actually needed. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/484cbd4b00cfeee425295df438222258cc39a3dd.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Use a shared interrupt	Maxime Ripard
	Some pixelvalves in vc5 use the same interrupt line so let's register our interrupt handler as a shared one. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/5a915d374357f41083ac71779fa9b2c35a339c2f.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: crtc: Deal with different number of pixel per clock	Maxime Ripard
	Some of the HDMI pixelvalves in vc5 output two pixels per clock cycle. Let's put the number of pixel output per clock cycle in the CRTC data and update the various calculations to reflect that. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/18a3bb079981ba820132b37e736a4bb371234d2e.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: plane: Create more planes	Maxime Ripard
	Let's now create more planes that can be affected to all the CRTCs. vc4 has 3 CRTCs, 1 primary and 1 cursor each, and was having 24 (8 planes per CRTC) overlays. However, vc5 has 5 CRTCs, so keeping the same logic would put us at 50 planes which is well above the 32 planes limit imposed by DRM. Using 16 seems like a good tradeoff between staying under 32 and yet providing enough planes. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/b41003001541fc2bb23668c699c0369ff7983be8.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: plane: Optimize the LBM allocation size	Dave Stevenson
	The current code is using the maximum of the source line size and the destination line size to compute the size of the LBM to allocate. While this is simpler, it starts to be an issue with modes such as 4k with a quite long that will consume all the available memory, so we no longer have that luxury. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Link: https://patchwork.freedesktop.org/patch/msgid/b9e091883a4f7395c5b6a4f7c6070225934293db.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: plane: Change LBM alignment constraint on LBM	Dave Stevenson
	The HVS5 needs an alignment of 64bytes for its LBM memory, so let's reflect it. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Link: https://patchwork.freedesktop.org/patch/msgid/6f9c4fe1eb9258a3f1d0f21af6a99c42472ac531.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: hvs: Boost the core clock during modeset	Maxime Ripard
	In order to prevent timeouts and stalls in the pipeline, the core clock needs to be maxed at 500MHz during a modeset on the BCM2711. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/37ed9e0124c5cce005ddc8dafe821d8b0da036ff.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/vc4: Add support for the BCM2711 HVS5	Dave Stevenson
	The HVS found in the BCM2711 is slightly different from the previous generations. Most notably, the display list layout changes a bit, the LBM doesn't have the same size and the formats ordering for some formats is swapped. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Tested-by: Chanwoo Choi <cw00.choi@samsung.com> Tested-by: Hoegeun Kwon <hoegeun.kwon@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/1d02fab3b916d639c2dc05608c117bbd8230ebe8.1599120059.git-series.maxime@cerno.tech
2020-09-07	drm/i915: Unlock the shared hwsp_gtt object after pinning	Thomas Hellström
	The hwsp_gtt object is used for sub-allocation and could therefore be shared by many contexts causing unnecessary contention during concurrent context pinning. However since we're currently locking it only for pinning, it remains resident until we unpin it, and therefore it's safe to drop the lock early, allowing for concurrent thread access. Signed-off-by: Thomas Hellström <thomas.hellstrom@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Filter wake_flags passed to default_wake_function	Chris Wilson
	(NOTE: This is the minimal backportable fix, a full fix is being developed at https://patchwork.freedesktop.org/patch/388048/) The flags passed to the wait_entry.func are passed onwards to try_to_wake_up(), which has a very particular interpretation for its wake_flags. In particular, beyond the published WF_SYNC, it has a few internal flags as well. Since we passed the fence->error down the chain via the flags argument, these ended up in the default_wake_function confusing the kernel/sched. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2110 Fixes: ef4688497512 ("drm/i915: Propagate fence errors") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728152144.1100-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added a note and link about more complete fix] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Remove i915_request.lock requirement for execution callbacks	Chris Wilson
	To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. Previous fix 1d9221e9d395 ("drm/i915: Skip signaling a signaled request") however did not correctly serialize request retirement with the execution callbacks. We were using the i915_request.lock to serialise adding an execution callback with __i915_request_submit. However, if we use an atomic llist_add to serialise multiple waiters and then check to see if the request is already executing, we can remove the irq-spinlock and fix serialization between retirement and execution callbacks in one go. v2: Avoid using the irq_work when outside of the irq-spinlocks, where we can execute the callbacks immediately. v3: Pay close attention to the order of setting ACTIVE on retirement, we need to ensure the request is signaled and breadcrumbs detached before we finish removing the request from the engine. v4: Expanded commit message. Fixes: 1d9221e9d395 ("drm/i915: Skip signaling a signaled request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Be wary of data races when reading the active execlists	Chris Wilson
	To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. The latest occurence of this was spotted by KCSAN: [ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915] [ 1413.563221] [ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1: [ 1413.563548] __await_execution+0x217/0x370 [i915] [ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915] [ 1413.564235] i915_request_await_object+0x421/0x490 [i915] [ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915] [ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915] [ 1413.564998] drm_ioctl_kernel+0x156/0x1b0 [ 1413.565022] drm_ioctl+0x2ff/0x480 [ 1413.565046] __x64_sys_ioctl+0x87/0xd0 [ 1413.565069] do_syscall_64+0x4d/0x80 [ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 To complicate matters, we have to both avoid the read tearing of *active and avoid any write tearing as perform the pending[] -> inflight[] promotion of the execlists. This is because we cannot rely on the memcpy doing u64 aligned copies on all kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE annotations to satisfy KCSAN. v2: When in doubt, write the same comment again. v3: Expanded commit message. Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Add ww locking to pin_to_display_plane, v2.	Maarten Lankhorst
	Use ww locking for pin_to_display_plane for all the pinning and locking. With the locking removed from set_cache_level, we need to fix i915_gem_set_caching_ioctl to take the object reservation lock. As this is a single lock, we don't need to use the ww dance. Changes since v1: - Do not use ww locking in i915_gem_set_caching_ioctl (Thomas). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-24-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Add ww locking to vm_fault_gtt	Maarten Lankhorst
	We want to start requiring the reservation_lock instead of obj->mm.lock for pinning objects, take the ww lock inside vm_fault_gtt as a first step towards the legacy lock removal. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-23-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Move i915_vma_lock in the selftests to avoid lock inversion, v3.	Maarten Lankhorst
	Make sure vma_lock is not used as inner lock when kernel context is used, and add ww handling where appropriate. Ensure that execbuf selftests keep passing by using ww handling. Changes since v2: - Fix i915_gem_context finally. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-22-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Use ww pinning for intel_context_create_request()	Maarten Lankhorst
	We want to get rid of intel_context_pin(), convert intel_context_create_request() first. :) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-21-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915/selftests: Fix locking inversion in lrc selftest.	Maarten Lankhorst
	This function does not use intel_context_create_request, so it has to use the same locking order as normal code. This is required to shut up lockdep in selftests. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-20-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Dirty hack to fix selftests locking inversion	Maarten Lankhorst
	Some i915 selftests still use i915_vma_lock() as inner lock, and intel_context_create_request() intel_timeline->mutex as outer lock. Fortunately for selftests this is not an issue, they should be fixed but we can move ahead and cleanify lockdep now. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-19-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Convert i915_perf to ww locking as well	Maarten Lankhorst
	We have the ordering of timeline->mutex vs resv_lock wrong, convert the i915_pin_vma and intel_context_pin as well to future-proof this. We may need to do future changes to do this more transaction-like, and only get down to a single i915_gem_ww_ctx, but for now this should work. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-18-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Kill last user of intel_context_create_request outside of selftests	Maarten Lankhorst
	Instead of using intel_context_create_request(), use intel_context_pin() and i915_create_request directly. Now all those calls are gone outside of selftests. :) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-17-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Convert i915_gem_object/client_blt.c to use ww locking as well, v2.	Maarten Lankhorst
	This is the last part outside of selftests that still don't use the correct lock ordering of timeline->mutex vs resv_lock. With gem fixed, there are a few places that still get locking wrong: - gvt/scheduler.c - i915_perf.c - Most if not all selftests. Changes since v1: - Add intel_engine_pm_get/put() calls to fix use-after-free when using intel_engine_get_pool(). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-16-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Make sure execbuffer always passes ww state to i915_vma_pin.	Maarten Lankhorst
	As a preparation step for full object locking and wait/wound handling during pin and object mapping, ensure that we always pass the ww context in i915_gem_execbuffer.c to i915_vma_pin, use lockdep to ensure this happens. This also requires changing the order of eb_parse slightly, to ensure we pass ww at a point where we could still handle -EDEADLK safely. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-15-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Rework intel_context pinning to do everything outside of pin_mutex	Maarten Lankhorst
	Instead of doing everything inside of pin_mutex, we move all pinning outside. Because i915_active has its own reference counting and pinning is also having the same issues vs mutexes, we make sure everything is pinned first, so the pinning in i915_active only needs to bump refcounts. This allows us to take pin refcounts correctly all the time. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-14-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Pin engine before pinning all objects, v5.	Maarten Lankhorst
	We want to lock all gem objects, including the engine context objects, rework the throttling to ensure that we can do this. Now we only throttle once, but can take eb_pin_engine while acquiring objects. This means we will have to drop the lock to wait. If we don't have to throttle we can still take the fastpath, if not we will take the slowpath and wait for the throttle request while unlocked. The engine has to be pinned as first step, otherwise gpu relocations won't work. Changes since v1: - Only need to get a throttled request in the fastpath, no need for a global flag any more. - Always free the waited request correctly. Changes since v2: - Use intel_engine_pm_get()/put() to keeep engine pool alive during EDEADLK handling. Changes since v3: - Fix small rq leak. Changes since v4: - Use a single reloc_context, for intel_context_pin_ww(). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-13-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Nuke arguments to eb_pin_engine	Maarten Lankhorst
	Those arguments are already set as eb.file and eb.args, so kill off the extra arguments. This will allow us to move eb_pin_engine() to after we reserved all BO's. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-12-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Add ww context handling to context_barrier_task	Maarten Lankhorst
	This is required if we want to pass a ww context in intel_context_pin and gen6_ppgtt_pin(). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-11-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Use ww locking in intel_renderstate.	Maarten Lankhorst
	We want to start using ww locking in intel_context_pin, for this we need to lock multiple objects, and the single i915_gem_object_lock is not enough. Convert to using ww-waiting, and make sure we always pin intel_context_state, even if we don't have a renderstate object. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-10-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Use per object locking in execbuf, v12.	Maarten Lankhorst
	Now that we changed execbuf submission slightly to allow us to do all pinning in one place, we can now simply add ww versions on top of struct_mutex. All we have to do is a separate path for -EDEADLK handling, which needs to unpin all gem bo's before dropping the lock, then starting over. This finally allows us to do parallel submission, but because not all of the pinning code uses the ww ctx yet, we cannot completely drop struct_mutex yet. Changes since v1: - Keep struct_mutex for now. :( Changes since v2: - Make sure we always lock the ww context in slowpath. Changes since v3: - Don't call __eb_unreserve_vma in eb_move_to_gpu now; this can be done on normal unlock path. - Unconditionally release vmas and context. Changes since v4: - Rebased on top of struct_mutex reduction. Changes since v5: - Remove training wheels. Changes since v6: - Fix accidentally broken -ENOSPC handling. Changes since v7: - Handle gt buffer pool better. Changes since v8: - Properly clear variables, to make -EDEADLK handling not BUG. Change since v9: - Fix unpinning fence on pnv and below. Changes since v10: - Make relocation gpu chaining working again. Changes since v11: - Remove relocation chaining, pain to make it work. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-9-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Parse command buffer earlier in eb_relocate(slow)	Maarten Lankhorst
	We want to introduce backoff logic, but we need to lock the pool object as well for command parsing. Because of this, we will need backoff logic for the engine pool obj, move the batch validation up slightly to eb_lookup_vmas, and the actual command parsing in a separate function which can get called from execbuf relocation fast and slowpath. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-8-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Remove locking from i915_gem_object_prepare_read/write	Maarten Lankhorst
	Execbuffer submission will perform its own WW locking, and we cannot rely on the implicit lock there. This also makes it clear that the GVT code will get a lockdep splat when multiple batchbuffer shadows need to be performed in the same instance, fix that up. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-7-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.	Maarten Lankhorst
	i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory eviction. We don't use it yet, but lets start adding the definition first. To use it, we have to pass a non-NULL ww to gem_object_lock, and don't unlock directly. It is done in i915_gem_ww_ctx_fini. Changes since v1: - Change ww_ctx and obj order in locking functions (Jonas Lahtinen) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-6-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	Revert "drm/i915/gem: Split eb_vma into its own allocation"	Maarten Lankhorst
	This reverts commit 0f1dd02295f3 ("drm/i915/gem: Split eb_vma into its own allocation") and also moves all unreserving to a single place at the end, which is a minor simplification. With the WW locking, we will drop all references only at the end when unlocking, so refcounting can now be removed. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-5-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	Revert "drm/i915/gem: Drop relocation slowpath".	Maarten Lankhorst
	This reverts commit 7dc8f1143778 ("drm/i915/gem: Drop relocation slowpath"). We need the slowpath relocation for taking ww-mutex inside the page fault handler, and we will take this mutex when pinning all objects. We also functionally revert ef398881d27d ("drm/i915/gem: Limit struct_mutex to eb_reserve"), as we need the struct_mutex in the slowpath as well, and a tiny part of 003d8b9143a6 ("drm/i915/gem: Only call eb_lookup_vma once during execbuf ioctl"). Specifically, we make the -EAGAIN handling part of fallback to slowpath again. With this, we have a proper working slowpath again, which will allow us to do fault handling with WW locks held. [mlankhorst: Adjusted for reloc_gpu_flush() changes] Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> [mlankhorst: Removed extra reloc_gpu_flush()] Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-4-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915: Revert relocation chaining commits.	Maarten Lankhorst
	This reverts commit 964a9b0f611ee ("drm/i915/gem: Use chained reloc batches") and commit 0e97fbb080553 ("drm/i915/gem: Use a single chained reloc batches for a single execbuf"). When adding ww locking to execbuf, it's hard enough to deal with a single BO that is part of relocation execution. Chaining is hard to get right, and with GPU relocation deprecated, it's best to drop this altogether, instead of trying to fix something we will remove. This is not a completely 1:1 revert, we reset rq_size to 0 in reloc_cache_init, this was from e3d291301f99 ("drm/i915/gem: Implement legacy MI_STORE_DATA_IMM"), because we don't want to break the selftests. (Daniel) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-3-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	Revert "drm/i915/gem: Async GPU relocations only"	Maarten Lankhorst
	This reverts commit 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only"), and related commit 7ac2d2536dfa7 ("drm/i915/gem: Delete unused code"). Async GPU relocations are not the path forward, we want to remove GPU accelerated relocation support eventually when userspace is fixed to use VM_BIND, and this is the first step towards that. We will keep async gpu relocations around for now, until userspace is fixed. Relocation support will be disabled completely on platforms where there was never any userspace that depends on it, as the hardware doesn't require it from at least gen9+ onward. For older platforms, the plan is to use cpu relocations only. The igt side is fixed in igt commit 39e9aa1032a4e ("tests/i915: Remove subtests that rely on async relocation behavior"). Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-2-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-09-07	drm/i915/gem: Free the fence after a fence-chain lookup failure	Chris Wilson
	If dma_fence_chain_find_seqno() reports an error, it does so in its preamble before it disposes of the input fence. On handling the error, we need to drop the reference to the fence. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2292 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 13149e8bafc4 ("drm/i915: add syncobj timeline support") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200806161056.17593-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>