linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2020-04-25	drm/i915: Add live selftests for indirect ctx batchbuffers	Mika Kuoppala
	Indirect ctx batchbuffers are a hw feature of which batch can be run, by hardware, during context restoration stage. Driver can setup a batchbuffer and also an offset into the context image. When context image is marshalled from memory to registers, and when the offset from the start of context register state is equal of what driver pre-determined, batch will run. So one can manipulate context restoration process at cacheline granularity, given some limitations, as you need to have rudimentaries in place before you can run a batch. Add selftest which will write the ring start register to a canary spot. This will test that hardware will run a batchbuffer for the context in question. v2: request wait fix, naming (Chris) v3: test order (Chris) v4: rebase Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200424214841.28076-3-mika.kuoppala@linux.intel.com
2020-04-25	drm/i915: Add per ctx batchbuffer wa for timestamp	Mika Kuoppala
	Restoration of a previous timestamp can collide with updating the timestamp, causing a value corruption. Combat this issue by using indirect ctx bb to modify the context image during restoring process. We can preload value into scratch register. From which we then do the actual write with LRR. LRR is faster and thus less error prone as probability of race drops. v2: tidying (Chris) v3: lrr for all engines v4: grp v5: reg bit v6: wa_bb_offset, virtual engines (Chris) References: HSDES#16010904313 Testcase: igt/i915_selftest/gt_lrc Suggested-by: Joseph Koston <joseph.koston@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200424230546.30271-1-mika.kuoppala@linux.intel.com
2020-04-25	drm/i915: Add engine scratch register to live_lrc_fixed	Mika Kuoppala
	General purpose registers are per engine and in a fixed location. Add to live_lrc_fixed. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200424214841.28076-1-mika.kuoppala@linux.intel.com
2020-04-24	drm/i915/gt: Use the RPM config register to determine clk frequencies	Chris Wilson
	For many configuration details within RC6 and RPS we are programming intervals for the internal clocks. From gen11, these clocks are configuration via the RPM_CONFIG and so for convenience, we would like to convert to/from more natural units (ns). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200424162805.25920-2-chris@chris-wilson.co.uk
2020-04-24	drm/i915/gt: Trace RPS events	Chris Wilson
	Add tracek to the RPS events (interrupts, worker, enabling, threshold selection, frequency setting), so that if we have to debug reticent HW we have some traces to start from. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200424162805.25920-1-chris@chris-wilson.co.uk
2020-04-24	drm/i915/gt: Prefer soft-rc6 over RPS DOWN_TIMEOUT	Chris Wilson
	The RPS DOWN_TIMEOUT interrupt is signaled after a period of rc6, and upon receipt of that interrupt we reprogram the GPU clocks down to the next idle notch [to help convserve power during rc6]. However, on execlists, we benefit from soft-rc6 immediately parking the GPU and setting idle frequencies upon idling [within a jiffie], and here the interrupt prevents us from restarting from our last frequency. In the process, we can simply opt for a static pm_events mask and rely on the enable/disable interrupts to flush the worker on parking. This will reduce the amount of oscillation observed during steady workloads with microsleeps, as each time the rc6 timeout occurs we immediately follow with a waitboost for a dropped frame. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200422001703.1697-1-chris@chris-wilson.co.uk
2020-04-24	drm/i915: Only close vma we open	Chris Wilson
	The history of i915_vma_close() is confusing, as is its use. As the lifetime of the i915_vma is currently bounded by the object it is attached to, we needed a means of identify when a vma was no longer in use by userspace (via the user's fd). This is further complicated by that only ppgtt vma should be closed at the user's behest, as the ggtt were always shared. Now that we attach the vma to a lut on the user's context, the open count does indicate how many unique and open context/vm are referencing this vma from the user. As such, we can and should just use the open_count to track when the vma is still in use by userspace. It's a poor man's replacement for reference counting. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1193 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200422190558.30509-1-chris@chris-wilson.co.uk
2020-04-24	drm/i915: Make define for lrc state offset	Mika Kuoppala
	More often than not, we need a byte offset into lrc register state from the start of the hw state. Make it so. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200423182355.21837-3-mika.kuoppala@linux.intel.com
2020-04-24	drm/i915/selftests: Add context batchbuffers registers to live_lrc_fixed	Mika Kuoppala
	Add per ctx bb and indirect ctx bb register locations to live_lrc_fixed for verification. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200423224159.22078-1-chris@chris-wilson.co.uk
2020-04-23	drm/i915/gt: Check carefully for an idle engine in wait-for-idle	Chris Wilson
	intel_gt_wait_for_idle() tries to wait until all the outstanding requests are retired and the GPU is idle. As a side effect of retiring requests, we may submit more work to flush any pm barriers, and so the wait-for-idle tries to flush the background pm work and catch the new requests. However, if the work completed in the background before we were able to flush, it would queue the extra barrier request without us noticing -- and so we would return from wait-for-idle with one request remaining. (This breaks e.g. record_default_state where we need to wait until that barrier is retired, and it may slow suspend down by causing us to wait on the background retirement worker as opposed to immediately retiring the barrier.) However, since we track if there has been a submission since the engine pm barrier, we can very quickly detect if the idle barrier is still outstanding. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1763 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200423085940.28168-1-chris@chris-wilson.co.uk
2020-04-23	drm/i915/gt: Carefully order virtual_submission_tasklet	Chris Wilson
	During the virtual engine's submission tasklet, we take the request and insert into the submission queue on each of our siblings. This seems quite simply, and so no problems with ordering. However, the sibling execlists' submission tasklets may run concurrently with the virtual engine's tasklet, submitting the request to HW before the virtual finishes its task of telling all the siblings. If this happens, the sibling tasklet may reorder the ve->sibling[] array that the virtual engine tasklet is processing. This can only reorder within the elements already processed by the virtual engine, nevertheless the race is detected by KCSAN: [ 185.580014] BUG: KCSAN: data-race in execlists_dequeue [i915] / virtual_submission_tasklet [i915] [ 185.580054] [ 185.580076] write to 0xffff8881f1919860 of 8 bytes by interrupt on cpu 2: [ 185.580553] execlists_dequeue+0x6ad/0x1600 [i915] [ 185.581044] __execlists_submission_tasklet+0x48/0x60 [i915] [ 185.581517] execlists_submission_tasklet+0xd3/0x170 [i915] [ 185.581554] tasklet_action_common.isra.0+0x42/0x90 [ 185.581585] __do_softirq+0xc8/0x206 [ 185.581613] run_ksoftirqd+0x15/0x20 [ 185.581641] smpboot_thread_fn+0x15a/0x270 [ 185.581669] kthread+0x19a/0x1e0 [ 185.581695] ret_from_fork+0x1f/0x30 [ 185.581717] [ 185.581736] read to 0xffff8881f1919860 of 8 bytes by interrupt on cpu 0: [ 185.582231] virtual_submission_tasklet+0x10e/0x5c0 [i915] [ 185.582265] tasklet_action_common.isra.0+0x42/0x90 [ 185.582291] __do_softirq+0xc8/0x206 [ 185.582315] run_ksoftirqd+0x15/0x20 [ 185.582340] smpboot_thread_fn+0x15a/0x270 [ 185.582368] kthread+0x19a/0x1e0 [ 185.582395] ret_from_fork+0x1f/0x30 [ 185.582417] We can prevent this race by checking for the ve->request after looking up the sibling array. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200423115315.26825-1-chris@chris-wilson.co.uk
2020-04-22	drm/i915/execlists: Drop request-before-CS assertion	Chris Wilson
	When we migrated to execlists, one of the conditions we wanted to test for was whether the breadcrumb seqno was being written before the breadcumb interrupt was delivered. This was following on from issues observed on previous generations which were not so strongly ordered. With the removal of the missed interrupt detection, we have not reliable means of detecting the out-of-order seqno/interrupt but instead tried to assert that the relationship between the CS event interrupt and the breadwrite should be strongly ordered. However, Icelake proves it is possible for the HW implementation to forget about minor little details such as write ordering and so the order between processing the CS event and the breadcrumb is unreliable. Remove the unreliable assertion, but leave a debug telltale in case we have reason to suspect. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1658 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200422141749.28709-1-chris@chris-wilson.co.uk
2020-04-22	drm/i915/selftests: Try to detect rollback during batchbuffer preemption	Chris Wilson
	Since batch buffers dominant execution time, most preemption requests should naturally occur during execution of a batch buffer. We wish to verify that should a preemption occur within a batch buffer, when we come to restart that batch buffer, it occurs at the interrupted instruction and most importantly does not rollback to an earlier point. v2: Do not clear the GPR at the start of the batch, but rely on them being clear for new contexts. Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200422100903.25216-1-chris@chris-wilson.co.uk
2020-04-22	drm/i915/selftests: Disable heartbeat around RPS interrupt testing	Chris Wilson
	For verifying reciving the EI interrupts, we need to hold the GPU in very precise conditions (in terms of C0 cycles during the EI). If we preempt the busy load to handle the heartbeat, this may perturb the busy load causing us to miss the interrupt. The other tests, while not as time sensitive, may also be slightly perturbed, so apply the heartbeat protection across all the measurements. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200422083855.26842-1-chris@chris-wilson.co.uk
2020-04-21	drm/i915/selftests: Unroll the CS frequency loop	Chris Wilson
	Having noticed that MI_BB_START is incurring a memory stall (see the correlation with uncore frequency), we have to unroll the loop in order to diminish the impact of the MI_BB_START on the instruction throughput. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200421171351.19575-1-chris@chris-wilson.co.uk
2020-04-21	drm/i915/gt: Poison residual state [HWSP] across resume.	Chris Wilson
	Since we may lose the content of any buffer when we relinquish control of the system (e.g. suspend/resume), we have to be careful not to rely on regaining control. A good method to detect when we might be using garbage is by always injecting that garbage prior to first use on load/resume/etc. v2: Drop sanitize callback on cleanup v3: Move seqno reset to timeline enter, so we reset all timelines. However, this is done on every activation during runtime and not reset. The similar level of paranoia we apply to correcting context state after a period of inactivity. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200421092504.7416-1-chris@chris-wilson.co.uk
2020-04-21	drm/i915/selftests: Disable C-states when measuring RPS frequency response	Chris Wilson
	Let's isolate the impact of cpu frequency selection on determing the GPU throughput in response to selection of RPS frequencies. For real systems, we do have to be concerned with the impact of integrating c-states, p-states and rp-states, but for the sake of proving whether or not RPS works, one baby step at a time. For the record, as one would hope, it does not seem to impact on the measured performance, but we do it anyway to reduce the number of variables. Later, we can extend the testing to encourage the the cpu/pkg to try and sleep while the GPU is busy. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200421142236.8614-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20200421142236.8614-1-chris@chris-wilson.co.uk
2020-04-21	drm/i915/selftests: Show the full scaling curve on failure	Chris Wilson
	If we detect that the RPS end points do not scale perfectly, take the time to measure all the in between values as well. We are aborting the test, so we might as well spend the available time gathering critical debug information instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200421124636.22554-1-chris@chris-wilson.co.uk
2020-04-21	drm/i915/selftests: Show the pstate limits on any failure to reset min	Chris Wilson
	We want to see the pstate limits whenever we fail to set the minimum frequency as that may help for debugging. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200420203040.8984-1-chris@chris-wilson.co.uk
2020-04-20	drm/i915/selftests: Exercise dynamic reclocking with RPS	Chris Wilson
	After having testing all the RPS controls individually, we need to take a step back and check how our RPS worker integrates them to perform dynamic GPU reclocking. So do that by submitting a spinner and wait and see what happens. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200420172739.11620-6-chris@chris-wilson.co.uk
2020-04-20	drm/i915/selftests: Show the pcode frequency table on error	Chris Wilson
	If we encounter an error while scaling, read back the frequency tables from the pcu. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200420172739.11620-5-chris@chris-wilson.co.uk
2020-04-20	drm/i915/selftests: Split RPS frequency measurement	Chris Wilson
	Split the frequency measurement into two modes, so that we can judge the impact of the llc setup on top of the pure CS frequency scaling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200420172739.11620-4-chris@chris-wilson.co.uk
2020-04-20	drm/i915/selftests: Check RPS controls	Chris Wilson
	Check that the GPU does respond to our RPS frequency requests by setting our desired frequency. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200420172739.11620-3-chris@chris-wilson.co.uk
2020-04-20	drm/i915/selftests: Skip energy consumption tests if not controlling freq	Chris Wilson
	If we can not manipulate the frequency with RPS, then comparing min/max power consumption is pointless / misleading. We will leave the warning about not being able to control the frequency selection via RPS to other tests so as not to confuse this more specialised check. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200420172739.11620-2-chris@chris-wilson.co.uk
2020-04-20	drm/i915/selftests: Verify frequency scaling with RPS	Chris Wilson
	One of the core tenents of reclocking the GPU is that its throughput scales with the clock frequency. We can observe this by incrementing a loop counter on the GPU, and compare the different execution rates at the notional RPS frequencies. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200420172739.11620-1-chris@chris-wilson.co.uk
2020-04-20	drm/i915/gt: Move the late flush_submission in retire to the end	Chris Wilson
	Avoid flushing the submission queue (of others) under the client's timeline lock, but instead move it to the end so that we may catch more. References: https://gitlab.freedesktop.org/drm/intel/-/issues/1066 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200420125356.26614-2-chris@chris-wilson.co.uk
2020-04-18	drm/i915: Refactor setting dma info to a common helper	Michael J. Ruhl
	DMA_MASK bit values are different for different generations. This will become more difficult to manage over time with the open coded usage of different versions of the device. Fix by: disallow setting of dma mask in AGP path (< GEN(5) for i915, add dma_mask_size to the device info configuration, updating open code call sequence to the latest interface, refactoring into a common function for setting the dma segment and mask info Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> cc: Brian Welty <brian.welty@intel.com> cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200417195107.68732-1-michael.j.ruhl@intel.com
2020-04-17	drm/i915/selftests: Check power consumption at min/max frequencies	Chris Wilson
	A basic premise of RPS is that at lower frequencies, not only do we run slower, but we save power compared to higher frequencies. For example, when idle, we set the minimum frequency just in case there is some residual current. Since the power curve should be a physical relationship, if we find no power saving it's likely that we've broken our frequency handling, so test! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200417152018.13079-2-chris@chris-wilson.co.uk
2020-04-17	drm/i915/selftests: Move gpu energy measurement into its own little lib	Chris Wilson
	Move the handy utility to measure the GPU energy consumption using RAPL msr into a common lib so that it can be reused easily. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200417152018.13079-1-chris@chris-wilson.co.uk
2020-04-17	drm/i915/selftests: Take the engine wakeref around __rps_up_interrupt	Chris Wilson
	Since we are touching the device to read the registers, we are required to ensure the device is awake at the time. Currently, we believe ourselves to be inside the active request [thus an active engine wakeref], but since that may be retired in the background, we can spontaneously lose the wakeref and the ability to probe the HW. <4> [379.686703] RPM wakelock ref not held during HW access <4> [379.686805] WARNING: CPU: 7 PID: 4869 at ./drivers/gpu/drm/i915/intel_runtime_pm.h:115 gen12_fwtable_read32+0x233/0x300 [i915] <4> [379.686808] Modules linked in: i915(+) vgem snd_hda_codec_hdmi mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ax88179_178a usbnet mii ghash_clmulni_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e mei_me ptp mei pps_core intel_lpss_pci prime_numbers [last unloaded: i915] <4> [379.686827] CPU: 7 PID: 4869 Comm: i915_selftest Tainted: G U 5.7.0-rc1-CI-CI_DRM_8313+ #1 <4> [379.686830] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2457.A13.1912190237 12/19/2019 <4> [379.686883] RIP: 0010:gen12_fwtable_read32+0x233/0x300 [i915] <4> [379.686887] Code: d8 ea e0 0f 0b e9 19 fe ff ff 80 3d ad 12 2d 00 00 0f 85 17 fe ff ff 48 c7 c7 b0 32 3e a0 c6 05 99 12 2d 00 01 e8 2d d8 ea e0 <0f> 0b e9 fd fd ff ff 8b 05 c4 75 56 e2 85 c0 0f 85 84 00 00 00 48 <4> [379.686889] RSP: 0018:ffffc90000727970 EFLAGS: 00010286 <4> [379.686892] RAX: 0000000000000000 RBX: ffff88848cc20ee8 RCX: 0000000000000001 <4> [379.686894] RDX: 0000000080000001 RSI: ffff88843b1f0900 RDI: 00000000ffffffff <4> [379.686896] RBP: 0000000000000000 R08: ffff88843b1f0900 R09: 0000000000000000 <4> [379.686898] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000a058 <4> [379.686900] R13: 0000000000000001 R14: ffff88848cc2bf30 R15: 00000000ffffffea <4> [379.686902] FS: 00007f7d63f5e300(0000) GS:ffff8884a0180000(0000) knlGS:0000000000000000 <4> [379.686904] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [379.686907] CR2: 000055e5c30f4988 CR3: 000000042e190002 CR4: 0000000000760ee0 <4> [379.686910] PKRU: 55555554 <4> [379.686911] Call Trace: <4> [379.686986] live_rps_interrupt+0xb14/0xc10 [i915] <4> [379.687051] ? intel_rps_unpark+0xb0/0xb0 [i915] <4> [379.687057] ? __trace_bprintk+0x57/0x80 <4> [379.687143] __i915_subtests+0xb8/0x210 [i915] <4> [379.687222] ? __i915_live_teardown+0x50/0x50 [i915] <4> [379.687291] ? __intel_gt_live_setup+0x30/0x30 [i915] <4> [379.687361] __run_selftests+0x112/0x170 [i915] <4> [379.687431] i915_live_selftests+0x2c/0x60 [i915] <4> [379.687491] i915_pci_probe+0x93/0x1b0 [i915] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200417093928.17822-2-chris@chris-wilson.co.uk
2020-04-17	drm/i915/selftests: Delay spinner before waiting for an interrupt	Chris Wilson
	It seems that although (perhaps because of the memory stall?) the spinner has signaled that it has started, it still takes some time to spin up to 100% utilisation of the HW. Since the test depends on the full utilisation of the HW to trigger the RPS interrupt, wait a little bit and flush the interrupt status to be sure that the event we see if from the spinner. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200417093928.17822-1-chris@chris-wilson.co.uk
2020-04-17	drm/i915/gt: Scrub execlists state on resume	Chris Wilson
	Before we resume, we reset the HW so we restart from a known good state. However, as a part of the reset process, we drain our pending CS event queue -- and if we are resuming that does not correspond to internal state. On setup, we are scrubbing the CS pointers, but alas only on setup. Apply the sanitization not just to setup, but to all resumes. Reported-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200416114117.3460-1-chris@chris-wilson.co.uk
2020-04-16	Merge drm/drm-next into drm-intel-next-queued	Joonas Lahtinen
	Backmerging in order to pull "topic/phy-compliance". Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-04-16	drm/i915/gt: Update PMINTRMSK holding fw	Chris Wilson
	If we use a non-forcewaked write to PMINTRMSK, it does not take effect until much later, if at all, causing a loss of RPS interrupts and no GPU reclocking, leaving the GPU running at the wrong frequency for long periods of time. Reported-by: Francisco Jerez <currojerez@riseup.net> Suggested-by: Francisco Jerez <currojerez@riseup.net> Fixes: 35cc7f32c298 ("drm/i915/gt: Use non-forcewake writes for RPS") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Francisco Jerez <currojerez@riseup.net> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Cc: <stable@vger.kernel.org> # v5.6+ Link: https://patchwork.freedesktop.org/patch/msgid/20200415170318.16771-2-chris@chris-wilson.co.uk
2020-04-16	drm/i915/selftests: Exercise basic RPS interrupt generation	Chris Wilson
	Since we depend upon RPS generating interrupts after evaluation intervals to determine when to up/down clock the GPU, it is imperative that we successfully enable interrupt generation! Verify that we do see an interrupt if we keep the GPU busy for an entire EI. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200415170318.16771-1-chris@chris-wilson.co.uk
2020-04-15	drm/i915/tgl: Initialize multicast register steering for workarounds	Matt Roper
	Even though the bspec is missing gen12 register details for the MCR selector register (0xFDC), this is confirmed by hardware folks to be a mistake; the register does exist and we do indeed need to steer multicast register reads to an appropriate instance the same as we did on gen11. Note that despite the lack of documentation we were still using the MCR selector to read INSTDONE and such in read_subslice_reg() too. Cc: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200414211118.2787489-4-matthew.d.roper@intel.com Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
2020-04-10	drm/i915/selftests: Check for an already completed timeslice	Chris Wilson
	With timeslice yielding on a semaphore, we may complete timeslices much faster than we were expecting and already have yielded the stuck request. Before complaining that timeslicing is not enabled, check that we haven't already applied the switch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200410081638.19893-1-chris@chris-wilson.co.uk
2020-04-08	drm/i915/selftests: Take an explicit ref for rq->batch	Chris Wilson
	Since we are peeking into the batch object of the request, it is beholden on us to hold a reference to it. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1634 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200408091723.28937-1-chris@chris-wilson.co.uk
2020-04-08	drm/i915/gt: Mark up racy check of breadcrumb irq enabled	Chris Wilson
	We control b->irq_enabled inside the b->irq_lock, but we check before entering the spinlock whether or not the interrupt is currently unmasked. [ 1511.735208] BUG: KCSAN: data-race in __intel_breadcrumbs_disarm_irq [i915] / intel_engine_disarm_breadcrumbs [i915] [ 1511.735231] [ 1511.735242] write to 0xffff8881f75fc214 of 1 bytes by interrupt on cpu 2: [ 1511.735440] __intel_breadcrumbs_disarm_irq+0x4b/0x160 [i915] [ 1511.735635] signal_irq_work+0x337/0x710 [i915] [ 1511.735652] irq_work_run_list+0xd7/0x110 [ 1511.735666] irq_work_run+0x1d/0x50 [ 1511.735681] smp_irq_work_interrupt+0x21/0x30 [ 1511.735701] irq_work_interrupt+0xf/0x20 [ 1511.735722] __do_softirq+0x6f/0x206 [ 1511.735736] irq_exit+0xcd/0xe0 [ 1511.735756] do_IRQ+0x44/0xc0 [ 1511.735773] ret_from_intr+0x0/0x1c [ 1511.735787] schedule+0x0/0xb0 [ 1511.735803] worker_thread+0x194/0x670 [ 1511.735823] kthread+0x19a/0x1e0 [ 1511.735837] ret_from_fork+0x1f/0x30 [ 1511.735848] [ 1511.735867] read to 0xffff8881f75fc214 of 1 bytes by task 432 on cpu 1: [ 1511.736068] intel_engine_disarm_breadcrumbs+0x22/0x80 [i915] [ 1511.736263] __engine_park+0x107/0x5d0 [i915] [ 1511.736453] ____intel_wakeref_put_last+0x44/0x90 [i915] [ 1511.736648] __intel_wakeref_put_last+0x5a/0x70 [i915] [ 1511.736842] intel_context_exit_engine+0xf2/0x100 [i915] [ 1511.737044] i915_request_retire+0x6b2/0x770 [i915] [ 1511.737244] retire_requests+0x7a/0xd0 [i915] [ 1511.737438] intel_gt_retire_requests_timeout+0x3a7/0x6f0 [i915] [ 1511.737633] i915_drop_caches_set+0x1e7/0x260 [i915] [ 1511.737650] simple_attr_write+0xfa/0x110 [ 1511.737665] full_proxy_write+0x94/0xc0 [ 1511.737679] __vfs_write+0x4b/0x90 [ 1511.737697] vfs_write+0xfc/0x280 [ 1511.737718] ksys_write+0x78/0x100 [ 1511.737732] __x64_sys_write+0x44/0x60 [ 1511.737751] do_syscall_64+0x6e/0x2c0 [ 1511.737769] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200408092916.5355-1-chris@chris-wilson.co.uk
2020-04-08	drm/i915/gt: Mark up racy read of intel_ring.head	Chris Wilson
	The intel_ring.head is updated as the requests are retired, but is sampled at any time as we submit requests. Furthermore, it tracks RING_HEAD which is inherently asynchronous. [ 148.630314] BUG: KCSAN: data-race in execlists_dequeue [i915] / i915_request_retire [i915] [ 148.630349] [ 148.630374] write to 0xffff8881f4e28ddc of 4 bytes by task 90 on cpu 2: [ 148.630752] i915_request_retire+0xed/0x770 [i915] [ 148.631123] retire_requests+0x7a/0xd0 [i915] [ 148.631491] engine_retire+0xa6/0xe0 [i915] [ 148.631523] process_one_work+0x3af/0x640 [ 148.631552] worker_thread+0x80/0x670 [ 148.631581] kthread+0x19a/0x1e0 [ 148.631609] ret_from_fork+0x1f/0x30 [ 148.631629] [ 148.631652] read to 0xffff8881f4e28ddc of 4 bytes by task 14288 on cpu 3: [ 148.632019] execlists_dequeue+0x1300/0x1680 [i915] [ 148.632384] __execlists_submission_tasklet+0x48/0x60 [i915] [ 148.632770] execlists_submit_request+0x38e/0x3c0 [i915] [ 148.633146] submit_notify+0x8f/0xc0 [i915] [ 148.633512] __i915_sw_fence_complete+0x5d/0x3e0 [i915] [ 148.633875] i915_sw_fence_complete+0x58/0x80 [i915] [ 148.634238] i915_sw_fence_commit+0x16/0x20 [i915] [ 148.634613] __i915_request_queue+0x60/0x70 [i915] [ 148.634985] i915_gem_do_execbuffer+0x2de0/0x42b0 [i915] [ 148.635366] i915_gem_execbuffer2_ioctl+0x2ab/0x580 [i915] [ 148.635400] drm_ioctl_kernel+0xe9/0x130 [ 148.635429] drm_ioctl+0x27d/0x45e [ 148.635456] ksys_ioctl+0x89/0xb0 [ 148.635482] __x64_sys_ioctl+0x42/0x60 [ 148.635510] do_syscall_64+0x6e/0x2c0 [ 148.635542] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 645.071436] BUG: KCSAN: data-race in gen8_emit_fini_breadcrumb [i915] / i915_request_retire [i915] [ 645.071456] [ 645.071467] write to 0xffff8881efe403dc of 4 bytes by task 14668 on cpu 3: [ 645.071647] i915_request_retire+0xed/0x770 [i915] [ 645.071824] i915_request_create+0x6c/0x160 [i915] [ 645.072000] i915_gem_do_execbuffer+0x206d/0x42b0 [i915] [ 645.072177] i915_gem_execbuffer2_ioctl+0x2ab/0x580 [i915] [ 645.072194] drm_ioctl_kernel+0xe9/0x130 [ 645.072208] drm_ioctl+0x27d/0x45e [ 645.072222] ksys_ioctl+0x89/0xb0 [ 645.072235] __x64_sys_ioctl+0x42/0x60 [ 645.072248] do_syscall_64+0x6e/0x2c0 [ 645.072263] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 645.072275] [ 645.072285] read to 0xffff8881efe403dc of 4 bytes by interrupt on cpu 2: [ 645.072458] gen8_emit_fini_breadcrumb+0x158/0x300 [i915] [ 645.072636] __i915_request_submit+0x204/0x430 [i915] [ 645.072809] execlists_dequeue+0x8e1/0x1680 [i915] [ 645.072982] __execlists_submission_tasklet+0x48/0x60 [i915] [ 645.073154] execlists_submit_request+0x38e/0x3c0 [i915] [ 645.073330] submit_notify+0x8f/0xc0 [i915] [ 645.073499] __i915_sw_fence_complete+0x5d/0x3e0 [i915] [ 645.073668] i915_sw_fence_wake+0xc2/0x130 [i915] [ 645.073836] __i915_sw_fence_complete+0x2cf/0x3e0 [i915] [ 645.074006] i915_sw_fence_complete+0x58/0x80 [i915] [ 645.074175] dma_i915_sw_fence_wake+0x3e/0x80 [i915] [ 645.074344] signal_irq_work+0x62f/0x710 [i915] [ 645.074360] irq_work_run_list+0xd7/0x110 [ 645.074373] irq_work_run+0x1d/0x50 [ 645.074386] smp_irq_work_interrupt+0x21/0x30 [ 645.074400] irq_work_interrupt+0xf/0x20 [ 645.074414] _raw_spin_unlock_irqrestore+0x34/0x40 [ 645.074585] execlists_submission_tasklet+0xde/0x170 [i915] [ 645.074602] tasklet_action_common.isra.0+0x42/0x90 [ 645.074617] __do_softirq+0xc8/0x206 [ 645.074629] irq_exit+0xcd/0xe0 [ 645.074642] do_IRQ+0x44/0xc0 [ 645.074654] ret_from_intr+0x0/0x1c [ 645.074667] finish_task_switch+0x73/0x230 [ 645.074679] __schedule+0x1c5/0x4c0 [ 645.074691] schedule+0x45/0xb0 [ 645.074704] worker_thread+0x194/0x670 [ 645.074716] kthread+0x19a/0x1e0 [ 645.074729] ret_from_fork+0x1f/0x30 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200407221832.15465-1-chris@chris-wilson.co.uk
2020-04-08	drm/i915/uc: prefer struct drm_device based logging	Jani Nikula
	Prefer struct drm_device based logging over struct device based logging. No functional changes. Cc: Wambui Karuga <wambui.karugax@gmail.com> Reviewed-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200402114819.17232-17-jani.nikula@intel.com
2020-04-08	drm/i915/gt: prefer struct drm_device based logging	Jani Nikula
	Prefer struct drm_device based logging over struct device based logging. No functional changes. Cc: Wambui Karuga <wambui.karugax@gmail.com> Reviewed-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200402114819.17232-16-jani.nikula@intel.com
2020-04-08	drm/i915/uc: prefer struct drm_device based logging	Jani Nikula
	Prefer struct drm_device based logging over struct device based logging. No functional changes. Cc: Wambui Karuga <wambui.karugax@gmail.com> Reviewed-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200402114819.17232-10-jani.nikula@intel.com
2020-04-08	drm/i915/selftests: Drop vestigal timeslicing assert	Chris Wilson
	Since the semaphore interrupt may cause us to yield the timeslice immediately, we may cancel the timer before we notice the submission is complete. The assertion is no longer valid due to the race with the interrupt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200407222625.15542-1-chris@chris-wilson.co.uk
2020-04-07	drm/i915/gt: Yield the timeslice if caught waiting on a user semaphore	Chris Wilson
	If we find ourselves waiting on a MI_SEMAPHORE_WAIT, either within the user batch or in our own preamble, the engine raises a GT_WAIT_ON_SEMAPHORE interrupt. We can unmask that interrupt and so respond to a semaphore wait by yielding the timeslice, if we have another context to yield to! The only real complication is that the interrupt is only generated for the start of the semaphore wait, and is asynchronous to our process_csb() -- that is, we may not have registered the timeslice before we see the interrupt. To ensure we don't miss a potential semaphore blocking forward progress (e.g. selftests/live_timeslice_preempt) we mark the interrupt and apply it to the next timeslice regardless of whether it was active at the time. v2: We use semaphores in preempt-to-busy, within the timeslicing implementation itself! Ergo, when we do insert a preemption due to an expired timeslice, the new context may start with the missed semaphore flagged by the retired context and be yielded, ad infinitum. To avoid this, read the context id at the time of the semaphore interrupt and only yield if that context is still active. Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200407130811.17321-1-chris@chris-wilson.co.uk
2020-04-06	drm/i915/gt: Fill all the unused space in the GGTT	Chris Wilson
	When we allocate space in the GGTT we may have to allocate a larger region than will be populated by the object to accommodate fencing. Make sure that this space beyond the end of the buffer points safely into scratch space, in case the HW tries to access it anyway (e.g. fenced access to the last tile row). v2: Preemptively / conservatively guard gen6 ggtt as well. Reported-by: Imre Deak <imre.deak@intel.com> References: https://gitlab.freedesktop.org/drm/intel/-/issues/1554 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200331152348.26946-1-chris@chris-wilson.co.uk (cherry picked from commit 4d6c18590870fbac1e65dde5e01e621c8e0ca096) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-04-03	drm/i915/gt: Free request pool from virtual engines	Chris Wilson
	While extremely unlikely to be populated, we could capture a request on the virtual engine which we should free along with the virtual engine. Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200403203303.10903-1-chris@chris-wilson.co.uk
2020-04-03	drm/i915/selftests: Wait until we start timeslicing after a submit	Chris Wilson
	If we submit, we do not start timeslicing until we process the CS event that marks the start of the context running on HW. So in the selftest, be sure to wait until we have processed the pending events before asserting that timeslicing has begun. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200403190209.21818-1-chris@chris-wilson.co.uk
2020-04-03	drm/i915: Keep a per-engine request pool	Chris Wilson
	Add a tiny per-engine request mempool so that we should always have a request available for powermanagement allocations from tricky contexts. This reserve is expected to be only used for kernel contexts when barriers must be emitted [almost] without fail. The main consumer for this reserved request is expected to be engine-pm, for which we know that there will always be at least the previous pm request that we can reuse under mempressure (so there should always be a spare request for engine_park()). This is an alternative to using a comparatively bulky mempool, which requires custom handling for both our reserved allocation requirement and to protect our TYPESAFE_BY_RCU slab cache. The advantage of mempool would be that it would allow us to keep a larger per-engine request pool. However, converting over to mempool is straightforward should the need arise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-and-tested-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200402184037.21630-1-chris@chris-wilson.co.uk
2020-04-02	drm/i915/tgl: Make Wa_14010229206 permanent	Swathi Dhanavanthri
	This workaround now applies to all steppings, not just A0. Wa_1409085225 is a temporary A0-only W/A however it is identical to Wa_14010229206 and hence the combined workaround is made permanent. Bspec: 52890 Signed-off-by: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com> Tested-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> [mattrope: added missing blank line] Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200326234955.16155-1-swathi.dhanavanthri@intel.com