linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2020-10-22	drm/i915/pmu: Handle PCI unbind	Tvrtko Ursulin
	Mark the device as closed and keep references to driver data alive to allow for safe driver unbind with active PMU clients. Perf core does not otherwise handle this case so we have to do it manually like this. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201020100822.543332-1-tvrtko.ursulin@linux.intel.com
2020-10-22	drm/i915/gem: Flush coherency domains on first set-domain-ioctl	Chris Wilson
	Avoid skipping what appears to be a no-op set-domain-ioctl if the cache coherency state is inconsistent with our target domain. This also has the utility of using the population of the pages to validate the backing store. The danger in skipping the first set-domain is leaving the cache inconsistent and submitting stale data, or worse leaving the clean data in the cache and not flushing it to the GPU. The impact should be small as it requires a no-op set-domain as the very first ioctl in a particular sequence not found in typical userspace. Reported-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Fixes: 754a25442705 ("drm/i915: Skip object locking around a no-op set-domain ioctl") Testcase: igt/gem_mmap_offset/blt-coherency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019203825.10966-1-chris@chris-wilson.co.uk
2020-10-21	drm/i915/region: fix max size calculation	Matthew Auld
	We are incorrectly limiting the max allocation size as per the mm max_order, which is effectively the largest power-of-two that we can fit in the region size. However, it's normal to setup the region or allocator with a non-power-of-two size(for example 3G), which we should already handle correctly, except it seems for the early too-big-check. v2: make sure we also exercise the I915_BO_ALLOC_CONTIGUOUS path, which is quite different, since for that we are actually limited by the largest power-of-two that we can fit within the region size. (Chris) Fixes: b908be543e44 ("drm/i915: support creating LMEM objects") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: CQ Tang <cq.tang@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201021103606.241395-1-matthew.auld@intel.com
2020-10-21	drm/i915/selftests: Flush the old heartbeat more gently	Chris Wilson
	In order to test how fast the heartbeat can respond, we measure with the interval set to its minimum. Before we measure though, we want to be sure we start with a fresh pulse, and so wait until any old one is complete. During that wait though, we were continually flushing the work, and so continually re-evaluating to see if the pulse was complete, and each attempt would count as an unresponsive system. If the engine did not complete the request in the couple of busy-spins, it would flag an error. This is unfortunate, so let's not busy-spin waiting for the old heartbeat, but terminate it and start afresh. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019142841.32273-1-chris@chris-wilson.co.uk
2020-10-20	drm/i915: Exclude low pages (128KiB) of stolen from use	Chris Wilson
	The GPU is trashing the low pages of its reserved memory upon reset. If we are using this memory for ringbuffers, then we will dutiful resubmit the trashed rings after the reset causing further resets, and worse. We must exclude this range from our own use. The value of 128KiB was found by empirical measurement (and verified now with a selftest) on gen9. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-2-chris@chris-wilson.co.uk
2020-10-20	drm/i915/gem: Poison stolen pages before use	Chris Wilson
	When allocating objects from stolen, memset() the backing store to POISON_INUSE (0x5a) to help identify any uninitialised use of a stolen object. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-1-chris@chris-wilson.co.uk
2020-10-19	drm/i915/gt: Onion unwind for scratch page allocation failure	Chris Wilson
	In switching to using objects for our ppGTT scratch pages, care was not taken to avoid trying to unref NULL objects on failure. And for gen6 ppGTT, it appears we forgot entirely to unwind after a partial allocation failure. Fixes: 89351925a477 ("drm/i915/gt: Switch to object allocations for page directories") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019083444.1286-1-chris@chris-wilson.co.uk
2020-10-16	drm/i915: Use the active reference on the vma while capturing	Chris Wilson
	During error capture, we need to take a reference to the vma from before the reset in order to catpure the contents of the vma later. Currently we are using both an active reference and a kref, but due to nature of the i915_vma reference handling, that kref is on the vma->obj and not the vma itself. This means the vma may be destroyed as soon as it is idle, that is in between the i915_active_release(&vma->active) and the i915_vma_put(vma): <3> [197.866181] BUG: KASAN: use-after-free in intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <3> [197.866339] Read of size 8 at addr ffff8881258cb800 by task gem_exec_captur/1041 <3> [197.866467] <4> [197.866512] CPU: 2 PID: 1041 Comm: gem_exec_captur Not tainted 5.9.0-g5e4234f97efba-kasan_200+ #1 <4> [197.866521] Hardware name: Intel Corp. Broxton P/Apollolake RVP1A, BIOS APLKRVPA.X64.0150.B11.1608081044 08/08/2016 <4> [197.866530] Call Trace: <4> [197.866549] dump_stack+0x99/0xd0 <4> [197.866760] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.866783] print_address_description.constprop.8+0x3e/0x60 <4> [197.866797] ? kmsg_dump_rewind_nolock+0xd4/0xd4 <4> [197.866819] ? lockdep_hardirqs_off+0xd4/0x120 <4> [197.867037] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867249] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867270] kasan_report.cold.10+0x1f/0x37 <4> [197.867492] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867710] intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867949] i915_gpu_coredump.part.29+0x150/0x7b0 [i915] <4> [197.868186] i915_capture_error_state+0x5e/0xc0 [i915] <4> [197.868396] intel_gt_handle_error+0x6eb/0xa20 [i915] <4> [197.868624] ? intel_gt_reset_global+0x370/0x370 [i915] <4> [197.868644] ? check_flags+0x50/0x50 <4> [197.868662] ? __lock_acquire+0xd59/0x6b00 <4> [197.868678] ? register_lock_class+0x1ad0/0x1ad0 <4> [197.868944] i915_wedged_set+0xcf/0x1b0 [i915] <4> [197.869147] ? i915_wedged_get+0x90/0x90 [i915] <4> [197.869371] ? i915_wedged_get+0x90/0x90 [i915] <4> [197.869398] simple_attr_write+0x153/0x1c0 <4> [197.869428] full_proxy_write+0xee/0x180 <4> [197.869442] ? __sb_start_write+0x1f3/0x310 <4> [197.869465] vfs_write+0x1a3/0x640 <4> [197.869492] ksys_write+0xec/0x1c0 <4> [197.869507] ? __ia32_sys_read+0xa0/0xa0 <4> [197.869525] ? lockdep_hardirqs_on_prepare+0x32b/0x4e0 <4> [197.869541] ? syscall_enter_from_user_mode+0x1c/0x50 <4> [197.869566] do_syscall_64+0x33/0x80 <4> [197.869579] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <4> [197.869590] RIP: 0033:0x7fd8b7aee281 <4> [197.869604] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53 <4> [197.869613] RSP: 002b:00007ffea3b72008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 <4> [197.869625] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd8b7aee281 <4> [197.869633] RDX: 0000000000000002 RSI: 00007fd8b81a82e7 RDI: 000000000000000d <4> [197.869641] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000034 <4> [197.869650] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd8b81a82e7 <4> [197.869658] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000 <3> [197.869707] <3> [197.869757] Allocated by task 1041: <4> [197.869833] kasan_save_stack+0x19/0x40 <4> [197.869843] __kasan_kmalloc.constprop.5+0xc1/0xd0 <4> [197.869853] kmem_cache_alloc+0x106/0x8e0 <4> [197.870059] i915_vma_instance+0x212/0x1930 [i915] <4> [197.870270] eb_lookup_vmas+0xe06/0x1d10 [i915] <4> [197.870475] i915_gem_do_execbuffer+0x131d/0x4080 [i915] <4> [197.870682] i915_gem_execbuffer2_ioctl+0x103/0x5d0 [i915] <4> [197.870701] drm_ioctl_kernel+0x1d2/0x270 <4> [197.870710] drm_ioctl+0x40d/0x85c <4> [197.870721] __x64_sys_ioctl+0x10d/0x170 <4> [197.870731] do_syscall_64+0x33/0x80 <4> [197.870740] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <3> [197.870748] <3> [197.870798] Freed by task 22: <4> [197.870865] kasan_save_stack+0x19/0x40 <4> [197.870875] kasan_set_track+0x1c/0x30 <4> [197.870884] kasan_set_free_info+0x1b/0x30 <4> [197.870894] __kasan_slab_free+0x111/0x160 <4> [197.870903] kmem_cache_free+0xcd/0x710 <4> [197.871109] i915_vma_parked+0x618/0x800 [i915] <4> [197.871307] __gt_park+0xdb/0x1e0 [i915] <4> [197.871501] ____intel_wakeref_put_last+0xb1/0x190 [i915] <4> [197.871516] process_one_work+0x8dc/0x15d0 <4> [197.871525] worker_thread+0x82/0xb30 <4> [197.871535] kthread+0x36d/0x440 <4> [197.871545] ret_from_fork+0x22/0x30 <3> [197.871553] <3> [197.871602] The buggy address belongs to the object at ffff8881258cb740 which belongs to the cache i915_vma of size 968 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2553 Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201016092527.29039-1-chris@chris-wilson.co.uk
2020-10-16	drm/i915/gt: Confirm the context survives execution	Chris Wilson
	Repeat our sanitychecks from before execution to after execution. One expects that if we were to see these, the gpu would already be on fire, but the timing may be informative. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015190816.31763-1-chris@chris-wilson.co.uk
2020-10-16	drm/i915/gt: Cleanup kasan warning for on-stack (unsigned long) casting	Chris Wilson
	Kasan is gving a warning for passing a u32 parameter into find_first_bit (casting to a unsigned long *, with appropriate length restrictions): [ 44.678262] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x2e/0x50 [ 44.678295] Read of size 8 at addr ffff888233f4fc30 by task core_hotunplug/474 [ 44.678326] [ 44.678358] CPU: 0 PID: 474 Comm: core_hotunplug Not tainted 5.9.0+ #608 [ 44.678465] Hardware name: BESSTAR (HK) LIMITED GN41/Default string, BIOS BLT-BI-MINIPC-F4G-EX3R110-GA65A-101-D 10/12/2018 [ 44.678500] Call Trace: [ 44.678534] dump_stack+0x84/0xba [ 44.678569] print_address_description.constprop.0+0x21/0x220 [ 44.678605] ? kmsg_dump_rewind_nolock+0x5f/0x5f [ 44.678638] ? _raw_spin_lock_irqsave+0x6d/0xb0 [ 44.678669] ? _raw_write_lock_irqsave+0xb0/0xb0 [ 44.678702] ? set_task_cpu+0x1e0/0x1e0 [ 44.678733] ? find_first_bit+0x2e/0x50 [ 44.678763] kasan_report.cold+0x20/0x42 [ 44.678794] ? find_first_bit+0x2e/0x50 [ 44.678825] __asan_load8+0x69/0x90 [ 44.678856] find_first_bit+0x2e/0x50 [ 44.679027] __caps_show.isra.0+0x9e/0x1f0 [i915] Since we are only using the shorter type for our own convenience, accommodate kasan and use unsigned long. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201013110845.16127-1-chris@chris-wilson.co.uk
2020-10-16	drm/i915/gt: Undo forced context restores after trivial preemptions	Chris Wilson
	We may try to preempt the currently executing request, only to find that after unravelling all the dependencies that the original executing context is still the earliest in the topological sort and re-submitted back to HW (if we do detect some change in the ELSP that requires re-submission). However, due to the way we check for wrap-around during the unravelling, we mark any context that has been submitted just once (i.e. with the rq->wa_tail set, but the ring->tail earlier) as potentially wrapping and requiring a forced restore on resubmission. This was expected to be not a problem, as it was anticipated that most unwinding for preemption would result in a context switch and the few that did not would be lost in the noise. It did not take long for someone to find one particular workload where the cost of those extra context restores was measurable. However, since we know the wa_tail is of fixed size, and we know that a request must be larger than the wa_tail itself, we can safely maintain the check for request wrapping and check against a slightly future point in the ring that includes an expected wa_tail. (That is if the ring->tail is already set to rq->wa_tail, including another 8 bytes in the check does not invalidate the incremental wrap detection.) Fixes: 8ab3a3812aa9 ("drm/i915/gt: Incrementally check for rewinding") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
2020-10-16	drm/i915/gt: Delay execlist processing for tgl	Chris Wilson
	When running gem_exec_nop, it floods the system with many requests (with the goal of userspace submitting faster than the HW can process a single empty batch). This causes the driver to continually resubmit new requests onto the end of an active context, a flood of lite-restore preemptions. If we time this just right, Tigerlake hangs. Inserting a small delay between the processing of CS events and submitting the next context, prevents the hang. Naturally it does not occur with debugging enabled. The suspicion then is that this is related to the issues with the CS event buffer, and inserting an mmio read of the CS pointer status appears to be very successful in preventing the hang. Other registers, or uncached reads, or plain mb, do not prevent the hang, suggesting that register is key -- but that the hang can be prevented by a simple udelay, suggests it is just a timing issue like that encountered by commit 233c1ae3c83f ("drm/i915/gt: Wait for CSB entries on Tigerlake"). Also note that the hang is not prevented by applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU between requests. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
2020-10-15	drm/i915/gem: Support parsing of oversize batches	Chris Wilson
	Matthew Auld noted that on more recent systems (such as the parser for gen9) we may have objects that are larger than expected by the GEM uAPI (i.e. greater than u32). These objects would have incorrect implicit batch lengths, causing the parser to reject them for being incomplete, or worse. Based on a patch by Matthew Auld. Reported-by: Matthew Auld <matthew.auld@intel.com> Fixes: 435e8fc059db ("drm/i915: Allow parsing of unsized batches") Testcase: igt/gem_exec_params/larger-than-life-batch Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20201015115954.871-1-chris@chris-wilson.co.uk
2020-10-15	drm/i915/gt: Fixup tgl mocs for PTE tracking	Chris Wilson
	Forcing mocs:1 [used for our winsys follows-pte mode] to be cached caused display glitches. Though it is documented as deprecated (and so likely behaves as uncached) use the follow-pte bit and force it out of L3 cache. Fixes: 4d8a5cfe3b13 ("drm/i915/gt: Initialize reserved and unspecified MOCS indices") Testcase: igt/kms_frontbuffer_tracking Testcase: igt/kms_big_fb Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk
2020-10-15	drm/i915: Enable eLLC caching of display buffers for SKL+	Ville Syrjälä
	Since SKL the eLLC has been sitting on the far side of the system agent, meaning the display engine can utilize it. Let's enable that. I chose WB for the caching mode, because my numbers are indicating that WT might actually be WB and WC might actually be UC. I'm not 100% sure that is indeed the case but at least my simple rendercopy based benchmark didn't see any difference in performance. Also if I configure things to do LLCeLLC+WT I still get cache dirt on my screen, suggesting that is in fact operating in WB mode anyway. This is also the reason I had to fix the MOCS target cache to really say PTE rather than LLC+eLLC. Since SKL the eLLC has been sitting on the far side of the system agent, meaning the display engine can utilize it. Let's enable that. Eero's earlier benchmarks numbers: "* Results in GfxBench and Unigine (Valley/Heaven) tests were within daily variation on the tested SKL machines * SKL GT4e (128MB eLLC) / Wayland / Weston: +15-20% SynMark TexMem512 (512MB of textures) +4-6% SynMark TerrainFly, CSCloth, ShMapVsm -5-10% SynMark TexMem128 (128MB of textures) SKL GT3e (64MB eLLC) / Xorg / Unity: +4-8% GpuTest Triangle fullscreen (FullHD) -5-10% GpuTest Triangle windowed (1/2 screen) * SKL GT2 (no eLLC) / Xorg / Unity: * Some of the higher FPS SynMark pixel and vertex shader tests are few percent higher, more than daily variance => Do you see any reason why this machine would be impacted although it doesn't eLLC?" Caveats: - Still haven't tested with a prime setup - Still not entirely sure this a good idea, but I've been using it on my cfl anyway :) v2: Split the MOCS PTE change out Cc: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-3-ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-3-chris@chris-wilson.co.uk
2020-10-15	drm/i915: Fix MOCS PTE setting for gen9+	Ville Syrjälä
	Fix up the MOCS PTE setting to really get the LLC cacheability from the PTE rather than hardocoding it to LLC or LLC+eLLC. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-2-ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-2-chris@chris-wilson.co.uk
2020-10-15	drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup ↵	Ville Syrjälä
	during fbdev init Currently we leave the cache_level of the initial fb obj set to NONE. This means on eLLC machines the first pin_to_display() will try to switch it to WT which requires a vma unbind+bind. If that happens during the fbdev initialization rcu does not seem operational which causes the unbind to get stuck. To most appearances this looks like a dead machine on boot. Avoid the unbind by already marking the object cache_level as WT when creating it. We still do an excplicit ggtt pin which will rewrite the PTEs anyway, so they will match whatever cache level we set. Cc: <stable@vger.kernel.org> # v5.7+ Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-1-ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-1-chris@chris-wilson.co.uk
2020-10-13	drm/i915/gt: Initialize reserved and unspecified MOCS indices	Ayaz A Siddiqui
	In order to avoid functional breakage of mis-programmed applications that have grown to depend on unused MOCS entries, we are programming those entries to be equal to fully cached ("L3 + LLC") entry. These reserved and unspecified entries should not be used as they may be changed to less performant variants with better coherency in the future if more entries are needed. v2: As suggested by Lucas De Marchi to utilise __init_mocs_table for programming default value, setting I915_MOCS_PTE index of tgl_mocs_table with desired value. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Francisco Jerez <currojerez@riseup.net> Cc: Mathew Alwin <alwin.mathew@intel.com> Cc: Mcguire Russell W <russell.w.mcguire@intel.com> Cc: Spruit Neil R <neil.r.spruit@intel.com> Cc: Zhou Cheng <cheng.zhou@intel.com> Cc: Benemelis Mike G <mike.g.benemelis@intel.com> Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200729102539.134731-2-ayaz.siddiqui@intel.com Cc: stable@vger.kernel.org
2020-10-07	drm/i915/gt: Track the most recent pulse for the heartbeat	Chris Wilson
	Since we track the idle_pulse for flushing the barriers and avoid re-emitting the pulse upon idling if no futher action is required, this also impacts the heartbeat. Before emitting a fresh heartbeat, we look at the engine idle status and assume that if the pulse was the last request emitted along the heartbeat, the engine is idling and a heartbeat pulse not required. This assumption fails, but we can reuse the idle pulse as the heartbeat if we are yet to emit one, and so track the status of that pulse for our engine health check. This impacts tgl/rcs0 as we rely on the heartbeat for our healthcheck for the normal preemption detection mechanism is disabled by default. Testcase: igt/gem_exec_schedule/preempt-hang/rcs0 #tgl Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201006094653.7558-1-chris@chris-wilson.co.uk
2020-10-06	drm/i915: Fix DMA mapped scatterlist lookup	Tvrtko Ursulin
	As the previous patch fixed the places where we walk the whole scatterlist for DMA addresses, this patch fixes the random lookup functionality. To achieve this we have to add a second lookup iterator and add a i915_gem_object_get_sg_dma helper, to be used analoguous to existing i915_gem_object_get_sg_dma. Therefore two lookup caches are maintained per object and they are flushed at the same point for simplicity. (Strictly speaking the DMA cache should be flushed from i915_gem_gtt_finish_pages, but today this conincides with unsetting of the pages in general.) Partial VMA view is then fixed to use the new DMA lookup and properly query sg length. v2: * Checkpatch. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Tom Murphy <murphyt7@tcd.ie> Cc: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-2-tvrtko.ursulin@linux.intel.com
2020-10-06	drm/i915: Fix DMA mapped scatterlist walks	Tvrtko Ursulin
	When walking DMA mapped scatterlists sg_dma_len has to be used since it can be different (coalesced) from the backing store entry. This also means we have to end the walk when encountering a zero length DMA entry and cannot rely on the normal sg list end marker. Both issues were there in theory for some time but were hidden by the fact Intel IOMMU driver was never coalescing entries. As there are ongoing efforts to change this we need to start handling it. v2: * Use unsigned int for local storing sg_dma_len. (Logan) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> References: 85d1225ec066 ("drm/i915: Introduce & use new lightweight SGL iterators") References: b31144c0daa8 ("drm/i915: Micro-optimise gen6_ppgtt_insert_entries()") Reported-by: Tom Murphy <murphyt7@tcd.ie> Suggested-by: Tom Murphy <murphyt7@tcd.ie> # __sgt_iter Suggested-by: Logan Gunthorpe <logang@deltatee.com> # __sgt_iter Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-1-tvrtko.ursulin@linux.intel.com
2020-10-06	drm/i915/gt: Scrub HW state on remove	Chris Wilson
	Currently we do a final scrub of the HW state upon release. However, when rebinding the device, this is too late as the device may either have been partially rebound or the device is no longer accessible. If the device has been removed before release, the reset goes astray leaving the device in an inconsistent state, unlikely to work without a full PCI reset. Furthermore, if the device is partially rebound before the HW scrubbing, there may be leftover HW state that should have been scrubbed. Either way, we need to push the scrubbing earlier before the removal, so into unregister. The danger is that on older machines, resetting the GPU also impact the display engine and so the reset should be after modesetting is disabled (and before reuse we need to recover modesetting). Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2508 Testcase: igt/core_hotunplug Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200929112639.24223-1-chris@chris-wilson.co.uk
2020-10-06	drm/i915: Skip over MI_NOOP when parsing	Chris Wilson
	Though less likely in practice, igt uses MI_NOOP frequently to pad out its batch buffers. The lookup and valiation of so many MI_NOOP command descriptions is noticeable, though the side-effect of poisoning the last-validated-command cache is more likely to impact upon real CS. Testcase: igt/gen9_exec_parse/bb-large Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201001102632.18789-1-chris@chris-wilson.co.uk
2020-09-30	drm/i915/gt: Retire cancelled requests on unload	Chris Wilson
	If we manage to hit the intel_gt_set_wedged_on_fini() while active, i.e. module unload during a stress test, we may cancel the requests but not clean up. This leads to a very slow module unload as we wait for something or other to trigger the retirement flushing, or timeout and unload with a bunch of warnings. Instead if we explicitly cancel then cleanup on an active unload, it should be instant and quiet. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200930163253.2789-3-chris@chris-wilson.co.uk
2020-09-30	drm/i915/selftests: Finish pending mock requests on cancellation.	Chris Wilson
	Flush all the pending requests from the mock engine when they are cancelled. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200930163253.2789-2-chris@chris-wilson.co.uk
2020-09-30	drm/i915/gt: Signal cancelled requests	Chris Wilson
	After marking the requests on an engine as cancelled upon wedging, send any signals for their completions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200930163253.2789-1-chris@chris-wilson.co.uk
2020-09-29	drm/i915: Avoid mixing integer types during batch copies	Chris Wilson
	Be consistent and use unsigned long throughout the chunk copies to avoid the inherent clumsiness of mixing integer types of different widths and signs. Failing to take acount of a wider unsigned type when using min_t can lead to treating it as a negative, only for it flip back to a large unsigned value after passing a boundary check. Fixes: ed13033f0287 ("drm/i915/cmdparser: Only cache the dst vmap") Testcase: igt/gen9_exec_parse/bb-large Reported-by: "Candelaria, Jared" <jared.candelaria@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: "Candelaria, Jared" <jared.candelaria@intel.com> Cc: "Bloomfield, Jon" <jon.bloomfield@intel.com> Cc: <stable@vger.kernel.org> # v4.9+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200928215942.31917-1-chris@chris-wilson.co.uk
2020-09-29	drm/i915/gem: Always test execution status on closing the context	Chris Wilson
	Verify that if a context is active at the time it is closed, that it is either persistent and preemptible (with hangcheck running) or it shall be removed from execution. Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs") Testcase: igt/gem_ctx_persistence/heartbeat-close Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-3-chris@chris-wilson.co.uk
2020-09-29	drm/i915/gt: Always send a pulse down the engine after disabling heartbeat	Chris Wilson
	Currently, we check we can send a pulse prior to disabling the heartbeat to verify that we can change the heartbeat, but since we may re-evaluate execution upon changing the heartbeat interval we need another pulse afterwards to refresh execution. v2: Tvrtko asked if we could reduce the double pulse to a single, which opened up a discussion of how we should handle the pulse-error after attempting to change the property, and the desire to serialise adjustment of the property with its validating pulse, and unwind upon failure. Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-2-chris@chris-wilson.co.uk
2020-09-29	drm/i915: Cancel outstanding work after disabling heartbeats on an engine	Chris Wilson
	We only allow persistent requests to remain on the GPU past the closure of their containing context (and process) so long as they are continuously checked for hangs or allow other requests to preempt them, as we need to ensure forward progress of the system. If we allow persistent contexts to remain on the system after the the hangcheck mechanism is disabled, the system may grind to a halt. On disabling the mechanism, we sent a pulse along the engine to remove all executing contexts from the engine which would check for hung contexts -- but we did not prevent those contexts from being resubmitted if they survived the final hangcheck. Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs") Testcase: igt/gem_ctx_persistence/heartbeat-stop Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk
2020-09-26	drm/i915/gem: Hold request reference for canceling an active context	Chris Wilson
	We have to be very careful while walking the timeline->requests list under the RCU guard, as the requests (and so rq->link) use SLAB_TYPESAFE_BY_RCU and so the requests may be reallocated within an rcu grace period. As the requests are reallocated, they are removed from one list and placed on another, and if we are iterating over that request at that moment, the list iteration jumps from one list to the next and promptly gets confused. Verify we hold the request reference to ensure that the request is not added to a new list behind our backs. <4> [582.745252] general protection fault, probably for non-canonical address 0xcccccccccccccd5c: 0000 [#1] PREEMPT SMP PTI <4> [582.745297] CPU: 0 PID: 1475 Comm: gem_ctx_persist Not tainted 5.9.0-rc1-CI-CI_DRM_8908+ #1 <4> [582.745304] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018 <4> [582.745317] RIP: 0010:__lock_acquire+0x2c3/0x1f40 <4> [582.745323] Code: 00 65 8b 05 c7 8a ef 7e 85 c0 0f 85 b4 07 00 00 44 8b 9d c4 08 00 00 45 85 db 0f 84 0f 01 00 00 ba 05 00 00 00 e9 c8 06 00 00 <48> 81 3f c0 89 c7 82 b8 00 00 00 00 41 0f 45 c0 83 fe 01 41 89 c3 <4> [582.745334] RSP: 0018:ffffc9000461bc40 EFLAGS: 00010002 <4> [582.745340] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 <4> [582.745345] RDX: 0000000000000000 RSI: 0000000000000000 RDI: cccccccccccccd5c <4> [582.745350] RBP: ffff8881ec4a2880 R08: 0000000000000001 R09: 0000000000000001 <4> [582.745356] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 <4> [582.745361] R13: 0000000000000000 R14: 0000000000000000 R15: cccccccccccccd5c <4> [582.745367] FS: 00007fb44da78e40(0000) GS:ffff888278000000(0000) knlGS:0000000000000000 <4> [582.745373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [582.745378] CR2: 00007fb44daad040 CR3: 0000000268428000 CR4: 0000000000350ef0 <4> [582.745383] Call Trace: <4> [582.745390] ? __lock_acquire+0x913/0x1f40 <4> [582.745397] lock_acquire+0xb5/0x3c0 <4> [582.745526] ? kill_engines+0x19a/0x4b0 [i915] <4> [582.745533] ? find_held_lock+0x2d/0x90 <4> [582.745541] _raw_spin_lock_irq+0x30/0x40 <4> [582.745635] ? kill_engines+0x19a/0x4b0 [i915] <4> [582.745727] kill_engines+0x19a/0x4b0 [i915] <4> [582.745820] context_close+0x195/0x410 [i915] <4> [582.745912] i915_gem_context_close+0x5b/0x160 [i915] <4> [582.745994] i915_driver_postclose+0x14/0x40 [i915] <4> [582.746003] drm_file_free.part.13+0x240/0x290 <4> [582.746009] drm_release_noglobal+0x16/0x50 <4> [582.746016] __fput+0xa5/0x250 <4> [582.746021] task_work_run+0x6e/0xb0 <4> [582.746028] exit_to_user_mode_prepare+0x178/0x180 <4> [582.746034] syscall_exit_to_user_mode+0x36/0x220 <4> [582.746040] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <4> [582.746045] RIP: 0033:0x7fb44d1dc421 <4> [582.746050] Code: f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 8b 05 ea cf 20 00 85 c0 75 16 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10 <4> [582.746062] RSP: 002b:00007ffed2e83818 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 <4> [582.746069] RAX: 0000000000000000 RBX: 0000556410bfe840 RCX: 00007fb44d1dc421 <4> [582.746075] RDX: 000000000000000a RSI: 00000000c0406469 RDI: 0000000000000008 <4> [582.746080] RBP: 0000000000000008 R08: 00007fb44d1c51cc R09: 00007fb44d1c5240 <4> [582.746086] R10: 0000000000000001 R11: 0000000000000246 R12: 00000000fffffffb <4> [582.746091] R13: 0000000000000006 R14: 0000000000000000 R15: 000000000000000a <4> [582.746099] Modules linked in: vgem mei_hdcp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio btusb btrtl btbcm btintel x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul bluetooth ghash_clmulni_intel ecdh_generic ecc i915 r8169 realtek mei_me mei snd_hda_intel i2c_hid snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm pinctrl_geminilake pinctrl_intel prime_numbers [last unloaded: test_drm_mm] Fixes: 736e785f9b28 ("drm/i915/gem: Reduce context termination list iteration guard to RCU") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200925101107.27869-2-chris@chris-wilson.co.uk
2020-09-26	drm/i915: Redo "Remove i915_request.lock requirement for execution callbacks"	Chris Wilson
	The reordering and rebasing of commit 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks") caused it to revert an earlier correction. Let us restore commit 99f0a640d464 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs") Fixes: 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200925101107.27869-1-chris@chris-wilson.co.uk
2020-09-24	drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex	Chris Wilson
	Since the debugfs may peek into the GEM contexts as the corresponding client/fd is being closed, we may try and follow a dangling pointer. However, the context closure itself is serialised with the ctx->mutex, so if we hold that mutex as we inspect the state coupled in the context, we know the pointers within the context are stable and will remain valid as we inspect their tables. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: CQ Tang <cq.tang@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200723172119.17649-3-chris@chris-wilson.co.uk
2020-09-23	drm/i915/uc: tune down GuC communication enabled/disabled messages	Jani Nikula
	The GuC communication enabled/disabled messages are too noisy in info level. Convert them from info to debug level, and switch to device based logging while at it. Reported-by: Karol Herbst <kherbst@redhat.com> Cc: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200917165056.29766-1-jani.nikula@intel.com
2020-09-22	drm/i915: check i915_vm_alloc_pt_stash for errors	Matthew Auld
	If we are really unlucky and encounter an error during i915_vm_alloc_pt_stash, we end up passing an empty pt/pd stash all the way down into the low-level ppgtt alloc code, leading to explosions, since it expects at least the required number of pt/pd for the va range. [ 211.981418] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 211.981421] #PF: supervisor read access in kernel mode [ 211.981422] #PF: error_code(0x0000) - not-present page [ 211.981424] PGD 80000008439cb067 P4D 80000008439cb067 PUD 84a37f067 PMD 0 [ 211.981427] Oops: 0000 [#1] SMP PTI [ 211.981428] CPU: 1 PID: 1301 Comm: i915_selftest Tainted: G U I 5.9.0-rc5+ #3 [ 211.981430] Hardware name: /NUC6i7KYB, BIOS KYSKLi70.86A.0050.2017.0831.1924 08/31/2017 [ 211.981521] RIP: 0010:__gen8_ppgtt_alloc+0x1ed/0x3c0 [i915] [ 211.981523] Code: c1 48 c7 c7 5d 5d fe c0 65 ff 0d ee 1d 03 3f e8 d9 91 1f e2 8b 55 c4 31 c0 48 8b 75 b8 85 d2 0f 95 c0 48 8b 1c c6 48 89 45 98 <48> 8b 03 48 8b 90 58 02 00 00 48 85 d2 0f 84 07 ea 15 00 48 81 fa [ 211.981526] RSP: 0018:ffffba2cc0eb3970 EFLAGS: 00010202 [ 211.981527] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000004 [ 211.981529] RDX: 0000000000000002 RSI: ffff9be998bdb8c0 RDI: ffff9be99c844300 [ 211.981530] RBP: ffffba2cc0eb39d8 R08: 0000000000000640 R09: ffff9be97cdfd000 [ 211.981531] R10: ffff9be97cdfd614 R11: 0000000000000000 R12: 0000000000000000 [ 211.981532] R13: ffff9be98607ba20 R14: ffff9be995a0b400 R15: ffffba2cc0eb39e8 [ 211.981534] FS: 00007f0f10b31000(0000) GS:ffff9be99fc40000(0000) knlGS:0000000000000000 [ 211.981536] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 211.981538] CR2: 0000000000000000 CR3: 000000084d74e006 CR4: 00000000003706e0 [ 211.981539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 211.981541] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 211.981542] Call Trace: [ 211.981609] gen8_ppgtt_alloc+0x79/0x90 [i915] [ 211.981678] ppgtt_bind_vma+0x36/0x80 [i915] [ 211.981756] __vma_bind+0x39/0x40 [i915] [ 211.981818] fence_work+0x21/0x98 [i915] [ 211.981879] fence_notify+0x8d/0x128 [i915] [ 211.981939] __i915_sw_fence_complete+0x62/0x240 [i915] [ 211.982018] i915_vma_pin_ww+0x1ee/0x9c0 [i915] Fixes: cd0452aa2a0d ("drm/i915: Preallocate stashes for vma page-directories") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200921160844.73186-1-matthew.auld@intel.com
2020-09-21	drm/i915: Fix uninitialised variable in intel_context_create_request.	Maarten Lankhorst
	In case backoff fails with an error, we return an undefined rq, assign err to rq correctly. Fixes: 8a929c9eb1c2 ("drm/i915: Use ww pinning for intel_context_create_request()") Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200918111208.1392128-1-maarten.lankhorst@linux.intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-09-18	drm/i915/gt: Remove defunct intel_virtual_engine_get_sibling()	Chris Wilson
	As the last user was eliminated in commit e21fecdcde40 ("drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs"), we can remove the function. One less implementation detail creeping beyond its scope. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200826132811.17577-16-chris@chris-wilson.co.uk
2020-09-18	drm/i915: Reduce GPU error capture mutex hold time	Chris Wilson
	Shrink the hold time for the error capture mutex to just around the acquire/release of the PTE used for reading back the object via the Global GTT. For platforms that do not need the GGTT read back, we can skip the mutex entirely and allow concurrent error capture. Where we do use the GGTT, by restricting the hold time around the slow readback and compression, we are more resilient against softlockups (khungtaskd) as the heartbeat may well also trigger an error while the first is on going, and this allows the heartbeat reset to skip past the capture and not be stalled. Testcase: igt/gem_exec_capture/many-* Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-3-chris@chris-wilson.co.uk
2020-09-18	drm/i915: Break up error capture compression loops with cond_resched()	Chris Wilson
	As the error capture will compress user buffers as directed to by the user, it can take an arbitrary amount of time and space. Break up the compression loops with a call to cond_resched(), that will allow other processes to schedule (avoiding the soft lockups) and also serve as a warning should we try to make this loop atomic in the future. Testcase: igt/gem_exec_capture/many-* Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-2-chris@chris-wilson.co.uk
2020-09-18	drm/i915/gt: Show engine properties in the pretty printer	Chris Wilson
	When debugging the engine state, include the user properties that may, or may not, have been adjusted by the user/test. For example, vecs0 ... Properties: heartbeat_interval_ms: 2500 [default 2500] max_busywait_duration_ns: 8000 [default 8000] preempt_timeout_ms: 640 [default 640] stop_timeout_ms: 100 [default 100] timeslice_duration_ms: 1 [default 1] Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200916090059.3189-1-chris@chris-wilson.co.uk
2020-09-17	drm/i915: Fix an error code i915_gem_object_copy_blt()	Dan Carpenter
	This code should use "vma[1]" instead of "vma". The "vma" variable is a valid pointer. Fixes: 6b05030496f7 ("drm/i915: Convert i915_gem_object/client_blt.c to use ww locking as well, v2.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200911075243.GG12635@kadam
2020-09-15	drm/i915/gt: Use a mmio read of the CSB in case of failure	Chris Wilson
	If we find the GPU didn't update the CSB within 50us, we currently fail and eventually reset the GPU. Lets report the value from the mmio space as a last resort, it may just stave off an unnecessary GPU reset. References: HSDES#22011327657 Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-4-chris@chris-wilson.co.uk
2020-09-15	drm/i915/gt: Apply the CSB w/a for all	Chris Wilson
	Since we expect to inline the csb_parse() routines, the w/a for the stale CSB data on Tigerlake will be pulled into process_csb(), and so we might as well simply reuse the logic for all, and so will hopefully avoid any strange behaviour on Icelake that was not covered by our previous w/a. References: d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-3-chris@chris-wilson.co.uk
2020-09-15	drm/i915/gt: Wait for CSB entries on Tigerlake	Chris Wilson
	On Tigerlake, we are seeing a repeat of commit d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries") where, presumably, due to a missing Global Observation Point synchronisation, the write pointer of the CSB ringbuffer is updated _prior_ to the contents of the ringbuffer. That is we see the GPU report more context-switch entries for us to parse, but those entries have not been written, leading us to process stale events, and eventually report a hung GPU. However, this effect appears to be much more severe than we previously saw on Icelake (though it might be best if we try the same approach there as well and measure), and Bruce suggested the good idea of resetting the CSB entry after use so that we can detect when it has been updated by the GPU. By instrumenting how long that may be, we can set a reliable upper bound for how long we should wait for: 513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns) Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045 References: d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries") References: HSDES#22011327657, HSDES#1508287568 Suggested-by: Bruce Chang <yu.bruce.chang@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org # v5.4 Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk
2020-09-15	drm/i915/gt: Widen CSB pointer to u64 for the parsers	Chris Wilson
	A CSB entry is 64b, and it is simpler for us to treat it as an array of 64b entries than as an array of pairs of 32b entries. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk
2020-09-15	drm/i915/gt: Check for a registered driver with IPS	Chris Wilson
	If the ips module calls into the driver during an unbind/bind cycle, we may see the driver while it has unregistered itself from ips and try and dereference a NULL ips_mchdev pointer. <1> [211.928844] BUG: kernel NULL pointer dereference, address: 0000000000000014 <1> [211.928861] #PF: supervisor read access in kernel mode <1> [211.928871] #PF: error_code(0x0000) - not-present page <6> [211.928881] PGD 0 P4D 0 <4> [211.928890] Oops: 0000 [#1] PREEMPT SMP PTI <4> [211.928900] CPU: 3 PID: 327 Comm: ips-monitor Not tainted 5.9.0-rc5-CI-CI_DRM_9008+ #1 <4> [211.928914] Hardware name: Hewlett-Packard HP EliteBook 8440p/172A, BIOS 68CCU Ver. F.24 09/13/2013 <4> [211.929056] RIP: 0010:mchdev_get+0x5a/0x180 [i915] <4> [211.929067] Code: c0 5a 74 0d 80 3d f1 53 29 00 00 0f 84 ab 00 00 00 48 8b 1d c8 a8 29 00 e8 d3 18 89 e1 85 c0 74 09 80 3d d1 53 29 00 00 74 65 <8b> 4b 14 48 8d 7b 14 85 c9 0f 84 09 01 00 00 8d 51 01 89 c8 f0 0f <4> [211.929095] RSP: 0018:ffffc900002efe60 EFLAGS: 00010202 <4> [211.929105] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff8881297acf40 <4> [211.929118] RDX: 0000000000000000 RSI: ffffffff8264e2c0 RDI: ffff8881297ad820 <4> [211.929130] RBP: ffffc900002efe68 R08: ffff8881297ad820 R09: 00000000fffffffe <4> [211.929143] R10: ffff8881297acf40 R11: 00000000fff74c96 R12: ffff8881294dfa18 <4> [211.929155] R13: 0000000000000067 R14: ffff888126eff640 R15: ffff888126efe840 <4> [211.929168] FS: 0000000000000000(0000) GS:ffff888133d80000(0000) knlGS:0000000000000000 <4> [211.929182] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [211.929194] CR2: 0000000000000014 CR3: 0000000002610000 CR4: 00000000000006e0 <4> [211.929206] Call Trace: <4> [211.929294] i915_read_mch_val+0x15/0x380 [i915] <4> [211.929309] ? ips_monitor+0x3fb/0x630 [intel_ips] <4> [211.929321] ips_monitor+0x53c/0x630 [intel_ips] <4> [211.929334] ? ips_gpu_lower+0x30/0x30 [intel_ips] <4> [211.929348] kthread+0x14d/0x170 <4> [211.929358] ? kthread_park+0x80/0x80 <4> [211.929369] ret_from_fork+0x22/0x30 <4> [211.929382] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio i915 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm mei_me mei intel_ips lpc_ich ptp prime_numbers pps_core <4> [211.929437] CR2: 0000000000000014 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200915105113.26564-1-chris@chris-wilson.co.uk
2020-09-15	drm/i915/gt: Clear the buffer pool age before use	Chris Wilson
	If we create a new node, it is possible for the slab allocator to return us a recently freed node. If that node was just retired, it will retain the current jiffy as its node->age. There is then a miniscule window, where as that node is retired, it will appear on the free list with an incorrect age and be eligible for reuse by one thread, and then by a second thread as the correct node->age is written. Fixes: 06b73c2d0b65 ("drm/i915/gt: Delay taking the spinlock for grabbing from the buffer pool") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-3-chris@chris-wilson.co.uk
2020-09-15	drm/i915/gem: Prevent using pgprot_writecombine() if PAT is not supported	Chris Wilson
	Let's not try and use PAT attributes for I915_MAP_WC if the CPU doesn't support PAT. Fixes: 6056e50033d9 ("drm/i915/gem: Support discontiguous lmem object maps") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-2-chris@chris-wilson.co.uk
2020-09-15	drm/i915/gem: Avoid implicit vmap for highmem on x86-32	Chris Wilson
	On 32b, highmem using a finite set of indirect PTE (i.e. vmap) to provide virtual mappings of the high pages. As these are finite, map_new_virtual() must wait for some other kmap() to finish when it runs out. If we map a large number of objects, there is no method for it to tell us to release the mappings, and we deadlock. However, if we make an explicit vmap of the page, that uses a larger vmalloc arena, and also has the ability to tell us to release unwanted mappings. Most importantly, it will fail and propagate an error instead of waiting forever. Fixes: fb8621d3bee8 ("drm/i915: Avoid allocating a vmap arena for a single page") #x86-32 References: e87666b52f00 ("drm/i915/shrinker: Hook up vmap allocation failure notifier") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v4.7+ Link: https://patchwork.freedesktop.org/patch/msgid/20200915091417.4086-1-chris@chris-wilson.co.uk
2020-09-07	drm/i915: Unlock the shared hwsp_gtt object after pinning	Thomas Hellström
	The hwsp_gtt object is used for sub-allocation and could therefore be shared by many contexts causing unnecessary contention during concurrent context pinning. However since we're currently locking it only for pinning, it remains resident until we unpin it, and therefore it's safe to drop the lock early, allowing for concurrent thread access. Signed-off-by: Thomas Hellström <thomas.hellstrom@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>