summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2018-07-16drm: re-enable error handlingNicholas Mc Guire
drm_legacy_ctxbitmap_next() returns idr_alloc() which can return -ENOMEM, -EINVAL or -ENOSPC none of which are -1 . but the call sites of drm_legacy_ctxbitmap_next() seem to be assuming that the error case would be -1 (original return of drm_ctxbitmap_next() prior to 2.6.23 was actually -1). Thus reenable error handling by checking for < 0. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Fixes: 62968144e673 ("drm: convert drm context code to use Linux idr") Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/1531571532-22733-1-git-send-email-hofrat@osadl.org
2018-07-16drm/i915/execlists: Disable submission tasklet upon wedgingChris Wilson
If we declare the driver wedged before the GPU truly is, then we may see the GPU complete some CS events following our cancellation. This leaves us quite confused as we deleted all the bookkeeping and thus complain about the inconsistent state. We can just ignore the remaining events and let the GPU idle by not feeding it, and so avoid trying to racily overwrite shared state. We rely on there being a full GPU reset before unwedging, giving us the opportunity to reset the shared state. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107188 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180716080332.32283-4-chris@chris-wilson.co.uk
2018-07-16drm/i915: Remove pci private pointer after destroying the device privateChris Wilson
On an aborted module load, we unwind and free our device private - but we left a dangling pointer to our privates inside the pci_device. After the attempted aborted unload, we may still get a call to i915_pci_remove() when the module is removed, potentially chasing stale data. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180716080332.32283-5-chris@chris-wilson.co.uk
2018-07-16drm/i915/selftests: Downgrade igt_timeout messageChris Wilson
Give in, since CI continues to incorrectly insist that KERN_NOTICE is a warning and flags the timeout message as unwanted spam. At first, the intention was to use the message to indicate which tests might warrant an extended run, but virtually all tests require a timeout so it is simply not as interesting as first thought. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103667 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180716080332.32283-6-chris@chris-wilson.co.uk
2018-07-16drm/meson: Make DMT timings parameters and pixel clock genericNeil Armstrong
Remove the modes timings tables for DMT modes and calculate the HW paremeters from the modes timings. Switch the DMT modes pixel clock calculation out of the static frequency list to a generic calculation from a range of possible PLL dividers. This patch is an intermediate step towards usage of the Common Clock Framwework for PLL setup, by reworking the code to have common sel_pll() function called by the CEA (HDMI) freq setup and the generic DMT frequencies setup, we should be able to simply call clk_set_rate() on the PLL clock handle in a near future. The CEA (HDMI) and CVBS modes needs very specific clock paths that CCF will never be able to determine by itself, so there is still some work to do for a full handoff to CCF handling the clocks. This setup permits setting non-CEA modes like : - 1600x900-60Hz - 1280x1024-75Hz - 1280x1024-60Hz - 1440x900-60Hz - 1366x768-60Hz - 1280x800-60Hz - 1152x864-75Hz - 1024x768-75Hz - 1024x768-70Hz - 1024x768-60Hz - 832x624-75Hz - 800x600-75Hz - 800x600-72Hz - 800x600-60Hz - 640x480-75Hz - 640x480-73Hz - 640x480-67Hz Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Jerome Brunet <jbrunet@baylibre.com> [narmstrong: fixed trivial checkpatch issues] Link: https://patchwork.freedesktop.org/patch/msgid/1531726814-14638-1-git-send-email-narmstrong@baylibre.com
2018-07-16drm/nouveau: tegra: Detach from ARM DMA/IOMMU mappingThierry Reding
Depending on the kernel configuration, early ARM architecture setup code may have attached the GPU to a DMA/IOMMU mapping that transparently uses the IOMMU to back the DMA API. Tegra requires special handling for IOMMU backed buffers (a special bit in the GPU's MMU page tables indicates the memory path to take: via the SMMU or directly to the memory controller). Transparently backing DMA memory with an IOMMU prevents Nouveau from properly handling such memory accesses and causes memory access faults. As a side-note: buffers other than those allocated in instance memory don't need to be physically contiguous from the GPU's perspective since the GPU can map them into contiguous buffers using its own MMU. Mapping these buffers through the IOMMU is unnecessary and will even lead to performance degradation because of the additional translation. One exception to this are compressible buffers which need large pages. In order to enable these large pages, multiple small pages will have to be combined into one large (I/O virtually contiguous) mapping via the IOMMU. However, that is a topic outside the scope of this fix and isn't currently supported. An implementation will want to explicitly create these large pages in the Nouveau driver, so detaching from a DMA/IOMMU mapping would still be required. Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Nicolas Chauvet <kwizart@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/secboot/acr: Remove VLA usageKees Cook
In the quest to remove all stack VLA usage from the kernel[1], this allocates the working buffers before starting the writing so it won't abort in the middle. This needs an initial walk of the lists to figure out how large the buffer should be. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Replace drm_dev_unref with drm_dev_putThomas Zimmermann
This patch unifies the naming of DRM functions for reference counting of struct drm_device. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Replace drm_gem_object_unreference_unlocked with put functionThomas Zimmermann
This patch unifies the naming of DRM functions for reference counting of struct drm_gem_object. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Replace drm_framebuffer_{un/reference} with put, get functionsThomas Zimmermann
This patch unifies the naming of DRM functions for reference counting of struct drm_framebuffer. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by: Thomas Zimmermann <tdz@users.sourceforge.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/nvif: remove const attribute from nvif_mclassNick Desaulniers
Similar to commit 0bf8bf50eddc ("module: Remove const attribute from alias for MODULE_DEVICE_TABLE") Fixes many -Wduplicate-decl-specifier warnings due to the combination of const typeof() of already const variables. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/hwmon: potential uninitialized variablesDan Carpenter
Smatch complains that "value" can be uninitialized when kstrtol() returns -ERANGE. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Fix runtime PM leak in drm_open()Lyude Paul
Noticed this as I was skimming through, if we fail to allocate memory for cli we'll end up returning without dropping the runtime PM ref we got. Additionally, we'll even return the wrong return code! (ret most likely will == 0 here, we want -ENOMEM). Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/debugfs: Wake up GPU before doing any reclockingKarol Herbst
Fixes various reclocking related issues on prime systems. Signed-off-by: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/bios/vpstate: There are some fermi vbios with no boost or tdp entryKarol Herbst
If the entry size is too small, default to invalid values for both boost_id and tdp_id, so as to default to the base clock in both cases. Signed-off-by: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Martin Peres <martin.peres@free.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/kms/nv50-: Allow vblank_disable_immediateMario Kleiner
With instantaneous high precision vblank timestamping that updates at leading edge of vblank, the emulated "hw vblank counter" from vblank timestamping, which increments at leading edge of vblank, and reliable page flip execution and completion at leading edge of vblank, we should meet the requirements for fast/ immediate vblank irq disable/enable. This is only allowed on nv50+ gpu's, ie. the ones with atomic modesetting. One requirement for immediate vblank disable is that high precision vblank timestamping works reliably all the time on all connectors. This is not the case on all pre-nv50 parts for analog VGA outputs, where we currently don't always have support for scanout position queries and therefore fall back to vblank interrupt timestamping. The implementation in nv04_head_state() does not return valid values for vblanks, vtotal, hblanks, htotal for VGA outputs on all cards, but those are needed for scanout position queries. Testing on Linux-4.12-rc5 + drm-next on a GeForce 9500 GT (NV G96) with timing measurement equipment indicates this works fine, so allow immediate vblank disable for power saving. For debugging in case of unexpected trouble, booting with kernel cmdline option drm.vblankoffdelay=0 (or echo 0 > /sys/module/drm/parameters/vblankoffdelay) would keep vblank irqs permanently on to approximate old behavior. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/kms/nv50-: remove duplicate assignmentBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/kms/nv50-: fix drm-get-put.cocci warningskbuild test robot
Use drm_*_get() and drm_*_put() helpers instead of drm_*_reference() and drm_*_unreference() helpers. Generated by: scripts/coccinelle/api/drm-get-put.cocci Fixes: 30ed49b55b6e ("drm/nouveau/kms/nv50-: move code underneath dispnv50/") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/disp/nv50-gp10x: fix coverity warningBen Skeggs
Change values to u32, there's no need for them to be 64-bit. Reported-by: Colin Ian King <colin.king@canonical.com> Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/core: ERR_PTR vs NULL bug in nvkm_engine_info()Ben Skeggs
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/mmu/gp10b: remove ghost fileJérôme Glisse
This ghost file have been haunting us. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/secboot/tegra: Enable gp20b/gp10b firmware tag when relevantNicolas Chauvet
This allows to have the related MODULE_FIRMWARE tag only on relevant arch (arm64). This will saves about 400k on initramfs when not relevant Signed-off-by: Nicolas Chauvet <kwizart@gmail.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/fault/gv100: fix fault buffer initialisationBen Skeggs
Not sure how this happened, it worked last time I tested it! Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/gr/gv100: handle multiple SM-per-TPC for shader exceptionsBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfsLyude Paul
Currently nouveau doesn't actually expose the state debugfs file that's usually provided for any modesetting driver that supports atomic, even if nouveau is loaded with atomic=1. This is due to the fact that the standard debugfs files that DRM creates for atomic drivers is called when drm_get_pci_dev() is called from nouveau_drm.c. This happens well before we've initialized the display core, which is currently responsible for setting the DRIVER_ATOMIC cap. So, move the atomic option into nouveau_drm.c and just add the DRIVER_ATOMIC cap whenever it's enabled on the kernel commandline. This shouldn't cause any actual issues, as the atomic ioctl will still fail as expected even if the display core doesn't disable it until later in the init sequence. This also provides the added benefit of being able to use the state debugfs file to check the current display state even if clients aren't allowed to modify it through anything other than the legacy ioctls. Additionally, disable the DRIVER_ATOMIC cap in nv04's display core, as this was already disabled there previously. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Remove bogus crtc check in pmops_runtime_idleLyude Paul
This both uses the legacy modesetting structures in a racy manner, and additionally also doesn't even check the right variable (enabled != the CRTC is actually turned on for atomic). This fixes issues on my P50 regarding the dedicated GPU not entering runtime suspend. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit()Lyude Paul
A CRTC being enabled doesn't mean it's on! It doesn't even necessarily mean it's being used. This fixes runtime PM leaks on the P50 I've got next to me. Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Avoid looping through fake MST connectorsLyude Paul
When MST and atomic were introduced to nouveau, another structure that could contain a drm_connector embedded within it was introduced; struct nv50_mstc. This meant that we no longer would be able to simply loop through our connector list and assume that nouveau_connector() would return a proper pointer for each connector, since the assertion that all connectors coming from nouveau have a full nouveau_connector struct became invalid. Unfortunately, none of the actual code that looped through connectors ever got updated, which means that we've been causing invalid memory accesses for quite a while now. An example that was caught by KASAN: [ 201.038698] ================================================================== [ 201.038792] BUG: KASAN: slab-out-of-bounds in nvif_notify_get+0x190/0x1a0 [nouveau] [ 201.038797] Read of size 4 at addr ffff88076738c650 by task kworker/0:3/718 [ 201.038800] [ 201.038822] CPU: 0 PID: 718 Comm: kworker/0:3 Tainted: G O 4.18.0-rc4Lyude-Test+ #1 [ 201.038825] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018 [ 201.038882] Workqueue: events nouveau_display_hpd_work [nouveau] [ 201.038887] Call Trace: [ 201.038894] dump_stack+0xa4/0xfd [ 201.038900] print_address_description+0x71/0x239 [ 201.038929] ? nvif_notify_get+0x190/0x1a0 [nouveau] [ 201.038935] kasan_report.cold.6+0x242/0x2fe [ 201.038942] __asan_report_load4_noabort+0x19/0x20 [ 201.038970] nvif_notify_get+0x190/0x1a0 [nouveau] [ 201.038998] ? nvif_notify_put+0x1f0/0x1f0 [nouveau] [ 201.039003] ? kmsg_dump_rewind_nolock+0xe4/0xe4 [ 201.039049] nouveau_display_init.cold.12+0x34/0x39 [nouveau] [ 201.039089] ? nouveau_user_framebuffer_create+0x120/0x120 [nouveau] [ 201.039133] nouveau_display_resume+0x5c0/0x810 [nouveau] [ 201.039173] ? nvkm_client_ioctl+0x20/0x20 [nouveau] [ 201.039215] nouveau_do_resume+0x19f/0x570 [nouveau] [ 201.039256] nouveau_pmops_runtime_resume+0xd8/0x2a0 [nouveau] [ 201.039264] pci_pm_runtime_resume+0x130/0x250 [ 201.039269] ? pci_restore_standard_config+0x70/0x70 [ 201.039275] __rpm_callback+0x1f2/0x5d0 [ 201.039279] ? rpm_resume+0x560/0x18a0 [ 201.039283] ? pci_restore_standard_config+0x70/0x70 [ 201.039287] ? pci_restore_standard_config+0x70/0x70 [ 201.039291] ? pci_restore_standard_config+0x70/0x70 [ 201.039296] rpm_callback+0x175/0x210 [ 201.039300] ? pci_restore_standard_config+0x70/0x70 [ 201.039305] rpm_resume+0xcc3/0x18a0 [ 201.039312] ? rpm_callback+0x210/0x210 [ 201.039317] ? __pm_runtime_resume+0x9e/0x100 [ 201.039322] ? kasan_check_write+0x14/0x20 [ 201.039326] ? do_raw_spin_lock+0xc2/0x1c0 [ 201.039333] __pm_runtime_resume+0xac/0x100 [ 201.039374] nouveau_display_hpd_work+0x67/0x1f0 [nouveau] [ 201.039380] process_one_work+0x7a0/0x14d0 [ 201.039388] ? cancel_delayed_work_sync+0x20/0x20 [ 201.039392] ? lock_acquire+0x113/0x310 [ 201.039398] ? kasan_check_write+0x14/0x20 [ 201.039402] ? do_raw_spin_lock+0xc2/0x1c0 [ 201.039409] worker_thread+0x86/0xb50 [ 201.039418] kthread+0x2e9/0x3a0 [ 201.039422] ? process_one_work+0x14d0/0x14d0 [ 201.039426] ? kthread_create_worker_on_cpu+0xc0/0xc0 [ 201.039431] ret_from_fork+0x3a/0x50 [ 201.039441] [ 201.039444] Allocated by task 79: [ 201.039449] save_stack+0x43/0xd0 [ 201.039452] kasan_kmalloc+0xc4/0xe0 [ 201.039456] kmem_cache_alloc_trace+0x10a/0x260 [ 201.039494] nv50_mstm_add_connector+0x9a/0x340 [nouveau] [ 201.039504] drm_dp_add_port+0xff5/0x1fc0 [drm_kms_helper] [ 201.039511] drm_dp_send_link_address+0x4a7/0x740 [drm_kms_helper] [ 201.039518] drm_dp_check_and_send_link_address+0x1a7/0x210 [drm_kms_helper] [ 201.039525] drm_dp_mst_link_probe_work+0x71/0xb0 [drm_kms_helper] [ 201.039529] process_one_work+0x7a0/0x14d0 [ 201.039533] worker_thread+0x86/0xb50 [ 201.039537] kthread+0x2e9/0x3a0 [ 201.039541] ret_from_fork+0x3a/0x50 [ 201.039543] [ 201.039546] Freed by task 0: [ 201.039549] (stack is not available) [ 201.039551] [ 201.039555] The buggy address belongs to the object at ffff88076738c1a8 which belongs to the cache kmalloc-2048 of size 2048 [ 201.039559] The buggy address is located 1192 bytes inside of 2048-byte region [ffff88076738c1a8, ffff88076738c9a8) [ 201.039563] The buggy address belongs to the page: [ 201.039567] page:ffffea001d9ce200 count:1 mapcount:0 mapping:ffff88084000d0c0 index:0x0 compound_mapcount: 0 [ 201.039573] flags: 0x8000000000008100(slab|head) [ 201.039578] raw: 8000000000008100 ffffea001da3be08 ffffea001da25a08 ffff88084000d0c0 [ 201.039582] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000 [ 201.039585] page dumped because: kasan: bad access detected [ 201.039588] [ 201.039591] Memory state around the buggy address: [ 201.039594] ffff88076738c500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 201.039598] ffff88076738c580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 201.039601] >ffff88076738c600: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc [ 201.039604] ^ [ 201.039607] ffff88076738c680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 201.039611] ffff88076738c700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 201.039613] ================================================================== Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau: Use drm_connector_list_iter_* for iterating connectorsLyude Paul
Every codepath in nouveau that loops through the connector list currently does so using the old method, which is prone to race conditions from MST connectors being created and destroyed. This has been causing a multitude of problems, including memory corruption from trying to access connectors that have already been freed! Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Cc: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/gem: off by one bugs in nouveau_gem_pushbuf_reloc_apply()Dan Carpenter
The bo array has req->nr_buffers elements so the > should be >= so we don't read beyond the end of the array. Fixes: a1606a9596e5 ("drm/nouveau: new gem pushbuf interface, bump to 0.0.16") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-07-16drm/nouveau/kms/nv50-: ensure window updates are submitted when flushing mst ↵Ben Skeggs
disables It was possible for this to be skipped when shutting down MST streams, and leaving the core channel interlocked with a wndw channel update that never happens - leading to a hung display. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Tested-By: Lyude Paul <lyude@redhat.com>
2018-07-16Merge 4.18-rc5 into char-misc-nextGreg Kroah-Hartman
We want the char-misc fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-16Merge tag 'drm-intel-fixes-2018-07-12' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes I already pulled the first fix, pull the GVT fixes. - GVT fix for KBL vGPU hang to update virtual register from LRI. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180713070922.GA19840@intel.com
2018-07-16Merge branch 'drm-armada-fixes' of git://git.armlinux.org.uk/~rmk/linux-arm ↵Dave Airlie
into drm-fixes Two armada fixes. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180713075427.GA16160@rmk-PC.armlinux.org.uk
2018-07-16Merge tag 'drm-misc-fixes-2018-07-13' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Fixes for v4.18-rc5: - Single fix for a build error when the driver is builtin, but the backend is a loadable module. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/9c596cf5-3f24-070e-74f2-c59bfbaf68fa@linux.intel.com
2018-07-16Merge tag 'drm/tegra/for-4.18-rc5' of ↵Dave Airlie
git://anongit.freedesktop.org/tegra/linux into drm-fixes drm/tegra: Fixes for v4.18-rc5 This contains a couple of one- or two-line fixes for various minor issues in the Tegra driver. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180712070142.15571-1-thierry.reding@gmail.com
2018-07-16Merge branch 'drm-fixes-4.18' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes A few display and GPUVM fixes for 4.18. A few more fixes for 4.18. Two display fixes and a fix to avoid a segfault if the GPU does not power up properly on resume. These are on top of my pull from earlier this week. Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180712043820.2877-1-alexander.deucher@amd.com
2018-07-16Merge tag 'drm-intel-fixes-2018-07-10' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix hotplug irq ack on i965/g4x (Ville) Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180710213249.GA16479@intel.com
2018-07-14drm/amdkfd: Add CU-masking ioctl to KFDFelix Kuehling
CU-masking allows a KFD client to control the set of CUs used by a user mode queue for executing compute dispatches. This can be used for optimizing the partitioning of the GPU and minimize conflicts between concurrent tasks. Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Eric Huang <JinHuiEric.Huang@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-07-14drm/i915/guc: Disable rpm wakeref asserts in GuC irq handlerMichał Winiarski
We're seeing "RPM wakelock ref not held during HW access" warning otherwise. Since IRQs are synced for runtime suspend we can just disable the wakeref asserts. Reported-by: Marta Löfstedt <marta.lofstedt@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105710 Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180714173703.7894-1-chris@chris-wilson.co.uk Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2018-07-13drm/i915/execlists: Drop clear_gtiir() on GPU resetChris Wilson
With the new CSB processing code, we are not vulnerable to delayed delivery of a pre-reset interrupt as we use the CSB status pointers in the HWSP to decide if we need to parse any CSB events and no longer need to wait for the first post-reset interrupt to be assured that the CSB mmio registers are valid. The new icl code to clear registers has a nasty lock inversion: [ 57.409776] ====================================================== [ 57.409779] WARNING: possible circular locking dependency detected [ 57.409783] 4.18.0-rc4-CI-CI_DII_1137+ #1 Tainted: G U W [ 57.409785] ------------------------------------------------------ [ 57.409788] swapper/6/0 is trying to acquire lock: [ 57.409790] 000000004f304ee5 (&engine->timeline.lock/1){-.-.}, at: execlists_submit_request+0x2b/0x1a0 [i915] [ 57.409841] but task is already holding lock: [ 57.409844] 00000000aad89594 (&(&rq->lock)->rlock#2){-.-.}, at: notify_ring+0x2b2/0x480 [i915] [ 57.409869] which lock already depends on the new lock. [ 57.409872] the existing dependency chain (in reverse order) is: [ 57.409876] -> #2 (&(&rq->lock)->rlock#2){-.-.}: [ 57.409900] notify_ring+0x2b2/0x480 [i915] [ 57.409922] gen8_cs_irq_handler+0x39/0xa0 [i915] [ 57.409943] gen11_irq_handler+0x2f0/0x420 [i915] [ 57.409949] __handle_irq_event_percpu+0x42/0x370 [ 57.409952] handle_irq_event_percpu+0x2b/0x70 [ 57.409956] handle_irq_event+0x2f/0x50 [ 57.409959] handle_edge_irq+0xe7/0x190 [ 57.409964] handle_irq+0x67/0x160 [ 57.409967] do_IRQ+0x5e/0x120 [ 57.409971] ret_from_intr+0x0/0x1d [ 57.409974] _raw_spin_unlock_irqrestore+0x4e/0x60 [ 57.409979] tasklet_action_common.isra.5+0x47/0xb0 [ 57.409982] __do_softirq+0xd9/0x505 [ 57.409985] irq_exit+0xa9/0xc0 [ 57.409988] do_IRQ+0x9a/0x120 [ 57.409991] ret_from_intr+0x0/0x1d [ 57.409995] cpuidle_enter_state+0xac/0x360 [ 57.409999] do_idle+0x1f3/0x250 [ 57.410004] cpu_startup_entry+0x6a/0x70 [ 57.410010] start_secondary+0x19d/0x1f0 [ 57.410015] secondary_startup_64+0xa5/0xb0 [ 57.410018] -> #1 (&(&dev_priv->irq_lock)->rlock){-.-.}: [ 57.410081] clear_gtiir+0x30/0x200 [i915] [ 57.410116] execlists_reset+0x6e/0x2b0 [i915] [ 57.410140] i915_reset_engine+0x111/0x190 [i915] [ 57.410165] i915_handle_error+0x11a/0x4a0 [i915] [ 57.410198] i915_hangcheck_elapsed+0x378/0x530 [i915] [ 57.410204] process_one_work+0x248/0x6c0 [ 57.410207] worker_thread+0x37/0x380 [ 57.410211] kthread+0x119/0x130 [ 57.410215] ret_from_fork+0x3a/0x50 [ 57.410217] -> #0 (&engine->timeline.lock/1){-.-.}: [ 57.410224] _raw_spin_lock_irqsave+0x33/0x50 [ 57.410256] execlists_submit_request+0x2b/0x1a0 [i915] [ 57.410289] submit_notify+0x8d/0x124 [i915] [ 57.410314] __i915_sw_fence_complete+0x81/0x250 [i915] [ 57.410339] dma_i915_sw_fence_wake+0xd/0x20 [i915] [ 57.410344] dma_fence_signal_locked+0x79/0x200 [ 57.410368] notify_ring+0x2ba/0x480 [i915] [ 57.410392] gen8_cs_irq_handler+0x39/0xa0 [i915] [ 57.410416] gen11_irq_handler+0x2f0/0x420 [i915] [ 57.410421] __handle_irq_event_percpu+0x42/0x370 [ 57.410425] handle_irq_event_percpu+0x2b/0x70 [ 57.410428] handle_irq_event+0x2f/0x50 [ 57.410432] handle_edge_irq+0xe7/0x190 [ 57.410436] handle_irq+0x67/0x160 [ 57.410439] do_IRQ+0x5e/0x120 [ 57.410445] ret_from_intr+0x0/0x1d [ 57.410449] cpuidle_enter_state+0xac/0x360 [ 57.410453] do_idle+0x1f3/0x250 [ 57.410456] cpu_startup_entry+0x6a/0x70 [ 57.410460] start_secondary+0x19d/0x1f0 [ 57.410464] secondary_startup_64+0xa5/0xb0 [ 57.410466] other info that might help us debug this: [ 57.410471] Chain exists of: &engine->timeline.lock/1 --> &(&dev_priv->irq_lock)->rlock --> &(&rq->lock)->rlock#2 [ 57.410481] Possible unsafe locking scenario: [ 57.410485] CPU0 CPU1 [ 57.410487] ---- ---- [ 57.410490] lock(&(&rq->lock)->rlock#2); [ 57.410494] lock(&(&dev_priv->irq_lock)->rlock); [ 57.410498] lock(&(&rq->lock)->rlock#2); [ 57.410503] lock(&engine->timeline.lock/1); [ 57.410506] *** DEADLOCK *** [ 57.410511] 4 locks held by swapper/6/0: [ 57.410514] #0: 0000000074575789 (&(&dev_priv->irq_lock)->rlock){-.-.}, at: gen11_irq_handler+0x8a/0x420 [i915] [ 57.410542] #1: 000000009b29b30e (rcu_read_lock){....}, at: notify_ring+0x1a/0x480 [i915] [ 57.410573] #2: 00000000aad89594 (&(&rq->lock)->rlock#2){-.-.}, at: notify_ring+0x2b2/0x480 [i915] [ 57.410601] #3: 000000009b29b30e (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915] [ 57.410635] stack backtrace: [ 57.410640] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G U W 4.18.0-rc4-CI-CI_DII_1137+ #1 [ 57.410644] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.2222.A01.1805300339 05/30/2018 [ 57.410650] Call Trace: [ 57.410652] <IRQ> [ 57.410657] dump_stack+0x67/0x9b [ 57.410662] print_circular_bug.isra.16+0x1c8/0x2b0 [ 57.410666] __lock_acquire+0x1897/0x1b50 [ 57.410671] ? lock_acquire+0xa6/0x210 [ 57.410674] lock_acquire+0xa6/0x210 [ 57.410706] ? execlists_submit_request+0x2b/0x1a0 [i915] [ 57.410711] _raw_spin_lock_irqsave+0x33/0x50 [ 57.410741] ? execlists_submit_request+0x2b/0x1a0 [i915] [ 57.410769] execlists_submit_request+0x2b/0x1a0 [i915] [ 57.410774] ? _raw_spin_unlock_irqrestore+0x39/0x60 [ 57.410804] submit_notify+0x8d/0x124 [i915] [ 57.410828] __i915_sw_fence_complete+0x81/0x250 [i915] [ 57.410854] dma_i915_sw_fence_wake+0xd/0x20 [i915] [ 57.410858] dma_fence_signal_locked+0x79/0x200 [ 57.410882] notify_ring+0x2ba/0x480 [i915] [ 57.410907] gen8_cs_irq_handler+0x39/0xa0 [i915] [ 57.410933] gen11_irq_handler+0x2f0/0x420 [i915] [ 57.410938] __handle_irq_event_percpu+0x42/0x370 [ 57.410943] handle_irq_event_percpu+0x2b/0x70 [ 57.410947] handle_irq_event+0x2f/0x50 [ 57.410951] handle_edge_irq+0xe7/0x190 [ 57.410955] handle_irq+0x67/0x160 [ 57.410958] do_IRQ+0x5e/0x120 [ 57.410962] common_interrupt+0xf/0xf [ 57.410965] </IRQ> [ 57.410969] RIP: 0010:cpuidle_enter_state+0xac/0x360 [ 57.410972] Code: 44 00 00 31 ff e8 84 93 91 ff 45 84 f6 74 12 9c 58 f6 c4 02 0f 85 31 02 00 00 31 ff e8 7d 30 98 ff e8 e8 0e 94 ff fb 4c 29 fb <48> ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 ea b8 ff [ 57.411015] RSP: 0018:ffffc90000133e90 EFLAGS: 00000216 ORIG_RAX: ffffffffffffffdd [ 57.411023] RAX: ffff8804ae748040 RBX: 000000000002a97d RCX: 0000000000000000 [ 57.411029] RDX: 0000000000000046 RSI: ffffffff82141263 RDI: ffffffff820f05a7 [ 57.411035] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 [ 57.411041] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8229f078 [ 57.411045] R13: ffff8804ab2adfa8 R14: 0000000000000000 R15: 0000000d5de092e3 [ 57.411052] do_idle+0x1f3/0x250 [ 57.411055] cpu_startup_entry+0x6a/0x70 [ 57.411059] start_secondary+0x19d/0x1f0 [ 57.411064] secondary_startup_64+0xa5/0xb0 The easiest remedy is to remove the defunct code. Fixes: ff047a87cfac ("drm/i915/icl: Correctly clear lost ctx-switch interrupts across reset for Gen11") References: fd8526e50902 ("drm/i915/execlists: Trust the CSB") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180713203529.1973-3-chris@chris-wilson.co.uk
2018-07-13drm/i915: Do not short-circuit tasklets during resetChris Wilson
Inside intel_engine_is_idle(), we flush the tasklet to ensure that is being run in a timely fashion (ksoftirqd has taught us to expect the worst). However, if we are in the middle of reset, the HW may not yet be ready to execute the submission tasklet and so we must respect the disable flag. Fixes: dd0cf235d81f ("drm/i915: Speed up idle detection by kicking the tasklets") Testcase: igt/drv_selftest/live_hangcheck Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180713203529.1973-2-chris@chris-wilson.co.uk
2018-07-13drm/i915/selftests: Include the start of each subtest in the GEM traceChris Wilson
Knowing the boundary of each subtest can be instrumental in digesting the voluminous trace output and finding the critical piece of information. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180713203529.1973-1-chris@chris-wilson.co.uk
2018-07-13drm/amdkfd: Enable Raven for KFDYong Zhao
Add DID and kfd_device_info for Raven. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-07-13drm/amdkfd: Optimize out some duplicated code in kfd_signal_iommu_event()Yong Zhao
memory_exception_data is already initialized for not-present faults. It only needs to be overridden for permission faults. Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-07-13drm/amdkfd: Workaround to accommodate Raven too many PPR issueYong Zhao
On Raven multiple PPRs can be queued up by the hardware. When the first of those requests is handled by the IOMMU driver, the memory access succeeds. After that the application may be done with the memory and unmap it. At that point the page table entries are invalidated, but there are still outstanding duplicate PPRs for those addresses. When the IOMMU driver processes those duplicate requests, it finds invalid page table entries and triggers an invalid PPR fault. As a workaround, don't signal invalid PPR faults on Raven to avoid segfaulting applications that haven't done anything wrong. As a side effect, real GPU memory access faults may go unnoticed by the application. Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-07-13drm/amdkfd: Avoid flooding dmesg on Raven due to IOMMU issuesYong Zhao
On Raven Invalid PPRs (peripheral page requests) can be reported because multiple PPRs can be still queued when memory is freed. Apply a rate limit to avoid flooding the log in this case. Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-07-13drm/amdkfd: Make SDMA engine number an ASIC-dependent variableYong Zhao
On Raven there is only one SDMA engine instead of previously assumed two, so we need to adapt our code to this new scenario. Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-07-13drm/amdkfd: Consolidate duplicate memory banks info in topologyYong Zhao
If there are several memory banks that has the same properties in CRAT, we aggregate them into one memory bank. This cleans up memory banks on APUs (e.g. Raven) where the CRAT reports each memory channel as a separate bank. This only confuses user mode, which only deals with virtual memory. Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-07-13drm/amdgpu/pp/smu7: cache smu firmware tocAlex Deucher
Rather than calculating it everytime we rebuild the toc buffer, calculate it once initially and then just copy the cached results to the vram buffer. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>