Merge tag 'drm-next-2020-04-08' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie: "This is a set of fixes that have queued up, I think I might have another pull with some more before rc1 but I'd like to dequeue what I have now just in case Easter is more eggciting that expected. The main thing in here is a fix for a longstanding nouveau power management issues on certain laptops, it should help runtime suspend/resume for a lot of people. There is also a reverted patch for some drm_mm behaviour in atomic contexts. Summary: core: - revert drm_mm atomic patch - dt binding fixes fbcon: - null ptr error fix i915: - GVT fixes nouveau: - runpm fix - svm fixes amdgpu: - HDCP fixes - gfx10 fix - Misc display fixes - BACO fixes amdkfd: - Fix memory leak vboxvideo: - remove conflicting fbs vc4: - mode validation fix xen: - fix PTR_ERR usage" * tag 'drm-next-2020-04-08' of git://anongit.freedesktop.org/drm/drm: (41 commits) drm/nouveau/kms/nv50-: wait for FIFO space on PIO channels drm/nouveau/nvif: protect waits against GPU falling off the bus drm/nouveau/nvif: access PTIMER through usermode class, if available drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during init drm/nouveau: workaround runpm fail by disabling PCI power management on certain intel bridges drm/nouveau/svm: remove useless SVM range check drm/nouveau/svm: check for SVM initialized before migrating drm/nouveau/svm: fix vma range check for migration drm/nouveau: remove checks for return value of debugfs functions drm/nouveau/ttm: evict other IO mappings when running out of BAR1 space drm/amdkfd: kfree the wrong pointer drm/amd/display: increase HDCP authentication delay drm/amd/display: Correctly cancel future watchdog and callback events drm/amd/display: Don't try hdcp1.4 when content_type is set to type1 drm/amd/powerplay: move the ASIC specific nbio operation out of smu_v11_0.c drm/amd/powerplay: drop redundant BIF doorbell interrupt operations drm/amd/display: Fix dcn21 num_states drm/amd/display: Enable BT2020 in COLOR_ENCODING property drm/amd/display: LFC not working on 2.0x range monitors (v2) drm/amd/display: Support plane level CTM ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2020-04-07 20:24:34 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2020-04-07 20:24:34 -0700
commit: f5e94d10e4c468357019e5c28d48499f677b284f (patch)
tree: a0e3d199648d5c2d3c8d89532e406a0eca218012 /drivers/gpu/drm/nouveau/nouveau_drm.c
parent: 9ebe5422ad6c0309d3a2d4cd099b8410d2b6c6b0 (diff)
parent: 12ab316ced2c5f32ced0e6300a054db644b5444a (diff)
1 files changed, 63 insertions, 0 deletions
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 6b1629c14dd7..ca4087f5a15b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -618,6 +618,64 @@ nouveau_drm_device_fini(struct drm_device *dev)
 	kfree(drm);
 }
 
+/*
+ * On some Intel PCIe bridge controllers doing a
+ * D0 -> D3hot -> D3cold -> D0 sequence causes Nvidia GPUs to not reappear.
+ * Skipping the intermediate D3hot step seems to make it work again. This is
+ * probably caused by not meeting the expectation the involved AML code has
+ * when the GPU is put into D3hot state before invoking it.
+ *
+ * This leads to various manifestations of this issue:
+ *  - AML code execution to power on the GPU hits an infinite loop (as the
+ *    code waits on device memory to change).
+ *  - kernel crashes, as all PCI reads return -1, which most code isn't able
+ *    to handle well enough.
+ *
+ * In all cases dmesg will contain at least one line like this:
+ * 'nouveau 0000:01:00.0: Refused to change power state, currently in D3'
+ * followed by a lot of nouveau timeouts.
+ *
+ * In the \_SB.PCI0.PEG0.PG00._OFF code deeper down writes bit 0x80 to the not
+ * documented PCI config space register 0x248 of the Intel PCIe bridge
+ * controller (0x1901) in order to change the state of the PCIe link between
+ * the PCIe port and the GPU. There are alternative code paths using other
+ * registers, which seem to work fine (executed pre Windows 8):
+ *  - 0xbc bit 0x20 (publicly available documentation claims 'reserved')
+ *  - 0xb0 bit 0x10 (link disable)
+ * Changing the conditions inside the firmware by poking into the relevant
+ * addresses does resolve the issue, but it seemed to be ACPI private memory
+ * and not any device accessible memory at all, so there is no portable way of
+ * changing the conditions.
+ * On a XPS 9560 that means bits [0,3] on \CPEX need to be cleared.
+ *
+ * The only systems where this behavior can be seen are hybrid graphics laptops
+ * with a secondary Nvidia Maxwell, Pascal or Turing GPU. It's unclear whether
+ * this issue only occurs in combination with listed Intel PCIe bridge
+ * controllers and the mentioned GPUs or other devices as well.
+ *
+ * documentation on the PCIe bridge controller can be found in the
+ * "7th Generation Intel® Processor Families for H Platforms Datasheet Volume 2"
+ * Section "12 PCI Express* Controller (x16) Registers"
+ */
+
+static void quirk_broken_nv_runpm(struct pci_dev *pdev)
+{
+	struct drm_device *dev = pci_get_drvdata(pdev);
+	struct nouveau_drm *drm = nouveau_drm(dev);
+	struct pci_dev *bridge = pci_upstream_bridge(pdev);
+
+	if (!bridge || bridge->vendor != PCI_VENDOR_ID_INTEL)
+		return;
+
+	switch (bridge->device) {
+	case 0x1901:
+		drm->old_pm_cap = pdev->pm_cap;
+		pdev->pm_cap = 0;
+		NV_INFO(drm, "Disabling PCI power management to avoid bug\n");
+		break;
+	}
+}
+
 static int nouveau_drm_probe(struct pci_dev *pdev,
 			     const struct pci_device_id *pent)
 {
@@ -699,6 +757,7 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
 	if (ret)
 		goto fail_drm_dev_init;
 
+	quirk_broken_nv_runpm(pdev);
 	return 0;
 
 fail_drm_dev_init:
@@ -734,7 +793,11 @@ static void
 nouveau_drm_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
+	struct nouveau_drm *drm = nouveau_drm(dev);
 
+	/* revert our workaround */
+	if (drm->old_pm_cap)
+		pdev->pm_cap = drm->old_pm_cap;
 	nouveau_drm_device_remove(dev);
 	pci_disable_device(pdev);
 }
author	Linus Torvalds <torvalds@linux-foundation.org>	2020-04-07 20:24:34 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2020-04-07 20:24:34 -0700
commit	f5e94d10e4c468357019e5c28d48499f677b284f (patch)
tree	a0e3d199648d5c2d3c8d89532e406a0eca218012 /drivers/gpu/drm/nouveau/nouveau_drm.c
parent	9ebe5422ad6c0309d3a2d4cd099b8410d2b6c6b0 (diff)
parent	12ab316ced2c5f32ced0e6300a054db644b5444a (diff)