linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-08-18	USB: Also match device drivers using the ->match vfunc	Bastien Nocera
	We only ever used the ID table matching before, but we should also support open-coded match functions. Fixes: 88b7381a939de ("USB: Select better matching USB drivers when available") Signed-off-by: Bastien Nocera <hadess@hadess.net> Cc: stable <stable@vger.kernel.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20200818110445.509668-1-hadess@hadess.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	usb: host: xhci-tegra: fix tegra_xusb_get_phy()	JC Kuo
	tegra_xusb_get_phy() should take input argument "name". Signed-off-by: JC Kuo <jckuo@nvidia.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200811092553.657762-1-jckuo@nvidia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	usb: host: xhci-tegra: otg usb2/usb3 port init	JC Kuo
	tegra_xusb_init_usb_phy() should initialize "otg_usb2_port" and "otg_usb3_port" with -EINVAL because "0" is a valid value represents usb2 port 0 or usb3 port 0. Signed-off-by: JC Kuo <jckuo@nvidia.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200811093143.699541-1-jckuo@nvidia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	usb: hcd: Fix use after free in usb_hcd_pci_remove()	Andy Shevchenko
	On the removal stage we put a reference to the controller structure and if it's not used anymore it gets freed, but later we try to dereference a pointer to a member of that structure. Copy necessary field to a temporary variable to avoid use after free. Fixes: 306c54d0edb6 ("usb: hcd: Try MSI interrupts on PCI devices") Reported-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/linux-usb/30a8c4ca-64c2-863b-cfcd-0970599c0ba3@huawei.com/ Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20200814182218.71957-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	usb: typec: ucsi: Hold con->lock for the entire duration of ucsi_register_port()	Hans de Goede
	Commit 081da1325d35 ("usb: typec: ucsi: displayport: Fix a potential race during registration") made the ucsi code hold con->lock in ucsi_register_displayport(). But we really don't want any interactions with the connector to run before the port-registration process is fully complete. This commit moves the taking of con->lock from ucsi_register_displayport() into ucsi_register_port() to achieve this. Cc: stable@vger.kernel.org Fixes: 081da1325d35 ("usb: typec: ucsi: displayport: Fix a potential race during registration") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20200809141904.4317-5-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	usb: typec: ucsi: Rework ppm_lock handling	Hans de Goede
	The ppm_lock really only needs to be hold during 2 functions: ucsi_reset_ppm() and ucsi_run_command(). Push the taking of the lock down into these 2 functions, renaming ucsi_run_command() to ucsi_send_command() which was an existing wrapper already taking the lock for its callers. This simplifies things for the callers and removes the difference between ucsi_send_command() and ucsi_run_command() which has led to various locking bugs in the past. Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20200809141904.4317-4-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	usb: typec: ucsi: Fix 2 unlocked ucsi_run_command calls	Hans de Goede
	Fix 2 unlocked ucsi_run_command calls: 1. ucsi_handle_connector_change() contains one ucsi_send_command() call, which takes the ppm_lock for it; and one ucsi_run_command() call which relies on the caller have taking the ppm_lock. ucsi_handle_connector_change() does not take the lock, so the second (ucsi_run_command) calls should also be ucsi_send_command(). 2. ucsi_get_pdos() gets called from ucsi_handle_connector_change() which does not hold the ppm_lock, so it also must use ucsi_send_command(). This commit also adds a WARN_ON(!mutex_is_locked(&ucsi->ppm_lock)); to ucsi_run_command() to avoid similar problems getting re-introduced in the future. Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20200809141904.4317-3-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	usb: typec: ucsi: Fix AB BA lock inversion	Hans de Goede
	Lockdep reports an AB BA lock inversion between ucsi_init() and ucsi_handle_connector_change(): AB order: 1. ucsi_init takes ucsi->ppm_lock (it runs with that locked for the duration of the function) 2. usci_init eventually end up calling ucsi_register_displayport, which takes ucsi_connector->lock BA order: 1. ucsi_handle_connector_change work is started, takes ucsi_connector->lock 2. ucsi_handle_connector_change calls ucsi_send_command which takes ucsi->ppm_lock The ppm_lock really only needs to be hold during 2 functions: ucsi_reset_ppm() and ucsi_run_command(). This commit fixes the AB BA lock inversion by making ucsi_init drop the ucsi->ppm_lock before it starts registering ports; and replacing any ucsi_run_command() calls after this point with ucsi_send_command() (which is a wrapper around run_command taking the lock while handling the command). Some of the replacing of ucsi_run_command with ucsi_send_command in the helpers used during port registration also fixes a number of code paths after registration which call ucsi_run_command() without holding the ppm_lock: 1. ucsi_altmode_update_active() call in ucsi/displayport.c 2. ucsi_register_altmodes() call from ucsi_handle_connector_change() (through ucsi_partner_change()) Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20200809141904.4317-2-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	usbip: Implement a match function to fix usbip	M. Vefa Bicakci
	Commit 88b7381a939d ("USB: Select better matching USB drivers when available") introduced the use of a "match" function to select a non-generic/better driver for a particular USB device. This unfortunately breaks the operation of usbip in general, as reported in the kernel bugzilla with bug 208267 (linked below). Upon inspecting the aforementioned commit, one can observe that the original code in the usb_device_match function used to return 1 unconditionally, but the aforementioned commit makes the usb_device_match function use identifier tables and "match" virtual functions, if either of them are available. Hence, this commit implements a match function for usbip that unconditionally returns true to ensure that usbip is functional again. This change has been verified to restore usbip functionality, with a v5.7.y kernel on an up-to-date version of Qubes OS 4.0, which uses usbip to redirect USB devices between VMs. Thanks to Jonathan Dieter for the effort in bisecting this issue down to the aforementioned commit. Fixes: 88b7381a939d ("USB: Select better matching USB drivers when available") Link: https://bugzilla.kernel.org/show_bug.cgi?id=208267 Link: https://bugzilla.redhat.com/show_bug.cgi?id=1856443 Link: https://github.com/QubesOS/qubes-issues/issues/5905 Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Cc: <stable@vger.kernel.org> # 5.7 Cc: Valentina Manea <valentina.manea.m@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Bastien Nocera <hadess@hadess.net> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20200810160017.46002-1-m.v.b@runbox.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	usb: renesas-xhci: remove version check	Vinod Koul
	Some devices in wild are reporting bunch of firmware versions, so remove the check for versions in driver Reported by: Anastasios Vacharakis <vacharakis@gmail.com> Reported by: Glen Journeay <journeay@gmail.com> Fixes: 2478be82de44 ("usb: renesas-xhci: Add ROM loader for uPD720201") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208911 Signed-off-by: Vinod Koul <vkoul@kernel.org> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200818071739.789720-1-vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	USB: lvtest: return proper error code in probe	Evgeny Novikov
	lvs_rh_probe() can return some nonnegative value from usb_control_msg() when it is less than "USB_DT_HUB_NONVAR_SIZE + 2" that is considered as a failure. Make lvs_rh_probe() return -EINVAL in this case. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Evgeny Novikov <novikov@ispras.ru> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200805090643.3432-1-novikov@ispras.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	USB: cdc-acm: rework notification_buffer resizing	Tom Rix
	Clang static analysis reports this error cdc-acm.c:409:3: warning: Use of memory after it is freed acm_process_notification(acm, (unsigned char )dr); There are three problems, the first one is that dr is not reset The variable dr is set with if (acm->nb_index) dr = (struct usb_cdc_notification )acm->notification_buffer; But if the notification_buffer is too small it is resized with if (acm->nb_size) { kfree(acm->notification_buffer); acm->nb_size = 0; } alloc_size = roundup_pow_of_two(expected_size); /* * kmalloc ensures a valid notification_buffer after a * use of kfree in case the previous allocation was too * small. Final freeing is done on disconnect. */ acm->notification_buffer = kmalloc(alloc_size, GFP_ATOMIC); dr should point to the new acm->notification_buffer. The second problem is any data in the notification_buffer is lost when the pointer is freed. In the normal case, the current data is accumulated in the notification_buffer here. memcpy(&acm->notification_buffer[acm->nb_index], urb->transfer_buffer, copy_size); When a resize happens, anything before notification_buffer[acm->nb_index] is garbage. The third problem is the acm->nb_index is not reset on a resizing buffer error. So switch resizing to using krealloc and reassign dr and reset nb_index. Fixes: ea2583529cd1 ("cdc-acm: reassemble fragmented notifications") Signed-off-by: Tom Rix <trix@redhat.com> Cc: stable <stable@vger.kernel.org> Acked-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/20200801152154.20683-1-trix@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	USB: quirks: Add no-lpm quirk for another Raydium touchscreen	Kai-Heng Feng
	There's another Raydium touchscreen needs the no-lpm quirk: [ 1.339149] usb 1-9: New USB device found, idVendor=2386, idProduct=350e, bcdDevice= 0.00 [ 1.339150] usb 1-9: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 1.339151] usb 1-9: Product: Raydium Touch System [ 1.339152] usb 1-9: Manufacturer: Raydium Corporation ... [ 6.450497] usb 1-9: can't set config #1, error -110 BugLink: https://bugs.launchpad.net/bugs/1889446 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200731051622.28643-1-kai.heng.feng@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-18	USB: yurex: Fix bad gfp argument	Alan Stern
	The syzbot fuzzer identified a bug in the yurex driver: It passes GFP_KERNEL as a memory-allocation flag to usb_submit_urb() at a time when its state is TASK_INTERRUPTIBLE, not TASK_RUNNING: do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000370c7c68>] prepare_to_wait+0xb1/0x2a0 kernel/sched/wait.c:247 WARNING: CPU: 1 PID: 340 at kernel/sched/core.c:7253 __might_sleep+0x135/0x190 kernel/sched/core.c:7253 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 340 Comm: syz-executor677 Not tainted 5.8.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 panic+0x2aa/0x6e1 kernel/panic.c:231 __warn.cold+0x20/0x50 kernel/panic.c:600 report_bug+0x1bd/0x210 lib/bug.c:198 handle_bug+0x41/0x80 arch/x86/kernel/traps.c:234 exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:254 asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536 RIP: 0010:__might_sleep+0x135/0x190 kernel/sched/core.c:7253 Code: 65 48 8b 1c 25 40 ef 01 00 48 8d 7b 10 48 89 fe 48 c1 ee 03 80 3c 06 00 75 2b 48 8b 73 10 48 c7 c7 e0 9e 06 86 e8 ed 12 f6 ff <0f> 0b e9 46 ff ff ff e8 1f b2 4b 00 e9 29 ff ff ff e8 15 b2 4b 00 RSP: 0018:ffff8881cdb77a28 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8881c6458000 RCX: 0000000000000000 RDX: ffff8881c6458000 RSI: ffffffff8129ec93 RDI: ffffed1039b6ef37 RBP: ffffffff86fdade2 R08: 0000000000000001 R09: ffff8881db32f54f R10: 0000000000000000 R11: 0000000030343354 R12: 00000000000001f2 R13: 0000000000000000 R14: 0000000000000068 R15: ffffffff83c1b1aa slab_pre_alloc_hook.constprop.0+0xea/0x200 mm/slab.h:498 slab_alloc_node mm/slub.c:2816 [inline] slab_alloc mm/slub.c:2900 [inline] kmem_cache_alloc_trace+0x46/0x220 mm/slub.c:2917 kmalloc include/linux/slab.h:554 [inline] dummy_urb_enqueue+0x7a/0x880 drivers/usb/gadget/udc/dummy_hcd.c:1251 usb_hcd_submit_urb+0x2b2/0x22d0 drivers/usb/core/hcd.c:1547 usb_submit_urb+0xb4e/0x13e0 drivers/usb/core/urb.c:570 yurex_write+0x3ea/0x820 drivers/usb/misc/yurex.c:495 This patch changes the call to use GFP_ATOMIC instead of GFP_KERNEL. Reported-and-tested-by: syzbot+c2c3302f9c601a4b1be2@syzkaller.appspotmail.com Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200810182954.GB307778@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-17	Revert "scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe"	Quinn Tran
	FCP T10-PI and NVMe features are independent of each other. This patch allows both features to co-exist. This reverts commit 5da05a26b8305a625bc9d537671b981795b46dab. Link: https://lore.kernel.org/r/20200806111014.28434-12-njavali@marvell.com Fixes: 5da05a26b830 ("scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command"	Saurav Kashyap
	FCoE adapter initialization failed for ISP8021 with the following patch applied. In addition, reproduction of the issue the patch originally tried to address has been unsuccessful. This reverts commit 3cb182b3fa8b7a61f05c671525494697cba39c6a. Link: https://lore.kernel.org/r/20200806111014.28434-11-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Fix null pointer access during disconnect from subsystem	Quinn Tran
	NVMEAsync command is being submitted to QLA while the same NVMe controller is in the middle of reset. The reset path has deleted the association and freed aen_op->fcp_req.private. Add a check for this private pointer before issuing the command. ... 6 [ffffb656ca11fce0] page_fault at ffffffff8c00114e [exception RIP: qla_nvme_post_cmd+394] RIP: ffffffffc0d012ba RSP: ffffb656ca11fd98 RFLAGS: 00010206 RAX: ffff8fb039eda228 RBX: ffff8fb039eda200 RCX: 00000000000da161 RDX: ffffffffc0d4d0f0 RSI: ffffffffc0d26c9b RDI: ffff8fb039eda220 RBP: 0000000000000013 R8: ffff8fb47ff6aa80 R9: 0000000000000002 R10: 0000000000000000 R11: ffffb656ca11fdc8 R12: ffff8fb27d04a3b0 R13: ffff8fc46dd98a58 R14: 0000000000000000 R15: ffff8fc4540f0000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 7 [ffffb656ca11fe08] nvme_fc_start_fcp_op at ffffffffc0241568 [nvme_fc] 8 [ffffb656ca11fe50] nvme_fc_submit_async_event at ffffffffc0241901 [nvme_fc] 9 [ffffb656ca11fe68] nvme_async_event_work at ffffffffc014543d [nvme_core] 10 [ffffb656ca11fe98] process_one_work at ffffffff8b6cd437 11 [ffffb656ca11fed8] worker_thread at ffffffff8b6cdcef 12 [ffffb656ca11ff10] kthread at ffffffff8b6d3402 13 [ffffb656ca11ff50] ret_from_fork at ffffffff8c000255 -- PID: 37824 TASK: ffff8fb033063d80 CPU: 20 COMMAND: "kworker/u97:451" 0 [ffffb656ce1abc28] __schedule at ffffffff8be629e3 1 [ffffb656ce1abcc8] schedule at ffffffff8be62fe8 2 [ffffb656ce1abcd0] schedule_timeout at ffffffff8be671ed 3 [ffffb656ce1abd70] wait_for_completion at ffffffff8be639cf 4 [ffffb656ce1abdd0] flush_work at ffffffff8b6ce2d5 5 [ffffb656ce1abe70] nvme_stop_ctrl at ffffffffc0144900 [nvme_core] 6 [ffffb656ce1abe80] nvme_fc_reset_ctrl_work at ffffffffc0243445 [nvme_fc] 7 [ffffb656ce1abe98] process_one_work at ffffffff8b6cd437 8 [ffffb656ce1abed8] worker_thread at ffffffff8b6cdb50 9 [ffffb656ce1abf10] kthread at ffffffff8b6d3402 10 [ffffb656ce1abf50] ret_from_fork at ffffffff8c000255 Link: https://lore.kernel.org/r/20200806111014.28434-10-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Check if FW supports MQ before enabling	Saurav Kashyap
	OS boot during Boot from SAN was stuck at dracut emergency shell after enabling NVMe driver parameter. For non-MQ support the driver was enabling MQ. Add a check to confirm if FW supports MQ. Link: https://lore.kernel.org/r/20200806111014.28434-9-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba	Arun Easi
	qla_nvme_register_hba() puts out a warning when there are not enough queue pairs available for FC-NVME. Just fail the NVME registration rather than a WARNING + call Trace. Link: https://lore.kernel.org/r/20200806111014.28434-8-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Allow ql2xextended_error_logging special value 1 to be set ↵	Arun Easi
	anytime ql2xextended_error_logging can now be set to 1 to get the default mask value, as opposed to at module load time only. Link: https://lore.kernel.org/r/20200806111014.28434-7-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Reduce noisy debug message	Quinn Tran
	Update debug level and message for ELS IOCB done. Link: https://lore.kernel.org/r/20200806111014.28434-6-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Fix login timeout	Quinn Tran
	Multipath errors were seen during failback due to login timeout. The remote device sent LOGO, the local host tore down the session and did relogin. The RSCN arrived indicates remote device is going through failover after which the relogin is in a 20s timeout phase. At this point the driver is stuck in the relogin process. Add a fix to delete the session as part of abort/flush the login. Link: https://lore.kernel.org/r/20200806111014.28434-5-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Indicate correct supported speeds for Mezz card	Quinn Tran
	Correct the supported speeds for 16G Mezz card. Link: https://lore.kernel.org/r/20200806111014.28434-4-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Flush I/O on zone disable	Quinn Tran
	Perform implicit logout to flush I/O on zone disable. Link: https://lore.kernel.org/r/20200806111014.28434-3-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Flush all sessions on zone disable	Quinn Tran
	On Zone Disable, certain switches would ignore all commands. This causes timeout for both switch scan command and abort of that command. On detection of this condition, all sessions will be shutdown. Link: https://lore.kernel.org/r/20200806111014.28434-2-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: qla2xxx: Use MBX_TOV_SECONDS for mailbox command timeout values	Enzo Matsumiya
	Improves readability of qla_mbx.c. Link: https://lore.kernel.org/r/20200805200546.22497-1-ematsumiya@suse.de Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: scsi_debug: Fix scp is NULL errors	Douglas Gilbert
	John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures that were mainly seen on aarch64 machines (e.g. RPi 4 with four A72 CPUs). The problem was tracked down to a missing critical section on a "short circuit" path. Namely, the time to process the current command so far has already exceeded the requested command duration (i.e. the number of nanoseconds in the ndelay parameter). The random=1 parameter setting was pivotal in finding this error. The failure scenario involved first taking that "short circuit" path (due to a very short command duration) and then taking the more likely hrtimer_start() path (due to a longer command duration). With random=1 each command's duration is taken from the uniformly distributed [0..ndelay) interval. The fio utility also helped by reliably generating the error scenario at about once per minute on a RPi 4 (64 bit OS). Link: https://lore.kernel.org/r/20200813155738.109298-1-dgilbert@interlog.com Reported-by: John Garry <john.garry@huawei.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: zfcp: Fix use-after-free in request timeout handlers	Steffen Maier
	Before v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use timer_setup()"), we intentionally only passed zfcp_adapter as context argument to zfcp_fsf_request_timeout_handler(). Since we only trigger adapter recovery, it was unnecessary to sync against races between timeout and (late) completion. Likewise, we only passed zfcp_erp_action as context argument to zfcp_erp_timeout_handler(). Since we only wakeup an ERP action, it was unnecessary to sync against races between timeout and (late) completion. Meanwhile the timeout handlers get timer_list as context argument and do a timer-specific container-of to zfcp_fsf_req which can have been freed. Fix it by making sure that any request timeout handlers, that might just have started before del_timer(), are completed by using del_timer_sync() instead. This ensures the request free happens afterwards. Space time diagram of potential use-after-free: Basic idea is to have 2 or more pending requests whose timeouts run out at almost the same time. req 1 timeout ERP thread req 2 timeout ---------------- ---------------- --------------------------------------- zfcp_fsf_request_timeout_handler fsf_req = from_timer(fsf_req, t, timer) adapter = fsf_req->adapter zfcp_qdio_siosl(adapter) zfcp_erp_adapter_reopen(adapter,...) zfcp_erp_strategy ... zfcp_fsf_req_dismiss_all list_for_each_entry_safe zfcp_fsf_req_complete 1 del_timer 1 zfcp_fsf_req_free 1 zfcp_fsf_req_complete 2 zfcp_fsf_request_timeout_handler del_timer 2 fsf_req = from_timer(fsf_req, t, timer) zfcp_fsf_req_free 2 adapter = fsf_req->adapter ^^^^^^^ already freed Link: https://lore.kernel.org/r/20200813152856.50088-1-maier@linux.ibm.com Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()") Cc: <stable@vger.kernel.org> #4.15+ Suggested-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: ufs: No need to send Abort Task if the task in DB was cleared	Bean Huo
	If the bit corresponding to a task in the Doorbell register has been cleared, no need to poll the status of the task on the device side and to send an Abort Task TM. Instead, let it directly goto cleanup. In addition, to keep original debug output, move the goto below the debug print. Link: https://lore.kernel.org/r/20200811141859.27399-3-huobean@gmail.com Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: ufs: Clean up completed request without interrupt notification	Stanley Chu
	If somehow no interrupt notification is raised for a completed request and its doorbell bit is cleared by host, UFS driver needs to cleanup its outstanding bit in ufshcd_abort(). Otherwise, system may behave abnormally in the following scenario: After ufshcd_abort() returns, this request will be requeued by SCSI layer with its outstanding bit set. Any future completed request will trigger ufshcd_transfer_req_compl() to handle all "completed outstanding bits". At this time the "abnormal outstanding bit" will be detected and the "requeued request" will be chosen to execute request post-processing flow. This is wrong because this request is still "alive". Link: https://lore.kernel.org/r/20200811141859.27399-2-huobean@gmail.com Reviewed-by: Can Guo <cang@codeaurora.org> Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: ufs: Improve interrupt handling for shared interrupts	Adrian Hunter
	For shared interrupts, the interrupt status might be zero, so check that first. Link: https://lore.kernel.org/r/20200811133936.19171-2-adrian.hunter@intel.com Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: ufs: Fix interrupt error message for shared interrupts	Adrian Hunter
	The interrupt might be shared, in which case it is not an error for the interrupt handler to be called when the interrupt status is zero, so don't print the message unless there was enabled interrupt status. Link: https://lore.kernel.org/r/20200811133936.19171-1-adrian.hunter@intel.com Fixes: 9333d7757348 ("scsi: ufs: Fix irq return code") Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: ufs-pci: Add quirk for broken auto-hibernate for Intel EHL	Adrian Hunter
	Intel EHL UFS host controller advertises auto-hibernate capability but it does not work correctly. Add a quirk for that. [mkp: checkpatch fix] Link: https://lore.kernel.org/r/20200810141024.28859-1-adrian.hunter@intel.com Fixes: 8c09d7527697 ("scsi: ufshdc-pci: Add Intel PCI IDs for EHL") Acked-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: ufs-mediatek: Fix incorrect time to wait link status	Stanley Chu
	Fix incorrect calculation of "ms" based waiting time in function ufs_mtk_setup_clocks(). Link: https://lore.kernel.org/r/20200809055702.20140-1-stanley.chu@mediatek.com Fixes: 9006e3986f66 ("scsi: ufs-mediatek: Do not gate clocks if auto-hibern8 is not entered yet") Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: ufs: Fix possible infinite loop in ufshcd_hold	Stanley Chu
	In ufshcd_suspend(), after clk-gating is suspended and link is set as Hibern8 state, ufshcd_hold() is still possibly invoked before ufshcd_suspend() returns. For example, MediaTek's suspend vops may issue UIC commands which would call ufshcd_hold() during the command issuing flow. Now if UFSHCD_CAP_HIBERN8_WITH_CLK_GATING capability is enabled, then ufshcd_hold() may enter infinite loops because there is no clk-ungating work scheduled or pending. In this case, ufshcd_hold() shall just bypass, and keep the link as Hibern8 state. Link: https://lore.kernel.org/r/20200809050734.18740-1-stanley.chu@mediatek.com Reviewed-by: Avri Altman <avri.altman@wdc.com> Co-developed-by: Andy Teng <andy.teng@mediatek.com> Signed-off-by: Andy Teng <andy.teng@mediatek.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: fcoe: Fix I/O path allocation	Mike Christie
	ixgbe_fcoe_ddp_setup() can be called from the main I/O path and is called with a spin_lock held, so we have to use GFP_ATOMIC allocation instead of GFP_KERNEL. Link: https://lore.kernel.org/r/1596831813-9839-1-git-send-email-michael.christie@oracle.com cc: Hannes Reinecke <hare@suse.de> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	scsi: ufs: ti-j721e-ufs: Fix error return in ti_j721e_ufs_probe()	Jing Xiangfeng
	Fix to return error code PTR_ERR() from the error handling case instead of 0. Link: https://lore.kernel.org/r/20200806070135.67797-1-jingxiangfeng@huawei.com Fixes: 22617e216331 ("scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes") Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds
	Pull networking fixes from David Miller: "Another batch of fixes: 1) Remove nft_compat counter flush optimization, it generates warnings from the refcount infrastructure. From Florian Westphal. 2) Fix BPF to search for build id more robustly, from Jiri Olsa. 3) Handle bogus getopt lengths in ebtables, from Florian Westphal. 4) Infoleak and other fixes to j1939 CAN driver, from Eric Dumazet and Oleksij Rempel. 5) Reset iter properly on mptcp sendmsg() error, from Florian Westphal. 6) Show a saner speed in bonding broadcast mode, from Jarod Wilson. 7) Various kerneldoc fixes in bonding and elsewhere, from Lee Jones. 8) Fix double unregister in bonding during namespace tear down, from Cong Wang. 9) Disable RP filter during icmp_redirect selftest, from David Ahern" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits) otx2_common: Use devm_kcalloc() in otx2_config_npa() net: qrtr: fix usage of idr in port assignment to socket selftests: disable rp_filter for icmp_redirect.sh Revert "net: xdp: pull ethernet header off packet after computing skb->protocol" phylink: <linux/phylink.h>: fix function prototype kernel-doc warning mptcp: sendmsg: reset iter on error redux net: devlink: Remove overzealous WARN_ON with snapshots tipc: not enable tipc when ipv6 works as a module tipc: fix uninit skb->data in tipc_nl_compat_dumpit() net: Fix potential wrong skb->protocol in skb_vlan_untag() net: xdp: pull ethernet header off packet after computing skb->protocol ipvlan: fix device features bonding: fix a potential double-unregister can: j1939: add rxtimer for multipacket broadcast session can: j1939: abort multipacket broadcast session when timeout occurs can: j1939: cancel rxtimer on multipacket broadcast session complete can: j1939: fix support for multipacket broadcast message net: fddi: skfp: cfm: Remove seemingly unused variable 'ID_sccs' net: fddi: skfp: cfm: Remove set but unused variable 'oldstate' net: fddi: skfp: smt: Remove seemingly unused variable 'ID_sccs' ...
2020-08-17	otx2_common: Use devm_kcalloc() in otx2_config_npa()	Xu Wang
	A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "devm_kcalloc". Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-17	PCI/P2PDMA: Fix build without DMA ops	Christoph Hellwig
	My commit to make DMA ops support optional missed the reference in the p2pdma code. And while the build bot didn't manage to find a config where this can happen, Matthew did. Fix this by replacing two IS_ENABLED checks with ifdefs. Fixes: 2f9237d4f6df ("dma-mapping: make support for dma ops optional") Link: https://lore.kernel.org/r/20200810124843.1532738-1-hch@lst.de Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2020-08-17	libnvdimm: KASAN: global-out-of-bounds Read in internal_create_group	Zqiang
	Because the last member of the "nvdimm_firmware_attributes" array was not assigned a null ptr, when traversal of "grp->attrs" array is out of bounds in "create_files" func. func: create_files: ->for (i = 0, attr = grp->attrs; *attr && !error; i++, attr++) ->.... BUG: KASAN: global-out-of-bounds in create_files fs/sysfs/group.c:43 [inline] BUG: KASAN: global-out-of-bounds in internal_create_group+0x9d8/0xb20 fs/sysfs/group.c:149 Read of size 8 at addr ffffffff8a2e4cf0 by task kworker/u17:10/959 CPU: 2 PID: 959 Comm: kworker/u17:10 Not tainted 5.8.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound async_run_entry_fn Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0x5/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 create_files fs/sysfs/group.c:43 [inline] internal_create_group+0x9d8/0xb20 fs/sysfs/group.c:149 internal_create_groups.part.0+0x90/0x140 fs/sysfs/group.c:189 internal_create_groups fs/sysfs/group.c:185 [inline] sysfs_create_groups+0x25/0x50 fs/sysfs/group.c:215 device_add_groups drivers/base/core.c:2024 [inline] device_add_attrs drivers/base/core.c:2178 [inline] device_add+0x7fd/0x1c40 drivers/base/core.c:2881 nd_async_device_register+0x12/0x80 drivers/nvdimm/bus.c:506 async_run_entry_fn+0x121/0x530 kernel/async.c:123 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 The buggy address belongs to the variable: nvdimm_firmware_attributes+0x10/0x40 Link: https://lore.kernel.org/r/20200812085501.30963-1-qiang.zhang@windriver.com Link: https://lore.kernel.org/r/20200814150509.225615-1-vaibhav@linux.ibm.com Fixes: 48001ea50d17f ("PM, libnvdimm: Add runtime firmware activation support") Reported-by: syzbot+1cf0ffe61aecf46f588f@syzkaller.appspotmail.com Reported-by: Sandipan Das <sandipan@linux.ibm.com> Reported-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2020-08-17	drm: msm: a6xx: use dev_pm_opp_set_bw to scale DDR	Sharat Masetty
	This patches replaces the previously used static DDR vote and uses dev_pm_opp_set_bw() to scale GPU->DDR bandwidth along with scaling GPU frequency. Also since the icc path voting is handled completely in the opp driver, remove the icc_path handle and its usage in the drm driver. Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-08-17	of/address: check for invalid range.cpu_addr	Colin Ian King
	Currently invalid CPU addresses are not being sanity checked resulting in SATA setup failure on a SynQuacer SC2A11 development machine. The original check was removed by and earlier commit, so add a sanity check back in to avoid this regression. Fixes: 7a8b64d17e35 ("of/address: use range parser for of_dma_get_range") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200817113208.523805-1-colin.king@canonical.com Signed-off-by: Rob Herring <robh@kernel.org>
2020-08-17	drm/msm/gpu: make ringbuffer readonly	Rob Clark
	The GPU has no business writing into the ringbuffer, let's make it readonly to the GPU. Fixes: 7198e6b03155 ("drm/msm: add a3xx gpu support") Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-08-17	drm/msm/adreno: fix updating ring fence	Rob Clark
	We need to set it to the most recent completed fence, not the most recent submitted. Otherwise we have races where we think we can retire submits that the GPU is not finished with, if the GPU doesn't manage to overwrite the seqno before we look at it. This can show up with hang recovery if one of the submits after the crashing submit also hangs after it is replayed. Fixes: f97decac5f4c ("drm/msm: Support multiple ringbuffers") Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-08-17	drm/msm/dpu: fix unitialized variable error	Rob Clark
	drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:817 dpu_crtc_enable() error: uninitialized symbol 'request_bandwidth'. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-08-17	drm/msm/dpu: Fix scale params in plane validation	Kalyan Thota
	Plane validation uses an API drm_calc_scale which will return src/dst value as a scale ratio. when viewing the range on a scale the values should fall in as Upscale ratio < Unity scale < Downscale ratio for src/dst formula Fix the min and max scale ratios to suit the API accordingly. Signed-off-by: Kalyan Thota <kalyan_t@codeaurora.org> Tested-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-08-17	drm/msm/dpu: Fix reservation failures in modeset	Kalyan Thota
	In TEST_ONLY commit, rm global_state will duplicate the object and request for new reservations, once they pass then the new state will be swapped with the old and will be available for the Atomic Commit. This patch fixes some of missing links in the resource reservation sequence mentioned above. 1) Creation of duplicate state in test_only commit (Rob) 2) Allocate and release the resources on every modeset. 3) Avoid allocation only when active is false. In a modeset operation, swap state happens well before disable. Hence clearing reservations in disable will cause failures in modeset enable. Allow reservations to be cleared/allocated before swap, such that only newly committed resources are pushed to HW. Changes in v1: - Move the rm release to atomic_check. - Ensure resource allocation and free happens when active is not changed i.e only when mode is changed.(Rob) Changes in v2: - Handle dpu_kms_get_global_state API failure as it may return EDEADLK (swboyd). Signed-off-by: Kalyan Thota <kalyan_t@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-08-17	drm/modeset-lock: Take the modeset BKL for legacy drivers	Daniel Vetter
	This fell off in the conversion in commit 9bcaa3fe58ab7559e71df798bcff6e0795158695 Author: Michal Orzel <michalorzel.eng@gmail.com> Date: Tue Apr 28 19:10:04 2020 +0200 drm: Replace drm_modeset_lock/unlock_all with DRM_MODESET_LOCK_ALL_* helpers but it's caught by the drm_warn_on_modeset_not_all_locked() that the legacy modeset code uses. Since this is the bkl and it's unclear what's all protected, play it safe and grab it again for legacy drivers. Unfortunately this means we need to sprinkle a few more #includes around. Also we need to add the drm_device as a parameter to the _END macro. Finally remove the mute_lock() from setcrtc, since that's now done by the macro. Cc: Alex Deucher <alexdeucher@gmail.com> References: https://gitlab.freedesktop.org/drm/amd/-/issues/1224 Fixes: 9bcaa3fe58ab ("drm: Replace drm_modeset_lock/unlock_all with DRM_MODESET_LOCK_ALL_* helpers") Cc: Michal Orzel <michalorzel.eng@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200814093842.3048472-1-daniel.vetter@ffwll.ch
2020-08-17	vfio/type1: Add proper error unwind for vfio_iommu_replay()	Alex Williamson
	The vfio_iommu_replay() function does not currently unwind on error, yet it does pin pages, perform IOMMU mapping, and modify the vfio_dma structure to indicate IOMMU mapping. The IOMMU mappings are torn down when the domain is destroyed, but the other actions go on to cause trouble later. For example, the iommu->domain_list can be empty if we only have a non-IOMMU backed mdev attached. We don't currently check if the list is empty before getting the first entry in the list, which leads to a bogus domain pointer. If a vfio_dma entry is erroneously marked as iommu_mapped, we'll attempt to use that bogus pointer to retrieve the existing physical page addresses. This is the scenario that uncovered this issue, attempting to hot-add a vfio-pci device to a container with an existing mdev device and DMA mappings, one of which could not be pinned, causing a failure adding the new group to the existing container and setting the conditions for a subsequent attempt to explode. To resolve this, we can first check if the domain_list is empty so that we can reject replay of a bogus domain, should we ever encounter this inconsistent state again in the future. The real fix though is to add the necessary unwind support, which means cleaning up the current pinning if an IOMMU mapping fails, then walking back through the r-b tree of DMA entries, reading from the IOMMU which ranges are mapped, and unmapping and unpinning those ranges. To be able to do this, we also defer marking the DMA entry as IOMMU mapped until all entries are processed, in order to allow the unwind to know the disposition of each entry. Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices") Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>