summaryrefslogtreecommitdiff
path: root/drivers
diff options
context:
space:
mode:
authorDexuan Cui <decui@microsoft.com>2020-09-04 19:55:55 -0700
committerWei Liu <wei.liu@kernel.org>2020-09-09 11:37:19 +0000
commit19873eec7e13fda140a0ebc75d6664e57c00bfb1 (patch)
tree0b72582a468eaee0f555e5c54d4b20aa86fd519c /drivers
parentb46b4a8a57c377b72a98c7930a9f6969d2d4784e (diff)
Drivers: hv: vmbus: hibernation: do not hang forever in vmbus_bus_resume()
After we Stop and later Start a VM that uses Accelerated Networking (NIC SR-IOV), currently the VF vmbus device's Instance GUID can change, so after vmbus_bus_resume() -> vmbus_request_offers(), vmbus_onoffer() can not find the original vmbus channel of the VF, and hence we can't complete() vmbus_connection.ready_for_resume_event in check_ready_for_resume_event(), and the VM hangs in vmbus_bus_resume() forever. Fix the issue by adding a timeout, so the resuming can still succeed, and the saved state is not lost, and according to my test, the user can disable Accelerated Networking and then will be able to SSH into the VM for further recovery. Also prevent the VM in question from suspending again. The host will be fixed so in future the Instance GUID will stay the same across hibernation. Fixes: d8bd2d442bb2 ("Drivers: hv: vmbus: Resume after fixing up old primary channels") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200905025555.45614-1-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
Diffstat (limited to 'drivers')
-rw-r--r--drivers/hv/vmbus_drv.c9
1 files changed, 7 insertions, 2 deletions
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 45c954d75e70..0f2d6a60f769 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2387,7 +2387,10 @@ static int vmbus_bus_suspend(struct device *dev)
if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
wait_for_completion(&vmbus_connection.ready_for_suspend_event);
- WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0);
+ if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
+ pr_err("Can not suspend due to a previous failed resuming\n");
+ return -EBUSY;
+ }
mutex_lock(&vmbus_connection.channel_mutex);
@@ -2463,7 +2466,9 @@ static int vmbus_bus_resume(struct device *dev)
vmbus_request_offers();
- wait_for_completion(&vmbus_connection.ready_for_resume_event);
+ if (wait_for_completion_timeout(
+ &vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
+ pr_err("Some vmbus device is missing after suspending?\n");
/* Reset the event for the next suspend. */
reinit_completion(&vmbus_connection.ready_for_suspend_event);