usb: dwc2: host: Don't retry NAKed transactions right away

On rk3288-veyron devices on Chrome OS it was found that plugging in an Arduino-based USB device could cause the system to lockup, especially if the CPU Frequency was at one of the slower operating points (like 100 MHz / 200 MHz). Upon tracing, I found that the following was happening: * The USB device (full speed) was connected to a high speed hub and then to the rk3288. Thus, we were dealing with split transactions, which is all handled in software on dwc2. * Userspace was initiating a BULK IN transfer * When we sent the SSPLIT (to start the split transaction), we got an ACK. Good. Then we issued the CSPLIT. * When we sent the CSPLIT, we got back a NAK. We immediately (from the interrupt handler) started to retry and sent another SSPLIT. * The device kept NAKing our CSPLIT, so we kept ping-ponging between sending a SSPLIT and a CSPLIT, each time sending from the interrupt handler. * The handling of the interrupts was (because of the low CPU speed and the inefficiency of the dwc2 interrupt handler) was actually taking _longer_ than it took the other side to send the ACK/NAK. Thus we were _always_ in the USB interrupt routine. * The fact that USB interrupts were always going off was preventing other things from happening in the system. This included preventing the system from being able to transition to a higher CPU frequency. As I understand it, there is no requirement to retry super quickly after a NAK, we just have to retry sometime in the future. Thus one solution to the above is to just add a delay between getting a NAK and retrying the transmission. If this delay is sufficiently long to get out of the interrupt routine then the rest of the system will be able to make forward progress. Even a 25 us delay would probably be enough, but we'll be extra conservative and try to delay 1 ms (the exact amount depends on HZ and the accuracy of the jiffy and how close the current jiffy is to ticking, but could be as much as 20 ms or as little as 1 ms). Presumably adding a delay like this could impact the USB throughput, so we only add the delay with repeated NAKs. NOTE: Upon further testing of a pl2303 serial adapter, I found that this fix may help with problems there. Specifically I found that the pl2303 serial adapters tend to respond with a NAK when they have nothing to say and thus we end with this same sequence. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Julius Werner <jwerner@chromium.org> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Acked-by: John Youn <johnyoun@synopsys.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
author: Douglas Anderson <dianders@chromium.org> 2017-12-12 10:30:31 -0800
committer: Felipe Balbi <felipe.balbi@linux.intel.com> 2017-12-13 11:27:53 +0200
commit: 38d2b5fb75c15923fb89c32134516a623515bce4 (patch)
tree: 42186c421271b50e0431e9fab9e5141417512905 /drivers/usb/dwc2/hcd_intr.c
parent: f2830ad455ec0fdc386baeb9d654f7095bf849da (diff)
1 files changed, 20 insertions, 0 deletions
diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index 916d991b96b8..a5dfd9d8bd9a 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -53,6 +53,12 @@
 #include "core.h"
 #include "hcd.h"
 
+/*
+ * If we get this many NAKs on a split transaction we'll slow down
+ * retransmission.  A 1 here means delay after the first NAK.
+ */
+#define DWC2_NAKS_BEFORE_DELAY		3
+
 /* This function is for debug only */
 static void dwc2_track_missed_sofs(struct dwc2_hsotg *hsotg)
 {
@@ -1201,11 +1207,25 @@ static void dwc2_hc_nak_intr(struct dwc2_hsotg *hsotg,
 	/*
 	 * Handle NAK for IN/OUT SSPLIT/CSPLIT transfers, bulk, control, and
 	 * interrupt. Re-start the SSPLIT transfer.
+	 *
+	 * Normally for non-periodic transfers we'll retry right away, but to
+	 * avoid interrupt storms we'll wait before retrying if we've got
+	 * several NAKs. If we didn't do this we'd retry directly from the
+	 * interrupt handler and could end up quickly getting another
+	 * interrupt (another NAK), which we'd retry.
+	 *
+	 * Note that in DMA mode software only gets involved to re-send NAKed
+	 * transfers for split transactions, so we only need to apply this
+	 * delaying logic when handling splits. In non-DMA mode presumably we
+	 * might want a similar delay if someone can demonstrate this problem
+	 * affects that code path too.
 	 */
 	if (chan->do_split) {
 		if (chan->complete_split)
 			qtd->error_count = 0;
 		qtd->complete_split = 0;
+		qtd->num_naks++;
+		qtd->qh->want_wait = qtd->num_naks >= DWC2_NAKS_BEFORE_DELAY;
 		dwc2_halt_channel(hsotg, chan, qtd, DWC2_HC_XFER_NAK);
 		goto handle_nak_done;
 	}
author	Douglas Anderson <dianders@chromium.org>	2017-12-12 10:30:31 -0800
committer	Felipe Balbi <felipe.balbi@linux.intel.com>	2017-12-13 11:27:53 +0200
commit	38d2b5fb75c15923fb89c32134516a623515bce4 (patch)
tree	42186c421271b50e0431e9fab9e5141417512905 /drivers/usb/dwc2/hcd_intr.c
parent	f2830ad455ec0fdc386baeb9d654f7095bf849da (diff)