diff options
author | Michal Wilczynski <m.wilczynski@samsung.com> | 2024-10-08 12:03:27 +0200 |
---|---|---|
committer | Ulf Hansson <ulf.hansson@linaro.org> | 2024-10-10 13:30:48 +0200 |
commit | 27e8fe0da3b75520edfba9cee0030aeb5aef1505 (patch) | |
tree | b9b72367b01b1333b01ad1310dd40dace94df444 /drivers/mmc/host/sdhci-of-dwcmshc.c | |
parent | 5b35746a0fdc73063a4c7fc6208b7abd644f9ef5 (diff) |
mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling
While working with the T-Head 1520 LicheePi4A SoC, certain conditions
arose that allowed me to reproduce a race issue in the sdhci code.
To reproduce the bug, you need to enable the sdio1 controller in the
device tree file
`arch/riscv/boot/dts/thead/th1520-lichee-module-4a.dtsi` as follows:
&sdio1 {
bus-width = <4>;
max-frequency = <100000000>;
no-sd;
no-mmc;
broken-cd;
cap-sd-highspeed;
post-power-on-delay-ms = <50>;
status = "okay";
wakeup-source;
keep-power-in-suspend;
};
When resetting the SoC using the reset button, the following messages
appear in the dmesg log:
[ 8.164898] mmc2: Got command interrupt 0x00000001 even though no
command operation was in progress.
[ 8.174054] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 8.180503] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00000005
[ 8.186950] mmc2: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000
[ 8.193395] mmc2: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000
[ 8.199841] mmc2: sdhci: Present: 0x03da0000 | Host ctl: 0x00000000
[ 8.206287] mmc2: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
[ 8.212733] mmc2: sdhci: Wake-up: 0x00000000 | Clock: 0x0000decf
[ 8.219178] mmc2: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000
[ 8.225622] mmc2: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003
[ 8.232068] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[ 8.238513] mmc2: sdhci: Caps: 0x3f69c881 | Caps_1: 0x08008177
[ 8.244959] mmc2: sdhci: Cmd: 0x00000502 | Max curr: 0x00191919
[ 8.254115] mmc2: sdhci: Resp[0]: 0x00001009 | Resp[1]: 0x00000000
[ 8.260561] mmc2: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
[ 8.267005] mmc2: sdhci: Host ctl2: 0x00001000
[ 8.271453] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr:
0x0000000000000000
[ 8.278594] mmc2: sdhci: ============================================
I also enabled some traces to better understand the problem:
kworker/3:1-62 [003] ..... 8.163538: mmc_request_start:
mmc2: start struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_arg=0x0 cmd_flags=0x2e1 cmd_retries=0 stop_opcode=0 stop_arg=0x0
stop_flags=0x0 stop_retries=0 sbc_opcode=0 sbc_arg=0x0 sbc_flags=0x0
sbc_retires=0 blocks=0 block_size=0 blk_addr=0 data_flags=0x0 tag=0
can_retune=0 doing_retune=0 retune_now=0 need_retune=0 hold_retune=1
retune_period=0
<idle>-0 [000] d.h2. 8.164816: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x10000
intmask_p=0x18000
irq/24-mmc2-96 [000] ..... 8.164840: sdhci_thread_irq:
msg=
irq/24-mmc2-96 [000] d.h2. 8.164896: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x1
intmask_p=0x1
irq/24-mmc2-96 [000] ..... 8.285142: mmc_request_done:
mmc2: end struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_err=-110 cmd_resp=0x0 0x0 0x0 0x0 cmd_retries=0 stop_opcode=0
stop_err=0 stop_resp=0x0 0x0 0x0 0x0 stop_retries=0 sbc_opcode=0
sbc_err=0 sbc_resp=0x0 0x0 0x0 0x0 sbc_retries=0 bytes_xfered=0
data_err=0 tag=0 can_retune=0 doing_retune=0 retune_now=0 need_retune=0
hold_retune=1 retune_period=0
Here's what happens: the __mmc_start_request function is called with
opcode 5. Since the power to the Wi-Fi card, which resides on this SDIO
bus, is initially off after the reset, an interrupt SDHCI_INT_TIMEOUT is
triggered. Immediately after that, a second interrupt SDHCI_INT_RESPONSE
is triggered. Depending on the exact timing, these conditions can
trigger the following race problem:
1) The sdhci_cmd_irq top half handles the command as an error. It sets
host->cmd to NULL and host->pending_reset to true.
2) The sdhci_thread_irq bottom half is scheduled next and executes faster
than the second interrupt handler for SDHCI_INT_RESPONSE. It clears
host->pending_reset before the SDHCI_INT_RESPONSE handler runs.
3) The pending interrupt SDHCI_INT_RESPONSE handler gets called, triggering
a code path that prints: "mmc2: Got command interrupt 0x00000001 even
though no command operation was in progress."
To solve this issue, we need to clear pending interrupts when resetting
host->pending_reset. This ensures that after sdhci_threaded_irq restores
interrupts, there are no pending stale interrupts.
The behavior observed here is non-compliant with the SDHCI standard.
Place the code in the sdhci-of-dwcmshc driver to account for a
hardware-specific quirk instead of the core SDHCI code.
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 43658a542ebf ("mmc: sdhci-of-dwcmshc: Add support for T-Head TH1520")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241008100327.4108895-1-m.wilczynski@samsung.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Diffstat (limited to 'drivers/mmc/host/sdhci-of-dwcmshc.c')
-rw-r--r-- | drivers/mmc/host/sdhci-of-dwcmshc.c | 8 |
1 files changed, 8 insertions, 0 deletions
diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c index 8999b97263af..8fd80dac11bf 100644 --- a/drivers/mmc/host/sdhci-of-dwcmshc.c +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c @@ -852,6 +852,14 @@ static void th1520_sdhci_reset(struct sdhci_host *host, u8 mask) sdhci_reset(host, mask); + /* The T-Head 1520 SoC does not comply with the SDHCI specification + * regarding the "Software Reset for CMD line should clear 'Command + * Complete' in the Normal Interrupt Status Register." Clear the bit + * here to compensate for this quirk. + */ + if (mask & SDHCI_RESET_CMD) + sdhci_writel(host, SDHCI_INT_RESPONSE, SDHCI_INT_STATUS); + if (priv->flags & FLAG_IO_FIXED_1V8) { ctrl_2 = sdhci_readw(host, SDHCI_HOST_CONTROL2); if (!(ctrl_2 & SDHCI_CTRL_VDD_180)) { |