author | Oded Gabbay <oded.gabbay@gmail.com> | 2019-02-28 10:46:12 +0200 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2019-02-28 13:04:59 +0100 |
commit | a28ce422a6d926c11d7e72a83ccaa4f9b06077ea (patch) | |
tree | 65f08f69e8eec0cf7d5272a87e2e612f975c04ec /drivers/misc/habanalabs/goya/goya.c | |
parent | 27ca384cb7c44b8b16ea43f9aed930664140360e (diff) |
habanalabs: disable CPU access on timeouts
This patch provides a workaround for a bug in the F/W where the response
time for a request from KMD may take more than 100ms. This could cause the
queue between KMD and the F/W to get out of sync.
The WA is to:
1. Increase the timeout of ALL requests to 1s.
2. In case a request isn't answered in time, mark the state as
"cpu_disabled" and prevent sending further requests from KMD to the F/W.
This will eventually lead to a heartbeat failure and hard reset of the
device.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'drivers/misc/habanalabs/goya/goya.c')
-rw-r--r-- | drivers/misc/habanalabs/goya/goya.c | 9 |
1 files changed, 7 insertions, 2 deletions
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 7c2edabe20bd..5780041abe32 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -3232,6 +3232,11 @@ int goya_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
 	if (hdev->disabled)
 		goto out;
 
+	if (hdev->device_cpu_disabled) {
+		rc = -EIO;
+		goto out;
+	}
+
 	rc = hl_hw_queue_send_cb_no_cmpl(hdev, GOYA_QUEUE_ID_CPU_PQ, len,
 			pkt_dma_addr);
 	if (rc) {
@@ -3245,8 +3250,8 @@ int goya_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
 	hl_hw_queue_inc_ci_kernel(hdev, GOYA_QUEUE_ID_CPU_PQ);
 
 	if (rc == -ETIMEDOUT) {
-		dev_err(hdev->dev,
-			"Timeout while waiting for CPU packet fence\n");
+		dev_err(hdev->dev, "Timeout while waiting for device CPU\n");
+		hdev->device_cpu_disabled = true;
 		goto out;
 	}
 
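
The second step of the workaround (latch a flag on the first timeout, then refuse every later request) can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver's code: `struct fake_device`, `send_fn`, `reply_ok`, `reply_timeout` and `send_cpu_message` are hypothetical names invented for this illustration, and returning 0 when the device is disabled is an assumption, since the diff does not show how `rc` is initialized.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct fake_device {
	bool disabled;            /* whole device is disabled */
	bool device_cpu_disabled; /* latched after the first timeout */
	int sends_attempted;      /* requests that actually reached the transport */
};

/* Hypothetical transport: returns 0 on a reply, -ETIMEDOUT on no reply. */
typedef int (*send_fn)(struct fake_device *dev);

static int reply_ok(struct fake_device *dev)
{
	dev->sends_attempted++;
	return 0;
}

static int reply_timeout(struct fake_device *dev)
{
	dev->sends_attempted++;
	return -ETIMEDOUT;
}

/* Sketch of the gating logic: fail fast once a timeout has been seen. */
static int send_cpu_message(struct fake_device *dev, send_fn transport)
{
	int rc;

	assert(dev != NULL && transport != NULL);

	/* Assumption: a disabled device is skipped without an error. */
	if (dev->disabled)
		return 0;

	/* After one timeout, never touch the queue again. */
	if (dev->device_cpu_disabled)
		return -EIO;

	rc = transport(dev);
	if (rc == -ETIMEDOUT)
		dev->device_cpu_disabled = true; /* latch: the flag is never cleared here */

	return rc;
}
```

Calling `send_cpu_message()` with `reply_timeout` once and then with `reply_ok` returns `-ETIMEDOUT` followed by `-EIO`, and the second call never reaches the transport. In this sketch nothing clears the flag; in the commit, recovery is left to the heartbeat failure and hard reset described in the message.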