path: root/drivers/accel
Age | Commit message | Author
2023-06-08 | accel/habanalabs: fix bug of not fetching addr_dec info | Ofir Bitton
addr_dec info should always be fetched, regardless of cause value.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: remove sim code | Oded Gabbay
There were a few places where simulator-only code got into upstream. Remove them, as they can confuse other developers.

Fixes: 2a0a839b6a28 ("habanalabs: extend fatal messages to contain PCI info")
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: mask part of hmmu page fault captured address | Dani Liberman
When a page fault is received from the HMMU, the captured address is scrambled both by HW and by the driver. The driver part is unscrambled, but the HW part is not. To avoid reporting a wrong address, the HW-scrambled part is masked.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
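
The commit does not say which address bits the HW scrambles. As a minimal sketch, assuming the HW-scrambled part is the low 8 bits (the mask value and the names HMMU_HW_SCRAMBLE_MASK and hmmu_mask_captured_addr are hypothetical), the masking step amounts to clearing those bits after the driver has undone its own scrambling:

```c
#include <stdint.h>

/* Hypothetical: assume the HW scrambles bits [7:0] of the captured address.
 * The real bit range is not given in the commit message. */
#define HMMU_HW_SCRAMBLE_MASK 0xFFull

/* Input is the address after the driver has undone its own scrambling.
 * Clearing the HW-scrambled bits ensures the reported address never
 * contains bits that the driver cannot unscramble. */
static uint64_t hmmu_mask_captured_addr(uint64_t driver_unscrambled_addr)
{
	return driver_unscrambled_addr & ~HMMU_HW_SCRAMBLE_MASK;
}
```

The trade-off is precision for correctness: the reported address is only accurate to the granularity of the mask, but it is never wrong.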
2023-06-08 | accel/habanalabs: update state when loading boot fit | Koby Elbaz
Any FW component we load must be followed by a corresponding state update. However, it seems that so far we skipped doing so for the bootfit case, so fix that.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: print qman data on error only for lower qman | Tomer Tayar
By default, the upper QMANs are not used; instead, the engines' ARCs access the lower QMANs directly. Errors for upper QMANs are therefore not expected, and the debug print of the PQ entries is not needed. Modify the QMAN debug data print on errors to include only information for the lower QMAN.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: use lower QM in QM errors handling | Tomer Tayar
The QMAN GLBL_ERR_STS_4 register indicates errors not only in the lower CP but also in the lower CQ and the ARC CQ. Modify the relevant define/struct and the related print to use "lower QM" instead of "lower CP".

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: use binning info when handling razwi | Dani Liberman
When an SEI interrupt is received from a TPC or decoder, we need to check the binning mask: if the engine is binned, the RAZWI info won't be in the router of the binned engine but in the router of the substitute engine.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
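
A rough sketch of the lookup, assuming a 32-bit binning mask with one bit per engine and a single spare engine that substitutes for any binned one. SUBSTITUTE_ENGINE_ID, engine_is_binned() and razwi_router_engine() are hypothetical; the driver's real engine-to-substitute mapping is not described in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical index of the spare engine that replaces a binned one. */
#define SUBSTITUTE_ENGINE_ID 24u

static bool engine_is_binned(uint32_t binning_mask, unsigned int engine_id)
{
	assert(engine_id < 32);	/* the mask is 32 bits wide in this sketch */
	return (binning_mask >> engine_id) & 1u;
}

/* Return the engine whose router holds the RAZWI info for engine_id. */
static unsigned int razwi_router_engine(uint32_t binning_mask,
					unsigned int engine_id)
{
	return engine_is_binned(binning_mask, engine_id) ?
	       SUBSTITUTE_ENGINE_ID : engine_id;
}
```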
2023-06-08 | accel/habanalabs: remove support for mmu disable | Ofir Bitton
As mmu disable mode is only used for bring-up stages, let's remove this option and all code related to it.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: upon DMA errors, use FW-extracted error cause | Koby Elbaz
Initially, the driver read the error cause data directly from the ASIC. However, the FW now clears it before the driver can read it. Therefore, we should use the error cause data extracted by the FW.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: print max timeout value on CS stuck | Oded Gabbay
If a workload got stuck, we print an error to the kernel log about it. Add to that print the configured max timeout value, as that value is not fixed between ASICs and in addition it can be configured using a kernel module parameter.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
2023-06-08 | accel/habanalabs: align to latest firmware specs | Oded Gabbay
Update the firmware common interface files with the latest version.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
2023-06-08 | accel/habanalabs: fix mem leak in capture user mappings | Moti Haimovski
This commit fixes a memory leak caused when clearing the user_mappings info when a new context is opened immediately after user_mapping is captured and a hard reset is performed.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: set unused bit as reserved | Oded Gabbay
Get the latest FW Gaudi2 interface file, which marks the unused bist_need_iatu_config bit in the cold_rst_data structure as reserved.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
2023-06-08 | accel/habanalabs: rename security functions related arguments | Koby Elbaz
Rename the arguments so their names make clear that the registers array holds registers that should be unsecured, so that the user can access them.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: fix gaudi2_get_tpc_idle_status() return | Dan Carpenter
The gaudi2_get_tpc_idle_status() function returned the incorrect variable so it always returned true.

Fixes: d85f0531b928 ("accel/habanalabs: break is_idle function into per-engine sub-routines")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
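
The message does not show the faulty code. The following simplified sketch (hypothetical functions, not the driver's gaudi2_get_tpc_idle_status()) illustrates the bug class: the loop accumulates the per-engine state into one variable, but the function returns another that is never cleared, so it always reports idle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Buggy pattern: is_idle is computed, but ret is returned instead. */
static bool all_idle_buggy(const bool *engine_idle, size_t n)
{
	bool is_idle = true, ret = true;

	for (size_t i = 0; i < n; i++)
		is_idle &= engine_idle[i];
	return ret;		/* wrong variable: always true */
}

/* Fixed pattern: return the value that was actually accumulated. */
static bool all_idle_fixed(const bool *engine_idle, size_t n)
{
	bool is_idle = true;

	for (size_t i = 0; i < n; i++)
		is_idle &= engine_idle[i];
	return is_idle;
}
```

Compilers can catch this class of bug with -Wunused-but-set-variable, since is_idle is written but never read in the buggy version.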
2023-06-08 | accel/habanalabs: Fix some kernel-doc comments | Yang Li
Change the descriptions of @regs_range_array and @regs_range_array_size to @user_regs_range_array and @user_regs_range_array_size to silence the following warnings:

  drivers/accel/habanalabs/common/security.c:506: warning: Function parameter or member 'user_regs_range_array' not described in 'hl_init_pb_ranges_single_dcore'
  drivers/accel/habanalabs/common/security.c:506: warning: Function parameter or member 'user_regs_range_array_size' not described in 'hl_init_pb_ranges_single_dcore'
  drivers/accel/habanalabs/common/security.c:506: warning: Excess function parameter 'regs_range_array' description in 'hl_init_pb_ranges_single_dcore'
  drivers/accel/habanalabs/common/security.c:506: warning: Excess function parameter 'regs_range_array_size' description in 'hl_init_pb_ranges_single_dcore'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=4940
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: always fetch pci addr_dec error info | Ofir Bitton
Due to a missing indication of the address decode source (LBW/HBW bus), we should always try to fetch extended information.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: fix a static warning - 'dubious: x & !y' | Koby Elbaz
Use a straightforward approach to get a conditional result.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
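
The commit does not show the flagged expression, nor whether it produced a wrong result in practice. As a generic illustration of why checkers flag x & !y: !y evaluates to 0 or 1, so the bitwise AND only ever tests bit 0 of x. Writing both operands as conditions with && expresses the intent directly (function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Flagged pattern: bitwise AND with a logical NOT. Since !flag is 0 or 1,
 * the result depends only on bit 0 of mask. */
static bool mask_set_and_not_flag_dubious(uint32_t mask, bool flag)
{
	return mask & !flag;
}

/* Straightforward form: both operands are evaluated as conditions. */
static bool mask_set_and_not_flag(uint32_t mask, bool flag)
{
	return mask && !flag;
}
```

With mask = 0x2 and flag = false, the dubious form returns false while the straightforward form returns true.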
2023-06-08 | accel/habanalabs: poll for device status update following WFE cmd | Koby Elbaz
Currently, we rely on COMMS protocol's ack to verify that WFE command has been acknowledged by the FW. However, this does not guarantee that the device status has been updated. Although unlikely, this could trigger a race since the driver expects the device to be halted at that stage, but it might not be. Therefore, we increase WFE's robustness by polling on the status register that will be updated once the device is actually halted.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
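
A minimal sketch of the polling step, assuming a status register that reads a known HALTED value once the device has stopped. DEV_STATUS_HALTED, the callback type and the simulated device are hypothetical; the driver's actual register and timeout are not given. A real driver would sleep between reads, for example with a readl_poll_timeout()-style helper, rather than spin on a retry count.

```c
#include <stdint.h>

#define DEV_STATUS_HALTED 0x5u	/* hypothetical "device halted" value */

typedef uint32_t (*read_status_fn)(void *ctx);

/* Poll until the status register reports HALTED or the retry budget
 * runs out. Returns 0 on success, -1 on timeout. */
static int wait_for_halted(read_status_fn read_status, void *ctx,
			   unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (read_status(ctx) == DEV_STATUS_HALTED)
			return 0;
	}
	return -1;
}

/* Simulated register for demonstration: reports HALTED after N reads. */
struct fake_dev {
	unsigned int reads_left;
};

static uint32_t fake_read_status(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->reads_left == 0)
		return DEV_STATUS_HALTED;
	d->reads_left--;
	return 0;
}
```

The ack from the messaging layer only proves the command arrived; polling the status register is what proves the device reached the expected state.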
2023-06-08 | accel/habanalabs: expose debugfs files later | Tomer Tayar
Currently the debugfs root folder and files for a device are created at an early step, before the device initialization and before the char device and sysfs files are exposed to user. As there is no real reason not to do it together with the device creation, postpone it to be done right afterwards. The initialization of the debugfs entry structure is left in its current position because it is used before creating the files.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: add pci health check during heartbeat | Ofir Bitton
Currently upon a heartbeat failure, we don't know if the failure is due to firmware hang or due to a bad PCI link. Hence, we are reading a PCI config space register with a known value (vendor ID) so we will know which of the two possibilities caused the heartbeat failure.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
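
A sketch of the classification logic under stated assumptions: the vendor ID has already been read from PCI config space, and the expected ID is passed in. A config read over a broken link typically returns all ones (0xFFFF); this sketch conservatively treats any mismatch as a link problem. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum hb_failure_cause {
	HB_CAUSE_FW_HANG,	/* link is fine, so the firmware stopped responding */
	HB_CAUSE_PCI_LINK,	/* config space unreadable or returning garbage */
};

static enum hb_failure_cause classify_hb_failure(uint16_t vendor_id_read,
						 uint16_t expected_vendor_id)
{
	return vendor_id_read == expected_vendor_id ?
	       HB_CAUSE_FW_HANG : HB_CAUSE_PCI_LINK;
}
```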
2023-06-08 | accel/habanalabs: add missing tpc interrupt info | Dafna Hirschfeld
For some reason the last possible tpc interrupt cause in gaudi2_tpc_interrupts_cause is missing from the code.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: refactor abort of completions and waits | Koby Elbaz
Aborting CS completions should be in command_submission.c but aborting waiting for user interrupts should be in device.c. This separation is also for adding more abort operations in the future.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 | accel/habanalabs: minimize encapsulation signal mutex lock time | Koby Elbaz
Sync Stream Encapsulated Signal Handlers can be managed from different contexts, and as such they are protected via a spin_lock. However, spin_lock was unnecessarily protecting a larger code section than really needed, covering a sleepable code section as well. Since spin_lock disables preemption, it could lead to sleeping in atomic context.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
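
The pattern can be sketched in userspace C, with a C11 atomic_flag spinlock standing in for the kernel's spin_lock and calloc() standing in for a sleepable operation (in the kernel, a GFP_KERNEL allocation, for example, may sleep). The handle table and function names are hypothetical, not the driver's code. Only the reference-count update is done under the lock; the possibly blocking work happens after it is released.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

static atomic_flag handles_lock = ATOMIC_FLAG_INIT;

static void handles_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&handles_lock,
						 memory_order_acquire))
		;	/* busy-wait: a spinlock holder must never sleep */
}

static void handles_spin_unlock(void)
{
	atomic_flag_clear_explicit(&handles_lock, memory_order_release);
}

#define MAX_HANDLES 8
static unsigned int handle_refcnt[MAX_HANDLES];	/* hypothetical table */

static void *get_handle_and_alloc(unsigned int id, size_t size)
{
	void *buf;

	if (id >= MAX_HANDLES)
		return NULL;

	handles_spin_lock();
	handle_refcnt[id]++;		/* critical section: bookkeeping only */
	handles_spin_unlock();

	buf = calloc(1, size);		/* may block: done outside the lock */
	if (!buf) {
		handles_spin_lock();
		handle_refcnt[id]--;	/* undo the reference on failure */
		handles_spin_unlock();
	}
	return buf;
}
```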
2023-06-08 | accel/ivpu: Fix sporadic VPU boot failure | Andrzej Kacprowski
Wait for AON bit in HOST_SS_CPR_RST_CLR to return 0 before starting VPUIP power up sequence, otherwise the VPU device may sporadically fail to boot. An error in power up sequence is propagated to the runtime power management - the device will be in an error state until the VPU driver is reloaded.

Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU")
Cc: stable@vger.kernel.org # 6.3.x
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230607094502.388489-1-stanislaw.gruszka@linux.intel.com
2023-06-08 | accel/ivpu: Do not use mutex_lock_interruptible | Stanislaw Gruszka
If we get a signal while waiting for mmu->lock, we do not invalidate the current MMU configuration, which might result in undefined behavior. Additionally, there is little or no benefit in breaking out of the wait for ipc->lock; in the current code base, we hold this lock only for short periods.

Fixes: 263b2ba5fc93 ("accel/ivpu: Add Intel VPU MMU support")
Reviewed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230525103818.877590-2-stanislaw.gruszka@linux.intel.com
2023-06-08 | accel/ivpu: Do not trigger extra VPU reset if the VPU is idle | Andrzej Kacprowski
Turning off the PLL and entering D0i3 will reset the VPU so an explicit IP reset is redundant. But if the VPU is active, it may interfere with PLL disabling and to avoid that, we have to issue an additional IP reset to silence the VPU before turning off the PLL.

Fixes: a8fed6d1e0b9 ("accel/ivpu: Fix power down sequence")
Cc: stable@vger.kernel.org # 6.3.x
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230525103818.877590-1-stanislaw.gruszka@linux.intel.com
2023-06-08 | accel/ivpu: Mark 64 kB contiguous areas as contiguous in PTEs | Karol Wachowski
Whenever the KMD maps a region larger than 64 kB that is both aligned and contiguous, set the contiguous bit (52) in the MMU PTE descriptor for each page in that region. This allows the MMU to treat 16 contiguous pages as one and reduces the number of MMU page walks required, which results in lower latency.

Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-6-stanislaw.gruszka@linux.intel.com
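
A minimal sketch of the eligibility check and PTE fill, assuming 4 KiB pages (16 x 4 KiB = 64 kB, matching the 16 pages above) and a simplified PTE that is just the DMA address OR'ed with flags. Only bit 52 comes from the commit; the helper names and PTE layout are hypothetical, and real PTEs carry attribute bits not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K	0x1000ull
#define CONT_SIZE_64K	0x10000ull	/* 16 x 4 KiB pages */
#define PTE_CONT_BIT	(1ull << 52)	/* contiguous bit, per the commit */

/* A 64 kB chunk qualifies only if the VA and DMA address are both 64 kB
 * aligned and at least 64 kB remain to be mapped. */
static bool can_use_cont(uint64_t va, uint64_t dma, uint64_t remaining)
{
	return !(va & (CONT_SIZE_64K - 1)) && !(dma & (CONT_SIZE_64K - 1)) &&
	       remaining >= CONT_SIZE_64K;
}

/* Fill 4 KiB PTEs for a physically contiguous region, setting the
 * contiguous bit on every PTE of each eligible 64 kB chunk.
 * Returns the number of PTEs written. */
static size_t fill_ptes(uint64_t *ptes, uint64_t va, uint64_t dma,
			uint64_t size, uint64_t flags)
{
	size_t n = 0;

	while (size >= PAGE_SIZE_4K) {
		uint64_t chunk = PAGE_SIZE_4K, cont = 0;

		if (can_use_cont(va, dma, size)) {
			chunk = CONT_SIZE_64K;
			cont = PTE_CONT_BIT;
		}
		for (uint64_t off = 0; off < chunk; off += PAGE_SIZE_4K)
			ptes[n++] = (dma + off) | flags | cont;
		va += chunk;
		dma += chunk;
		size -= chunk;
	}
	return n;
}
```

Setting the bit on all 16 PTEs (rather than only the first) reflects the commit's "for each page in that region".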
2023-06-08 | accel/ivpu: Rename and cleanup MMU600 page tables | Karol Wachowski
Simplify and unify the naming convention in the MMU600 page table configuration. All DMA addresses in page tables directly accessed by the VPU are named with a _dma suffix, and all CPU pointers to those page tables have a _ptr suffix. Base pointers used to do a page walk on the CPU have corresponding names:

  pud_ptrs (pointers used to get access to PUD DMA)
  pmd_ptrs (pointers used to get access to PMD DMA)
  pte_ptrs (pointers used to get access to PTE DMA)

with the following convention:

  u64 *pud_dma_ptr = pud_ptrs[pgd_idx];
  *pud_dma_ptr = pud_dma;

  u64 *pmd_dma_ptr = pmd_ptrs[pgd_idx][pud_idx];
  *pmd_dma_ptr = pmd_dma;

  u64 *pte_dma_ptr = pte_ptrs[pgd_idx][pud_idx][pmd_idx];
  *pte_dma_ptr = pte_dma;

Along the way, change to coherent DMA allocation; _wc is only valid on ARM and was used by mistake.

Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-5-stanislaw.gruszka@linux.intel.com
2023-06-08 | accel/ivpu: Make DMA bit mask HW specific | Karol Wachowski
Future devices will have a different DMA bit mask, so make it HW specific.

Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-4-stanislaw.gruszka@linux.intel.com
2023-06-08 | accel/ivpu: Add MMU support for 4 level page mappings | Karol Wachowski
Program the additional fourth level required for mappings with VAs above 38 bits.

Co-developed-by: Raymond Tan <raymond.tan@intel.com>
Signed-off-by: Raymond Tan <raymond.tan@intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-3-stanislaw.gruszka@linux.intel.com
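
A sketch of how a VA splits into per-level indices, assuming the usual ARMv8 4 KiB-granule layout with 9 index bits per level (the names below are hypothetical; the driver's own macros are not shown in the message). Three levels cover VA bits 38..0, which is why a VA with any bit at or above 39 set needs the fourth level. The index names follow the pgd_idx/pud_idx/pmd_idx convention from the MMU600 renaming commit above.

```c
#include <stdint.h>

/* Assumed layout: PGD bits 47..39, PUD 38..30, PMD 29..21, PTE 20..12. */
#define LEVEL_BITS	9
#define LEVEL_MASK	((1u << LEVEL_BITS) - 1)

struct pt_indices {
	unsigned int pgd_idx, pud_idx, pmd_idx, pte_idx;
};

static struct pt_indices va_to_indices(uint64_t va)
{
	struct pt_indices idx = {
		.pgd_idx = (va >> 39) & LEVEL_MASK,
		.pud_idx = (va >> 30) & LEVEL_MASK,
		.pmd_idx = (va >> 21) & LEVEL_MASK,
		.pte_idx = (va >> 12) & LEVEL_MASK,
	};
	return idx;
}

/* With only three levels, any VA with bit 39 or above set is unmappable. */
static int needs_fourth_level(uint64_t va)
{
	return (va >> 39) != 0;
}
```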
2023-06-08 | accel/ivpu: Remove configuration of MMU TBU1 and TBU3 | Karol Wachowski
MTL HW only uses StreamId0 and StreamId3 that map to TBU0 and TBU2.

Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-2-stanislaw.gruszka@linux.intel.com
2023-06-08 | accel/ivpu: Use struct_size() | Christophe JAILLET
Use struct_size() instead of hand-writing it. It is less verbose, more robust and more informative.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Marco Pagani <marpagan@redhat.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0ae53be873c27c9a8740c4fe6d8e7cd1b1224994.1685366864.git.christophe.jaillet@wanadoo.fr
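
For readers unfamiliar with struct_size(): it computes the size of a structure with a trailing flexible array member, saturating to SIZE_MAX on overflow so that a subsequent allocation fails instead of being undersized. A userspace approximation for one hypothetical structure (struct job and job_struct_size() are illustrative, not ivpu code):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with a flexible array member. */
struct job {
	uint32_t n_bufs;
	uint64_t buf_addrs[];
};

/* Header size plus count elements, saturating to SIZE_MAX on overflow,
 * approximating what the kernel's struct_size() returns. */
static size_t job_struct_size(size_t count)
{
	size_t elem = sizeof(((struct job *)0)->buf_addrs[0]);

	if (count > (SIZE_MAX - sizeof(struct job)) / elem)
		return SIZE_MAX;
	return sizeof(struct job) + count * elem;
}
```

The hand-written equivalent, sizeof(*job) + count * sizeof(job->buf_addrs[0]), silently wraps on overflow, which is the robustness gain the commit refers to.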
2023-06-06 | accel/ivpu: Reserve all non-command bo's using DMA_RESV_USAGE_BOOKKEEP | Stanislaw Gruszka
Use DMA_RESV_USAGE_BOOKKEEP reservation for buffer objects, except for command buffers for which we use DMA_RESV_USAGE_WRITE (since VPU can write to command buffer context save area).

Fixes: 0ec8671837a6 ("accel/ivpu: Fix S3 system suspend when not idle")
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230413063810.3167511-1-stanislaw.gruszka@linux.intel.com
2023-06-06 | accel/ivpu: ivpu_ipc needs GENERIC_ALLOCATOR | Randy Dunlap
Drivers that use the gen_pool*() family of functions should select GENERIC_ALLOCATOR to prevent build errors like these:

  ld: drivers/accel/ivpu/ivpu_ipc.o: in function `gen_pool_free':
  include/linux/genalloc.h:172: undefined reference to `gen_pool_free_owner'
  ld: drivers/accel/ivpu/ivpu_ipc.o: in function `gen_pool_alloc_algo':
  include/linux/genalloc.h:138: undefined reference to `gen_pool_alloc_algo_owner'
  ld: drivers/accel/ivpu/ivpu_ipc.o: in function `gen_pool_free':
  include/linux/genalloc.h:172: undefined reference to `gen_pool_free_owner'
  ld: drivers/accel/ivpu/ivpu_ipc.o: in function `ivpu_ipc_init':
  drivers/accel/ivpu/ivpu_ipc.c:441: undefined reference to `devm_gen_pool_create'
  ld: drivers/accel/ivpu/ivpu_ipc.o: in function `gen_pool_add_virt':
  include/linux/genalloc.h:104: undefined reference to `gen_pool_add_owner'

Fixes: 5d7422cfb498 ("accel/ivpu: Add IPC driver and JSM messages")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/all/202305221206.1TaugDKP-lkp@intel.com/
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: dri-devel@lists.freedesktop.org
Cc: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Cc: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Cc: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Cc: Jeffrey Hugo <quic_jhugo@quicinc.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230526044519.13441-1-rdunlap@infradead.org
2023-06-05 | accel/habanalabs: call to HW/FW err returns 0 when no events exist | Moti Haimovski
This commit modifies the call to retrieve HW or FW error events to return success when no events are pending, as done in the calls to other events.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 | accel/habanalabs: unsecure TPC bias registers | Ofir Bitton
The user needs to be able to perform downcast/upcast of the fp8_143 dtype, so the bias register must be accessible to the user.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 | accel/habanalabs: do soft-reset using cpucp packet | Dafna Hirschfeld
This is done depending on the FW version. The cpucp method is preferable and saves scratchpad resources.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 | accel/habanalabs: check fw version using sw version | Dafna Hirschfeld
The FW inner version is less reliable; use the FW general SW release version instead.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 | accel/habanalabs: extract and save the FW's SW major/minor/sub-minor | Dafna Hirschfeld
It is not always possible to know the FW's SW version from the inner FW version. Therefore we should extract the general SW version in addition to the FW version and use it in functions like 'hl_is_fw_ver_below_1_9' etc.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 | accel/habanalabs: rename fw_{major/minor}_version to fw_inner_{major/minor}_ver | Dafna Hirschfeld
We later want to add fields for the firmware SW version. The currently extracted FW version is the inner FW version, so the new name is more accurate and better differentiates it from the FW's SW version.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 | accel/habanalabs: add helper to extract the FW major/minor | Dafna Hirschfeld
The helper is extract_u32_until_given_char and can later be used to also get the major/minor of the SW version.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
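
A minimal sketch of a parser in the spirit of extract_u32_until_given_char(), plus a version comparison like the hl_is_fw_ver_below_1_9 check mentioned above. The driver's real signatures are not shown in these messages; extract_u32_until(), sw_ver_below() and the "major.minor.sub" string format are assumptions. Comparing numerically matters: "1.10" is newer than "1.9" even though it sorts lower as a string.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal number at *str that must be terminated by given_char.
 * On success, store it in *out, advance *str past the delimiter and
 * return true; return false on malformed input. */
static bool extract_u32_until(const char **str, char given_char, uint32_t *out)
{
	char *end;
	unsigned long val = strtoul(*str, &end, 10);

	if (end == *str || *end != given_char || val > UINT32_MAX)
		return false;
	*out = (uint32_t)val;
	*str = end + 1;
	return true;
}

/* Hypothetical check: is a "major.minor.sub" version below major.minor?
 * Unparsable strings are treated as old, a conservative choice made only
 * for this sketch. */
static bool sw_ver_below(const char *ver, uint32_t major, uint32_t minor)
{
	uint32_t maj, min;

	if (!extract_u32_until(&ver, '.', &maj) ||
	    !extract_u32_until(&ver, '.', &min))
		return true;
	return maj < major || (maj == major && min < minor);
}
```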
2023-06-05 | accel/habanalabs: fix bug in free scratchpad memory | Moti Haimovski
This commit fixes a bug in Gaudi2 when freeing the scratchpad memory in case software init fails.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 | accel/habanalabs: remove commented code that won't be used | Koby Elbaz
Once it was decided that these security settings are to be done by FW rather than by the driver, there's no reason to keep them in the code.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 | accel/habanalabs: allow user to modify EDMA RL register | Rakesh Ughreja
The EDMA transpose workload requires a signal for every activation. To save LBW bandwidth, the user FW sends all the dummy signals to RD_LBW_RATE_LIM_CFG, so the user needs to be able to access that register to configure it.

Signed-off-by: Rakesh Ughreja <rughreja@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 | accel/habanalabs: ignore false positive razwi | Tal Cohen
In the Gaudi2 ASIC, a PSOC RAZWI may occur on HBW or LBW. The address that caused the error is read from a HW register and printed by the driver. There are cases where the driver receives a PSOC RAZWI error indication but the address value is zero; in that case, the indication is a false positive. The driver should not "count" a PSOC RAZWI event error when the address that caused it is zero.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
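
The filtering rule is simple enough to state as a sketch; the counter structure and handler name below are hypothetical, not the driver's:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event counter. */
struct razwi_stats {
	unsigned int psoc_razwi_count;
};

/* Count a PSOC RAZWI only when the captured address is non-zero; a zero
 * address marks a false-positive indication, which is ignored.
 * Returns true if the event was counted. */
static bool handle_psoc_razwi(struct razwi_stats *st, uint64_t captured_addr)
{
	if (captured_addr == 0)
		return false;
	st->psoc_razwi_count++;
	return true;
}
```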
2023-06-05 | accel/habanalabs: remove variable gaudi_irq_name | Tom Rix
gcc with W=1 reports

  drivers/accel/habanalabs/gaudi/gaudi.c:117:19: error: ‘gaudi_irq_name’ defined but not used [-Werror=unused-const-variable=]
    117 | static const char gaudi_irq_name[GAUDI_MSI_ENTRIES][GAUDI_MAX_STRING_LEN] = {
        |                   ^~~~~~~~~~~~~~

This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-05-23 | accel/qaic: Fix NNC message corruption | Jeffrey Hugo
If msg_xfer() is unable to queue part of a NNC message because the MHI ring is full, it will attempt to give the QSM some time to drain the queue. However, if QSM fails to make any room, msg_xfer() will fail and tell the caller to try again. This is problematic because part of the message may have been committed to the ring and there is no mechanism to revoke that content. This will cause QSM to receive a corrupt message.

The better way to do this is to check if the ring has enough space for the entire message before committing any of the message. Since msg_xfer() is under the cntl_mutex no one else can come in and consume the space.

Fixes: 129776ac2e38 ("accel/qaic: Add control path")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230517193540.14323-6-quic_jhugo@quicinc.com
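
A simplified sketch of all-or-nothing queueing, modeling the ring as a linear array of slots with no consumer (struct ring, ring_free() and queue_msg() are hypothetical; the real code works on MHI ring elements). The check-then-commit sequence is only race-free because a single lock serializes producers, as cntl_mutex does in the commit.

```c
#include <stddef.h>

#define RING_SLOTS 8

/* Hypothetical transfer ring: the first `used` slots are occupied. */
struct ring {
	size_t used;
	int slots[RING_SLOTS];
};

static size_t ring_free(const struct ring *r)
{
	return RING_SLOTS - r->used;
}

/* Verify there is room for every part of the message before committing
 * any of them, so the receiver never sees a partial message. The caller
 * must hold the producer lock. Returns 0 on success, or -1 (with nothing
 * committed) to ask the caller to retry later. */
static int queue_msg(struct ring *r, const int *parts, size_t n_parts)
{
	if (n_parts > ring_free(r))
		return -1;
	for (size_t i = 0; i < n_parts; i++)
		r->slots[r->used++] = parts[i];
	return 0;
}
```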
2023-05-23 | accel/qaic: Grab ch_lock during QAIC_ATTACH_SLICE_BO | Pranjal Ramajor Asha Kanojiya
During QAIC_ATTACH_SLICE_BO, we associate a BO with its DBC. We need to grab dbc->ch_lock to make sure that the DBC does not go away while QAIC_ATTACH_SLICE_BO is still running.

Fixes: ff13be830333 ("accel/qaic: Add datapath")
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230517193540.14323-5-quic_jhugo@quicinc.com
2023-05-23 | accel/qaic: Flush the transfer list again | Pranjal Ramajor Asha Kanojiya
Before calling synchronize_srcu(), we clear the transfer list to allow all the QAIC_WAIT_BO callers to exit; otherwise the system could deadlock. There is a corner case where more elements get added to the transfer list after we have flushed it. Re-flush the transfer list once all the holders of dbc->ch_lock have completed execution, i.e. once synchronize_srcu() is complete.

Fixes: ff13be830333 ("accel/qaic: Add datapath")
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230517193540.14323-4-quic_jhugo@quicinc.com