summaryrefslogtreecommitdiff
path: root/drivers/misc/habanalabs
AgeCommit message (Collapse)Author
2020-11-30habanalabs/gaudi: add NIC security configurationOded Gabbay
Configure the security properties of the NIC IP. This is to prevent the user process from doing something with the NIC that he shouldn't do. e.g. crash the server, steal data, etc. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: add NIC firmware-related definitionsOded Gabbay
Add new structures and messages that the driver use to interact with the firmware to receive information and events (errors) about GAUDI's NIC. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: add NIC QMAN H/W and registers definitionsOded Gabbay
Add auto-generated header files that describe the NIC QMANs registers used by the driver. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: remove duplicate checkOded Gabbay
We already check if queue index is smaller than max queues a few lines above this check so no need to check this again. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: sync stream refactor functionsOfir Bitton
Refactor sync stream implementation by reducing function length for better readability. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: add support for multiple SOBs per monitorOfir Bitton
Support advanced monitor functionality to monitor more than a single SOB. In addition expand all CB generation functions with buffer offset in order to put in them multiple packets that are generated by different functions. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: sync stream structures refactorOfir Bitton
Refactor sync stream implementation by adding more structures for better readability. In addition reducing allocated resources. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: don't init vm module if no MMUOded Gabbay
In case we are running without MMU enabled (debug mode), no need to initialize the VM module in the driver. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: minimize prints when everything is fineOded Gabbay
No need to print when the driver starts to initialize the H/W. Drivers should be silent when everything is OK. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: support multiple types of firmwaresOded Gabbay
The driver now loads the firmware in two stages. For debugging purposes we need to support situations where only the first stage firmware is loaded. Therefore, use a bitmask to determine which F/W is loaded Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: we need CPU queues for hwmonOded Gabbay
F/W can be loaded but device CPU queues disabled. In that case, HWMON should be disabled. This is only relevant when debugging Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: move mmu_prepare to context initOfir Bitton
Currently mmu_prepare is located at context switch. Since we support a single context, no reason to reconfigure the MMU registers every context switch. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: change aggregate cs counters to atomicOded Gabbay
In case we will have multiple contexts/processes, we can't just increment aggregated counters. We need to make them atomic as they can be incremented by multiple processes Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: put devices before driver removalOfir Bitton
Driver never puts its device and control_device objects, hence a memory leak is introduced every driver removal. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: free host huge va_range if not usedOfir Bitton
If huge range is not valid, driver uses the host range also for huge page allocations, but driver never frees its allocation. This introduces a memory leak every time a user closes its context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-23habanalabs/gaudi: fix missing code in ECC handlingOded Gabbay
There is missing statement and missing "break;" in the ECC handling code in gaudi.c This will cause a wrong behavior upon certain ECC interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-04habanalabs/gaudi: mask WDT error in QMANOded Gabbay
This interrupt cause is not relevant because of how the user use the QMAN arbitration mechanism. We must mask it as the log explodes with it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-04habanalabs/gaudi: move coresight mmu configOfir Bitton
We must relocate the coresight mmu configuration to the coresight flow to make it work in case the first submission is to configure the profiler. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-04habanalabs: fix kernel pointer typeArnd Bergmann
All throughout the driver, normal kernel pointers are stored as 'u64' struct members, which is kind of silly and requires casting through a uintptr_t to void* every time they are used. There is one line that missed the intermediate uintptr_t case, which leads to a compiler warning: drivers/misc/habanalabs/common/command_buffer.c: In function 'hl_cb_mmap': drivers/misc/habanalabs/common/command_buffer.c:512:44: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 512 | rc = hdev->asic_funcs->cb_mmap(hdev, vma, (void *) cb->kernel_address, Rather than adding one more cast, just fix the type and remove all the other casts. Fixes: 0db575350cb1 ("habanalabs: make use of dma_mmap_coherent") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-09-30habanalabs/gaudi: use correct define for qman initOded Gabbay
There was a copy-paste error, and the wrong define was used for initializing the QMAN. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200925171415.25663-1-oded.gabbay@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25habanalabs/gaudi: configure QMAN LDMA registers properlyOfir Bitton
LDMA registers are configured with a fixed value. We add new define set which gives the configuration a proper meaning. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25habanalabs: add notice of device not idleOded Gabbay
The device should be idle after a context is closed. If not, print a notice. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25habanalabs: add debug messages for opening/closing contextOded Gabbay
During debugging of error we sometimes need to know whether the error happened when a user context was open. Add debug prints when opening and closing user contexts. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25habanalabs: release kernel context after hw_finiOded Gabbay
Some engines use resources that belong to the kernel context (e.g. MMU mappings). In case the halt-engines doesn't work properly due to H/W restriction, we need to make sure the kernel context lives on until after the hw_fini. The hw_fini resets the ASIC after that no engine is alive and we can safely close the kernel context. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25habanalabs: correct an error messageOded Gabbay
We don't try to allocate huge pages here so remove the huge word. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: update scratchpad register mapOded Gabbay
Our firmware use some scratchpad registers in the device for different roles. Update the file to the latest version of the firmware code. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: add indication of security-enabled F/WOded Gabbay
Future F/W versions will have enhanced security measures and the driver won't be able to do certain configurations that it always did and those configurations will be done by the firmware. We use the firmware's preboot version to determine whether security measures are enabled or not. Because we need this very early in our code, the read of the preboot version is moved to the earliest possible place, right after the device's PCI initialization. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs/gaudi: fix DMA completions max outstanding to 15Oded Gabbay
This is a workaround for H/W bug H3-2116, where if there are more than 16 outstanding completions in the DMA transpose engine, there can be a deadlock in the engine. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs/gaudi: remove axi drain supportOded Gabbay
AXI drain is broken in GAUDI so remove support for enabling it. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: update firmware interface fileOded Gabbay
Add new packet to fetch PLL information from firmware. This will be needed in the future when the driver won't be able to access the PLL registers directly Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: Add an option to map CB to device MMUTomer Tayar
There are cases in which the device should access the host memory of a CB through the device MMU, and thus this memory should be mapped. The patch adds a flag to the CB IOCTL, in which a user can ask the driver to perform the mapping when creating a CB. The mapping is allowed only if a dedicated VA range was allocated for the specific ASIC. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: Save context in a command buffer objectTomer Tayar
Future changes require using a context while handling a command buffer, and thus need to save the context in the command buffer object. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: no need for DMA_SHARED_BUFFEROded Gabbay
Now that the driver no longer uses dma_buf, we can remove the select of DMA_SHARED_BUFFER from kconfig. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: allow to wait on CS without sleepOded Gabbay
The user sometimes wants to check if a CS has completed to clean resources. In that case, the user doesn't want to sleep but just to check if the CS has finished and continue with his code. Add a new definition to the API of the wait on CS. The new definition says that if the timeout is 0, the driver won't sleep at all but return immediately after checking if the CS has finished. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs/gaudi: increase timeout for boot fit loadOded Gabbay
The firmware running in the boot stage takes more time to execute due to increased security mechanisms. Therefore, we need to increase the timeout we wait for the boot fit to finish loading. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: add debugfs support for MMU with 6 HOPsMoti Haimovski
This commit modify the existing debugfs code to support future devices that have a 6 HOPs MMU implementation instead of 5 HOPs implementation. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: add num_hops to hl_mmu_propertiesMoti Haimovski
This commit adds the number of HOPs supported by the device to the device MMU properties. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: refactor MMU as device-orientedMoti Haimovski
As preparation to MMU v2, rework MMU to be device oriented instantiated according to the device in hand. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: rename mmu.c to mmu_v1.cMoti Haimovski
In the future we will have MMU v2 code, so we need to prepare the driver for it. The first step is to rename the current MMU file to mmu_v1.c. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: use smallest possible alignment for virtual addressesOmer Shpigelman
Change the acquiring of a device virtual address for mapping by using the smallest possible alignment, rather than the biggest, depending on the page size used by the user for allocating the memory. This will lower the virtual space memory consumption. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: check flag before reset because of f/w eventOded Gabbay
For consistency with GAUDI code, add check of the relevant flag in the device structure before resetting the GOYA device in case of firmware event. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: increase PQ COMP_OFFSET by one nibbleOded Gabbay
For future ASICs, we increase this field by one nibble. This field was not used by the current ASICs so this change doesn't break anything. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: Fix alignment issue in cpucp_info structureOfir Bitton
Because the device CPU compiler aligns structures to 8 bytes, struct cpucp_info has an alignment issue as some parts in the structure are not aligned to 8 bytes. It is preferred that we explicitly insert placeholders inside the structure to avoid confusion in order to validate this scenario, we printed both pointers: __u8 cpucp_version[VERSION_MAX_LEN]; (0xffff899c67ed4cbc) __le64 dram_size; (0xffff899c67ed4d40) we see difference of 132 bytes although the first array is only 128 bytes long, Meaning compiler added a 4 byte padding. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: remove unused defineOded Gabbay
Cleanup the code. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: remove unused ASIC function pointerOded Gabbay
Old function pointer that was left when the call to this function pointer was removed. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: rename ArmCP to CPU-CPOded Gabbay
There were a couple of comments where the name ArmCP was still used. Rename it to CPU-CP. In addition, rename ArmCP or ARM in log messages to "device CPU". Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: count dropped CS because max CS in-flightOded Gabbay
There is a case where the user reaches the maximum number of CS in-flight. In that case, the driver rejects the new CS of the user with EAGAIN. Count that event so the user can query the driver later to see if it happened. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: make use of dma_mmap_coherentHillf Danton
Add dma_mmap_coherent() for goya and gaudi to match their use of dma_alloc_coherent(), see the Link tag for why. Link: https://lore.kernel.org/lkml/20200609091727.GA23814@lst.de/ Cc: Christoph Hellwig <hch@lst.de> Cc: Zhang Li <li.zhang@bitmain.com> Cc: Ding Z Nan <oshack@hotmail.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: clear vm_pgoff before doing the mmapOded Gabbay
The driver use vm_pgoff to hold the CB idr handle. Before we actually call the mapping function, we need to clear the handle so there won't be any garbage left in vm_pgoff. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: restructure hl_mmapOded Gabbay
Arrange the hl_mmap code to be more structured and expandable for the future. Add better defines that describe our usage of the vm_pgoff. Note that I shamelessly took the code and defines from the amdkfd driver (my previous driver). Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>