summaryrefslogtreecommitdiff
path: root/drivers/misc/habanalabs/gaudi/gaudi.c
AgeCommit message (Collapse)Author
2021-02-08habanalabs: support fetching first available user CQOfir Bitton
User must be aware of the available CQs when it needs to use them. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-02-08habanalabs: improve communication protocol with cpucpOfir Bitton
Current messaging communictaion protocol with cpucp can get out of sync due to coherency issues. In order to improve the protocol reliability, we modify the protocol to expect a different acknowledgment for every packet sent to cpucp. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs/gaudi: unmask HBM interrupts after handlingOded Gabbay
As the driver does with all interrupts, we need to tell F/W to unmask the HBM interrupts after the driver handled them. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: update SyncManager interrupt handlingOded Gabbay
The firmware provides more information about SyncManager events. Adjust the code to the latest firmware interface file. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: staged submission supportOfir Bitton
We introduce a new mechanism named Staged Submission. This mechanism allows the user to send a whole CS in pieces. Each CS will not require completion rather than the last CS. Timeout timer will be triggered upon reception of the first CS in group. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: modify device_idle interfaceOhad Sharabi
Currently this API uses single 64 bits mask for engines idle indication. Recently, it was observed that more bits are needed for some ASICs. This patch modifies the use of the idle mask and the idle_extensions mask. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: add new mem ioctl op for mapping hw blocksOfir Bitton
For future ASIC support the driver allows user to map certain regions in the device's configuration space for direct access from userspace. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: fix MMU debugfs related nodesfarah kassabri
In mmu debugfs node show un-scrambled physical addresses. before read/write through data nodes, need to unscramble the physical address before using it for pci transaction. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: add user available interrupt to hw_ipOfir Bitton
In order to support completions that arrive directly to the user, the driver needs to supply the user with the first available msix interrupt available. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: add security violations dump to debugfsOfir Bitton
In order to improve driver security debuggability, we add security violations dump to debugfs. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs/gaudi: print sync manager SEI interrupt infoOfir Bitton
Driver must print sync manager SEI information upon receiving interrupt from FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs/gaudi: remove PCI access to SM blockOfir Bitton
Due to HW limitation we must remove all direct access to SM registers, in order to do that we will access SM registers using the HW QMANS. When possible and no user context is present, we can directly access the HW QMANS. Whenever there is an active user, driver will prepare a pending command buffer list which will be sent upon user submissions. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: read device boot errors after cpucp is upOfir Bitton
Boot cpu can report errors in various boot stages. Current implementaion does not take into consideration errors reported in late stages, hence we will check for errors at the most late stage when fetching cpucp information. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: support non power-of-2 DRAM phys page sizesMoti Haimovski
DRAM physical page sizes depend of the amount of HBMs available in the device. this number is device-dependent and may also be subject to binning when one or more of the DRAM controllers are found to to be faulty. Such a configuration may lead to partitioning the DRAM to non-power-of-2 pages. To support this feature we also need to add infrastructure of address scarmbling. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: remove access to kernel memory using debugfsOfir Bitton
Accessing kernel allocated memory through debugfs should not be allowed as it introduces a security vulnerability. We remove the option to read/write kernel memory for all asics. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs/gaudi: set uninitialized symbolOfir Bitton
Initialize local variable that is returned by the function, in case it is never assigned. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: replace WARN/WARN_ON with dev_crit in driverAlon Mizrahi
Often WARN is defined in data-centers as BUG and we would like to avoid hanging the entire server on some internal error of the driver (important as it might be). Therefore, use dev_crit instead. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs/gaudi: remove duplicated gaudi packets masksOfir Bitton
As all packets use the same CTL register masks, we remove duplicated masks and use common masks instead. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs/gaudi: support CS with no completionOfir Bitton
As part of the staged submission feature, we need Gaudi to support command submissions that will never get a completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: Init the VM module for kernel contextOfir Bitton
In order for reserving VA ranges for kernel memory, we need to allow the VM module to be initiated with kernel context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27habanalabs: refactor MMU locks codeOhad Sharabi
remove mmu_cache_lock as it protects a section which is already protected by mmu_lock. in addition, wrap mmu cache invalidate calls in hl_vm_ctx_fini with mmu_lock. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-12habanalabs: fix dma_addr passed to dma_mmap_coherentOded Gabbay
When doing dma_alloc_coherent in the driver, we add a certain hard-coded offset to the DMA address before returning to the callee function. This offset is needed when our device use this DMA address to perform outbound transactions to the host. However, if we want to map the DMA'able memory to the user via dma_mmap_coherent(), we need to pass the original dma address, without this offset. Otherwise, we will get erronouos mapping. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28habanalabs/gaudi: retry loading TPC f/w on -EINTROded Gabbay
If loading the firmware file for the TPC f/w was interrupted, try to do it again, up to 5 times. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28habanalabs: adjust pci controller init to new firmwareOded Gabbay
When the firmware security is enabled, the pcie_aux_dbi_reg_addr register in the PCI controller is blocked. Therefore, ignore the result of writing to this register and assume it worked. Also remove the prints on errors in the internal ELBI write function. If the security is enabled, the firmware is responsible for setting this register correctly so we won't have any problem. If the security is disabled, the write will work (unless something is totally broken at the PCI level and then the whole sequence will fail). In addition, remove a write to register pcie_aux_dbi_reg_addr+4, which was never actually needed. Moreover, PCIE_DBI registers are blocked to access from host when firmware security is enabled. Use a different register to flush the writes. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28habanalabs/gaudi: enhance reset messageOded Gabbay
Print the initiator who performs the hard-reset for easier debugging. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28habanalabs/gaudi: disable CGM at HW initializationOded Gabbay
In case the clock gating was enabled in preboot we need to disable it at the H/W initialization stage before touching the MME/TPC registers. Otherwise, the ASIC can get stuck. If the security is enabled in the firmware level, the CGM is always disabled and the driver can't enable it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28habanalabs/gaudi: do not set EB in collective slave queuesAlon Mizrahi
We don't need to set EB on signal packets from collective slave queues as it degrades performance. Because the slaves are the network queues, the engine barrier doesn't actually guarantee that the packet has been sent. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28habanalabs: preboot hard reset supportOfir Bitton
FW hard reset capability indication is now moved to preboot stage. Driver will check if HW is dirty only after it validated preboot is up. If HW is dirty, driver will perform a hard reset according to the FW capability. In addition, FW defines a new message which driver need to send in order to initiate a hard reset. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28habanalabs: remove generic gaudi get_pll_freq functionAlon Mizrahi
As we only fetch the CPU_PLL frequency in gaudi, we don't need a generic get_pll_frequency function which takes a pll index as input Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: Modify the cs_cnt of a CB to be atomicTomer Tayar
Modify the CS counter of a CB to be atomic, so no locking is required when it is being modified or read. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: change messages to debug levelOded Gabbay
Some messages should be changed to debug mode as we want to keep minimal prints during normal operation of the device. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: handle reset when f/w is in prebootOded Gabbay
Currently, if the f/w is in preboot/u-boot they don't perform the new reset mechanism. Therefore, the driver needs to reset the device. To prevent reset of PCI_IF, the driver needs to first configure the reset units. If the security is enabled, the driver can't configure the reset units. In that situation, don't reset the card. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: print ECC type fieldOded Gabbay
We have the ECC type field from the firmware but the driver didn't print it, so we need to add that field to the ECC print message. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: gaudi_ctx_fini() can be statickernel test robot
Make a function in gaudi.c to be static Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: fetch pll frequency from firmwareAlon Mizrahi
Once firmware security is enabled, driver must fetch pll frequencies through the firmware message interface instead of reading the registers directly. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: mmu map wrapper for sizes larger than a pageOfir Bitton
We introduce a new wrapper which allows us to mmu map any size to any host va_range available. In addition we remove duplicated code from various places in driver and using this new wrapper instead. This wrapper supports mapping only contiguous physical memory blocks and will be used for mappings that are done to the driver ASID. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: align to new FW reset schemeOfir Bitton
As part of the security effort in which FW will be handling sensitive HW registers, hard reset flow will be done by FW and will be triggered by driver. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: firmware returns 64bit argumentAlon Mizrahi
F/W message returns 64bit value but up until now we casted it to a 32bit variable, instead of receiving 64bit in the first place. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: support reserving aligned va blockOfir Bitton
Add support for reserving va block with alignment different than page size. This is a pre-requisite for allocations needed in future ASICs Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: fetch HBM ecc info from FWOfir Bitton
Once FW security is enabled there is no access to HBM ecc registers, need to read values from FW using a dedicated interface. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: fetch hard reset capability from FWOfir Bitton
Driver must fetch FW hard reset capability during boot time, in order to skip the hard reset flow if necessary. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: move asic property to correct structureOded Gabbay
Whether an ASIC has MMU towards its DRAM is an ASIC property, so move it to the asic fixed properties structure. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: use host va range for internal poolsOfir Bitton
Instead of using a dedicated va range for each internal pool, we introduce a new way for reserving a va block from an existing va range. This is a more generic way of reserving va blocks for future use. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: move HW dirty check to a proper locationOfir Bitton
Driver must verify if HW is dirty before trying to fetch preboot information. Hence, we move this validation to a prior stage of the boot sequence. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: remove pcie_en strap toggleIgor Grinberg
Since the very large grace period is over and this functionality prevents us to implement the new reset sequence and apply security settings, we need to remove the code toggling the PCIE_EN bit in the straps register. Remove it for good. Signed-off-by: Igor Grinberg <igrinberg@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: remove duplicate printOded Gabbay
We print twice the firmware status regarding security, once in common code and once in asic code. Remove the print in asic code and leave the common code print. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: reset device upon fw read failurefarah kassabri
failure in reading pre-boot verion is not handled correctly, upon failure we need to reset the device in order to be able to reinstall the driver. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs: Move repeatedly included headers to habanalabs.hTomer Tayar
Several header files are repeatedly included in many files. Move these files to habanalabs.h which is included by all. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: fetch PLL info from FWOfir Bitton
Once FW security is enabled there is no access to PLL registers, need to read values from FW using a dedicated interface. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30habanalabs/gaudi: scrub all memory upon closing FDfarah kassabri
In cases of multi-tenants, administrators may want to prevent data leakage between users running on the same device one after another. To do that the driver can scrub the internal memory (both SRAM and DRAM) after a user finish to use the memory. Because in GAUDI the driver allows only one application to use the device at a time, it can scrub the memory when user app close FD. In future devices where we have MMU on the DRAM, we can scrub the DRAM memory with a finer granularity (page granularity) when the user allocates the memory. This feature is not supported in Goya. To allow users that want to debug their applications, we add a kernel module parameter to load the driver with this feature disabled. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>