path: root/drivers/misc/habanalabs
Age | Commit message | Author
2021-01-27 | habanalabs: add ASIC property of functional HBMs | Oded Gabbay
The number of functional HBMs in the same ASIC can be different due to malfunctioning HBM banks. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs/gaudi: add debug prints for security status | Ofir Bitton
In order to have more information while debugging boot issues, we should print the firmware security status at every boot stage. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: modify memory functions signatures | Omer Shpigelman
For consistency, modify all memory ioctl functions to get the ioctl arguments structure rather than the arguments themselves. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: kernel doc format in memory functions | Omer Shpigelman
Change all memory functions documentation according to kernel doc format. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: replace WARN/WARN_ON with dev_crit in driver | Alon Mizrahi
Often WARN is defined in data-centers as BUG and we would like to avoid hanging the entire server on some internal error of the driver (important as it might be). Therefore, use dev_crit instead. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: report dram_page_size in hw_ip_info ioctl | Moti Haimovski
Instead of having it hard-coded as a define, pass it to the user in runtime. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs/goya: move mmu_prepare to context init | Ohad Sharabi
Currently mmu_prepare is located at context switch. Since we support a single context, no reason to reconfigure the MMU registers every context switch. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs/gaudi: remove duplicated gaudi packets masks | Ofir Bitton
As all packets use the same CTL register masks, we remove duplicated masks and use common masks instead. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: allow user to pass a staged submission seq | Ofir Bitton
In order to support the staged submission feature, user must be allowed to use the same CS sequence for all submissions in the same staged submission. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs/gaudi: support CS with no completion | Ofir Bitton
As part of the staged submission feature, we need Gaudi to support command submissions that will never get a completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: Init the VM module for kernel context | Ofir Bitton
In order to reserve VA ranges for kernel memory, we need to allow the VM module to be initialized with the kernel context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: refactor MMU locks code | Ohad Sharabi
Remove mmu_cache_lock, as it protects a section that is already protected by mmu_lock. In addition, wrap the MMU cache invalidate calls in hl_vm_ctx_fini with mmu_lock. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: update firmware boot interface | Oded Gabbay
Update to latest firmware hl_boot_if.h file. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21 | habanalabs: disable FW events on device removal | Oded Gabbay
When device is removed, we need to make sure the F/W won't send us any more events because during the remove process we disable the interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21 | habanalabs: fix backward compatibility of idle check | Oded Gabbay
Need to take the lower 32 bits of the driver's 64-bit idle mask and put it in the legacy 32-bit variable that the userspace reads to know the idle mask. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
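The truncation described here can be illustrated with a minimal userspace sketch. The structure and field names below are hypothetical stand-ins chosen for the example, not the driver's actual uapi definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the structure that userspace reads. */
struct demo_idle_info {
	uint32_t busy_mask_legacy; /* legacy 32-bit field kept for old userspace */
	uint64_t busy_mask_full;   /* full 64-bit idle mask */
};

/* Store the full mask and mirror its lower 32 bits into the legacy field. */
static void demo_fill_idle_info(struct demo_idle_info *info, uint64_t mask64)
{
	info->busy_mask_full = mask64;
	info->busy_mask_legacy = (uint32_t)(mask64 & 0xFFFFFFFFULL);
}
```

Old userspace that only knows the 32-bit field keeps seeing the same bits it saw before the mask was widened.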
2021-01-21 | habanalabs: zero pci counters packet before submit to FW | Ofir Bitton
Driver does not zero some pci counters packets before sending to FW. This causes an out of sync PI/CI between driver and FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-12 | misc/habana: Use FOLL_LONGTERM for userptr | Daniel Vetter
These are persistent, not just for the duration of a dma operation. Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Pawel Piskorski <ppiskorski@habana.ai> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-5-daniel.vetter@ffwll.ch
2021-01-12 | misc/habana: Stop using frame_vector helpers | Daniel Vetter
All we need is a pages array, and pin_user_pages_fast can give us that directly. Plus this avoids the entire raw pfn side of get_vaddr_frames. Note that pin_user_pages_fast is a safe replacement despite the seeming lack of checking for vma->vm_flags & (VM_IO | VM_PFNMAP). Such ptes are marked with pte_mkspecial (which pup_fast rejects in the fastpath), and only architectures supporting that support the pin_user_pages_fast fastpath. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Pawel Piskorski <ppiskorski@habana.ai> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-4-daniel.vetter@ffwll.ch
2021-01-12 | habanalabs: prevent soft lockup during unmap | Oded Gabbay
When using a deep learning framework such as TensorFlow or PyTorch, there are tens of thousands of host memory mappings. When the user frees all those mappings at the same time, the process of unmapping and unpinning them can take a long time, which may cause a soft lockup bug. To prevent this, we need to free the core to do other things during the unmapping process. For now, we chose to do it every 32K unmappings (each unmap is a single 4K page). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
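The "give up the core every 32K unmappings" idea can be sketched in userspace as below. This is an illustration under stated assumptions: `demo_unmap_all`, its callback, and `DEMO_UNMAPS_PER_YIELD` are hypothetical names, and `sched_yield()` stands in for whatever sleeping or rescheduling primitive a kernel driver would use at that point.

```c
#include <stddef.h>
#include <sched.h>

/* Release the CPU once per this many unmappings (32K, as in the message). */
#define DEMO_UNMAPS_PER_YIELD (32u * 1024u)

/*
 * Unmap n_pages pages, yielding the CPU after every 32K of them.
 * unmap_one is an optional, hypothetical per-page callback.
 * Returns how many times the loop yielded.
 */
static size_t demo_unmap_all(size_t n_pages, void (*unmap_one)(size_t idx))
{
	size_t yields = 0;

	for (size_t i = 0; i < n_pages; i++) {
		if (unmap_one)
			unmap_one(i);

		if ((i + 1) % DEMO_UNMAPS_PER_YIELD == 0) {
			sched_yield(); /* userspace stand-in for a kernel sleep */
			yields++;
		}
	}
	return yields;
}
```

With 4K pages, one yield per 32K unmappings corresponds to releasing the core once per 128 MiB of unmapped host memory.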
2021-01-12 | habanalabs: fix reset process in case of failures | Oded Gabbay
There are some points in the reset process where if the code fails for some reason, and the system admin tries to initiate the reset process again we will get a kernel panic. This is because there aren't any protections in different fini functions that are called during the reset process. The protections that are added in this patch make sure that if the fini functions are called multiple times, without calling init functions between them, there won't be double release of already released resources. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
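One common way to make a fini function safe to call repeatedly is to guard it with an "initialized" flag. The sketch below assumes that approach for illustration only; the structure and function names are hypothetical and the actual patch may protect each module differently.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical module state, used only for this illustration. */
struct demo_module {
	bool initialized;
	void *buf;
	int release_count; /* counts real releases, for demonstration */
};

static int demo_init(struct demo_module *m)
{
	m->buf = malloc(64);
	if (!m->buf)
		return -1;
	m->initialized = true;
	return 0;
}

/* Safe to call any number of times without an init in between. */
static void demo_fini(struct demo_module *m)
{
	if (!m->initialized)
		return; /* already released: nothing to do */

	free(m->buf);
	m->buf = NULL;
	m->initialized = false;
	m->release_count++;
}
```

A second `demo_fini()` call returns early instead of freeing the buffer again, which is the double-release scenario the message describes.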
2021-01-12 | habanalabs: fix dma_addr passed to dma_mmap_coherent | Oded Gabbay
When doing dma_alloc_coherent in the driver, we add a certain hard-coded offset to the DMA address before returning to the caller. This offset is needed when our device uses this DMA address to perform outbound transactions to the host. However, if we want to map the DMA'able memory to the user via dma_mmap_coherent(), we need to pass the original DMA address, without this offset. Otherwise, we will get an erroneous mapping. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
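The address arithmetic involved is small enough to show directly. In this sketch the offset value and the helper names are hypothetical; only the idea (add the offset for the device, subtract it again before mapping to userspace) comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical offset; the real value is device specific. */
#define DEMO_HOST_OFFSET 0x8000000000ULL

/* Address the device uses for outbound transactions to the host. */
static uint64_t demo_to_device_addr(uint64_t dma_addr)
{
	return dma_addr + DEMO_HOST_OFFSET;
}

/* Original DMA address, as needed when mapping the buffer to userspace. */
static uint64_t demo_to_mmap_addr(uint64_t device_addr)
{
	return device_addr - DEMO_HOST_OFFSET;
}
```

Passing the device-side address to the mapping call without undoing the offset would map a different region than the one that was allocated.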
2020-12-29 | habanalabs: Fix memleak in hl_device_reset | Dinghao Liu
When kzalloc() fails, we should execute hl_mmu_fini() to release the MMU module. It's the same when hl_ctx_init() fails. Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
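The fix follows the usual goto-based unwinding pattern: once a module has been brought up, every later failure path must tear it down. The sketch below shows that pattern in userspace with hypothetical names (`demo_dev`, `demo_mmu_init`, `demo_bring_up`); `calloc()` stands in for `kzalloc()`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device state for the illustration. */
struct demo_dev {
	int mmu_up;
	void *kernel_ctx;
};

static int demo_mmu_init(struct demo_dev *d) { d->mmu_up = 1; return 0; }
static void demo_mmu_fini(struct demo_dev *d) { d->mmu_up = 0; }

/* fail_alloc forces the allocation failure path, for demonstration only. */
static int demo_bring_up(struct demo_dev *d, int fail_alloc)
{
	int rc = demo_mmu_init(d);

	if (rc)
		return rc;

	d->kernel_ctx = fail_alloc ? NULL : calloc(1, 32);
	if (!d->kernel_ctx) {
		rc = -ENOMEM;
		goto err_mmu; /* undo the MMU init instead of leaking it */
	}
	return 0;

err_mmu:
	demo_mmu_fini(d);
	return rc;
}
```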
2020-12-28 | habanalabs: fix order of status check | Oded Gabbay
When the device is in reset or needs to be reset, the disabled property is don't-care. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: register to pci shutdown callback | Oded Gabbay
We need to make sure our device is idle when rebooting a virtual machine. This is done in the driver level. The firmware will later handle FLR but we want to be extra safe and stop the devices until the FLR is handled. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: add validation cs counter, fix misplaced counters | Alon Mizrahi
Up until now validation errors were counted in the parsing field of the cs_counters struct, so we added a new counter and increased it when needed. In addition, there were some locations where only one of the counters was updated (ctx or aggregate) so add the second one to be updated as well. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs/gaudi: retry loading TPC f/w on -EINTR | Oded Gabbay
If loading the firmware file for the TPC f/w was interrupted, try to do it again, up to 5 times. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
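A bounded retry on `-EINTR` can be sketched as follows. The message does not say whether "up to 5 times" means five attempts in total or five retries; this sketch assumes five attempts in total. `demo_load_with_retry` and `demo_flaky_load` are hypothetical helpers, not driver functions.

```c
#include <errno.h>

/* Assumption: five attempts in total. */
#define DEMO_MAX_LOAD_ATTEMPTS 5

typedef int (*demo_load_fn)(void *ctx);

/* Call load() again while it reports -EINTR, up to the attempt limit. */
static int demo_load_with_retry(demo_load_fn load, void *ctx)
{
	int rc = -EINTR;

	for (int attempt = 0; attempt < DEMO_MAX_LOAD_ATTEMPTS; attempt++) {
		rc = load(ctx);
		if (rc != -EINTR)
			break; /* success or a different error: stop retrying */
	}
	return rc;
}

/* Demo loader: reports -EINTR until the counter in ctx reaches zero. */
static int demo_flaky_load(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EINTR;
	}
	return 0;
}
```

Any error other than `-EINTR` is returned immediately, so the retry cannot mask a real failure.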
2020-12-28 | habanalabs: adjust pci controller init to new firmware | Oded Gabbay
When the firmware security is enabled, the pcie_aux_dbi_reg_addr register in the PCI controller is blocked. Therefore, ignore the result of writing to this register and assume it worked. Also remove the prints on errors in the internal ELBI write function. If the security is enabled, the firmware is responsible for setting this register correctly so we won't have any problem. If the security is disabled, the write will work (unless something is totally broken at the PCI level and then the whole sequence will fail). In addition, remove a write to register pcie_aux_dbi_reg_addr+4, which was never actually needed. Moreover, PCIE_DBI registers are blocked from host access when firmware security is enabled. Use a different register to flush the writes. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: update comment in hl_boot_if.h | Oded Gabbay
Hard-reset flag is updated in many stages of the boot sequence of the firmware. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs/gaudi: enhance reset message | Oded Gabbay
Print the initiator who performs the hard-reset for easier debugging. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: full FW hard reset support | Ofir Bitton
Driver must fetch FW hard reset capability at every FW boot stage: preboot, CPU boot, CPU application. If hard reset is triggered, driver will take into consideration only the last capability received. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs/gaudi: disable CGM at HW initialization | Oded Gabbay
In case the clock gating was enabled in preboot we need to disable it at the H/W initialization stage before touching the MME/TPC registers. Otherwise, the ASIC can get stuck. If the security is enabled in the firmware level, the CGM is always disabled and the driver can't enable it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: Revise comment to align with mirror list name | Tomer Tayar
hw_queues_mirror was renamed to cs_mirror, so revise accordingly a comment that refers to this list. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs/gaudi: do not set EB in collective slave queues | Alon Mizrahi
We don't need to set EB on signal packets from collective slave queues as it degrades performance. Because the slaves are the network queues, the engine barrier doesn't actually guarantee that the packet has been sent. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: preboot hard reset support | Ofir Bitton
FW hard reset capability indication is now moved to preboot stage. Driver will check if HW is dirty only after it validated preboot is up. If HW is dirty, driver will perform a hard reset according to the FW capability. In addition, FW defines a new message which the driver needs to send in order to initiate a hard reset. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: remove generic gaudi get_pll_freq function | Alon Mizrahi
As we only fetch the CPU_PLL frequency in gaudi, we don't need a generic get_pll_frequency function which takes a pll index as input. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: fetch PSOC PLL frequency from F/W in goya | Alon Mizrahi
When the F/W security is enabled, goya needs to fetch the PSOC pll frequency through a dedicated interface. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: Fix a missing-braces warning | Tomer Tayar
Fix a compilation "missing braces around initializer" warning. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-07 | Merge 5.10-rc7 into char-misc-next | Greg Kroah-Hartman
We want the fixes in here, and this resolves a merge issue with drivers/misc/habanalabs/common/memory.c. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-30 | habanalabs: Add CB IOCTL opcode to retrieve CB information | Tomer Tayar
Add a new CB IOCTL opcode that enables a user to query about a CB and get its usage count. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: Modify the cs_cnt of a CB to be atomic | Tomer Tayar
Modify the CS counter of a CB to be atomic, so no locking is required when it is being modified or read. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
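A lock-free usage counter of this kind can be sketched in userspace with C11 atomics, which stand in here for the kernel's `atomic_t` and its `atomic_inc()`, `atomic_dec()` and `atomic_read()` helpers. The `demo_cb` structure and its functions are hypothetical names for the illustration.

```c
#include <stdatomic.h>

/* Hypothetical command-buffer object with an atomic usage counter. */
struct demo_cb {
	atomic_int cs_cnt; /* number of command submissions using this CB */
};

static void demo_cb_init(struct demo_cb *cb)
{
	atomic_init(&cb->cs_cnt, 0);
}

/* A command submission starts using the CB. */
static void demo_cb_get(struct demo_cb *cb)
{
	atomic_fetch_add(&cb->cs_cnt, 1);
}

/* A command submission is done with the CB. */
static void demo_cb_put(struct demo_cb *cb)
{
	atomic_fetch_sub(&cb->cs_cnt, 1);
}

/* Read the current usage count without taking any lock. */
static int demo_cb_usage(struct demo_cb *cb)
{
	return atomic_load(&cb->cs_cnt);
}
```

Because each update is a single atomic read-modify-write, readers and writers never need to serialize on a lock just to touch the counter.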
2020-11-30 | habanalabs: Add mask for CS type bits in CS flags | Tomer Tayar
hl_cs_sanity_checks() extracts the CS type bits of the CS flags, by masking out the non-type bits. To save the need for updating the function whenever new bits for non-type flags are added, add an explicit mask for the CS type bits. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
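The difference between "mask out the non-type bits" and "mask in the type bits" is illustrated below. All flag names and values are hypothetical and are not the driver's uapi definitions.

```c
#include <stdint.h>

/* Hypothetical flag layout, for illustration only. */
#define DEMO_CS_FLAG_FORCE_RESTORE 0x01u /* non-type flag */
#define DEMO_CS_FLAG_SIGNAL        0x02u /* type bit */
#define DEMO_CS_FLAG_WAIT          0x04u /* type bit */
#define DEMO_CS_FLAG_COLLECTIVE    0x08u /* type bit */
#define DEMO_CS_FLAG_TIMESTAMP     0x20u /* non-type flag added later */

/* Explicit mask of the type bits: unaffected by new non-type flags. */
#define DEMO_CS_TYPE_MASK \
	(DEMO_CS_FLAG_SIGNAL | DEMO_CS_FLAG_WAIT | DEMO_CS_FLAG_COLLECTIVE)

/* Old style: clear the known non-type bits; breaks when a flag is added. */
static uint32_t demo_cs_type_old(uint32_t flags)
{
	return flags & ~DEMO_CS_FLAG_FORCE_RESTORE;
}

/* New style: keep only the type bits. */
static uint32_t demo_cs_type(uint32_t flags)
{
	return flags & DEMO_CS_TYPE_MASK;
}
```

With the old style, every new non-type flag must also be added to the cleared set; with the explicit type mask, the extraction stays correct without further changes.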
2020-11-30 | habanalabs: change messages to debug level | Oded Gabbay
Some messages should be changed to debug mode as we want to keep minimal prints during normal operation of the device. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: free host huge va_range if not used | Ofir Bitton
If huge range is not valid, driver uses the host range also for huge page allocations, but driver never frees its allocation. This introduces a memory leak every time a user closes its context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs/gaudi: handle reset when f/w is in preboot | Oded Gabbay
Currently, if the f/w is in preboot/u-boot, it doesn't perform the new reset mechanism. Therefore, the driver needs to reset the device. To prevent reset of PCI_IF, the driver needs to first configure the reset units. If the security is enabled, the driver can't configure the reset units. In that situation, don't reset the card. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: add missing counter update | Oded Gabbay
The global CS drop-on-reset counter wasn't updated together with the context counter. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: add ull to PLL masks | Alon Mizrahi
These defines are 64-bit defines so they need ull suffix. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
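Why the suffix matters: without it, a constant such as `0xFF` has type `int`, and shifting it by 32 or more positions is undefined behavior in C. The sketch below uses a hypothetical field layout to show a 64-bit mask defined with the `ULL` suffix.

```c
#include <stdint.h>

/* Hypothetical 8-bit field located at bits 32..39 of a 64-bit register. */
#define DEMO_FIELD_SHIFT 32
#define DEMO_FIELD_MASK  (0xFFULL << DEMO_FIELD_SHIFT) /* ULL: 64-bit shift */

/* Extract the field from a 64-bit register value. */
static uint64_t demo_get_field(uint64_t reg)
{
	return (reg & DEMO_FIELD_MASK) >> DEMO_FIELD_SHIFT;
}
```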
2020-11-30 | habanalabs: add support for cs with timestamp | Ofir Bitton
Add support for the user to request a timestamp upon CS completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: indicate to user that a cs is gone | Ofir Bitton
We want to indicate to the user that a certain command submission finished a long time ago and is no longer in the database. This means no further information regarding this CS can be obtained. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs/gaudi: print ECC type field | Oded Gabbay
We have the ECC type field from the firmware but the driver didn't print it, so we need to add that field to the ECC print message. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: update firmware files | Oded Gabbay
Update various firmware header files with new defines. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>