path: root/drivers/misc/habanalabs/common
Age | Commit message | Author
2021-01-27 | habanalabs: add user available interrupt to hw_ip | Ofir Bitton
In order to support completions that arrive directly to the user, the driver needs to supply the user with the first available MSI-X interrupt. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: always try to use the hint address | farah kassabri
Currently the hint address is ignored if the VA block page size is not a power of 2. We need to support the user hint address in this case as well, but only if the hint address is aligned to the page size. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
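A minimal userspace sketch of the alignment rule, using a hypothetical helper name (`hint_is_aligned` is not the driver's function). It shows why a non-power-of-2 page size needs a remainder test rather than the usual bit-mask test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a user hint address can be honored only if it is a
 * multiple of the VA block page size. The remainder test is valid for any
 * page size; a mask test such as (addr & (page_size - 1)) == 0 is only
 * correct when page_size is a power of 2. */
static bool hint_is_aligned(uint64_t hint_addr, uint64_t page_size)
{
	assert(page_size != 0);
	return (hint_addr % page_size) == 0;
}
```

For example, with an illustrative 48 MiB page (0x3000000), a 64 MiB hint (0x4000000) passes the mask test against 0x2FFFFFF even though 64 MiB is not a multiple of 48 MiB; the remainder test correctly rejects it and accepts a 96 MiB hint (0x6000000).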
2021-01-27 | habanalabs: add security violations dump to debugfs | Ofir Bitton
In order to improve driver security debuggability, we add a security violations dump to debugfs. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: ignore F/W BMC errors in case no BMC present | Ofir Bitton
In order to support an operation mode in which the BMC is not active, the driver must not take BMC errors into consideration. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: Use 'dma_set_mask_and_coherent()' | Christophe JAILLET
Axe 'hl_pci_set_dma_mask()' and replace it with an equivalent 'dma_set_mask_and_coherent()' call. This makes the code a bit less verbose. It also removes an erroneous comment, because 'hl_pci_set_dma_mask()' does not try to use a fall-back value. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: add driver support for internal cb scheduling | Ofir Bitton
In order to support scenarios in which the driver needs access to HW components but cannot access them directly, we add support for scheduling command buffers internally. These command buffers will be transmitted in the context of the next user command submission. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: increment ctx ref from within a cs allocation | Ofir Bitton
A CS must increment the relevant context reference count. We want to increment the reference inside the CS allocation function, as opposed to today, where we increment it outside. This is logical since we want to avoid explicitly incrementing the context every time we call the CS allocate function. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
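A simplified userspace sketch of the pattern, assuming hypothetical `struct ctx`/`struct cs` types and a plain integer reference count (the kernel driver would use its own refcounting primitives, such as `kref`). The allocator takes the reference and the matching free drops it, so callers never touch the context count directly:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver's context and CS. */
struct ctx {
	int refcount;
};

struct cs {
	struct ctx *ctx;
};

static void ctx_get(struct ctx *ctx) { ctx->refcount++; }

static void ctx_put(struct ctx *ctx)
{
	assert(ctx->refcount > 0);
	ctx->refcount--;
}

/* The allocator takes the context reference itself, so callers no longer
 * have to call ctx_get() before every CS allocation. */
static struct cs *cs_alloc(struct ctx *ctx)
{
	struct cs *cs = calloc(1, sizeof(*cs));

	if (!cs)
		return NULL; /* no reference is taken on failure */
	ctx_get(ctx);
	cs->ctx = ctx;
	return cs;
}

/* Releasing the CS drops the reference that cs_alloc() took. */
static void cs_free(struct cs *cs)
{
	ctx_put(cs->ctx);
	free(cs);
}
```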
2021-01-27 | habanalabs: separate common code to dedicated folders | Ofir Bitton
We separate some of the common code source files into different folders for better maintainability and testability. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: read device boot errors after cpucp is up | Ofir Bitton
The boot CPU can report errors in various boot stages. The current implementation does not take into consideration errors reported in late stages, hence we will check for errors at the latest stage, when fetching the cpucp information. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: report correct dram size in info ioctl | Ofir Bitton
In case MMU is enabled, we must take the MMU page size into consideration when reporting the DRAM size to the user. This is because the MMU page size can be a value which is NOT a power-of-2 value. As a result, the total DRAM size (which is always a power-of-2 value) needs to be rounded down. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
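A sketch of the rounding, assuming a hypothetical helper `usable_dram_size()` and illustrative sizes. Integer division is used because a non-power-of-2 page size rules out shift/mask arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: only whole MMU pages can be mapped, so any tail
 * smaller than one page is unusable and the reported size is rounded down
 * to a multiple of the page size. Division (not a shift) is used because
 * the page size may not be a power of 2. */
static uint64_t usable_dram_size(uint64_t dram_size, uint64_t page_size)
{
	assert(page_size != 0);
	return (dram_size / page_size) * page_size;
}
```

With 32 GiB of DRAM and an illustrative 48 MiB page, 682 whole pages fit, so the reported size becomes 32736 MiB instead of 32768 MiB; with a power-of-2 page the size is unchanged.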
2021-01-27 | habanalabs: support non power-of-2 DRAM phys page sizes | Moti Haimovski
DRAM physical page sizes depend on the number of HBMs available in the device. This number is device-dependent and may also be subject to binning when one or more of the DRAM controllers are found to be faulty. Such a configuration may lead to partitioning the DRAM into non-power-of-2 pages. To support this feature we also need to add address scrambling infrastructure. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: return dram virtual address in info ioctl | Alon Mizrahi
When working with DRAM MMU, we should supply the userspace with the virtual start address of the DRAM instead of the physical one. This is because the physical one has no meaning for the user as he only knows the virtual address range. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: add ASIC property of functional HBMs | Oded Gabbay
The number of functional HBMs in the same ASIC can be different due to malfunctioning HBM banks. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs/gaudi: add debug prints for security status | Ofir Bitton
In order to have more information while debugging boot issues, we should print the firmware security status at every boot stage. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: modify memory functions signatures | Omer Shpigelman
For consistency, modify all memory ioctl functions to get the ioctl arguments structure rather than the arguments themselves. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: kernel doc format in memory functions | Omer Shpigelman
Change all memory functions documentation according to kernel doc format. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: replace WARN/WARN_ON with dev_crit in driver | Alon Mizrahi
In data centers, WARN is often configured to behave like BUG (e.g. with panic_on_warn), and we would like to avoid hanging the entire server because of an internal driver error, however important it might be. Therefore, use dev_crit instead. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: report dram_page_size in hw_ip_info ioctl | Moti Haimovski
Instead of having it hard-coded as a define, pass it to the user in runtime. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: allow user to pass a staged submission seq | Ofir Bitton
In order to support the staged submission feature, user must be allowed to use the same CS sequence for all submissions in the same staged submission. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs/gaudi: support CS with no completion | Ofir Bitton
As part of the staged submission feature, we need Gaudi to support command submissions that will never get a completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: Init the VM module for kernel context | Ofir Bitton
In order to reserve VA ranges for kernel memory, we need to allow the VM module to be initialized with the kernel context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 | habanalabs: refactor MMU locks code | Ohad Sharabi
Remove mmu_cache_lock, as it protects a section which is already protected by mmu_lock. In addition, wrap the MMU cache invalidate calls in hl_vm_ctx_fini with mmu_lock. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21 | habanalabs: disable FW events on device removal | Oded Gabbay
When device is removed, we need to make sure the F/W won't send us any more events because during the remove process we disable the interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21 | habanalabs: fix backward compatibility of idle check | Oded Gabbay
Need to take the lower 32 bits of the driver's 64-bit idle mask and put it in the legacy 32-bit variable that the userspace reads to know the idle mask. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
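A minimal sketch of the compatibility copy, assuming one bit per engine and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the driver tracks idle engines in a 64-bit mask,
 * but older userspace only reads a 32-bit field. Copying the low 32 bits
 * keeps engines 0..31 reporting as before; bits 32..63 cannot be
 * represented in the legacy field and are dropped. */
static uint32_t legacy_idle_mask(uint64_t idle_mask)
{
	return (uint32_t)(idle_mask & 0xFFFFFFFFULL);
}
```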
2021-01-21 | habanalabs: zero pci counters packet before submit to FW | Ofir Bitton
Driver does not zero some pci counters packets before sending to FW. This causes an out of sync PI/CI between driver and FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
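A sketch of the fix pattern, assuming a hypothetical packet struct (the real firmware packet layout is not shown here). The packet is cleared before its fields are filled, so no uninitialized or stale data reaches the firmware; the commit itself only states that the missing zeroing led to out-of-sync PI/CI:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet layout; the real F/W packet structure differs. */
struct fw_packet {
	uint32_t opcode;
	uint32_t flags;
	uint64_t result;
};

/* Prepare a packet for submission. Clearing it first guarantees that
 * every field not explicitly set below is zero, instead of holding
 * uninitialized or left-over values from an earlier request. */
static void prepare_packet(struct fw_packet *pkt, uint32_t opcode)
{
	assert(pkt != NULL);
	memset(pkt, 0, sizeof(*pkt));
	pkt->opcode = opcode;
}
```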
2021-01-12 | misc/habana: Use FOLL_LONGTERM for userptr | Daniel Vetter
These are persistent, not just for the duration of a dma operation. Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Pawel Piskorski <ppiskorski@habana.ai> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-5-daniel.vetter@ffwll.ch
2021-01-12 | misc/habana: Stop using frame_vector helpers | Daniel Vetter
All we need is a pages array; pin_user_pages_fast can give us that directly. Plus this avoids the entire raw pfn side of get_vaddr_frames. Note that pin_user_pages_fast is a safe replacement despite the seeming lack of checking for vma->vm_flags & (VM_IO | VM_PFNMAP). Such ptes are marked with pte_mkspecial (which pup_fast rejects in the fastpath), and only architectures supporting that support the pin_user_pages_fast fastpath. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Pawel Piskorski <ppiskorski@habana.ai> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-4-daniel.vetter@ffwll.ch
2021-01-12 | habanalabs: prevent soft lockup during unmap | Oded Gabbay
When using a deep learning framework such as TensorFlow or PyTorch, there are tens of thousands of host memory mappings. When the user frees all those mappings at the same time, the process of unmapping and unpinning them can take a long time, which may cause a soft lockup bug. To prevent this, we need to free the core to do other things during the unmapping process. For now, we chose to do it every 32K unmappings (each unmap is a single 4K page). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
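A userspace sketch of the periodic-yield pattern, assuming a hypothetical `unmap_all()` loop over a byte array. In kernel code the usual way to give up the CPU inside a long loop is `cond_resched()`; POSIX `sched_yield()` stands in for it here so the sketch can run outside the kernel:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Yield interval from the commit message: 32K unmaps, 4K page each. */
#define UNMAP_YIELD_INTERVAL (32 * 1024)

/* Hypothetical sketch of a long teardown loop. Clearing mapped[i] stands
 * in for unmapping and unpinning one page; sched_yield() stands in for
 * the kernel's cond_resched(). Returns how many times the CPU was
 * yielded. */
static size_t unmap_all(unsigned char *mapped, size_t npages)
{
	size_t yields = 0;

	assert(mapped != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++) {
		mapped[i] = 0;
		if ((i + 1) % UNMAP_YIELD_INTERVAL == 0) {
			sched_yield();
			yields++;
		}
	}
	return yields;
}
```

With 70000 pages the loop yields twice (after the 32768th and 65536th unmap), so no single stretch of work exceeds 32K pages, or 128 MiB of 4K mappings.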
2021-01-12 | habanalabs: fix reset process in case of failures | Oded Gabbay
There are some points in the reset process where if the code fails for some reason, and the system admin tries to initiate the reset process again we will get a kernel panic. This is because there aren't any protections in different fini functions that are called during the reset process. The protections that are added in this patch make sure that if the fini functions are called multiple times, without calling init functions between them, there won't be double release of already released resources. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
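A minimal sketch of an idempotent fini function, assuming a hypothetical module with a single heap buffer. Clearing the pointer after release is what makes a repeated call safe when fini runs twice without an init in between:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical module state standing in for a driver sub-module. */
struct module_state {
	int *buffer;
};

static int module_init(struct module_state *m, size_t n)
{
	m->buffer = calloc(n, sizeof(*m->buffer));
	return m->buffer ? 0 : -1;
}

/* Safe to call repeatedly: the first call releases the buffer and clears
 * the pointer, so a second call (e.g. from a retried reset) returns early
 * and skips any teardown that must not run twice. */
static void module_fini(struct module_state *m)
{
	if (!m->buffer)
		return;
	free(m->buffer);
	m->buffer = NULL;
}
```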
2020-12-29 | habanalabs: Fix memleak in hl_device_reset | Dinghao Liu
When kzalloc() fails, we should execute hl_mmu_fini() to release the MMU module. It's the same when hl_ctx_init() fails. Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
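A sketch of the goto-based error unwinding this fix restores, with hypothetical stand-ins for the MMU module and the kernel context (only the names `kzalloc()`, `hl_ctx_init()` and `hl_mmu_fini()` in the comments come from the commit). Each failure path releases everything set up before it, in reverse order:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the MMU module: 1 while it is initialized. */
static int mmu_live;

static int mmu_init(void) { mmu_live = 1; return 0; }
static void mmu_fini(void) { mmu_live = 0; }

/* fail_step: 0 = success, 1 = allocation fails, 2 = ctx init fails. */
static int reinit(int fail_step, void **ctx_out)
{
	void *ctx;
	int rc;

	rc = mmu_init();
	if (rc)
		return rc;

	/* Step 1: allocate the context (kzalloc() in the driver). */
	ctx = (fail_step == 1) ? NULL : calloc(1, 64);
	if (!ctx) {
		rc = -1;
		goto release_mmu; /* the leak: this path skipped mmu_fini() */
	}

	/* Step 2: initialize the context (hl_ctx_init() in the driver). */
	if (fail_step == 2) {
		rc = -1;
		goto free_ctx;
	}

	*ctx_out = ctx;
	return 0;

free_ctx:
	free(ctx);
release_mmu:
	mmu_fini(); /* hl_mmu_fini() in the driver */
	return rc;
}
```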
2020-12-28 | habanalabs: fix order of status check | Oded Gabbay
When the device is in reset or needs to be reset, the disabled property is don't-care. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: register to pci shutdown callback | Oded Gabbay
We need to make sure our device is idle when rebooting a virtual machine. This is done at the driver level. The firmware will later handle FLR but we want to be extra safe and stop the devices until the FLR is handled. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: add validation cs counter, fix misplaced counters | Alon Mizrahi
Up until now validation errors were counted in the parsing field of the cs_counters struct, so we added a new counter and increased it when needed. In addition, there were some locations where only one of the counters was updated (ctx or aggregate) so add the second one to be updated as well. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: adjust pci controller init to new firmware | Oded Gabbay
When the firmware security is enabled, the pcie_aux_dbi_reg_addr register in the PCI controller is blocked. Therefore, ignore the result of writing to this register and assume it worked. Also remove the prints on errors in the internal ELBI write function. If the security is enabled, the firmware is responsible for setting this register correctly so we won't have any problem. If the security is disabled, the write will work (unless something is totally broken at the PCI level and then the whole sequence will fail). In addition, remove a write to register pcie_aux_dbi_reg_addr+4, which was never actually needed. Moreover, PCIE_DBI registers are blocked to access from host when firmware security is enabled. Use a different register to flush the writes. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: full FW hard reset support | Ofir Bitton
Driver must fetch FW hard reset capability at every FW boot stage: preboot, CPU boot, CPU application. If hard reset is triggered, driver will take into consideration only the last capability received. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: Revise comment to align with mirror list name | Tomer Tayar
hw_queues_mirror was renamed to cs_mirror, so revise accordingly a comment that refers to this list. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs/gaudi: do not set EB in collective slave queues | Alon Mizrahi
We don't need to set EB on signal packets from collective slave queues as it degrades performance. Because the slaves are the network queues, the engine barrier doesn't actually guarantee that the packet has been sent. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: preboot hard reset support | Ofir Bitton
The FW hard reset capability indication is now moved to the preboot stage. The driver will check if the HW is dirty only after it has validated that preboot is up. If the HW is dirty, the driver will perform a hard reset according to the FW capability. In addition, the FW defines a new message which the driver needs to send in order to initiate a hard reset. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: remove generic gaudi get_pll_freq function | Alon Mizrahi
As we only fetch the CPU_PLL frequency in gaudi, we don't need a generic get_pll_frequency function which takes a pll index as input. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28 | habanalabs: Fix a missing-braces warning | Tomer Tayar
Fix a compilation "missing braces around initializer" warning. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-07 | Merge 5.10-rc7 into char-misc-next | Greg Kroah-Hartman
We want the fixes in here, and this resolves a merge issue with drivers/misc/habanalabs/common/memory.c. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-30 | habanalabs: Add CB IOCTL opcode to retrieve CB information | Tomer Tayar
Add a new CB IOCTL opcode that enables a user to query about a CB and get its usage count. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: Modify the cs_cnt of a CB to be atomic | Tomer Tayar
Modify the CS counter of a CB to be atomic, so no locking is required when it is being modified or read. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
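A userspace sketch using C11 `<stdatomic.h>` as a stand-in for the kernel's atomic operations; the `struct cb` here is hypothetical and keeps only the usage counter. Increments, decrements and reads are each single atomic operations, so no lock is taken:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified command buffer: only the CS usage counter. */
struct cb {
	atomic_int cs_cnt;
};

/* A CS starts using the CB: one atomic read-modify-write, no lock. */
static void cb_cs_get(struct cb *cb)
{
	atomic_fetch_add(&cb->cs_cnt, 1);
}

/* A CS stops using the CB. */
static void cb_cs_put(struct cb *cb)
{
	int prev = atomic_fetch_sub(&cb->cs_cnt, 1);

	assert(prev > 0); /* catch an unbalanced put */
	(void)prev;       /* avoid an unused warning when NDEBUG is set */
}

/* Readers, such as a CB info query, simply load the current value. */
static int cb_cs_count(struct cb *cb)
{
	return atomic_load(&cb->cs_cnt);
}
```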
2020-11-30 | habanalabs: Add mask for CS type bits in CS flags | Tomer Tayar
hl_cs_sanity_checks() extracts the CS type bits of the CS flags, by masking out the non-type bits. To save the need for updating the function whenever new bits for non-type flags are added, add an explicit mask for the CS type bits. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: change messages to debug level | Oded Gabbay
Some messages should be changed to debug mode as we want to keep minimal prints during normal operation of the device. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: free host huge va_range if not used | Ofir Bitton
If huge range is not valid, driver uses the host range also for huge page allocations, but driver never frees its allocation. This introduces a memory leak every time a user closes its context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: add missing counter update | Oded Gabbay
The global CS drop-on-reset counter wasn't updated together with the context counter. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: add support for cs with timestamp | Ofir Bitton
Add support for the user to request a timestamp upon cs completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: indicate to user that a cs is gone | Ofir Bitton
We want to indicate to the user that a certain command submission finished a long time ago and is no longer in the database. This means no further information regarding this cs can be obtained. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 | habanalabs: fetch pll frequency from firmware | Alon Mizrahi
Once firmware security is enabled, driver must fetch pll frequencies through the firmware message interface instead of reading the registers directly. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>