linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2021-01-27	habanalabs: add user available interrupt to hw_ip	Ofir Bitton
	In order to support completions that arrive directly to the user, the driver needs to supply the user with the first available msix interrupt available. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: always try to use the hint address	farah kassabri
	Currently hint address is ignored in case va block page size is not power of 2. We need to support th user hint address also in this case, but only if the hint address is aligned to page size. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: add security violations dump to debugfs	Ofir Bitton
	In order to improve driver security debuggability, we add security violations dump to debugfs. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: ignore F/W BMC errors in case no BMC present	Ofir Bitton
	In order to support operation mode in which BMC is not active, driver must not take BMC errors into consideration. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: Use 'dma_set_mask_and_coherent()'	Christophe JAILLET
	Axe 'hl_pci_set_dma_mask()' and replace it with an equivalent 'dma_set_mask_and_coherent()' call. This makes the code a bit less verbose. It also removes an erroneous comment, because 'hl_pci_set_dma_mask()' does not try to use a fall-back value. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: add driver support for internal cb scheduling	Ofir Bitton
	In order to support scnenarios in which driver needs access to HW components but it cannot access them directly, we add support for scheduling command buffers internally. These command buffers will be transmitted upon next user command submission context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: increment ctx ref from within a cs allocation	Ofir Bitton
	A CS must increment the relevant context reference count. We want to increment the reference inside the CS allocation function as opposed for today where we increment it outside. This is logical since we want to avoid explicitly incrementing the context every time we call the CS allocate function. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: separate common code to dedicated folders	Ofir Bitton
	We separate some of the common code source files to different folders for a better maintainability and testability. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: read device boot errors after cpucp is up	Ofir Bitton
	Boot cpu can report errors in various boot stages. Current implementaion does not take into consideration errors reported in late stages, hence we will check for errors at the most late stage when fetching cpucp information. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: report correct dram size in info ioctl	Ofir Bitton
	In case MMU is enabled, we must take MMU page size into consideration when reporting dram size to the user. This is because the MMU page size can be a value which is NOT a power-of-2 value. As a result, the total DRAM size (which is always a power-of-2 value) needed to be rounded-down. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: support non power-of-2 DRAM phys page sizes	Moti Haimovski
	DRAM physical page sizes depend of the amount of HBMs available in the device. this number is device-dependent and may also be subject to binning when one or more of the DRAM controllers are found to to be faulty. Such a configuration may lead to partitioning the DRAM to non-power-of-2 pages. To support this feature we also need to add infrastructure of address scarmbling. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: return dram virtual address in info ioctl	Alon Mizrahi
	When working with DRAM MMU, we should supply the userspace with the virtual start address of the DRAM instead of the physical one. This is because the physical one has no meaning for the user as he only knows the virtual address range. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: add ASIC property of functional HBMs	Oded Gabbay
	The number of functional HBMs in the same ASIC can be different due to malfunctioning HBM banks. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs/gaudi: add debug prints for security status	Ofir Bitton
	In order to have more information while debugging boot issues, we should print the firmware security status at every boot stage. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: modify memory functions signatures	Omer Shpigelman
	For consistency, modify all memory ioctl functions to get the ioctl arguments structure rather than the arguments themselves. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: kernel doc format in memory functions	Omer Shpigelman
	Change all memory functions documentation according to kernel doc format. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: replace WARN/WARN_ON with dev_crit in driver	Alon Mizrahi
	Often WARN is defined in data-centers as BUG and we would like to avoid hanging the entire server on some internal error of the driver (important as it might be). Therefore, use dev_crit instead. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: report dram_page_size in hw_ip_info ioctl	Moti Haimovski
	Instead of having it hard-coded as a define, pass it to the user in runtime. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: allow user to pass a staged submission seq	Ofir Bitton
	In order to support the staged submission feature, user must be allowed to use the same CS sequence for all submissions in the same staged submission. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs/gaudi: support CS with no completion	Ofir Bitton
	As part of the staged submission feature, we need Gaudi to support command submissions that will never get a completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: Init the VM module for kernel context	Ofir Bitton
	In order for reserving VA ranges for kernel memory, we need to allow the VM module to be initiated with kernel context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27	habanalabs: refactor MMU locks code	Ohad Sharabi
	remove mmu_cache_lock as it protects a section which is already protected by mmu_lock. in addition, wrap mmu cache invalidate calls in hl_vm_ctx_fini with mmu_lock. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21	habanalabs: disable FW events on device removal	Oded Gabbay
	When device is removed, we need to make sure the F/W won't send us any more events because during the remove process we disable the interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21	habanalabs: fix backward compatibility of idle check	Oded Gabbay
	Need to take the lower 32 bits of the driver's 64-bit idle mask and put it in the legacy 32-bit variable that the userspace reads to know the idle mask. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-21	habanalabs: zero pci counters packet before submit to FW	Ofir Bitton
	Driver does not zero some pci counters packets before sending to FW. This causes an out of sync PI/CI between driver and FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-12	misc/habana: Use FOLL_LONGTERM for userptr	Daniel Vetter
	These are persistent, not just for the duration of a dma operation. Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Pawel Piskorski <ppiskorski@habana.ai> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-5-daniel.vetter@ffwll.ch
2021-01-12	misc/habana: Stop using frame_vector helpers	Daniel Vetter
	All we need are a pages array, pin_user_pages_fast can give us that directly. Plus this avoids the entire raw pfn side of get_vaddr_frames. Note that pin_user_pages_fast is a safe replacement despite the seeming lack of checking for vma->vm_flasg & (VM_IO \| VM_PFNMAP). Such ptes are marked with pte_mkspecial (which pup_fast rejects in the fastpath), and only architectures supporting that support the pin_user_pages_fast fastpath. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Pawel Piskorski <ppiskorski@habana.ai> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-4-daniel.vetter@ffwll.ch
2021-01-12	habanalabs: prevent soft lockup during unmap	Oded Gabbay
	When using Deep learning framework such as tensorflow or pytorch, there are tens of thousands of host memory mappings. When the user frees all those mappings at the same time, the process of unmapping and unpinning them can take a long time, which may cause a soft lockup bug. To prevent this, we need to free the core to do other things during the unmapping process. For now, we chose to do it every 32K unmappings (each unmap is a single 4K page). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-12	habanalabs: fix reset process in case of failures	Oded Gabbay
	There are some points in the reset process where if the code fails for some reason, and the system admin tries to initiate the reset process again we will get a kernel panic. This is because there aren't any protections in different fini functions that are called during the reset process. The protections that are added in this patch make sure that if the fini functions are called multiple times, without calling init functions between them, there won't be double release of already released resources. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-29	habanalabs: Fix memleak in hl_device_reset	Dinghao Liu
	When kzalloc() fails, we should execute hl_mmu_fini() to release the MMU module. It's the same when hl_ctx_init() fails. Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs: fix order of status check	Oded Gabbay
	When the device is in reset or needs to be reset, the disabled property is don't-care. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs: register to pci shutdown callback	Oded Gabbay
	We need to make sure our device is idle when rebooting a virtual machine. This is done in the driver level. The firmware will later handle FLR but we want to be extra safe and stop the devices until the FLR is handled. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs: add validation cs counter, fix misplaced counters	Alon Mizrahi
	Up until now validation errors were counted in the parsing field of the cs_counters struct, so we added a new counter and increased it when needed. In addition, there were some locations where only one of the counters was updated (ctx or aggregate) so add the second one to be updated as well. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs: adjust pci controller init to new firmware	Oded Gabbay
	When the firmware security is enabled, the pcie_aux_dbi_reg_addr register in the PCI controller is blocked. Therefore, ignore the result of writing to this register and assume it worked. Also remove the prints on errors in the internal ELBI write function. If the security is enabled, the firmware is responsible for setting this register correctly so we won't have any problem. If the security is disabled, the write will work (unless something is totally broken at the PCI level and then the whole sequence will fail). In addition, remove a write to register pcie_aux_dbi_reg_addr+4, which was never actually needed. Moreover, PCIE_DBI registers are blocked to access from host when firmware security is enabled. Use a different register to flush the writes. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs: full FW hard reset support	Ofir Bitton
	Driver must fetch FW hard reset capability at every FW boot stage: preboot, CPU boot, CPU application. If hard reset is triggered, driver will take into consideration only the last capability received. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs: Revise comment to align with mirror list name	Tomer Tayar
	hw_queues_mirror was renamed to cs_mirror, so revise accordingly a comment that refers to this list. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs/gaudi: do not set EB in collective slave queues	Alon Mizrahi
	We don't need to set EB on signal packets from collective slave queues as it degrades performance. Because the slaves are the network queues, the engine barrier doesn't actually guarantee that the packet has been sent. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs: preboot hard reset support	Ofir Bitton
	FW hard reset capability indication is now moved to preboot stage. Driver will check if HW is dirty only after it validated preboot is up. If HW is dirty, driver will perform a hard reset according to the FW capability. In addition, FW defines a new message which driver need to send in order to initiate a hard reset. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs: remove generic gaudi get_pll_freq function	Alon Mizrahi
	As we only fetch the CPU_PLL frequency in gaudi, we don't need a generic get_pll_frequency function which takes a pll index as input Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-28	habanalabs: Fix a missing-braces warning	Tomer Tayar
	Fix a compilation "missing braces around initializer" warning. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-12-07	Merge 5.10-rc7 into char-misc-next	Greg Kroah-Hartman
	We want the fixes in here, and this resolves a merge issue with drivers/misc/habanalabs/common/memory.c. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-30	habanalabs: Add CB IOCTL opcode to retrieve CB information	Tomer Tayar
	Add a new CB IOCTL opcode that enables a user to query about a CB and get its usage count. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30	habanalabs: Modify the cs_cnt of a CB to be atomic	Tomer Tayar
	Modify the CS counter of a CB to be atomic, so no locking is required when it is being modified or read. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30	habanalabs: Add mask for CS type bits in CS flags	Tomer Tayar
	hl_cs_sanity_checks() extracts the CS type bits of the CS flags, by masking out the non-type bits. To save the need for updating the function whenever new bits for non-type flags are added, add an explicit mask for the CS type bits. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30	habanalabs: change messages to debug level	Oded Gabbay
	Some messages should be changed to debug mode as we want to keep minimal prints during normal operation of the device. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30	habanalabs: free host huge va_range if not used	Ofir Bitton
	If huge range is not valid, driver uses the host range also for huge page allocations, but driver never frees its allocation. This introduces a memory leak every time a user closes its context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30	habanalabs: add missing counter update	Oded Gabbay
	The global CS drop-on-reset counter wasn't updated together with the context counter. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30	habanalabs: add support for cs with timestamp	Ofir Bitton
	add support for user to request a timestamp upon cs completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30	habanalabs: indicate to user that a cs is gone	Ofir Bitton
	We want to indicate to the user that a certain command submission is finished long time ago and it is no longer in database. This means no further information regarding this cs can be obtained. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30	habanalabs: fetch pll frequency from firmware	Alon Mizrahi
	Once firmware security is enabled, driver must fetch pll frequencies through the firmware message interface instead of reading the registers directly. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>