linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2022-05-22	habanalabs: rephrase device out-of-memory message	Ohad Sharabi
	The out of memory message is rephrased to more subtle expression as out of memory may be caused by the user in case of, for example, greedy allocation. In addition the user is also being notified by an error code. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: add MMU prefetch to ASIC-specific code	Ohad Sharabi
	This is necessary pre-requisite for future ASIC support, where MMU TLB prefetch is supported. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: convert ts to use unified memory manager	Yuri Nudelman
	With the introduction of the unified memory manager infrastructure, the timestamp buffers can be converted to use it. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: unified memory manager infrastructure	Yuri Nudelman
	This is a part of overall refactoring attempt to separate nic and the core drivers. Currently, there are 4 different flows, that contain very similar code. These are the ts, nic, hwblocks and cb alloc/map flows. The similar aspect of all these flows is that they all contain a central store, with memory buffers inside, supporting the following set of operations: - Allocate buffer and return handle - Get buffer from the store with handle - Put the buffer (last put releases the buffer) - Map the buffer to the user This patch contains a generic data structure used to implement the above memory buffer store interface. Conversion of the existing code to use the new data structure will follow. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: save f/w preboot major version	Ofir Bitton
	We need this property for doing backward compatibility hacks against the f/w. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: replace usage of found with dedicated list iterator variable	Jakob Koschel
	To move the list iterator variable into the list_for_each_entry_() macro in the future it should be avoided to use the list iterator variable after the loop body. To never* use the list iterator variable after the loop it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need to use a found variable and simply checking if the variable was set, can determine if the break/goto was hit. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: modify dma_mask to be ASIC specific property	Tomer Tayar
	The required DMA mask is no longer based on input from the F/W, but it is fixed per ASIC according to its address space. As such, the per-ASIC function to get this value can be replaced with a property variable. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: parse full firmware versions	Ofir Bitton
	When parsing firmware versions strings, driver should not assume a specific length and parse up to the maximum supported version length. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: expose compute ctx status through info ioctl	Ofir Bitton
	In order for the user to know if he can try and open device, we expose the compute ctx state. The user can now know if the context is used by another process or whether the device is still ongoing through cleanup or reset and will be available soon. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: add new return code to device fd open	Ofir Bitton
	In order to be more informative during device open, we are adding a new return code -EAGAIN that indicates device is still going through resource reclaiming and hence it cannot be used yet. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: add user API to get valid DRAM page sizes	Ohad Sharabi
	Future devices will support multiple device memory page sizes. In addition, an API for the user was added for it to be able to control the device memory allocation page size. This patch is a complementary patch to inform the user of the available page size supported by the device. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: convert all MMU masks/shifts to arrays	Ohad Sharabi
	There is no need to hold each MMU mask/shift as a denoted structure member (e.g. hop0_mask). Instead converting it to array will result in smaller and more readable code. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: change mmu_get_real_page_size to be ASIC-specific	Ohad Sharabi
	This patch breaks the cumbersome implementation of "get real page size" along with it's multiple inner conditions and implement each case (according to the real complexity) inside an ASIC function. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: add DRAM default page size to HW info	Ohad Sharabi
	When using the device memory allocation API the user ought to know what is the default allocation page size. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22	habanalabs: set non-0 value in dram default page size	Ohad Sharabi
	Looking forward we will need to report to the user what is the default page size used. This will be done more conveniently by explicitly updating the property rather than to rely on a "0 meaning default" value. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-04	habanalabs: Fix test build failures	Guenter Roeck
	allmodconfig builds on 32-bit architectures fail with the following error. drivers/misc/habanalabs/common/memory.c: In function 'alloc_device_memory': drivers/misc/habanalabs/common/memory.c:153:49: error: cast from pointer to integer of different size Fix the typecast. While at it, drop other unnecessary typecasts associated with the same commit. Fixes: e8458e20e0a3c ("habanalabs: make sure device mem alloc is page aligned") Cc: Ohad Sharabi <osharabi@habana.ai> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20220404134859.3278599-1-linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-28	habanalabs: remove deprecated firmware states	Ofir Bitton
	During driver and F/W handshake, driver waits for F/W to reach certain states in order to progress with the boot flow. Some of the states were deprecated a long time ago and were never present on official firmwares. Therefore, let's remove them from the handshake process. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: add an option to delay a device reset	Tomer Tayar
	Several H/W events can be sent adjacently, even due to a single error. If a hard-reset is triggered as part of handling one of these events, the following events won't be handled. The debug info from these missed events is important, sometimes even more important than the one that was handled. To allow handling these close events, add an option to delay a device reset and use it when resetting due to H/W events. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: Add check for pci_enable_device	Jiasheng Jiang
	As the potential failure of the pci_enable_device(), it should be better to check the return value and return error if fails. Fixes: 70b2f993ea4a ("habanalabs: create common folder") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: Fix reset upon device release bug	farah kassabri
	In case user application was interrupted while some cs still in-flight or in the middle of completion handling in driver, the last refcount of the kernel private data for the user process will not be put in the fd close flow, but in the cs completion workqueue context. This means that the device reset-upon-device-release will be called from that context. During the reset flow, the driver flushes all the cs workqueue to ensure that any scheduled work has run to completion, and since we are running from the completion context we will have deadlock. Therefore, we need to skip flushing the workqueue in those cases. It is safe to do it because the user won't be able to release the device unless the workqueues are already empty. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: make sure device mem alloc is page aligned	Ohad Sharabi
	Working with MMU that supports multiple page sizes requires that mapping of a page of a certain size will be aligned to the same size (e.g. the physical address of 32MB page shall be aligned to 32MB). To achieve this the gen_poll allocation is now using the "align" variant to comply with the alignment requirements. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: allow user to set allocation page size	Ohad Sharabi
	In future ASICs the MMU will be able to work with multiple page sizes, thus a new flag is added to allow the user to set the requested page size. This flag is added since the whole DRAM is allocated for the user and the user also should be familiar with the memory usage use case. As such, the user may choose to "over allocate" memory in favor of performance (for instance- large page allocations covers more memory in less TLB entries). For example: say available page sizes are of 1MB and 32MB. If user wants to allocate 40MB the user can either set page size to 1MB and allocate the exact amount of memory (but will result in 40 TLB entries) or the user can use 32MB pages, "waste" 8MB of physical memory but occupy only 2 TLB entries. Note that this feature will be available only to ASIC that supports multiple DRAM page sizes. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: avoid using an uninitialized variable	Tomer Tayar
	Fix the following compilation warning in hl_cb_ioctl() @ command_buffer.c: warning: ‘device_va’ may be used uninitialized in this function Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: set max power on device init per ASIC	Tomer Tayar
	For current devices there is a need to send the max power value to F/W during device init, for example because there might be several card types. In future devices, this info will be programmed in the device's EEPROM and will be read by F/W, and hence the driver should not send it. Modify the sending of the relevant message to be done only for ASIC types that need it. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: use proper max_power variable for device utilization	Tomer Tayar
	The max_power variable which is used for calculating the device utilization is the ASIC specific property which is set during init. However, the max value can be modified via sysfs, and thus the updated value in the device structure should be used instead. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: enable stop-on-error debugfs setting per ASIC	Tomer Tayar
	On Goya and Gaudi, the stop-on-error configuration can be set via debugfs. However, in future devices, this configuration will always be enabled. Modify the debugfs node to be allowed only for ASICs that support this dynamic configuration. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: change function to static	Oded Gabbay
	handle_registration_node() is called directly from the irq handler in irq.c, so it can be static. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: add missing include of vmalloc.h	Oded Gabbay
	Use of vfree(), vmalloc_user(), vmalloc() and remap_vmalloc_range() requires this include in some architectures. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: fix use-after-free bug	Oded Gabbay
	When the code iterates over the free list of physical pages nodes, it deletes the physical page node which is used as the iterator. Therefore, we need to use the safe version of the iteration to prevent use-after-free. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: rephrase error messages in PCI initialization	Oded Gabbay
	The iATU is an internal h/w machine inside Habana's PCI controller. Mentioning it by name doesn't say anything to the user. It is better to say the PCI controller initialization was not done successfully. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: fix spelling mistake	Oded Gabbay
	The name of the property is hints_range_reservation Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: Timestamps buffers registration	farah kassabri
	Timestamp registration API allows the user to register a timestamp record event which will make the driver set timestamp when CQ counter reaches the target value and write it to a specific location specified by the user. This is a non blocking API, unlike the wait_for_interrupt which is a blocking one. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: fix race when waiting on encaps signal	Dani Liberman
	Scenario: 1. CS which is part of encaps signal has been completed and now executing kref_put to its encaps signal handle. The refcount of the handle decremented to 0, and called the encaps signal handle release function - hl_encaps_handle_do_release. 2. At this point the user starts waiting on the signal, and finds the encaps signal handle in the handlers list and increment the habdle refcount to 1. 3. Immediately after, hl_encaps_handle_do_release removed the handle from the list and free its memory. 4. Wait function using the handle although it has been freed. This scenario caused the slab area which was previously allocated for the handle to be poison overwritten which triggered kernel bug the next time the OS needed to allocate this slab. Fixed by getting the refcount of the handle only in case it is not zero. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: silence an uninitialized variable warning	Dan Carpenter
	Smatch warns that: drivers/misc/habanalabs/common/command_buffer.c:471 hl_cb_ioctl() error: uninitialized symbol 'device_va'. Which is true, but harmless. Anyway, it's easy to silence this by adding a error check. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: remove duplicate print	Oded Gabbay
	We print detailed messages inside the internal ioctl functions. No need to print a generic message at the end, it doesn't add any information. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: prevent false heartbeat failure during soft-reset	Tomer Tayar
	The heartbeat thread is active during soft-reset, and it tries to send messages to CPU-CP core. Within the soft-reset, in the time window in which the device is marked as disabled, any CPU-CP command is "silently" skipped and a success value it returned. However, in addition to the return value, the heartbeat function also checks the F/W result, but because no command is sent in this time window, the result variable won't hold the expected value and we will have a false heartbeat failure. To avoid it, modify the "silent" skip to be done only in hard-reset. The CPU-CP should be able to handle messages during soft-reset. In addition to the heartbeat problem, this should also solve other issues in other flows that send messages during soft-reset and use the F/W result as it w/o being aware to the reset. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: fix race between wait and irq	Oded Gabbay
	There is a race in the user interrupts code, where between checking the target value and adding the new pend to the list, there is a chance the interrupt happened. In that case, no one will complete the node, and we will get a timeout on it. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: fix user interrupt wait when timeout is 0	Oded Gabbay
	When timeout is 0, we need to return the busy status in case the target value wasn't reached upon entry to the ioctl. Also return the correct timestamp. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: reject host map with mmu disabled	Oded Gabbay
	This is not something we can do a workaround. It is clearly an error and we should notify the user that it is an error. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: expose number of user interrupts	Oded Gabbay
	Currently we only expose to the user the ID of the first available user interrupt. To make user interrupts allocation truly dynamic, we need to also expose the number of user interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: add missing error check in sysfs max_power_show	Tomer Tayar
	Add a missing error check in the sysfs show function for max_power. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: fix soft reset flow in case of failure	Dani Liberman
	In case of soft reset failure, hard reset should be initiated, but reset flags were not set to enable it, which caused another soft reset followed by another failure. Updated reset flags to enable hard reset flow in case of soft reset failure. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: add missing error check in sysfs clk_freq_mhz_show	Tomer Tayar
	Add a missing error check in the sysfs show functions for clk_max_freq_mhz and clk_cur_freq_mhz_show. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: avoid copying pll data if pll_info_get fails	Tomer Tayar
	If reading PLL info from F/W fails, the PLL info is not set in the "result" variable, and hence shouldn't be copied to the caller's array. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: don't free phys_pg_pack inside lock	Oded Gabbay
	Freeing phys_pg_pack includes calling to scrubbing functions of the device's memory, taking locks and possibly even calling reset. This is not something that should be done while holding a device-wide spinlock. Therefore, save the relevant objects on a local linked-list and after releasing the spinlock, traverse that list and free the phys_pg_pack objects. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: there is no kernel TDR in future ASICs	Oded Gabbay
	In future ASICs, there is no kernel TDR for new workloads that are submitted directly from user-space to the device. Therefore, the driver can NEVER know that a workload has timed-out. So, when the user asks us to wait for interrupt on the workload's completion, and the wait has timed-out, it doesn't mean the workload has timed-out. It only means the wait has timed-out, which is NOT an error from driver's perspective. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: sysfs support for fw os version	Rajaravi Krishna Katta
	Adds new sysfs entry to display firmware os version /sys/class/habanalabs/hl<n>/fw_os_ver Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: remove power9 workaround for dma support	Oded Gabbay
	We don't need this workaround anymore. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: add vrm version to sysfs	Oded Gabbay
	infineon version is only applicable to GOYA and GAUDI. For later ASICs, we display the Voltage Regulator Monitor f/w version. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28	habanalabs: rename dev_attr_grp to dev_clk_attr_grp	Oded Gabbay
	In this attribute group we are only adding clocks. This is in preparation for adding a device specific attribute group which is not related to clocks. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>