summaryrefslogtreecommitdiff
path: root/drivers/misc/habanalabs/common
AgeCommit message (Collapse)Author
2021-06-18habanalabs: added open_stats info ioctlYuri Nudelman
In a system with multiple ASICs, there is a need to provide monitoring tools with information on how long a device was opened and how many times a device was opened. Therefore, we add a new opcode to the INFO ioctl to provide that information. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: set rc as 'valid' in case of intentional func exitKoby Elbaz
fix the following smatch warnings: hl_fw_static_init_cpu() warn: missing error code 'rc' Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: zero complex structures using memsetKoby Elbaz
fix the following sparse warnings: 'warning: Using plain integer as NULL pointer' 'warning: missing braces around initializer' Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: print more info when failing to pin user memoryTomer Tayar
pin_user_pages_fast() might fail and return a negative number, or pin less pages than requested and return the number of the pages that were pinned. For the latter, it is informative to print also the memory size and the number of requested pages. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: Fix an error handling path in 'hl_pci_probe()'Christophe JAILLET
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it must be undone by a corresponding 'pci_disable_pcie_error_reporting()' call, as already done in the remove function. Fixes: 2e5eda4681f9 ("habanalabs: PCIe Advanced Error Reporting support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: print firmware versionsOded Gabbay
Firmware in habanalabs devices is composed of several components. During device initialization, we read these versions from the device. Print them during device initialization to allow better visibility in automated systems. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: add hard reset timeout for PLDMOmer Shpigelman
Hard reset flow on PLDM might take more than 2 minutes. Hence add a dedicated hard reset timeout of 6 minutes for PLDM. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: enable dram scramble before linux f/wBharat Jauhari
In current code, for dynamic f/w loading flow, DRAM scrambling is enabled post Linux fit image is loaded to the card. This can cause the device CPU to go into reset state. The correct sequence should be: 1. Load boot fit image 2. Enable scrambling 3. Load Linux fit image This commit aligns the DRAM scrambling enabling with the static f/w load flow. Signed-off-by: Bharat Jauhari <bjauhari@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: enable stop on error for all QMANs and enginesOfir Bitton
If there is an error in the QMAN/engine, there is no point of trying to continue running the workload. It is better to stop to allow the user to debug the program. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: report EQ fault during heartbeatOhad Sharabi
In case we have EQ fault we would like to know about it. For this, a status bitmask was added in which EQ_FAULT bit is set by FW in case of EQ fault. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: small code refactoringKoby Elbaz
Use datatype defines instead of hard coded values, and rename set_fixed_properties function. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: fix mask to obtain page offsetOhad Sharabi
When converting virtual address to physical we need to add correct offset to the physical page. For this we need to use mask that include ALL bits of page offset. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: skip valid test for boot_dev_sts regsOhad Sharabi
Get rid of the need to check if boot_dev_sts is valid on every access to value read from these registers. This is done by storing the register value in hdev props ONLY if register is enabled. This way if register is NOT enabled all capability bits will not be set. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: reset device upon FD close if not idleOfir Bitton
If device is not idle after user closes the FD we must reset device as next user that will try to open FD will encounter a non-functional device. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: add debug flag to prevent failure on timeoutYuri Nudelman
Sometimes it is useful to allow the command to continue running despite the timeout occurred, to differentiate between really stuck or just very time consuming commands. This can be achieved by passing a new debug flag alongside the cs, HL_CS_FLAGS_SKIP_RESET_ON_TIMEOUT. Anyway, if the timeout occurred, a warning print shall be issued, however this shall not fail the submission. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: split host irq interfaces towards FWOfir Bitton
Current implementation uses a single interrupt interface towards FW, this interface is causing races between interrupt types. We split this interface to interface per interrupt type. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: prefer ASYNC device probingOded Gabbay
There is no dependency when probing multiple devices so indicate to the kernel that it can probe our devices in ASYNC fashion. This shortens insmod of the driver from ~2 minutes to 20 seconds on a system with 8 devices. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: track security status using positive logicOhad Sharabi
Using negative logic (i.e. fw_security_disabled) is confusing. Modify the flag to use positive logic (fw_security_enabled). Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: use COMMS to reset device / halt CPUKoby Elbaz
This is needed because legacy FW 'communication' protocol will soon become obsolete. Because COMMS is a boot protocol, communicating through it is supported only until Linux is loaded to the device CPU, where in that case we will fallback to the former implementation. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: disable GIC usage if security is enabledKoby Elbaz
Security is set based on PCI ID, and after reading preboot status bits. GIC usage is set in both scenarios since GIC can't be used when security is enabled. Moreover, writing to GIC/SP is enabled only after Linux is fully loaded. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: read preboot status bits in an earlier stageKoby Elbaz
On newer releases, host won't be able to trigger an interrupt directly to the ASIC GIC controller. To be able to decide whether GIC can/not be used, we must read device's preboot status bits in a stage that precedes the possible first use of GIC (when device is in dirty state). Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: check running index in eqe controlOded Gabbay
To harden the event queue mechanism, we add a running index to the control header of the entry. The firmware writes the index in each entry and the driver verifies that the index of the current entry is larger by 1 of the index of the previous entry. In case it isn't, the driver will treat the entry as if it wasn't valid (it won't process it but won't skip it). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: set memory scrubbing to disabled by defaultOded Gabbay
Scrubbing memory after every unmap is very costly in terms of performance. If a user wants it he can enable it but the default should prioritize performance. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: read GIC sts after FW is loadedKoby Elbaz
Reading of GIC privileged status will be done after F/W is loaded, because privileged GIC capability is only available with the correct ARMCP version, and after it's loaded. Such versions necessarily support COMMS, so GIC alternatives (SP regs) will be read directly from dynamic regs. As well, initiation of DMA QMANs will occur after F/W is loaded since it depends on GIC configuration. In case F/W isn't loaded there's no problem since either way there won't be any GIC IRQ handling. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: check if asic secured with asic typeOhad Sharabi
Fix issue in which the input to the function is_asic_secured was device PCI_IDS number instead of the asic_type enumeration. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: send hard reset cause to prebootKoby Elbaz
LKD should provide hard reset cause to preboot prior to loading any FW components (in case needed). Current implementation is based on the new FW 'COMMS' protocol In cased 'COMMS' is disabled - reset cause won't be sent. Currently, only 2 reset causes are shared: HEARTBEAT & TDR. Sending the reset cause will provide the missing watchdog info that the firmware needs to provide to the BMC. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: notify before f/w loadingOded Gabbay
An information print notifying on starting to load the f/w was removed by mistake when moving to the new dynamic f/w loading mechanism. Restore that print as the F/W loading usually takes between 10 to 20 seconds and this print helps the user know the status of the driver load. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs/gaudi: use scratchpad regs instead of GIC controllerKoby Elbaz
Due to new security restrictions, GIC controller can no longer be accessed from user/kernel. To monitor that, a new status bit will be read from preboot caps, indicating whether direct access to GIC is blocked. In case it is blocked, driver will use scratchpad registers instead of using GIC interface on two main scenarios: The first of which LKD triggers interrupts to F/W through GIC, and the second of when LKD configures all engines/QMANs to write to GIC when they want to report an error. From F/W perspective, it will poll on all SPs, and once IRQ number is retrieved, SP register is cleared, and it will perform the write to the GIC to trigger the IRQ handler. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: read f/w's 2-nd sts and err registersOhad Sharabi
Maintain both STS1 and ERR1 registers used for status communication with F/W. Those are not maintained as we currently have less than 31 statuses/error defined and so LKD did not refer to those register. The reason to read them now is to try to support future f/w versions with current driver. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: avoid using uninitialized pointerOhad Sharabi
When attempting to read FW component's version we should break if input FW component is invalid in order to avoid using uninitialized destination pointer. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: set dma mask from fw once fw done iatu configOhad Sharabi
When setting "DMA mask from FW" we are reading PSOC_GLOBAL_CONF register which is allowed only once FW has done it's iATU configuration. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: better error print for pin failureOded Gabbay
Print the user given pointer and error code on failure to get user pages for easier debugging. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: add missing space after castingOmer Shpigelman
Change casting code according to kernel coding style. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: ignore device unusable statusOded Gabbay
Some users might want to implement their own policy of when the device is unusable so we need to ignore this status in the driver and continue loading as normal. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: load linux image to deviceOhad Sharabi
Implementing dynamic linux image load to the device. This patch also implements the FW communication steps during the boot-fit. This patch also enables the dynamic protocol based on the compatibility flag. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: load boot fit to deviceOhad Sharabi
Implementing dynamic boot fit image load to the device. Note that some necessary adjustment were added to the static loader as well so that both loaders can co-exist. as this is not the final FW load stage the dynamic FW load is still forced to be non functional. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: use dev_dbg upon hint address failureOded Gabbay
Hint address failure that results in a valid mapping with an address that was allocated by the driver is not a real failure. Therefore, the driver shouldn't notify about this in kernel log. The user is responsible to check the returned address. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: modify progress status messagesGuy Nisan
Indicate "progress" instead of "error" when reporting progress status. Change "u-boot stopped by user" to "Cannot boot" message as CPU_BOOT_STATUS_UBOOT_NOT_READY may indicate a fatal error that prevent u-boot from loading firmware. Signed-off-by: Guy Nisan <gnisan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: give FW a grace time for configuring iATUOfir Bitton
iATU (internal Address Translation Unit of the PCI controller) configuration is being done by FW right after driver enables the PCI device. Hence, driver must add a minor sleep afterwards in order to make sure FW finishes configuring iATU regions. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: update to latest f/w headersOded Gabbay
Update the common and GAUDI firmware header files to the latest version. The latest version use the correct endianness types so this commit also contains minor changes to the code to use the correct conversions when reading/writing to the firmware structures. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: expose ASIC specific PCI info to common codeOhad Sharabi
LKD has interfaces in which it receives device address. For instance the debugfs_read/write variants receives device address for CFG/SRAM/DRAM for read/write and need to translate to the mapped PCI BAR address. In addition, the dynamic FW load protocol dictates that the address to which the LKD will copy the image for the next FW component will be received as a device address and can be placed either in SRAM or DRAM. We need to distinguish those regions as the access methods to those regions are different (in DRAM we possibly need to set the BAR base). Looking forward this code will be used to remove duplicated code in the debugfs_read/write that search the memory region for the input device address. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: dynamic fw load reset protocolOhad Sharabi
First stage of the dynamic FW load protocol is to reset the protocol to avoid residues from former load cycles. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: use common fw_version readOhad Sharabi
Instead of using multiple ASIC specific copies of functions to read the FW version use single common one that gets ASIC specific arguments. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: use mmu cache range invalidationAlon Mizrahi
Use mmu cache range invalidation instead of entire cache invalidation because it yields better performance. In GOYA and GAUDI, always use entire cache invalidation because these ASICs don't support range invalidation. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: refactor init device cpu codeOhad Sharabi
Replace multiple arguments to init device CPU function by passing firmware loader managing structure that is initialized per ASIC with the loader parameters. In addition, the FW loader management structure is now part of the habanalabs device, this way the loader parameters will be able to be communicated across various boot stages. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: request f/w in separate functionOhad Sharabi
This refactor is needed due to the dynamic FW load in which requesting the FW file (and getting its attributes) is not immediately followed by copying FW file content. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: prepare preboot stage to dynamic f/w loadOhad Sharabi
Start the skeleton for the dynamic F/W load by marking current preboot code path as legacy. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18habanalabs: increase ELBI reset timeout for PLDMMoti Haimovski
On PLDM, in case of NIC hangs, the ELBI reset to take much longer than expected. As a result an increase in the ELBI reset timeout is required. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-05-08habanalabs: wait for interrupt wrong timeout calculationOfir Bitton
Wait for interrupt timeout calculation is wrong, hence timeout occurs when user waits on an interrupt with certain timeout values. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-05-08habanalabs: ignore f/w status errorOded Gabbay
In case firmware has a bug and erroneously reports a status error (e.g. device unusable) during boot, allow the user to tell the driver to continue the boot regardless of the error status. This will be done via kernel parameter which exposes a mask. The user that loads the driver can decide exactly which status error to ignore and which to take into account. The bitmask is according to defines in hl_boot_if.h Signed-off-by: Oded Gabbay <ogabbay@kernel.org>