summaryrefslogtreecommitdiff
path: root/drivers/misc/habanalabs/gaudi
AgeCommit message (Collapse)Author
2020-09-22habanalabs: use FIELD_PREP() instead of <<Oded Gabbay
Use the standard FIELD_PREP() macro instead of << operator to perform bitmask operations. This ensures type check safety and eliminate compiler warnings. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: use standard BIT() and GENMASK()Oded Gabbay
Use the standard macros to define bitmasks. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: print the queue id in case of an errorDotan Barak
If there is a failure during the testing of a queue, to ease up debugging - print the queue id. Signed-off-by: Dotan Barak <dbarak@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: remove security from ARB_MST_QUIET registerfarah kassabri
Allow user application to write to this register in order to be able to configure the quiet period of the QMAN between grants. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: expose sync manager resources allocation in INFO IOCTLOfir Bitton
Although the driver defines the first user-available sync manager object and monitor in habanalabs.h, we would like to also expose this information via the INFO IOCTL so the runtime can get this information dynamically. This is because in future ASICs we won't need to define it statically. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22habanalabs: add information about PCIe controllerOfir Bitton
Update firmware header with new API for getting pcie info such as tx/rx throughput and replay counter. These counters are needed by customers for monitor and maintenance of multiple devices. Add new opcodes to the INFO ioctl to retrieve these counters. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22habanalabs: set max power according to card typeOded Gabbay
In Gaudi, the default max power setting is different between PCI and PMC cards. Therefore, the driver need to set the default after knowing what is the card type. The current code has a bug where it limits the maximum power of the PMC card to 200W after a reset occurs. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22habanalabs: proper handling of alloc size in coresightOfir Bitton
Allocation size can go up to 64bit but truncated to 32bit, we should make sure it is not truncated and validate no address overflow. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22habanalabs: set clock gating according to maskOfir Bitton
Once clock gating is set we enable clock gating according to mask, we should also disable clock gating according to relevant bits. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22habanalabs: Fix a loop in gaudi_extract_ecc_info()Dan Carpenter
The condition was reversed. It should have been less than instead of greater than. The result is that we never enter the loop. Fixes: fcc6a4e60678 ("habanalabs: Extract ECC information from FW") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-08-22habanalabs: validate packet id during CB parseOfir Bitton
During command buffer parsing, driver extracts packet id from user buffer. Driver must validate this packet id, since it is being used in order to extract information from internal structures. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-29habanalabs: goya_ctx_init() can be statickernel test robot
Signed-off-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20200729000313.GA14680@e442e3f624c4 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29habanalabs: fix up absolute include instructionsGreg Kroah-Hartman
There's no need to try to be cute with the include file locations in the Makefile, so just specify exactly where the files are. Bonus is this fixes the problem of building with O= as well as trying to just build the subdirectory alone. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Ben Segal <bpsegal20@gmail.com> Cc: Christine Gharzuzi <cgharzuzi@habana.ai> Cc: Pawel Piskorski <ppiskorski@habana.ai> Link: https://lore.kernel.org/r/20200728171851.55842-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-27Merge 5.8-rc7 into char-misc-nextGreg Kroah-Hartman
This should resolve the merge/build issues reported when trying to create linux-next. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-24habanalabs: use no flags on MMU cache invalidationTomer Tayar
gaudi_mmu_invalidate_cache() doesn't use the flags parameter, and thus it can be set to 0 when the function is called in the gaudi only files. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: create internal CB poolOfir Bitton
Create a device MMU-mapped internal command buffer pool, in order to allow the driver to allocate CBs for the signal/wait operations that are fetched by the queues when they are configured with the user's address space ID. We must pre-map this internal pool due to performance issues. This pool is needed for future ASIC support and it is currently unused in GOYA and GAUDI. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: create common folderOded Gabbay
For internal needs of our CI we need to move all the common code into a common folder instead of putting them in the root folder of the driver. Same applies to the common header files under include/ Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2020-07-24habanalabs: check for DMA errors when clearing memoryMoti Haimovski
In GAUDI we use QMAN0 DMA for clearing the MMU memory region at initialization. if this operation fails it places the DMA in an error state and then when trying to initialize QMAN0 we fail and erroneously assume its the QMAN that failed. This commit adds a check and clear of such DMA errors at initialization so we will have a better understanding of what went wrong. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: halt device CPU only upon certain resetOded Gabbay
Currently the driver halts the device CPU in the halt engines function, which halts all the engines of the ASIC. The problem is that if later on we stop the reset process (due to inability to clean memory mappings in time), the CPU will remain in halt mode. This creates many issues, such as thermal/power control and FLR handling. Therefore, move the halting of the device CPU to the very end of the reset process, just before writing to the registers to initiate the reset. In addition, the driver now needs to send a message to the device F/W to disable it from sending interrupts to the host machine because during halt engines function the driver disables the MSI/MSI-X interrupts. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2020-07-24habanalabs: configure maximum queues per asicOfir Bitton
Currently the amount of maximum queues is statically configured. Using a static value is causing redundunt cycles when traversing all queues and consumes more memory than actually needed. In this patch we configure each asic with the exact number of queues needed. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: remove soft-reset support from GAUDIOded Gabbay
Soft-reset isn't supported in GAUDI. Remove the code that performs it and print error in case the user wants to do it via sysfs. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2020-07-24habanalabs: PCIe iATU refactoringOfir Bitton
Divide iATU initialization into inbound/outbound methods. We must separate it in order to enable different match mode per PCIe region. In addition, added support for PCI address match mode. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: Extract ECC information from FWOded Gabbay
ECC (Error Correcting Code) interrupts are going to be handled by the FW. Hence, we define an interface in which the driver can obtain the relevant ECC information. This information is needed for monitoring and can also lead to a hard reset if ECC error is not correctable. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: Increase queues depthOfir Bitton
After recent concurrent cs amount increase, we must also increase queues depth since much more concurrent work can be done. All external queue depths were increased to 4096 as gaudi's internal queue depths were also increased to 1024. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: calculate trace frequency from PLLAdam Aharon
The profiler needs to know the PLL values for correctly showing the profiling data. Because our firmware can use different PLL configurations, we need to read the PLL values from the ASIC to pass them to the profiler. Signed-off-by: Adam Aharon <aaharon@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: Use mask instead of shift in sync stream registersOfir Bitton
Use proper bitfield masks instead of shifting values when configuring packets sent to device. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: sync stream generic functionalityOfir Bitton
Currently sync stream is limited only for external queues. We want to remove this constraint by adding a new queue property dedicated for sync stream. In addition we move the initialization and reset methods to the common code since we can re-use them with slight changes. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: Use pending CS amount per ASICOfir Bitton
Training schemes requires much more concurrent command submissions than inference does. In addition, training command submissions can be completed in a non serialized manner. Hence, we add support in which each ASIC will be able to configure the amount of concurrent pending command submissions, rather than use a predefined amount. This change will enhance performance by allowing the user to add more concurrent work without waiting for the previous work to be completed. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-24habanalabs: remove rate limiters from GAUDIOded Gabbay
We no longer need to initialize the rate limiters in GAUDI A1. Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-07-10habanalabs: set 4s timeout for message to device CPUOded Gabbay
We see that sometimes the CPU in GOYA and GAUDI is occupied by the power/thermal loop and can't answer requests from the driver fast enough. Therefore, to avoid false notifications on timeouts, increase the timeout to 4 seconds on each message sent to the device CPU. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2020-07-10habanalabs: set clock gating per engineOded Gabbay
For debugging purposes, we need to allow the root user better control of the clock gating feature of the DMA and compute engines. Therefore, change the clock gating debugfs interface to be bitmask instead of true/false. Each bit represents a different engine, according to gaudi_engine_id enum. See debugfs documentation for more details. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2020-07-10habanalabs: block WREG_BULK packet on PDMAOded Gabbay
WREG_BULK is a special packet that has a variable length. Therefore, we can't parse it when validating CBs that go to the PCI DMA queue. In case the user needs to use it, it can put multiple WREG32 packets instead. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
2020-07-01misc: habanalabs: gaudi: gaudi_security: Repair incorrectly named function argLee Jones
gaudi_pb_set_block()'s argument 'base' was incorrectly named 'block' in its function header. Fixes the following W=1 kernel build warning(s): drivers/misc/habanalabs/gaudi/gaudi_security.c:454: warning: Function parameter or member 'base' not described in 'gaudi_pb_set_block' drivers/misc/habanalabs/gaudi/gaudi_security.c:454: warning: Excess function parameter 'block' description in 'gaudi_pb_set_block' Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-11-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-01misc: habanalabs: gaudi: Remove ill placed asterisk from kerneldoc headerLee Jones
W=1 kernel builds report a lack of description of gaudi_set_asic_funcs()'s 'hdev' argument. In reality it is documented, but the formatting was not as expected '@.*:'. Instead, there was a misplaced asterisk which was confusing the kerneldoc validator. Squashes the following W=1 warning: drivers/misc/habanalabs/gaudi/gaudi.c:6746: warning: Function parameter or member 'hdev' not described in 'gaudi_set_asic_funcs' Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-10-lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24habanalabs: increase h/w timer when checking idleOmer Shpigelman
In GAUDI the current timer value for the hardware to check if it is in IDLE state is too low. As a result, there are occasions where the H/W wrongly reports it is not IDLE. The driver checks that before submitting work on behalf of the driver during initialization, so a false report might cause the driver to fail during device initialization. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-06-24habanalabs: increase GAUDI QMAN ARB WDT timeoutOded Gabbay
The current timeout is too low for some of the workloads and we see false errors as a result. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-06-24habanalabs: use PI in MMU cache invalidationOmer Shpigelman
The PS flow for MMU cache invalidation caused timeouts in stress tests. Use PS + PI flow so no timeouts should happen whatsoever. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-06-24habanalabs: block scalar load_and_exe on external queueOded Gabbay
In Gaudi, the user can't execute scalar load_and_exe on external queue because it can be a security hole. The driver doesn't parse the commands being loaded and it can be msg_prot, which the user isn't allowed to use. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-25habanalabs: handle MMU cache invalidation timeoutOmer Shpigelman
MMU cache invalidation timeout indicates that the device is unstable and therefore unusable. Hence in such case do hard reset and return an error to the user if was called from ioctl. In addition, change the print to error level and rephrase its text. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-25habanalabs: GAUDI does not support soft-resetOded Gabbay
GAUDI does not support soft-reset as it leaves the NIC ports in an awkward state, where their QMANs were reset but the NIC itself is still working. In addition, there is not much sense in doing soft-reset when training is done on multiple GAUDIs. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by: Tomer Tayar <ttayar@habana.ai>
2020-05-25habanalabs: add print for soft reset due to eventOmer Shpigelman
Print the event name that caused the soft reset. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-25habanalabs: improve MMU cache invalidation codeOmer Shpigelman
A new sequence is introduced to invalidate the MMU cache in order to avoid timeouts. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19habanalabs: move event handling to common firmware fileOfir Bitton
Instead of writing similar event handling code for each ASIC, move the code to the common firmware file. This code will be used for GAUDI and all future ASICs. In addition, add two new fields to the auto-generated events file: valid and description. This will save the need to manually write the events description in the source code and simplify the code. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19habanalabs: add gaudi profiler moduleOmer Shpigelman
Add the GAUDI code to initialize the ASIC's profiler. The profile receives its initialization values from the user, same as in Goya, but the code to initialize is in the driver because the configuration space of the device is not directly exposed to the user. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19habanalabs: add gaudi security moduleOmer Shpigelman
Add the code to initialize the security module of GAUDI. Similar to Goya, we have two dedicated mechanisms for security: Range Registers and Protection bits. Those mechanisms protect sensitive memory and configuration areas inside the device. In addition, in Gaudi we moved to a 3-level security scheme, where the F/W runs with the highest security level (Privileged), the driver runs with a less secured level (Secured) and the user is neither privileged nor secured. The security module in the driver configures the Secured parts so the user won't be able to access them. The Privileged parts are configured by the F/W. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19habanalabs: add hwmgr module for gaudiOded Gabbay
The hwmgr module is responsible for messages sent to GAUDI F/W that are not common to all habanalabs ASICs. In GAUDI, we provide the user a simplified mode of controlling the ASIC clock frequency. Instead of three different clocks, we present a single clock property that the user can configure via sysfs. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-05-19habanalabs: add gaudi asic-dependent codeOded Gabbay
Add the ASIC-dependent code for GAUDI. Supply (almost) all of the function callbacks that the driver's common code need to initialize, finalize and submit workloads to the GAUDI ASIC. It also contains the code to initialize the F/W of the GAUDI ASIC and to receive events from the F/W. Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>