path: root/drivers/misc
Age Commit message (Author)
2022-09-19 habanalabs: send device activity in a proper context (Ofir Bitton)
'Device activity open packet' should be sent outside of mutex as there is no real necessity for a lock. In addition 'device activity close packet' should be sent upon an actual release of the device. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 habanalabs: send device active message to f/w (farah kassabri)
As part of the RAS that is done by the f/w, we should send a message to the f/w when a user either acquires or releases the device. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 habanalabs/gaudi2: dump detailed information upon RAZWI (Ofir Bitton)
In order to improve debuggability, we add all available information when a RAZWI event occurs. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs/gaudi2: log critical events with no rate limit (farah kassabri)
When we have a storm of HBM ECC SERR errors, we can reach a situation where the driver starts the hard reset flow without logging the cause of the hard reset, due to log rate limiting. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: ignore EEPROM errors during boot (Ofir Bitton)
EEPROM errors reported by firmware are basically warnings and should not fail the boot process. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: perform context switch flow only if needed (Ofir Bitton)
Except Goya, none of our ASICs require context switch flow, hence we enable this flow only where it is needed. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: set command buffer host VA dynamically (Dafna Hirschfeld)
Set the addresses for userspace command buffer dynamically instead of hard-coded. There is no reason for it to be hard-coded. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: trace DMA allocations (Ohad Sharabi)
This patch adds tracepoints in the code for DMA allocation. The main purpose is to be able to cross-reference the data with the map operations and determine whether a memory violation occurred, for example freeing a DMA allocation before unmapping it from device memory. To achieve this the DMA alloc/free code flows were refactored so that a single DMA tracepoint will catch many flows. To get a better understanding of what happened in the DMA allocations, the real allocating function is added to the trace as well. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: trace MMU map/unmap page (Ohad Sharabi)
This patch utilizes the defined tracepoint to trace the MMU's page map/unmap operations. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: define trace events (Ohad Sharabi)
This patch adds trace events for the habanalabs driver to gain all the benefits such an infrastructure can supply. The following events were added:
- MMU map/unmap: to be able to track the driver's memory allocations
- DMA alloc/free: to track our DMA allocations
The above trace points in conjunction will help us map the device memory usage as well as track memory violations. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Acked-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs/gaudi2: assigning PQFs for ARC f/w in PDMA (Rajarama Manjukody Bhat)
Assigning 3 PQFs in PDMA1 and 2 PQFs in PDMA0 for ARC firmware usage. Signed-off-by: Rajarama Manjukody Bhat <rmbhat@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: fix calculation of DRAM base address in PCIe BAR (Tomer Tayar)
The calculation of the device DRAM base address, before setting the relevant PCIe BAR to point at it, assumes that this BAR is used to access only the DRAM, and thus that the covered DRAM size is a power of 2. In future ASICs this is not necessarily true, so the calculation needs to be updated to also support a non-power-of-2 size. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
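As a rough illustration of the power-of-2 assumption described in this commit message, the sketch below rounds an address down to the start of the window that contains it. The names bar_base_for() and is_pow2() are hypothetical and are not taken from the driver; the mask form is only valid for power-of-2 sizes, so the fallback uses division.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if n is a non-zero power of 2. */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: round addr down to the start of the window of
 * window_size bytes that contains it. Masking only works when the size
 * is a power of 2, so other sizes fall back to divide and multiply.
 */
static uint64_t bar_base_for(uint64_t addr, uint64_t window_size)
{
	assert(window_size != 0);

	if (is_pow2(window_size))
		return addr & ~(window_size - 1);

	return (addr / window_size) * window_size;
}
```

With a 48-byte window, masking address 100 with ~(48 - 1) would give 64, which is not a multiple of 48; the division path gives 96.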
2022-09-18 habanalabs: if map page fails don't try to unmap it (Dafna Hirschfeld)
The original code tried to unmap a page that was not mapped as part of the map page error path. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: select FW_LOADER in Kconfig (Oded Gabbay)
The driver is loading firmware to the device and we use the firmware loading functions from the FW_LOADER module. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: add cdev index data member (Omer Shpigelman)
Instead of recalculating the cdev index, store it in a dedicated data member. This data member is intended to be passed to other drivers using the auxiliary bus infra and hence this new data member is necessary in case that the calculation is changed in the future. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: fix bug when setting va block size (Dafna Hirschfeld)
The size of a block is always 'block->end - block->start + 1'. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
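The '+ 1' follows from the block bounds being inclusive. A minimal sketch, assuming a simplified block type (struct va_block and va_block_size() here are illustrative stand-ins, not the driver's definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for a virtual-address block with inclusive bounds. */
struct va_block {
	uint64_t start;	/* first address in the block */
	uint64_t end;	/* last address in the block (inclusive) */
};

/* Size of the inclusive range [start, end]. */
static uint64_t va_block_size(const struct va_block *block)
{
	assert(block->end >= block->start);
	return block->end - block->start + 1;
}
```

A block covering 0x1000 through 0x1fff therefore has size 0x1000, and a block whose start equals its end has size 1, not 0.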
2022-09-18 habanalabs: expose device security status using info ioctl (Ofir Bitton)
In order for the user to know if he is running on a secured device or not, we add it also to the hw_ip info ioctl. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: expose device security status through sysfs (Ofir Bitton)
In order for the user to know if he is running on a secured device or not, a sysfs node is added. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: remove secured PCI IDs (Ofir Bitton)
Secured PCI ID will not be supported in new asics because the security status can always be read from the f/w. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: fix H/W block handling for partial unmappings (Tomer Tayar)
Several munmap() calls can be done on a mapped H/W block that has a larger size than a page size. Releasing the object should be done only when the entire mapped range is unmapped. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: unify hwmon resources clean up (Dani Liberman)
Since the hwmon fini code is common for all ASICs, unify it into a common function. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs/gaudi2: new API to control engine cores running mode (Tal Cohen)
The current flow of halting the engine cores is implemented by command buffers built by the user space and sent towards the Driver. This current flow is broken since the user space does not know when the cores actually halt as sending a workload is async op. Therefore the application can not free the memory that is mapped to the engine cores. This new API allows the user space to control the running mode. The API call is sync (returns after the cores are set to the requested mode). Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: remove left-over code from bring-up (Oded Gabbay)
There is some left-over code from the gaudi2 bring-up that wasn't removed so far. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs/gaudi2: change device f/w security check (farah kassabri)
On Gaudi2 the f/w always configures the PCIe iATU and allows access to scratchpad registers. Therefore, we can know if the f/w is secured by reading a status bit from the f/w registers. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: move common function out of debugfs.c (Oded Gabbay)
A common function that is called from multiple places can't be located in debugfs.c because that file is only compiled if debugfs is enabled in the kernel config file. This can lead to an undefined symbol compilation error. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: add a missing lock for in_reset indication (Tomer Tayar)
Add a missing lock in hl_device_resume() when it assigns a value to the 'in_reset' indication. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: fix vma fields assignments order in hl_hw_block_mmap() (Tomer Tayar)
In hl_hw_block_mmap(), the vma's 'vm_private_data' and 'vm_ops' fields are assigned before filling the content of the private data. In between there is a call to the ASIC hw_block_mmap() function, and if it fails, the vma close function will be called with a bad private data value. Fix the order of assignments to avoid this issue. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: avoid returning a valid handle if map_block() fails (Tomer Tayar)
map_block() sets the block id handle even if get_hw_block_id() fails, and in this case it uses block id 0 which might be a valid id. Modify it to set the handle only if get_hw_block_id() succeeds. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
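The pattern described here, writing the output only after the lookup succeeds, can be sketched as follows. Both lookup_block_id() and map_block_sketch() are hypothetical stand-ins with made-up addresses and signatures; they are not the driver's map_block() or get_hw_block_id().

```c
#include <stdint.h>

/*
 * Hypothetical lookup: knows a single block at address 0x1000 whose id is 0,
 * to show that id 0 can be a valid id. Returns 0 on success, -1 otherwise.
 */
static int lookup_block_id(uint64_t addr, uint32_t *id)
{
	if (addr != 0x1000)
		return -1;

	*id = 0;
	return 0;
}

/* Set *handle only when the lookup succeeded; leave it untouched on error. */
static int map_block_sketch(uint64_t addr, uint64_t *handle)
{
	uint32_t id = 0;
	int rc = lookup_block_id(addr, &id);

	if (rc)
		return rc;

	*handle = id;
	return 0;
}
```

If the handle were written unconditionally, a failed lookup would hand back id 0, which the caller could not tell apart from the valid block.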
2022-09-18 habanalabs: fix command submission sanity check (Tal Cohen)
When a CS is submitted, the ioctl handler checks the CS flags and performs a sanity check, according to its value. As new CS flags are added, the sanity check needs to be updated according to the new flags. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs/gaudi: read div_sel value from firmware (Ohad Sharabi)
Even when running with unsecured f/w, we should read the PLL div_sel value from the f/w as this register is always privileged. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs/gaudi: fix print format for div_sel (Ohad Sharabi)
Print format was for int (%d) while variable is u32. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
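A small userspace sketch of the same idea, assuming a hypothetical format_div_sel() helper: an unsigned 32-bit value is printed with an unsigned conversion (PRIu32 in portable C; the kernel's u32 is printed with %u). With a signed conversion such as %d, values above INT_MAX would typically be shown as negative.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 32-bit unsigned value with a matching specifier. */
static int format_div_sel(char *buf, size_t len, uint32_t div_sel)
{
	return snprintf(buf, len, "div_sel: %" PRIu32, div_sel);
}
```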
2022-09-18 habanalabs/gaudi2: mark PCIE access error as fatal (Tomer Tayar)
F/W events are enabled in a late phase of the device init, so an event for a PCIE access error during the init, can be received after the init is already done and considered as successful. A resulting device reset, which does the same H/W init, can end similarly with this event right after the reset is done and considered as successful, and a loop of this sequence can continue. To avoid it mark the PCIE access error as a fatal event, so after 2 consecutive events no more resets will be done. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: add uapi to retrieve engines status (Dani Liberman)
Currently, to get the engines status, the user needs to read a debugfs file with root permissions. This new uapi allows user space apps to retrieve the status, so for example, in case of failure, the status can be retrieved immediately by the application itself, which runs without root permissions. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: remove all kdma locks (Oded Gabbay)
We don't use KDMA concurrently in the driver. The only use is through debugfs and we don't protect concurrent access through it. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: wrap macro arg with parentheses (Ohad Sharabi)
The macro argument <val> is cast to u32 in some of the places. Because this arg can be some arithmetic computation (e.g. address + offset), the cast should be on the whole expression. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
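To see why the parentheses matter, consider the two hypothetical macros below (they are illustrative only, not the driver's macros). A cast binds tighter than '+', so without parentheses only the first operand of the argument is cast.

```c
#include <stdint.h>

/* Hypothetical macros: the first casts only the first operand of val. */
#define LOW32_UNSAFE(val)	((uint32_t)val)
#define LOW32_SAFE(val)		((uint32_t)(val))

/* Expands to ((uint32_t)base + offset): the sum is computed in 64 bits. */
static uint64_t low32_unsafe(uint64_t base, uint64_t offset)
{
	return LOW32_UNSAFE(base + offset);
}

/* Expands to ((uint32_t)(base + offset)): the whole sum is truncated. */
static uint64_t low32_safe(uint64_t base, uint64_t offset)
{
	return LOW32_SAFE(base + offset);
}
```

For base 0xFFFFFFFF and offset 1, the unparenthesized form yields 0x100000000, while the parenthesized form yields 0, the truncated 32-bit result that was intended.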
2022-09-18 habanalabs: fix spelling mistakes (Bharat Jauhari)
Cosmetic commit, no logical changes. It just fixes the spelling mistakes. Signed-off-by: Bharat Jauhari <bjauhari@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs/gaudi2: remove old interrupt mappings (Ofir Bitton)
Interrupt enumeration has changed some time ago but the old mapping was accidentally left in the driver. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs/gaudi: increase default cs timeout to 10 minutes (Oded Gabbay)
In order to improve scalability and reduce host overhead, it is better to increase the default TDR timeout of Gaudi1 from 30 seconds to 10 minutes. This will allow the DL Framework (e.g. PyTorch, TensorFlow) to remove the host sync they are using now and improve overall performance on scaleout training. Note that one can always set the timeout to a custom value via a kernel module parameter given during driver load. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: add return code field to module iterator (Ohad Sharabi)
Up until now the module iterator called void callback functions, so a caller activating a callback that may fail suffered from 2 issues:
1. The need to "plant" the return code in the private data. This is a drawback since the iterator itself should not be aware of the private data of the caller.
2. Due to 1, even in a failure the iterator would keep iterating instead of breaking upon error.
To overcome this, an optional rc field is added to the iterator context. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
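The idea of an rc field in the iterator context can be sketched as below. The struct layout, iterate_modules(), and the fail_at() callback are all assumptions made for illustration; the driver's actual context and iterator may differ.

```c
#include <stddef.h>

/* Hypothetical iterator context: void callback, caller data, and an rc field. */
struct iter_ctx {
	void (*fn)(struct iter_ctx *ctx, int index);
	void *data;	/* opaque to the iterator */
	int rc;		/* set by the callback; non-zero stops the iteration */
};

/* Call ctx->fn for each index and stop at the first reported error. */
static int iterate_modules(struct iter_ctx *ctx, int count)
{
	int visited = 0;
	int i;

	ctx->rc = 0;
	for (i = 0; i < count; i++) {
		ctx->fn(ctx, i);
		visited++;
		if (ctx->rc)
			break;
	}

	return visited;	/* number of callbacks actually invoked */
}

/* Example callback: reports failure at the index stored in ctx->data. */
static void fail_at(struct iter_ctx *ctx, int index)
{
	const int *bad_index = ctx->data;

	if (index == *bad_index)
		ctx->rc = -1;
}
```

The iterator inspects only ctx->rc and never dereferences ctx->data, which addresses both issues listed in the commit message.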
2022-09-18 habanalabs/gaudi2: enable all MMU SPI/SEI interrupts (Tomer Tayar)
Currently only part of the MMU SPI/SEI interrupts are enabled, although there is no real reason to not enable all. The only exception is "burst_fifo_full", which is expected for PMMU because it has a 2-entry FIFO, and thus it is not enabled for it. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: rename non_hard_reset to compute_reset (Ofir Bitton)
In order to be more explicit we should use the term compute_reset for describing the reset in which only the compute engines get reset. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: Fix spelling mistake "Scrubing" -> "Scrubbing" (Colin Ian King)
There is a spelling mistake in a dev_dbg message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: Simplify bool conversion (Yang Li)
Fix the following coccicheck warning: ./drivers/misc/habanalabs/gaudi2/gaudi2.c:9727:48-53: WARNING: conversion to bool not needed here Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-18 habanalabs: removed seq_file parameter from is_idle asic functions (Dani Liberman)
Change the is_idle functions so they would be more usable outside debugfs. Do this by replacing the seq_file parameter with a regular string. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-16 Merge tag 'v6.0-rc5' into i2c/for-mergewindow (Wolfram Sang)
Linux 6.0-rc5
2022-09-12 mei: debugfs: add pxp mode to devstate in debugfs (Tomas Winkler)
Add pxp mode devstate to debugfs to monitor pxp state machine progress. This is useful to debug issues in scenarios in which the pxp state needs to be re-initialized, like during power transitions such as suspend/resume. With this debugfs the state could be monitored to ensure that pxp is in the ready state. CC: Vitaly Lubart <vitaly.lubart@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220907215113.1596567-15-tomas.winkler@intel.com Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2022-09-12 mei: drop ready bits check after start (Alexander Usyskin)
The check that hardware and host ready bits are set after start is redundant and may fail and disable the driver if there is a back-to-back link reset issued right after start. This happens during pxp mode transitions when the firmware undergoes reset. Remove these checks to eliminate such failures. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220907215113.1596567-14-tomas.winkler@intel.com Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2022-09-12 mei: gsc: add transition to PXP mode in resume flow (Vitaly Lubart)
Added transition to PXP mode in resume flow. CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Vitaly Lubart <vitaly.lubart@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220907215113.1596567-13-tomas.winkler@intel.com Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2022-09-12 mei: gsc: setup gsc extended operational memory (Tomas Winkler)
1. Retrieve extended operational memory physical pointers from the auxiliary device info.
2. Set up memory registers.
3. Notify firmware that the memory is ready by sending the memory ready command.
4. Disable PXP device if GSC is not in PXP mode.
CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220907215113.1596567-12-tomas.winkler@intel.com Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2022-09-12 mei: mkhi: add memory ready command (Tomas Winkler)
Add GSC memory ready command. The command indicates to the firmware that the extended operational memory was set up and the firmware may enter PXP mode. CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220907215113.1596567-11-tomas.winkler@intel.com Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>