path: root/drivers/accel
Age | Commit message | Author
2023-10-09 | accel/habanalabs/gaudi2: include block id in ECC error reporting | Ofir Bitton
During ECC event handling, Memory wrapper id was mistakenly printed as block id. Fix the print and in addition fetch the actual block-id from firmware. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: improve etf configuration | Benjamin Dotan
CoreSight ETF blocks have different sizes. As a result, sync packets need to be aligned based on the FIFO size. Signed-off-by: Benjamin Dotan <bdotan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: refactor deprecated strncpy | Justin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on its destination buffer argument which is _not_ the case for `strncpy`! There is likely no bug happening in this case since HL_STR_MAX is strictly larger than all source strings. Nonetheless, prefer a safer and more robust interface. It should also be noted that `strscpy` will not pad like `strncpy`. If this NUL-padding behavior is _required_ we should use `strscpy_pad` instead of `strscpy`. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
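A minimal userspace sketch of the difference, assuming a hypothetical model_strscpy() that mimics the kernel's strscpy() contract (copy at most size - 1 bytes, always NUL-terminate, return -E2BIG on truncation). It is an illustration, not the kernel implementation:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of the kernel's strscpy() contract
 * (not the kernel implementation): copy at most size - 1 bytes,
 * always NUL-terminate, return the copied length or -E2BIG on truncation.
 */
static long model_strscpy(char *dst, const char *src, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return -E2BIG;

	/* Find the source length, but never look past size bytes. */
	while (len < size && src[len] != '\0')
		len++;

	if (len == size) {
		/* Source does not fit: keep size - 1 bytes and terminate. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}

	/* Source fits: copy it together with its terminator. */
	memcpy(dst, src, len + 1);
	return (long)len;
}
```

For comparison, strncpy(buf, "abcdef", 4) leaves buf holding the four bytes "abcd" with no terminator, while model_strscpy(buf, "abcdef", 4) stores "abc" plus a NUL and returns -E2BIG.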
2023-10-09 | accel/habanalabs/gaudi2: Fix incorrect string length computation in gaudi2_psoc_razwi_get_engines() | Christophe JAILLET
snprintf() returns the "number of characters which *would* be generated for the given input", not the size *really* generated. In order to avoid too large values for 'str_size' (and potential negative values for "PSOC_RAZWI_ENG_STR_SIZE - str_size") use scnprintf() instead of snprintf(). Fixes: c0e6df916050 ("accel/habanalabs: fix address decode RAZWI handling") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
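A sketch of why the accumulated length stays in bounds with scnprintf() semantics. model_scnprintf() is a hypothetical userspace stand-in for the kernel helper (return what was actually written, at most size - 1), and append_engines() is an illustrative loop, not the gaudi2 code:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of the kernel's scnprintf(): like
 * snprintf(), but return the number of characters actually stored in buf
 * (excluding the NUL), never more than size - 1.
 */
static int model_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int ret;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	ret = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (ret < 0)
		return 0;
	if ((size_t)ret >= size)
		return (int)(size - 1); /* output was truncated */
	return ret;
}

/* Illustrative loop: append engine names and track the used length. */
static int append_engines(char *buf, size_t size, const char *const *names, int n)
{
	int used = 0;

	/* used never exceeds size - 1, so size - used is always >= 1. */
	for (int i = 0; i < n; i++)
		used += model_scnprintf(buf + used, size - (size_t)used, "%s ", names[i]);
	return used;
}
```

With plain snprintf() in the loop, a truncated call would still add the full would-be length (5 for "TPC1 "), so the running offset could move past the end of the buffer and size - used would wrap around.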
2023-10-09 | accel/habanalabs: refactor deprecated strncpy to strscpy_pad | Justin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. We see that `prop->cpucp_info.card_name` is supposed to be NUL-terminated based on its usage within `__hwmon_device_register()` (wherein it's called "name"):
| if (name && (!strlen(name) || strpbrk(name, "-* \t\n")))
|         dev_warn(dev,
|                  "hwmon: '%s' is not a valid name attribute, please fix\n",
|                  name);
A suitable replacement is `strscpy_pad` [2] due to the fact that it guarantees both NUL-termination and NUL-padding on its destination buffer. NUL-padding on `prop->cpucp_info.card_name` is not strictly necessary as `hdev->prop` is explicitly zero-initialized but should be used regardless as it gets copied out to userspace directly -- as per Kees' suggestion. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
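A companion sketch for the padding case, with a hypothetical model_strscpy_pad() standing in for the kernel helper. It shows that every byte after the terminator is zeroed, which is what matters when the whole fixed-size field is later copied out to userspace:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of strscpy_pad(): NUL-terminate like
 * strscpy(), then zero the rest of the buffer so no stale bytes remain.
 */
static long model_strscpy_pad(char *dst, const char *src, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return -E2BIG;

	while (len < size && src[len] != '\0')
		len++;

	if (len == size) {
		/* Truncated: the buffer is full, so there is nothing to pad. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}

	memcpy(dst, src, len);
	/* Writes the terminator and zero-fills the tail in one step. */
	memset(dst + len, 0, size - len);
	return (long)len;
}
```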
2023-10-09 | accel/habanalabs: fix ETR/ETF flush logic | Benjamin Dotan
When config_etr or config_etf are called we need to validate the parameters that are passed into them to make sure the requested operation is valid. Signed-off-by: Benjamin Dotan <bdotan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs/gaudi2 : remove psoc_arc access | Benjamin Dotan
Because firmware is blocking PSOC_ARC_DBG, we need to disable access to this block. Signed-off-by: Benjamin Dotan <bdotan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs/gaudi2: prepare to remove cpu_rst_status | Igor Grinberg
The soft reset has transitioned to CPUCP packet instead of plain register write and is about to be removed from the struct cpu_dyn_regs. As a preparation for removing the cpu_rst_status field from struct cpu_dyn_regs, switch to use the plain macro - this keeps the backward compatibility. Signed-off-by: Igor Grinberg <igrinberg@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: Move ioctls to the device specific ioctls range | Tomer Tayar
To use drm_ioctl(), move the ioctls to the device specific ioctls range at [DRM_COMMAND_BASE, DRM_COMMAND_END). Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
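A sketch of the numbering scheme drm_ioctl() relies on. DRM_COMMAND_BASE (0x40) and DRM_COMMAND_END (0xA0) mirror include/uapi/drm/drm.h; the EXAMPLE_HL_* indices are hypothetical and are not the actual habanalabs uAPI values:

```c
#include <stdbool.h>

/* Values mirror include/uapi/drm/drm.h. */
#define DRM_COMMAND_BASE 0x40
#define DRM_COMMAND_END  0xA0

/* Hypothetical driver-relative ioctl indices (not the real habanalabs uAPI). */
enum example_hl_cmd {
	EXAMPLE_HL_INFO = 0,
	EXAMPLE_HL_CB,
	EXAMPLE_HL_CS,
	EXAMPLE_HL_WAIT_CS,
	EXAMPLE_HL_MEMORY,
	EXAMPLE_HL_DEBUG,
};

/* Absolute ioctl number: driver commands are offset by DRM_COMMAND_BASE. */
static unsigned int example_ioctl_nr(enum example_hl_cmd cmd)
{
	return DRM_COMMAND_BASE + (unsigned int)cmd;
}

/* Numbers in [DRM_COMMAND_BASE, DRM_COMMAND_END) go to the driver's table. */
static bool is_driver_ioctl(unsigned int nr)
{
	return nr >= DRM_COMMAND_BASE && nr < DRM_COMMAND_END;
}

/* Index into the driver's ioctl descriptor array. */
static unsigned int driver_table_index(unsigned int nr)
{
	return nr - DRM_COMMAND_BASE;
}
```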
2023-10-09 | accel/habanalabs: register compute device as an accel device | Tomer Tayar
Register the compute device as an accel device, and remove the creation of the habanalabs compute char device. The IOCTLs in this patch are still handled by the current driver handler. Moving to DRM IOCTL handling requires moving the IOCTLs numbers to a specific range, so it will be handled in subsequent patches. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: add info ioctl for engine error reports | Ofir Bitton
The user is notified of every engine error report but still lacks the exact engine information. Hence, allow the user to query for the exact engine that reported an error. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: set default device release watchdog T/O as 30 sec | Tomer Tayar
After being notified about certain errors, the user is expected to finish its post-error actions and release the device within some timeout, after which the device is reset. The default timeout value is 5 sec, which in some cases is not enough for a user application to collect debug data. Increase the default value to 30 sec. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: handle f/w reserved dram space request | Dani Liberman
The FW may request reserved space in DRAM. If the device supports this option, the driver retrieves the size from the f/w and reserves it. For now, add the common code infrastructure to support it. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs/gaudi2: fix missing check of kernel ctx | Oded Gabbay
If we are initializing the kernel context when we have a Gaudi2 device, we don't need to do any late initializing of that context with specific Gaudi2 code. Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs/gaudi2: prepare to remove soft_rst_irq | Igor Grinberg
The soft reset has transitioned to CPUCP packet instead of plain register write and is about to be removed from the struct cpu_dyn_regs. As a preparation for removing the gic_host_soft_rst_irq field from struct cpu_dyn_regs, switch to use the plain macro - this keeps the backward compatibility. Signed-off-by: Igor Grinberg <igrinberg@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs/gaudi2: unsecure tpc count registers | Ofir Bitton
As TPC kernels now must use those registers we unsecure them. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs/gaudi2: un-secure register for engine cores interrupt | Tomer Tayar
The F/W dynamically allocates one of the PSOC scratchpad registers for the engine cores, so they can raise events towards the F/W. To allow the engine cores to access this register, this register must be non-secured. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs/gaudi: Add MODULE_FIRMWARE macros | Juerg Haefliger
The module loads firmware so add MODULE_FIRMWARE macros to provide that information via modinfo. Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel: make accel_class a static const structure | Ivan Orlov
Now that the driver core allows for struct class to be in read-only memory, move the accel_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: dri-devel@lists.freedesktop.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: dump temperature threshold boot error | Ofir Bitton
Add dump of an error reported from f/w during boot time. This error indicates a failure with setting temperature threshold. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: reset device if scrubbing failed | Oded Gabbay
If scrubbing memory after the user released the device fails, the device is in a bad state and should be reset. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>
2023-10-09 | accel/habanalabs: remove pdev check on idle check | Oded Gabbay
Our simulator supports idle check so no need anymore to check if pdev exists. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>
2023-10-09 | accel/habanalabs: fix wait_for_interrupt abortion flow | farah kassabri
When the driver needs to abort interrupt waiters, e.g. because a critical event occurred and the driver must perform a hard reset, it completes the fence to wake up the waiting thread and sets the fence error indication. The completion API then returns a value greater than 0 (the remaining timeout), which normally indicates successful completion, so the driver must mark the wait as aborted instead. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
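A sketch of the status decision described above, assuming a hypothetical model_fence with an error field. rc stands for the return value of a timed completion wait such as wait_for_completion_interruptible_timeout(): positive on completion (remaining jiffies), 0 on timeout, negative when interrupted:

```c
/* Hypothetical status codes for a user-interrupt wait. */
enum wait_status {
	WAIT_STATUS_COMPLETED,
	WAIT_STATUS_TIMEDOUT,
	WAIT_STATUS_ABORTED,
};

/* Minimal stand-in for the driver's fence (hypothetical field). */
struct model_fence {
	int error; /* non-zero when the driver aborted the waiters */
};

/* Map a wait return value plus fence state to a user-visible status. */
static int classify_wait(const struct model_fence *fence, long rc,
			 enum wait_status *status)
{
	if (rc < 0)
		return (int)rc; /* interrupted: propagate the error */

	if (rc == 0) {
		*status = WAIT_STATUS_TIMEDOUT;
		return 0;
	}

	/*
	 * rc > 0 only says the completion was signalled. An abort (e.g. before
	 * a hard reset) signals it too, so the fence error decides the status.
	 */
	*status = fence->error ? WAIT_STATUS_ABORTED : WAIT_STATUS_COMPLETED;
	return 0;
}
```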
2023-10-09 | accel/habanalabs: Allow single timestamp registration request at a time | farah kassabri
Protect against concurrency of user requesting to register a timestamp offset (where the driver fills the timestamp when the command submission has finished executing) to a specific user interrupt ID. The protection is basically to allow only one timestamp registration request to be handled at a time. This is needed because the user can decide to re-use a timestamp offset (register an already registered offset, to a different interrupt ID). This means the request will cause the timestamp node to move from one interrupt list to another interrupt list. In such scenario, without proper protection, we could end up adding the same node twice to the interrupts wait lists. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
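A userspace sketch of the serialization, assuming a hypothetical ts_node that can be linked on at most one interrupt list and a single pthread mutex in place of whatever lock the driver actually uses. Because the unlink from the old list and the link onto the new one happen under one lock, two concurrent re-registrations cannot insert the node twice:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: a timestamp node sits on at most one interrupt list. */
struct ts_node {
	struct ts_node *next;
	int list_id; /* -1 when not on any list */
};

struct intr_list {
	struct ts_node *head;
};

static pthread_mutex_t ts_reg_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unlink n from l if present. */
static void list_del_node(struct intr_list *l, struct ts_node *n)
{
	struct ts_node **pp = &l->head;

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp)
		*pp = n->next;
	n->next = NULL;
}

/* Register (or re-register) n on lists[id], one request at a time. */
static void ts_register(struct intr_list *lists, struct ts_node *n, int id)
{
	pthread_mutex_lock(&ts_reg_lock);
	if (n->list_id >= 0)
		list_del_node(&lists[n->list_id], n); /* move, don't duplicate */
	n->next = lists[id].head;
	lists[id].head = n;
	n->list_id = id;
	pthread_mutex_unlock(&ts_reg_lock);
}

static int list_len(const struct intr_list *l)
{
	int len = 0;

	for (const struct ts_node *n = l->head; n; n = n->next)
		len++;
	return len;
}
```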
2023-10-09 | accel/habanalabs: rename fd_list to hpriv_list | Koby Elbaz
Every time an FD is returned to the user, the driver adds a corresponding private structure to the list. Yet, it's still a list of private structures rather than of FDs. Remove, as well, an unnecessary comment. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: call put_pid after hpriv list is updated | Koby Elbaz
Because we might still be using related resources, decrementing PID's reference count should be done at later stages of the device release. A good place is right after the representing private structure is removed from LKD's list. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: print return code when process termination fails | Koby Elbaz
As part of driver teardown, we attempt to kill all user processes. It shouldn't fail, but if it does we want to print the error code that the kapi returned to us. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: fix standalone preboot descriptor request | farah kassabri
The preboot used to statically allocate memory for the comms descriptor on the device memory when driver requested the descriptor information. Now preboot moved to dynamic memory allocation where it wants to check the size the driver expects vs. what the f/w expects. Note there are no backward compatibility issues as older f/w versions simply ignore this value. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: handle arc farm razwi | Dani Liberman
Implement razwi handling for arc farm and add it to arc farm sei event handler. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: stop fetching MME SBTE error cause | Ofir Bitton
Because in this case we have only a single possible cause, we can safely stop fetching the cause from firmware. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: set device status 'malfunction' while in rmmod | Koby Elbaz
hl_device_status() returns the status of an acquired device. If a device is going down (following an rmmod cmd), it should be marked as an unusable/malfunctioning device, and hence should not be acquired. However, this was not the case so far: a device going down would inaccurately return 'in reset' status, allowing the user to acquire the device. This introduced a bug where, as part of a reset flow, the driver could not kill processes that had not run yet, and since those processes weren't blocked from reacquiring the device, the driver would eventually attempt to kill all processes in a list that could never actually become empty. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: print task name upon creation of a user context | Tomer Tayar
It is useful for debug to know which user process has acquired the device. Add this info to the relevant debug print, in addition to the already printed user context's ASID. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: print task name and request code upon ioctl failure | Tomer Tayar
When an ioctl fails, it is useful to know what is the task command name and the full ioctl request code, in addition to the task pid and the ioctl number. Add the additional information to the relevant debug error prints. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: notify user about undefined opcode event | Ofir Bitton
In order for user to be aware of undefined opcode events, we must store all relevant information and notify user about the failure. The user will fetch the stored info via info ioctl. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 | accel/habanalabs: update pending reset flags with new reset requests | Tomer Tayar
If hl_device_cond_reset() is called while a reset is already pending but hasn't started, the reset request will be dropped. If the flags of the new request are more severe, e.g. a hard reset while the pending reset is a compute reset, the eventual reset won't be suitable for the device status. To prevent such cases, update the pending reset flags with the new request's flags before the request is dropped. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
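A sketch of the merge, with hypothetical flag values and a bitwise OR as the assumed merge rule; the real HL_DRV_RESET_* flags and the hl_device_cond_reset() logic may differ:

```c
#include <stdbool.h>

/* Hypothetical flag values; the real HL_DRV_RESET_* bits differ. */
#define RESET_FLAG_COMPUTE 0x1u
#define RESET_FLAG_HARD    0x2u

struct reset_state {
	bool pending;       /* a reset is scheduled but has not started */
	unsigned int flags; /* flags the scheduled reset will use */
};

/*
 * Returns true if a new reset was scheduled, false if the request was
 * folded into the pending one. OR-ing keeps the most severe bits, so a
 * later HARD request is not lost behind an earlier COMPUTE request.
 */
static bool cond_reset(struct reset_state *s, unsigned int flags)
{
	if (s->pending) {
		s->flags |= flags;
		return false;
	}
	s->pending = true;
	s->flags = flags;
	return true;
}
```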
2023-10-09 | accel/habanalabs: prevent immediate hard reset due to 2 adjacent H/W events | Tomer Tayar
When a H/W event is received while a user is registered to events, no immediate hard reset will happen, and instead the user will be notified and will have some time to handle it and eventually release the device, after which the reset will be done. If a user, as part of the handling and as part of the cleanup steps towards releasing the device, unregisters from receiving those events, and at that time an adjacent H/W event is received, it will be assumed that the user is not registered to events and thus an immediate hard reset is required. To prevent such an unwanted immediate reset, modify the driver to perform it only if the user is not registered to events AND we don't already have a pending reset for a previous H/W event. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
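The resulting condition reduces to a two-input predicate. A sketch with hypothetical parameter names:

```c
#include <stdbool.h>

/* Hypothetical model of the decision described above. */
static bool need_immediate_hard_reset(bool user_registered_to_events,
				      bool reset_already_pending)
{
	/*
	 * A registered user is notified and releases the device first. If the
	 * user just unregistered while cleaning up after an earlier event, that
	 * event's reset is still pending, so do not reset underneath it.
	 */
	return !user_registered_to_events && !reset_already_pending;
}
```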
2023-10-04 | kthread: add kthread_stop_put | Andreas Gruenbacher
Add a kthread_stop_put() helper that stops a thread and puts its task struct. Use it to replace the various instances of kthread_stop() followed by put_task_struct(). Remove the kthread_stop_put() macro in usbip that is similar but doesn't return the result of kthread_stop(). [agruenba@redhat.com: fix kerneldoc comment] Link: https://lkml.kernel.org/r/20230911111730.2565537-1-agruenba@redhat.com [akpm@linux-foundation.org: document kthread_stop_put()'s argument] Link: https://lkml.kernel.org/r/20230907234048.2499820-1-agruenba@redhat.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-28 | accel/ivpu: Annotate struct ivpu_job with __counted_by | Kees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ivpu_job. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Tom Rix <trix@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: llvm@lists.linux.dev Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://lore.kernel.org/r/20230922175416.work.272-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
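A userspace sketch of the annotation, assuming a simplified model_job rather than the real struct ivpu_job. COUNTED_BY() is a local stand-in for the kernel's __counted_by macro and expands to nothing on compilers that lack the attribute:

```c
#include <stdlib.h>

/* Local stand-in for the kernel's __counted_by(); no-op if unsupported. */
#ifdef __has_attribute
#if __has_attribute(counted_by)
#define COUNTED_BY(member) __attribute__((counted_by(member)))
#endif
#endif
#ifndef COUNTED_BY
#define COUNTED_BY(member)
#endif

struct bo; /* opaque buffer object (stand-in) */

/* Simplified model: only the counter and the counted flexible array. */
struct model_job {
	unsigned int bo_count;
	struct bo *bos[] COUNTED_BY(bo_count);
};

static struct model_job *model_job_alloc(unsigned int bo_count)
{
	struct model_job *job;

	job = calloc(1, sizeof(*job) + bo_count * sizeof(job->bos[0]));
	if (!job)
		return NULL;
	/* Set the counter before any bos[] access so bounds checks see it. */
	job->bo_count = bo_count;
	return job;
}
```

The ordering inside model_job_alloc() is the part that matters: once the attribute is active, bounds checks on bos[] read bo_count, so touching the array before the counter is initialized could be reported as out of bounds.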
2023-09-29 | Merge tag 'drm-misc-next-2023-09-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-next | Dave Airlie
drm-misc-next for v6.7-rc1:
UAPI Changes:
- drm_file owner is now updated during use; in the case of a drm fd opened by the display server for a client, the correct owner is displayed.
- Qaic gains support for the QAIC_DETACH_SLICE_BO ioctl to allow bo recycling.
Cross-subsystem Changes:
- Disable boot logo for au1200fb, mmpfb and unexport logo helpers. Only fbcon should manage display of logo.
- Update freescale in MAINTAINERS.
- Add some bridge files to bridge in MAINTAINERS.
- Update gma500 driver repo in MAINTAINERS to point to drm-misc.
Core Changes:
- Move size computations to drm buddy allocator.
- Make drm_atomic_helper_shutdown(NULL) a nop.
- Assorted small fixes in drm_debugfs, DP-MST payload addition error handling.
- Fix DRM_BRIDGE_ATTACH_NO_CONNECTOR handling.
- Handle bad (h/v)sync_end in EDID by clipping to htotal.
- Build GPUVM as a module.
Driver Changes:
- Simple drivers don't need to cache prepared result.
- Call drm_atomic_helper_shutdown() in shutdown/unbind for a whole lot more drm drivers.
- Assorted small fixes in amdgpu, ssd130x, bridge/it6621, accel/qaic, nouveau, tc358768.
- Add NV12 for komeda writeback.
- Add arbitration lost event to synopsys/dw-hdmi-cec.
- Speed up s/r in nouveau by not restoring some big bo's.
- Assorted nouveau display rework in preparation for GSP-RM, especially related to how the modeset sequence works and the DP sequence in relation to link training.
- Update anx7816 panel.
- Support NVSYNC and NHSYNC in tegra.
- Allow multiple power domains in simple driver.
Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/f1fae5eb-25b8-192a-9a53-215e1184ce81@linux.intel.com
2023-09-27 | accel/ivpu: Compile ivpu_debugfs.c conditionally | Stanislaw Gruszka
Only compile ivpu_debugfs.c file with CONFIG_DEBUG_FS. Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230907072610.433497-2-stanislaw.gruszka@linux.intel.com
2023-09-27 | accel/ivpu: Update debugfs to latest changes in DRM | Stanislaw Gruszka
Use new drm debugfs helpers. This is needed after changes from commit 78346ebf9f94 ("drm/debugfs: drop debugfs_init() for the render and accel node v2"). Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230907072610.433497-1-stanislaw.gruszka@linux.intel.com
2023-09-27 | accel/ivpu: Use cached buffers for FW loading | Karol Wachowski
Create buffers with cache coherency on the CPU side (write-back) while disabling snooping on the VPU side. These buffers require an explicit cache flush after each CPU-side modification. Configuring pages as write-combined may introduce significant delays, potentially taking hundreds of milliseconds for 64 MB buffers. Added internal DRM_IVPU_BO_NOSNOOP mask which disables snooping on the VPU side. Allocate FW runtime memory buffer (64 MB) as cached with snooping-disabled. This fixes random long FW loading times and boot params memory corruption on warmboot (due to missed wmb). Fixes: 02d5b0aacd05 ("accel/ivpu: Implement firmware parsing and booting") Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230926120943.GD846747@linux.intel.com
2023-09-27 | accel/ivpu/40xx: Fix missing VPUIP interrupts | Karol Wachowski
Move the sequence of masking and unmasking global interrupts from the buttress interrupt handler to the generic one that handles both VPUIP and BTRS interrupts. Unmasking global interrupts will re-trigger MSI for any pending interrupts. Without this sequence, a VPUIP interrupt that arrives after VPU_40XX_HOST_SS_ICB_STATUS_0 is read and before all active interrupt sources are cleared can randomly be missed. Fixes: 79cdc56c4a54 ("accel/ivpu: Add initial support for VPU 4") Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230925121137.872158-6-stanislaw.gruszka@linux.intel.com
2023-09-27 | accel/ivpu/40xx: Disable frequency change interrupt | Karol Wachowski
Do not enable frequency change interrupt on 40xx as it might lead to an interrupt storm in current design. FREQ_CHANGE interrupt is triggered on D0I2 entry which will cause KMD to check VPU interrupt sources by reading VPUIP registers. Access to those registers will toggle necessary clocks and trigger another FREQ_CHANGE interrupt possibly ending in an infinite loop. FREQ_CHANGE interrupt has only debug purposes and can be permanently disabled. Fixes: 79cdc56c4a54 ("accel/ivpu: Add initial support for VPU 4") Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230925121137.872158-5-stanislaw.gruszka@linux.intel.com
2023-09-27 | accel/ivpu/40xx: Ensure clock resource ownership Ack before Power-Up | Karol Wachowski
We need to wait for the CLOCK_RESOURCE_OWN_ACK bit to be set after configuring the workpoint. This step ensures that the VPU microcontroller clock is actively toggling and ready for operation. Previously, we relied solely on the READY bit in the VPU_STATUS register, which indicated the completion of the workpoint download. However, this approach was insufficient, as the READY bit could be set while the device was still running on a sideband clock until the PLL locked. To guarantee that the PLL is locked and the device is running on the main clock source, we now wait for the CLOCK_RESOURCE_OWN_ACK before proceeding with the remainder of the power-up sequence. Fixes: 79cdc56c4a54 ("accel/ivpu: Add initial support for VPU 4") Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230925121137.872158-4-stanislaw.gruszka@linux.intel.com
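A sketch of a bounded poll for the ack bit. The bit position, the callback-based register reader, and the fake register used to simulate PLL locking are all hypothetical; a real driver would delay between reads and use a time budget rather than a read count:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real VPU_40XX register layout may differ. */
#define CLOCK_RESOURCE_OWN_ACK_BIT (1u << 0)

/* Register reader callback so the sketch can run without hardware. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Poll until the ack bit is set or max_polls reads have been made. */
static bool wait_clock_own_ack(reg_read_fn read, void *ctx, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (read(ctx) & CLOCK_RESOURCE_OWN_ACK_BIT)
			return true; /* main clock owned: continue power-up */
	}
	return false; /* timed out: PLL never locked */
}

/* Simulated register that reports the ack only after ack_after reads. */
struct fake_reg {
	unsigned int reads;
	unsigned int ack_after;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	return ++r->reads >= r->ack_after ? CLOCK_RESOURCE_OWN_ACK_BIT : 0;
}
```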
2023-09-27 | accel/ivpu: Don't flood dmesg with VPU ready message | Jacek Lawrynowicz
Use ivpu_dbg() to print the VPU ready message so it doesn't pollute the dmesg. Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230925121137.872158-3-stanislaw.gruszka@linux.intel.com
2023-09-27 | accel/ivpu: Do not use wait event interruptible | Stanislaw Gruszka
If we receive a signal while waiting for an IPC message response in ivpu_ipc_receive(), we return an error and continue to operate. The driver can then send other IPC messages and reuse the occupied slot of a message that is still being processed by the firmware. This can corrupt firmware memory and cause a subsequent FW crash with messages:
[ 3698.569719] intel_vpu 0000:00:0b.0: [drm] ivpu_ipc_send_receive_internal(): IPC receive failed: type 0x1103, ret -512
[ 3698.569747] intel_vpu 0000:00:0b.0: [drm] ivpu_jsm_unregister_db(): Failed to unregister doorbell 3: -512
[ 3698.569756] intel_vpu 0000:00:0b.0: [drm] ivpu_ipc_tx_prepare(): IPC message vpu:0x88980000 not released by firmware
[ 3698.569763] intel_vpu 0000:00:0b.0: [drm] ivpu_ipc_tx_prepare(): JSM message vpu:0x88980040 not released by firmware
[ 3698.570234] intel_vpu 0000:00:0b.0: [drm] ivpu_ipc_send_receive_internal(): IPC receive failed: type 0x110e, ret -512
[ 3698.570318] intel_vpu 0000:00:0b.0: [drm] *ERROR* ivpu_mmu_dump_event(): MMU EVTQ: 0x10 (Translation fault) SSID: 0 SID: 3, e[2] 00000000, e[3] 00000208, in addr: 0x88988000, fetch addr: 0x0
To fix the issue, don't use the interruptible variant of wait event, so the firmware can finish IPC processing. Fixes: 5d7422cfb498 ("accel/ivpu: Add IPC driver and JSM messages") Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230925121137.872158-2-stanislaw.gruszka@linux.intel.com
2023-09-25 | accel/ivpu: Add Arrow Lake pci id | Stanislaw Gruszka
Enable VPU on Arrow Lake CPUs. Reviewed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230922132206.812817-1-stanislaw.gruszka@linux.intel.com
2023-09-22 | accel/qaic: Add QAIC_DETACH_SLICE_BO IOCTL | Pranjal Ramajor Asha Kanojiya
Once a BO is attached with a slicing configuration, that BO can only be used for that particular setting. With this new feature, the user can detach the slicing configuration from an already sliced BO and attach a new slicing configuration using QAIC_ATTACH_SLICE_BO. This supports BO recycling. detach_slice_bo() detaches the slicing configuration from a BO. This new helper function can also be used in release_dbc(), which does exactly the same thing. Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> [jhugo: add documentation for new ioctl] Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230901172247.11410-8-quic_jhugo@quicinc.com
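A sketch of the attach/detach rule, assuming a hypothetical model_bo. The -EINVAL returned for an already sliced BO is an assumption about the error code, not taken from the qaic source:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical BO model; real qaic BOs carry DMA slicing tables. */
struct model_bo {
	bool sliced;
	unsigned int slice_cfg; /* stand-in for the attached slicing config */
};

/* Attach a slicing configuration; a sliced BO must be detached first. */
static int attach_slice(struct model_bo *bo, unsigned int cfg)
{
	if (bo->sliced)
		return -EINVAL; /* assumed error code */
	bo->slice_cfg = cfg;
	bo->sliced = true;
	return 0;
}

/* One helper serves both the detach ioctl path and DBC release. */
static void detach_slice(struct model_bo *bo)
{
	bo->slice_cfg = 0;
	bo->sliced = false;
}
```

Sharing detach_slice() between both paths mirrors the commit's point that release_dbc() performs the same teardown, so the BO ends up in the same reusable state either way.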
2023-09-22 | accel/qaic: Create a function to initialize BO | Pranjal Ramajor Asha Kanojiya
This makes sure that we have a single place to initialize and re-initialize a BO. Use this new API to clean up release_dbc(). It will be needed by the next patch to detach slicing from a BO. Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230901172247.11410-7-quic_jhugo@quicinc.com