summaryrefslogtreecommitdiff
path: root/drivers/crypto/intel/qat/qat_common/adf_aer.c
AgeCommit message (Collapse)Author
2024-11-10crypto: qat - Fix missing destroy_workqueue in adf_init_aer()Wang Hai
The adf_init_aer() won't destroy device_reset_wq when alloc_workqueue() for device_sriov_wq failed. Add destroy_workqueue for device_reset_wq to fix this issue. Fixes: 4469f9b23468 ("crypto: qat - re-enable sriov after pf reset") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-08-02crypto: qat - preserve ADF_GENERAL_SECAdam Guerin
The ADF_GENERAL_SEC configuration section contains values that must be preserved during state transitions (down -> up, up -> down). This patch modifies the logic in adf_dev_shutdown() to maintain all key values within this section, rather than selectively saving and restoring only the ADF_SERVICES_ENABLED attribute. To achieve this, a new function has been introduced that deletes all configuration sections except for the one specified by name. This function is invoked during adf_dev_down(), with ADF_GENERAL_SEC as the argument. Consequently, the adf_dev_shutdown_cache_cfg() function has been removed as it is now redundant. Additionally, this patch eliminates the cache_config parameter from the adf_dev_down() function since ADF_GENERAL_SEC should always be retained. This change does not cause any side effects because all entries in the key-value store are cleared when a module is unloaded. Signed-off-by: Adam Guerin <adam.guerin@intel.com> Co-developed-by: Michal Witwicki <michal.witwicki@intel.com> Signed-off-by: Michal Witwicki <michal.witwicki@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-17crypto: qat - Fix ADF_DEV_RESET_SYNC memory leakHerbert Xu
Using completion_done to determine whether the caller has gone away only works after a complete call. Furthermore it's still possible that the caller has not yet called wait_for_completion, resulting in another potential UAF. Fix this by making the caller use cancel_work_sync and then freeing the memory safely. Fixes: 7d42e097607c ("crypto: qat - resolve race condition during AER recovery") Cc: <stable@vger.kernel.org> #6.8+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-17crypto: qat - resolve race condition during AER recoveryDamian Muszynski
During the PCI AER system's error recovery process, the kernel driver may encounter a race condition with freeing the reset_data structure's memory. If the device restart will take more than 10 seconds the function scheduling that restart will exit due to a timeout, and the reset_data structure will be freed. However, this data structure is used for completion notification after the restart is completed, which leads to a UAF bug. This results in a KFENCE bug notice. BUG: KFENCE: use-after-free read in adf_device_reset_worker+0x38/0xa0 [intel_qat] Use-after-free read at 0x00000000bc56fddf (in kfence-#142): adf_device_reset_worker+0x38/0xa0 [intel_qat] process_one_work+0x173/0x340 To resolve this race condition, the memory associated to the container of the work_struct is freed on the worker if the timeout expired, otherwise on the function that schedules the worker. The timeout detection can be done by checking if the caller is still waiting for completion or not by using completion_done() function. Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework") Cc: <stable@vger.kernel.org> Signed-off-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09crypto: qat - improve aer error reset handlingMun Chun Yep
Rework the AER reset and recovery flow to take into account root port integrated devices that gets reset between the error detected and the slot reset callbacks. In adf_error_detected() the devices is gracefully shut down. The worker threads are disabled, the error conditions are notified to listeners and through PFVF comms and finally the device is reset as part of adf_dev_down(). In adf_slot_reset(), the device is brought up again. If SRIOV VFs were enabled before reset, these are re-enabled and VFs are notified of restarting through PFVF comms. Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09crypto: qat - add auto reset on errorDamian Muszynski
Expose the `auto_reset` sysfs attribute to configure the driver to reset the device when a fatal error is detected. When auto reset is enabled, the driver resets the device when it detects either an heartbeat failure or a fatal error through an interrupt. This patch is based on earlier work done by Shashank Gupta. Signed-off-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09crypto: qat - re-enable sriov after pf resetMun Chun Yep
When a Physical Function (PF) is reset, SR-IOV gets disabled, making the associated Virtual Functions (VFs) unavailable. Even after reset and using pci_restore_state, VFs remain uncreated because the numvfs still at 0. Therefore, it's necessary to reconfigure SR-IOV to re-enable VFs. This commit introduces the ADF_SRIOV_ENABLED configuration flag to cache the SR-IOV enablement state. SR-IOV is only re-enabled if it was previously configured. This commit also introduces a dedicated workqueue without `WQ_MEM_RECLAIM` flag for enabling SR-IOV during Heartbeat and CPM error resets, preventing workqueue flushing warning. This patch is based on earlier work done by Shashank Gupta. Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09crypto: qat - update PFVF protocol for recoveryMun Chun Yep
Update the PFVF logic to handle restart and recovery. This adds the following functions: * adf_pf2vf_notify_fatal_error(): allows the PF to notify VFs that the device detected a fatal error and requires a reset. This sends to VF the event `ADF_PF2VF_MSGTYPE_FATAL_ERROR`. * adf_pf2vf_wait_for_restarting_complete(): allows the PF to wait for `ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE` events from active VFs before proceeding with a reset. * adf_pf2vf_notify_restarted(): enables the PF to notify VFs with an `ADF_PF2VF_MSGTYPE_RESTARTED` event after recovery, indicating that the device is back to normal. This prompts VF drivers switch back to use the accelerator for workload processing. These changes improve the communication and synchronization between PF and VF drivers during system restart and recovery processes. Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09crypto: qat - disable arbitration before resetFurong Zhou
Disable arbitration to avoid new requests to be processed before resetting a device. This is needed so that new requests are not fetched when an error is detected. Signed-off-by: Furong Zhou <furong.zhou@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09crypto: qat - add fatal error notify methodFurong Zhou
Add error notify method to report a fatal error event to all the subsystems registered. In addition expose an API, adf_notify_fatal_error(), that allows to trigger a fatal error notification asynchronously in the context of a workqueue. This will be invoked when a fatal error is detected by the ISR or through Heartbeat. Signed-off-by: Furong Zhou <furong.zhou@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20crypto: qat - fix double free during resetSvyatoslav Pankratov
There is no need to free the reset_data structure if the recovery is unsuccessful and the reset is synchronous. The function adf_dev_aer_schedule_reset() handles the cleanup properly. Only asynchronous resets require such structure to be freed inside the reset worker. Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework") Signed-off-by: Svyatoslav Pankratov <svyatoslav.pankratov@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-06crypto: qat - Move driver to drivers/crypto/intel/qatTom Zanussi
With the growing number of Intel crypto drivers, it makes sense to group them all into a single drivers/crypto/intel/ directory. Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>