Age  Commit message  Author
2024-07-10Merge branch '6.10/scsi-fixes' into 6.11/scsi-stagingMartin K. Petersen
Pull in my fixes branch to resolve an mpi3mr merge conflict reported by sfr. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04Merge patch series "mpi3mr: Support PCI Error Recovery"Martin K. Petersen
Sumit Saxena <sumit.saxena@broadcom.com> says: This patch series contains the changes done in the driver to support PCI error recovery. It is a rework of an older patch series from Ranjan Kumar, see [1]. [1] https://lore.kernel.org/all/20231214205900.270488-1-ranjan.kumar@broadcom.com/ Link: https://lore.kernel.org/r/20240627101735.18286-1-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: mpi3mr: Driver version updateSumit Saxena
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Link: https://lore.kernel.org/r/20240627101735.18286-4-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: mpi3mr: Prevent PCI writes from driver during PCI error recoverySumit Saxena
Prevent interaction with the hardware while error recovery is in progress. Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Co-developed-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Link: https://lore.kernel.org/r/20240627101735.18286-3-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: mpi3mr: Support PCI Error Recovery callback handlersSumit Saxena
PCI Error recovery support is required to recover the controller upon detection of PCI errors. Add support for the PCI error recovery callback handlers in mpi3mr driver. Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Co-developed-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Link: https://lore.kernel.org/r/20240627101735.18286-2-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04Merge patch series "Update lpfc to revision 14.4.0.3"Martin K. Petersen
Justin Tee <justintee8345@gmail.com> says: Update lpfc to revision 14.4.0.3 This patch set contains bug fixes related to discovery, submission of mailbox commands, and proper endianness conversions. The patches were cut against Martin's 6.11/scsi-queue tree. Link: https://lore.kernel.org/r/20240628172011.25921-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Update lpfc version to 14.4.0.3Justin Tee
Update lpfc version to 14.4.0.3. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Revise lpfc_prep_embed_io routine with proper endian macro usagesJustin Tee
On big endian architectures, it is possible to run into a memory out of bounds pointer dereference when FCP targets are zoned. In lpfc_prep_embed_io, the memcpy(ptr, fcp_cmnd, sgl->sge_len) is referencing a little endian formatted sgl->sge_len value. So, the memcpy can cause big endian systems to crash. Redefine the *sgl ptr as a struct sli4_sge_le to make it clear that we are referring to a little endian formatted data structure. And, update the routine with proper le32_to_cpu macro usages. Fixes: af20bb73ac25 ("scsi: lpfc: Add support for 32 byte CDBs") Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-8-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
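The endian problem described above can be sketched in user space. This is a minimal illustration, not the lpfc code: struct sge_le, le32_decode() and copy_embedded_cmd() are hypothetical names, and the destination bounds check is an extra assumption that the commit message does not mention.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a little-endian SGE; not the real lpfc layout. */
struct sge_le {
    uint8_t sge_len[4];   /* length field stored in little-endian byte order */
};

/* Decode a 32-bit little-endian value; the result is the same on any host. */
static uint32_t le32_decode(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/*
 * Copy the command using the decoded length. Returns the number of bytes
 * copied, or 0 if the decoded length does not fit the destination
 * (the bounds check is an assumption added for this sketch).
 */
static size_t copy_embedded_cmd(void *dst, size_t dst_size,
                                const void *cmd, const struct sge_le *sge)
{
    uint32_t len = le32_decode(sge->sge_len);

    if (len > dst_size)
        return 0;
    memcpy(dst, cmd, len);
    return len;
}
```

On a big endian host, reading the four bytes 20 00 00 00 as a native integer would give 0x20000000 instead of 32, which is the kind of oversized memcpy length the commit describes.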
2024-07-04scsi: lpfc: Fix incorrect request len mbox field when setting trunking via sysfsJustin Tee
When setting trunk modes through sysfs, the SLI_CONFIG mailbox command's command payload length is incorrectly hardcoded to 12 bytes. SLI_CONFIG's payload length field should be specified large enough to encompass both the submailbox command header and the submailbox request itself. Thus, replace the hardcoded 12 bytes with a clearer calculation by way of sizeof(struct lpfc_mbx_set_trunk_mode) - sizeof(struct lpfc_sli4_cfg_mhdr). Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
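The length calculation above can be illustrated with made-up structures. The layouts below are assumptions for this sketch only (the real lpfc_mbx_set_trunk_mode and lpfc_sli4_cfg_mhdr differ); the point is that deriving the payload length from sizeof keeps it in step with the structure definitions, unlike a hardcoded constant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outer SLI_CONFIG header; field names are illustrative. */
struct cfg_mhdr {
    uint32_t word1;
    uint32_t payload_length;
    uint32_t word3[3];
};

/* Hypothetical submailbox command header. */
struct sub_hdr {
    uint32_t opcode_word;
    uint32_t timeout;
    uint32_t request_length;
    uint32_t word4;
};

/* Hypothetical full mailbox: outer header, sub header, then the request. */
struct set_trunk_mode {
    struct cfg_mhdr mhdr;
    struct sub_hdr  hdr;
    uint32_t        trunk_mode_word;
    uint32_t        rsvd[2];
};

/* Payload length covers everything after the outer header. */
static size_t trunk_payload_len(void)
{
    return sizeof(struct set_trunk_mode) - sizeof(struct cfg_mhdr);
}
```

With these assumed layouts the result covers the submailbox header plus the request words, so adding a field to the request changes the length automatically.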
2024-07-04scsi: lpfc: Handle mailbox timeouts in lpfc_get_sfp_infoJustin Tee
The MBX_TIMEOUT return code is not handled in lpfc_get_sfp_info and the routine unconditionally frees submitted mailbox commands regardless of return status. The issue is that for MBX_TIMEOUT cases, when firmware returns SFP information at a later time, that same mailbox memory region references previously freed memory in its cmpl routine. Fix by adding checks for the MBX_TIMEOUT return code. During mailbox resource cleanup, check the mbox flag to make sure that the wait did not timeout. If the MBOX_WAKE flag is not set, then do not free the resources because it will be freed when firmware completes the mailbox at a later time in its cmpl routine. Also, increase the timeout from 30 to 60 seconds to accommodate boot scripts requiring longer timeouts. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
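The ownership rule described above can be sketched as follows. This is a simplified user-space model, not the lpfc implementation: struct mbox, its wake field, mbox_cleanup() and mbox_late_cmpl() are hypothetical names standing in for the mailbox, the MBOX_WAKE flag check and the cmpl routine.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mailbox; 'wake' models the "completion woke the waiter" flag. */
struct mbox {
    bool wake;
    int *payload;
};

/*
 * Waiter-side cleanup. Returns true if the caller released the mailbox
 * memory, false if the wait timed out and ownership stays with the
 * late completion path.
 */
static bool mbox_cleanup(struct mbox *m)
{
    if (!m->wake)
        return false;   /* timed out: firmware may still complete this later */
    free(m->payload);
    m->payload = NULL;
    return true;
}

/* Late completion path: runs when firmware finally answers, then frees. */
static void mbox_late_cmpl(struct mbox *m)
{
    free(m->payload);
    m->payload = NULL;
}
```

Exactly one of the two paths frees the memory, which is what avoids the use-after-free in the timeout case.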
2024-07-04scsi: lpfc: Fix handling of fully recovered fabric node in dev_loss callbkJustin Tee
In rare cases when a fabric node is recovered after a link bounce and before dev_loss_tmo callbk is reached, the driver may leave the fabric node in an inconsistent state with the NLP_IN_DEV_LOSS flag perpetually set. In lpfc_dev_loss_tmo_callbk, a check is added for a recovered fabric node. If the node is recovered, then don't queue the lpfc_dev_loss_tmo_handler work. In lpfc_dev_loss_tmo_handler, the path taken for the recovered fabric nodes is updated to clear the NLP_IN_DEV_LOSS flag. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Relax PRLI issue conditions after GID_FT responseJustin Tee
If previously in REG_LOGIN_ISSUE state, then remove the requirement that PLOGI must have been received from the remote port before issuing a PRLI. After GID_FT completes, it does not matter whether the driver itself sent a PLOGI or received one. The fact that we're in REG_LOGIN_ISSUE state simply means that the next state should be issuing the PRLI to continue discovery of the remote port. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Allow DEVICE_RECOVERY mode after RSCN receipt if in PRLI_ISSUE stateJustin Tee
Certain vendor specific targets initially register with the fabric as an initiator function first and then re-register as a target function afterwards. The timing of the target function re-registration can cause a race condition such that the driver is stuck assuming the remote port as an initiator function and never discovers the target's hosted LUNs. Expand the nlp_state qualifier to also include NLP_STE_PRLI_ISSUE because the state means that PRLI was issued but we have not quite reached MAPPED_NODE state yet. If we received an RSCN in the PRLI_ISSUE state, then we should restart discovery again by going into DEVICE_RECOVERY. Fixes: dded1dc31aa4 ("scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN") Cc: <stable@vger.kernel.org> # v6.6+ Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Cancel ELS WQE instead of issuing abort when SLI port is inactiveJustin Tee
During SLI port errata events, there should be no expectation that submitted outstanding WQEs will return back CQEs. In these situations, the driver should not rely on receiving CQEs from the SLI port to signal WQE resource clean up. Put an sli_flag LPFC_SLI_ACTIVE check in lpfc_els_flush_cmd() when walking the txcmplq. The sli_flag check helps determine whether to issue an abort or driver based cancel on outstanding WQEs. If !LPFC_SLI_ACTIVE, then there's no point to issue anything to the SLI port. Instead, let the driver based cancel logic clean up the submitted WQE resources. Also, enhance some abort log messages that help with future debugging. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-2-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: sd: Do not repeat the starting disk messageDamien Le Moal
The SCSI disk message "Starting disk" to signal resuming of a suspended disk is printed in both sd_resume() and sd_resume_common() which results in this message being printed twice when resuming from e.g. autosuspend: $ echo 5000 > /sys/block/sda/device/power/autosuspend_delay_ms $ echo auto > /sys/block/sda/device/power/control [ 4962.438293] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 4962.501121] sd 0:0:0:0: [sda] Stopping disk $ echo on > /sys/block/sda/device/power/control [ 4972.805851] sd 0:0:0:0: [sda] Starting disk [ 4980.558806] sd 0:0:0:0: [sda] Starting disk Fix this double print by removing the call to sd_printk() from sd_resume() and moving the call to sd_printk() in sd_resume_common() earlier in the function, before the check using sd_do_start_stop(). Doing so, the message is printed once regardless if sd_resume_common() actually executes sd_start_stop_device() (i.e. SCSI device case) or not (libsas and libata managed ATA devices case). Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240701215326.128067-1-dlemoal@kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: ufs: core: Fix ufshcd_abort_one racing issuePeter Wang
When ufshcd_abort_one is racing with the completion ISR, the completed tag of the request's mq_hctx pointer will be set to NULL by ISR. Return success when request is completed by ISR because ufshcd_abort_one does not need to do anything. The racing flow is: Thread A ufshcd_err_handler step 1 ... ufshcd_abort_one ufshcd_try_to_abort_task ufshcd_cmd_inflight(true) step 3 ufshcd_mcq_req_to_hwq blk_mq_unique_tag rq->mq_hctx->queue_num step 5 Thread B ufs_mtk_mcq_intr(cq complete ISR) step 2 scsi_done ... __blk_mq_free_request rq->mq_hctx = NULL; step 4 Below is KE back trace. ufshcd_try_to_abort_task: cmd at tag 41 not pending in the device. ufshcd_try_to_abort_task: cmd at tag=41 is cleared. Aborting tag 41 / CDB 0x28 succeeded Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194 pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14 lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 [ufs_mediatek_mod_ise] do_mem_abort+0x58/0x118 el1_abort+0x3c/0x5c el1h_64_sync_handler+0x54/0x90 el1h_64_sync+0x68/0x6c blk_mq_unique_tag+0x8/0x14 ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise] process_one_work+0x208/0x4fc worker_thread+0x228/0x438 kthread+0x104/0x1d4 ret_from_fork+0x10/0x20 Fixes: 93e6c0e19d5b ("scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode") Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Wang <peter.wang@mediatek.com> Link: https://lore.kernel.org/r/20240628070030.30929-3-peter.wang@mediatek.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
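The NULL check implied by the fix above can be sketched in plain C. The structures and function names here are hypothetical stand-ins for the request, its mq_hctx pointer and the abort path; the sketch is single-threaded and leaves out the memory-ordering care a real kernel fix needs.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the block layer structures. */
struct hw_ctx {
    unsigned int queue_num;
};

struct request {
    struct hw_ctx *mq_hctx;   /* set to NULL once the request is freed */
};

/* Returns the hardware queue number, or -1 if the request already completed. */
static int req_to_hwq(const struct request *rq)
{
    const struct hw_ctx *hctx = rq->mq_hctx;   /* read the pointer once */

    if (!hctx)
        return -1;
    return (int)hctx->queue_num;
}

/*
 * Returns 0 when the request was already completed (nothing to abort,
 * treated as success) and 1 when the abort would proceed.
 */
static int abort_one(const struct request *rq)
{
    if (req_to_hwq(rq) < 0)
        return 0;
    return 1;
}
```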
2024-07-04scsi: ufs: core: Fix ufshcd_clear_cmd racing issuePeter Wang
When ufshcd_clear_cmd is racing with the completion ISR, the completed tag of the request's mq_hctx pointer will be set to NULL by the ISR. And ufshcd_clear_cmd's call to ufshcd_mcq_req_to_hwq will get NULL pointer KE. Return success when the request is completed by ISR because sq does not need cleanup. The racing flow is: Thread A ufshcd_err_handler step 1 ufshcd_try_to_abort_task ufshcd_cmd_inflight(true) step 3 ufshcd_clear_cmd ... ufshcd_mcq_req_to_hwq blk_mq_unique_tag rq->mq_hctx->queue_num step 5 Thread B ufs_mtk_mcq_intr(cq complete ISR) step 2 scsi_done ... __blk_mq_free_request rq->mq_hctx = NULL; step 4 Below is KE back trace: ufshcd_try_to_abort_task: cmd pending in the device. tag = 6 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194 pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14 lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc [ufs_mediatek_mod_ise] Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise] Call trace: dump_backtrace+0xf8/0x148 show_stack+0x18/0x24 dump_stack_lvl+0x60/0x7c dump_stack+0x18/0x3c mrdump_common_die+0x24c/0x398 [mrdump] ipanic_die+0x20/0x34 [mrdump] notify_die+0x80/0xd8 die+0x94/0x2b8 __do_kernel_fault+0x264/0x298 do_page_fault+0xa4/0x4b8 do_translation_fault+0x38/0x54 do_mem_abort+0x58/0x118 el1_abort+0x3c/0x5c el1h_64_sync_handler+0x54/0x90 el1h_64_sync+0x68/0x6c blk_mq_unique_tag+0x8/0x14 ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise] ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise] ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise] process_one_work+0x208/0x4fc worker_thread+0x228/0x438 kthread+0x104/0x1d4 ret_from_fork+0x10/0x20 Fixes: 8d7290348992 ("scsi: ufs: mcq: Add supporting functions for MCQ abort") Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Wang <peter.wang@mediatek.com> Link: https://lore.kernel.org/r/20240628070030.30929-2-peter.wang@mediatek.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> 
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: pm8001: Update log level when reading config tableTerrence Adams
Reading the main config table occurs as a part of initialization in pm80xx_chip_init(). Because of this it makes more sense to have it be a part of the INIT logging. Signed-off-by: Terrence Adams <tadamsjr@google.com> Link: https://lore.kernel.org/r/20240627155924.2361370-3-tadamsjr@google.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: pm80xx: Set phy->enable_completion only when we wait for itIgor Pylypiv
pm8001_phy_control() populates the enable_completion pointer with a stack address, sends a PHY_LINK_RESET / PHY_HARD_RESET, waits 300 ms, and returns. The problem arises when a phy control response comes late. After 300 ms the pm8001_phy_control() function returns and the passed enable_completion stack address is no longer valid. Late phy control response invokes complete() on a dangling enable_completion pointer which leads to a kernel crash. Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Terrence Adams <tadamsjr@google.com> Link: https://lore.kernel.org/r/20240627155924.2361370-2-tadamsjr@google.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
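One way to avoid the dangling pointer described above is to make sure the stack address never outlives the function that owns it. The sketch below is a single-threaded model under stated assumptions: struct completion, struct phy, phy_response() and phy_control() are simplified, hypothetical versions, and the actual pm80xx change may differ in detail. Real code would also need locking between the two paths.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical versions of the kernel objects. */
struct completion {
    bool done;
};

struct phy {
    struct completion *enable_completion;   /* NULL when nobody is waiting */
};

/* Response handler; may run after the waiter has already given up. */
static void phy_response(struct phy *p)
{
    if (p->enable_completion) {
        p->enable_completion->done = true;
        p->enable_completion = NULL;
    }
}

/* Returns true if the response arrived within the wait window. */
static bool phy_control(struct phy *p, bool response_in_time)
{
    struct completion c = { false };   /* lives on this stack frame only */

    p->enable_completion = &c;
    if (response_in_time)
        phy_response(p);               /* models the response arriving in time */

    /* Wait window ends: never leave a stack address behind. */
    p->enable_completion = NULL;
    return c.done;
}
```

A late phy_response() then finds a NULL pointer and does nothing instead of calling into a stale stack frame.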
2024-07-04scsi: ufs: core: Remove SCSI host only if addedKyoungrul Kim
Removing the ufshcd driver from a UFS device causes a kernel panic if ufshcd_async_scan fails during ufshcd_probe_hba before the SCSI host has been added with scsi_add_host and MCQ is enabled. This happens because adding the SCSI host has been deferred until after MCQ configuration since commit 0cab4023ec7b ("scsi: ufs: core: Defer adding host to SCSI if MCQ is supported"). To guarantee that the SCSI host is removed only if it has been added, set the scsi_host_added flag to true after adding a SCSI host and check whether it is set before removing it. Signed-off-by: Kyoungrul Kim <k831.kim@samsung.com> Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Link: https://lore.kernel.org/r/20240627085104epcms2p5897a3870ea5c6416aa44f94df6c543d7@epcms2p5 Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
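The flag-guarded removal can be sketched as below. Only the scsi_host_added flag name comes from the commit message; struct hba, the hosts_registered counter, add_host() and remove_hba() are hypothetical and model the add and remove calls in user space.

```c
#include <stdbool.h>

/* Hypothetical host bus adapter state for this sketch. */
struct hba {
    bool scsi_host_added;    /* true only after the host was registered */
    int  hosts_registered;   /* models the effect of add/remove calls */
};

/* Models a successful host registration. */
static void add_host(struct hba *h)
{
    h->hosts_registered++;
    h->scsi_host_added = true;
}

/* Removal is a no-op unless the host was actually added. */
static void remove_hba(struct hba *h)
{
    if (!h->scsi_host_added)
        return;
    h->hosts_registered--;
    h->scsi_host_added = false;
}
```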
2024-07-04scsi: ufs: qcom: Enable suspending clk scaling on no requestRam Prakash Gupta
Enable suspending clk scaling on no request for Qualcomm SoC. Signed-off-by: Ram Prakash Gupta <quic_rampraka@quicinc.com> Link: https://lore.kernel.org/r/20240627083756.25340-3-quic_rampraka@quicinc.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: ufs: core: Suspend clk scaling on no requestRam Prakash Gupta
Currently UFS clk scaling is suspended only when the clks are scaled down. When high load is generated, a large amount of latency is added because the clk has to be scaled up before the request can complete. Suspending the scaling in its existing state when high load is generated improves the random performance KPI by 28%. So suspend the scaling when there are no requests; the clk is put in the low scaled state when the actual request load is low. Make this change optional by having the check enabled using vops since for some devices suspending without bringing the clk in low scaled state might have impact on power consumption of the SoC. Signed-off-by: Ram Prakash Gupta <quic_rampraka@quicinc.com> Link: https://lore.kernel.org/r/20240627083756.25340-2-quic_rampraka@quicinc.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: mpi3mr: Correct a test in mpi3mr_sas_port_add()Tomas Henzl
The test for a possible shift overflow is not correct. Fix it by replacing the '>' with a '>='. Signed-off-by: Tomas Henzl <thenzl@redhat.com> Link: https://lore.kernel.org/r/20240627074827.13672-1-thenzl@redhat.com Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
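The off-by-one in a shift bounds test can be shown with a small sketch. The 64-bit mask and the set_phy_bit() helper are assumptions for illustration, not the mpi3mr code: for an N-bit mask the valid shift counts are 0 to N-1, so the rejection test must use '>='.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Set bit 'phy' in a 64-bit mask. Returns false and leaves the mask
 * untouched if the shift would overflow.
 */
static bool set_phy_bit(uint64_t *mask, unsigned int phy)
{
    const unsigned int width = sizeof(*mask) * CHAR_BIT;   /* 64 bits */

    if (phy >= width)        /* '>' would wrongly let phy == 64 through */
        return false;
    *mask |= (uint64_t)1 << phy;
    return true;
}
```

Shifting a 64-bit value by 64 is undefined behavior in C, which is why the boundary value itself has to be rejected.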
2024-06-26Merge patch series "mpi3mr: Host diag buffer support"Martin K. Petersen
Ranjan Kumar <ranjan.kumar@broadcom.com> says: The controllers managed by the mpi3mr driver require system memory to save hardware and firmware diagnostic information. This patch set enhances the driver to provide host memory to the controller for diagnostic information. This patch set also provides driver changes to push kernel messages into the diagnostic buffers reserved for the driver, so that the information will be available as part of debug data fetched from the controller. In addition, support for configuring automatic diagnostic information is added in the driver. Link: https://lore.kernel.org/r/20240626102646.14298-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: mpi3mr: Update driver version to 8.9.1.0.50Ranjan Kumar
Update driver version to 8.9.1.0.50 Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240626102646.14298-5-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: mpi3mr: Add ioctl support for HDBRanjan Kumar
Add interface for applications to manage the host diagnostic buffers and update the automatic diag buffer capture triggers. Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240626102646.14298-4-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: mpi3mr: Trigger supportRanjan Kumar
Add functions to process automatic diag triggers. If a condition defined in the triggers is met, the driver will call appropriate controller functions to save the diagnostic information. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405151955.BiAWI1SY-lkp@intel.com/ Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240626102646.14298-3-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: mpi3mr: HDB allocation and posting for hardware and firmware buffersRanjan Kumar
To be able to debug controller problems it is beneficial to allocate and configure system/host memory buffers which can be used to capture hardware and firmware diagnostic information. Add functions required to allocate and post firmware and hardware diagnostic buffers to the controller and to set up automatic diagnostic capture triggers. Captures will be triggered under the following circumstances: 1. Firmware is in FAULT state. 2. Admin commands time out. 3. Controller reset is caused by an I/O timeout. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405151758.7xrJz6rp-lkp@intel.com/ Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240626102646.14298-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: ufs: ufs-pci: Add support for Intel Panther LakeAdrian Hunter
Add PCI ID to support Intel Panther Lake, same as MTL. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240618073158.38504-1-adrian.hunter@intel.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: ufs: qcom: Add missing MODULE_DESCRIPTION() macroJeff Johnson
With ARCH=arm64, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/ufs/host/ufs-qcom.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240625-md-drivers-ufs-host-v2-1-59a56974b05a@quicinc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: lpfc: Fix a possible null pointer dereferenceHuai-Yuan Liu
In function lpfc_xcvr_data_show, the memory allocation with kmalloc might fail, thereby making rdp_context a null pointer. In the following context and functions that use this pointer, there are dereferencing operations, leading to null pointer dereference. To fix this issue, a null pointer check should be added. If it is null, use scnprintf to notify the user and return len. Fixes: 479b0917e447 ("scsi: lpfc: Create a sysfs entry called lpfc_xcvr_data for transceiver info") Signed-off-by: Huai-Yuan Liu <qq810974084@gmail.com> Link: https://lore.kernel.org/r/20240621082545.449170-1-qq810974084@gmail.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failedXingui Yang
The expander phy will be treated as broadcast flutter in the next revalidation after the exp-attached end device probe failed, as follows: [78779.654026] sas: broadcast received: 0 [78779.654037] sas: REVALIDATING DOMAIN on port 0, pid:10 [78779.654680] sas: ex 500e004aaaaaaa1f phy05 change count has changed [78779.662977] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE) [78779.662986] sas: ex 500e004aaaaaaa1f phy05 new device attached [78779.663079] sas: ex 500e004aaaaaaa1f phy05:U:8 attached: 500e004aaaaaaa05 (stp) [78779.693542] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] found [78779.701155] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0 [78779.707864] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 ... [78835.161307] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1 [78835.171344] sas: sas_probe_sata: for exp-attached device 500e004aaaaaaa05 returned -19 [78835.180879] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] is gone [78835.187487] sas: broadcast received: 0 [78835.187504] sas: REVALIDATING DOMAIN on port 0, pid:10 [78835.188263] sas: ex 500e004aaaaaaa1f phy05 change count has changed [78835.195870] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE) [78835.195875] sas: ex 500e004aaaaaaa1f rediscovering phy05 [78835.196022] sas: ex 500e004aaaaaaa1f phy05:U:A attached: 500e004aaaaaaa05 (stp) [78835.196026] sas: ex 500e004aaaaaaa1f phy05 broadcast flutter [78835.197615] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0 The cause of the problem is that the related ex_phy's attached_sas_addr was not cleared after the end device probe failed, so reset it. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Link: https://lore.kernel.org/r/20240619091742.25465-1-yangxingui@huawei.com Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-25scsi: scsi_debug: Fix create target debugfs failureMing Lei
Target debugfs entry is removed via async_schedule() which isn't drained when adding same name target, so failure of "Directory 'target11:0:0' with parent 'scsi_debug' already present!" can be triggered easily. Fix it by switching to domain async schedule, and draining it before adding new target debugfs entry. Cc: Wenchao Hao <haowenchao2@huawei.com> Fixes: f084fe52c640 ("scsi: scsi_debug: Add debugfs interface to fail target reset") Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Wenchao Hao <haowenchao22@gmail.com> Link: https://lore.kernel.org/r/20240619013803.3008857-1-ming.lei@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-13scsi: usb: uas: Do not query the IO Advice Hints Grouping mode page for USB/UAS devicesBart Van Assche
Recently it was reported that the following USB storage devices are unusable with Linux kernel 6.9: * Kingston DataTraveler G2 * Garmin FR35 This is because attempting to read the IO Advice Hints Grouping mode page causes these devices to reset. Hence do not read the IO Advice Hints Grouping mode page from USB/UAS storage devices. Acked-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable@vger.kernel.org Fixes: 4f53138fffc2 ("scsi: sd: Translate data lifetime information") Reported-by: Joao Machado <jocrismachado@gmail.com> Closes: https://lore.kernel.org/linux-scsi/20240130214911.1863909-1-bvanassche@acm.org/T/#mf4e3410d8f210454d7e4c3d1fb5c0f41e651b85f Tested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Bisected-by: Christian Heusel <christian@heusel.eu> Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Closes: https://lore.kernel.org/linux-scsi/CACLx9VdpUanftfPo2jVAqXdcWe8Y43MsDeZmMPooTzVaVJAh2w@mail.gmail.com/ Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240613211828.2077477-3-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-13scsi: core: Introduce the BLIST_SKIP_IO_HINTS flagBart Van Assche
Prepare for skipping the IO Advice Hints Grouping mode page for USB storage devices. Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Joao Machado <jocrismachado@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Christian Heusel <christian@heusel.eu> Cc: stable@vger.kernel.org Fixes: 4f53138fffc2 ("scsi: sd: Translate data lifetime information") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240613211828.2077477-2-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-13scsi: ufs: core: Free memory allocated for model before reinitJoel Slebodnick
Under the conditions that a device is to be reinitialized within ufshcd_probe_hba(), the device must first be fully reset. Resetting the device should include freeing U8 model (member of dev_info) but does not, and this causes a memory leak. ufs_put_device_desc() is responsible for freeing model. unreferenced object 0xffff3f63008bee60 (size 32): comm "kworker/u33:1", pid 60, jiffies 4294892642 hex dump (first 32 bytes): 54 48 47 4a 46 47 54 30 54 32 35 42 41 5a 5a 41 THGJFGT0T25BAZZA 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc ed7ff1a9): [<ffffb86705f1243c>] kmemleak_alloc+0x34/0x40 [<ffffb8670511cee4>] __kmalloc_noprof+0x1e4/0x2fc [<ffffb86705c247fc>] ufshcd_read_string_desc+0x94/0x190 [<ffffb86705c26854>] ufshcd_device_init+0x480/0xdf8 [<ffffb86705c27b68>] ufshcd_probe_hba+0x3c/0x404 [<ffffb86705c29264>] ufshcd_async_scan+0x40/0x370 [<ffffb86704f43e9c>] async_run_entry_fn+0x34/0xe0 [<ffffb86704f34638>] process_one_work+0x154/0x298 [<ffffb86704f34a74>] worker_thread+0x2f8/0x408 [<ffffb86704f3cfa4>] kthread+0x114/0x118 [<ffffb86704e955a0>] ret_from_fork+0x10/0x20 Fixes: 96a7141da332 ("scsi: ufs: core: Add support for reinitializing the UFS device") Cc: <stable@vger.kernel.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Joel Slebodnick <jslebodn@redhat.com> Link: https://lore.kernel.org/r/20240613200202.2524194-1-jslebodn@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-13scsi: core: Fix an incorrect commentBart Van Assche
The comment that scsi_static_device_list would go away was added more than 18 years ago. Today, that list is still there and a large number of additional entries have been added. This shows that this comment is incorrect. Hence fix that comment. Cc: Christoph Hellwig <hch@infradead.org> Cc: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240612171522.2677600-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-11scsi: mpi3mr: Fix ATA NCQ priority supportDamien Le Moal
The function mpi3mr_qcmd() of the mpi3mr driver is able to indicate to the HBA if a read or write command directed at an ATA device should be translated to an NCQ read/write command with the high priority bit set when the request uses the RT priority class and the user has enabled NCQ priority through sysfs. However, unlike the mpt3sas driver, the mpi3mr driver does not define the sas_ncq_prio_supported and sas_ncq_prio_enable sysfs attributes, so the ncq_prio_enable field of struct mpi3mr_sdev_priv_data is never actually set and NCQ Priority cannot ever be used. Fix this by defining these missing attributes to allow a user to check if an ATA device supports NCQ priority and to enable/disable the use of NCQ priority. To do this, lift the function scsih_ncq_prio_supp() out of the mpt3sas driver and make it the generic SCSI SAS transport function sas_ata_ncq_prio_supported(). Nothing in that function is hardware specific, so this function can be used in both the mpt3sas driver and the mpi3mr driver. Reported-by: Scott McCoy <scott.mccoy@wdc.com> Fixes: 023ab2a9b4ed ("scsi: mpi3mr: Add support for queue command processing") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240611083435.92961-1-dlemoal@kernel.org Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-11scsi: Add missing MODULE_DESCRIPTION() macrosJeff Johnson
On x86, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/scsi_common.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/advansys.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/BusLogic.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/aha1740.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/isci/isci.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/elx/efct.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/atp870u.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/ppa.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/scsi/imm.o Add all missing invocations of the MODULE_DESCRIPTION() macro. This updates all files which have a MODULE_LICENSE() but which do not have a MODULE_DESCRIPTION(), even ones which did not produce the x86 allmodconfig warnings. Acked-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240610-md-drivers-scsi-v3-1-055da78d66b2@quicinc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-11scsi: ufs: core: Quiesce request queues before checking pending cmdsZiqi Chen
In ufshcd_clock_scaling_prepare(), after the SCSI layer is blocked, ufshcd_pending_cmds() is called to check whether there are pending transactions. Only if there are no pending transactions can we proceed to kickstart the clock scaling sequence.

ufshcd_pending_cmds() traverses all SCSI devices and calls sbitmap_weight() on their budget_map. sbitmap_weight() can be broken down into three steps:

1. Calculate the number of outstanding bits set in the 'word' bitmap.
2. Calculate the number of outstanding bits set in the 'cleared' bitmap.
3. Subtract the result of step 2 from the result of step 1.

This can lead to a race condition as outlined below. Assume there is one pending transaction in the request queue of one SCSI device, say sda, and the budget token of this request is 0, so 'word' is 0x1 and 'cleared' is 0x0.

1. When step 1 executes, it gets the result 1.
2. Before step 2 executes, the block layer tries to dispatch a new request to sda. Since the SCSI layer is blocked, the request cannot pass through SCSI, but the block layer does budget_get() and budget_put() on sda's budget map regardless, so 'word' has become 0x3 and 'cleared' has become 0x2 (assume the new request got budget token 1).
3. When step 2 executes, it gets the result 1.
4. When step 3 executes, it gets the result 0, meaning there are no pending transactions, which is wrong.

    Thread A                      Thread B
    ufshcd_pending_cmds()         __blk_mq_sched_dispatch_requests()
    |                             |
    sbitmap_weight(word)          |
    |                             scsi_mq_get_budget()
    |                             |
    |                             scsi_mq_put_budget()
    |                             |
    sbitmap_weight(cleared)       ...

When this race condition happens, the clock scaling sequence is started with transactions still in flight, leading to subsequent hibernate enter failure, broken link, task abort and back-to-back error recovery.

Fix this race condition by quiescing the request queues before calling ufshcd_pending_cmds() so that the block layer won't touch the budget map while ufshcd_pending_cmds() is working on it.
In addition, remove the SCSI layer blocking/unblocking to reduce redundancies and latencies. Fixes: 8d077ede48c1 ("scsi: ufs: Optimize the command queueing code") Co-developed-by: Can Guo <quic_cang@quicinc.com> Signed-off-by: Can Guo <quic_cang@quicinc.com> Signed-off-by: Ziqi Chen <quic_ziqichen@quicinc.com> Link: https://lore.kernel.org/r/1717754818-39863-1-git-send-email-quic_ziqichen@quicinc.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
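The interleaving described in this commit message can be replayed deterministically in user space. What follows is a minimal sketch under stated assumptions, not the kernel's sbitmap code: `struct toy_sbitmap`, `toy_get_put_budget()`, `toy_weight()` and the `interleave` flag are hypothetical names that model a single bitmap word, and the pending count is assumed to be the population count of 'word' minus that of 'cleared', as the three steps above describe.

```c
#include <stdbool.h>

/* Toy model of one sbitmap word. This is NOT the kernel's struct sbitmap. */
struct toy_sbitmap {
	unsigned long word;    /* bits handed out as budget tokens */
	unsigned long cleared; /* bits released but not yet folded back into 'word' */
};

/* Count the set bits in v (clears the lowest set bit on each iteration). */
static unsigned int popcount_ul(unsigned long v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* "Thread B": the block layer gets budget token 1 and puts it straight back. */
static void toy_get_put_budget(struct toy_sbitmap *sb)
{
	sb->word |= 1UL << 1;    /* budget get: token 1 allocated */
	sb->cleared |= 1UL << 1; /* budget put: token 1 marked as cleared */
}

/*
 * "Thread A": the three-step weight computation. When 'interleave' is true,
 * thread B is run between step 1 and step 2 to reproduce the race.
 */
static unsigned int toy_weight(struct toy_sbitmap *sb, bool interleave)
{
	unsigned int set, cleared;

	set = popcount_ul(sb->word);        /* step 1 */
	if (interleave)
		toy_get_put_budget(sb);
	cleared = popcount_ul(sb->cleared); /* step 2 */
	return set - cleared;               /* step 3 */
}
```

Starting from word = 0x1 and cleared = 0x0, calling `toy_weight()` without interleaving reports one pending request, while the interleaved call reports zero even though the original request is still in flight. Quiescing the queues corresponds to guaranteeing that `toy_get_put_budget()` cannot run in the middle of `toy_weight()`.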
2024-06-11scsi: core: Disable CDL by defaultDamien Le Moal
For SCSI devices supporting the Command Duration Limits feature set, the user can enable/disable the use of this feature through the sysfs device attribute "cdl_enable". Modifying this attribute triggers a call to scsi_cdl_enable() to enable or disable the feature for ATA devices and set the scsi device cdl_enable field to the user-provided bool value. For SCSI devices supporting CDL, the feature set is always enabled and scsi_cdl_enable() is reduced to setting the cdl_enable field. However, for ATA devices, a drive may spin up with the CDL feature enabled by default. But the SCSI device cdl_enable field is always initialized to false (CDL disabled), regardless of the actual device CDL feature state. For ATA devices managed by libata (or libsas), libata-core always disables the CDL feature set when the device is attached, thus syncing the state of the CDL feature on the device and of the SCSI device cdl_enable field. However, for ATA devices connected to a SAS HBA, the CDL feature is not disabled on scan for ATA devices that have this feature enabled by default, leading to an inconsistent state of the feature on the device with the SCSI device cdl_enable field. Avoid this inconsistency by adding a call to scsi_cdl_enable() in scsi_cdl_check() to make sure that the device-side state of the CDL feature set always matches the scsi device cdl_enable field state. This implies that CDL will always be disabled for ATA devices connected to SAS HBAs, which is consistent with libata/libsas initialization of the device.
Reported-by: Scott McCoy <scott.mccoy@wdc.com> Fixes: 1b22cfb14142 ("scsi: core: Allow enabling and disabling command duration limits") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240607012507.111488-1-dlemoal@kernel.org Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-05scsi: mpt3sas: Avoid test/set_bit() operating in non-allocated memoryBreno Leitao
There is a potential out-of-bounds access when using test_bit() on a single word. The test_bit() and set_bit() functions operate on long values, and when testing or setting a single word, they can exceed the word boundary. KASAN detects this issue and produces a dump: BUG: KASAN: slab-out-of-bounds in _scsih_add_device.constprop.0 (./arch/x86/include/asm/bitops.h:60 ./include/asm-generic/bitops/instrumented-atomic.h:29 drivers/scsi/mpt3sas/mpt3sas_scsih.c:7331) mpt3sas Write of size 8 at addr ffff8881d26e3c60 by task kworker/u1536:2/2965 For full log, please look at [1]. Make the allocation at least the size of sizeof(unsigned long) so that set_bit() and test_bit() have sufficient room for read/write operations without overwriting unallocated memory. [1] Link: https://lore.kernel.org/all/ZkNcALr3W3KGYYJG@gmail.com/ Fixes: c696f7b83ede ("scsi: mpt3sas: Implement device_remove_in_progress check in IOCTL path") Cc: stable@vger.kernel.org Suggested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240605085530.499432-1-leitao@debian.org Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
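The sizing problem can be illustrated with a small user-space sketch. It is not the mpt3sas code: the helper names `bitmap_bytes_byte_granular()` and `bitmap_bytes_long_granular()` are hypothetical, and the sketch only assumes what the message above states, namely that the bit operations access memory in units of unsigned long, so a bitmap buffer should be sized in whole unsigned longs rather than whole bytes.

```c
#include <limits.h>
#include <stddef.h>

/* Buffer size when rounding up only to whole bytes (the undersized pattern). */
static size_t bitmap_bytes_byte_granular(size_t nbits)
{
	return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}

/*
 * Buffer size when rounding up to whole unsigned longs, so that a long-sized
 * access to any bit below nbits stays inside the allocation.
 * Assumes nbits > 0; for nbits == 0 this returns 0.
 */
static size_t bitmap_bytes_long_granular(size_t nbits)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t nlongs = (nbits + bits_per_long - 1) / bits_per_long;

	return nlongs * sizeof(unsigned long);
}
```

For example, a 9-bit bitmap needs 2 bytes with byte-granular sizing but a full `sizeof(unsigned long)` bytes with long-granular sizing; the difference is the region that a long-wide write could touch outside the allocation.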
2024-06-05scsi: sd: Use READ(16) when reading block zero on large capacity disksMartin K. Petersen
Commit 321da3dc1f3c ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") triggered a read to LBA 0 before attempting to inquire about device characteristics. This was done because some protocol bridge devices will return generic values until an attached storage device's media has been accessed. Pierre Tomon reported that this change caused problems on a large capacity external drive connected via a bridge device. The bridge in question does not appear to implement the READ(10) command. Issue a READ(16) instead of READ(10) when a device has been identified as preferring 16-byte commands (use_16_for_rw heuristic). Link: https://bugzilla.kernel.org/show_bug.cgi?id=218890 Link: https://lore.kernel.org/r/70dd7ae0-b6b1-48e1-bb59-53b7c7f18274@rowland.harvard.edu Link: https://lore.kernel.org/r/20240605022521.3960956-1-martin.petersen@oracle.com Fixes: 321da3dc1f3c ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") Cc: stable@vger.kernel.org Reported-by: Pierre Tomon <pierretom+12@ik.me> Suggested-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Pierre Tomon <pierretom+12@ik.me> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
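To make the difference between the two commands concrete, here is a minimal user-space sketch of how a READ(10) and a READ(16) command descriptor block differ. It is not the sd driver code: `build_read_cdb()` and the `OP_READ_10`/`OP_READ_16` macros are hypothetical names, and the `use_16_for_rw` parameter merely mirrors the heuristic named in the message above. The byte layout follows the standard SCSI CDB formats (big-endian LBA and transfer length).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OP_READ_10 0x28 /* READ(10) operation code */
#define OP_READ_16 0x88 /* READ(16) operation code */

/*
 * Fill 'cdb' with a READ(10) or READ(16) command and return the CDB length.
 * Returns 0 if the request cannot be encoded as READ(10), which only has a
 * 32-bit LBA field and a 16-bit transfer length field.
 */
static size_t build_read_cdb(uint8_t cdb[16], bool use_16_for_rw,
			     uint64_t lba, uint32_t nblocks)
{
	int i;

	memset(cdb, 0, 16);

	if (use_16_for_rw) {
		cdb[0] = OP_READ_16;
		for (i = 0; i < 8; i++)  /* bytes 2..9: 64-bit LBA */
			cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
		for (i = 0; i < 4; i++)  /* bytes 10..13: 32-bit length */
			cdb[10 + i] = (uint8_t)(nblocks >> (24 - 8 * i));
		return 16;
	}

	if (lba > UINT32_MAX || nblocks > UINT16_MAX)
		return 0;

	cdb[0] = OP_READ_10;
	for (i = 0; i < 4; i++)          /* bytes 2..5: 32-bit LBA */
		cdb[2 + i] = (uint8_t)(lba >> (24 - 8 * i));
	cdb[7] = (uint8_t)(nblocks >> 8); /* bytes 7..8: 16-bit length */
	cdb[8] = (uint8_t)nblocks;
	return 10;
}
```

Reading one block at LBA 0 yields a 10-byte CDB starting with 0x28 in the READ(10) case and a 16-byte CDB starting with 0x88 in the READ(16) case; a bridge that implements only one of the two opcodes will reject the other.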
2024-06-04Merge patch series "Declare local functions static"Martin K. Petersen
Bart Van Assche <bvanassche@acm.org> says: Hi Martin, There are several 32-bit ARM SCSI drivers that trigger compiler warnings about missing function declarations. This patch series fixes these compiler warnings by declaring local functions static. Please consider this patch series for the next merge window. Thanks, Bart. Link: https://lore.kernel.org/r/20240603172311.1587589-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-04scsi: powertec: Declare local function staticBart Van Assche
Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240603172311.1587589-5-bvanassche@acm.org Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-04scsi: eesox: Declare local function staticBart Van Assche
Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240603172311.1587589-4-bvanassche@acm.org Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-04scsi: cumana: Declare local function staticBart Van Assche
Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240603172311.1587589-3-bvanassche@acm.org Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-04scsi: acornscsi: Declare local functions staticBart Van Assche
Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240603172311.1587589-2-bvanassche@acm.org Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-04Merge patch series "ufs: pci: Add support UFSHCI 4.0 MCQ"Martin K. Petersen
Minwoo Im <minwoo.im@samsung.com> says: This patchset adds support for MCQ, which was introduced in UFSHCI 4.0. The first patch adds a simple helper to get the address of the MCQ queue config registers. The second one enables the MCQ feature by adding the mandatory vops callback functions required during the MCQ initialization phase. The last one prevents the case where the number of MCQ queues is given as 1, since the driver allocates poll_queues first, rather than I/O queues, to handle device commands. Instead of triggering exception handlers because there is no I/O queue, fail fast at initialization time. Link: https://lore.kernel.org/r/20240531212244.1593535-1-minwoo.im@samsung.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-04scsi: ufs: mcq: Prevent no I/O queue case for MCQMinwoo Im
If hba_maxq equals poll_queues, which means there are no I/O queues (HCTX_TYPE_DEFAULT, HCTX_TYPE_READ), the very first hw queue will be allocated as HCTX_TYPE_POLL and it will be used as the dev_cmd_queue. In this case, device commands such as QUERY cannot be properly handled. This patch prevents the initialization of MCQ when the number of I/O queues is not set and only the number of POLL queues is set. Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Link: https://lore.kernel.org/r/20240531212244.1593535-3-minwoo.im@samsung.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
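The condition described above amounts to a simple comparison. The following sketch is an illustration only, not the ufs driver's code: `mcq_io_queue_count()` and `mcq_layout_is_usable()` are hypothetical helpers, and the sketch assumes that every queue not reserved for polling is available as an I/O queue.

```c
#include <stdbool.h>

/* Number of queues left for regular I/O after the poll queues are reserved. */
static unsigned int mcq_io_queue_count(unsigned int hba_maxq,
				       unsigned int poll_queues)
{
	return poll_queues < hba_maxq ? hba_maxq - poll_queues : 0;
}

/* MCQ initialization should be refused when no I/O queue would remain. */
static bool mcq_layout_is_usable(unsigned int hba_maxq,
				 unsigned int poll_queues)
{
	return mcq_io_queue_count(hba_maxq, poll_queues) > 0;
}
```

With a single hardware queue and one poll queue, `mcq_layout_is_usable(1, 1)` is false, which matches the case the patch rejects at initialization time.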