summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-09scsi: hisi_sas: Put a limit of link reset retriesLuo Jiaxing
If an OOB event is received but the phy still fails to come up, a link reset will be issued repeatedly at an interval of 20s until the phy comes up. Set a limit for link reset issue retries to avoid printing the timeout message endlessly. Link: https://lore.kernel.org/r/1623058179-80434-2-git-send-email-john.garry@huawei.com Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: qedi: Fix host removal with running sessionsMike Christie
qedi_clear_session_ctx() could race with the in-kernel or userspace driven recovery/removal and we could access a NULL conn or do a double free. We should be using iscsi_host_remove() to start the removal process from the driver. It will start the in-kernel recovery and notify userspace that the driver's scsi_hosts are being removed. iscsid will then drive the session removal like is done when the logout command is run. When the sessions are removed, iscsi_host_remove() will return so qedi can finish knowing there are no running sessions and no new sessions will be allowed. This also fixes an issue where we check for a NULL conn after already accessing it introduced in commit 27e986289e73 ("scsi: iscsi: Drop suspend calls from ep_disconnect") by just removing the function completely. Link: https://lore.kernel.org/r/20210609192709.5094-1-michael.christie@oracle.com Fixes: 27e986289e73 ("scsi: iscsi: Drop suspend calls from ep_disconnect") Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: mpi3mr: Fix error handling in mpi3mr_setup_isr()Dan Carpenter
The pci_alloc_irq_vectors_affinity() function returns negative error codes or it returns a number between the minimum vectors (1 in this case) and max_vectors. It won't return zero. Because "i" is a u16 then the error handling won't work. And also if it did work the error code was not set. Really "max_vectors" can be an int as well because we're doing a min_t() on int type. The other change is that it's better to remove unnecessary initialization so that static checkers can warn us if there are ever uninitialized variable bugs introduced in the future. I changed the error code from -1 (-EPERM) if the kmalloc() failed to -ENOMEM. And on success path I changed it from "return retval;" to "return 0;" which shouldn't affect the compiled code but makes it more readable. Link: https://lore.kernel.org/r/YMCJcnmSI4kOIyv/@mwanda Fixes: 824a156633df ("scsi: mpi3mr: Base driver code") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: mpi3mr: Delete unnecessary NULL checkDan Carpenter
The "mrioc->intr_info" pointer can't be NULL, but if it could then the second iteration through the loop would Oops. Let's delete the confusing and impossible NULL check. Link: https://lore.kernel.org/r/YMCJKgykDYtyvY44@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: mpi3mr: Fix a double freeTomas Henzl
Fix a double free, scsi_tgt_priv_data will be freed in mpi3mr_target_destroy() so remove the kfree() from mpi3mr_target_alloc(). I've also removed few unneeded initialisations. Link: https://lore.kernel.org/r/20210608145712.16386-1-thenzl@redhat.com Acked-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-09scsi: ufs: core: Fix a possible use before initialization caseCan Guo
In ufshcd_exec_dev_cmd(), if error happens before lrpb is initialized, then we should bail out instead of letting trace record the error. Link: https://lore.kernel.org/r/1623227044-22635-1-git-send-email-cang@codeaurora.org Fixes: a45f937110fa ("scsi: ufs: Optimize host lock on transfer requests send/compl paths") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: ufs: core: Use UPIU query trace in devman_upiu_cmd()Bean Huo
Since devman_upiu_cmd() is not COMMAND UPIU, and doesn't have CDB, it is better to use UPIU query trace, which provides more helpful information for issue troubleshooting. Link: https://lore.kernel.org/r/20210531104308.391842-5-huobean@gmail.com Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: ufs: core: Capture command trace only for the cmd != NULL caseBean Huo
For the query request, we already have query_trace, but in ufshcd_send_command(), there will add two more redundant traces. Since lrbp->cmd is NULL in the query request, the two trace events below provide nothing except the tag and DB. Instead of letting them take up the limited trace ring buffer, it’s better not to print these traces in case of cmd == NULL. ufshcd_command: send_req: ff3b0000.ufs: tag: 28, DB: 0x0, size: -1, IS: 0, LBA: 18446744073709551615, opcode: 0x0 (0x0), group_id: 0x0 ufshcd_command: dev_complete: ff3b0000.ufs: tag: 28, DB: 0x0, size: -1, IS: 0, LBA: 18446744073709551615, opcode: 0x0 (0x0), group_id: 0x0 Link: https://lore.kernel.org/r/20210531104308.391842-4-huobean@gmail.com Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: ufs: core: Let UPIU completion trace print RSP UPIU headerBean Huo
The current UPIU completion event trace still prints the COMMAND UPIU header, rather than the RSP UPIU header. This makes UPIU command trace useless in problem shooting in case we receive a trace log from the customer/field. There are two important fields in RSP UPIU: 1. The response field, which indicates the UFS defined overall success or failure of the series of Command, Data and RESPONSE UPIU’s that make up the execution of a task. 2. The Status field, which contains the command set specific status for a specific command issued by the initiator device. Before this commit, the UPIU paired trace events: ufshcd_upiu: send_req: fe3b0000.ufs: HDR:01 20 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00 ufshcd_upiu: complete_rsp: fe3b0000.ufs: HDR:01 20 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00 After this commit: ufshcd_upiu: send_req: fe3b0000.ufs: HDR:01 20 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00 ufshcd_upiu: complete_rsp: fe3b0000.ufs: HDR:21 00 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00 Link: https://lore.kernel.org/r/20210531104308.391842-3-huobean@gmail.com Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: ufs: core: Clean up ufshcd_add_command_trace()Bean Huo
To consistent with trace event print, convert the value of the variable 'lba' from a block layer sector address to a logical block adress. Link: https://lore.kernel.org/r/20210531104308.391842-2-huobean@gmail.com Suggested-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: ufs: core: Remove repeated word in commentKeoseong Park
Remove repeated word "for" in comment. Link: https://lore.kernel.org/r/1891546521.01622777101796.JavaMail.epsvc@epcpadp3 Signed-off-by: Keoseong Park <keosung.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: mpi3mr: Fix fall-through warning for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a fall-through warning by explicitly adding a break statement instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Link: https://lore.kernel.org/r/20210604023530.GA180997@embeddedor Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: NCR5380: Fix fall-through warning for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a fall-through warning by replacing a /* fallthrough */ comment with the new pseudo-keyword macro fallthrough; Link: https://github.com/KSPP/linux/issues/115 Link: https://lore.kernel.org/r/20210604022752.GA168289@embeddedor Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: ufs: Utilize Transfer Request List Completion Notification RegisterCan Guo
By reading the UTP Transfer Request List Completion Notification Register, which is added in UFSHCI Ver 3.0, SW can easily get the compeleted transfer requests. Thus, SW can get rid of host lock, which is used to synchronize the tr_doorbell and outstanding_reqs, on transfer requests dispatch and completion paths. This can further benefit random read/write performance. Link: https://lore.kernel.org/r/1621845419-14194-4-git-send-email-cang@codeaurora.org Cc: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: ufs: Optimize host lock on transfer requests send/compl pathsCan Guo
Current UFS IRQ handler is completely wrapped by host lock, and because ufshcd_send_command() is also protected by host lock, when IRQ handler fires, not only the CPU running the IRQ handler cannot send new requests, the rest CPUs can neither. Move the host lock wrapping the IRQ handler into specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(), ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is no longer required to call __ufshcd_transfer_req_compl(). As per test, the optimization can bring considerable gain to random read/write performance. Link: https://lore.kernel.org/r/1621845419-14194-3-git-send-email-cang@codeaurora.org Cc: Stanley Chu <stanley.chu@mediatek.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Co-developed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: ufs: Remove a redundant command completion logic in error handlerCan Guo
ufshcd_host_reset_and_restore() anyways completes all pending requests before starts re-probing, so there is no need to complete the command on the highest bit in tr_doorbell in advance. Link: https://lore.kernel.org/r/1621845419-14194-2-git-send-email-cang@codeaurora.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: scsi_dh_alua: Fix signedness bug in alua_rtpg()Dan Carpenter
The "retval" variable needs to be signed for the error handling to work. Link: https://lore.kernel.org/r/YLjMEAFNxOas1mIp@mwanda Fixes: 7e26e3ea0287 ("scsi: scsi_dh_alua: Check for negative result value") Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: ufs: core: Remove irrelevant reference to non-existing docAvri Altman
Remove all references to the description of __ufshcd_wl_{suspend,resume} as no such description exist. Fixes: b294ff3e3449 (scsi: ufs: core: Enable power management for wlun) Link: https://lore.kernel.org/r/20210603122209.635799-1-avri.altman@wdc.com Signed-off-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: fcoe: Statically initialize flogi_maddrKees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy() avoid using an inline const buffer argument and instead just statically initialize the destination array directly. Link: https://lore.kernel.org/r/20210602180000.3326448-1-keescook@chromium.org Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-07scsi: qedf: Update the max_id value in host structureSaurav Kashyap
host->max_id defines the maximum target id that the SCSI midlayer will attempt to manually scan. The default is 8. Update the value to the max sessions the driver supports. [mkp: applied by hand] Link: https://lore.kernel.org/r/20210602104653.17278-1-jhasan@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: core: Change the type of the second argument of ↵Bart Van Assche
scsi_host_complete_all_commands() Allow the compiler to verify the type of the second argument passed to scsi_host_complete_all_commands(). Link: https://lore.kernel.org/r/20210524025457.11299-4-bvanassche@acm.org Cc: Hannes Reinecke <hare@suse.com> Cc: John Garry <john.garry@huawei.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: core: Introduce enums for the SAM and host status codesBart Van Assche
Make it possible for the compiler to verify whether SAM and host status codes are used correctly. [mkp: resolve conflicts with Hannes' SCSI result series] Link: https://lore.kernel.org/r/20210524025457.11299-3-bvanassche@acm.org Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: libsas: Introduce more SAM status code aliases in enum exec_statusBart Van Assche
This patch prepares for converting SAM status codes into an enum. Without this patch converting SAM status codes into an enumeration type would trigger complaints about enum type mismatches for the SAS code. Link: https://lore.kernel.org/r/20210524025457.11299-2-bvanassche@acm.org Cc: Hannes Reinecke <hare@suse.com> Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Cc: Jason Yan <yanaijie@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02Merge branch '5.14/scsi-result' into 5.14/scsi-stagingMartin K. Petersen
Include Hannes' SCSI command result rework in the staging branch. [mkp: remove DRIVER_SENSE from mpi3mr] Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Wake up if cmd_cleanup_req is setMike Christie
If we got a response then we should always wake up the conn. For both the cmd_cleanup_req == 0 or cmd_cleanup_req > 0, we shouldn't dig into iscsi_itt_to_task because we don't know what the upper layers are doing. We can also remove the qedi_clear_task_idx call here because once we signal success libiscsi will loop over the affected commands and end up calling the cleanup_task callout which will release it. Link: https://lore.kernel.org/r/20210525181821.7617-29-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Complete TMF works before disconnectMike Christie
We need to make sure that abort and reset completion work has completed before ep_disconnect returns. After ep_disconnect we can't manipulate cmds because libiscsi will call conn_stop and take onwership. We are trying to make sure abort work and reset completion work has completed before we do the cmd clean up in ep_disconnect. The problem is that: 1. the work function sets the QEDI_CONN_FW_CLEANUP bit, so if the work was still pending we would not see the bit set. We need to do this before the work is queued. 2. If we had multiple works queued then we could break from the loop in qedi_ep_disconnect early because when abort work 1 completes it could clear QEDI_CONN_FW_CLEANUP. qedi_ep_disconnect could then see that before work 2 has run. 3. A TMF reset completion work could run after ep_disconnect starts cleaning up cmds via qedi_clearsq. ep_disconnect's call to qedi_clearsq -> qedi_cleanup_all_io would might think it's done cleaning up cmds, but the reset completion work could still be running. We then return from ep_disconnect while still doing cleanup. This replaces the bit with a counter to track the number of queued TMF works, and adds a bool to prevent new works from starting from the completion path once a ep_disconnect starts. Link: https://lore.kernel.org/r/20210525181821.7617-28-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Pass send_iscsi_tmf task to abortMike Christie
qedi_abort_work knows what task to abort so just pass it to send_iscsi_tmf. Link: https://lore.kernel.org/r/20210525181821.7617-27-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Fix cleanup session block/unblock useMike Christie
Drivers shouldn't be calling block/unblock session for cmd cleanup because the functions can change the session state from under libiscsi. This adds a new a driver level bit so it can block all I/O the host while it drains the card. Link: https://lore.kernel.org/r/20210525181821.7617-26-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Fix TMF session block/unblock useMike Christie
Drivers shouldn't be calling block/unblock session for tmf handling because the functions can change the session state from under libiscsi. iscsi_queuecommand's call to iscsi_prep_scsi_cmd_pdu-> iscsi_check_tmf_restrictions will prevent new cmds from being sent to qedi after we've started handling a TMF. So we don't need to try and block it in the driver, and we can remove these block calls. Link: https://lore.kernel.org/r/20210525181821.7617-25-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Use GFP_NOIO for TMF allocationMike Christie
We run from a workqueue with no locks held so use GFP_NOIO. Link: https://lore.kernel.org/r/20210525181821.7617-24-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Fix TMF tid allocationMike Christie
qedi_iscsi_abort_work and qedi_tmf_work both allocate a tid then call qedi_send_iscsi_tmf which also allocates a tid. This removes the tid allocation from the callers. Link: https://lore.kernel.org/r/20210525181821.7617-23-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Fix use after free during abort cleanupMike Christie
If qedi_tmf_work's qedi_wait_for_cleanup_request call times out we will also force the clean up of the qedi_work_map but qedi_process_cmd_cleanup_resp could still be accessing the qedi_cmd. To fix this issue we extend where we hold the tmf_work_lock and back_lock so the qedi_process_cmd_cleanup_resp access is serialized with the cleanup done in qedi_tmf_work and any completion handling for the iscsi_task. Link: https://lore.kernel.org/r/20210525181821.7617-22-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Fix race during abort timeoutsMike Christie
If the SCSI cmd completes after qedi_tmf_work calls iscsi_itt_to_task then the qedi qedi_cmd->task_id could be freed and used for another cmd. If we then call qedi_iscsi_cleanup_task with that task_id we will be cleaning up the wrong cmd. Wait to release the task_id until the last put has been done on the iscsi_task. Because libiscsi grabs a ref to the task when sending the abort, we know that for the non-abort timeout case that the task_id we are referencing is for the cmd that was supposed to be aborted. A latter commit will fix the case where the abort times out while we are running qedi_tmf_work. Link: https://lore.kernel.org/r/20210525181821.7617-21-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: qedi: Fix null ref during abort handlingMike Christie
If qedi_process_cmd_cleanup_resp finds the cmd it frees the work and sets list_tmf_work to NULL, so qedi_tmf_work should check if list_tmf_work is non-NULL when it wants to force cleanup. Link: https://lore.kernel.org/r/20210525181821.7617-20-michael.christie@oracle.com Reviewed-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Move pool freeingMike Christie
This doesn't fix any bugs, but it makes more sense to free the pool after we have removed the session. At that time we know nothing is touching any of the session fields, because all devices have been removed and scans are stopped. Link: https://lore.kernel.org/r/20210525181821.7617-19-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Hold task ref during TMF timeout handlingMike Christie
For aborts, qedi needs to cleanup the FW then send the TMF from a worker thread. While it's doing these the cmd could complete normally and the TMF could time out. libiscsi would then complete the iscsi_task which will call into the driver to cleanup the driver level resources while it still might be accessing them for the cleanup/abort. This has iscsi_eh_abort keep the iscsi_task ref if the TMF times out, so qedi does not have to worry about if the task is being freed while in use and does not need to get its own ref. Link: https://lore.kernel.org/r/20210525181821.7617-18-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Flush block work before unblockMike Christie
We set the max_active iSCSI EH works to 1, so all work is going to execute in order by default. However, userspace can now override this in sysfs. If max_active > 1, we can end up with the block_work on CPU1 and iscsi_unblock_session running the unblock_work on CPU2 and the session and target/device state will end up out of sync with each other. This adds a flush of the block_work in iscsi_unblock_session. Link: https://lore.kernel.org/r/20210525181821.7617-17-michael.christie@oracle.com Fixes: 1d726aa6ef57 ("scsi: iscsi: Optimize work queue flush use") Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Fix completion check during abort racesMike Christie
We have a ref to the task being aborted, so SCp.ptr will never be NULL. We need to use iscsi_task_is_completed to check for the completed state. Link: https://lore.kernel.org/r/20210525181821.7617-16-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Fix shost->max_id useMike Christie
The iscsi offload drivers are setting the shost->max_id to the max number of sessions they support. The problem is that max_id is not the max number of targets but the highest identifier the targets can have. To use it to limit the number of targets we need to set it to max sessions - 1, or we can end up with a session we might not have preallocated resources for. Link: https://lore.kernel.org/r/20210525181821.7617-15-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Fix conn use after free during resetsMike Christie
If we haven't done a unbind target call we can race where iscsi_conn_teardown wakes up the EH thread and then frees the conn while those threads are still accessing the conn ehwait. We can only do one TMF per session so this just moves the TMF fields from the conn to the session. We can then rely on the iscsi_session_teardown->iscsi_remove_session->__iscsi_unbind_session call to remove the target and it's devices, and know after that point there is no device or scsi-ml callout trying to access the session. Link: https://lore.kernel.org/r/20210525181821.7617-14-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Get ref to conn during reset handlingMike Christie
The comment in iscsi_eh_session_reset is wrong and we don't wait for the EH to complete before tearing down the conn. This has us get a ref to the conn when we are not holding the eh_mutex/frwd_lock so it does not get freed from under us. Link: https://lore.kernel.org/r/20210525181821.7617-13-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Have abort handler get ref to connMike Christie
If SCSI midlayer is aborting a task when we are tearing down the conn we could free the conn while the abort thread is accessing the conn. This has the abort handler get a ref to the conn so it won't be freed from under it. Note: this is not needed for device/target reset because we are holding the eh_mutex when accessing the conn. Link: https://lore.kernel.org/r/20210525181821.7617-12-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Add iscsi_cls_conn refcount helpersMike Christie
There are a couple places where we could free the iscsi_cls_conn while it's still in use. This adds some helpers to get/put a refcount on the struct and converts an exiting user. Subsequent commits will then use the helpers to fix 2 bugs in the eh code. Link: https://lore.kernel.org/r/20210525181821.7617-11-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: iscsi_tcp: Start socket shutdown during conn stopMike Christie
Make sure the conn socket shutdown starts before we start the timer to fail commands to upper layers. Link: https://lore.kernel.org/r/20210525181821.7617-10-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: iscsi_tcp: Set no lingerMike Christie
Userspace (open-iscsi based tools at least) sets no linger on the socket to prevent stale data from being sent. However, with the in-kernel cleanup if userspace is not up the sockfd_put will release the socket without having set that sockopt. iscsid sets that opt at socket close time, but it seems ok to set this at setup time in the kernel for all tools. Link: https://lore.kernel.org/r/20210525181821.7617-9-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Fix in-kernel conn failure handlingMike Christie
Commit 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") has the following regressions/bugs that this patch fixes: 1. It can return cmds to upper layers like dm-multipath where that can retry them. After they are successful the fs/app can send new I/O to the same sectors, but we've left the cmds running in FW or in the net layer. We need to be calling ep_disconnect if userspace is not up. This patch only fixes the issue for offload drivers. iscsi_tcp will be fixed in separate commit because it doesn't have a ep_disconnect call. 2. The drivers that implement ep_disconnect expect that it's called before conn_stop. Besides crashes, if the cleanup_task callout is called before ep_disconnect it might free up driver/card resources for session1 then they could be allocated for session2. But because the driver's ep_disconnect is not called it has not cleaned up the firmware so the card is still using the resources for the original cmd. 3. The stop_conn_work_fn can run after userspace has done its recovery and we are happily using the session. We will then end up with various bugs depending on what is going on at the time. We may also run stop_conn_work_fn late after userspace has called stop_conn and ep_disconnect and is now going to call start/bind conn. If stop_conn_work_fn runs after bind but before start, we would leave the conn in a unbound but sort of started state where IO might be allowed even though the drivers have been set in a state where they no longer expect I/O. 4. Returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run the in kernel stop_conn function is breaking userspace. We should have been doing this for the caller. Link: https://lore.kernel.org/r/20210525181821.7617-8-michael.christie@oracle.com Fixes: 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely in kernel space") Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Rel ref after iscsi_lookup_endpoint()Mike Christie
Subsequent commits allow the kernel to do ep_disconnect. In that case we will have to get a proper refcount on the ep so one thread does not delete it from under another. Link: https://lore.kernel.org/r/20210525181821.7617-7-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Use system_unbound_wq for destroy_workMike Christie
Use the system_unbound_wq for async session destruction. We don't need a dedicated workqueue for async session destruction because: 1. perf does not seem to be an issue since we only allow 1 active work. 2. it does not have deps with other system works and we can run them in parallel with each other. Link: https://lore.kernel.org/r/20210525181821.7617-6-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Force immediate failure during shutdownMike Christie
If the system is not up, we can just fail immediately since iscsid is not going to ever answer our netlink events. We are already setting the recovery_tmo to 0, but by passing stop_conn STOP_CONN_TERM we never will block the session and start the recovery timer, because for that flag userspace will do the unbind and destroy events which would remove the devices and wake up and kill the eh. Since the conn is dead and the system is going dowm this just has us use STOP_CONN_RECOVER with recovery_tmo=0 so we fail immediately. However, if the user has set the recovery_tmo=-1 we let the system hang like they requested since they might have used that setting for specific reasons (one known reason is for buggy cluster software). Link: https://lore.kernel.org/r/20210525181821.7617-5-michael.christie@oracle.com Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-02scsi: iscsi: Drop suspend calls from ep_disconnectMike Christie
libiscsi will now suspend the send/tx queue for the drivers so we can drop it from the drivers ep_disconnect. Link: https://lore.kernel.org/r/20210525181821.7617-4-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>