summaryrefslogtreecommitdiff
path: root/drivers/scsi/lpfc
AgeCommit message (Collapse)Author
2021-12-16scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()Dan Carpenter
The "mybuf" string comes from the user, so we need to ensure that it is NUL terminated. Link: https://lore.kernel.org/r/20211214070527.GA27934@kili Fixes: bd2cdd5e400f ("scsi: lpfc: NVME Initiator: Add debugfs support") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-11-23scsi: lpfc: Fix non-recovery of remote ports following an unsolicited LOGOJames Smart
A commit introduced formal regstration of all Fabric nodes to the SCSI transport as well as REG/UNREG RPI mailbox requests. The commit introduced the NLP_RELEASE_RPI flag for rports set in the lpfc_cmpl_els_logo_acc() routine to help clean up the RPIs. This new code caused the driver to release the RPI value used for the remote port and marked the RPI invalid. When the driver later attempted to re-login, it would use the invalid RPI and the adapter rejected the PLOGI request. As no login occurred, the devloss timer on the rport expired and connectivity was lost. This patch corrects the code by removing the snippet that requests the rpi to be unregistered. This change only occurs on a node that is already marked to be rediscovered. This puts the code back to its original behavior, preserving the already-assigned rpi value (registered or not) which can be used on the re-login attempts. Link: https://lore.kernel.org/r/20211123165646.62740-1-jsmart2021@gmail.com Fixes: fe83e3b9b422 ("scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller") Cc: <stable@vger.kernel.org> # v5.14+ Co-developed-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-11-05Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This consists of the usual driver updates (ufs, smartpqi, lpfc, target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug fixes. Notable core changes are the removal of scsi->tag which caused some churn in obsolete drivers and a sweep through all drivers to call scsi_done() directly instead of scsi->done() which removes a pointer indirection from the hot path and a move to register core sysfs files earlier, which means they're available to KOBJ_ADD processing, which necessitates switching all drivers to using attribute groups" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: lpfc: Update lpfc version to 14.0.0.3 scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss scsi: lpfc: Fix link down processing to address NULL pointer dereference scsi: lpfc: Allow PLOGI retry if previous PLOGI was aborted scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine scsi: lpfc: Correct sysfs reporting of loop support after SFP status change scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to driver_resource_setup() scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer scsi: ufs: mediatek: Avoid sched_clock() misuse scsi: mpt3sas: Make mpt3sas_dev_attrs static scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitions scsi: target: core: Stop using bdevname() scsi: aha1542: Use memcpy_{from,to}_bvec() scsi: sr: Add error handling support for add_disk() scsi: sd: Add error handling support for add_disk() scsi: target: Perform ALUA group changes in one step scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O path scsi: target: Fix alua_tg_pt_gps_count tracking scsi: target: Fix ordered tag handling ...
2021-10-20scsi: lpfc: Update lpfc version to 14.0.0.3James Smart
Update lpfc version to 14.0.0.3. Link: https://lore.kernel.org/r/20211020211417.88754-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-20scsi: lpfc: Allow fabric node recovery if recovery is in progress before devlossJames Smart
A link bounce to a slow fabric may observe FDISC response delays lasting longer than devloss tmo. Current logic decrements the final fabric node kref during a devloss tmo event. This results in a NULL ptr dereference crash if the FDISC completes for that fabric node after devloss tmo. Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when devloss tmo triggers and we've noticed that fabric node recovery has already started or finished in between the time lpfc_dev_loss_tmo_callbk queues lpfc_dev_loss_tmo_handler. If fabric node recovery succeeds, then the driver reverses the devloss tmo marked kref put with a kref get. If fabric node recovery fails, then the final kref put relies on the ELS timing out or the REG_LOGIN cmpl routine. Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-20scsi: lpfc: Fix link down processing to address NULL pointer dereferenceJames Smart
If an FC link down transition while PLOGIs are outstanding to fabric well known addresses, outstanding ABTS requests may result in a NULL pointer dereference. Driver unload requests may hang with repeated "2878" log messages. The Link down processing results in ABTS requests for outstanding ELS requests. The Abort WQEs are sent for the ELSs before the driver had set the link state to down. Thus the driver is sending the Abort with the expectation that an ABTS will be sent on the wire. The Abort request is stalled waiting for the link to come up. In some conditions the driver may auto-complete the ELSs thus if the link does come up, the Abort completions may reference an invalid structure. Fix by ensuring that Abort set the flag to avoid link traffic if issued due to conditions where the link failed. Link: https://lore.kernel.org/r/20211020211417.88754-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-20scsi: lpfc: Allow PLOGI retry if previous PLOGI was abortedJames Smart
A remote nport can stop responding to PLOGI beyond the ELS I/O timeout under some fault conditions. When this happens, the non-response triggers a dev_loss_tmo event from the transport which causes the driver to abort the PLOGI and stop any retries. This was due to a policy in the ELS completion handler whenever an ELS was terminated due to driver request. Revise the ELS completion path to detect PLOGIs that were aborted and allow retries. Link: https://lore.kernel.org/r/20211020211417.88754-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-20scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routineJames Smart
An error is detected with the following report when unloading the driver: "KASAN: use-after-free in lpfc_unreg_rpi+0x1b1b" The NLP_REG_LOGIN_SEND nlp_flag is set in lpfc_reg_fab_ctrl_node(), but the flag is not cleared upon completion of the login. This allows a second call to lpfc_unreg_rpi() to proceed with nlp_rpi set to LPFC_RPI_ALLOW_ERROR. This results in a use after free access when used as an rpi_ids array index. Fix by clearing the NLP_REG_LOGIN_SEND nlp_flag in lpfc_mbx_cmpl_fc_reg_login(). Link: https://lore.kernel.org/r/20211020211417.88754-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-20scsi: lpfc: Correct sysfs reporting of loop support after SFP status changeJames Smart
Applications determine loop support in part by querying the 'pls' sysfs node. Reporting of 'pls' (Private Loop Support) is derived from the descriptor returned by the COMMON_GET_SLI4_PARAMETERS mailbox command, which is issued during initialization or after a reset. The value of this field may change if there is a dynamic SFP change. The driver currently will not pick up the change as there was no reset scenario. Rework to commonize the sending of the COMMON_GET_SLI4_PARAMETERS command. Add the calling of the routine after receipt of an async event indicating an SFP change. Link: https://lore.kernel.org/r/20211020211417.88754-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-20scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_resetJames Smart
A prior patch introduced HBA_NEEDS_CFG_PORT flag logic, but in lpfc_sli_brdrestart_s3() code path, right after HBA_NEEDS_CFG_PORT is set, the phba->hba_flag is cleared in lpfc_sli_brdreset(). Fix by calling lpfc_sli_chipset_init() to wait for successful restart of the HBA in lpfc_host_reset_handler() after lpfc_sli_brdrestart(). lpfc_sli_chipset_init() sets the HBA_NEEDS_CFG_PORT flag so that the lpfc_sli_hba_setup() routine from lpfc_online() will execute lpfc_sli_config_port() initialization step when the brdrestart is successful. Link: https://lore.kernel.org/r/20211020211417.88754-3-jsmart2021@gmail.com Fixes: d2f2547efd39 ("scsi: lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-20scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to ↵James Smart
driver_resource_setup() In cases when lpfc_enable_pci_dev() fails, lpfc_printf_log() with LOG_TRACE_EVENT set will call lpfc_dmp_dbg() which uses the phba->port_list_lock. However, phba->port_list_lock does not get initialized until lpfc_setup_driver_resource_phase1(). Thus, any initialization routine with LOG_TRACE_EVENT log message prior to lpfc_setup_driver_resource_phase1() will crash. Revert LOG_TRACE_EVENT back to LOG_INIT for all log messages in routines prior to lpfc_setup_driver_resource_phase1(). Link: https://lore.kernel.org/r/20211020211417.88754-2-jsmart2021@gmail.com CC: Zheyu Ma <zheyuma97@gmail.com> Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-18block: move elevator.h to block/Christoph Hellwig
Except for the features passed to blk_queue_required_elevator_features, elevator.h is only needed internally to the block layer. Move the ELEVATOR_F_* definitions to blkdev.h, and the move elevator.h to block/, dropping all the spurious includes outside of that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-16scsi: lpfc: Switch to attribute groupsBart Van Assche
struct device supports attribute groups directly but does not support struct device_attribute directly. Hence switch to attribute groups. Link: https://lore.kernel.org/r/20211012233558.4066756-28-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-16scsi: lpfc: Call scsi_done() directlyBart Van Assche
Conditional statements are faster than indirect calls. Hence call scsi_done() directly. Link: https://lore.kernel.org/r/20211007202923.2174984-47-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-12Merge branch '5.15/scsi-fixes' into 5.16/scsi-stagingMartin K. Petersen
Merge the 5.15/scsi-fixes branch into the staging tree to resolve UFS conflict reported by sfr. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-04scsi: lpfc: Fix memory overwrite during FC-GS I/O abort handlingJames Smart
When an FC-GS I/O is aborted by lpfc, the driver requires a node pointer for a dereference operation. In the abort I/O routine, the driver miscasts a context pointer to the wrong data type and overwrites a single byte outside of the allocated space. This miscast is done in the abort I/O function handler because the handler works on both FC-GS and FC-LS commands. However, the code neglected to get the correct job location for the node. Fix this by acquiring the necessary node pointer from the correct job structure depending on the I/O type. Link: https://lore.kernel.org/r/20211004231210.35524-1-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-28scsi: lpfc: Add support for optional PLDV handlingJames Smart
At adapter attachment or SLI port initialization, read the SLIPORT_STATUS register to check for pldv_enable. If found, the driver will perform a PCIe configuration space write when attaching to an SLI port instance that is an LPe32000 series adapter. Link: https://lore.kernel.org/r/20210927183518.22130-1-jsmart2021@gmail.com Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-28scsi: lpfc: Return NULL rather than a plain 0 integerColin Ian King
Function lpfc_sli4_perform_vport_cvl() returns a pointer to struct lpfc_nodelist so returning a plain 0 integer isn't good practice. Fix this by returning a NULL instead. Link: https://lore.kernel.org/r/20210925224113.183040-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-28scsi: lpfc: Fix a function name in commentsCai Huoqing
Use dma_map_sg() instead of pci_map_sg() in comments. Link: https://lore.kernel.org/r/20210925125324.1760-3-caihuoqing@baidu.com Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: lpfc: Fix mailbox command failure during driver initializationJames Smart
Contention for the mailbox interface may occur during driver initialization (immediately after a function reset), between mailbox commands initiated via ioctl (bsg) and those driver requested by the driver. After setting SLI_ACTIVE flag for a port, there is a window in which the driver will allow an ioctl to be initiated while the adapter is initializing and issuing mailbox commands via polling. The polling logic then gets confused. Correct by having thread setting SLI_ACTIVE spot an active mailbox command and allow it complete before proceeding. Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: lpfc: Fix gcc -Wstringop-overread warning, againArnd Bergmann
I fixed a stringop-overread warning earlier this year, now a second copy of the original code was added and the warning came back: drivers/scsi/lpfc/lpfc_attr.c: In function 'lpfc_cmf_info_show': drivers/scsi/lpfc/lpfc_attr.c:289:25: error: 'strnlen' specified bound 4095 exceeds source size 24 [-Werror=stringop-overread] 289 | strnlen(LPFC_INFO_MORE_STR, PAGE_SIZE - 1), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix it the same way as the other copy. Link: https://lore.kernel.org/r/20210920095628.1191676-1-arnd@kernel.org Fixes: ada48ba70f6b ("scsi: lpfc: Fix gcc -Wstringop-overread warning") Fixes: 74a7baa2a3ee ("scsi: lpfc: Add cmf_info sysfs entry") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: lpfc: Use correct scnprintf() limitDan Carpenter
The limit should be "PAGE_SIZE - len" instead of "PAGE_SIZE". We're not going to hit the limit so this fix will not affect runtime. Link: https://lore.kernel.org/r/20210916132331.GE25094@kili Fixes: 5b9e70b22cc5 ("scsi: lpfc: raise sg count for nvme to use available sg resources") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-22scsi: lpfc: Fix sprintf() overflow in lpfc_display_fpin_wwpn()Dan Carpenter
This scnprintf() uses the wrong limit. It should be "LPFC_FPIN_WWPN_LINE_SZ - len" instead of LPFC_FPIN_WWPN_LINE_SZ. Link: https://lore.kernel.org/r/20210916132251.GD25094@kili Fixes: 428569e66fa7 ("scsi: lpfc: Expand FPIN and RDF receive logging") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Update lpfc version to 14.0.0.2James Smart
Update lpfc version to 14.0.0.2. Link: https://lore.kernel.org/r/20210910233159.115896-15-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Improve PBDE checks during SGL processingJames Smart
The PBDE feature, setting payload buffer address explicitly in the WQE so it doesn't have to be fetched from the SGL, only makes sense when there is a single buffer for the I/O. When there are multiple buffers it actually hurts performance as the SGL subsequently has to be fetched. Rework the SGL logic to only use PBDE when a single buffer. Link: https://lore.kernel.org/r/20210910233159.115896-14-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Zero CGN stats only during initial driver load and stat resetJames Smart
Currently congestion management framework results are cleared whenever the framework settings changed (such as it being turned off then back on). This unfortunately means prior stats, rolled up to higher time windows lose meaning. Change such that stats are not cleared. Thus they pause and resume with prior values still being considered. Link: https://lore.kernel.org/r/20210910233159.115896-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Fix I/O block after enabling managed congestion modeJames Smart
If the congestion management framework dynamically enables, it may do so while I/O is in flight. The updates of cmf info due to inflight I/O completing may happen before values have been initialized. Fix by ensure cmf_max_bytes_per_interval is initialized when checking bandwidth utilization for SCSI layer blocking. Link: https://lore.kernel.org/r/20210910233159.115896-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Adjust bytes received vales during cmf timer intervalJames Smart
The newly added congestion mgmt framework is seeing unexpected congestion FPINs and signals. In analysis, time values given to the adapter are not at hard time intervals. Thus the drift vs the transfer count seen is affecting how the framework manages things. Adjust counters to cover the drift. Link: https://lore.kernel.org/r/20210910233159.115896-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Fix EEH support for NVMe I/OJames Smart
Injecting errors on the PCI slot while the driver is handling NVMe I/O will cause crashes and hangs. There are several rather difficult scenarios occurring. The main issue is that the adapter can report a PCI error before or simultaneously to the PCI subsystem reporting the error. Both paths have different entry points and currently there is no interlock between them. Thus multiple teardown paths are competing and all heck breaks loose. Complicating things is the NVMs path. To a large degree, I/O was able to be shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a similar call. At best, it works on a per-controller basis, but even at the controller level, it's a controller "reset" call. All of which means I/O is still flowing on different CPUs with reset paths expecting hw access (mailbox commands) to execute properly. The following modifications are made: - A new flag is set in PCI error entrypoints so the driver can track being called by that path. - An interlock is added in the SLI hw error path and the PCI error path such that only one of the paths proceeds with the teardown logic. - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx cmds in cases of hw error. - If entering the SLI port re-init calls, a case where SLI error teardown was quick and beat the PCI calls now reporting error, check whether the SLI port is still live on the PCI bus. - In the PCI reset code to bring the adapter back, recheck the IRQ settings. Different checks for SLI3 vs SLI4. - In I/O completions, that may be called as part of the cleanup or underway just before the hw error, check the state of the adapter. If in error, shortcut handling that would expect further adapter completions as the hw error won't be sending them. - In routines waiting on I/O completions, which may have been in progress prior to the hw error, detect the device is being torn down and abort from their waits and just give up. This points to a larger issue in the driver on ref-counting for data structures, as it doesn't have ref-counting on q and port structures. We'll do this fix for now as it would be a major rework to be done differently. - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being failed back due to hw error. - In I/O buf allocation, done at the start of new I/Os, check hw state and fail if hw error. Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Fix FCP I/O flush functionality for TMF routinesJames Smart
A prior patch inadvertently caused lpfc_sli_sum_iocb() to exclude counting of outstanding aborted I/Os and ABORT IOCBs. Thus, lpfc_reset_flush_io_context() called from any TMF routine does not properly wait to flush all outstanding FCP IOCBs leading to a block layer crash on an invalid scsi_cmnd->request pointer. kernel BUG at ../block/blk-core.c:1489! RIP: 0010:blk_requeue_request+0xaf/0xc0 ... Call Trace: <IRQ> __scsi_queue_insert+0x90/0xe0 [scsi_mod] blk_done_softirq+0x7e/0x90 __do_softirq+0xd2/0x280 irq_exit+0xd5/0xe0 do_IRQ+0x4c/0xd0 common_interrupt+0x87/0x87 </IRQ> Fix by separating out the LPFC_IO_FCP, LPFC_IO_ON_TXCMPLQ, LPFC_DRIVER_ABORTED, and CMD_ABORT_XRI_CN || CMD_CLOSE_XRI_CN checks into a new lpfc_sli_validate_fcp_iocb_for_abort() routine when determining to build an ABORT iocb. Restore lpfc_reset_flush_io_context() functionality by including counting of outstanding aborted IOCBs and ABORT IOCBs in lpfc_sli_sum_iocb(). Link: https://lore.kernel.org/r/20210910233159.115896-9-jsmart2021@gmail.com Fixes: e1364711359f ("scsi: lpfc: Fix illegal memory access on Abort IOCBs") Cc: <stable@vger.kernel.org> # v5.12+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Fix NVMe I/O failover to non-optimized pathJames Smart
Currently, we hold off unregistering with NVMe transport layer until GID_FT or ADISC completes upon receipt of RSCN. In the ADISC discovery routine, for nodes not found in the GID_FT response, the nodes are unregistered from the SCSI transport but not UNREG_RPI'd. Meaning outstanding WQEs continue to be outstanding and were not failed back to the OS. If an NVMe device, this mean there wasn't initial termination of the I/Os so they could be issued on a different NVMe path. Fix by unregistering the RPI so that I/O is cancelled. Link: https://lore.kernel.org/r/20210910233159.115896-8-jsmart2021@gmail.com Fixes: 0614568361b0 ("scsi: lpfc: Delay unregistering from transport until GIDFT or ADISC completes") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Don't remove ndlp on PRLI errors in P2P modeJames Smart
In pt-2-pt mode, the initiator does not log into the target after a PRLI error. In pt-2-pt mode, the target responded to the PRLI by sending a LOGO. The LOGO causes all ELS and I/Os to be aborted. This caused the PRLI to fail. The PRLI completion path caused the discovery node to be dropped to avoid being stick in an UNUSED (not logged in) state. As the node was dropped there is no retry of the login and as it is pt-2-pt, there is no RSCN to retrigger discovery. Thus the other end is not seen by the OS. Fix by ensuring the discovery node is not dropped if connecting pt-2-pt. This will cause PLOGI to be retried. Link: https://lore.kernel.org/r/20210910233159.115896-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Fix rediscovery of tape device after LIPJames Smart
On link up and node discovery, a remote port is registered with the SCSI transport and the driver sets fc4_xpt_flags to track transport registration. A link down event causes the driver to deregister with the SCSI transport, starting the devloss timer, and calls a local unreg routine to clear the login state. Part of the login state is the fc4_xpt_flags. However, with tape devices that support sequence level error recovery, which wants to preserve the login, the local unreg routine is skipped, thus the flags aren't cleared. A subsequent link up, ADISC is performed and the lpfc_nlp_reg_node() routine is called. As the fc4_xpt_flags is not clear, it's believed the node is already registered with the transport. Unfortunately, the registration was already terminated. Eventually the devloss tmo timer expires and tears down the device. Fix by ensuring the tape device, known by the ADISC flag, is always unregistered if the link drops. Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Fix hang on unload due to stuck fport nodeJames Smart
A test scenario encountered an unload hang while an FLOGI ELS was in flight when a link down condition occurred. The driver fails unload as it never releases the fport node. For most nodes, when the link drops, devloss tmo is started and the timeout will cause the final node release. For the Fport, as it has not yet registered with the SCSI transport, there is no devloss timer to be started, so there is no final release. Additionally, the link down sequence causes ABORTS to be issued for pending ELS's. The completions from the ABORTS perform the release of node references. However, as the adapter is being reset to be unloaded, those completions will never occur. Fix by the following: - In the ELS cleanup, recognize when unloading and place the ELS's on a different list that immediately cleans up/completes the ELS's. It's recognized that this condition primarily affects only the fport, with other ports having normal clean up logic that handles things. - Resolve the devloss issue by, when cleaning up nodes on after link down, recognizing when the fabric node does not have a completed state (its state is UNUSED) and removing a reference so the node can delete after the ELS reference is released. Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Fix premature rpi release for unsolicited TPLS and LS_RJTJames Smart
A test scenario has a target issuing a TPLS after accepting the driver's PRLI. TPLS is not supported by the driver so it rejects the ELS. However, the reject was only happening on the primary N_Port. If the TPLS was to a NPIV vport, not only would it reject the ELS, but it would act on the TPLS, starting devloss, then unregister from the SCSI transport and release the node. When devloss expired, it would access the node again and cause a page faul. Fix by altering the NPIV code to recognize that a correctly registered node can reject unsolicited ELS I/O and to not unregister with the SCSI transport and tear the node down. Add a check of the fc4_xpt_flags so that only a zero value allows the unreg and teardown. Link: https://lore.kernel.org/r/20210910233159.115896-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Don't release final kref on Fport node while ABTS outstandingJames Smart
In a rarely executed path, FLOGI failure, there is a refcounting error. If FLOGI completed with an error, typically a timeout, the initial completion handler would remove the job reference. However, the job completion isn't the actual end of the job/exchange as the timeout usually initiates an ABTS, and upon that ABTS completion, a final completion is sent. The driver removes the reference again in the final completion. Thus the imbalance. In the buggy cases, if there was a link bounce while the delayed response is outstanding, the fport node may be referenced again but there was no additional reference as it is already present. The delayed completion then occurs and removes the last reference freeing the node and causing issues in the link up processed that is using the node. Fix this scenario by removing the snippet that removed the reference in the initial FLOGI completion. The bad snippet was poorly trying to identify the FLOGI as OK to do so by realizing the node was not registered with either SCSI or NVMe transport. Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com Fixes: 618e2ee146d4 ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node") Cc: <stable@vger.kernel.org> # v5.13+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-14scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()James Smart
When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass the requests to the adapter. If such an attempt fails, a local "fail_msg" string is set and a log message output. The job is then added to a completions list for cancellation. Processing of any further jobs from the txq list continues, but since "fail_msg" remains set, jobs are added to the completions list regardless of whether a wqe was passed to the adapter. If successfully added to txcmplq, jobs are added to both lists resulting in list corruption. Fix by clearing the fail_msg string after adding a job to the completions list. This stops the subsequent jobs from being added to the completions list unless they had an appropriate failure. Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13scsi: lpfc: Remove unneeded variableChi Minghao
Fix the following coccicheck REVIEW: ./drivers/scsi/lpfc/lpfc_scsi.c:1498:9-12 REVIEW Unneeded variable Link: https://lore.kernel.org/r/20210831114058.17817-1-lv.ruyi@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cm> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Chi Minghao <chi.minghao@zte.com.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13scsi: lpfc: Fix compilation errors on kernels with no CONFIG_DEBUG_FSJames Smart
The Kernel test robot flagged the following warning: ".../lpfc_init.c:7788:35: error: 'struct lpfc_sli4_hba' has no member named 'c_stat'" Reviewing this issue highlighted that one of the recent patches caused the driver to no longer compile cleanly if CONFIG_DEBUG_FS is not set. Correct the different areas that are failing to compile. Link: https://lore.kernel.org/r/20210908050927.37275-1-jsmart2021@gmail.com Fixes: 02243836ad6f ("scsi: lpfc: Add support for the CM framework") Reviewed-by: Nathan Chancellor <nathan@kernel.org> Build-tested-by: Nathan Chancellor <nathan@kernel.org> Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13scsi: lpfc: Fix CPU to/from endian warnings introduced by ELS processingJames Smart
The kernel test robot reported the following sparse warning: ".../lpfc_els.c:3984:25: sparse: sparse: cast from restricted __be16" For the error being flagged, using be32_to_cpu() on a be16 data type, it was simple enough. But a review of other elements and warnings were also evaluated. This patch corrected several items in the original patch: - Using be32_to_cpu() on a be16 data type - cpu_to_le32() used on a std uint32_t (CPU) data type. Note: This is a byte array, but stored in LE layout by hardware at 32-bit boundaries. So it possibly needed conversion. - Using cpu_to_le32() on a std uint16_t and assigned to a char typeA - Using le32_to_cpu() on a le16 type - Missing cpu_to_le16() on an assignment Link: https://lore.kernel.org/r/20210830231243.6227-1-jsmart2021@gmail.com Fixes: 9064aeb2df8e ("scsi: lpfc: Add EDC ELS support") Reported-by: kernel test robot <lkp@intel.com> Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-02Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, qla2xxx, target, smartpqi, lpfc, mpt3sas). The core change causing the most churn was replacing the command request field request with a macro, allowing us to offset map to it and remove the redundant field; the same was also done for the tag field. The most impactful change is the final removal of scsi_ioctl, which has been deprecated for over a decade" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits) scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1 scsi: ufs: ufs-exynos: Fix static checker warning scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI scsi: lpfc: Use the proper SCSI midlayer interfaces for PI scsi: lpfc: Copyright updates for 14.0.0.1 patches scsi: lpfc: Update lpfc version to 14.0.0.1 scsi: lpfc: Add bsg support for retrieving adapter cmf data scsi: lpfc: Add cmf_info sysfs entry scsi: lpfc: Add debugfs support for cm framework buffers scsi: lpfc: Add support for maintaining the cm statistics buffer scsi: lpfc: Add rx monitoring statistics scsi: lpfc: Add support for the CM framework scsi: lpfc: Add cmfsync WQE support scsi: lpfc: Add support for cm enablement buffer scsi: lpfc: Add cm statistics buffer support scsi: lpfc: Add EDC ELS support scsi: lpfc: Expand FPIN and RDF receive logging scsi: lpfc: Add MIB feature enablement support scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware scsi: fc: Add EDC ELS definition ...
2021-08-24scsi: lpfc: Use the proper SCSI midlayer interfaces for PIMartin K. Petersen
Use the SCSI midlayer interfaces to query protection interval, reference tag, per-command DIX flags, and logical block count. Link: https://lore.kernel.org/r/20210817025014.12085-3-martin.petersen@oracle.com CC: James Smart <james.smart@broadcom.com> CC: Dick Kennedy <dick.kennedy@broadcom.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24scsi: lpfc: Copyright updates for 14.0.0.1 patchesJames Smart
Update copyrights to 2021 for files modified in the 14.0.0.1 patch set. Link: https://lore.kernel.org/r/20210816162901.121235-17-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24scsi: lpfc: Update lpfc version to 14.0.0.1James Smart
Update lpfc version to 14.0.0.1 Link: https://lore.kernel.org/r/20210816162901.121235-16-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24scsi: lpfc: Add bsg support for retrieving adapter cmf dataJames Smart
Add a bsg ioctl to allow user applications to retrieve the adapter congestion management framework buffer. Link: https://lore.kernel.org/r/20210816162901.121235-15-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24scsi: lpfc: Add cmf_info sysfs entryJames Smart
Allow abbreviated cm framework status information to be obtained via sysfs. Link: https://lore.kernel.org/r/20210816162901.121235-14-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24scsi: lpfc: Add debugfs support for cm framework buffersJames Smart
Add support via debugfs to report the cm statistics, cm enablement, and rx monitor information. Link: https://lore.kernel.org/r/20210816162901.121235-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24scsi: lpfc: Add support for maintaining the cm statistics bufferJames Smart
Add the logic to move the congestion management and event information into the cmd statistics buffer maintained for the adapter. The update includes rolling up values for the last minute, hour, and day information. Link: https://lore.kernel.org/r/20210816162901.121235-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24scsi: lpfc: Add rx monitoring statisticsJames Smart
The driver provides overwatch of the cm behavior by maintaining a set of rx I/O statistics. This information is also used in later updating of the cm statistics buffer. Link: https://lore.kernel.org/r/20210816162901.121235-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24scsi: lpfc: Add support for the CM frameworkJames Smart
Complete the enablement of the cm framework feature in the adapter. Perform the following: - Detect the presence of the congestion management framework feature. When the cm framework is present: - Issue the SET_FEATURE command to enable the feature. - Register the cm statistics buffer with the adapter. - Read the cm enablement buffer to determine the cm framework state for cm management. When cm management is enabled: - Monitor all FPIN and congestion signalling events, incrementing counters. - Regularly sync with the adapter to communicate congestion events and to receive an rx request limit. - Monitor requests for rx data and ensure that no more than the adapter prescribed limit is issued on the link. If the limit is exceeded, SCSI and/or NVMe traffic is temporarily suspended. - Maintain the minute, hourly, daily statistics buffer. - Monitor for congestion enablement change events, causing a reread of the enablement buffer and acting on any change in enablement. And: - Add teardown logic, including buffer deregistration, on adapter detachment or reset. Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>