summaryrefslogtreecommitdiff
path: root/drivers/scsi/lpfc
AgeCommit message (Collapse)Author
2019-10-09scsi: lpfc: Make function lpfc_defer_pt2pt_acc staticzhengbin
Fix sparse warnings: drivers/scsi/lpfc/lpfc_nportdisc.c:290:1: warning: symbol 'lpfc_defer_pt2pt_acc' was not declared. Should it be static? Link: https://lore.kernel.org/r/1570183477-137273-1-git-send-email-zhengbin13@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Dick Kennedy <dick.kennedy@broadcom.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Update lpfc version to 12.4.0.1James Smart
Update lpfc version to 12.4.0.1 Link: https://lore.kernel.org/r/20190922035906.10977-21-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: cleanup: remove unused fcp_txcmlpq_cntJames Smart
Local variable fcp_txcmplq_cnt is initialized to 0 and then displayed in lpfc driver message 0387. Presumed residual (or unused) code from previous commit. Removed fcp_txcmplq_cnt. Link: https://lore.kernel.org/r/20190922035906.10977-20-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Complete removal of FCoE T10 PI support on SLI-4 adaptersJames Smart
T10 PI support on SLI-4-based FCoE adapters is not supported. A prior commit in the 12.4.0.0 stream added device recognition that would prevent T10 PI enablement. However, it didn't contain a complete device list. Thus some SLI-4 FCoE adapters still had T10 PI enabled. Fix by expanding the device list that identifies FCoE devices. Link: https://lore.kernel.org/r/20190922035906.10977-19-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Update async event loggingJames Smart
This patch updates ACQE handling for: - an EEPROM failure error reported by the adapter. - ensures that all data for any ACQE, recognized or not, is logged. - Given that all data is now logged unconditionally, the default case (unrecognized) data can be reduced. Link: https://lore.kernel.org/r/20190922035906.10977-18-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix list corruption detected in lpfc_put_sgl_per_hdwqJames Smart
In lpfc_release_io_buf, an lpfc_io_buf is returned to the 'available' pool before any associated sgl or cmd and rsp buffers are returned via their respective 'put' routines. If xri rebalancing occurs and an lpfc_io_buf structure is reused quickly, there may be a race condition between release of old and association of new resources. Re-ordered lpfc_release_io_buf to release sgl and cmd/rsp buffer lists before releasing the lpfc_io_buf structure for re-use. Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-17-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix hdwq sgl locks and irq handlingJames Smart
Many of the sgl-per-hdwq paths are locking with spin_lock_irq() and spin_unlock_irq() and may unwittingly raising irq when it shouldn't. Hard deadlocks were seen around lpfc_scsi_prep_cmnd(). Fix by converting the locks to irqsave/irqrestore. Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-16-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix spinlock_irq issues in lpfc_els_flush_cmd()James Smart
While reviewing the CT behavior, issues with spinlock_irq were seen. The driver should be using spinlock_irqsave/irqrestore in the els flush routine. Changed to spinlock_irqsave/irqrestore. Link: https://lore.kernel.org/r/20190922035906.10977-15-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix list corruption in lpfc_sli_get_iocbqJames Smart
After study, it was determined there was a double free of a CT iocb during execution of lpfc_offline_prep and lpfc_offline. The prep routine issued an abort for some CT iocbs, but the aborts did not complete fast enough for a subsequent routine that waits for completion. Thus the driver proceeded to lpfc_offline, which releases any pending iocbs. Unfortunately, the completions for the aborts were then received which re-released the ct iocbs. Turns out the issue for why the aborts didn't complete fast enough was not their time on the wire/in the adapter. It was the lpfc_work_done routine, which requires the adapter state to be UP before it calls lpfc_sli_handle_slow_ring_event() to process the completions. The issue is the prep routine takes the link down as part of it's processing. To fix, the following was performed: - Prevent the offline routine from releasing iocbs that have had aborts issued on them. Defer to the abort completions. Also means the driver fully waits for the completions. Given this change, the recognition of "driver-generated" status which then releases the iocb is no longer valid. As such, the change made in the commit 296012285c90 is reverted. As recognition of "driver-generated" status is no longer valid, this patch reverts the changes made in commit 296012285c90 ("scsi: lpfc: Fix leak of ELS completions on adapter reset") - Modify lpfc_work_done to allow slow path completions so that the abort completions aren't ignored. - Updated the fdmi path to recognize a CT request that fails due to the port being unusable. This stops FDMI retries. FDMI will be restarted on next link up. Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix host hang at boot or slow bootJames Smart
Scenarios were seen where a host hung when the system booted or the host was very slow in booting. The link would not come up and no luns were visible to the host. After investigation, this was found to be due to the introduction of a new ACQE that adapter may generate to report a adapter hw warning. The ACQE was delivered to the driver very early in adapter initialization, when the driver did not expect command completion. As part of handling this unexpected interrupt the an EQEs are consumed and discarded and the EQ rearmed. The issue is the CQ that cause the EQE and thus the interrupt was not processed and the CQ was left unarmed. Meaning it would no longer generate a new interrupt condition. Subsequent mailbox commands used to initialize the adapter use the same CQ, and as there was no completion interrupt generated, the driver never saw the mailbox commands complete and it would wait long command timeouts. Fix by having the early flush routine also process the related CQ and rearm the CQ. Link: https://lore.kernel.org/r/20190922035906.10977-13-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix coverity errors on NULL pointer checksJames Smart
Coverity flagged several scenarios where checking of null pointer values wasn't consistent. Fix the code to that be consistent on checking. Link: https://lore.kernel.org/r/20190922035906.10977-12-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix NVMe ABTS in response to receiving an ABTSJames Smart
When the port, running as a nvme target, receives an ABTS, it submits commands to the adapter to Abort i/o outstanding in the adapter. The Abort command formatting routine left a command field set to zero, which instructs the adapter to generate an ABTS on the wire as part of cleaning up the I/O. This is common operation for an initiator, but not for a target. Fix the driver to check whether an ABTS had been received for the I/O, and if so, change the Abort command formatting so that the ABTS generation is disabled (IA=1). No need to ABTS it when the other side already has. Also refactored the code such that there is a single routine being used for nvme or nvmet ABORT requests, and IA is an argument. Link: https://lore.kernel.org/r/20190922035906.10977-11-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix discovery failures when target device connectivity bouncesJames Smart
An issue was seen discovering all SCSI Luns when a target device undergoes link bounce. The driver currently does not qualify the FC4 support on the target. Therefore it will send a SCSI PRLI and an NVMe PRLI. The expectation is that the target will reject the PRLI if it is not supported. If a PRLI times out, the driver will retry. The driver will not proceed with the device until both SCSI and NVMe PRLIs are resolved. In the failure case, the device is FCP only and does not respond to the NVMe PRLI, thus initiating the wait/retry loop in the driver. During that time, a RSCN is received (device bounced) causing the driver to issue a GID_FT. The GID_FT response comes back before the PRLI mess is resolved and it prematurely cancels the PRLI retry logic and leaves the device in a STE_PRLI_ISSUE state. Discovery with the target never completes or resets. Fix by resetting the node state back to STE_NPR_NODE when GID_FT completes, thereby restarting the discovery process for the node. Link: https://lore.kernel.org/r/20190922035906.10977-10-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix GPF on scsi command completionJames Smart
Faults are seen with RIP of lpfc_scsi_cmd_iocb_cmpl(). The failure is when lpfc_update_status is being called as part of the completion. After debugging, it was seen the issue was the shost pointer that the driver derived from the scsi cmd. The crash showed the cmd->device pointer being bogus, which is likely as the scsi devices were offlined prior. The bogus device pointer caused subsequent pointers derived from the location, specifically the vport, to be bogus. Fix by adjusting the calling sequence to pass in the vport rather than having to derive it from the cmd structure. Link: https://lore.kernel.org/r/20190922035906.10977-9-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix locking on mailbox command completionJames Smart
Symptoms were seen of the driver not having valid data for mailbox commands. After debugging, the following sequence was found: The driver maintains a port-wide pointer of the mailbox command that is currently in execution. Once finished, the port-wide pointer is cleared (done in lpfc_sli4_mq_release()). The next mailbox command issued will set the next pointer and so on. The mailbox response data is only copied if there is a valid port-wide pointer. In the failing case, it was seen that a new mailbox command was being attempted in parallel with the completion. The parallel path was seeing the mailbox no long in use (flag check under lock) and thus set the port pointer. The completion path had cleared the active flag under lock, but had not touched the port pointer. The port pointer is cleared after the lock is released. In this case, the completion path cleared the just-set value by the parallel path. Fix by making the calls that clear mbox state/port pointer while under lock. Also slightly cleaned up the error path. Link: https://lore.kernel.org/r/20190922035906.10977-8-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix device recovery errors after PLOGI failuresJames Smart
When target-side fault injections are made, the driver isn't reconnecting to the remote port. The driver is logging "2753" error messages which state: "PLOGI failure DID:1B2400 Status:x3/xf0240008" The failures status is indicating a Illegal field error, which points to the Temporary RPI field being used for the ELS. This error typically means the driver used an RPI that was already registered (shouldn't be registered if using it in this context). Study has found that if the driver were in discovery attempts and encountered an error, it wouldn't flag the temporary rpi in error. Yet the rpi was released for reallocation in these error paths and another ELS could allocate the rpi. In the failure situation a retry was done on an ELS that had encountered an error, and as the rpi wasn't marked in error, the ELS reused the rpi it originally allocated. But that rpi had been allocated by a different ELS issued after the original error and before the retry attempt. The different ELS had succeeded and the RPI was registered. Fix by marking the rpi state for the node to be in error, aka as needing reallocation, upon an error in the els processing. Error state marking is always done prior to release back to the internal rpi free list, which the driver wasn't doing in cases prior. Also enhanced some of the logging to help in the next case of problem troubleshooting. Link: https://lore.kernel.org/r/20190922035906.10977-7-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix rpi release when deleting vportJames Smart
A prior use-after-free mailbox fix solved it's problem by null'ing a ndlp pointer. However, further testing has shown that this change causes a later state change to occasionally be skipped, which results in a reference count never being decremented thus the rpi is never released, which causes a vport delete to never succeed. Revise the fix in the prior patch to no longer null the ndlp. Instead the RELEASE_RPI flag is set which will drive the release of the rpi. Given the new code was added at a deep indentation level, refactor the code block using a new routine that avoids the indentation issues. Fixes: 9b1640686470 ("scsi: lpfc: Fix use-after-free mailbox cmd completion") Link: https://lore.kernel.org/r/20190922035906.10977-6-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix NVME io abort failures causing hangsJames Smart
The nvme-fc transport may call to abort an io on controller reset. If the driver is out of resources to issue an abort command, it just gives up and does nothing. The transport expects the lldd to always be able to terminate an io it has issued. At that point, the controller hangs waiting for aborted ios to be returned. Note: flaged by "6136" and "6176" error messages. Root issue was the adapter mis-allocated the number resources it allocated for command entries for the adapter. Convert the driver to allocate command resources based on the number of xris supported by the FC port - 1 resource for the original command and 1 resource for the abort request. Link: https://lore.kernel.org/r/20190922035906.10977-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix miss of register read failure checkJames Smart
Coverity flagged missing status check on register read that flags a poisoned data return value. Add checking of register read status. Link: https://lore.kernel.org/r/20190922035906.10977-4-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix premature re-enabling of interrupts in lpfc_sli_host_downJames Smart
Use of spin_lock_irq may re-enable interrupts prematurely. Convert to spin_lock. Note: code is under the phba->hba_lock which has been locked with irqsave. Link: https://lore.kernel.org/r/20190922035906.10977-3-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30scsi: lpfc: Fix pt2pt discovery on SLI3 HBAsJames Smart
After exchanging PLOGI on an SLI-3 adapter, the PRLI exchange failed. Link trace showed the port was assigned a non-zero n_port_id, but didn't use the address on the PRLI. The assigned address is set on the port by the CONFIG_LINK mailbox command. The driver responded to the PRLI before the mailbox command completed. Thus the PRLI response used the old n_port_id. Defer the PRLI response until CONFIG_LINK completes. Link: https://lore.kernel.org/r/20190922035906.10977-2-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-21Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "This is mostly update of the usual drivers: qla2xxx, ufs, smartpqi, lpfc, hisi_sas, qedf, mpt3sas; plus a whole load of minor updates. The only core change this time around is the addition of request batching for virtio. Since batching requires an additional flag to use, it should be invisible to the rest of the drivers" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (264 commits) scsi: hisi_sas: Fix the conflict between device gone and host reset scsi: hisi_sas: Add BIST support for phy loopback scsi: hisi_sas: Add hisi_sas_debugfs_alloc() to centralise allocation scsi: hisi_sas: Remove some unused function arguments scsi: hisi_sas: Remove redundant work declaration scsi: hisi_sas: Remove hisi_sas_hw.slot_complete scsi: hisi_sas: Assign NCQ tag for all NCQ commands scsi: hisi_sas: Update all the registers after suspend and resume scsi: hisi_sas: Retry 3 times TMF IO for SAS disks when init device scsi: hisi_sas: Remove sleep after issue phy reset if sas_smp_phy_control() fails scsi: hisi_sas: Directly return when running I_T_nexus reset if phy disabled scsi: hisi_sas: Use true/false as input parameter of sas_phy_reset() scsi: hisi_sas: add debugfs auto-trigger for internal abort time out scsi: virtio_scsi: unplug LUNs when events missed scsi: scsi_dh_rdac: zero cdb in send_mode_select() scsi: fcoe: fix null-ptr-deref Read in fc_release_transport scsi: ufs-hisi: use devm_platform_ioremap_resource() to simplify code scsi: ufshcd: use devm_platform_ioremap_resource() to simplify code scsi: hisi_sas: use devm_platform_ioremap_resource() to simplify code scsi: ufs: Use kmemdup in ufshcd_read_string_desc() ...
2019-09-07scsi: lpfc: Fix reset recovery paths that are not recoveringJames Smart
A recent patch unconditionally marks the hba as in error as part of resetting the adapter. The driver flow that called the adapter reset was a recovery path, which expects the adapter to not be in an error state in order to finish the recovery. Given the new error state being set, the recovery fails and the adapter is left in limbo. Revise the adapter reset routine so that it will only mark the adapter in error if it was unable to reset the adapter. Fixes: 8c24a4f643ed ("scsi: lpfc: Fix crash due to port reset racing vs adapter error handling") Link: https://lore.kernel.org/r/20190903215441.10490-1-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-07scsi: lpfc: Convert existing %pf users to %psSakari Ailus
Convert the remaining %pf users to %ps to prepare for the removal of the old %pf conversion specifier support. Fixes: 323506644972 ("scsi: lpfc: Migrate to %px and %pf in kernel print calls") Link: https://lore.kernel.org/r/20190904160423.3865-1-sakari.ailus@linux.intel.com Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-29scsi: lpfc: fix 12.4.0.0 GPF at bootJames Smart
The 12.4.0.0 patch that merged WQ/CQ pairs into single per-cpu pair contained a bug: a local variable was set to the queue pair by index. This should have allowed the local variable to be natively used. Instead, the code reused the index relative to the local variable, obtaining a random pointer value that when used eventually faulted the system Convert offending code to use local variable. Fixes: c00f62e6c546 ("scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair") Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-29scsi: lpfc: Raise config max for lpfc_fcp_mq_threshold variableJames Smart
Raise the config max for lpfc_fcp_mq_threshold variable to 256. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> CC: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-29scsi: lpfc: Remove bg debugfs buffersJames Smart
Capturing and downloading dif command data and dif data was done a dozen years ago and no longer being used. Also creates a potential security hole. Remove the debugfs buffer for dif debugging. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> CC: KyleMahlkuch <kmahlkuc@linux.vnet.ibm.com> CC: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-29scsi: lpfc: Resolve checker warning for lpfc_new_io_buf()James Smart
Per Dan Carpenter: The patch d79c9e9d4b3d: "scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware." from Aug 14, 2019, leads to the following static checker warning: drivers/scsi/lpfc/lpfc_init.c:4107 lpfc_new_io_buf() error: not allocating enough data 784 vs 768 There was no need to compare sizes nor to allocate size based on a define. Change allocation to use actual structure length Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> CC: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Update lpfc version to 12.4.0.0James Smart
Update lpfc version to 12.4.0.0 Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pairJames Smart
Currently, each hardware queue, typically allocated per-cpu, consists of a WQ/CQ pair per protocol. Meaning if both SCSI and NVMe are supported 2 WQ/CQ pairs will exist for the hardware queue. Separate queues are unnecessary. The current implementation wastes memory backing the 2nd set of queues, and the use of double the SLI-4 WQ/CQ's means less hardware queues can be supported which means there may not always be enough to have a pair per cpu. If there is only 1 pair per cpu, more cpu's may get their own WQ/CQ. Rework the implementation to use a single WQ/CQ pair by both protocols. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Add NVMe sequence level error recovery supportJames Smart
FC-NVMe-2 added support for sequence level error recovery in the FC-NVME protocol. This allows for the detection of errors and lost frames and immediate retransmission of data to avoid exchange termination, which escalates into NVMeoFC connection and association failures. A significant RAS improvement. The driver is modified to indicate support for SLER in the NVMe PRLI is issues and to check for support in the PRLI response. When both sides support it, the driver will set a bit in the WQE to enable the recovery behavior on the exchange. The adapter will take care of all detection and retransmission. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.James Smart
Typical SLI-4 hardware supports up to 2 4KB pages to be registered per XRI to contain the exchanges Scatter/Gather List. This caps the number of SGL elements that can be in the SGL. There are not extensions to extend the list out of the 2 pages. The G7 hardware adds a SGE type that allows the SGL to be vectored to a different scatter/gather list segment. And that segment can contain a SGE to go to another segment and so on. The initial segment must still be pre-registered for the XRI, but it can be a much smaller amount (256Bytes) as it can now be dynamically grown. This much smaller allocation can handle the SG list for most normal I/O, and the dynamic aspect allows it to support many MB's if needed. The implementation creates a pool which contains "segments" and which is initially sized to hold the initial small segment per xri. If an I/O requires additional segments, they are allocated from the pool. If the pool has no more segments, the pool is grown based on what is now needed. After the I/O completes, the additional segments are returned to the pool for use by other I/Os. Once allocated, the additional segments are not released under the assumption of "if needed once, it will be needed again". Pools are kept on a per-hardware queue basis, which is typically 1:1 per cpu, but may be shared by multiple cpus. The switch to the smaller initial allocation significantly reduces the memory footprint of the driver (which only grows if large ios are issued). Based on the several K of XRIs for the adapter, the 8KB->256B reduction can conserve 32MBs or more. It has been observed with per-cpu resource pools that allocating a resource on CPU A, may be put back on CPU B. While the get routines are distributed evenly, only a limited subset of CPUs may be handling the put routines. This can put a strain on the lpfc_put_cmd_rsp_buf_per_cpu routine because all the resources are being put on a limited subset of CPUs. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Add MDS driver loopback diagnostics supportJames Smart
Added code to support driver loopback with MDS Diagnostics. This style of diagnostics passes frames from the fabric to the driver who then echo them back out the link. SEND_FRAME WQEs are used to transmit the frames. Added the SOF and EOF field location definitions for use by SEND_FRAME. Also ensure that enable_mds_diags is a RW parameter. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Add first and second level hardware revisions to sysfs reportingJames Smart
To aid better hardware detection when there are issues, report the first and second level hardware revisions from the READ_REV command. Add the elements to the existing hardware id string. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Migrate to %px and %pf in kernel print callsJames Smart
In order to see real addresses, convert %p with %px for kernel addresses and replace %p with %pf for functions. While converting, standardize on "x%px" throughout (not %px or 0x%px). Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Add simple unlikely optimizations to reduce NVME latencyJames Smart
While performing code review, several relatively simple optimizations can be done in the fast path. Add these optimizations (unlikely designators). Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix coverity warningsJames Smart
Running on Coverity produced the following errors: - coding style (indentation) - memset size mismatch errors note: comment cases where it is purposely a mismatch Fix the errors. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix nvme first burst module parameter descriptionJames Smart
modinfo for lpfc_nvme_enable_fb is incorrect. FirstBurst on lpfc target is not fully supported. Update the attribute description Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix BlockGuard enablement on FCoE adaptersJames Smart
The driver is allowing the user to change lpfc_enable_bg while loading the driver against a FCoE adapter. This is not supported. No check is made for the adapter type when applying the blockguard enablement value. Fix by verifying the adapter type before setting the enablement flag. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix reported physical link speed on a disabled trunked linkJames Smart
GetTrunkInfo is displaying an incorrect link speed when the link is a trunk and the link has gone down. The driver is not clearing the logical speed as part of the link down transition. Fix by setting the logical speed to UNKNOWN SPEED when the link goes down. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix Max Frame Size value shown in fdmishow outputJames Smart
Max Frame Size value is shown as 34816 in fdmishow from Switch. The driver uses bbRcvSize in common service param which is obtained from the READ_SPARM mailbox command. The bbRcvSize field which is displayed is a three nibble field but the driver is printing a full four nibbles. Fix by masking off the upper nibble. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix upcall to bsg done in non-success casesJames Smart
The scsi transport fc bsg interface does not expect the bsg_job_done() callback to be done if the bsg request call returns failure. Several of the HST_VENDOR cases in the driver unconditionally call bsg_job_done() regardless of the returning value. Fix the code to only call bsg_job_done() if the call to lpfc_bsg_request() will return success. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix sli4 adapter initialization with MSIJames Smart
When forcing the use of MSI (vs MSI-X) the driver is crashing in pci_irq_get_affinity. The driver was not using the new pci_alloc_irq_vectors interface in the MSI path. Fix by using pci_alloc_irq_vectors() with PCI_RQ_MSI in the MSI path. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix nvme sg_seg_cnt display if HBA does not support NVMEJames Smart
The driver is currently reporting a non-zero nvme sg_seg_cnt value of 256 when nvme is disabled. It should be zero. Fix by ensuring the value is cleared. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix nvme target mode ABTSing a received ABTSJames Smart
If an unsolicited ABTS was received, the driver looks up the exchange it references. It it does various searches looking for the exchange context. When one is eventually matched and it is associated with an XRI context, the driver sends an ABORT WQE to terminate the exchange. Current code looks at whether the transport had taken action on the XRI yet or not (no action if set to LPFC_NVMET_STE_RCV; action if non-LPFC_NVMET_STE_RCV). Based on action or not one of two (sol vs unsol) issue abort routines are called. The unsol version cheats and transmits a sequence containing an ABTS with no interaction with the adapter. The sol version issues an Abort WQE and lets the adapter manage whether the ABTS is sent to not. The issue is the unsol version is sending ABTS unconditionally for the exchange that received the ABTS. It's unnecessary. Remove the conditional and just call the adapter command-based routine to let the adapter manage the ABTS. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix hang when downloading fw on port enabled for nvmeJames Smart
As part of firmware download, the adapter is reset. On the adapter the reset causes the function to stop and all outstanding io is terminated (without responses). The reset path then starts teardown of the adapter, starting with deregistration of the remote ports with the nvme-fc transport. The local port is then deregistered and the driver waits for local port deregistration. This never finishes. The remote port deregistrations terminated the nvme controllers, causing them to send aborts for all the outstanding io. The aborts were serviced in the driver, but stalled due to its state. The nvme layer then stops to reclaim it's outstanding io before continuing. The io must be returned before the reset on the controller is deemed complete and the controller delete performed. The remote port deregistration won't complete until all the controllers are terminated. And the local port deregistration won't complete until all controllers and remote ports are terminated. Thus things hang. The issue is the reset which stopped the adapter also stopped all the responses that would drive i/o completions, and the aborts were also stopped that stopped i/o completions. The driver, when resetting the adapter like this, needs to be generating the completions as part of the adapter reset so that I/O complete (in error), and any aborts are not queued. Fix by adding flush routines whenever the adapter port has been reset or discovered in error. The flush routines will generate the completions for the scsi and nvme outstanding io. The abort ios, if waiting, will be caught and flushed as well. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix too many sg segments spamming in kernel logJames Smart
This issue is specific to SLI-3 adapters, specifically when DIF is used. Once seen, this message floods the logs: 9064 BLKGRD: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from dma_map_sg The driver, upon detecting an error such as too many elements in an sglist, misrepresents the error by treating it as a temporary resource issue by returning MLQUEUE_HOST_BUSY. In these cases, no retry will fix it and it should have been a hard error. The repeated retry was causing the spamming of the log. As for the initial reason of why an I/O encountered this issue at all is not clear as parameters set by the driver should have avoided this. The dm multipath maintainer has been notified of the issue. Fix by changing the return code for the dma mapping routines to indicate cases that are not retryable and return DID_ERROR on those cases. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix crash due to port reset racing vs adapter error handlingJames Smart
If the adapter encounters a condition which causes the adapter to fail (driver must detect the failure) simultaneously to a request to the driver to reset the adapter (such as a host_reset), the reset path will be racing with the asynchronously-detect adapter failure path. In the failing situation, one path has started to tear down the adapter data structures (io_wq's) while the other path has initiated a repeat of the teardown and is in the lpfc_sli_flush_xxx_rings path and attempting to access the just-freed data structures. Fix by the following: - In cases where an adapter failure is detected, rather than explicitly calling offline_eratt() to start the teardown, change the adapter state and let the later calls of posted work to the slowpath thread invoke the adapter recovery. In essence, this means all requests to reset are serialized on the slowpath thread. - Clean up the routine that restarts the adapter. If there is a failure from brdreset, don't immediately error and leave things in a partial state. Instead, ensure the adapter state is set and finish the teardown of structures before returning. - If in the scsi host reset handler and the board fails to reset and restart (which can be due to parallel reset/recovery paths), instead of hard failing and explicitly calling offline_eratt() (which gets into the redundant path), just fail out and let the asynchronous path resolve the adapter state. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix deadlock on host_lock during cable pullsJames Smart
During cable pull testing a deadlock was seen between lpfc_nlp_counters() vs lpfc_mbox_process_link_up() vs lpfc_work_list_done(). They are all waiting on the shost->host_lock. Issue is all of these cases raise irq when taking out the lock but use spin_unlock_irq() when unlocking. The unlock path is will unconditionally re-enable interrupts in cases where irq state should be preserved. The re-enablement allowed the other paths to execute which then causes the deadlock. Fix by converting the lock/unlock to irqsave/irqrestore. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-19scsi: lpfc: Fix error in remote port address changeJames Smart
In a test with high nvme remote port counts connected via a multi-hop FC switch config where switches were systematically reset (e.g. fabric partitioning and re-establishment), the nvme remote ports would switch addresses based on the switch reconfiguration events. The driver would get into a situation where the nvme port changed address, PLOGI and PRLI would succeed nvme transport registration occurred, but subsequent LS requests by the nvme subsystem failed due to a bad ndlp state and connectivity to the device failed. The driver hit a race condition on multiple devices that address swapped simultaneously. In cases where the driver notices the remote port structure came back as the same value as previously (meaning a nvme_rport structure was re-enabled and did not go through devloss_tmo/connect_tmo_failures on all controllers) the driver would unconditionally exit assuming the ndlp information was correct. But, if the ndlp's had been swapped, the ndlp had stale port state information, which when used by the LS request commands, would fail the commands. Fix by checking whether a node swap had occurred, and only exit if no ndlp swap had occurred. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>